*****Aleksandra Anić
**** 3 December 2024

**# FIRST EXAMPLE MLOGIT
****example 1 from Stata mlogit — Multinomial (polytomous) logistic regression
use https://www.stata-press.com/data/r18/sysdsn1, clear
***The insurance is categorized as either an indemnity plan (that is, regular fee-for-service insurance, which may have a deductible or coinsurance rate) or a prepaid plan (a fixed up-front payment allowing subsequent unlimited use as provided, for instance, by an HMO)


label list insure // shows the value labels; insure has 3 categories: 1 indemnity, 2 prepaid and 3 uninsure
tab insure // tabulate frequencies
*****Compare the relative frequencies of insure with the fitted probabilities from a multinomial logit model that has only a constant term for each category.

****mlogit without explanatory variables is the constant-only model
mlogit insure, nolog // the nolog option suppresses the iteration log
predict pind pprep punin // predicted probabilities; the pr option is not needed because it is the default after mlogit
sum pind pprep punin
drop pind pprep punin
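***a minimal sketch, assuming the last estimation is still the constant-only mlogit above, with indemnity as the base outcome and equations named Prepaid and Uninsure
***with constants only, P(j)=exp(a_j)/(1+exp(a_prepaid)+exp(a_uninsure)), where a_j=0 for the base; each result should match the relative frequency from tab insure
dis 1/(1+exp(_b[Prepaid:_cons])+exp(_b[Uninsure:_cons])) // indemnity (base)
dis exp(_b[Prepaid:_cons])/(1+exp(_b[Prepaid:_cons])+exp(_b[Uninsure:_cons])) // prepaid
dis exp(_b[Uninsure:_cons])/(1+exp(_b[Prepaid:_cons])+exp(_b[Uninsure:_cons])) // uninsure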


***estimate the mlogit model and calculate the probabilities of prepaid insurance for whites and nonwhites
mlogit insure nonwhite
predict pind pprep punin
sum pind pprep punin
tab insure 
tab insure nonwhite, col
tab pind
tab pprep
tab punin
drop pind pprep punin

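*****calculate probabilities from the estimated coefficients; indemnity is the base, so
*****P(prepaid)=exp(xb_prepaid)/(1+exp(xb_prepaid)+exp(xb_uninsure))
***probability that a white person has prepaid insurance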
dis exp(-0.1879)/(1+exp(-0.1879)+exp(-1.9419))
***probability that nonwhite person has prepaid
dis exp(-0.1879+0.6608)/(1+exp(-0.1879+0.6608)+exp(-1.9419+0.37796))
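***a minimal sketch of the same two calculations using the stored coefficients instead of typed-in values
***assumes the last estimation is still "mlogit insure nonwhite" with indemnity as the base outcome and equations named Prepaid and Uninsure
dis exp(_b[Prepaid:_cons])/(1+exp(_b[Prepaid:_cons])+exp(_b[Uninsure:_cons])) // whites
dis exp(_b[Prepaid:_cons]+_b[Prepaid:nonwhite])/(1+exp(_b[Prepaid:_cons]+_b[Prepaid:nonwhite])+exp(_b[Uninsure:_cons]+_b[Uninsure:nonwhite])) // nonwhites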

****the next block is for more advanced users; you can skip it
* note that /* at the beginning and */ at the end are used to comment out part of the code. To run the code, remove /* and */
/*
****calculate probabilities by using matrix algebra
mlogit insure nonwhite
mat coef=e(b)
mat list coef
dis coef[1,colnumb(coef,"Prepaid:nonwhite")]
dis coef[1,colnumb(coef,"Prepaid:_cons")]

***nonwhite=0
***probability of prepaid insurance for whites 
gen pprep_white=exp(coef[1,colnumb(coef,"Prepaid:_cons")])/(1+exp(coef[1,colnumb(coef,"Uninsure:_cons")])+exp(coef[1,colnumb(coef,"Prepaid:_cons")]))

***nonwhite=1
***probability of prepaid insurance for nonwhites 
gen pprep_nonwhite=exp(coef[1,colnumb(coef,"Prepaid:_cons")]+coef[1,colnumb(coef,"Prepaid:nonwhite")])/(1+exp(coef[1,colnumb(coef,"Uninsure:_cons")] + coef[1,colnumb(coef,"Uninsure:nonwhite")])+exp(coef[1,colnumb(coef,"Prepaid:_cons")]+coef[1,colnumb(coef,"Prepaid:nonwhite")]))
sum pprep*
*/

***change the base category to prepaid and check the probabilities
mlogit insure nonwhite, base(2)
predict pind pprep punin
sum pind pprep punin
tab insure
***CONCLUSION: in an mlogit model with constants, the sample averages of the predicted probabilities equal the observed relative frequencies
*** the choice of base category does not matter; the predicted probabilities are the same

**# Estimate marginal effects

***average marginal effect vs. marginal effect at the mean
mlogit insure nonwhite
margins, dydx(*) predict(outcome(2))
margins, dydx(*) predict(outcome(2)) atmeans
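
***a minimal sketch, not part of the original example: nonwhite is a 0/1 dummy, so entering it as a factor variable (i.nonwhite) makes margins report the discrete change in the probability when nonwhite goes from 0 to 1, rather than a derivative
***the result should equal the difference between the nonwhite and white probabilities of prepaid computed by formula earlier
quietly mlogit insure i.nonwhite
margins, dydx(nonwhite) predict(outcome(2))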

****the probabilities sum to 1, so the marginal effects sum to 0 across the three outcomes; the following code uses the results stored by margins to check that the sum is 0
/*
mlogit insure nonwhite
margins, dydx(*) predict(outcome(1))
matrix list r(table) 
scalar me1=r(table)[1,1]
margins, dydx(*) predict(outcome(2))
scalar me2=r(table)[1,1]
margins, dydx(*) predict(outcome(3))
scalar me3=r(table)[1,1]
dis %3.2f me1+me2+me3 // ME sum up to 0
*/

***the rrr option for mlogit displays relative-risk ratios (exponentiated coefficients); the choice of base category changes which comparisons are displayed, not the fitted model
mlogit insure nonwhite, rrr base(2)
****a relative-risk ratio greater than 1 for alternative A versus base B means that an increase in the regressor makes A more likely relative to B; less than 1, less likely
***nonwhites are less likely to choose indemnity compared with prepaid, and less likely to choose indemnity compared with uninsure
mlogit insure nonwhite, rrr
***nonwhites are more likely to choose prepaid compared with indemnity, and uninsure compared with indemnity
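***a minimal sketch, assuming the last estimation is the mlogit with the default base (indemnity) and equations named Prepaid and Uninsure: each relative-risk ratio is the exponentiated coefficient
dis exp(_b[Prepaid:nonwhite]) // RRR of prepaid vs. indemnity for nonwhites
dis exp(_b[Uninsure:nonwhite]) // RRR of uninsure vs. indemnity for nonwhites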

***mlogit is used when all regressors are case-specific, e.g. age, male, nonwhite and site vary across individuals but not across alternatives
****check IIA
****explanation in Greene, THE INDEPENDENCE FROM IRRELEVANT ALTERNATIVES ASSUMPTION, section 18.2.4
*** est store NAME stores estimation results. We store the results from two mlogit models and compare them
****the Hausman test is used
****two options are added
**** alleqs: use all equations to perform the test; default is first equation only
**** constant: include estimated intercepts in the comparison; default is to exclude
****if a subset of the choice set truly is irrelevant, then omitting it from the model altogether will not change the parameter estimates systematically. If we fail to reject the null hypothesis, there is no evidence against the IIA assumption

mlogit insure age male nonwhite i.site
est store m
mlogit insure age male nonwhite i.site if insure!=3
est store m3
mlogit insure age male nonwhite i.site if insure!=2
est store m2
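mlogit insure age male nonwhite i.site if insure!=1, base(2)
est store m1

****hausman expects the consistent (restricted) model first and the efficient (full) model second
hausman m3 m, alleqs constant
hausman m2 m, alleqs constant
****with indemnity dropped the base outcome is prepaid, so the full model is re-estimated with base(2) to make the coefficients comparable
quietly mlogit insure age male nonwhite i.site, base(2)
est store mfull2
hausman m1 mfull2, alleqs constant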

****Cameron & Trivedi, Microeconometrics Using Stata, ch. 15.4
**# Choice of fishing mode
use D:\Microeconometrics_Master\Database\mus15data.dta, clear
cd D:\Microeconometrics_Master\Results
*we analyze data on each individual's choice among four possible fishing modes: beach, pier, private and charter
describe
*** mode, price and crate: the chosen fishing mode and the corresponding price and catch rate for that mode
****d variables are dummy variables for the chosen mode
****p & q variables are alternative-specific variables, i.e. price and catch rate for each of the four possible fishing modes
***income is a case-specific variable
*data are in wide form
***one observation per individual
list * in 1 // a single observation holds the data on all four alternatives for an individual
tab mode, sum(income)

***mlogit is used when the explanatory variables are case-specific and the data are in wide form
mlogit mode income, base(1) nolog
***outreg2 is a user-written command; if it is not installed, run once: ssc install outreg2
outreg2 using mlogit_fish.out, lab dec(3) replace excel auto(3) 

test income
margins, dydx(*) predict(outcome(1))
margins, dydx(*) predict(outcome(2))
margins, dydx(*) predict(outcome(3))
margins, dydx(*) predict(outcome(4))

generate id=_n

*convert to long form: every individual will have four observations, one for each fishing mode
reshape long d p q, i(id) j(fishmode beach pier private charter) string
drop mode price crate //case-specific variables that are not needed
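
***a minimal sketch of a sanity check after reshape (nchosen is a hypothetical helper variable): every case should have four rows and exactly one chosen alternative
bysort id: assert _N==4
bysort id: egen nchosen=total(d)
assert nchosen==1
drop nchosen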

*****we have alternative-specific regressors for price (p) and catch rate (q)
*****and a case-specific regressor, income

***alternative-specific conditional logit model
asclogit d p q, case(id) alternatives(fishmode) casevars(income) basealternative(beach) 
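test p=q=0 // H0: the coefficients on p and q are jointly zero; if we fail to reject H0, the alternative-specific regressors add nothing and CL reduces to MNL
***calculate the pseudo R-squared, which is not displayed in the output
***formula: Pseudo R2 = 1 - ll1/ll0, where ll1 is e(ll) of the fitted model and ll0 is e(ll) of the intercept-only model; e() results can be viewed with ereturn list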
scalar ll1=e(ll) 
***** intercept-only model, with neither case-specific nor alternative-specific variables
asclogit d, case(id) alternatives(fishmode) basealternative(beach) 
scalar ll0=e(ll) 
scalar PseudoR2=1-ll1/ll0
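***a minimal sketch: display the stored log likelihoods and the pseudo R-squared computed above, since scalars are not shown automatically
dis "ll1 = " ll1 "   ll0 = " ll0
dis "Pseudo R2 = " %5.4f PseudoR2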
*Alternatives summary for fishmode
estat alternatives 

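***the last estimation above is the intercept-only model, so the full model is refitted before predicting
quietly asclogit d p q, case(id) alternatives(fishmode) casevars(income) basealternative(beach)
predict prob, pr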

table fishmode, stat(mean d prob) nototal nformat(%5.4f)
***without alternative-specific regressors, the asclogit command gives the same estimates as mlogit
asclogit d, case(id) alternatives(fishmode) casevars(income) basealternative(beach) 
outreg2 using mlogit_fish.out, lab dec(3) append excel auto(3)