* Bernd Beber * Columbia University * bhb2102@columbia.edu * Last updated September 2008 ******************************************************************************************** ** Program -qi- to get marginal effects, also for multinomial probit ******************************************************************************************** * -qi- is a rough and ready program to simulate parameter estimates and extract marginal effects (specifically, first differences) * This program complements Clarify in two ways: * First, -qi- estimates first differences after multinomial probit estimation, * and second, output is printed to a file for easy post-processing * Output is printed to specified `using' file in comma-separated format * Simulations are saved in datafiles starting with prefix sims* * Specify first differences to be simulated in `q', with syntax similar to -simqi- in Clarify * Use `replace' or `append' to change how values are written to csv-file * NOTE: * Currently, all covariates other than the variables of interest are held at their means * Currently, rounding definition has to be set in macros \$dprob (for change in probability) and \$dpct (for percentage chance in probability) * Function requires package -erepost- for marginal effects for -mprobit- * The program is very inefficient, and Clarify produces the same results and is much preferred for estimations other than mprobit ******************************************************************************************** ** Syntax examples ******************************************************************************************** * Example for logit estimation, with first differences saved in output.csv * logit y x1 x2 * qi using output.csv, replace q(x1 0 1) * Compare to Clarify * estsimp logit y x1 x2 * setx mean * simqi, fd(pr) changex(x1 0 1) * Example for mlogit estimation, with first differences appended to output.csv * mlogit y_m x1 x2 * qi using output.csv, append q(x1 0 1) * Compare to Clarify * estsimp mlogit y_m x1 x2 * setx mean * simqi, fd(pr) changex(x1 0 1) * Example for mprobit estimation * mprobit y_m x1 x2 * qi using output.csv, append q(x1 0 1) ******************************************************************************************** ******************************************************************************************** * Currently global macros need to be set; these will be folded into program as options * Here, round percentages to one decimal, other marginal differences to three decimals, * dump simulation output to file or matrix every 200 iterations, and run 1000 simulations gl dpct=.1 gl dprob=.001 gl ndump=200 gl nsims=1000 pr de qi syntax using, q(string) [replace|append] version 9.2 * What was the estimation procedure and what was the dependent variable? loc cmd=e(cmd) loc depvar=e(depvar) loc model="`cmd' `depvar'" * How many outcomes? loc k_out=e(k_out) if("`k_out'"==".") loc k_out=2 * What are the outcomes? if("`cmd'"=="logit") loc cats = "0 1" if("`cmd'"=="mlogit") { loc eqnames = e(eqnames) loc eqnames : list clean eqnames loc basecat = e(basecat) loc cats = "`eqnames' `basecat'" loc cats : list sort cats } if("`cmd'"=="mprobit") { loc i=1 while `i'>0 { loc eq=e(out`i') if("`eq'"==".") loc i=0 else { loc cats = "`cats' `eq'" loc i=`i'+1 } } loc cats : list clean cats loc cats : list sort cats } * How many parameters were estimated? loc indvars : colfullnames e(b) loc indvarsn: word count `indvars' * Set up names for variables to save simulations mac drop _b foreach i of numlist 1/`indvarsn' { loc b=`"`b' sims`i'"' } * Random draws of parameters preserve qui drawnorm `b', n(\$nsims) means(e(b)) cov(e(V)) clear tempfile bdraws qui save `bdraws', replace loc iter=ceil(\$nsims/\$ndump) foreach i of numlist 1/`iter' { use `bdraws', clear loc from = (`i'-1)*\$ndump + 1 loc to = min(`i'*\$ndump, _N) qui keep in `from'/`to' loc ccount = _N * xpose might be slowing it down!! xpose, clear foreach j of numlist 1/`ccount' { loc cname = (`i'-1)*\$ndump + `j' mkmat v`j', mat(b`cname') mat b`cname'=b`cname'' } } restore * How many different quantities of interest are there? loc q: subinstr loc q "&" "&", all count(loc cases) loc cases=`cases'+1 * Set number of fake observations for which -predict- will be calculated loc oldN=_N loc fromN=`oldN'+1 loc toN=`oldN'+2*`cases' qui set obs `toN' * Set covariate values loc indvars=e(indvars) if("`indvars'"==".") { loc indvars : colnames e(b) loc indvars : list uniq indvars loc constant _cons loc indvars : list indvars - constant } foreach i of varlist `indvars' { * Right now it's hard-coded that all other variables held at mean! qui summ `i' loc `i'_m=r(mean) qui replace `i'=``i'_m' in `fromN'/`toN' } * Change variable values to get desired first differences token `q', parse("&") loc parsed= 2*`cases' - 1 loc counter=1 foreach i of numlist 1/`parsed' { if("``i''"!="&") { loc fd`counter' = "``i''" loc counter=`counter'+1 } } foreach i of numlist 1/`cases' { token `fd`i'' loc repl1=`oldN'+ 2*`i' - 1 loc repl2=`oldN'+ 2*`i' qui replace `1'=`2' in `repl1' qui replace `1'=`3' in `repl2' } * Get names of parameter estimates local names : colfullnames e(b) * Save parameter estimates mat eb = e(b) * Set up temporary files to save predicted probabilities tempfile simP tempfile simPcurr * For each simulated set of parameters, get predicted probabilities * Need to do this in batches because of Stata limits loc iter=ceil(\$nsims/1500) foreach j of numlist 1/`iter' { loc from = (`j'-1)*1500 + 1 loc to = min(`j'*1500, \$nsims) foreach i of numlist `from'/`to' { mat colnames b`i'=`names' if("`cmd'"=="logit") { eret post b`i' qui predict sim`i'P1 in `fromN'/`toN', xb qui replace sim`i'P1=exp(sim`i'P1)/(1+exp(sim`i'P1)) qui gen sim`i'P0=1-sim`i'P1 } else { if("`cmd'"=="mlogit") { eret post b`i' foreach k in `eqnames' { qui predict sim`i'P`k' in `fromN'/`toN', xb o(`k') qui replace sim`i'P`k'=exp(sim`i'P`k') } qui gen sim`i'P`basecat' = 1 qui egen denom = rowtotal(sim`i'P*) foreach k in `cats' { qui replace sim`i'P`k'=sim`i'P`k'/denom } drop denom } else { if("`cmd'"=="mprobit") { erepost b=b`i' foreach m in `cats' { qui predict sim`i'P`m' in `fromN'/`toN', o(`m') } } } } aorder sim`i'P* * Dump probabilities into temporary file once for every \$ndump simulations and after final simulation if(mod(`i',\$ndump)==0 | "`i'"=="\$nsims") { preserve keep sim*P* qui keep in `fromN'/`toN' xpose, clear if(`i'<=\$ndump) { qui save `simP', replace } else { qui save `simPcurr', replace qui use `simP', clear append using `simPcurr', nonote nolabel qui save `simP', replace } restore drop sim*P* } } } * Drop fake observations qui drop in `fromN'/`toN' * Open file to save output qui file open output `using', write text `replace' `append' * Retrieve simulated probabilities preserve qui use `simP', clear * For multiple outcomes, add variable indicating which outcome a given probability refers to egen whichP = fill(`cats' `cats') egen simID = seq(), block(`k_out') * Get first differences, also in percentage terms foreach i of numlist 1/`cases' { loc var1=2*`i' - 1 loc var2=2*`i' qui gen fd`i'bfr=v`var1' qui gen fd`i'aftr=v`var2' qui gen fd`i'diff=v`var2'-v`var1' qui gen fd`i'pct=100*fd`i'diff/abs(v`var1') } drop v* * For multiple outcomes, reshape matrix of simulations qui reshape wide fd*, i(simID) j(whichP) * Save simulations * First number in variable names identify estimated marginal effect, second number refers to outcome * Name of output simulation files are hard-coded! if("`cmd'"=="mprobit" | "`cmd'"=="mlogit") qui save sims_`cmd', replace if("`cmd'"=="logit") qui save sims_`cmd'`depvar', replace * Write header mac drop _header foreach j in `cats' { if("`header'"=="") { loc header "`model',MargEff,BfrChg`j',BfrChgCIL`j',BfrChgCIU`j',AftrChg`j',AftrChgCIL`j',AftrChgCIU`j',Diff`j',DiffCIL`j',DiffCIU`j',PctChg`j',PctChgCIL`j',PctChgCIU`j'" } else { loc header "`header'," "BfrChg`j',BfrChgCIL`j',BfrChgCIU`j',AftrChg`j',AftrChgCIL`j',AftrChgCIU`j',Diff`j',DiffCIL`j',DiffCIU`j',PctChg`j',PctChgCIL`j',PctChgCIU`j'" } } cap file w output "`header'" _n * Calculate summary statistics for simulated probabilities and write to file foreach i of numlist 1/`cases' { mac drop _outval foreach j in `cats' { foreach k in bfr aftr diff { qui summ fd`i'`k'`j' loc `k'`j'=round(r(mean), \$dprob) _pctile fd`i'`k'`j', percentiles(2.5 97.5) loc `k'cil`j'=round(r(r1), \$dprob) loc `k'ciu`j'=round(r(r2), \$dprob) loc `k'`j' `"``k'`j'',``k'cil`j'',``k'ciu`j''"' } qui summ fd`i'pct`j' loc pct`j'=round(r(mean), \$dpct) _pctile fd`i'pct`j', percentiles(2.5 97.5) loc pctcil`j'=round(r(r1), \$dpct) loc pctciu`j'=round(r(r2), \$dpct) loc pct`j' `"`pct`j'',`pctcil`j'',`pctciu`j''"' if("`outval'"=="") { loc outval `"`bfr`j'',`aftr`j'',`diff`j'',`pct`j''"' } else { loc outval `"`outval',`bfr`j'',`aftr`j'',`diff`j'',`pct`j''"' } } cap file w output "`model',`fd`i'',`outval'" _n } restore * Close output file file close output * Repost parameter estimates eret post eb end