Simulation stuff

Due: Jan. 28

1. Simulation with heteroskedastic errors

Simulation is the workhorse of those (like me) who do not believe that asymptotics are the be-all and end-all of econometrics, or who are not as smart as Hal White et al., or both. Read MUS ch. 4 for this. Please skip 4.3.4; for now we do not need 4.4 and 4.5. 4.6 is key to this exercise. Using the template in 4.6:

a. Generate 100 fixed x's (a single iv) and then run a thousand iterations of OLS to estimate b and to compute its se in the model y=1+bx+e. To make it interesting, first do this with nice normal GM errors, say e is N(0,1). You have to watch the scale of things; this error works nicely if b is about 2. So, in one sentence, what would happen if you had e as a standard normal and, say, b=100000? Do this in your head, not in Stata. Using the methods around p. 136 of MUS, assess the performance of both the OLS estimate of b and the accuracy of the se (do not worry about testing).

b. Let us make it a bit more interesting. Generate the e's with heteroskedasticity. Let us put in a lot of heteroskedasticity: say the first 50 e's are N(0,.5) and the latter 50 are N(0,2). Simulate 1000 runs of OLS and compare to what you did for a.

c. Surprised? Maybe. So how should you generate the e's (in terms of heteroskedasticity) to make OLS perform badly? Generate the errors after answering this and make sure that OLS performs badly. Make sure it performs badly in the way it should perform badly.

d. Rerun the simulations, except add ", robust" to the regression command. Does this fix things?

2. Asymptotics: how well does regression perform with non-normal errors?

You haven't time to run simulations with an infinite number of observations, so let us take a smallish sample as N=50 and a largish sample as N=1000. As in 1, generate 1000 runs of y=a+bx+e. Keep x fixed over all the runs, only redrawing e. (Why this is relevant is the next question.)
Take nice values of a and b, and generate e as N(0,1), B(.5) and Chi-Sq(1). Note that the errors are by assumption zero mean, and neither B nor Chi-Sq is zero mean, so fix this by subtracting off the expected mean (not the observed mean of the N errors: it is the theoretical distribution of the errors that has zero mean, and OLS will ensure that the mean of the residuals is zero). My preference for showing the results of the 1000 runs is a kernel density estimate (2.6.4), but you could just do a histogram. (You can use the defaults unless they lead to incredibly ugly things.) So you have six density plots for your simulations: N=50 and N=1000 crossed with the 3 types of errors. Compare the 6 plots. What does this tell you about how helpful the CLT is? (Also, why did I ask for two types of non-normal errors? How are Chi-Sq and B fundamentally different?)

3. Generate 100 values of a single variable, x, from a U[0,1], generate 100 corresponding normal random errors (e), and then generate y=2+3x+e. As in 1 and 2, run 1000 simulations and assess the quality of the OLS estimate of b and its se. Here the interest is in comparing the fixed and random x cases.

In 1 and 2, we (you!) redrew the errors for each run of the simulation, but kept x fixed. Not shockingly, this is called the fixed x case. The interpretation of this is that in the lapsarian world (here applied to statisticians, not women), god created many worlds, all with the same x but each with a different e, and then generated y=a+bx+e for each world (with identical a and b for each world; they only differ in their e's). The random x case has you redraw the x's (from the same distribution, say uniform between 0 and 1) for each run. (So each world gets a new bunch of x's, but drawn from a common distribution.)

a. What might you expect to change between the fixed and random x cases?

b. Check out this intuition using the assessments of the estimates of b and the se's, as in question 2.
Do this for the smallish and largish (N=50, 1000) cases. Keep the errors normal and make sure that scales do not get out of hand.
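
For orientation only, here is a minimal sketch of the fixed versus random x comparison. It is written in Python rather than Stata, so it is not the MUS 4.6 template, and you should still do the exercise in Stata. The function names, the seed, and the defaults (a=2, b=3, x drawn from U[0,1], N(0,1) errors) are assumptions made for illustration. It uses one simple way of assessing the se: compare the standard deviation of the 1000 estimates of b with the average of the 1000 reported se's.

```python
import random
import statistics


def ols_slope_and_se(x, y):
    """Return (b_hat, se_b) for y = a + b*x + e using the classical OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    return b_hat, (s2 / sxx) ** 0.5


def simulate(n=50, runs=1000, a=2.0, b=3.0, random_x=False, seed=12345):
    """Monte Carlo for OLS; x is drawn once (fixed x) or redrawn each run (random x)."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    x = [rng.random() for _ in range(n)]  # x ~ U[0,1]
    b_hats, ses = [], []
    for _ in range(runs):
        if random_x:
            x = [rng.random() for _ in range(n)]  # a new world gets new x's
        y = [a + b * xi + rng.gauss(0.0, 1.0) for xi in x]  # e ~ N(0,1)
        b_hat, se = ols_slope_and_se(x, y)
        b_hats.append(b_hat)
        ses.append(se)
    return {
        "mean_b": statistics.mean(b_hats),   # should be close to the true b
        "sd_b": statistics.stdev(b_hats),    # simulated sampling variability of b_hat
        "mean_se": statistics.mean(ses),     # what OLS reports on average
    }


if __name__ == "__main__":
    for n in (50, 1000):
        for random_x in (False, True):
            print(n, "random x" if random_x else "fixed x", simulate(n=n, random_x=random_x))
```

If the reported se is doing its job, sd_b and mean_se should be close to each other in each of the four cases; what differs between the fixed and random x rows is for you to work out.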