Heteroskedasticity
- Definition
- Consequences of heteroskedasticity
- Testing for heteroskedasticity
  - Breusch-Pagan test
  - White test (2 forms)
- Fixing the problem
  - Robust standard errors
  - Weighted least squares
- Fixing heteroskedasticity in the linear probability model (LPM)

Consequences of heteroskedasticity for OLS
- OLS is still unbiased and consistent under heteroskedasticity.
- The interpretation of R-squared is unchanged: the unconditional error variance is unaffected by heteroskedasticity, which refers to the conditional error variance.
- Heteroskedasticity invalidates the usual variance formulas for the OLS estimators.
- The usual F tests and t tests are not valid under heteroskedasticity.
- Under heteroskedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more efficient linear estimators.

Heteroskedasticity-robust inference after OLS estimation
- Formulas for OLS standard errors and related statistics have been developed that are robust to heteroskedasticity of unknown form.
- All of these formulas are valid only in large samples.
- The heteroskedasticity-robust OLS standard errors are also called White/Huber/Eicker standard errors.
- With these standard errors, the usual t test is asymptotically valid under heteroskedasticity.
- The usual F statistic does not work under heteroskedasticity, but heteroskedasticity-robust versions are available in Stata.
- To obtain heteroskedasticity-robust standard errors and F statistics in Stata:
    reg y x, robust
- The robust option leaves the coefficients unchanged; the standard errors, t statistics, and F statistics change.

Example: Hourly wage equation
- Heteroskedasticity-robust standard errors may be larger or smaller than their nonrobust counterparts.
- The differences are often small in practice, and F statistics are also often similar.
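To make the difference between the usual and the robust standard error concrete, here is a minimal sketch in plain Python (not the Stata used in these slides). It assumes a simple regression with one regressor, for which the robust variance of the slope is $\sum_i (x_i - \bar{x})^2 \hat{u}_i^2 / SST_x^2$, without the degrees-of-freedom adjustment that some software applies. The data and function names are made up for illustration.

```python
import math

def ols_simple(x, y):
    """Simple regression of y on x with an intercept; returns (b0, b1, residuals)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid

def slope_standard_errors(x, resid):
    """Usual and heteroskedasticity-robust standard errors of the slope."""
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    sst_x = sum(d * d for d in dev)
    # Usual formula: sigma^2-hat / SST_x, valid only under homoskedasticity
    sigma2_hat = sum(u * u for u in resid) / (n - 2)
    se_usual = math.sqrt(sigma2_hat / sst_x)
    # Robust formula: each squared residual is weighted by its own (x_i - xbar)^2
    se_robust = math.sqrt(sum(d * d * u * u for d, u in zip(dev, resid)) / sst_x ** 2)
    return se_usual, se_robust

# Hypothetical data: the spread of the error grows with x
x = [float(i) for i in range(1, 21)]
errors = [(-1) ** i * 0.3 * xi for i, xi in enumerate(x)]
y = [1.0 + 0.5 * xi + e for xi, e in zip(x, errors)]

b0, b1, resid = ols_simple(x, y)
se_usual, se_robust = slope_standard_errors(x, resid)
print("slope:", b1, "usual se:", se_usual, "robust se:", se_robust)
```

The slope estimate is the same in both cases; only the standard error attached to it changes, which mirrors what the robust option does in Stata.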

- If there is strong heteroskedasticity, the differences may be larger. To be on the safe side, it is advisable to always compute robust standard errors.

Testing for heteroskedasticity
- Even with robust standard errors, it may still be useful to test for heteroskedasticity, because if it is present OLS may no longer be the most efficient linear estimator.

Breusch-Pagan test for heteroskedasticity
- Null hypothesis (homoskedasticity): $H_0: Var(u \mid x_1, x_2, \dots, x_k) = \sigma^2$.
- Under MLR.4, $E(u \mid x_1, \dots, x_k) = 0$, so the null is equivalent to $E(u^2 \mid x_1, x_2, \dots, x_k) = \sigma^2$: the mean of $u^2$ must not vary with $x_1, x_2, \dots, x_k$.

Breusch-Pagan test for heteroskedasticity (cont.)
- Regress the squared residuals on all explanatory variables and test whether this regression has explanatory power.
- F statistic: a large test statistic (= a high R-squared) is evidence against the null hypothesis.
- Alternative test statistic: the Lagrange multiplier (LM) statistic. Again, high values of the test statistic (= high R-squared) lead to rejection of the null hypothesis that the expected value of $u^2$ is unrelated to the explanatory variables.
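The Breusch-Pagan steps above can be sketched as follows in plain Python. This is a minimal illustration for a single regressor (k = 1) with made-up data. It uses the standard forms $LM = n \cdot R^2$ and $F = (R^2/k) / ((1 - R^2)/(n - k - 1))$, where $R^2$ comes from the regression of the squared residuals on the explanatory variable. The function names are hypothetical.

```python
def ols_simple(x, y):
    """Simple regression with intercept; returns (b0, b1, residuals, R-squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst_y = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sum(u * u for u in resid) / sst_y
    return b0, b1, resid, r2

def breusch_pagan_simple(x, y):
    """Breusch-Pagan statistics for a model with one explanatory variable."""
    n, k = len(x), 1
    # Step 1: estimate the model by OLS and keep the residuals
    _, _, resid, _ = ols_simple(x, y)
    # Step 2: regress the squared residuals on the explanatory variable
    u2 = [u * u for u in resid]
    _, _, _, r2_u2 = ols_simple(x, u2)
    # Step 3: form the LM and F statistics from that R-squared
    lm = n * r2_u2
    f_stat = (r2_u2 / k) / ((1.0 - r2_u2) / (n - k - 1))
    return r2_u2, lm, f_stat

# Hypothetical data: the spread of the error grows with x
x = [float(i) for i in range(1, 21)]
errors = [(-1) ** i * 0.3 * xi for i, xi in enumerate(x)]
y = [1.0 + 0.5 * xi + e for xi, e in zip(x, errors)]

r2_u2, lm, f_stat = breusch_pagan_simple(x, y)
print("R-squared:", r2_u2, "LM:", lm, "F:", f_stat)
```

Under the null, LM is compared with a chi-square distribution with k degrees of freedom (5% critical value of about 3.84 for k = 1), and F with an F distribution with (k, n - k - 1) degrees of freedom.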

Example: Heteroskedasticity in housing price equations
- First equation shown: homoskedasticity rejected.
- Second equation shown: homoskedasticity not rejected.

The White test for heteroskedasticity
- Regress the squared residuals on all explanatory variables, their squares, and their interactions (the example shown is for k = 3).
- The White test detects more general deviations from homoskedasticity than the Breusch-Pagan test.
- Disadvantage of this form of the White test: including all squares and interactions leads to a large number of estimated parameters (e.g. k = 6 leads to 27 parameters to be estimated).

Alternative form of the White test
- Regress the squared residuals on the OLS fitted values and their squares: $\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + error$.
- This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares, and interactions, because the predicted value of y and its square implicitly contain all of these terms.

Example: Heteroskedasticity in (log) housing price equations
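The parameter count quoted for the full White test can be checked with a short calculation. The sketch below counts the slope parameters (excluding the intercept) in the regression of the squared residuals on the k levels, the k squares, and the k(k-1)/2 pairwise interactions. The function name is hypothetical.

```python
def white_test_regressors(k):
    """Number of regressors in the full White test with k explanatory variables."""
    levels = k
    squares = k
    interactions = k * (k - 1) // 2  # one term per distinct pair of variables
    return levels + squares + interactions

for k in (3, 6):
    print(k, white_test_regressors(k))
```

For k = 3 this gives 9 regressors and for k = 6 it gives 27. The alternative form always uses just two regressors, the fitted value and its square.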

Weighted least squares estimation
- Heteroskedasticity is known up to a multiplicative constant: $Var(u_i \mid x_i) = \sigma^2 h(x_i)$, where the functional form $h(x_i) > 0$ of the heteroskedasticity is known.
- Transformed model: divide the original equation through by $\sqrt{h(x_i)}$.

Example: Savings and income
- Note that the transformed regression model has no intercept.
- The transformed model is homoskedastic.
- If the other Gauss-Markov assumptions hold as well, OLS applied to the transformed model is the best linear unbiased estimator.

- OLS in the transformed model is weighted least squares (WLS).
- Observations with a large variance get a smaller weight in the optimization problem.
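The claim that OLS on the transformed model is WLS can be illustrated numerically. The sketch below assumes the savings equation sav = b0 + b1*inc + u with error variance proportional to income, so each observation is divided by the square root of inc. It then compares no-intercept OLS on the transformed data with a regression weighted by 1/inc. The numbers are invented for illustration and the function names are hypothetical.

```python
import math

def ols_transformed(inc, sav):
    """OLS without intercept of sav/sqrt(inc) on 1/sqrt(inc) and sqrt(inc)."""
    z1 = [1.0 / math.sqrt(v) for v in inc]      # transformed constant
    z2 = [math.sqrt(v) for v in inc]            # transformed income
    ys = [s / math.sqrt(v) for s, v in zip(sav, inc)]
    # Solve the 2x2 normal equations directly
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    c1 = sum(a * y for a, y in zip(z1, ys))
    c2 = sum(b * y for b, y in zip(z2, ys))
    det = a11 * a22 - a12 * a12
    b0 = (c1 * a22 - c2 * a12) / det
    b1 = (a11 * c2 - a12 * c1) / det
    return b0, b1

def wls_simple(x, y, w):
    """Weighted least squares of y on x with intercept and weights w."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b1 = num / den
    return yw - b1 * xw, b1

# Hypothetical income and savings figures
inc = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 80.0, 100.0]
sav = [0.5, 2.1, 1.8, 5.0, 3.2, 7.9, 4.1, 12.0]

b_transformed = ols_transformed(inc, sav)
b_weighted = wls_simple(inc, sav, [1.0 / v for v in inc])
print(b_transformed, b_weighted)
```

Both routes minimize the same weighted sum of squared residuals, so the two coefficient pairs coincide up to rounding. The coefficient on the transformed constant 1/sqrt(inc) plays the role of the original intercept.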

- WLS is more efficient than OLS because less weight is placed on observations with a large variance.
- In Stata, WLS for the savings example can be estimated with
    gen inv_inc=1/inc
    reg sav inc [aw=inv_inc]
  where inv_inc = 1/inc and aw denotes analytic weights, which are inversely proportional to the variance of the error term.

Example: Financial wealth equation
- Dependent variable: net financial wealth (in 1000s).
- Assumed form of heteroskedasticity: error variance proportional to income, which corresponds to the weights 1/inc used in the estimation.

- Stata:
    reg nettfa inc age_25_sq male e401k [aw=1/inc]
- Code for estimation is in g:\eco\ evenwe\wls_with_401k
- Regressor label from the equation: participation in 401K pension plan.

Special cases of heteroskedasticity
- If the observations are reported as averages at the city/county/state/country/firm level, they should be weighted by the size of the unit.
- If observations are reported as aggregates, weight by the inverse of the size.
- Example with firm-level averages: the dependent variable is the average contribution to the pension plan in firm i; the regressors are average earnings and age in firm i and the percentage the firm contributes to the plan; the error term is heteroskedastic.
- Error variance if errors are homoskedastic at the individual level: $Var(\bar{u}_i) = \sigma^2 / m_i$, where $m_i$ is the size of firm i.
- If errors are homoskedastic at the individual level, WLS with weights equal to firm size $m_i$ should be used.
- If the assumption of homoskedasticity at the individual level is not exactly right, one can calculate robust standard errors after WLS (i.e. for the transformed model).
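The weighting rule for averaged data follows from a short variance calculation. The sketch below assumes that the individual-level errors $u_{i,e}$ within firm i are uncorrelated with each other and share the common variance $\sigma^2$; the index e for individuals is notation introduced here for illustration.

```latex
\bar{u}_i = \frac{1}{m_i} \sum_{e=1}^{m_i} u_{i,e}
\quad\Longrightarrow\quad
Var(\bar{u}_i) = \frac{1}{m_i^2} \sum_{e=1}^{m_i} Var(u_{i,e})
             = \frac{m_i \sigma^2}{m_i^2}
             = \frac{\sigma^2}{m_i}
```

In the WLS notation this means $h_i = 1/m_i$, so the weight $1/h_i$ equals firm size $m_i$: averages based on more individuals are less noisy and receive more weight.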

Skip the sections on feasible GLS (8-4b) and prediction intervals with heteroskedasticity (8-4d).

WLS in the linear probability model
- In the LPM, the exact form of heteroskedasticity is known: $Var(y \mid x) = p(x)[1 - p(x)]$, where $p(x)$ is the response probability.
- Use the inverse of the estimated variances, $1/\hat{h}_i$ with $\hat{h}_i = \hat{y}_i (1 - \hat{y}_i)$, as weights in WLS.
- This is infeasible if LPM predictions are below zero or greater than one.
- If such cases are rare, they may be adjusted to values such as .01/.99.
- Otherwise, it is probably better to use OLS with robust standard errors.

Summary of issues with heteroskedasticity
- If heteroskedasticity exists and is not corrected for:
  - Standard errors, t statistics, and F statistics are wrong.
  - Coefficient estimates are still unbiased, but inefficient.
- Several tests for heteroskedasticity are available:
  - Breusch-Pagan
  - White (2 forms)

- Corrections for heteroskedasticity:
  - Heteroskedasticity-robust standard errors: no change in coefficients (still inefficient, like OLS), but standard errors are correct.
  - Weighted least squares: more efficient than OLS, with correct standard errors, but requires knowledge of the functional form of the heteroskedasticity.
- The linear probability model requires correction for heteroskedasticity:
  - Robust standard errors are a simple fix.
  - WLS is more efficient, but creates problems if predicted probabilities are outside the unit interval.
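The WLS correction for the linear probability model can be sketched in plain Python as follows. This is an illustration under simplifying assumptions: one regressor, made-up binary data, and fitted values outside the open unit interval adjusted to .01/.99 (which the slides recommend only when such cases are rare). The function and variable names are hypothetical.

```python
def wls_simple(x, y, w):
    """Weighted least squares of y on x with intercept; OLS when all weights are 1."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b1 = num / den
    return yw - b1 * xw, b1

def clip(p, low=0.01, high=0.99):
    """Adjust predictions at or beyond 0 and 1 to .01/.99."""
    return min(max(p, low), high)

# Hypothetical data: binary outcome y and one regressor x
x = [float(i) for i in range(1, 13)]
y = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]

# Step 1: estimate the LPM by OLS and compute fitted probabilities
b0_ols, b1_ols = wls_simple(x, y, [1.0] * len(x))
fitted = [clip(b0_ols + b1_ols * xi) for xi in x]

# Step 2: estimated error variances h_i = p_i * (1 - p_i)
h = [p * (1.0 - p) for p in fitted]

# Step 3: WLS with weights 1 / h_i
weights = [1.0 / hi for hi in h]
b0_wls, b1_wls = wls_simple(x, y, weights)
print("OLS:", (b0_ols, b1_ols), "WLS:", (b0_wls, b1_wls))
```

Without the adjustment in `clip`, a fitted value of exactly 0 or 1 would give $\hat{h}_i = 0$ and an undefined weight, and a fitted value outside the unit interval would give a negative one, which is why WLS is infeasible in those cases.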