Chapter 10

Regression with Panel Data

Outline
1. Panel Data: What and Why
2. Panel Data with Two Time Periods
3. Fixed Effects Regression
4. Regression with Time Fixed Effects
5. Standard Errors for Fixed Effects Regression
6. Application to Drunk Driving and Traffic Safety

Panel Data: What and Why (SW Section 10.1)

A panel dataset contains observations on _____ entities (individuals, states, companies), where each entity is observed at ____ or ____ points in time.

Hypothetical examples:

- Data on 420 California school districts in 1999 and again in 2000, for 840 observations total.
- Data on 50 U.S. states, each observed in 3 years, for a total of 150 observations.
- Data on 1000 individuals, in four different months, for 4000 observations total.

Notation for panel data

A double subscript distinguishes entities (states) and time periods (years):
- i = entity (state), n = number of entities, so i = 1, ..., n
- t = time period (year), T = number of time periods, so t = 1, ..., T

Data: Suppose we have 1 regressor. The data are:
$(X_{it}, Y_{it})$, i = 1, ..., n, t = 1, ..., T

Panel data notation, ctd.

Panel data with k regressors:
$(X_{1it}, X_{2it}, \ldots, X_{kit}, Y_{it})$, i = 1, ..., n, t = 1, ..., T
- n = number of entities (states)
- T = number of time periods (years)

Some jargon

- Another term for panel data is longitudinal data

- Balanced panel: no missing observations (all variables are observed for all entities [states] and all time periods [years])

Why are panel data useful?

With panel data we can control for factors that:
- Vary across entities (states) but do ______ vary over ______
- Could cause _________ variable bias if they are omitted
- Are unobserved or unmeasured, and therefore ______ be included in the regression using multiple regression

Here's the key idea:
If an omitted variable does _____ change over time, then any ______ in Y over ______ cannot be caused by the omitted variable.

Example of a panel data set: Traffic deaths and alcohol taxes

Observational unit: a year in a U.S. state
- 48 U.S. states, so n = # of entities = 48
- 7 years (1982, ..., 1988), so T = # of time periods = 7
- Balanced panel, so total # observations = 7 x 48 = 336

Variables:
- Traffic fatality rate (# traffic deaths in that state in that year, per 10,000 state residents)
- Tax on a case of beer
- Other (legal driving age, drunk driving laws, etc.)
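As a small illustration of the n, T, and "balanced panel" bookkeeping above, the following Python sketch counts entities and periods in long-format data and checks that every entity-period cell appears exactly once. The function name `panel_shape` and the placeholder rows are assumptions made for this example; they are not part of the traffic fatality data set.

```python
from collections import Counter

def panel_shape(rows):
    """Return (n, T, is_balanced) for rows given as (entity, period) pairs."""
    entities = sorted({r[0] for r in rows})
    periods = sorted({r[1] for r in rows})
    counts = Counter((r[0], r[1]) for r in rows)
    # Balanced: every entity is observed exactly once in every period.
    balanced = all(counts[(e, p)] == 1 for e in entities for p in periods)
    return len(entities), len(periods), balanced

# Hypothetical placeholder panel: 48 entities, each observed in 1982..1988.
rows = [(state, year) for state in range(1, 49) for year in range(1982, 1989)]

n, T, balanced = panel_shape(rows)
print(n, T, n * T, balanced)  # 48 7 336 True
```

Dropping any single row from `rows` makes the check return False, which is the unbalanced case.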

[Figure: U.S. traffic death data for 1982] Higher alcohol taxes, more traffic deaths?

[Figure: U.S. traffic death data for 1988] Higher alcohol taxes, more traffic deaths?

What conclusion? Does an increase in the tax lead to a higher fatality rate? If not, why?

Other factors that determine traffic fatality rate:

- Culture around drinking and driving
- Density of cars on the road

These omitted factors could cause omitted variable bias.

Example #1: traffic density. Suppose:
a) High traffic density means more traffic __________
b) (Western) states with ____ traffic density have _____ alcohol taxes

Then the conditions for omitted variable bias (OVB) are met. Specifically, "high taxes" could reflect ____________, so the OLS coefficient would be biased _______ (high taxes, more deaths).

Panel data lets us eliminate omitted variable bias when the omitted variables are _________ over time within a given state.

Example #2: cultural attitudes towards drinking and driving

Why is this a problem? Cultural attitudes:
- arguably are a ___________ of traffic deaths; and
- potentially are _________ with the beer tax, so beer taxes could be picking up _______________: omitted variable bias.

Specifically, "high taxes" could pick up the effect of "cultural attitudes towards drinking" (so the OLS coefficient would be biased).

Panel data lets us ________ omitted variable bias when the omitted variables are ______ over time within a given state.

Panel Data with Two Time Periods (SW Section 10.2)

Consider the panel data model,
$FatalityRate_{it} = \beta_0 + \beta_1 BeerTax_{it} + \beta_2 Z_i + u_{it}$

$Z_i$ is a factor that does not change over time (density), at least during the years on which we have data.
- Suppose $Z_i$ is not observed, so its omission could result in omitted variable bias.
- The effect of $Z_i$ can be eliminated using T = 2 years.

The key idea:
Any change in the fatality rate from 1982 to 1988 cannot be caused by $Z_i$, because $Z_i$ (by assumption) does not change between 1982 and 1988.

The math: consider fatality rates in 1988 and 1982:
$FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + \beta_2 Z_i + u_{i,1988}$
$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + \beta_2 Z_i + u_{i,1982}$

Suppose $E(u_{it} | BeerTax_{it}, Z_i) = 0$.

Subtracting 1988 - 1982 (that is, calculating the change) eliminates the effect of $Z_i$:

$FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + \beta_2 Z_i + u_{i,1988}$
$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + \beta_2 Z_i + u_{i,1982}$
so
$FatalityRate_{i,1988} - FatalityRate_{i,1982} = \beta_1 (BeerTax_{i,1988} - BeerTax_{i,1982}) + (u_{i,1988} - u_{i,1982})$

1) The new error term, $(u_{i,1988} - u_{i,1982})$, is _________ with either $BeerTax_{i,1988}$ or $BeerTax_{i,1982}$.
2) This "difference" equation can be estimated by ____, even though $Z_i$ isn't observed.
3) The omitted variable $Z_i$ does ____ change, so it _______ be a determinant of the _______ in Y.
4) This differences regression doesn't have an intercept: it was eliminated by the subtraction step.

Example: Traffic deaths and beer taxes

1982 data (n = 48):
    FatalityRate = 2.01 + 0.15 BeerTax
                   (.15)  (.13)

1988 data (n = 48):
    FatalityRate = 1.86 + 0.44 BeerTax
                   (.11)  (.13)

Difference regression (n = 48):
    FR1988 - FR1982 = -.072 - 1.04 (BeerTax1988 - BeerTax1982)
                      (.065)  (.36)

An intercept is included in this differences regression: it allows for the mean change in FR to be nonzero (more on this later).
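The differencing argument can be checked on made-up numbers. The sketch below builds a tiny two-period panel in which an unobserved, time-invariant $Z_i$ is positively related to the regressor, sets the error term to zero so the arithmetic is transparent, and compares a single-year cross-section slope with the slope from the differences regression. All values (`beta1`, `beta2`, the `x82`/`x88` series) are hypothetical and chosen only for illustration; they are not the traffic fatality data.

```python
def slope_with_intercept(x, y):
    """OLS slope from regressing y on x and a constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_no_intercept(x, y):
    """OLS slope from regressing y on x without a constant."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

beta1, beta2 = -1.0, 3.0                      # assumed true coefficients
n = 10
z = [float(i) for i in range(n)]              # unobserved, constant over time
x82 = [0.5 * zi for zi in z]                  # regressor correlated with Z
x88 = [x + 0.1 * (i % 3 + 1) for i, x in enumerate(x82)]
y82 = [beta1 * x + beta2 * zi for x, zi in zip(x82, z)]
y88 = [beta1 * x + beta2 * zi for x, zi in zip(x88, z)]

# Cross-section regression in one year: Z is omitted, so the slope is biased.
cross_1982 = slope_with_intercept(x82, y82)

# Differences regression: Z drops out, so the slope recovers beta1.
dx = [b - a for a, b in zip(x82, x88)]
dy = [b - a for a, b in zip(y82, y88)]
fd = slope_no_intercept(dx, dy)

print(cross_1982, fd)  # roughly 5.0 (wrong sign) and -1.0 (the true beta1)
```

In this constructed example the cross-section slope even has the wrong sign, which mirrors the pattern in the 1982 and 1988 regressions versus the difference regression above.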

[Figures: U.S. traffic death data for 1982 and for 1988] Higher alcohol taxes, more traffic deaths?

[Figure: FatalityRate v. BeerTax]

Fixed Effects Regression (SW Section 10.3)

What if you have more than 2 time periods (T > 2)?

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, i = 1, ..., n, t = 1, ..., T

Rewrite this in two useful ways:
- "n-1 binary regressor" regression model

- "Fixed Effects" regression model

First rewrite this in "fixed effects" form. Suppose we have n = 3 states: California, Texas, Massachusetts.

Fixed Effects Regression: Example

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, i = 1, ..., n, t = 1, ..., T

Population regression for California (that is, i = CA):
$Y_{CA,t} = \beta_0 + \beta_1 X_{CA,t} + \beta_2 Z_{CA} + u_{CA,t}$
$\phantom{Y_{CA,t}} =$ ____________ $+ \beta_1 X_{CA,t} + u_{CA,t}$
or
$Y_{CA,t} = \alpha_{CA} + \beta_1 X_{CA,t} + u_{CA,t}$

- $\alpha_{CA} = \beta_0 + \beta_2 Z_{CA}$ doesn't change over time
- $\alpha_{CA}$ is the intercept for CA, and $\beta_1$ is the slope
- The intercept is _______ to CA, but the slope is the ______ in all the states: parallel lines.

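Fixed Effects Regression: Example, ctd.

For TX:
$Y_{TX,t} = \beta_0 + \beta_1 X_{TX,t} + \beta_2 Z_{TX} + u_{TX,t}$
$\phantom{Y_{TX,t}} = (\beta_0 + \beta_2 Z_{TX}) + \beta_1 X_{TX,t} + u_{TX,t}$
or
$Y_{TX,t} = \alpha_{TX} + \beta_1 X_{TX,t} + u_{TX,t}$, where $\alpha_{TX} = \beta_0 + \beta_2 Z_{TX}$

Collecting the lines for all three states:
$Y_{CA,t} = \alpha_{CA} + \beta_1 X_{CA,t} + u_{CA,t}$
$Y_{TX,t} = \alpha_{TX} + \beta_1 X_{TX,t} + u_{TX,t}$
$Y_{MA,t} = \alpha_{MA} + \beta_1 X_{MA,t} + u_{MA,t}$
or
$Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}$, i = CA, TX, MA, t = 1, ..., T

The regression lines for each state in a picture

[Figure: three parallel lines in the (X, Y) plane, $Y = \alpha_{CA} + \beta_1 X$, $Y = \alpha_{TX} + \beta_1 X$, and $Y = \alpha_{MA} + \beta_1 X$, with intercepts $\alpha_{CA}$, $\alpha_{TX}$, $\alpha_{MA}$]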

Recall that shifts in the intercept can be represented using dummy variables (How?)

[Figure: the same three parallel lines, $Y = \alpha_{CA} + \beta_1 X$, $Y = \alpha_{TX} + \beta_1 X$, $Y = \alpha_{MA} + \beta_1 X$]

In binary regressor form:
$Y_{it} = \beta_0 + \gamma_{CA} DCA_i + \gamma_{TX} DTX_i + \beta_1 X_{it} + u_{it}$
- $DCA_i$ = 1 if state is CA, = 0 otherwise
- $DTX_i$ = 1 if state is TX, = 0 otherwise
- leave out $DMA_i$ (why?)

Summary: Two ways to write the fixed effects model

"n-1 binary regressor" form:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$
where $D2_i$ = 1 for i = 2 (state #2), = 0 otherwise, etc.

"Fixed effects" form:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

- $\alpha_i$ is called a "state fixed effect" or "state effect": it is the constant (fixed) effect of being in state i

Fixed Effects Regression Model

Fixed Effects Form (general case):
$Y_{it} = \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \alpha_i + u_{it}$

Binary Regressors Form:
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$

In principle, it can be estimated by OLS. How many regressors?

Fixed Effects Regression: Estimation

Three estimation methods:
1) "n-1 binary regressors" OLS regression
2) "___________________" OLS regression
3) "Changes" specification, without an intercept (only for T = 2)

- All produce estimates of the regression coefficients, and standard errors.
- We already did the "changes" specification (1988 minus 1982), but this only works for T = 2 years
- Methods #1 and #2 work for general T
- Method #1 is only practical when ________

1. "n-1 binary regressors" OLS regression

$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$   (1)
where $D2_i$ = 1 for i = 2 (state #2), = 0 otherwise, etc.

1) First create the binary variables $D2_i, \ldots, Dn_i$
2) Then estimate (1) by _______
3) Inference (hypothesis tests, confidence intervals) is as usual (using heteroskedasticity-robust standard errors)
4) This is impractical when n is very large (for example if n = 1000 workers)

2. "Entity-demeaned" OLS regression

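The fixed effects regression model:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

The state averages satisfy:
$\frac{1}{T}\sum_{t=1}^{T} Y_{it} = \alpha_i + \beta_1 \frac{1}{T}\sum_{t=1}^{T} X_{it} + \frac{1}{T}\sum_{t=1}^{T} u_{it}$

Deviation from state averages:
$Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1 \left( X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it} \right) + \left( u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it} \right)$

Entity-demeaned OLS regression, ctd.

$Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1 \left( X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it} \right) + \left( u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it} \right)$
or
$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$
where $\tilde{Y}_{it} = Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ and $\tilde{X}_{it} = X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it}$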
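The entity-demeaning step can be sketched in a few lines of Python. The example below is only an illustration on a made-up balanced panel (3 entities, 3 periods, no error term), so the demeaned regression should return the assumed slope; the names `entity_demean` and `fe_slope` and all numbers are assumptions of this sketch.

```python
def entity_demean(values):
    """values[i][t] -> deviations from entity i's own average over time."""
    out = []
    for series in values:
        mean_i = sum(series) / len(series)
        out.append([v - mean_i for v in series])
    return out

def fe_slope(x, y):
    """Slope from regressing demeaned y on demeaned x (single regressor)."""
    xt, yt = entity_demean(x), entity_demean(y)
    num = sum(a * b for xi, yi in zip(xt, yt) for a, b in zip(xi, yi))
    den = sum(a * a for xi in xt for a in xi)
    return num / den

# Hypothetical panel: Y_it = alpha_i + beta1 * X_it, with no error term.
beta1 = -0.5
alpha = [2.0, 5.0, 9.0]                      # assumed entity fixed effects
x = [[1.0, 2.0, 4.0], [3.0, 3.5, 5.0], [0.5, 1.5, 1.0]]
y = [[alpha[i] + beta1 * v for v in x[i]] for i in range(3)]

print(fe_slope(x, y))  # approximately -0.5: the alpha_i are differenced out
```

With T = 2 the same function gives the slope of the "changes" regression without an intercept, since each entity's two deviations are plus and minus half of its change.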

For i = 1 and t = 1982, $\tilde{Y}_{it}$ is the difference between the fatality rate in Alabama in 1982 and its average value in Alabama averaged over all 7 years.

Entity-demeaned OLS regression, ctd.

$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$   (2)
where $\tilde{Y}_{it} = Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it}$, etc.

- First construct the demeaned variables $\tilde{Y}_{it}$ and $\tilde{X}_{it}$
- Then estimate (2) by regressing $\tilde{Y}_{it}$ on $\tilde{X}_{it}$ using OLS
- This is like the "changes" approach, except that $Y_{it}$ is deviated from the state average instead of from $Y_{i1}$
- Inference is as usual (using heteroskedasticity-robust standard errors)
- This can be done in a single command in STATA

Example: Traffic deaths and beer taxes in STATA

First let STATA know you are working with panel data by defining the entity variable (state) and time variable (year):

    . xtset state year;
           panel variable:  state (strongly balanced)
            time variable:  year, 1982 to 1988
                    delta:  1 unit

    . xtreg vfrall beertax, fe vce(cluster state)

    Fixed-effects (within) regression               Number of obs      =       336
    Group variable: state                           Number of groups   =        48

    R-sq:  within  = 0.0407                         Obs per group: min =         7
           between = 0.1101                                        avg =       7.0
           overall = 0.0934                                        max =         7

                                                    F(1,47)            =      5.05
    corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                     (Std. Err. adjusted for 48 clusters in state)
    ------------------------------------------------------------------------------
                 |               Robust
          vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
           _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
    ------------------------------------------------------------------------------

- The panel data command xtreg with the option fe performs fixed effects regression. The reported intercept is arbitrary, and the estimated individual effects are not reported in the default output.
- The fe option means use fixed effects regression
- The vce(cluster state) option tells STATA to use clustered standard errors (more on this later)

Example, ctd.

For n = 48, T = 7:

    FatalityRate = -.66 BeerTax + State fixed effects
                   (.20)

- Should you report the intercept?
- How many binary regressors would you include to estimate this using the "binary regressor" method?

- Compare slope, standard error to the estimate for the 1988 v. 1982 "changes" specification (T = 2, n = 48) (note that this includes an intercept; return to this below):

    FR1988 - FR1982 = -.072 - 1.04 (BeerTax1988 - BeerTax1982)
                      (.065)  (.36)

By the way... how much do beer taxes vary?

Beer Taxes in 2005
Source: Federation of Tax Administrators http://www.taxadmin.org/fta/rate/beer.html

    State               Excise tax    Sales taxes   Other taxes
                        ($/gallon)    applied
    ------------------  ------------  ------------  ------------------------------------------
    Alabama             0.53          Yes           $0.52/gallon local tax
    Alaska              1.07          n.a.          $0.35/gallon small breweries
    Arizona             0.16          Yes
    Arkansas            0.23          Yes           under 3.2% - $0.16/gallon; $0.008/gallon and 3% off- 10% on-premise tax
    California          0.20          Yes
    Colorado            0.08          Yes
    Connecticut         0.19          Yes
    Delaware            0.16          n.a.
    Florida             0.48          Yes           2.67¢/12 ounces on-premise retail tax
    Georgia             0.48          Yes           $0.53/gallon local tax
    Hawaii              0.93          Yes           $0.54/gallon draft beer
    Idaho               0.15          Yes           over 4% - $0.45/gallon
    Illinois            0.185         Yes           $0.16/gallon in Chicago and $0.06/gallon in Cook County
    Indiana             0.115         Yes
    Iowa                0.19          Yes
    Kansas              0.18          --            over 3.2% - {8% off- and 10% on-premise}, under 3.2% - 4.25% sales tax
    Kentucky            0.08          Yes*          9% wholesale tax
    Louisiana           0.32          Yes           $0.048/gallon local tax
    Maine               0.35          Yes           additional 5% on-premise tax
    Maryland            0.09          Yes           $0.2333/gallon in Garrett County
    Massachusetts       0.11          Yes*          0.57% on private club sales
    Michigan            0.20          Yes
    Minnesota           0.15          --            under 3.2% - $0.077/gallon. 9% sales tax
    Mississippi         0.43          Yes
    Missouri            0.06          Yes
    Montana             0.14          n.a.
    Nebraska            0.31          Yes
    Nevada              0.16          Yes
    New Hampshire       0.30          n.a.
    New Jersey          0.12          Yes
    New Mexico          0.41          Yes
    New York            0.11          Yes           $0.12/gallon in New York City
    North Carolina      0.53          Yes           $0.48/gallon bulk beer
    North Dakota        0.16          --            7% state sales tax, bulk beer $0.08/gal.
    Ohio                0.18          Yes
    Oklahoma            0.40          Yes           under 3.2% - $0.36/gallon; 13.5% on-premise
    Oregon              0.08          n.a.
    Pennsylvania        0.08          Yes
    Rhode Island        0.10          Yes           $0.04/case wholesale tax
    South Carolina      0.77          Yes
    South Dakota        0.28          Yes
    Tennessee           0.14          Yes           17% wholesale tax
    Texas               0.19          Yes           over 4% - $0.198/gallon, 14% on-premise and $0.05/drink on airline sales
    Utah                0.41          Yes           over 3.2% - sold through state store
    Vermont             0.265         no            6% to 8% alcohol - $0.55; 10% on-premise sales tax
    Virginia            0.26          Yes
    Washington          0.261         Yes
    West Virginia       0.18          Yes
    Wisconsin           0.06          Yes
    Wyoming             0.02          Yes
    Dist. of Columbia   0.09          Yes           8% off- and 10% on-premise sales tax
    U.S. Median         $0.188

Regression with Time Fixed Effects (SW Section 10.4)

An omitted variable might vary over time but not across states:

- Safer cars (air bags, etc.); changes in national laws
- These produce intercepts that change over time
- Let these changes ("safer cars") be denoted by the variable $S_t$, which changes over time but not across states.
- The resulting population regression model is:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$

Time fixed effects only

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_3 S_t + u_{it}$

In effect, the intercept varies from one year to the next:
$Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + \beta_3 S_{1982} + u_{i,1982}$
$\phantom{Y_{i,1982}} = (\beta_0 + \beta_3 S_{1982}) + \beta_1 X_{i,1982} + u_{i,1982}$
or
$Y_{i,1982} = \lambda_{1982} + \beta_1 X_{i,1982} + u_{i,1982}$, where $\lambda_{1982} = \beta_0 + \beta_3 S_{1982}$
Similarly,
$Y_{i,1983} = \lambda_{1983} + \beta_1 X_{i,1983} + u_{i,1983}$, where $\lambda_{1983} = \beta_0 + \beta_3 S_{1983}$
etc.

Two formulations for time fixed effects

1. "T-1 binary regressor" formulation:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$
where $B2_t$ = 1 when t = 2 (year #2), = 0 otherwise, etc.

2. "Time effects" formulation:
$Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}$

Time fixed effects: estimation methods

1. "T-1 binary regressor" OLS regression
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$
- Create binary variables B2, ..., BT
- B2 = 1 if t = year #2, = 0 otherwise
- Regress Y on X, B2, ..., BT using OLS
- Where's B1?

2. "Year-demeaned" OLS regression
- Deviate $Y_{it}$, $X_{it}$ from year (not state) averages
- Estimate by OLS using "year-demeaned" data

These two methods can be combined.

Estimation with both entity and time fixed effects

$Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}$

- When T = 2, computing the first difference and including an intercept is equivalent to (gives exactly the same regression as) including entity and time fixed effects.
- When T > 2, there are various equivalent ways to incorporate both entity and time fixed effects:
  - entity demeaning & T - 1 time indicators (this is done in the following STATA example)
  - time demeaning & n - 1 entity indicators
  - T - 1 time indicators & n - 1 entity indicators
  - entity & time demeaning
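The last option in the list, entity and time demeaning, can be sketched as follows for a balanced panel with a single regressor: subtract the entity average and the time average from each observation and add back the grand average, then run OLS on the transformed data. This is a hedged illustration on made-up numbers with no error term; the function names, the fixed effects `alpha` and `lam`, and the data are assumptions of this sketch, and the simple double-demeaning formula relies on the panel being balanced.

```python
def two_way_demean(v):
    """v[i][t] -> v_it - entity mean_i - time mean_t + grand mean (balanced panel)."""
    n, T = len(v), len(v[0])
    ent = [sum(row) / T for row in v]
    tim = [sum(v[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(ent) / n
    return [[v[i][t] - ent[i] - tim[t] + grand for t in range(T)]
            for i in range(n)]

def twfe_slope(x, y):
    """OLS slope using entity- and time-demeaned data (single regressor)."""
    xt, yt = two_way_demean(x), two_way_demean(y)
    num = sum(a * b for xi, yi in zip(xt, yt) for a, b in zip(xi, yi))
    den = sum(a * a for xi in xt for a in xi)
    return num / den

# Hypothetical panel: Y_it = alpha_i + lam_t + beta1 * X_it, with no error term.
beta1 = -0.5
alpha = [2.0, 5.0, 9.0]          # assumed entity effects
lam = [0.0, -0.3, 0.4]           # assumed time effects
x = [[1.0, 2.0, 4.0], [3.0, 3.5, 5.0], [0.5, 1.5, 1.0]]
y = [[alpha[i] + lam[t] + beta1 * x[i][t] for t in range(3)] for i in range(3)]

print(twfe_slope(x, y))  # approximately -0.5: both sets of effects drop out
```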

First generate all the time binary variables:

    . gen y83=(year==1983);
    . gen y84=(year==1984);
    . gen y85=(year==1985);
    . gen y86=(year==1986);
    . gen y87=(year==1987);
    . gen y88=(year==1988);
    . global yeardum "y83 y84 y85 y86 y87 y88";
    . xtreg vfrall beertax $yeardum, fe vce(cluster state);

    Fixed-effects (within) regression               Number of obs      =       336
    Group variable: state                           Number of groups   =        48

    R-sq:  within  = 0.0803                         Obs per group: min =         7
           between = 0.1101                                        avg =       7.0
           overall = 0.0876                                        max =         7

    corr(u_i, Xb)  = -0.6781                        Prob > F           =    0.0009

                                     (Std. Err. adjusted for 48 clusters in state)
    ------------------------------------------------------------------------------
                 |               Robust
          vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         beertax |  -.6399799   .3570783    -1.79   0.080    -1.358329    .0783691
             y83 |  -.0799029   .0350861    -2.28   0.027    -.1504869   -.0093188
             y84 |  -.0724206   .0438809    -1.65   0.106    -.1606975    .0158564
             y85 |  -.1239763   .0460559    -2.69   0.010    -.2166288   -.0313238
             y86 |  -.0378645   .0570604    -0.66   0.510    -.1526552    .0769262
             y87 |  -.0509021   .0636084    -0.80   0.428    -.1788656    .0770615
             y88 |  -.0518038   .0644023    -0.80   0.425    -.1813645    .0777568
           _cons |    2.42847   .2016885    12.04   0.000     2.022725    2.834215
    -------------+----------------------------------------------------------------

Are the time effects jointly statistically significant?

    . test $yeardum;

     ( 1)  y83 = 0
     ( 2)  y84 = 0
     ( 3)  y85 = 0
     ( 4)  y86 = 0
     ( 5)  y87 = 0
     ( 6)  y88 = 0

           F(  6,    47) =    4.22
                Prob > F =    0.0018

Yes.

The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression (SW Section 10.5 and App. 10.2)

- Under a panel data version of the least squares assumptions, the OLS fixed effects estimator of $\beta_1$ is normally distributed. However, a new standard error formula needs to be introduced: the "clustered" standard error formula.
- This new formula is needed because observations for the same entity are not independent (it's the same entity!), even though observations across entities are __________ if entities are drawn by simple random sampling.
- Here we consider the case of entity fixed effects. Time fixed effects can simply be included as additional binary regressors.

LS Assumptions for Panel Data

Consider a single X:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$, i = 1, ..., n, t = 1, ..., T

1. ____________________________.
2. $(X_{i1}, \ldots, X_{iT}, u_{i1}, \ldots, u_{iT})$, i = 1, ..., n, are i.i.d. draws from their joint distribution.
3. $(X_{it}, u_{it})$ have finite fourth moments.
4. There is no perfect multicollinearity (multiple X's).

Assumptions 3&4 are least squares assumptions 3&4. Assumptions 1&2 differ.

Assumption #1: $E(u_{it} | X_{i1}, \ldots, X_{iT}, \alpha_i) = 0$
- $u_{it}$ has mean zero, given the state fixed effect and the entire history of the X's for that state
- This is an extension of the previous multiple regression Assumption #1
- This means there are no omitted lagged effects (any lagged effects of X must enter explicitly)
- Also, there is not feedback from u to future X:
  - Whether a state has a particularly high fatality rate this year doesn't subsequently affect whether it increases the beer tax.
  - We'll return to this when we take up time series data.

Assumption #2: $(X_{i1}, \ldots, X_{iT}, u_{i1}, \ldots, u_{iT})$, i = 1, ..., n, are i.i.d. draws from their joint distribution.
- This is an extension of Assumption #2 for multiple regression with cross-section data
- This is satisfied if entities are ___________ sampled from their population by simple random sampling.
- This does not require observations to be i.i.d. over time for the same entity; that would be unrealistic. Whether a state has a high beer tax this year is a good predictor of (correlated with) whether it will have a high beer tax next year. Similarly, the error term for an entity in one year is plausibly correlated with its value in the next year, that is, $corr(u_{it}, u_{i,t+1})$ is often plausibly nonzero.

Autocorrelation (serial correlation)

Suppose a variable Z is observed at different dates t, so observations are on $Z_t$, t = 1, ..., T. (Think of there being only one entity.) Then $Z_t$ is said to be ______________ or __________ correlated if $corr(Z_t, Z_{t+j}) \neq 0$ for some dates $j \neq 0$.
- "Autocorrelation" means correlation with _______.
- $cov(Z_t, Z_{t+j})$ is called the jth autocovariance of $Z_t$.
- In the drunk driving example, $u_{it}$ includes the omitted variable of annual weather conditions for state i. If snowy winters come in clusters (one follows another) then $u_{it}$ will be autocorrelated (why?)
- In many panel data applications, $u_{it}$ is plausibly autocorrelated.

Independence and autocorrelation in panel data in a picture:

             i = 1    i = 2    i = 3    ...    i = n
    t = 1    u_11     u_21     u_31     ...    u_n1
     ...      ...      ...      ...     ...     ...
    t = T    u_1T     u_2T     u_3T     ...    u_nT

    Sampling is i.i.d. across entities

- If entities are sampled by simple random sampling, then $(u_{i1}, \ldots, u_{iT})$ is __________ of $(u_{j1}, \ldots, u_{jT})$ for different entities $i \neq j$.
- But if the omitted factors comprising $u_{it}$ are serially correlated, then $u_{it}$ is serially correlated.

Under the LS assumptions for panel data:
- The OLS fixed effect estimator $\hat{\beta}_1$ is unbiased, consistent, and asymptotically normally distributed
- However, the usual OLS standard errors (both homoskedasticity-only and heteroskedasticity-robust) will in general be wrong because they assume that $u_{it}$ is serially uncorrelated.
  - In practice, the OLS standard errors often understate the true sampling uncertainty: if $u_{it}$ is correlated over time, you don't have as much information (as much random variation) as you would if $u_{it}$ were uncorrelated.
  - This problem is solved by using "clustered" standard errors.

Clustered Standard Errors

- Clustered standard errors estimate the variance of $\hat{\beta}_1$ when the variables are i.i.d. across entities but are potentially autocorrelated within an entity.
- Clustered SEs are easiest to understand if we first consider the simpler problem of estimating the mean of Y using panel data.

Clustered SEs for the mean estimated using panel data

$Y_{it} = \mu + u_{it}$, i = 1, ..., n, t = 1, ..., T

The estimator of the mean $\mu$ is $\bar{Y} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Y_{it}$.

It is useful to write $\bar{Y}$ as the average across entities of the mean value for each entity:

$\bar{Y} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Y_{it} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{T}\sum_{t=1}^{T} Y_{it}\right) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y}_i$,

where $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ is the sample mean for entity i.

Because observations are i.i.d. across entities, $(\bar{Y}_1, \ldots, \bar{Y}_n)$ are i.i.d. Thus, if n is large, the CLT applies and

$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} \bar{Y}_i \xrightarrow{d} N(\mu, \sigma^2_{\bar{Y}_i}/n)$, where $\sigma^2_{\bar{Y}_i} = var(\bar{Y}_i)$.

- The SE of $\bar{Y}$ is the square root of an estimator of $\sigma^2_{\bar{Y}_i}/n$.
- The natural estimator of $\sigma^2_{\bar{Y}_i}$ is the sample variance of $\bar{Y}_i$, $s^2_{\bar{Y}_i}$. This delivers the clustered standard error formula for $\bar{Y}$ computed using panel data:

Clustered SE of $\bar{Y} = \sqrt{s^2_{\bar{Y}_i}/n}$, where $s^2_{\bar{Y}_i} = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{Y}_i - \bar{Y})^2$

What's special about clustered SEs?

- Not much, really: the previous derivation is the same as was used in Ch. 3 to derive the SE of the sample average, except that here the "data" are the i.i.d. entity averages $(\bar{Y}_1, \ldots, \bar{Y}_n)$ instead of a single i.i.d. observation for each entity.
- But in fact there is one key feature: in the cluster SE derivation we never assumed that observations are i.i.d. within an entity. Thus we have implicitly allowed for serial correlation within an entity.
- What happened to that serial correlation: where did it go? It determines $\sigma^2_{\bar{Y}_i}$, the variance of $\bar{Y}_i$.

Serial correlation in $Y_{it}$ enters $\sigma^2_{\bar{Y}_i}$:

$\sigma^2_{\bar{Y}_i} = var(\bar{Y}_i) = var\left(\frac{1}{T}\sum_{t=1}^{T} Y_{it}\right) = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T} cov(Y_{it}, Y_{is})$

- If $Y_{it}$ is serially uncorrelated, all the autocovariances = 0 and we have the usual (Ch. 3) derivation.
- If these autocovariances are nonzero, the usual formula (which sets them to 0) will be wrong.
- If these autocovariances are positive, the usual formula will understate the variance of $\bar{Y}_i$.

The "magic" of clustered SEs is that, by working at the level of the entities and their averages $\bar{Y}_i$, you never need to worry about estimating any of the underlying autocovariances: they are in effect estimated automatically by the cluster SE formula. Here's the math:

Clustered SE of $\bar{Y} = \sqrt{s^2_{\bar{Y}_i}/n}$, where

$s^2_{\bar{Y}_i} = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{Y}_i - \bar{Y})^2$
$\phantom{s^2_{\bar{Y}_i}} = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{1}{T}\sum_{t=1}^{T} (Y_{it} - \bar{Y})\right)^2$
$\phantom{s^2_{\bar{Y}_i}} = \frac{1}{n-1}\sum_{i=1}^{n} \frac{1}{T^2}\sum_{t=1}^{T}\sum_{s=1}^{T} (Y_{it} - \bar{Y})(Y_{is} - \bar{Y})$

Clustered SEs for the FE estimator in panel data regression

- The idea of clustered SEs in panel data is completely analogous to the case of the panel-data mean above, just with a lot messier notation and formulas. See SW Appendix 10.2.
- Clustered SEs for panel data are the logical extension of HR SEs for cross-section. In cross-section regression, HR SEs are valid whether or not there is heteroskedasticity. In panel data regression, clustered SEs are valid whether or not there is heteroskedasticity and/or serial correlation.
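Returning to the panel-data mean, the clustered SE formula $\sqrt{s^2_{\bar{Y}_i}/n}$ can be sketched in Python and compared with the usual SE that treats all nT observations as i.i.d. The made-up data below are an extreme case, assumed purely for illustration: each entity's value is constant over time, so the T observations on an entity carry no more information than one. The function names are assumptions of this sketch.

```python
from math import sqrt

def clustered_se_mean(y):
    """Clustered SE of the overall mean: sample variance of entity means, over n."""
    n = len(y)
    entity_means = [sum(row) / len(row) for row in y]
    ybar = sum(entity_means) / n          # equals the overall mean in a balanced panel
    s2 = sum((m - ybar) ** 2 for m in entity_means) / (n - 1)
    return sqrt(s2 / n)

def iid_se_mean(y):
    """Usual SE of the mean, treating all n*T observations as i.i.d."""
    flat = [v for row in y for v in row]
    big_n = len(flat)
    ybar = sum(flat) / big_n
    s2 = sum((v - ybar) ** 2 for v in flat) / (big_n - 1)
    return sqrt(s2 / big_n)

# Hypothetical panel, n = 5 and T = 4, perfectly persistent within each entity.
y = [[float(i)] * 4 for i in range(1, 6)]

print(clustered_se_mean(y))  # sqrt(2.5 / 5), about 0.707
print(iid_se_mean(y))        # about 0.324: understates the sampling uncertainty
```

Here the i.i.d. formula behaves as if there were 20 independent observations when there are effectively 5, which is the sense in which the usual SEs are too small under positive serial correlation.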

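By the way... The term "clustered" comes from allowing correlation within a "cluster" of observations (within an entity), but not across clusters.

B. Standard Errors

B.1 First get the large-n approximation to the sampling distribution of the FE estimator

Fixed effects regression model: $\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$

OLS fixed effects estimator: $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}\tilde{Y}_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2}$

so: $\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}\tilde{u}_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2}$

Sampling distribution of fixed effects estimator, ctd.

Fact:
$\sum_{t=1}^{T} \tilde{X}_{it}\tilde{u}_{it} = \sum_{t=1}^{T} \tilde{X}_{it}u_{it} - \left[\sum_{t=1}^{T} (X_{it} - \bar{X}_i)\right]\bar{u}_i = \sum_{t=1}^{T} \tilde{X}_{it}u_{it}$

so
$\sqrt{nT}(\hat{\beta}_1 - \beta_1) = \frac{\sqrt{\frac{1}{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{v}_{it}}{\hat{Q}^2_{\tilde{X}}} = \frac{\sqrt{\frac{1}{n}}\sum_{i=1}^{n} \eta_i}{\hat{Q}^2_{\tilde{X}}}$

where $\eta_i = \sqrt{\frac{1}{T}}\sum_{t=1}^{T} \tilde{v}_{it}$, $\tilde{v}_{it} = \tilde{X}_{it}u_{it}$, and $\hat{Q}^2_{\tilde{X}} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2$.

By the CLT,
$\sqrt{nT}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N(0, \sigma^2_\eta / Q^4_{\tilde{X}})$
where $\xrightarrow{d}$ means converges in distribution and $\hat{Q}^2_{\tilde{X}} \xrightarrow{p} Q^2_{\tilde{X}}$.

Sampling distribution of fixed effects estimator, ctd.

$\sqrt{nT}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N(0, \sigma^2_\eta / Q^4_{\tilde{X}})$, where $\sigma^2_\eta = var\left(\sqrt{\frac{1}{T}}\sum_{t=1}^{T} \tilde{v}_{it}\right)$

B.2 Obtain Standard Error:

Standard error of $\hat{\beta}_1$: $SE(\hat{\beta}_1) = \sqrt{\frac{1}{nT}\frac{\hat{\sigma}^2_\eta}{\hat{Q}^4_{\tilde{X}}}}$

Only part we don't have: what is $\hat{\sigma}^2_\eta$?

FE Clustered Standard Errors

Variance: $\sigma^2_\eta = var\left(\sqrt{\frac{1}{T}}\sum_{t=1}^{T} \tilde{v}_{it}\right)$

Variance estimator: $\hat{\sigma}^2_{\eta,clustered} = \frac{1}{n}\sum_{i=1}^{n}\left(\sqrt{\frac{1}{T}}\sum_{t=1}^{T} \hat{\tilde{v}}_{it}\right)^2$, where $\hat{\tilde{v}}_{it} = \tilde{X}_{it}\hat{u}_{it}$.

Clustered standard error: $SE(\hat{\beta}_1) = \sqrt{\frac{1}{nT}\frac{\hat{\sigma}^2_{\eta,clustered}}{\hat{Q}^4_{\tilde{X}}}}$

Comments on clustered standard errors:

- The clustered SE formula is NOT the usual (hetero-robust) SE formula!
- OK, this is messy, but you get something for it: you can have correlation of the error for an entity from one time period to the next. This would arise if the omitted variables that make up $u_{it}$ are correlated over time.

Comments on clustered standard errors, ctd.

This standard error formula goes under various names:
- Clustered standard errors, because there is a grouping, or "cluster," within which the error term is possibly correlated, but outside of which (across groups) it is not.
- Heteroskedasticity- and autocorrelation-consistent standard errors (autocorrelation is correlation with other time periods: $u_{it}$ and $u_{is}$ correlated)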
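The clustered standard error formula for the FE estimator can be traced step by step in Python. The sketch below follows the formulas in this section literally (entity-demean, estimate $\hat{\beta}_1$, form $\hat{\tilde{v}}_{it} = \tilde{X}_{it}\hat{u}_{it}$, then $\hat{\sigma}^2_{\eta,clustered}$ and $\hat{Q}^2_{\tilde{X}}$) on made-up data with a single regressor. It applies no degrees-of-freedom adjustment; statistical packages typically apply finite-sample adjustments, so their reported clustered SEs can differ somewhat from this textbook formula. The function names and numbers are assumptions of this sketch.

```python
from math import sqrt

def demean(series):
    """Deviations of one entity's series from its own time average."""
    m = sum(series) / len(series)
    return [v - m for v in series]

def fe_clustered(x, y):
    """Return (beta_hat, clustered SE) for a single-regressor FE regression."""
    n, T = len(x), len(x[0])
    xt = [demean(r) for r in x]
    yt = [demean(r) for r in y]
    sxx = sum(a * a for r in xt for a in r)
    beta = sum(a * b for rx, ry in zip(xt, yt) for a, b in zip(rx, ry)) / sxx
    # v_hat_it = X~_it * u_hat_it, with u_hat_it the demeaned-regression residual.
    v = [[a * (b - beta * a) for a, b in zip(rx, ry)] for rx, ry in zip(xt, yt)]
    q2 = sxx / (n * T)                                      # Q_hat^2
    sigma2 = sum((sum(r) / sqrt(T)) ** 2 for r in v) / n    # sigma_hat^2 (clustered)
    se = sqrt(sigma2 / (n * T) / q2 ** 2)
    return beta, se

# Hypothetical balanced panel: n = 4 entities, T = 3 periods.
x = [[1.0, 2.0, 4.0], [3.0, 3.5, 5.0], [0.5, 1.5, 1.0], [2.0, 2.0, 3.0]]
y = [[2.0, 1.4, 0.9], [4.1, 4.4, 3.0], [8.8, 8.1, 8.9], [5.0, 5.3, 4.4]]

beta_hat, se_hat = fe_clustered(x, y)
print(beta_hat, se_hat)
```

Algebraically the same SE can be written as the square root of $\sum_i (\sum_t \hat{\tilde{v}}_{it})^2$ divided by $\sum_i \sum_t \tilde{X}_{it}^2$, which makes clear that the within-entity sums, not the individual observations, are the building blocks.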

Extensions:
- The clusters can be other groupings, not necessarily time
- For example, you can allow for correlation of $u_{it}$ between individuals within a given group, as long as there is independence across groups. For example, if i runs over individuals, the clusters can be families (correlation of $u_{it}$ for i within same family, not between families).

Clustered SEs: Implementation in STATA

    . xtreg vfrall beertax, fe vce(cluster state)

    Fixed-effects (within) regression               Number of obs      =       336
    Group variable: state                           Number of groups   =        48

    R-sq:  within  = 0.0407                         Obs per group: min =         7
           between = 0.1101                                        avg =       7.0
           overall = 0.0934                                        max =         7

                                                    F(1,47)            =      5.05
    corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                     (Std. Err. adjusted for 48 clusters in state)
    ------------------------------------------------------------------------------
                 |               Robust
          vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
           _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
    ------------------------------------------------------------------------------

- vce(cluster state) says to use clustered standard errors, where the clustering is at the state level (observations that have the same value of the variable "state" are allowed to be correlated, but are assumed to be uncorrelated if the value of "state" differs)

Fixed Effects Regression Results
Dependent variable: Fatality rate

                                  (1)        (2)        (3)        (4)
    BeerTax                     -.656**    -.656+     -.640*     -.640++
                                (.203)     (.315)     (.255)     (.386)
    State effects?               Yes        Yes        Yes        Yes
    Time effects?                No         No         Yes        Yes
    F testing time effects = 0                         2.47       3.61
                                                       (.024)     (.005)
    Clustered SEs?               No         Yes        No         Yes

    Significant at the **1% *5% +10% level
    ++ Significant at the 10% level using normal but not Student t critical values

This is a hard call: what would you conclude?

Summary: SEs for Panel Data in a picture:

             i = 1    i = 2    i = 3    ...    i = n
    t = 1    u_11     u_21     u_31     ...    u_n1
     ...      ...      ...      ...     ...     ...
    t = T    u_1T     u_2T     u_3T     ...    u_nT

    i.i.d. sampling across entities

- Intuition #1: This is similar to heteroskedasticity: you make an assumption about the error and derive SEs under that assumption; if the assumption is wrong, so are the SEs
- Intuition #2: If $u_{21}$ and $u_{22}$ are correlated, there is _____ information in the sample than if they are not, and SEs need to account for this (usual SEs are typically too small)

- Hetero-robust (or homosk-only) SEs don't allow for this correlation, but clustered SEs do.

Application: Drunk Driving Laws and Traffic Deaths (SW Section 10.6)

Some facts
- Approx. 40,000 traffic fatalities annually in the U.S.
- 1/3 of traffic fatalities involve a drinking driver
- 25% of drivers on the road between 1am and 3am have been drinking (estimate)
- A drunk driver is 13 times as likely to cause a fatal crash as a non-drinking driver (estimate)

Drunk driving laws and traffic deaths, ctd.

Public policy issues
- Drunk driving causes massive externalities (sober drivers are killed, society bears medical costs, etc.), so there is ample justification for governmental intervention
- Are there any effective ways to reduce drunk driving? If so, what?
- What are effects of specific laws:
  - mandatory punishment
  - minimum legal drinking age
  - economic interventions (alcohol taxes)

The drunk driving panel data set

n = 48 U.S. states, T = 7 years (1982, ..., 1988) (balanced)

Variables
- Traffic fatality rate (deaths per 10,000 residents)
- Tax on a case of beer (Beertax)
- Minimum legal drinking age
- Minimum sentencing laws for first DWI violation:
  - Mandatory Jail
  - Mandatory Community Service
  - otherwise, sentence will just be a monetary fine
- Vehicle miles per driver (US DOT)
- State economic data (real per capita income, etc.)

Why might panel data help?
- Potential OV bias from variables that vary across states but are constant over time:
  - culture of drinking and driving
  - quality of roads
  - vintage of autos on the road
  - so use state fixed effects
- Potential OV bias from variables that vary over time but are constant across states:
  - improvements in auto safety over time
  - changing national attitudes towards drunk driving
  - so use time fixed effects

Empirical Analysis: Main Results

- Sign of beer tax coefficient changes when fixed state effects are included
- Fixed time effects are statistically significant but do not have big impact on the estimated coefficients
- Estimated effect of beer tax drops when other laws are included as regressors
- The only policy variable that seems to have an impact is the tax on beer (not minimum drinking age, not mandatory sentencing, etc.); however, the beer tax is not significant even at the 10% level using clustered SEs.
- The other economic variables have plausibly large coefficients: more income, more driving, more deaths

Digression: extensions of the "n-1 binary regressor" idea

The idea of using many binary indicators to eliminate omitted variable bias can be extended to non-panel data. The key is that the omitted variable is constant for a group of observations, so that in effect each group has its own intercept.

Example: Class size effect. Suppose funding and curricular issues are determined at the county level, and each county has several districts. If you are worried about OV bias resulting from unobserved county-level variables, you could include county effects (binary indicators, one for each county, omitting one county to avoid perfect multicollinearity).

Summary: Regression with Panel Data (SW Section 10.7)

Advantages and limitations of fixed effects regression

Advantages
- You can control for unobserved variables that:
  - vary across states but not over time, and/or
  - vary over time but not across states
- More observations give you more information
- Estimation involves relatively straightforward extensions of multiple regression
- Fixed effects regression can be done three ways:
  1. "Changes" method when T = 2
  2. "n-1 binary regressors" method when n is small
  3. "Entity-demeaned" regression
- Similar methods apply to regression with time fixed effects and to both time and state fixed effects
- Statistical inference: like multiple regression.

Limitations/challenges
- Need variation in X over time within states
- Time lag effects can be important
- You should use heteroskedasticity- and autocorrelation-consistent (clustered) standard errors if you think $u_{it}$ could be correlated over time
