Econometrics Econ. 504

Chapter 8: DUMMY VARIABLE (D.V.) REGRESSION MODELS

I. The Nature of Dummy Variables

In regression analysis, the dependent variable is frequently influenced by variables that are essentially qualitative in nature, such as sex, race, color, religion, nationality, and geographical region. One way to quantify such attributes is to construct artificial variables that take the value:

1, indicating the presence of the attribute;
0, indicating the absence of the attribute.

Variables that assume such 0 and 1 values are called dummy variables.

Example: (A) 1 may indicate that a person is female and 0 that the person is male; (B) 1 may indicate that a person is a college graduate and 0 that the person is not; and so on.

II. Estimating Models with Dummy Variables

In a wage equation, the coefficient on a female dummy measures the wage gain/loss if the person is a woman rather than a man (holding other things fixed).

Dummy variable (D):
= 1 if the person is female
= 0 if the person is male

Note: The coefficients attached to the dummy variables are known as the differential intercept coefficients.

Also note that with the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 D_i + u_i$ we have two cases:

$D_i = 0$: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (0) + u_i$, so $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$

$D_i = 1$: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (1) + u_i$, so $Y_i = (\beta_1 + \beta_3) + \beta_2 X_{2i} + u_i$

Numerical Illustration:

Wage (in KD)    Education (Years)    D
5000            7                    1
2000            5                    0
3600            6                    0
5500            8                    0
1000            3                    1
1500            4                    1
...             ...                  ...

Graphical Illustration: (figure not reproduced)
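The intercept-dummy model can be sketched in a few lines of Python. This is only a rough illustration: it uses the six rows visible in the numerical illustration (the remaining rows are elided in the notes), it assumes NumPy is available, and the variable names `wage`, `educ` and `female` are chosen here for readability. With so few observations the estimates should not be read as meaningful results.

```python
import numpy as np

# The six observations shown in the numerical illustration.
# The rest of the sample is elided in the notes, so this is a toy data set.
wage = np.array([5000, 2000, 3600, 5500, 1000, 1500], dtype=float)  # KD
educ = np.array([7, 5, 6, 8, 3, 4], dtype=float)                    # years
female = np.array([1, 0, 0, 0, 1, 1], dtype=float)                  # dummy D

# Design matrix with columns: intercept, education, dummy.
X = np.column_stack([np.ones_like(wage), educ, female])

# Ordinary least squares via a least-squares solver.
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
b1, b2, b3 = beta

print(f"intercept for D = 0 (b1):        {b1:.2f}")
print(f"slope on education (b2):         {b2:.2f}")
print(f"differential intercept (b3):     {b3:.2f}")
print(f"intercept for D = 1 (b1 + b3):   {b1 + b3:.2f}")
```

Both groups share the slope `b2`; only the intercept differs, by `b3`, which is the differential intercept coefficient described above.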

Holding education (and other variables, if any) fixed, women earn 1.81 dollars less per hour than men.

II. Caution in the Use of Dummy Variables

When using dummy variables in a regression function, you should be aware of some important points. There are three ways to specify a multiple regression model with qualitative information:

Dummy variable trap

1- Including a separate dummy variable for each category (for example, one for men and one for women) together with an intercept: This model cannot be estimated (perfect collinearity).

2- Alternatively, one could omit the intercept.
Disadvantages:
1) It is more difficult to test for differences between the parameters.
2) The R-squared formula is only valid if the regression contains an intercept.

3- When using dummy variables together with an intercept, one category always has to be omitted: if the dummy for women is included, the base category is men; if the dummy for men is included, the base category is women.

III. Interaction Variables

We can use dummy variables as standalone independent variables, but we can also interact (multiply) them with quantitative variables. Interacting dummy variables with quantitative variables provides the flexibility to detect overall differences between groups as well as differences that vary with the value of the quantitative variables.

The product of the dummy variable (D) and the independent variable (X) is a new term called the interaction term:

$Y_i = \beta_0 + \dots + \beta_j D_i X_i + \dots + u_i$

Including an interaction term in your econometric model allows the regression function to have a different intercept and slope for each group identified by the dummy variables (used in the interaction term). The coefficient on the dummy variable shifts the intercept, while the coefficient on the interaction term changes the slope.

Consider the same case, but now with the dummy affecting the slope:

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 D_i X_{2i} + u_i$
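A minimal sketch of the slope-dummy model just written down, fitted on simulated data. The data-generating values (1.0, 0.5, 0.3), the sample size, and the helper name `fit_slope_dummy` are assumptions made for this illustration and do not come from the notes; NumPy is assumed to be available.

```python
import numpy as np

def fit_slope_dummy(y, x, d):
    """OLS fit of y = b1 + b2*x + b3*(d*x) + u; returns the array (b1, b2, b3)."""
    X = np.column_stack([np.ones_like(x), x, d * x])  # intercept, X, D*X
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated sample (illustrative parameter values, not from the notes).
rng = np.random.default_rng(42)
n = 300
x = rng.uniform(0, 10, size=n)
d = rng.integers(0, 2, size=n).astype(float)          # 0/1 dummy
y = 1.0 + 0.5 * x + 0.3 * d * x + rng.normal(scale=0.5, size=n)

b1, b2, b3 = fit_slope_dummy(y, x, d)
print(f"common intercept (b1):       {b1:.3f}")
print(f"slope when D = 0 (b2):       {b2:.3f}")
print(f"slope when D = 1 (b2 + b3):  {b2 + b3:.3f}")
```

In this specification the two groups share one intercept and differ only in slope; adding `d` itself as a further column would let the intercept differ as well.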

Now we have two cases:

$D_i = 0$: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (0) X_{2i} + u_i$, so $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$

$D_i = 1$: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (1) X_{2i} + u_i$, so $Y_i = \beta_1 + (\beta_2 + \beta_3) X_{2i} + u_i$

IV. Testing of Significance

When using dummy variables in the regression, you have to take into account the joint significance of those variables. Their effect can be jointly significant even if they are individually insignificant.

Example: Assume that college grade point average (GPA) is determined by the following regression function.

Unrestricted model (contains the full set of interactions): college grade point average is regressed on the standardized aptitude test score, the high school rank percentile, and the total hours spent in college courses, together with a gender dummy and its interaction with each of these regressors.

Restricted model (same regression for both groups): the same regression without the gender dummy and the interaction terms.

Null hypothesis: All interaction effects are zero, i.e. the same regression coefficients apply to men and women.

Estimation of the unrestricted model: Tested individually, the hypothesis that the interaction effects are zero cannot be rejected.
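The comparison of a restricted and an unrestricted model can be sketched as follows. This is an illustration on simulated data with a single regressor `x` rather than the three regressors of the GPA example; the parameter values and the helper names `ssr` and `f_statistic` are assumptions of this sketch, and NumPy is assumed to be available. The statistic is the usual F-ratio built from the two sums of squared residuals.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_statistic(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Simulated sample (illustrative values, not the GPA data).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n).astype(float)           # group dummy
y = 1.0 + 0.5 * x + 0.2 * d + 0.2 * d * x + rng.normal(size=n)

ones = np.ones(n)
X_r = np.column_stack([ones, x])                        # restricted model
X_ur = np.column_stack([ones, x, d, d * x])             # dummy + interaction

ssr_r, ssr_ur = ssr(y, X_r), ssr(y, X_ur)
q = X_ur.shape[1] - X_r.shape[1]                        # number of restrictions
df_ur = n - X_ur.shape[1]                               # residual df, unrestricted
F = f_statistic(ssr_r, ssr_ur, q, df_ur)
print(f"F({q}, {df_ur}) = {F:.3f}")
```

The printed value would then be compared with the critical value of the F distribution with `q` and `df_ur` degrees of freedom.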

Joint test with F-statistic: The null hypothesis is rejected.

The Chow Test for Structural Stability

Alternative way to compute the F-statistic (in the same example as before):

Step 1: Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSRs of these two regressions.

Step 2: Run the regression for the restricted model and store its SSR.

If the test is computed in this way, it is called the Chow test.

Important: The test assumes a constant error variance across groups.

Step 3: Calculate the F-statistic

$$F = \frac{[SSR_r - (RSS_{un,1} + RSS_{un,2})] / (k + 1)}{(RSS_{un,1} + RSS_{un,2}) / (n - 2k - 2)}$$

where $k$ is the number of slope coefficients in each regression and $n$ is the total number of observations.

Step 4: If the F-statistic is bigger than the critical value F(k+1, n-2k-2), reject the null hypothesis that the parameters are stable over the whole data set.
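The steps above can be sketched as a small function. The simulated data, the parameter values, and the names `ssr` and `chow_test` are assumptions of this sketch rather than anything given in the notes; NumPy is assumed to be available, and the design matrix is expected to already contain a column of ones, so that it has k + 1 columns.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_test(y, X, group):
    """Chow F-statistic for two groups marked by a 0/1 array `group`.

    X must include the intercept column, so X.shape[1] equals k + 1.
    Returns (F, numerator df, denominator df).
    """
    group = np.asarray(group).astype(bool)
    n, p = X.shape                                  # p = k + 1
    ssr_r = ssr(y, X)                               # pooled (restricted) fit
    ssr_un = ssr(y[~group], X[~group]) + ssr(y[group], X[group])
    df_num, df_den = p, n - 2 * p                   # k + 1 and n - 2k - 2
    f_stat = ((ssr_r - ssr_un) / df_num) / (ssr_un / df_den)
    return f_stat, df_num, df_den

# Simulated sample in which intercept and slope differ between groups.
rng = np.random.default_rng(1)
n = 120
x = rng.normal(size=n)
g = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * x + g * (0.8 + 0.4 * x) + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

F, df1, df2 = chow_test(y, X, g)
print(f"Chow F({df1}, {df2}) = {F:.3f}")
```

Because the two separate regressions together fit the data exactly as well as one pooled regression with a group dummy and a full set of interactions, this statistic coincides with the joint F-statistic on the dummy and interaction terms.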