# Review of Probability and Statistics

Binary independent and dependent variables

FIN822 Li

## Binary independent variables

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

## Dummy Variables

A dummy variable is a variable that takes on the value 1 or 0. Examples: male (= 1 if the person is male, 0 otherwise), NYSE (= 1 if the stock is listed on the NYSE, 0 otherwise), etc. Dummy variables are also called binary variables.

## A Dummy Independent Variable

Consider a simple model with one continuous variable ($x$) and one dummy ($d$):

$$y = \beta_0 + \delta_0 d + \beta_1 x + u$$

This can be interpreted as an intercept shift. If $d = 0$, then $y = \beta_0 + \beta_1 x + u$. If $d = 1$, then $y = (\beta_0 + \delta_0) + \beta_1 x + u$. The case of $d = 0$ is the base group.

## Example

$Y$ = turnover = trading volume / outstanding shares. $X$ = size of firm = log(stock price × outstanding shares). $D$ = 1 for NYSE stocks, 0 otherwise (AMEX and NASDAQ). If you assume the slope is the same for both groups, then regress

$$y = \beta_0 + \delta_0 d + \beta_1 x + u$$

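## Example of $\delta_0 > 0$

[Figure: $y$ plotted against $x$. Two parallel lines with slope $\beta_1$: the $d = 1$ line, $y = (\beta_0 + \delta_0) + \beta_1 x$, lies above the $d = 0$ line, $y = \beta_0 + \beta_1 x$. The vertical gap between them is $\delta_0$, and the $d = 0$ intercept is $\beta_0$.]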
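The intercept-shift regression can be tried on simulated data. The sketch below is only an illustration: the true values `BETA0`, `DELTA0`, `BETA1`, the sample size, and the noise level are all assumptions chosen for the example, and `ols_two_regressors` is a hypothetical helper that solves the two-regressor normal equations directly rather than calling a statistics package.

```python
import random

def ols_two_regressors(d, x, y):
    """OLS of y on an intercept, d, and x. Returns (intercept, coef_d, coef_x)."""
    n = len(y)
    d_bar, x_bar, y_bar = sum(d) / n, sum(x) / n, sum(y) / n
    # Sums of squares and cross-products of the demeaned variables.
    s_dd = sum((di - d_bar) ** 2 for di in d)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_dx = sum((di - d_bar) * (xi - x_bar) for di, xi in zip(d, x))
    s_dy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Solve the 2x2 normal equations for the two slope coefficients.
    det = s_dd * s_xx - s_dx ** 2
    coef_d = (s_dy * s_xx - s_xy * s_dx) / det
    coef_x = (s_xy * s_dd - s_dy * s_dx) / det
    intercept = y_bar - coef_d * d_bar - coef_x * x_bar
    return intercept, coef_d, coef_x

random.seed(0)
BETA0, DELTA0, BETA1 = 1.0, 0.5, 2.0   # assumed true values, for illustration only

d = [i % 2 for i in range(400)]                 # half the sample has d = 1
x = [random.uniform(0, 5) for _ in d]
y = [BETA0 + DELTA0 * di + BETA1 * xi + random.gauss(0, 0.1)
     for di, xi in zip(d, x)]

b0_hat, delta0_hat, b1_hat = ols_two_regressors(d, x, y)
print("base-group intercept:", round(b0_hat, 2))
print("intercept shift for d = 1:", round(delta0_hat, 2))
print("common slope:", round(b1_hat, 2))
```

The coefficient on `d` estimates the vertical gap between the two parallel lines, and the coefficient on `x` is the slope shared by both groups.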

## Dummies for Multiple Categories

We can use dummy variables to deal with something with multiple categories. Suppose everyone in your data is either a HS dropout, a HS grad only, or a college grad. To compare HS and college grads to HS dropouts, include 2 dummy variables: hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if college grad, 0 otherwise.

## Multiple Categories (cont)

Any categorical variable can be turned into a set of dummy variables. The base group is represented by the intercept; if there are $n$ categories, there should be $n - 1$ dummy variables.
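A small sketch of the $n - 1$ coding rule, using the education example. The helper names `make_dummies` and `group_mean` and the wage figures are hypothetical, made up for illustration. When the dummies are the only regressors, OLS reproduces group means, so the intercept is the base-group mean and each dummy coefficient is that group's mean minus the base-group mean.

```python
def make_dummies(values, base):
    """Return {category: 0/1 list} for every category except the base group."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base group {base!r} not found in data")
    return {c: [1 if v == c else 0 for v in values]
            for c in categories if c != base}

def group_mean(values, outcome, category):
    """Mean of the outcome over observations in the given category."""
    picked = [o for v, o in zip(values, outcome) if v == category]
    return sum(picked) / len(picked)

# Hypothetical data: education category and an outcome such as hourly wage.
educ = ["dropout", "hsgrad", "colgrad", "hsgrad", "dropout", "colgrad"]
wage = [10.0, 14.0, 22.0, 16.0, 12.0, 24.0]

dummies = make_dummies(educ, base="dropout")
print(sorted(dummies))      # ['colgrad', 'hsgrad']: 3 categories, 2 dummies
print(dummies["hsgrad"])    # [0, 1, 0, 1, 0, 0]

# Coefficients of the dummies-only regression, read off the group means.
intercept = group_mean(educ, wage, "dropout")
coef_hsgrad = group_mean(educ, wage, "hsgrad") - intercept
coef_colgrad = group_mean(educ, wage, "colgrad") - intercept
print(intercept, coef_hsgrad, coef_colgrad)   # 11.0 4.0 12.0
```

Including a dummy for the base group as well would make the dummies sum to the intercept column, which is why one category is left out.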

## Example

$Y$ = turnover = trading volume / outstanding shares. $X$ = size of firm = log(stock price × outstanding shares). $D$ = 1 for NYSE stocks, 0 otherwise (AMEX and NASDAQ). If you assume the slope and intercept are different for the two groups, you could run two separate regressions, one for each group, or interact the dummy with $x$ in a single regression.

## Interactions with Dummies

Can also consider interacting a dummy variable, $d$, with a continuous variable, $x$:

$$y = \beta_0 + \delta_0 d + \beta_1 x + \delta_1 d \cdot x + u$$

If $d = 0$, then $y = \beta_0 + \beta_1 x + u$. If $d = 1$, then $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x + u$. This is interpreted as a change in the slope (and intercept).

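## Example of $\delta_0 > 0$ and $\delta_1 < 0$

[Figure: $y$ plotted against $x$. The $d = 0$ line is $y = \beta_0 + \beta_1 x$; the $d = 1$ line, $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x$, has a higher intercept and a smaller slope.]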
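A regression that interacts the dummy with $x$ gives the same point estimates as two separate group regressions: the intercept shift is the difference in the two intercepts, and the slope shift is the difference in the two slopes. The sketch below shows this on simulated data, writing the intercept shift as `DELTA0` and the slope shift as `DELTA1`. All true values, the sample size, and the noise level are assumptions for the example, and `simple_ols` and `simulate` are hypothetical helpers.

```python
import random

def simple_ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

random.seed(1)
# Assumed true values: the d = 1 group has a higher intercept and a lower slope.
BETA0, BETA1, DELTA0, DELTA1 = 1.0, 2.0, 0.5, -1.0

def simulate(d, n=200):
    """Draw (x, y) for one group from the interaction model."""
    x = [random.uniform(0, 5) for _ in range(n)]
    y = [BETA0 + DELTA0 * d + (BETA1 + DELTA1 * d) * xi + random.gauss(0, 0.1)
         for xi in x]
    return x, y

a0, b0 = simple_ols(*simulate(d=0))   # base group: intercept and slope
a1, b1 = simple_ols(*simulate(d=1))   # d = 1 group: intercept and slope

delta0_hat = a1 - a0   # coefficient on d in the interacted regression
delta1_hat = b1 - b0   # coefficient on d * x in the interacted regression
print("intercept shift:", round(delta0_hat, 2))
print("slope shift:", round(delta1_hat, 2))
```

The single interacted regression is usually more convenient because the significance of each shift can be read off its own coefficient.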

## Caveats

A typical use of a dummy variable is when we are looking for a program effect. For example, we may have individuals who received job training and wish to test the effect of training. We need to remember that individuals usually choose whether to participate in a program, which may lead to a self-selection problem.
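A rough simulation of self-selection, under assumptions made purely for illustration: a variable `skill` raises the outcome, lower-skill people are more likely to enroll in training, and the true training effect `TRUE_EFFECT` is 1. Comparing mean outcomes of participants and non-participants then mixes the program effect with the skill gap. Adding `skill` as a control recovers the effect, which is only possible here because the simulation lets us observe it.

```python
import random

def mean(v):
    return sum(v) / len(v)

def coef_on_d_controlling_for_s(d, s, y):
    """Coefficient on d from OLS of y on an intercept, d, and s."""
    d_bar, s_bar, y_bar = mean(d), mean(s), mean(y)
    s_dd = sum((a - d_bar) ** 2 for a in d)
    s_ss = sum((b - s_bar) ** 2 for b in s)
    s_ds = sum((a - d_bar) * (b - s_bar) for a, b in zip(d, s))
    s_dy = sum((a - d_bar) * (c - y_bar) for a, c in zip(d, y))
    s_sy = sum((b - s_bar) * (c - y_bar) for b, c in zip(s, y))
    return (s_dy * s_ss - s_sy * s_ds) / (s_dd * s_ss - s_ds ** 2)

random.seed(2)
TRUE_EFFECT = 1.0   # assumed effect of training on the outcome
n = 2000

skill = [random.gauss(0, 1) for _ in range(n)]
# Self-selection: people with lower skill are more likely to enroll.
train = [1 if s + random.gauss(0, 1) < 0 else 0 for s in skill]
y = [1.0 + TRUE_EFFECT * d + 2.0 * s + random.gauss(0, 1)
     for d, s in zip(train, skill)]

# Naive estimate: difference in mean outcomes (OLS of y on the dummy alone).
naive = (mean([yi for yi, d in zip(y, train) if d == 1])
         - mean([yi for yi, d in zip(y, train) if d == 0]))
# Estimate after controlling for the variable that drives participation.
controlled = coef_on_d_controlling_for_s(train, skill, y)

print("naive estimate:", round(naive, 2))
print("controlling for skill:", round(controlled, 2))
```

In this setup the naive estimate is pushed well below the true effect, because participants start out with lower skill on average.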

## Self-selection Problems

If we can control for everything that is correlated with both participation and the outcome of interest, then it's not a problem. Often, though, there are unobservables that are correlated with participation, for example people's skill before the training. Low-skill people may be more likely to take the training. In this case, the estimate of the program effect is biased.

## Self Selection Problem: An example

Suppose a professor finds that his evening class students perform better than his daytime class students. Does that mean the professor teaches better at night, or that students learn better at night? No. It may be that students who take night classes are more hard working: they make the effort to come at night, sometimes after a day of work, etc.

## Self-selection Problems

If we can control (control means adding more explanatory variables) for everything that is correlated with both participation (the dummy variable) and the outcome ($y$), then it's not a problem.

## Binary Dependent Variables

Logit and Probit models

$$P(y = 1 \mid \mathbf{x}) = G(\beta_0 + \mathbf{x}\boldsymbol{\beta})$$

## Binary Dependent Variables

A linear probability model can be written as

$$P(y = 1 \mid \mathbf{x}) = \beta_0 + \mathbf{x}\boldsymbol{\beta}$$

A drawback to the linear probability model is that predicted values are not constrained to be between 0 and 1. An alternative is to model the probability as a function, $G(\beta_0 + \mathbf{x}\boldsymbol{\beta})$, where $0 < G(z) < 1$.
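The drawback of the linear probability model shows up even on a tiny dataset. The numbers below are hypothetical, chosen only so the arithmetic is easy to follow; `simple_ols` is an illustrative helper for a one-regressor fit.

```python
def simple_ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

x = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical regressor values
y = [0, 0, 0, 1, 0, 1, 1, 1]   # hypothetical binary outcomes

intercept, slope = simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted "probabilities" that fall outside the unit interval.
outside = [(xi, round(p, 3)) for xi, p in zip(x, fitted) if p < 0 or p > 1]
print(outside)   # [(1, -0.083), (8, 1.083)]
```

Here the fitted line is $-0.25 + x/6$, so the smallest and largest $x$ in the sample already get predicted values below 0 and above 1.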

## The intuition

[Figure not recovered]

## The Probit Model

One choice for $G(z)$ is the standard normal cumulative distribution function (cdf):

$$G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv$$

where $\phi(z)$ is the standard normal density, so

$$\phi(z) = (2\pi)^{-1/2} \exp(-z^2/2)$$

This case is referred to as a probit model. Since it is a nonlinear model, it cannot be estimated by our usual methods. Use maximum likelihood estimation.

## The Logit Model

Another common choice for $G(z)$ is the logistic function:

$$G(z) = \frac{\exp(z)}{1 + \exp(z)} = \Lambda(z)$$

This case is referred to as a logit model, or sometimes as a logistic regression. Both functions have similar shapes: they are increasing in $z$, most quickly around 0.

## Probits and Logits

Both the probit and logit are nonlinear and require maximum likelihood estimation. There is no real reason to prefer one over the other. Traditionally we saw more of the logit, mainly because the logistic function leads to a more easily computed model.

## Interpretation of Probits and Logits

In general we care about the effect of $x$ on $P(y = 1 \mid x)$, that is, we care about $\partial p / \partial x$. For the linear case, this is easily computed as the coefficient on $x$. For the nonlinear probit and logit models, it's more complicated:

$$\frac{\partial p}{\partial x_j} = g(\beta_0 + \mathbf{x}\boldsymbol{\beta})\,\beta_j$$

where $g(z)$ is $dG/dz$.
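Since both models are fitted by maximum likelihood, here is a minimal sketch of how a logit with one regressor could be estimated by Newton-Raphson. This is one common way to maximize the log-likelihood, not necessarily what any particular package does. The data are hypothetical, and `logit_cdf`, `probit_cdf`, and `fit_logit` are illustrative names.

```python
import math

def logit_cdf(z):
    """Logistic function, exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_logit(x, y, iterations=25):
    """Maximize the logit log-likelihood in (b0, b1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [logit_cdf(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Negative Hessian, a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical regressor values
y = [0, 0, 0, 1, 0, 1, 1, 1]   # hypothetical binary outcomes

b0_hat, b1_hat = fit_logit(x, y)
fitted = [logit_cdf(b0_hat + b1_hat * xi) for xi in x]
print("b0, b1:", round(b0_hat, 3), round(b1_hat, 3))
print("fitted probabilities:", [round(p, 3) for p in fitted])
```

Unlike the linear probability model, every fitted value lies strictly between 0 and 1. A probit fit would follow the same pattern with `probit_cdf` in place of `logit_cdf` and the corresponding score and Hessian.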

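For the logit model:

$$\frac{\partial p}{\partial x_j} = \beta_j \frac{\exp(\beta_0 + \mathbf{x}\boldsymbol{\beta})}{\left(1 + \exp(\beta_0 + \mathbf{x}\boldsymbol{\beta})\right)^2}$$

For the probit model:

$$\frac{\partial p}{\partial x_j} = \beta_j\,\phi(\beta_0 + \mathbf{x}\boldsymbol{\beta}) = \beta_j (2\pi)^{-1/2} \exp\left(-(\beta_0 + \mathbf{x}\boldsymbol{\beta})^2/2\right)$$

## Interpretation (continued)

Can examine the sign and significance (based on a standard t test) of coefficients.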
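The logit and probit marginal effects can be checked numerically against a finite-difference derivative of $G$. In the sketch below, `z` stands for the index $\beta_0 + \mathbf{x}\boldsymbol{\beta}$. The coefficient `BETA_J` and the index values are hypothetical, and the function names are illustrative.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_marginal_effect(z, beta_j):
    """beta_j * exp(z) / (1 + exp(z))^2"""
    return beta_j * math.exp(z) / (1.0 + math.exp(z)) ** 2

def probit_marginal_effect(z, beta_j):
    """beta_j * (2 pi)^(-1/2) * exp(-z^2 / 2)"""
    return beta_j * math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def numerical_effect(G, z, beta_j, h=1e-6):
    # Raising x_j by a small step h moves the index z by beta_j * h.
    return (G(z + beta_j * h) - G(z - beta_j * h)) / (2.0 * h)

BETA_J = 0.8   # hypothetical coefficient on x_j

for z in (-2.0, 0.0, 1.5):   # hypothetical values of the index
    print(
        z,
        round(logit_marginal_effect(z, BETA_J), 4),
        round(numerical_effect(logit_cdf, z, BETA_J), 4),
        round(probit_marginal_effect(z, BETA_J), 4),
        round(numerical_effect(probit_cdf, z, BETA_J), 4),
    )
```

The effect of $x_j$ is largest near $z = 0$ and shrinks in the tails, so unlike the linear case a single coefficient does not give the effect on the probability. The sign of the effect is always the sign of $\beta_j$.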
