C. Logit model, logistic regression, and log-linear model: A comparison

Leaving home
Models of counts: log-linear model

Row i, column j; variables A and B:
$$\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$$
or
$$\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$$
or
$$\ln \lambda = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
with A = TIME [early = 0; late = 1] and B = SEX [female = 0; male = 1]. EARLY is the reference category.

Leaving home
Model 1: null model
$\lambda = 4.887$, so $\lambda_{ij} = \exp[4.887] = 132.5$ for all i and j (= 530/4).

Model 2: + TIME
$\lambda = 4.649$; $\lambda_2^A = 0.4291$
$\lambda_{ij} = \exp[4.649 + 0.4291\,t]$: 104.5 for early (t = 0) and 160.5 for late (t = 1), or
$\exp[4.649] = 104.5$ for early
$\exp[4.649 + 0.4291] = 160.5$ for late
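The null model and the TIME-only model can be checked by hand from the margins of the table (209 early, 321 late, 530 in total). The following is a minimal sketch under that assumption; the variable names are illustrative and not taken from the slides.

```python
import math

# Margins of the leaving-home table quoted on the slides.
n_early, n_late = 209, 321
total = n_early + n_late            # 530 young adults
n_cells = 4                         # 2 timing categories x 2 sexes

# Model 1 (null): every cell has the same expected count.
null_count = total / n_cells
null_lambda = math.log(null_count)  # overall effect

# Model 2 (+ TIME): within each timing row, the two sex cells
# share the row total equally.
early_count = n_early / 2
late_count = n_late / 2
lam = math.log(early_count)                     # overall effect (early = reference)
lam_time = math.log(late_count / early_count)   # effect of late relative to early

print("null model:", null_count, round(null_lambda, 3))
print("time model:", round(lam, 3), round(lam_time, 4))
print("late cell :", round(math.exp(lam + lam_time), 1))
```

The parameters are simply logs of expected counts and of their ratios, which is why exponentiating the sum of the relevant parameters returns the expected count of a cell.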

Leaving home
Model 3: TIME and SEX
$\lambda = 4.697$; $\lambda_2^A = 0.4291$; $\lambda_2^B = -0.0982$
Reference categories: early [$\lambda_1^A = 0$] and females [$\lambda_1^B = 0$]
$$\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B$$

Table: Predicted number of young adults leaving home by age and sex (unsaturated log-linear model)

          Females   Males   Total
  < 20     109.6     99.4     209
  ≥ 20     168.4    152.6     321
  Total    278      252       530
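The predicted counts of the unsaturated model are the usual independence counts (row total times column total divided by the grand total). A short sketch, assuming only the margins shown in the table and using illustrative variable names:

```python
import math

# Margins of the leaving-home table.
row_totals = [209, 321]   # timing: < 20, >= 20
col_totals = [278, 252]   # sex: females, males
n = sum(row_totals)       # 530

# Expected counts when timing and sex are independent.
fitted = [[r * c / n for c in col_totals] for r in row_totals]

# Dummy-coded parameters (reference categories: early, females).
lam = math.log(fitted[0][0])                      # overall effect
lam_time = math.log(fitted[1][0] / fitted[0][0])  # late vs early
lam_sex = math.log(fitted[0][1] / fitted[0][0])   # males vs females

for row in fitted:
    print([round(x, 1) for x in row])
print(round(lam, 3), round(lam_time, 4), round(lam_sex, 4))
```

Because the model has no interaction term, the time effect is the same for both sexes and the sex effect is the same for both timing categories.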

Model 3: TIME and SEX (unsaturated log-linear model)
$$\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B, \qquad \lambda_{ij} = \exp[\lambda + \lambda_i^A + \lambda_j^B]$$
$\lambda_{11} = \exp[4.697] = 109.6$
$\lambda_{21} = \exp[4.697 + 0.4291] = 168.4$
$\lambda_{12} = \exp[4.697 - 0.0982] = 99.4$
$\lambda_{22} = \exp[4.697 + 0.4291 - 0.0982] = 152.6$

Leaving home
Model 4: TIME and SEX and TIME*SEX interaction (saturated log-linear model)
$$\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$$
$\lambda = 4.905$ (overall effect)
$\lambda_2^A = 0.0576$ (TIME)
$\lambda_2^B = -0.6012$ (GENDER)
$\lambda_{22}^{AB} = 0.8201$ (TIME*GENDER)
or
$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$$
$x_{1i}$ = 0 for < 20 and 1 for ≥ 20
$x_{2i}$ = 0 for females and 1 for males
$x_{3i}$ = 0 for < 20 and females
$x_{3i}$ = 0 for < 20 and males
$x_{3i}$ = 0 for ≥ 20 and females
$x_{3i}$ = 1 for ≥ 20 and males
The saturated model predicts perfectly.

Table: Predicted number of young adults leaving home by age and sex (saturated log-linear model)

          Females   Males   Total
  < 20      135       74      209
  ≥ 20      143      178      321
  Total     278      252      530

Leaving home
Model 4: TIME and SEX and TIME*SEX interaction
$$\lambda_{ij} = \exp[\lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}]$$
$\lambda_{11} = \exp[4.905] = 135$
$\lambda_{21} = \exp[4.905 + 0.0576] = 143$
$\lambda_{12} = \exp[4.905 - 0.6012] = 74$
$\lambda_{22} = \exp[4.905 + 0.0576 - 0.6012 + 0.8201] = 178$

Log-linear and logit model
Political attitudes
Log-linear model:
$$\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$$
Select one variable as the dependent (response) variable, e.g. does voting behaviour differ by sex? Are females more likely to vote conservative than males?
Logit model:
$$\ln\frac{\lambda_{1j}}{\lambda_{2j}} = \beta + \beta_j^B$$
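Treating one variable as the response turns differences of log-linear parameters into logits. The sketch below illustrates this with the observed leaving-home counts (135, 74, 143, 178) and the dummy coding of Model 4, with timing as the response; it is an illustration with made-up variable names, not output from any package.

```python
import math

# Observed counts: rows = timing (< 20, >= 20), columns = sex (females, males).
f11, f12 = 135, 74
f21, f22 = 143, 178

# Saturated log-linear parameters, reference categories: early, females.
lam = math.log(f11)                               # overall effect
lam_time = math.log(f21 / f11)                    # TIME
lam_sex = math.log(f12 / f11)                     # SEX
lam_inter = math.log((f22 * f11) / (f21 * f12))   # TIME*SEX

# The saturated model reproduces every observed count.
fit22 = math.exp(lam + lam_time + lam_sex + lam_inter)

# Log odds of leaving late rather than early, by sex:
# the overall and sex terms cancel, leaving time and interaction terms.
logit_females = math.log(f21 / f11)   # equals lam_time
logit_males = math.log(f22 / f12)     # equals lam_time + lam_inter

print(round(lam, 3), round(lam_time, 4), round(lam_sex, 4), round(lam_inter, 4))
print(round(fit22, 1), round(logit_females, 4), round(logit_males, 4))
```

The interaction parameter of the log-linear model is therefore the log odds ratio of the corresponding logit model.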

A = Party; B = Sex. Effect coding. The logit model is written as $\ln(\lambda_{1j}/\lambda_{2j}) = \beta + \beta_j^B$.

(1) Males voting conservative rather than labour:
$$\ln\frac{\lambda_{11}}{\lambda_{21}} = (\lambda + \lambda_1^A + \lambda_1^B + \lambda_{11}^{AB}) - (\lambda + \lambda_2^A + \lambda_1^B + \lambda_{21}^{AB}) = (\lambda_1^A - \lambda_2^A) + (\lambda_{11}^{AB} - \lambda_{21}^{AB}) = 2\lambda_1^A + 2\lambda_{11}^{AB}$$
The last step uses effect coding: $\lambda_2^A = -\lambda_1^A$ and $\lambda_{21}^{AB} = -\lambda_{11}^{AB}$.
Log-odds = logit:
$$\ln\frac{\lambda_{11}}{\lambda_{21}} = \beta + \beta_1^B$$

(2) Females voting conservative rather than labour:
$$\ln\frac{\lambda_{12}}{\lambda_{22}} = (\lambda + \lambda_1^A + \lambda_2^B + \lambda_{12}^{AB}) - (\lambda + \lambda_2^A + \lambda_2^B + \lambda_{22}^{AB}) = (\lambda_1^A - \lambda_2^A) + (\lambda_{12}^{AB} - \lambda_{22}^{AB}) = 2\lambda_1^A + 2\lambda_{12}^{AB}$$
$$\ln\frac{\lambda_{12}}{\lambda_{22}} = \beta + \beta_2^B$$

Political attitudes
Are women more conservative than men? Do women vote more conservative than men? The odds ratio.
$$\ln\frac{\lambda_{12}/\lambda_{22}}{\lambda_{11}/\lambda_{21}} = (\beta + \beta_2^B) - (\beta + \beta_1^B) = \beta_2^B - \beta_1^B$$
If the log odds ratio is positive (the odds ratio is larger than 1), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote more conservative than men.

Females: $\ln(\lambda_{12}/\lambda_{22}) = \beta + \beta_1^B + (\beta_2^B - \beta_1^B) \cdot 1$
Males: $\ln(\lambda_{11}/\lambda_{21}) = \beta + \beta_1^B + (\beta_2^B - \beta_1^B) \cdot 0$

Logit model:
$$\text{logit}(p) = \ln\frac{p}{1-p} = a + bx \quad \text{with } x = 0, 1$$
with $a = \beta + \beta_1^B$ and $b = \beta_2^B - \beta_1^B$
a = log odds of reference category (males)
b = log odds ratio (odds females / odds males)

The logit model as a regression model
Select a response variable (proportion).
The dependent variable of the logit model is the log of (the odds of) being in one category rather than in another.
The number of observations in each subpopulation (males, females) is assumed to be fixed.
Intercept (a) = log odds of reference category
Slope (b) = log odds ratio

Logit model: descriptive statistics
Political attitudes
Counts in terms of odds and odds ratio

DATA
                   Sex
  Party          Male   Female   Total
  Conservative    279     352      631
  Labour          335     291      626
  Total           614     643     1257

Reference categories: Labour; Males

Odds of voting conservative rather than labour, by sex:
  Sex     Male     Female   Total
  Odds    0.8328   1.2096   1.0080
Odds ratio (ref. cat.: males): 1.4524

Odds of being female rather than male, by party:
  Party   Conservative   Labour   Total
  Odds    1.2616         0.8687   1.0472

F11 = 279
F21 = 335 = 279 * 335/279 = 279 / 0.8328
F12 = 352 = 279 * 352/279 = 279 * 1.2616
F22 = 291 = 279 * 352/279 * 291/352 = 279 * 1.2616 * [1/1.2096]

Political attitudes
LOGIT MODEL
Proportion voting conservative: males 0.454 (279/614); females 0.547 (352/643)
Odds of voting conservative rather than labour: males 0.8328; females 1.2096
Are females more likely to vote conservative than males?
Logit model: logit(p) = a + bX (males reference category)

            v = ln(odds)   exp(v) (odds)    p
  a          -0.18292        0.8328        0.454   Males
  b           0.37323        1.4524                Odds ratio
  a + b       0.19031        1.2096        0.547   Females

logit(p) = -0.18292 + 0.37323 X (with X = 0 for males and X = 1 for females)
p = 0.8328/(1 + 0.8328) = 0.454 for males; p = 1.2096/(1 + 1.2096) = 0.547 for females
If the number of males and the number of females are known, the counts can be calculated.

Political attitudes
Logistic regression, SPSS
Reference category: females (X = 1 for males and X = 0 for females)

  Variable    Param     S.E.    Exp(param)
  SEX(1)      .3732    .1133     1.4524
  Constant   -.1903    .0792

Females voting labour: 1/[1+exp[-(-0.1903)]] = 45% (291/643) (females ref. cat.)
Males voting labour: 1/[1+exp[-(-0.1903+0.3732)]] = 55% (335/614)
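The odds, odds ratio, and logit parameters for the political attitudes table can be recomputed directly from the four counts. The sketch below does this with males as reference category; the standard error line uses the standard large-sample formula for a log odds ratio in a 2 x 2 table, which the slides do not state and which is included here as an assumption. All names are illustrative.

```python
import math

# Counts from the political attitudes table.
conservative = {"male": 279, "female": 352}
labour = {"male": 335, "female": 291}

def inv_logit(v):
    """Convert a logit (log odds) back to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

# Odds of voting conservative rather than labour, by sex.
odds = {sex: conservative[sex] / labour[sex] for sex in conservative}
odds_ratio = odds["female"] / odds["male"]

# Logit model with X = 0 for males and X = 1 for females.
a = math.log(odds["male"])    # log odds of reference category
b = math.log(odds_ratio)      # log odds ratio

p_male = inv_logit(a)
p_female = inv_logit(a + b)

# Assumed large-sample standard error of the log odds ratio:
# square root of the sum of reciprocal cell counts.
cells = list(conservative.values()) + list(labour.values())
se_b = math.sqrt(sum(1.0 / n for n in cells))

# With the group sizes known, the counts follow from the probabilities.
n_male = conservative["male"] + labour["male"]        # 614
n_female = conservative["female"] + labour["female"]  # 643
cons_male = round(n_male * p_male)
cons_female = round(n_female * p_female)

print(round(a, 5), round(b, 5), round(odds_ratio, 4))
print(round(p_male, 3), round(p_female, 3), round(se_b, 4))
print(cons_male, cons_female)
```

Swapping the reference category or the response category only changes signs of the parameters, which is why the same magnitude 0.3732 appears in the SPSS output.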

Different parameter coding: X = -0.5 for males and X = 0.5 for females

  Variable    Param     S.E.    Exp(param)
  SEX(1)     -.3732    .1133     0.6885
  Constant   -.0037    .0567

Females voting labour: 1/[1+exp[-(-0.0037 + 0.5*(-0.3732))]] = 45% (291/643)
Males voting labour: 1/[1+exp[-(-0.0037 - 0.5*(-0.3732))]] = 55% (335/614)

The logit model and the logistic regression
Observation from a binomial distribution with parameter p and index m

Leaving parental home
Leaving home: logit model and logistic regression
Number of young adults leaving home early: 209
Total number of young adults leaving home: 530
Probability of leaving home early: 209/530 = 0.394
REFERENCE CATEGORY: leaving home late (late = 0; early = 1)
ODDS of leaving home early versus late: 209/(530 - 209) = 0.6511
Logit of leaving home early: ln 0.6511 = -0.4291
Specify a model:
Logit model:
$$\text{Logit}(p) = \ln\frac{p}{1-p} = \ln\frac{0.394}{1-0.394} = -0.4291$$

Leaving home
Logistic regression:
$$p = \frac{1}{1+\exp[-(-0.4291)]} = 0.394$$
Standard error: $\sqrt{1/209 + 1/321} = 0.0889$
Confidence interval: $-0.4291 \pm 1.96 \times 0.0889 = (-0.603, -0.255)$ ON LOGIT SCALE
and
$$\left(\frac{1}{1+\exp[-(-0.603)]},\ \frac{1}{1+\exp[-(-0.2549)]}\right) = (0.354, 0.437)$$
ON PROBABILITY SCALE

Leaving home
Relation logit and log-linear model: the unsaturated model
Log-linear model: $\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B$, with $\lambda_i^A$ the effect of timing and $\lambda_j^B$ the effect of sex.
Odds of leaving parental home late rather than early:
females:
$$\text{ODDS} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{168.4}{109.6} = 1.536$$
$$\frac{\lambda_{21}}{\lambda_{11}} = \frac{\exp[\lambda + \lambda_2^A + \lambda_1^B]}{\exp[\lambda + \lambda_1^A + \lambda_1^B]} = \exp[\lambda_2^A - \lambda_1^A] = \exp[0.4291 - 0] = 1.536$$
males:
$$\text{ODDS} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{152.6}{99.4} = 1.536$$
$$\frac{\lambda_{22}}{\lambda_{12}} = \frac{\exp[\lambda + \lambda_2^A + \lambda_2^B]}{\exp[\lambda + \lambda_1^A + \lambda_2^B]} = \exp[\lambda_2^A - \lambda_1^A] = \exp[0.4291 - 0] = 1.536$$
$\text{Logit}(p) = \ln(p_{late}/p_{early}) = 0.4291$ for females and males.
Output of the logit model gives the same result (s.e. 0.0889).

Leaving home
Relation logit and log-linear model: the saturated model
Log-linear model: $\ln \lambda_{ij} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$, with $\lambda_i^A$ the effect of timing, $\lambda_j^B$ the effect of sex, and $\lambda_{ij}^{AB}$ the effect of the interaction between timing and sex.
Odds of leaving parental home late rather than early:
females (ref):
$$\text{ODDS} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{143}{135} = 1.059$$
$$\frac{\lambda_{21}}{\lambda_{11}} = \exp[(\lambda_2^A - \lambda_1^A) + (\lambda_{21}^{AB} - \lambda_{11}^{AB})] = \exp[(0.0576 - 0) + (0 - 0)] = 1.059$$
males:
$$\text{ODDS} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{178}{74} = 2.405$$
$$\frac{\lambda_{22}}{\lambda_{12}} = \exp[(\lambda_2^A - \lambda_1^A) + (\lambda_{22}^{AB} - \lambda_{12}^{AB})] = \exp[(0.0576 - 0) + (0.8201 - 0)] = 2.405$$
ln(143/135) = 0.0576 is the overall effect of the logit model (females ref. cat.)
ln(178/74) = 0.8777 is the log odds for males
0.8777 - 0.0576 = 0.8201 is the log ODDS RATIO (odds males / odds females [ref])
Logit model: logit(p) = 0.0576 + 0.8201 X (with X = 0 for females and 1 for males)

Leaving home
Logit model with X = 0 for males and X = 1 for females:
$$\text{Logit}(p) = \ln\frac{p}{1-p} = 0.8777 - 0.8201\,X$$
Logistic regression: probability of leaving home late
males: $p = 1/(1+\exp[-(0.8777)]) = 0.706 = 178/252$
females: $p = 1/(1+\exp[-(0.8777 - 0.8201)]) = 0.514 = 143/278$

Leaving home
Table: Number of young adults leaving home by age and sex

          Females   Males   Total
  < 20      135       74      209
  ≥ 20      143      178      321
  Total     278      252      530

Are males more likely to leave home early than females?
Dummy coding: reference category: (i) females; (ii) leaving home late
Logit model:
$$\text{Logit}(p_i) = \ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\,x_i$$
$x_i$ is 0 for females and 1 for males
LOGIT(p) is -0.05757 for females and -0.05757 - 0.8201 = -0.8777 for males
ODDS
Females (reference): exp[-0.05757] = 0.9440 = 135/143
Males: exp[-0.8777] = 0.4157 = 74/178
ODDS RATIO
ODDS males / ODDS females = exp[-0.8201] = 0.4404 = 0.4157/0.9440

Leaving home
Dummy coding: ref. cat.: females, late
$$\text{Logit}(p_i) = \ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\,x_i$$
Logistic regression:
$p_f = 1/(1+\exp[-(-0.05757)]) = 0.486$
$p_m = 1/(1+\exp[-(-0.05757 - 0.8201)]) = 0.294$

Effect coding or marginal coding: females +1; males -1
$$\text{Logit}(p_i) = \ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.4676 + 0.4101\,x_i$$
$x_i$ is +1 for females and -1 for males
Logit(p) is -0.4676 + 0.4101 = -0.0576 (after rounding) for females and -0.4676 + 0.4101*(-1) = -0.8777 for males

The logistic regression in SPSS
Micro data and tabulated data

SPSS: micro-data
Micro-data: age at leaving home in months
Create variable: Age in years: Age = TRUNC[(month-1)/12]
Create variable: TIMING2 based on MONTH:
TIMING2 = 1 (early) if month ≤ 240 & reason < 4
TIMING2 = 2 (late) if month > 240 & reason < 4
Crosstabs: number leaving home by reason (row) and sex (column)
For analysis: select cases that are NOT censored: SELECT CASES with reason < 4

SPSS: tabulated data
Number of observations: WEIGHT cases (in data)
No difference between the model for tabulated data and the model for micro-data
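The dummy and effect codings of the leaving-home logit model give different parameters but the same fitted logits and probabilities. The sketch below checks this from the observed counts (p is the probability of leaving early, late is the reference); the names are illustrative and the computation is by hand rather than with SPSS.

```python
import math

# Observed counts by sex.
early = {"female": 135, "male": 74}
late = {"female": 143, "male": 178}

def inv_logit(v):
    """Convert a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

# Logit of leaving early rather than late, by sex.
logit = {sex: math.log(early[sex] / late[sex]) for sex in early}

# Dummy coding: x = 0 for females (reference), x = 1 for males.
b0_dummy = logit["female"]
b1_dummy = logit["male"] - logit["female"]

# Effect coding: x = +1 for females, x = -1 for males.
b0_effect = (logit["female"] + logit["male"]) / 2
b1_effect = (logit["female"] - logit["male"]) / 2

# Both codings return the same fitted probabilities.
p_f_dummy = inv_logit(b0_dummy + b1_dummy * 0)
p_m_dummy = inv_logit(b0_dummy + b1_dummy * 1)
p_f_effect = inv_logit(b0_effect + b1_effect * (+1))
p_m_effect = inv_logit(b0_effect + b1_effect * (-1))

print(round(b0_dummy, 5), round(b1_dummy, 4))
print(round(b0_effect, 4), round(b1_effect, 4))
print(round(p_f_dummy, 3), round(p_m_dummy, 3))
```

Under effect coding the intercept is the mean of the two logits and the slope is half their difference, so the dummy-coded slope is minus twice the effect-coded slope here.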

Leaving home
The logistic regression in SPSS
SPSS: regression/logistic
Note:
Dependent variable: TIMING2 (p = probability of leaving home LATE)
Covariate: sex (CATEGORICAL)
$$\text{Logit}(p) = \ln\frac{p}{1-p} = 0.8777 - 0.8201\,X$$
with males as reference category. Males are coded 0; hence X is 1 for females.

OUTPUT SPSS: Variables in the Equation

  Variable      B       S.E.     Wald      df    Sig      R       Exp(B)
  SEX(1)      -.8201   .1831   20.0598     1    .0000   -.1594    .4404
  Constant     .8777   .1383   40.2681     1    .0000

Related models
Poisson distribution: counts have a Poisson distribution (total number not fixed)
  Poisson regression
  Log-linear model: model of count data (log of counts)
Binomial and multinomial distributions: counts follow a multinomial distribution (total number is fixed)
  Logit model: model of proportions [and odds (log of odds)]
  Logistic regression
Log-rate model: log-linear model with OFFSET (constant term)
Parameters of these models are related
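As an illustration of how the parameters are related, the sketch below recomputes the logistic regression coefficients of the SPSS output from the leaving-home counts and compares the sex coefficient with the interaction term of the saturated log-linear model. The standard errors use the standard large-sample formula (square root of the sum of reciprocal counts) and the Wald statistic is taken as (B/S.E.) squared; both are assumptions not spelled out on the slides.

```python
import math

# Observed counts: (timing, sex) -> number of young adults.
counts = {
    ("early", "female"): 135, ("early", "male"): 74,
    ("late", "female"): 143, ("late", "male"): 178,
}

# Logistic regression for p = P(leaving late); males reference, X = 1 for females.
constant = math.log(counts[("late", "male")] / counts[("early", "male")])
b_sex = math.log(counts[("late", "female")] / counts[("early", "female")]) - constant

# Assumed large-sample standard errors.
se_constant = math.sqrt(1 / counts[("late", "male")] + 1 / counts[("early", "male")])
se_sex = math.sqrt(sum(1 / n for n in counts.values()))

# Wald statistics, assumed to be (B / S.E.) squared.
wald_constant = (constant / se_constant) ** 2
wald_sex = (b_sex / se_sex) ** 2

# Interaction term of the saturated log-linear model
# (reference categories: early, females).
interaction = math.log(
    counts[("late", "male")] * counts[("early", "female")]
    / (counts[("late", "female")] * counts[("early", "male")])
)

print(round(constant, 4), round(se_constant, 4), round(wald_constant, 2))
print(round(b_sex, 4), round(se_sex, 4), round(wald_sex, 2))
print(round(interaction, 4), round(math.exp(b_sex), 4))
```

The sex coefficient equals the log-linear interaction with the sign reversed, because the logistic regression here takes males rather than females as reference category.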