Two Primary Purposes for ANCOVA
- Adjust the group means of the dependent variable to a common value of the covariate, thereby providing a "fair" comparison of the groups.
- Remove extraneous variation in the dependent variable that is associated with the covariate, thereby giving smaller standard errors of the group means and more powerful tests of the differences between group means.

Xuhua Xia

Analysis of Covariance (ANCOVA)
- ANCOVA combines analysis of variance with regression.
- When it is used
- Its rationale
- The relationship between Y and the covariate needs to have the same slope in the different groups.
- ANCOVA can be done in EXCEL by using dummy variables to code categorical variables.
- Assumptions

Why ANCOVA?
[Figure: Knowledge plotted against Age, with separate points for Woman and Man.]
Avoid concluding that there is a significant difference between groups when in fact there is none.

Why ANCOVA?
[Figure, two panels: WtLoss plotted against Humidity for the two species (little overlap in weight loss); Species plotted against WtLoss alone (much overlap in weight loss).]
Avoid concluding that there is no difference between groups when there is a significant difference.

Are the two groups different?
[Figure: scatter plot of X2 against X1, with each point labelled by its group (1 or 2).]
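A small simulation may make the gain in power concrete. The sketch below is not from the slides: the group labels, sample size, slope, shift between groups and noise level are all made-up values, chosen so that the covariate spreads Y widely within each group.

```r
# Hypothetical simulated data (not from the slides): two groups share the
# same slope on the covariate x and differ only by a shift of 1 in y.
set.seed(1)
x <- rep(seq(0, 100, by = 10), times = 2)   # covariate, same range in both groups
g <- factor(rep(c("A", "B"), each = 11))    # classification variable
y <- 10 - 0.06 * x + 1 * (g == "B") + rnorm(22, sd = 0.2)

# ANOVA ignores x, so the variation caused by x ends up in the error term
fitANOVA <- lm(y ~ g)
pANOVA <- anova(fitANOVA)["g", "Pr(>F)"]

# ANCOVA removes the variation associated with x before comparing the groups
fitANCOVA <- lm(y ~ x + g)
pANCOVA <- anova(fitANCOVA)["g", "Pr(>F)"]

cat("p-value for group, ANOVA: ", format.pval(pANOVA), "\n")
cat("p-value for group, ANCOVA:", format.pval(pANCOVA), "\n")
```

With these settings the group term should be far from significant in the ANOVA and clearly significant in the ANCOVA, because the shift of 1 is small relative to the spread in y that x produces but large relative to the residual noise once x is in the model.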

Effect of two drugs

Drug  PreS  AfterS  Diff
a     1     2       -1
a     2     3.1     -1.1
a     3     4.1     -1.1
a     4     5       -1
c     1     1        0
c     2     1.9      0.1
c     3     3.1     -0.1
c     4     4        0

Do analyses in EXCEL and explain results:
1. Test heterogeneity in slope
2. ANCOVA

[Figure: AftScore plotted against PreScore, with each point labelled by Drug (a or c).]
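Both EXCEL analyses amount to ordinary multiple regression once Drug is coded as a dummy variable. As a sketch of that coding (written in R rather than EXCEL, with the column names D and PreS_D invented here for illustration), the drug data can be typed in directly:

```r
# Drug data typed in from the slide
md <- data.frame(
  Drug   = rep(c("a", "c"), each = 4),
  PreS   = rep(1:4, times = 2),
  AfterS = c(2, 3.1, 4.1, 5, 1, 1.9, 3.1, 4)
)

# Dummy coding, as one would do in spreadsheet columns
# (D and PreS_D are illustrative names, not from the slides)
md$D      <- as.numeric(md$Drug == "c")   # 0 = drug a, 1 = drug c
md$PreS_D <- md$PreS * md$D               # product column carries the slope difference

# 1. Test heterogeneity in slope: is the coefficient of PreS_D different from zero?
fitSlopes <- lm(AfterS ~ PreS + D + PreS_D, data = md)
print(summary(fitSlopes)$coefficients)

# 2. ANCOVA: drop the product column and test the coefficient of D
fitDummy <- lm(AfterS ~ PreS + D, data = md)
print(summary(fitDummy)$coefficients)

# The dummy-variable regression gives the same fit as the factor-based model
fitFactor <- lm(AfterS ~ PreS + Drug, data = md)
print(all.equal(unname(fitted(fitDummy)), unname(fitted(fitFactor))))
```

In the first model the t-test on PreS_D plays the role of the slope-heterogeneity test; in the second, the coefficient of D estimates the difference between drug c and drug a at the same PreS.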

ANCOVA in R

    md <- read.table("AncovaEx1Drug.txt", header = TRUE)
    # Drug a in black, drug c in red
    with(md, {
      plot(PreS[Drug == "a"], AfterS[Drug == "a"], xlab = "PreS", ylab = "AfterS",
           ylim = range(AfterS), pch = 16)
      points(PreS[Drug == "c"], AfterS[Drug == "c"], col = "red", pch = 16)
    })
    # Will ANOVA reveal the difference between the two drugs?
    # First, do patients in the two groups differ at the beginning?
    fitANOVA <- aov(PreS ~ Drug, data = md)
    anova(fitANOVA)
    # Then, do the two groups differ at the end?
    fitANOVA <- aov(AfterS ~ Drug, data = md)
    anova(fitANOVA)
    # Check the plot for slope heterogeneity, then test it explicitly
    fit <- lm(AfterS ~ PreS * Drug, data = md)
    anova(fit)
    drop1(fit, ~ ., test = "F")  # Type III SS and F-test
    # No significant interaction: do ANCOVA
    fit <- lm(AfterS ~ PreS + Drug, data = md)
    summary(fit)
    anova(fit)

[Figure: AfterS plotted against PreS.]

Teaching Evaluation

Teacher  PreS  AftS
JAY      69    75
JAY      69    70
JAY      71    73
JAY      78    82
JAY      79    81
JAY      73    75
PAT      71    73
PAT      70    73
PAT      56    59
PAT      77    83
PAT      69    70
PAT      68    74
PAT      75    80
PAT      78    85
PAT      68    68
PAT      63    68
PAT      72    74
PAT      63    66
PAT      71    76
PAT      72    78
ROBIN    72    79
ROBIN    64    65
ROBIN    74    74
ROBIN    72    75
ROBIN    82    84
ROBIN    69    68
ROBIN    76    76
ROBIN    68    65
ROBIN    78    79
ROBIN    70    71
ROBIN    60    61

Three teachers, Jay, Pat and Robin, teach the same course to three separate classes of 6, 14 and 11 students, respectively. If all students were identical at the beginning of the course, we would only need to compare the final performance (AftS). However, if one teacher happens to get good students to start with, those students will tend to have high grades at the end even if the teacher is not good. Performance at the beginning (PreS) should therefore be taken into account.

Graphic ANCOVA
[Figure, two panels: AftS plotted against PreS.]

ANCOVA in R

    md <- read.table("AncovaEx2TeachingEval.txt", header = TRUE)
    xRange <- range(md$PreS)
    yRange <- range(md$AftS)
    # Scatter plot of all students: JAY in black, PAT in red, ROBIN in blue
    plotScores <- function() {
      with(md, {
        plot(PreS[Teacher == "JAY"], AftS[Teacher == "JAY"], xlab = "PreS", ylab = "AftS",
             xlim = xRange, ylim = yRange, pch = 16)
        points(PreS[Teacher == "PAT"], AftS[Teacher == "PAT"], col = "red", pch = 16)
        points(PreS[Teacher == "ROBIN"], AftS[Teacher == "ROBIN"], col = "blue", pch = 16)
      })
    }
    plotScores()
    # Will ANOVA reveal the difference between the three teachers?
    fitANOVA <- aov(PreS ~ Teacher, data = md)
    anova(fitANOVA)
    # No significant difference in PreS, so the students appear to be similar
    # at the beginning. Given students of the same quality to begin with, which
    # teacher will produce high-performing students at the end?
    fitANOVA <- aov(AftS ~ Teacher, data = md)
    anova(fitANOVA)
    # Check the plot for slope heterogeneity, then test it explicitly
    fit <- lm(AftS ~ PreS * Teacher, data = md)
    anova(fit)
    # Check for significance: if not significant, then do ANCOVA
    fitANCOVA <- lm(AftS ~ PreS + Teacher, data = md)
    anova(fitANCOVA)

Review a few essential R functions

    # subset(): one data frame per teacher
    nd1 <- subset(md, Teacher == "JAY")
    nd2 <- subset(md, Teacher == "PAT")
    nd3 <- subset(md, Teacher == "ROBIN")
    # order(): sort each data frame by PreS so that lines() draws from left to right
    nd1 <- nd1[order(nd1$PreS), ]
    nd2 <- nd2[order(nd2$PreS), ]
    nd3 <- nd3[order(nd3$PreS), ]
    # predict(): column 1 is the fitted value, columns 2 and 3 the confidence limits
    y1 <- predict(fitANCOVA, nd1, interval = "confidence")
    y2 <- predict(fitANCOVA, nd2, interval = "confidence")
    y3 <- predict(fitANCOVA, nd3, interval = "confidence")
    # Three plots in one row
    par(mfrow = c(1, 3))
    plotScores()
    lines(nd1$PreS, y1[, 1], col = "black")
    lines(nd1$PreS, y1[, 2], col = "black")
    lines(nd1$PreS, y1[, 3], col = "black")
    # Re-issue the plot statement before calling lines
    plotScores()
    lines(nd2$PreS, y2[, 1], col = "red")
    lines(nd2$PreS, y2[, 2], col = "red")
    lines(nd2$PreS, y2[, 3], col = "red")
    plotScores()
    lines(nd3$PreS, y3[, 1], col = "blue")
    lines(nd3$PreS, y3[, 2], col = "blue")
    lines(nd3$PreS, y3[, 3], col = "blue")
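For readers unfamiliar with predict(), the following sketch shows what it returns when a confidence interval is requested. To stay self-contained it uses the small drug data set from earlier in these slides rather than the teaching-evaluation file, with the rows deliberately shuffled; the object names are illustrative.

```r
# Drug data from the earlier example, typed in so the block runs on its own
md <- data.frame(
  Drug   = factor(rep(c("a", "c"), each = 4)),
  PreS   = c(4, 2, 1, 3, 3, 1, 4, 2),          # deliberately unsorted
  AfterS = c(5, 3.1, 2, 4.1, 3.1, 1, 4, 1.9)
)
fit <- lm(AfterS ~ PreS + Drug, data = md)

# subset() keeps one group; order() sorts its rows by the covariate
ndA <- subset(md, Drug == "a")
ndA <- ndA[order(ndA$PreS), ]

# predict() returns a matrix with columns fit, lwr and upr
# (fitted value, lower and upper confidence limits; the default level is 0.95)
yA <- predict(fit, newdata = ndA, interval = "confidence")
print(colnames(yA))
print(cbind(PreS = ndA$PreS, yA))
```

Sorting matters because lines() joins the points in the order in which they are supplied; with unsorted PreS values the band would zigzag across the plot.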

Separate 95% CI plots
[Figure, three panels in one row: AftS plotted against PreS, each panel showing the fitted line and 95% confidence band for one teacher.]
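To see what comparing teachers "given the same PreS" means in practice, the sketch below types in the teaching-evaluation data from the table earlier in these slides and computes raw and adjusted means. Evaluating every teacher at the overall mean of PreS is one common convention and an assumption here, as are the names chosen for the objects.

```r
# Teaching-evaluation data typed in from the slide (31 students)
md <- data.frame(
  Teacher = factor(rep(c("JAY", "PAT", "ROBIN"), times = c(6, 14, 11))),
  PreS = c(69, 69, 71, 78, 79, 73,
           71, 70, 56, 77, 69, 68, 75, 78, 68, 63, 72, 63, 71, 72,
           72, 64, 74, 72, 82, 69, 76, 68, 78, 70, 60),
  AftS = c(75, 70, 73, 82, 81, 75,
           73, 73, 59, 83, 70, 74, 80, 85, 68, 68, 74, 66, 76, 78,
           79, 65, 74, 75, 84, 68, 76, 65, 79, 71, 61)
)
# Use ROBIN as the reference level so that coefficients are differences from ROBIN
md$Teacher <- relevel(md$Teacher, ref = "ROBIN")

# Raw means ignore differences in PreS among the classes
rawMeans <- tapply(md$AftS, md$Teacher, mean)

# ANCOVA with a common slope
fitANCOVA <- lm(AftS ~ PreS + Teacher, data = md)

# Adjusted means: predicted AftS for each teacher at the overall mean PreS
nd <- data.frame(Teacher = factor(levels(md$Teacher), levels = levels(md$Teacher)),
                 PreS = mean(md$PreS))
adjMeans <- setNames(predict(fitANCOVA, newdata = nd), levels(md$Teacher))

print(round(rbind(raw = rawMeans, adjusted = adjMeans), 2))
print(coef(fitANCOVA))
```

Because the slope is common to all teachers, the difference between two adjusted means does not depend on the PreS value chosen; it equals the corresponding Teacher coefficient (TeacherJAY or TeacherPAT, each measured against ROBIN).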

Conclusions
- The null hypothesis of the same slope among the three teachers is not rejected (p = 0.7787). This significance test, as well as the plot, suggests that the ANCOVA assumption of equal slopes is not violated.
- AftS increases highly significantly (p < 0.0001) with PreS, suggesting that PreS is appropriate as a covariate.
- The intercepts of the lines are different (F = 5.33, p = 0.0112). This means that AftS values differ among teachers, given the same PreS value.
- Given the same PreS, Jay's mean AftS is 1.58 points greater than Robin's mean AftS, and Pat's mean AftS is 2.94 points greater than Robin's mean AftS.

An Experiment
- It is known that beetles lose weight when starved, and different species of beetles seem to differ in weight loss.
- The biologist wanted to compare the weight loss upon starvation between two beetle species.
- Species 1 moves rapidly in search of food when starved, but Species 2 immediately reduces movement upon starvation.
- Question: Will Species 2 lose less weight than Species 1 upon starvation (because it reduces movement and therefore conserves energy)?

An Experiment (continued)
- Nine batches of beetles from each species were starved for a week, and the weight loss was measured.
- During the experiment, the humidity changed and was beyond the control of the biologist. However, he did record the humidity for each individual starvation experiment.

Assumptions
- ANCOVA shares all the assumptions of ANOVA.
- The dependent variable and the covariate should meet the assumptions for regression.
- The covariate should span roughly the same range of values in all treatment combinations.
- The effect of the covariate on the dependent variable should not depend on the classification variable; i.e., if Y is the dependent variable, Z is the classification variable and X is the covariate, then the term X*Z should have a coefficient of zero in the model, so that there is little interaction between X and Z.

Assignment
1. Check whether the data set is suitable for ANCOVA, using both a plot and a significance test to draw your conclusions.
2. If suitable, then do the ANCOVA significance test and draw your conclusions.
3. Make a plot with two figures, one with the 95% confidence interval for Sp1 and the other for Sp2.

Species  Humidity  Wtloss
Sp1      0         8.98
Sp1      12        8.14
Sp1      29.5      6.67
Sp1      43        6.08
Sp1      53        5.9
Sp1      62.5      5.83
Sp1      75.5      4.68
Sp1      85        4.2
Sp1      93        3.72
Sp2      0         8.08
Sp2      12        7.14
Sp2      29.5      6.07
Sp2      43        5.08
Sp2      53        5
Sp2      62.5      4.83
Sp2      75.5      3.68
Sp2      85        3.2
Sp2      93        3.02
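Before any analysis, the requirement that the covariate spans roughly the same range in every group can be checked directly. The sketch below only enters the beetle data and compares the humidity ranges of the two species; it does not carry out the assignment, and the object names are illustrative.

```r
# Beetle data typed in from the assignment table
humidity <- c(0, 12, 29.5, 43, 53, 62.5, 75.5, 85, 93)
md <- data.frame(
  Species  = factor(rep(c("Sp1", "Sp2"), each = 9)),
  Humidity = rep(humidity, times = 2),
  Wtloss   = c(8.98, 8.14, 6.67, 6.08, 5.90, 5.83, 4.68, 4.20, 3.72,
               8.08, 7.14, 6.07, 5.08, 5.00, 4.83, 3.68, 3.20, 3.02)
)

# Range of the covariate within each species (one column per species)
humidityRange <- sapply(split(md$Humidity, md$Species), range)
rownames(humidityRange) <- c("min", "max")
print(humidityRange)

# Number of batches per species
print(table(md$Species))
```

Entering the data with data.frame() avoids depending on an external file; the same data frame can then be passed to plot(), lm() and predict() in the way shown for the teaching-evaluation example.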