Institute of Education Sciences (IES) 25th Annual Management Information Systems Conference (Feb. 15-17, 2012)

Useful and Fair Accountability Data in California Schools

Dennis Hocevar, University of Southern California, Rossier School of Education
Aime Black, University of Southern California, Rossier School of Education
Kamella Tate, Music Center: Performing Arts Center of Los Angeles County

First Assertion
Comparing school averages makes no sense unless the students in the two schools have taken the same tests. In California, valid comparisons are impossible after grade six in both Math and Science.

Second Assertion
Accountability begins at the grade or course level. The assumption that schools can be evaluated without first taking into account grade level or course level (e.g., Algebra I) differences is unwarranted and unneeded.

Third Assertion
A fully functional school accountability system only requires three simple statistics:
1) A raw score index of success to communicate results.
2) A standardized norm-referenced index to make within-school diagnostic comparisons.
3) A residualized index to make between-school accountability comparisons.

Presentation Outline
Part 1: Communicating Results: Grade Level and Course Level Success Scores
Part 2: Diagnosing Within-School Strengths and Weaknesses: Grade Level Equivalent and Course Level Equivalent Scores
Part 3: Fair Between-School Comparisons for Accountability Purposes: Adjusted Grade Level Equivalent and Adjusted Course Level Equivalent Scores

Part 1: Communicating Results
Grade Level Success and Course Level Success Scores

Limitations: California's API and NCLB's AYP
California's Academic Performance Index (API) is too complex a measurement to adequately communicate school progress. What does an increase of 10 API points mean?
NCLB's Adequate Yearly Progress is better at the elementary school level, but for Math/Science courses at the middle and high school level, students take different tests. Comparing schools using different tests is impossible.

A Proposed Alternative I: GLS Scores and CLS Scores

Grade Level Success (GLS) scores are the raw percentage of students in a given grade level who score Basic or above on the California Standards-based Tests (CST).
Course Level Success (CLS) scores are the estimated percentage of test-takers who score Basic or above in each subject area on the California Standards-based Tests (CST) that is tested at multiple grade levels.

Grade Level Success Scores (GLS)
Grade Level Success (GLS) scores are similar to the AYP (percentage proficient), except:
GLS scores are based on a count of students who score basic and above rather than proficient and above.
GLS scores are computed in ELA, Math, Science and History rather than just Math and ELA.
GLS scores are computed only when all students take the same test in the same grade.

Grade Level Success Scores by Grade Level
Grade   ELA   Math
2       *     *
3       *     *
4       *     *
5       *     *
6       *     *
7       *     *
8       *
9       *
10      *
11      *
Science and History: five grade-level cells are marked on the slide; the grades they belong to are not recoverable from the slide text.

Utility: Grade Level Success (GLS) Scores
The intended use of GLS scores is to communicate results to the public. An application is shown on the next slide.

LAUSD ELA Success Rates, 2003 and 2011: White 5th Graders Compared to ELL/RFEP 5th Graders
[Chart: success rate (0 to 1) by year (2003, 2011); series: ELL/RFEP, White]

LAUSD Math Success Rates, 2003 and 2011: White 5th Graders Compared to ELL/RFEP 5th Graders
[Chart: success rate (0 to 1) by year (2003, 2011); series: ELL/RFEP, White]

Interpretation of the Prior Slides
LAUSD's 5th grade ELL/RFEP English Language Arts success rates have increased by 29%. The ELA gap between white students and ELL/RFEP students has been reduced by 44%.
LAUSD's 5th grade ELL/RFEP Math success rates have increased by 26%. The Math gap between white students and ELL/RFEP students has been reduced by 46%.
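The slides do not spell out the arithmetic behind these figures, so the following is only a minimal sketch of one plausible reading: a success rate is the share of tested students scoring Basic or above, and a gap reduction is the shrinkage of the between-group difference relative to its starting size. The function names and all numbers are hypothetical; they are not the LAUSD values behind the charts.

```python
def success_rate(n_basic_or_above, n_tested):
    """Share of tested students scoring Basic or above (a GLS-style rate)."""
    return n_basic_or_above / n_tested


def point_change(rate_start, rate_end):
    """Change in a success rate between two years, in proportion points."""
    return rate_end - rate_start


def gap_reduction(ref_start, group_start, ref_end, group_end):
    """Fraction by which the gap between a reference group and a subgroup shrank.

    Assumption: the gap is the simple difference between the two success rates.
    """
    gap_start = ref_start - group_start
    gap_end = ref_end - group_end
    return (gap_start - gap_end) / gap_start


# Hypothetical rates: subgroup rises from 0.40 to 0.70, reference group from 0.80 to 0.90
print(point_change(0.40, 0.70))
print(gap_reduction(0.80, 0.40, 0.90, 0.70))
```

Reporting the subgroup change and the gap change separately keeps an improvement in one group from being hidden by what happens in the other.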

Course Level Success (CLS) Scores
Course Level Success scores are similar to the NCLB AYP (percentage proficient), except:
CLS scores are based on basic and above rather than proficient and above.
CLS scores are computed in ELA, Math, Science and History rather than just Math and ELA.
CLS scores are computed only when the same test is given at multiple grade levels.

Course Level Success Scores
Algebra I, Geometry, Algebra II, Biology, Chemistry, Physics, World History

Utility: Course Level Success (CLS) Scores
The intended use of CLS scores is to communicate results to the public when tests are taken at different grade levels. An application is shown on the next slide.

Torrance Unified School District Algebra II Success Rates
[Chart: success rate (0 to 0.7) by year (2003, 2011); series: SED, NSED]

Interpretation of the Prior Slide
Torrance Unified School District's (TUSD) initiation of Algebra for All in 2005 has increased Algebra II success by 12% in the Socio-economically Disadvantaged (SED) subgroup and by 18% in the non-SED (NSED) subgroup. Both subgroups improved, but the NSED gain was larger, so a gap measure alone would not show the SED improvement. This example illustrates why the focus of accountability indices has to be on subgroup improvement rather than the gap.

Part 2: Diagnosing Within-School Strengths and Weaknesses
Grade Level Equivalent and Course Level Equivalent Scores

Limitation of GLS and CLS Scores
Grade Level Success (GLS) scores cannot be used to make within-school comparisons because California's CSTs become increasingly difficult as students get older. That is, as the standards get more rigorous, the tests get more rigorous.
Course Level Success (CLS) scores cannot be used to make within-school comparisons because distinct subject matter tests cannot be equated for difficulty.

Limitations: California's API and NCLB's AYP
California's Academic Performance Index does not allow for within-grade comparisons because it is not computed at the grade level.
NCLB's Adequate Yearly Progress (proficiency rates) does not allow for grade level comparisons because standards become increasingly rigorous and students take different tests at different grade levels, beginning in grade seven.

A Proposed Alternative II: GLE Scores and CLE Scores
Grade Level Equivalent (GLE) scores are average scores on a grade level CST (e.g., 3rd grade math) that have been standardized (z-scores) at the district or state level.
Course Level Equivalent (CLE) scores are average scores on a subject matter CST (e.g., Algebra II) that have been standardized at the district or state level.

Computation of GLE and CLE Scores
The computation of GLE and CLE scores is a three-step process:
1. Convert raw scores to z-scores.
2. Convert z-scores to percentiles.
3. Convert the percentiles to normal curve equivalents (NCE scores).

Utility: GLE and CLE Scores
The intended use of GLE and CLE scores is to diagnose strengths and weaknesses in a school or school district and to compare a school to district or state norms. Hypothetical applications are shown in the next two slides.
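The three-step computation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes percentiles are read off the standard normal curve, uses the conventional NCE scaling of 50 + 21.06 times the normal deviate, and treats the schools passed in as the whole norm group. The function names and the sample averages are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()  # standard normal curve used for the percentile lookup


def to_nce(z):
    """Convert a z-score to a normal curve equivalent via its percentile."""
    percentile = STD_NORMAL.cdf(z) * 100           # step 2: z-score -> percentile
    percentile = min(max(percentile, 1.0), 99.0)   # NCEs are reported on a 1-99 range
    return 50 + 21.06 * STD_NORMAL.inv_cdf(percentile / 100)  # step 3: percentile -> NCE


def equivalent_scores(school_means):
    """Return one NCE per school for a single grade-level or course-level test."""
    mu = mean(school_means)
    sigma = pstdev(school_means)  # assumes these schools make up the norm group
    # Step 1 (raw score -> z-score) feeds steps 2 and 3 inside to_nce
    return [to_nce((score - mu) / sigma) for score in school_means]


# Hypothetical school averages on one test (not real CST data)
print([round(nce, 1) for nce in equivalent_scores([300, 350, 400])])
```

If percentiles were instead taken from empirical ranks, step 2 would change and the resulting NCEs could differ from this normal-curve version.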

District Normed Grade Level Equivalent Diagnostic Profile
Grade   ELA   Math
2       .45   .50
3       .30   .54
4       .25   .52
5       .20   .50
Science/History: .80 (a single value; its column and grade are not recoverable from the slide text)

District Normed Course Level Equivalent Diagnostic Profile
Algebra I    .77
Geometry     .65
Algebra II   .70
Biology, Chemistry, Physics, World History: the values .38, .36, .18 and .40 appear on the slide, but their pairing with these courses is not recoverable from the slide text.

Part 3: Fair Between-School Comparisons
Adjusted Grade Level and Adjusted Course Level Equivalent Scores

Fairness
In Millman's (1997) seminal work on school and teacher accountability, Grading Teachers, Grading Schools: Is Student Achievement a Valid Evaluation Measure?, he writes:
"The single most frequent criticism of any attempt to determine a teacher's effectiveness by measuring student learning is that factors beyond a teacher's control affect the amounts that students learn. Educators want a level playing field and do not believe such a thing is possible. Many people would rather have their fortunes determined by a roulette wheel, which is invalid but fair, than by an evaluation system that is not fair" (Millman, p. 244).

Limitation of Unadjusted GLE and CLE Scores
Grade Level Equivalent (GLE) and Course Level Equivalent (CLE) scores cannot be used to make between-school comparisons because they are highly correlated with school characteristics that are beyond a school's control. Specifically, schools in wealthy areas consistently outperform schools in poor areas.

California's Similar Schools Index
California's Similar Schools Index is a 1-10 tiered score that adjusts for 16 factors that are known to correlate with school test scores. The main shortcoming of this index is that it is not computed at the grade level, and thus, grade level effects are ignored and confounded with school effects.

A Proposed Alternative III: AGLE Scores and ACLE Scores
Adjusted Grade Level Equivalent (AGLE) scores are average scores on a grade level CST for which the California School Characteristics Index (SCI) is statistically held constant.
Adjusted Course Level Equivalent (ACLE) scores are average scores on a subject matter CST (e.g., Algebra II) for which the California School Characteristics Index (SCI) is statistically held constant.

Computation: AGLE and ACLE Scores
The computation of AGLE and ACLE scores is a three-step process:
1. Regress test scores on the SCI.
2. Convert the standardized residuals from the regression to percentiles.
3. Convert the percentiles to Normal Curve Equivalents.

Equations: AGLE and ACLE Scores
1. $Y' = BX$, where $Y'$ is the standardized (z-score) predicted achievement, $X$ is the standardized CA School Characteristics Index (SCI), and $B$ is the standardized regression weight.
2. Standardized residual $= Y - Y'$, where $Y$ is actual achievement and $Y'$ is predicted (expected) achievement based on the SCI.
3. Using computer algorithms, convert the standardized residuals to percentiles and then convert the percentiles to Normal Curve Equivalents.
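The adjustment can be sketched in a few lines. This is an illustrative reading of the three equations, not the authors' code: it assumes a single predictor (the SCI), population standard deviations, percentiles taken from the standard normal curve, and the conventional NCE scaling of 50 + 21.06 times the normal deviate. The function names and sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def to_nce(z):
    """z-score -> percentile (normal curve, kept within 1-99) -> NCE."""
    percentile = min(max(STD_NORMAL.cdf(z) * 100, 1.0), 99.0)
    return 50 + 21.06 * STD_NORMAL.inv_cdf(percentile / 100)


def adjusted_equivalent_scores(achievement, sci):
    """Return one adjusted NCE per school, holding the SCI statistically constant."""
    y = standardize(achievement)   # Y: actual achievement in z-score metric
    x = standardize(sci)           # X: standardized SCI
    # With both variables standardized, the regression weight B equals Pearson's r
    b = sum(xi * yi for xi, yi in zip(x, y)) / len(y)
    residuals = [yi - b * xi for xi, yi in zip(x, y)]  # Y - Y', with Y' = B * X
    return [to_nce(z) for z in standardize(residuals)]


# Hypothetical school-level data (not real CST or SCI values)
scores = adjusted_equivalent_scores([10, 30, 20, 40], [1, 2, 3, 4])
print([round(s, 1) for s in scores])
```

A school above 50 on this scale performed better than its SCI predicts, whatever its unadjusted standing. The sketch breaks down if achievement is perfectly predicted by the SCI, because the residuals then have no spread to standardize.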

Graphic Display of Residuals
[Figure: graphic display of residuals]

Fairness: AGLE and ACLE Scores
1. The intended use of AGLE and ACLE scores is to compare grade level or course level performance to district or state norms in a fair manner by controlling for school characteristics.
2. Both Value-Added and AGLE/ACLE scores are residuals.

Utility and Fairness: AGLE and ACLE Scores versus VAM Scores
The intended use of AGLE and ACLE scores is to compare grade levels or course levels to district or state norms in a fair manner by controlling for school characteristics. Value-added scores have the same intended use.

One of the algorithms developed by VARC for the Teacher Data Reports project, NYC DOE.

Additional definitions developed by VARC for the Teacher Data Reports project, NYC DOE.

Formulae: A Proposed Alternative to Value-Added Modeling (VAM)
1. Course Level Success (CLS). Using Algebra I scores in the 8th and 9th grade as an example, the formula for school level Algebra I success is:
$\text{Success}_{\text{alg1}} = \dfrac{\text{\# of students scoring basic and above in Algebra I}}{\text{\# of first-time students taking Algebra I}}$

2. Course Level Equivalent (CLE). Continuing with the Algebra I example, course level success scores are standardized using the z-score formula:
$\text{Standardized Success Score} = \dfrac{\text{Success}_{\text{alg1}} - \text{mean district success}}{\text{district standard deviation}}$
The standardized success scores are then converted by computer to percentiles and then to normal curve equivalents. The end result is a course level equivalent.

Formulae Continued
3. Adjusted Course Level Equivalents. Continuing with the Algebra I example, the formula for the regression of school or district level Algebra I success scores on the standardized CA School Characteristics Index (SCI) is:
$Y' = B_1(\text{SCI})$
And the formula for the actual minus expected 8th and 9th grade Algebra I performance is:
$SR_{\text{diff}} = Y - Y'$
where
$Y$ = actual 8th and 9th grade Algebra I success in a standardized (z-score) metric.
$Y'$ = expected 8th and 9th grade CST Algebra I performance in a standardized metric.
$B_1$ = standardized regression weight for the regression of actual Algebra I performance on the SCI (i.e., the Pearson product-moment correlation between SCI and Algebra I achievement scores).
$SR_{\text{diff}}$ = standardized residualized difference score (actual minus expected 8th and 9th grade Algebra I performance).

Conclusions I: Six Needed Components of an Accountability System
1. Success Scores at the Grade and Course Level.
2. Normal Curve Equivalents at the Grade and Course Level.
3. Adjusted Normal Curve Equivalents at the Grade and Course Level.

Conclusions II
Further research is needed to determine if Value-Added Modeling (VAM) is useful in terms of cost, utility and fairness at the grade or school level.
