Chapter 5 Stratified Random Sampling

- Advantages of stratified random sampling
- How to select a stratified random sample
- Estimating the population mean and total
- Determining the sample size and allocation
- Estimating a population proportion; sample size and allocation
- Optimal rule for choosing strata

Stratified Random Sampling

The ultimate function of stratification is to organize the population into homogeneous subsets (strata) and to select an SRS of the appropriate size from each stratum.

Warmup

You are doing a project to study grade inflation in the STEM disciplines. A quick internet search shows that average GPA differs by STEM major. In addition, overall GPA differs by school, so you don't want to limit your sample to convenient schools.

Major                Avg GPA   Public school      Avg GPA   Private school   Avg GPA
Education            3.36      UVA                3.32      Harvard          3.65
Foreign Languages    3.34      UNC                3.23      Brown            3.63
Physics              3.15      South Carolina     3.18      Stanford         3.57
Biology              3.02      Virginia Tech      3.15      Duke             3.51
Engineering          2.90      NCSU               3.11      Princeton        3.39
Math                 2.90      UNC-A              3.11      MIT              3.39
Computer Science     2.83      Appalachian State  3.10      Johns Hopkins    3.38
Chemistry            2.78      UNC-G              2.95      Wake Forest      3.36

Warmup (cont.)

Data from the National Center for Education Statistics give the following percentages of majors among STEM disciplines:

Computer Science     24%
Engineering          21%
Bio/Life Sciences    17%
Math/Stat            15%
Technology           12%
Chemistry             7%
Physics               4%

Approximately 15 million students are in public colleges and 5 million are in private colleges. Suppose you would like to use a sample size of n = 1,000 students.

Stratified Random Sampling

Stratified random sampling is an often-used option because:
- it may produce a smaller bound on the error of estimation (BOE) than an SRS of the same size;
- the cost per observation may be reduced;
- it gives estimates of population parameters for subgroups (strata).

It is useful when the population is heterogeneous and it is possible to establish strata that are reasonably homogeneous within each stratum.

Improved Sampling Designs with Auxiliary Information
- Chapter 5 Stratified Random Sampling
- Chapter 6 Ratio and Regression Estimators

Stratified Random Sampling: Notation

Data:
from stratum 1: $y_{1,1}, y_{1,2}, y_{1,3}, \ldots, y_{1,n_1}$
from stratum 2: $y_{2,1}, y_{2,2}, y_{2,3}, \ldots, y_{2,n_2}$
⋮
from stratum L: $y_{L,1}, y_{L,2}, y_{L,3}, \ldots, y_{L,n_L}$

$\bar y_i = \frac{1}{n_i}\sum_{k=1}^{n_i} y_{i,k}$: sample mean of the data from stratum $i$, $i = 1, \ldots, L$
$n_i$ ($N_i$): sample (population) size for stratum $i$; $N = N_1 + N_2 + \cdots + N_L$
$s_i^2$: sample variance of the data from stratum $i$
$\mu_i = \frac{1}{N_i}\sum_{k=1}^{N_i} Y_{i,k}$: population mean of stratum $i$
$\tau_i = \sum_{k=1}^{N_i} Y_{i,k}$: population total of stratum $i$
$\tau = \tau_1 + \tau_2 + \cdots + \tau_L$: population total

Warmup (cont.): Selecting the stratified random sample

Recall that approximately 15 million students are in public colleges and 5 million are in private colleges, so 15/20 = 75% of students are public and 25% are private. With n = 1,000, each major receives its percentage of the sample, and that share is split 75%/25% between the major's public and private strata (rounded to whole students so that each major's total is preserved). This gives L = 14 strata:

Major (%)          Share of 1,000   Public (75%)                   Private (25%)
CS (24%)           240              n1 = .75(240) = 180            n2 = .25(240) = 60
Eng (21%)          210              n3 = .75(210) = 157.5 ≈ 158    n4 = .25(210) = 52.5 ≈ 52
Bio (17%)          170              n5 = .75(170) = 127.5 ≈ 128    n6 = .25(170) = 42.5 ≈ 42
Math/Stat (15%)    150              n7 = .75(150) = 112.5 ≈ 113    n8 = .25(150) = 37.5 ≈ 37
Tech (12%)         120              n9 = .75(120) = 90             n10 = .25(120) = 30
Chem (7%)          70               n11 = .75(70) = 52.5 ≈ 53      n12 = .25(70) = 17.5 ≈ 17
Physics (4%)       40               n13 = .75(40) = 30             n14 = .25(40) = 10

The stratum sample sizes sum to n1 + n2 + ⋯ + n14 = 1,000.

Stratified Random Sampling

Select an SRS within each stratum, so that

$$E(\bar y_i) = \mu_i, \qquad \hat\tau_i = N_i\bar y_i, \qquad E(\hat\tau_i) = E(N_i\bar y_i) = N_i E(\bar y_i) = N_i\mu_i = \tau_i.$$
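The warmup allocation can be reproduced with a short Python sketch. The rounding rule used here (round the public stratum half up, give the private stratum the remainder) is an assumption inferred from the table because it matches every row; any rule works as long as each major's strata add back to its share of n.

```python
import math
from fractions import Fraction

# Percentages of STEM majors quoted from the NCES data in the warmup.
major_pct = {"CS": 24, "Eng": 21, "Bio": 17, "Math/Stat": 15,
             "Tech": 12, "Chem": 7, "Physics": 4}
public_frac = Fraction(15, 15 + 5)  # 15 million public vs 5 million private students
n = 1000

allocation = {}
for major, pct in major_pct.items():
    n_major = n * pct // 100  # the major's share of n (exact for these percentages)
    # Round the public stratum half up; the private stratum takes the remainder,
    # so each major keeps its full share of the sample.
    n_public = math.floor(public_frac * n_major + Fraction(1, 2))
    allocation[major] = (n_public, n_major - n_public)

for major, (pub, priv) in allocation.items():
    print(f"{major:<10} public={pub:>4} private={priv:>4}")
print("total:", sum(pub + priv for pub, priv in allocation.values()))
```

Using `Fraction` keeps the 0.75 split exact, so the half-way cases such as 157.5 are rounded deterministically.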

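Estimate the population total by summing the estimates of the $\tau_i$:

$$\hat\tau = \hat\tau_1 + \hat\tau_2 + \cdots + \hat\tau_L, \qquad \bar y_{st} = \frac{\hat\tau}{N}.$$

Stratified Random Sampling: Estimate of Mean

$$\bar y_{st} = \frac{1}{N}\left[N_1\bar y_1 + N_2\bar y_2 + \cdots + N_L\bar y_L\right] = \frac{1}{N}\sum_{i=1}^{L} N_i\bar y_i$$

$$\hat V(\bar y_{st}) = \frac{1}{N^2}\left[N_1^2\hat V(\bar y_1) + N_2^2\hat V(\bar y_2) + \cdots + N_L^2\hat V(\bar y_L)\right] = \frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]$$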
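A minimal Python sketch of these two formulas, assuming each stratum's sample is a list of numbers (at least two per stratum) and the stratum sizes $N_i$ are known. The function name `stratified_mean` and the toy data are hypothetical; `statistics.variance` uses the $n_i - 1$ divisor, which matches $s_i^2$.

```python
from statistics import mean, variance


def stratified_mean(samples, pop_sizes):
    """Return (ybar_st, Vhat(ybar_st)) for a stratified random sample.

    samples[i]: observations from stratum i (an SRS with at least 2 values)
    pop_sizes[i]: N_i, the population size of stratum i
    """
    N = sum(pop_sizes)
    # ybar_st = (1/N) * sum of N_i * ybar_i
    ybar_st = sum(N_i * mean(y) for y, N_i in zip(samples, pop_sizes)) / N
    # Vhat = (1/N^2) * sum of N_i^2 (1 - n_i/N_i) s_i^2 / n_i
    v_hat = sum(
        N_i**2 * (1 - len(y) / N_i) * variance(y) / len(y)
        for y, N_i in zip(samples, pop_sizes)
    ) / N**2
    return ybar_st, v_hat


# Hypothetical toy data: two strata with N_1 = 30 and N_2 = 20.
ybar_st, v_hat = stratified_mean([[2, 4, 6], [10, 12]], [30, 20])
print(ybar_st, v_hat)
```

Each stratum contributes with weight $N_i/N$, so a small but large-population stratum can dominate the estimate even when few of its units are sampled.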

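Stratified Random Sampling: Estimate of Mean μ, BOE

$$\hat V(\bar y_{st}) = \frac{1}{N^2}\left[N_1^2\hat V(\bar y_1) + N_2^2\hat V(\bar y_2) + \cdots + N_L^2\hat V(\bar y_L)\right] = \frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]$$

$$\text{BOE} = 1.96\sqrt{\frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]}$$

This is the 95% bound; 1.96 is often rounded to 2, giving $\text{BOE} \approx 2\sqrt{\hat V(\bar y_{st})}$.

Stratified Random Sampling: Estimate of Population Total

$$\hat\tau = N\bar y_{st} = N_1\bar y_1 + N_2\bar y_2 + \cdots + N_L\bar y_L = \sum_{i=1}^{L} N_i\bar y_i$$

$$\hat V(\hat\tau) = \hat V(N\bar y_{st}) = N^2\hat V(\bar y_{st}) = N^2\cdot\frac{1}{N^2}\left[N_1^2\hat V(\bar y_1) + \cdots + N_L^2\hat V(\bar y_L)\right] = N_1^2\hat V(\bar y_1) + N_2^2\hat V(\bar y_2) + \cdots + N_L^2\hat V(\bar y_L)$$

$$= N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}$$

Stratified Random Sampling: BOE for Mean and Total, t Distribution

When the stratum sample sizes are small, use the t distribution with Satterthwaite's approximate degrees of freedom:

$$df = \frac{\left(\sum_{k=1}^{L} a_k s_k^2\right)^2}{\sum_{k=1}^{L} \dfrac{(a_k s_k^2)^2}{n_k - 1}}, \qquad \text{where } a_k = \frac{N_k(N_k - n_k)}{n_k}.$$

BOE for μ:
$$t_{df}\sqrt{\frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}\right]}$$

BOE for τ:
$$t_{df}\sqrt{N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2} + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\frac{s_L^2}{n_L}}$$

Here $t_{df}$ is the t critical value with df degrees of freedom (the upper 0.025 point for 95% confidence). Note that $\sum_k a_k s_k^2 = N^2\hat V(\bar y_{st})$, since $a_k s_k^2 = N_k^2(1 - n_k/N_k)s_k^2/n_k$.

Degrees of Freedom (worksheet cont.)

Stratified random sample summary, with $a_k = N_k(N_k - n_k)/n_k$:

$n_1 = 20,\ n_2 = 8,\ n_3 = 12$; $N_1 = 155,\ N_2 = 62,\ N_3 = 93$; $a_1 = 1046.25,\ a_2 = 418.5,\ a_3 = 627.75$; stratum standard deviations $s_1 = 5.95,\ s_2 = 15.25,\ s_3 = 9.36$.

$$df = \frac{\left(1046.25(5.95)^2 + 418.5(15.25)^2 + 627.75(9.36)^2\right)^2}{\dfrac{\left(1046.25(5.95)^2\right)^2}{19} + \dfrac{\left(418.5(15.25)^2\right)^2}{7} + \dfrac{\left(627.75(9.36)^2\right)^2}{11}} = 21.09; \qquad t_{21.09} = 2.08 \text{ (see Excel worksheet)}$$

Compare BOE in Stratified Random Sample and SRS (worksheet cont.)

Stratified random sample summary: $n = 40$, $\bar y_{st} = 27.7$, $\hat V(\bar y_{st}) = 1.97$.

If the observations were from an SRS: $s = 11.31$, and

$$\hat V(\bar y) = \left(1 - \frac{40}{310}\right)\frac{11.31^2}{40} = 2.79.$$

The stratified random sample has more precision: $1.97 < 2.79$.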
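The worksheet arithmetic can be checked with a short Python sketch (function names are hypothetical). It treats 5.95, 15.25 and 9.36 as the stratum standard deviations $s_k$, which is what reproduces df = 21.09 and $\hat V(\bar y_{st}) = 1.97$. The critical value 2.08 is copied from the worksheet rather than computed, because the Python standard library has no t quantile function; `scipy.stats.t.ppf(0.975, df)` would supply it.

```python
import math


def a_times_s2(N, n, s):
    """Per-stratum terms a_k * s_k^2, with a_k = N_k (N_k - n_k) / n_k."""
    return [Nk * (Nk - nk) / nk * sk**2 for Nk, nk, sk in zip(N, n, s)]


def satterthwaite_df(N, n, s):
    """Satterthwaite approximate degrees of freedom for a stratified sample."""
    terms = a_times_s2(N, n, s)
    return sum(terms) ** 2 / sum(t**2 / (nk - 1) for t, nk in zip(terms, n))


def vhat_stratified_mean(N, n, s):
    """Vhat(ybar_st) = (1/N^2) * sum of a_k s_k^2."""
    return sum(a_times_s2(N, n, s)) / sum(N) ** 2


def vhat_srs_mean(N, n, s):
    """Vhat(ybar) for an SRS of size n from a population of size N."""
    return (1 - n / N) * s**2 / n


# Worksheet values
N = [155, 62, 93]
n = [20, 8, 12]
s = [5.95, 15.25, 9.36]

df = satterthwaite_df(N, n, s)
v_st = vhat_stratified_mean(N, n, s)
v_srs = vhat_srs_mean(sum(N), sum(n), 11.31)
t_crit = 2.08  # from the worksheet for df = 21.09; not computed here
print(f"df = {df:.2f}, Vhat(st) = {v_st:.2f}, Vhat(SRS) = {v_srs:.2f}, "
      f"t-based BOE for mu = {t_crit * math.sqrt(v_st):.2f}")
```

The df need not be an integer; t tables are interpolated, or software accepts fractional df directly.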

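Approx. Sample Size to Estimate μ

Set the bound equal to B: $2\sqrt{V(\bar y_{st})} = B$, so $V(\bar y_{st}) = B^2/4$. Let $n_i = a_i n$, where $a_i$ is the proportion of the sample allocated to stratum $i$. Then

$$\frac{B^2}{4} = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - a_i n}{N_i}\right)\frac{\sigma_i^2}{a_i n},$$

and multiplying by $N^2$ and solving for $n$ gives

$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}, \qquad \text{where } D = \frac{B^2}{4}.$$

Here $\sigma_i^2$ is the variance of stratum $i$; in practice it is a guess from prior information.

Approx. Sample Size to Estimate τ

Now $2\sqrt{V(N\bar y_{st})} = B$, so $V(\bar y_{st}) = B^2/(4N^2)$. With $n_i = a_i n$ as before,

$$\frac{B^2}{4N^2} = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - a_i n}{N_i}\right)\frac{\sigma_i^2}{a_i n},$$

so

$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}, \qquad \text{where } D = \frac{B^2}{4N^2}.$$

Summary: Approx. Sample Size to Estimate μ, τ

$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}, \qquad D = \begin{cases} B^2/4 & \text{when estimating } \mu \\ B^2/(4N^2) & \text{when estimating } \tau \end{cases}$$

Example: Sample Size to Estimate μ (worksheet cont.)

Prior survey: $\sigma_1 \approx 5$, $\sigma_2 \approx 15$, $\sigma_3 \approx 10$. Estimate μ to within 2 hrs with 95% confidence; the allocation proportions are $a_1 = a_2 = a_3 = 1/3$.

$D = B^2/4 = 2^2/4 = 1$, so $N^2 D = 310^2(1) = 96{,}100$.

$$\sum_{i=1}^{3}\frac{N_i^2\sigma_i^2}{a_i} = \frac{155^2(25)}{1/3} + \frac{62^2(225)}{1/3} + \frac{93^2(100)}{1/3} = 6{,}991{,}275$$

$$\sum_{i=1}^{3} N_i\sigma_i^2 = 155(25) + 62(225) + 93(100) = 27{,}125$$

$$n = \frac{6{,}991{,}275}{96{,}100 + 27{,}125} = 56.7 \approx 57,$$

so $n_1 = n_2 = n_3 = \tfrac{1}{3}(57) = 19$.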
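The summary formula can be checked with a small Python function (the name `stratified_sample_size` is hypothetical). It takes the prior-survey guesses $\sigma_i$ as in the worksheet, switches $D$ between the mean and total cases, and rounds up with `math.ceil`, as the worksheet does.

```python
import math


def stratified_sample_size(N, sigma, a, B, target="mean"):
    """Approximate total sample size n so that the bound on the error is B.

    N: stratum population sizes N_i
    sigma: guessed stratum standard deviations sigma_i (e.g. from a prior survey)
    a: allocation proportions a_i = n_i / n (summing to 1)
    target: "mean" uses D = B^2/4; "total" uses D = B^2/(4 N^2)
    """
    N_tot = sum(N)
    D = B**2 / 4 if target == "mean" else B**2 / (4 * N_tot**2)
    numerator = sum(Ni**2 * si**2 / ai for Ni, si, ai in zip(N, sigma, a))
    denominator = N_tot**2 * D + sum(Ni * si**2 for Ni, si in zip(N, sigma))
    return numerator / denominator


N = [155, 62, 93]
sigma = [5, 15, 10]
a = [1 / 3, 1 / 3, 1 / 3]
n_mu = math.ceil(stratified_sample_size(N, sigma, a, B=2))                    # bound 2 hrs on mu
n_tau = math.ceil(stratified_sample_size(N, sigma, a, B=400, target="total"))  # bound 400 hrs on tau
print(n_mu, n_tau)
```

With B = 2 this gives 57, matching the example; with B = 400 for the total it gives 105. Rounding up rather than to the nearest integer keeps the achieved bound at or below B.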

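Example: Sample Size to Estimate τ (worksheet cont.)

$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}, \qquad D = \frac{B^2}{4N^2}$$

Prior survey: $\sigma_1 \approx 5$, $\sigma_2 \approx 15$, $\sigma_3 \approx 10$. Estimate τ to within 400 hrs with 95% confidence; the allocation proportions are $a_1 = a_2 = a_3 = 1/3$.

$D = \dfrac{B^2}{4N^2} = \dfrac{400^2}{4N^2} = \dfrac{160{,}000}{4N^2} = \dfrac{40{,}000}{N^2}$, so $N^2 D = 40{,}000$.

$$\sum_{i=1}^{3}\frac{N_i^2\sigma_i^2}{a_i} = \frac{155^2(25)}{1/3} + \frac{62^2(225)}{1/3} + \frac{93^2(100)}{1/3} = 6{,}991{,}275$$

$$\sum_{i=1}^{3} N_i\sigma_i^2 = 155(25) + 62(225) + 93(100) = 27{,}125$$

$$n = \frac{6{,}991{,}275}{40{,}000 + 27{,}125} = 104.2 \approx 105,$$

so $n_1 = n_2 = n_3 = \tfrac{1}{3}(105) = 35$.

5.5 Allocation of the Sample

Objective: obtain estimators with small variance at the lowest cost. The allocation is affected by 3 factors:
1. Total number of elements in each stratum
2. Variability in each stratum
3. Cost per observation in each stratum

5.5 Allocation of the Sample: Proportional Allocation

If you don't have variability and cost information for the strata, you can use proportional allocation. Sample size for stratum h:

$$n_h = n\,\frac{N_h}{N}$$

In general, this is not the optimal choice for the stratum sample sizes.

5.5 Optimal {min $V(\bar y_{st})$} allocation of the sample: same cost/obs in each stratum

$$V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{\sigma_i^2}{n_i}$$

Minimize $V(\bar y_{st})$ over $n_1, n_2, \ldots, n_L$ subject to $g(n_1, n_2, \ldots, n_L) = 0$, where $g(n_1, n_2, \ldots, n_L) = n_1 + n_2 + \cdots + n_L - n$.

Use Lagrange multipliers: $\partial V(\bar y_{st})/\partial n_i + \lambda\,\partial g/\partial n_i = 0$, $i = 1, \ldots, L$. Since $\partial V(\bar y_{st})/\partial n_i = -N_i^2\sigma_i^2/(N^2 n_i^2)$ and $\partial g/\partial n_i = 1$, this gives $n_i = N_i\sigma_i/(N\sqrt{\lambda})$, so $n_i$ is proportional to $N_i\sigma_i$, and the constraint $\sum_i n_i = n$ fixes the constant:

$$n_i = n\,\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}, \qquad i = 1, \ldots, L.$$

The allocation is directly proportional to stratum size and stratum variability. This method of choosing $n_1, n_2, \ldots, n_L$ is called Neyman allocation.

5.5 Optimal {min $V(\bar y_{st})$} allocation of the sample: same cost/obs in each stratum (cont.)

Substituting $a_i = n_i/n = N_i\sigma_i/\sum_{k} N_k\sigma_k$ into $n = \dfrac{\sum_i N_i^2\sigma_i^2/a_i}{N^2 D + \sum_i N_i\sigma_i^2}$ gives

$$n = \frac{\left(\sum_{i=1}^{L} N_i\sigma_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}, \qquad D = \frac{B^2}{4} \ \text{(or } D = B^2/(4N^2) \text{ when estimating } \tau\text{)}.$$

Worksheet 12 (optimal allocation, same cost/obs in each stratum)

5.5 Optimal {min $V(\bar y_{st})$} allocation of the sample for fixed cost C

Let $c_i$ be the cost per observation in stratum $i$. Minimize the same $V(\bar y_{st})$ subject to $g(n_1, n_2, \ldots, n_L) = 0$, where $g(n_1, n_2, \ldots, n_L) = c_1 n_1 + c_2 n_2 + \cdots + c_L n_L - C$.

Use Lagrange multipliers: $\partial V(\bar y_{st})/\partial n_i + \lambda\,\partial g/\partial n_i = 0$ now gives $N_i^2\sigma_i^2/(N^2 n_i^2) = \lambda c_i$, so

$$n_i = n\,\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}, \qquad i = 1, \ldots, L.$$

The allocation is directly proportional to stratum size and stratum variability, and inversely proportional to the square root of the stratum cost per observation.

5.5 Optimal {min $V(\bar y_{st})$} allocation of the sample: $c_i$ = cost/obs in stratum $i$ (cont.)

Substituting $a_i = n_i/n$ into $n = \dfrac{\sum_i N_i^2\sigma_i^2/a_i}{N^2 D + \sum_i N_i\sigma_i^2}$ gives

$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}\right)\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$

Worksheet 13 (optimal allocation, $c_i$ = cost/obs in stratum $i$)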
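A Python sketch of the two optimal allocations and their sample-size formulas (function names are hypothetical). $N$ and $\sigma$ reuse the worksheet example; the costs `[1, 4, 1]` are made-up values for illustration only. Passing `cost=None` gives Neyman allocation, since with equal $c_i$ the $\sqrt{c_i}$ factors cancel and the numerator of $n$ reduces to $\left(\sum_i N_i\sigma_i\right)^2$.

```python
import math


def optimal_allocation(N, sigma, cost=None):
    """Allocation proportions a_i = n_i / n that minimize V(ybar_st).

    With cost=None (equal cost per observation) this is Neyman allocation,
    a_i proportional to N_i * sigma_i; otherwise a_i is proportional to
    N_i * sigma_i / sqrt(c_i).
    """
    if cost is None:
        cost = [1.0] * len(N)
    w = [Ni * si / math.sqrt(ci) for Ni, si, ci in zip(N, sigma, cost)]
    total = sum(w)
    return [wi / total for wi in w]


def sample_size_optimal(N, sigma, D, cost=None):
    """Total n under optimal allocation; D = B^2/4 (mean) or B^2/(4 N^2) (total)."""
    if cost is None:
        cost = [1.0] * len(N)
    num = (sum(Ni * si / math.sqrt(ci) for Ni, si, ci in zip(N, sigma, cost))
           * sum(Ni * si * math.sqrt(ci) for Ni, si, ci in zip(N, sigma, cost)))
    den = sum(N) ** 2 * D + sum(Ni * si**2 for Ni, si in zip(N, sigma))
    return num / den


N = [155, 62, 93]    # stratum sizes from the worksheet example
sigma = [5, 15, 10]  # prior-survey guesses of sigma_i
cost = [1, 4, 1]     # hypothetical costs per observation, for illustration only
for c in (None, cost):
    a = optimal_allocation(N, sigma, c)
    n = math.ceil(sample_size_optimal(N, sigma, D=1.0, cost=c))  # D = B^2/4 with B = 2
    print(n, [round(ai * n, 1) for ai in a])
```

The fractional $n_i$ printed here still have to be rounded to whole units, with a small adjustment if needed so that they sum exactly to $n$.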