Confidentiality and Anonymised Survey Records: The UK Experience


Creating synthetic sub-regional baseline populations

Dr Paul Williamson
Dept. of Geography, University of Liverpool

Collaborators:
Robert Tanton (NATSEM, Australia)
Ludi Simpson (CCSR, UK)
Maja Zaloznik (Liverpool, UK)

1. Context

a) What do we want?

Local area microdata containing local-area distributions [e.g. smoking by income by sub-region]

Sub-region  Person  Age  Sex  Income   Smoker?
1           1       24   F    25,000   N
1           2       26   M    14,000   N
1           3       10   F    0        N
2           4       75   F    15,500   Y
2           5       45   M    120,000  Y
3           6       64   M    18,000   N

b) What have we got?

Large-scale survey

SAR District: Leeds (2nd largest in UK)

Economic position    Count (Female)  Count (Total)  % female  95% Confidence Interval
Employee full-time   1525            4146           36.8      1.5
On a Govt scheme     31              77             40.3      11.0
Unemployed           168             573            29.3      3.7
Retired              1267            2116           59.9      2.1
Total                5545            10485          52.9      1.0

Over-exaggerate problem?
2% sample
Minimally multivariate
Not based on minorities (e.g. unemployed ethnic minority)
Min. geog. threshold: 120k
Decadal

Solution

Reweight survey data...

Local smoking distribution
Local income distribution
Survey distribution [smoking x income]

...BUT weighting DOWN instead of up

Synthetic microdata

2. IPF (Raking)

Understanding IPF

Target margins: Young 20, Old 80; Male 50, Female 50

Iteration  Male Young  Male Old  Male total  Female Young  Female Old  Female total  Young total  Old total
1          1           2         3           2             3           5             3            5
2          16.7        33.3      50          20            30          50            36.7         63.3
3          9.1         42.1      51.2        10.9          37.9        48.8          20           80
4          8.9         41.1      50          11.2          38.8        50            20.1         79.9
5          8.9         41.1      50          11.1          38.9        50            20           80

N.B. IPF = Raking = IPF

Q. What is IPF/Raking doing?
A. Preserving the odds ratios ...
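The IPF answer above (preserving the odds ratios) can be checked with a minimal sketch. It assumes the 2 x 2 example from the slide: start cells 1, 2, 2, 3 (Male Young, Male Old, Female Young, Female Old) and target margins Male 50, Female 50, Young 20, Old 80. The function names `ipf` and `odds_ratio` are illustrative, not taken from the presentation.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Alternately scale rows, then columns, of a 2-D table to match target margins."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Fit rows (here: the sex totals).
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [cell * target / total for cell in table[i]]
        # Fit columns (here: the age totals).
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
    return table


def odds_ratio(t):
    """Cross-product ratio of a 2 x 2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


# Rows: Male, Female; columns: Young, Old (start values from the slide example).
seed = [[1.0, 2.0], [2.0, 3.0]]
fitted = ipf(seed, row_targets=[50, 50], col_targets=[20, 80])

print([[round(cell, 1) for cell in row] for row in fitted])
print(odds_ratio(seed), odds_ratio(fitted))
```

Because each step only multiplies a whole row or a whole column by a constant, the cross-product ratio of the start table (1 x 3) / (2 x 2) = 0.75 carries through to the fitted table, while the margins move to the targets. The fitted cells should settle close to the final values shown on the slide.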

CAVEAT: variation independence

3. Combinatorial Optimisation

TARGET (margins): Male 5, Female 5; Young 2, Old 8
ESTIMATE: Young / Old by Male / Female [cell values overlaid in the slide animation]

Guided incremental weight adjustment

[Slide animation: table of 20 survey records with columns Weight, ID (1 to 20), Age (Young / Old) and Sex (Male / Female)]

4. IPF/Raking v CO

Constraining Census Tabulations                       Cells constrained per table
Age / Sex / Marital status                            84
Household composition / Tenure                        77
Resident status / Sex                                 6
Household size / Number of rooms / Tenure             196
Long-term illness / Age / Sex                         14
Dependants                                            7
Socio-economic group of household head / Tenure       100
Age / Sex / Marital status of household head          28
Sex / Marital status / Economic position              56
Age / Sex / Economic position                         180
Ethnic group of household head / Tenure               16
Sex / Economic position / Ethnic group                24
Household composition / Car ownership                 33
Occupation / Age / Sex                                20
                                                      814

Comparison for margin-constrained tables

Target: age x sex x tenure x economic position (64 counts) at district level (17 districts)

Method    % NFC (17 district average)
2% SAR    32
IPFU      37
IPFN      22
CO        18

Simpson & Tranmer (2005)

Target: Car ownership (2) x Tenure (3) (6 counts; 3 %s) for residents at ward level

Average error (RMSE)

                               9363 wards           816 wards
Source of relationship         6 counts   3 %s      6 counts   3 %s
None
1. Independent margins         381        0.209     348        0.189
2% SAR
2. England & Wales             69         0.158     60         0.059
3. Direct SAR area sample      62         0.110     62         0.059
3a. Multilevel model           61         0.109     61         0.057

1% SAR, 26 ward types
4. Direct ward type sample: 6 counts 57, 3 %s 0.093
4a. Multilevel model: 6 counts 58, 3 %s 0.093
--- --- --- ---
Combinatorial Optimisation
5. Direct estimate: 6 counts 42, 3 %s 0.047
6. As constraint on IPF: 6 counts 32, 3 %s 0.045
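The combinatorial optimisation idea from section 3 (pick a combination of survey records whose totals match the local targets) can be illustrated with a minimal sketch. It uses the slide's target margins (Young 2, Old 8; Male 5, Female 5), but the 20-record survey sample below is hypothetical, and the search is a plain hill-climb that keeps any swap that does not worsen the fit. That is a simplification chosen for illustration, not necessarily the search strategy used in the work presented here.

```python
import random

# Hypothetical survey sample: 20 records of (age, sex). Illustrative only.
survey = ([("Young", "Male")] * 5 + [("Young", "Female")] * 5
          + [("Old", "Male")] * 5 + [("Old", "Female")] * 5)

# Target margins from the slide: Young 2, Old 8; Male 5, Female 5.
targets = {"Young": 2, "Old": 8, "Male": 5, "Female": 5}


def total_absolute_error(selection):
    """Sum of |estimated count - target count| over all constrained counts."""
    counts = dict.fromkeys(targets, 0)
    for idx in selection:
        age, sex = survey[idx]
        counts[age] += 1
        counts[sex] += 1
    return sum(abs(counts[key] - targets[key]) for key in targets)


def combinatorial_optimisation(n_people=10, max_swaps=10_000, seed=0):
    """Randomly swap selected records, keeping swaps that do not worsen the fit."""
    rng = random.Random(seed)
    selection = [rng.randrange(len(survey)) for _ in range(n_people)]
    error = total_absolute_error(selection)
    for _ in range(max_swaps):
        if error == 0:
            break
        candidate = selection[:]
        candidate[rng.randrange(n_people)] = rng.randrange(len(survey))
        candidate_error = total_absolute_error(candidate)
        if candidate_error <= error:
            selection, error = candidate, candidate_error
    return selection, error


selection, error = combinatorial_optimisation()
# Integer weight of each survey record = number of times it was selected.
weights = [selection.count(i) for i in range(len(survey))]
print(error, weights)
```

The resulting weights are whole numbers (0, 1, 2, ...), so the output is a list of real survey records rather than fractionally weighted ones, which is the practical contrast with IPF-style reweighting.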

5. GREGWT

Understanding GREGWT

6. GREGWT v CO

Fit to constraint variables (74 counts): GREGWT convergent SLAs in NSW

Measure of fit   GREGWT   CO (min. RZ2)
OTAE             602.9    483.1
OTAE/HH          0.1      0.1
OTAPE            0.2      0.1
ORZ2             60.5     1.9

Fit to constraint variables (74 counts): GREGWT NON-convergent SLAs in NSW

Measure of fit   GREGWT       CO (min. RZ2)
OTAE             7035098.7    1914.7
OTAE/HH          4653.5       3.2
OTAPE            5.0          0.4
ORZ2             22478155.1   6.7

Fit to margin-constrained distribution (household income x mortgage/rent): GREGWT convergent SLAs

State                     ABS (Census)   GREGWT    CO (min. RZ2)
Unaffordable households (n)
Aust. Capital Territory   5,526          6,147     5,924
New South Wales           169,823        194,394   191,720
Combined                  175,349        200,541   197,644
Unaffordable households (%)
Aust. Capital Territory   5.9            5.9       5.7
New South Wales           9.1            9.2       9.1
Combined                  9.0            9.0       8.9

7. Variation independence (again...)

UNIVARIATE constraints (158 constrained counts)

1: Age x Sex
2: Marital status
3: Country of Birth
4: Ethnicity
5: Religion
6: Health
7: Unpaid care
8: Long-term illness
9: Migration
10: Qualifications
11: Time since last worked
12: Economic activity
13: NS-SEC
14: NS-SEC of Ref. Person
15: Distance to work
16: Mode of travel to work
17: Hours worked
18: Accommodation type
19: Tenure
20: Family type
21: No of people in h/hold
22: Comm. Estab. Res. status
23: Communal estab. type
24: Persons per room
25: Household amenities
26: Occupancy rating
27: Floor level
28: Cars in household
29: Households

[Figure: Townsend Scores for Output Areas (Knowsley); CO-based scores plotted against Census-based scores]

BIVARIATE constraints (586 constrained counts)

% of non-fitting synthetic combinations

PARTIALLY CONSTRAINED DISTRIBUTIONS

Distribution                          Rural          Middling England   Deprived industrial   Deprived urban
                                      (South West)   (East Midlands)    (North)               (Outer London)
SEG / Household composition           0              0                  0                     0
SEG / Rooms                           0.5            0                  0                     0
Household composition / Dependants    0              0                  0                     0
Dependants / Tenure                   0              16                 0                     1.5
Sex / marital status / tenure         0              1.5                0                     3.0
Illness / sex                         0              1.5                0                     0

% of non-fitting synthetic combinations

UNCONSTRAINED DISTRIBUTIONS

Distribution                              Rural          Middling England   Deprived industrial   Deprived urban
                                          (South West)   (East Midlands)    (North)               (Outer London)
Economic activity / age / sex             0              0                  0                     0
Migration / age                           27             87                 77                    52
Cars / adults                             60             96                 76                    31
Headship / age / sex / marital status     1              0                  0                     0
Ethnic group / country of birth           99             67                 100                   100
Qualifications / age / sex                24             55                 22                    88

8. Conclusion

(a) Accuracy of estimates (fitness for purpose?)
(b) Unanswered questions
(c) Applications in the real world

[Diagram: HE Student / GP Patient age, sex, location; Local socio-economics; Survey data [District-level socio-demographics]; Estimated HE Student socio-economic characteristics; Estimated GP Patient socio-economic characteristics]
