Administrative Data in the IAB Metadata Management System
North American DDI Conference, University of Kansas
David Schiller (IAB), Ingo Barkow (DIPF)
2013/04/02, Lawrence, KS (USA)

Outline
1. Introduction
2. Data source provided by IAB
3. Need for a useful data documentation
4. Example of an exemplary software implementation
5. Administrative data: only a step into the future of research data
6. Conclusion

INTRODUCTION
FDZ of BA at IAB
Clarification of the acronyms: the FDZ (Research Data Centre) of the BA (German Federal Employment Agency) at the IAB (Institute for Employment Research).
But why? It is a legal matter:
1. The BA is only allowed to store data for the administrative process.
2. The IAB, as a research institution, does not have this limitation.
FDZ of BA at IAB
Aim of the IAB:
1. Research institution for the BA (est. 1967)
2. Independent scientific evaluation of BA activities
3. Two statutory mandates justify the work of the IAB
Aim of the FDZ:
1. Research data centre for the BA (est. 2004)
2. Provides IAB survey data and BA administrative data as research data to the scientific community
3. A statutory mandate justifies the work of the FDZ
DATA SOURCE PROVIDED BY IAB

Collection process for administrative data
German Federal Employment Agency: 700 job centres, located in every larger German city.
Staff fill in forms (supported by special software tools) on:
1. Benefit recipients
2. Participation in labor market programs
3. Jobseekers
Social security notification: a yearly notification by the employer for every employee, relevant for pension payments.
The data flows into the BA Data Warehouse (DWH).

[Figure: Layer model of process-generated individual data (Schichtenmodell prozessproduzierter Individualdaten), with four layers: analysis layer (IAB and external research projects), FDZ data mart layer (standard preparations by the FDZ and project-specific data provision by ITM), pallas user layer (ITM data marts such as IEB, LeH, BeH, MTH, LHG, ASU), and DWH layer (preparation of BA source data by ITM).]

Collection process for administrative data
1. BA
   1. Forms supported by different software tools
   2. DWH of the BA
   3. Specialized data marts
2. ITM
   1. Extracts in SAS (due to the size of the datasets)
   2. Subsamples of the SAS files converted to Stata (95% of users work with this software)
3. FDZ
   1. Data editing for scientific use in Stata
   2. In parallel: data documentation
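Because documentation is written in parallel with data editing, each stage of the chain above has to pass on what it did to the data. A minimal sketch of such a provenance log, assuming a simple in-memory record: the names `ProcessingStep`, `ProvenanceLog`, and `"example-dataset"` are hypothetical and are not part of the IAB metadata system or of DDI; only the stages and storage formats come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingStep:
    """One stage in the administrative data chain (hypothetical structure)."""
    actor: str    # organisational unit responsible, e.g. "BA", "ITM", "FDZ"
    action: str   # what was done to the data at this stage
    storage: str  # where / in which format the data is held after this stage


@dataclass
class ProvenanceLog:
    """Ordered record of processing steps for one data product (hypothetical)."""
    dataset: str
    steps: list[ProcessingStep] = field(default_factory=list)

    def record(self, actor: str, action: str, storage: str) -> None:
        # Append a step; the order of calls is the order of processing
        self.steps.append(ProcessingStep(actor, action, storage))

    def storages(self) -> list[str]:
        # Distinct storage formats in order of first appearance along the chain
        seen: list[str] = []
        for step in self.steps:
            if step.storage not in seen:
                seen.append(step.storage)
        return seen

    def summary(self) -> str:
        # One numbered line per step, in recorded order
        return "\n".join(
            f"{i}. {s.actor}: {s.action} [{s.storage}]"
            for i, s in enumerate(self.steps, start=1)
        )


# Stages taken from the collection process above; the dataset name is a placeholder
log = ProvenanceLog("example-dataset")
log.record("BA", "forms captured by different software tools", "DWH")
log.record("BA", "loaded into specialized data marts", "data mart")
log.record("ITM", "extract created", "SAS")
log.record("ITM", "subsample converted", "Stata")
log.record("FDZ", "edited for scientific use", "Stata")
print(log.summary())
print(log.storages())
```

Keeping such a log alongside the data, rather than reconstructing it afterwards, reflects the point that documentation can only cover what is known about upstream processing.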
Short summary
1. Data not collected for research purposes
2. No influence on, and little knowledge about, the data selection process
3. Different storage formats
4. Goal of the data editing procedure: build survey-like datasets
5. Data products of IAB/BA:
   By data source: 1. administrative data, 2. survey data, 3. integrated datasets
   By unit of observation: 1. establishment data, 2. individual/household data, 3. integrated establishment and individual data
NEED FOR A USEFUL DATA DOCUMENTATION

Data documentation requirements
1. Researcher view
   1. Standardized documentation
   2. Easy-to-understand documentation (do not cover every possibility)
   3. Easily accessible documentation (centralized web portal)
   4. Software tools for search functionality
2. Data provider view
   1. Standardized documentation
   2. Easy-to-understand documentation (do not cover every possibility)
   3. Uncomplicated preparation of documentation
   4. Software tools for preparation
   5. Software tools for exchanging documentation

Special needs for administrative data
1. Concentrate on the data collection process
2. Different data quality topics, e.g.:
   1. Why was the data collected?
   2. How was the data collected?
   3. How was the data modified?
3. Need for interfaces to upstream data editing processes (you can only document what you know)
4. Which special disclosure issues arise with administrative data?

Short summary: documentation needs
1. In general:
   1. An established, stable standard is needed:
      1. No major changes
      2. Manageable size
      3. Not everything can be covered
   2. Machine-readable, supported by software, interoperable with common storage formats
2. Specifically:
   1. Coverage of different data collection modes
   2. At the same time, staying as close as possible to the common standard

EXAMPLE OF AN EXEMPLARY SOFTWARE IMPLEMENTATION

IAB-Metadata Project
1. Carried out by a consortium (tba21, DIPF, Colectica, OPIT, Alerk Amin)
2. Work in progress (runtime: 24 months)
3. Implementation in the BA IT infrastructure
4. The steps are:
   1. Requirements
   2. DDI implementation
   3. Software building and implementation
5. Currently covers only the documentation of collected data

IAB-Metadata Project: aim of the project
1. Update the IT infrastructure of the FDZ
2. Put money into the development
3. Enable interoperability between institutions
4. Build it and use it
5. Merge research data and the corresponding data documentation
6. Be aware of future data sources
[Figure: Rogatus tool overview]

Short summary
1. Standard documentation is essential
2. A standard can only survive if it is used
3. Build, communicate, enlarge
4. Administrative data can be the backbone for merging survey data, statistical data, and future data

ADMINISTRATIVE DATA: ONLY A STEP INTO THE FUTURE OF RESEARCH DATA
Examples of electronic recordings that can provide data for official statistics:
1. Credit card transactions
2. Commodity (RFID) tracking
3. Toll road (RFID) recording
4. Electronic tickets for travelling
5. Public services offered electronically
6. Immigration control
7. Mobile phone use
8. Internet and social media use
9. GPS tracking of traffic and transport
10. Mixed active/passive recording
(Slide from EDDI 2012, keynote by Svein Nordbotten)

OECD report: Categories of future data
Category A: Data stemming from the transactions of government, for example, tax and social security systems.
Category B: Data describing official registration or licensing requirements.
Category C: Commercial transactions made by individuals and organisations.
Category D: Internet data, deriving from search and social networking activities.
Category E: Tracking data, monitoring the movement of individuals or physical objects subject to movement by humans.
Category F: Image data, particularly aerial and satellite images but including land-based video images.
DDI's way into the future
1. From survey data to administrative data to future data
2. No extensions, but a search for a generalized approach
3. Need for tools that can cover the whole data production process

CONCLUSION
Conclusion
1. What matters is not administrative data as such, but the data collection processes
2. DDI paper: "Documenting a Wider Variety of Data Using the Data Documentation Initiative 3.1"
3. Build solutions together
4. The data documentation standard is important
5. But also: support the data life cycle; a standard for supporting tools is important as well

Thanks for listening
David Schiller
Ingo Barkow
http://fdz.iab.de