High quality insights from complex data sources
I started my career as a social epidemiologist nearly two decades ago. I soon spotted the opportunity to get involved with a brand-new birth cohort study. Over the years I’ve sat waiting for my Millennium Cohort Study babies to grow up. I’ve been eagerly pressing ‘refresh’ on the UK Data Service website every few years, ready to test out my latest ideas and hoping to provide new insights for social policy for the next generation.
While we have steadily learned a great deal from these data, longitudinal research has faced an equally steady increase in data collection challenges in the new millennium. These challenges include:
- falling survey response rates – you can find more information about falling survey response rates on our blog
- increasing costs – you can find more information about the increasing costs of surveys in the UK Statistics Authority annual report for 2020 to 2021
- a potential lack of inclusivity – you can find more information about data and inclusivity on the UKSA website
- the constant problem of running fixed samples that cannot flex to reflect current or emerging trends in society
Longitudinal survey data remain an important resource for identifying cause and effect over the long term. But the coronavirus (COVID-19) pandemic showed how important administrative data can be, and how it can help direct policy responses in near real time.
The benefits and challenges of administrative data
Administrative data has some unique challenges, especially when it comes to quality. It is usually collected for a specific reason, such as running an organisation or providing a service. If administrative data is used for any other purpose, it needs to be quality assessed for that new purpose. Luckily, there are tools to help with this, such as the Administrative Data Quality Assurance Toolkit.
But there are benefits too. Administrative data may not be designed for research purposes such as observational studies, but it typically offers generous sample sizes. You can find more information about administrative data sample sizes on the NHS Digital website. Administrative data is useful because:
- it allows researchers to use data that has already been collected, so resources can be used in other parts of the research process
- it is relatively cheap
- work is underway as part of the Inclusive Data Task Force to make all types of data, including administrative data, more inclusive
- much of it is very timely and provides individual-level records with short lag times – you can find more information about the timeliness of administrative data on the ONS website
And administrative data has spread – it's everywhere in government! The challenge is finding it, bringing it together in a safe way within a trusted research environment, and understanding its quality. This is why the ONS has started an ambitious journey to create the Integrated Data Service (IDS). The IDS will be a secure platform to host administrative data from a wide range of government departments. It will make it quicker and easier for analysts to work together, and it will help improve the speed of decision-making across government.
The Integrated Data Analysis Team (IDAT)
In preparation for the launch of the IDS, the IDAT has brought together a diverse analytical team of:
- social researchers
- operational researchers
- data scientists
The team works with colleagues across the ONS, other government departments, the Government Statistical Service (GSS) and the private sector. IDAT aims to develop analysis using administrative data to provide high-quality insights that inform cross-cutting policy areas. The team also gives feedback to the developers of the IDS to help them create a platform that meets the needs of analysts.
The team uses newly received administrative data to investigate a range of topics relevant to economic, social and environmental policy. Recent work includes:
- house price analysis of towns in England and Wales using linked data from HM Land Registry and the Valuation Office Agency – this showed the extent of regional disparities in house prices and where the gap seems to be widening
- analysis of education, social mobility and outcomes for students receiving free school meals in England – this used the Longitudinal Educational Outcomes dataset and showed gender and regional differences in adult earnings between people who had and had not received free school meals
Ongoing projects within IDAT include:
- analysis of the effect of childhood social care on educational attainment – this looks at the Growing up in England data from Census 2011 linked to the All Education Dataset for England
- understanding links between educational attainment and contact with the criminal justice system in later life – this is based on Ministry of Justice information linked to education records
- analysis of geographic mobility and earnings progression – this uses the Department for Work and Pensions’ Registration and Population Interaction Database (RAPID)
- analysis of social effects on health and later routes through healthcare, using Census 2011 data linked to Hospital Episode Statistics
- understanding the causes of house price inflation in England, Wales and Scotland, linking land registry data to a range of open data on the social, economic and demographic characteristics of neighbourhoods
New opportunities with administrative data
Administrative data offers statistical researchers new opportunities for insight, as well as new challenges. Its value can be enhanced further: linking administrative sources together offers even more potential.
Of course, I can't forget about my Millennium babies. They're all grown up now, and by linking their survey responses to administrative education, health or income records, we can understand more about their lives. We can study the things that make their lives easier or more difficult. And we can see how their experiences, and their administrative data, can be used for the public good.