Publication date: 16 January 2019
Owner: Best Practice and Impact Division
Who this is for: People who work with data in government
Data linking is the process of joining datasets together so that we can make as much use as possible of the information that they hold.
The government holds an enormous amount of data. By using it effectively, we provide insight, drive policy change and answer society’s most important questions. While datasets are useful on their own, bringing them together means that we can take advantage of the combined resource that they offer.
Linking datasets, rather than treating them individually, means that we can draw insights from across the data. Often, linked data helps us to find new patterns and insights that we would not see if we only considered the datasets in isolation.
Data linking matches up data from different datasets. For this to work effectively, the methods we use to perform the linkage must be ethically sound, secure, robust and trustworthy.
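As an illustration of what "matching up" records can involve, here is a minimal sketch of exact (deterministic) linkage on a normalised key. It is not the method used by any particular department. The records, the field names and the helper functions `match_key` and `link` are all hypothetical, and real linkage work usually also needs probabilistic methods and quality checks.

```python
# Hypothetical records; the field names are illustrative only.
survey = [
    {"person_id": "A1", "name": "Ann Lee", "postcode": "ab1 2cd", "employed": True},
    {"person_id": "B2", "name": "Raj Patel", "postcode": "EF3 4GH", "employed": False},
]
admin = [
    {"name": "ann lee ", "postcode": "AB12CD", "benefit": "none"},
    {"name": "Sam Fox", "postcode": "IJ5 6KL", "benefit": "housing"},
]


def match_key(record):
    """Build a normalised (name, postcode) key so trivial differences do not block a match."""
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)


def link(left, right):
    """Link each left record to the single right record that shares its key."""
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)

    linked = []
    for rec in left:
        candidates = index.get(match_key(rec), [])
        # Accept only unambiguous matches; zero or multiple candidates are left unlinked.
        if len(candidates) == 1:
            linked.append({**candidates[0], **rec})
    return linked


linked = link(survey, admin)
```

In this sketch only the "Ann Lee" record is linked, because it is the only one whose normalised name and postcode appear in both datasets.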
Good data linkage practice is a responsibility that extends across the whole Government Statistical Service (GSS). Collaborative work between departments will help to achieve data linkage strongly underpinned by the Code of Practice for Statistics.
After a recent investigation into the role the United Kingdom (UK) statistical system can play in providing greater insight to users via data linkage, the Office for Statistics Regulation (OSR) published its report Joining up data for better statistics.
The National Statistician responded to the Joining up data report alongside two additional annexes.
In light of the National Statistician’s response, Ed Humpherson, the Director General for Regulation at the OSR, replied.
The Government Data Quality Hub in the Office for National Statistics has published a review of data linkage methods across government (previously known as a National Statistician’s Quality Review).
This review features articles contributed from experts in academia and government on:
- linkage quality metrics
- linking anonymised data
- a scaling method for linkage
- a framework for longitudinal linkage
- a new open source software package for linking data
The review also includes a series of recommendations for linkage work, including research for new methods and ways we can build capability across government.
National Statistician’s Data Ethics Advisory Committee
Strategies to maintain trust and confidence in government data use are supported through guidelines which promote ethical data sharing. The National Statistician’s Data Ethics Advisory Committee (NSDEC) advises the National Statistician on the use of public data to ensure that it is ethical and in the interest of the public good.
National Statistician’s Quality Review
The National Statistician’s Quality Review (NSQR) of privacy and data confidentiality methods helps the Government Statistical Service (GSS) to take full advantage of the newest research and innovation in this field, maximising the usefulness of statistics while meeting its legal and ethical obligations to protect data confidentiality.
World-leading experts from across academia and the private sector contributed articles to this NSQR that identify challenges in data linking and suggest steps that the statistical community could take to find solutions to these challenges.
One example is the article entitled Privacy, confidentiality and practicalities in data linkage from Associate Professor Kerina Jones and Professor David Ford.
The Office for National Statistics (ONS) is developing a data strategy that will be key to putting the correct infrastructure in place, for example data capability, governance and a management framework.
The strategy aims to serve the ONS and support the Government Statistical Service (GSS) in the future, balancing the need to extract value from data with appropriate safeguards.
A comprehensive framework that underpins this strategy manages and governs data practices to ensure that data is protected and meets legal obligations. This includes linking and matching practices.
The full high-level data management framework comprises:
- a set of data principles which define the scope and path of data management, from acquiring or collecting data, through to publication, and a set of security principles which define the foundation of our data protection practices
- a suite of policies to support data and security principles – these are statements of intent which describe what will be done to comply with the data and security principles
- a set of data standards and security procedures and protocols which define how statistical and business activities are carried out
The linking and matching policy is currently being reviewed to reflect the United Kingdom (UK) Statistics Authority’s systemic review.
Data linking and harmonisation
When multiple datasets are combined through linking, it is very important to use consistent and coherent definitions in data collection wherever possible.
Without this, there is a risk that the linked data will measure the same topic in several different ways. This can present a confusing picture to users, and might also limit useful analysis because it can be difficult to reconcile such differences.
Harmonisation addresses this challenge by ensuring commonality in the use of definitions, administrative data and in the presentation of outputs.
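To make the idea concrete, the sketch below recodes one variable from two sources to a single shared code list before any linking takes place. The code values, the `CODE_MAP` table and the `harmonise` function are hypothetical and are not taken from any GSS harmonised standard.

```python
# Hypothetical source-specific codes mapped to one shared code list.
CODE_MAP = {
    "survey": {"1": "male", "2": "female"},
    "admin": {"M": "male", "F": "female"},
}


def harmonise(records, source, field="sex"):
    """Return copies of the records with `field` recoded to the shared code list."""
    mapping = CODE_MAP[source]
    harmonised = []
    for rec in records:
        raw = str(rec[field]).strip().upper()
        # Unmapped values become None so they can be reviewed rather than guessed.
        harmonised.append({**rec, field: mapping.get(raw)})
    return harmonised


survey = harmonise([{"id": 1, "sex": 1}, {"id": 2, "sex": 2}], "survey")
admin = harmonise([{"id": 1, "sex": "m"}, {"id": 3, "sex": "X"}], "admin")
```

After recoding, both sources describe the variable with the same values, so a linked record does not carry two conflicting encodings of the same topic.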
The GSS Harmonisation Team maintains and develops fully approved harmonised principles (harmonised definitions, survey questions, standards for administrative data and standards to be used when presenting outputs).
If you would like to know more about harmonisation, the GSS Harmonisation Team can support you in developing and implementing harmonised principles.
Connected Open Government Statistics (COGS)
The approach to Connected Open Government Statistics (COGS) is to standardise and harmonise data. This means carefully analysing the structure of the datasets, establishing shared codelists, and being very specific about using metadata to describe the datasets.
While there are different ways of approaching this, linked data provides a convenient framework for modelling and publishing data in this way and is well suited to discovering and accessing the data using the web.
Linked data is about using the web to connect related data that wasn’t previously linked, or using the web to lower the barriers to linking data currently linked using other methods.
The fundamental reason for doing this is to make it easier for people to discover and use the datasets that have been published.
Where to access linked data
Researchers can access linked data through a number of research environments across the United Kingdom (UK). These include:
- The Office for National Statistics (ONS) Secure Research Service
- UK Data Service
- HMRC DataLab
- Administrative Data Research UK (ADR UK)
- The Secure Anonymised Information Linkage (SAIL) DataBank
- The electronic Data Research and Innovation Service (eDRIS)
The Digital Economy Act (2017) sets out the criteria for enabling access to data, including linked data, by accrediting processors, projects and researchers and requiring that the highest ethical standards are maintained.
Part 5 of the Digital Economy Act included important new legal powers. These give the UK Statistics Authority (and the ONS as its executive office) better access to data to support the production of official statistics, National Statistics and statistical research. They also give accredited researchers better access to de-identified public sector data to support research projects for the public good.
The Statistics and Registration Service Act (2007) establishes a legal gateway, known as the Approved Researcher Scheme, which allows the ONS to grant researchers access to data that cannot be published openly, for statistical research purposes.
Accreditation of individuals and projects
Accessing any data in a secure environment requires both researchers and their project proposal to be accredited.
For instance, to access linked data in the ONS Secure Research Service, individuals should hold ONS Researcher Accreditation and have their research proposal approved by the Research Accreditation Panel, an independent panel made up of representatives from government departments, academia, and the commercial and voluntary sectors.
Using linked data for statistical purposes that serve the public good requires additional safeguards.
Most secure research environments have developed well-tested principles to ensure that access to data is secure, lawful and ethical.
The ONS and UK Data Service follow an internationally recognised set of principles for using secure data safely, known as the Five Safes Framework. It covers safe people, safe projects, safe settings, safe outputs and safe data:
- Safe people – trained and Accredited Researchers, trusted to use data appropriately
- Safe projects – data only used for feasible, legal, and ethical research that delivers clear public benefits
- Safe settings – access to data only possible using secure technology systems
- Safe outputs – all research outputs checked to ensure they cannot identify data subjects
- Safe data – researchers can use the appropriate data in a de-identified form
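One way to picture a "safe outputs" check is small-count suppression in a frequency table. The sketch below is an illustration only: the threshold of 10, the `safe_counts` function and the example records are assumptions, and each research environment sets its own output-checking rules.

```python
from collections import Counter

# Assumed minimum cell count; real environments define their own thresholds.
THRESHOLD = 10


def safe_counts(records, field, threshold=THRESHOLD):
    """Count records by `field`, replacing counts below the threshold with None."""
    counts = Counter(rec[field] for rec in records)
    return {
        value: (n if n >= threshold else None)
        for value, n in counts.items()
    }


# Hypothetical de-identified records.
records = [{"region": "North"}] * 12 + [{"region": "South"}] * 3
table = safe_counts(records, "region")
```

Here the "South" count is suppressed because it falls below the assumed threshold. This sketch does not handle secondary disclosure, where a suppressed value can be recovered from published totals, so it is not a substitute for a full output check.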
If you would like to learn more about data linking, take a look at the Introduction to data linkage course organised by the Government Statistical Service (GSS) Capability Team.
The GSS Capability Team also organises several other courses that can provide some wider context around the need for data linking.
Guidance available on the Government Statistical Service (GSS) website
Communicating quality, uncertainty and change
The GSS Methodology Advisory Committee
The GSS Methodology Advisory Committee (GSS MAC) may be able to help by providing free methodological advice.
The Journal of Public Health has published GUILD: GUidance for Information about Linking Datasets. This provides direction to ensure that each step in the data linkage process is documented according to a common framework.
The Government Statistical Service (GSS) data linkage symposium took place in London on Wednesday 23 October 2019.
The aims of this symposium were:
- to facilitate sharing of cross-government data linkage methods and experiences through a series of presentations from GSS colleagues
- to open up a discussion about how to work together to facilitate data linkage work
- to signpost people towards resources and training to build GSS capability in data linkage
- to update delegates on the progress of the National Statistician’s Quality Review on data linkage and learn from academic experts on leading linkage methods
Presentations from the symposium
- Linking with sensitive identifiers in a national statistical institute
- Evaluation of the Troubled Families’ Programme: 2015-2020
- Census linkage: Advances in techniques
If you would like a copy of the slides from the presentations please email GSSHelp@statistics.gov.uk.