GSS sharing webinar: data linkage
- Date: 10th March 2020, 11:00 am to 12:30 pm
- Venue: Online
On Tuesday 10 March, the Best Practice and Impact (BPI) division will be hosting a webinar on data linkage. Sharing this work will enable analysts to learn from each other and innovate. There will be presentations from colleagues at both the Ministry of Justice (MoJ) and Ordnance Survey (OS).
Ordnance Survey: Data Discoverability with Geo6
Data Discoverability is the first of four Geospatial Commission projects looking to improve the quality and accessibility of all UK location-based data, and the Geo6 recently held its third workshop on it. The Geo6 is a collaboration between OS, British Geological Survey, Coal Authority, HM Land Registry, UK Hydrographic Office and the Valuation Office Agency.
Location data is valuable for businesses and public-sector organisations alike, but this value relies on being able to find the data. This is where the Data Discoverability project comes in.
Over the last few months the Geo6 have worked together to produce a catalogue of all the data they hold. They needed to find a way of matching their data to understand whether two datasets are the same, are different, or hold richer information about the same thing.
Ministry of Justice: Sparklink
The Analytical Services branch of the MoJ are developing probabilistic record linkage software called Sparklink. This is a Python-based package that implements Fellegi-Sunter’s model of linkage in the distributed computing framework Apache Spark. Sparklink uses the Expectation-Maximisation algorithm to estimate model parameters.
The Sparklink record linkage package aims to:
- work at much greater scale than current open source implementations (100 million records or more)
- get results faster than current open source implementations – with run-times of less than an hour
- have a transparent methodology, so the match scores can be explained both graphically and in words
- have accuracy similar to some of the best alternatives
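As a rough illustration of the approach described above, the sketch below estimates Fellegi-Sunter parameters with the Expectation-Maximisation algorithm in plain Python. It is not Sparklink's code or API and does not use Apache Spark. It assumes each pair of records has already been reduced to a binary comparison vector (1 where a field agrees, 0 where it does not) and that fields are conditionally independent given match status. All names (`estimate_parameters`, `match_probability`, `match_weight`, `pairs`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # keeps probabilities away from 0 and 1 so ratios and logs stay finite


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def match_probability(gamma: Sequence[int], lam: float,
                      m: Sequence[float], u: Sequence[float]) -> float:
    """Posterior probability that a pair is a match, given its comparison vector."""
    p_match, p_non = lam, 1.0 - lam
    for agree, m_j, u_j in zip(gamma, m, u):
        p_match *= m_j if agree else 1.0 - m_j
        p_non *= u_j if agree else 1.0 - u_j
    return p_match / (p_match + p_non)


def estimate_parameters(gammas: Sequence[Sequence[int]], n_iter: int = 100,
                        lam: float = 0.1, m_init: float = 0.9,
                        u_init: float = 0.1) -> Tuple[float, List[float], List[float]]:
    """EM estimates of lam = P(match), m_j = P(agree | match), u_j = P(agree | non-match)."""
    n, k = len(gammas), len(gammas[0])
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: current match probability for every pair
        w = [match_probability(g, lam, m, u) for g in gammas]
        total = sum(w)
        # M-step: re-estimate parameters as weighted agreement rates
        lam = _clamp(total / n)
        m = [_clamp(sum(wi * g[j] for wi, g in zip(w, gammas)) / total)
             for j in range(k)]
        u = [_clamp(sum((1.0 - wi) * g[j] for wi, g in zip(w, gammas)) / (n - total))
             for j in range(k)]
    return lam, m, u


def match_weight(agree: bool, m_j: float, u_j: float) -> float:
    """Log2 likelihood ratio contributed by one field; positive values favour a match."""
    return math.log2(m_j / u_j) if agree else math.log2((1.0 - m_j) / (1.0 - u_j))


# Toy comparison vectors for three fields (illustrative, not real data)
pairs = ([(1, 1, 1)] * 18 + [(1, 1, 0)] * 2 + [(0, 0, 0)] * 65
         + [(1, 0, 0)] * 6 + [(0, 1, 0)] * 5 + [(0, 0, 1)] * 4)

lam, m, u = estimate_parameters(pairs)
for j, (m_j, u_j) in enumerate(zip(m, u)):
    print(f"field {j}: m={m_j:.3f} u={u_j:.3f} "
          f"agree weight={match_weight(True, m_j, u_j):.2f}")
print("P(match | all fields agree) =", match_probability((1, 1, 1), lam, m, u))
```

Per-field weights like these are one way a match score can be broken down and explained field by field, which is the kind of transparency the aims above describe.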
How to join in
This event is online to enable anyone from any location across the civil service to join. To attend this webinar, please register on Eventbrite and then go to the GSS Best Practice and Impact YouTube channel. We will be live from 11:00am to 12:30pm on Tuesday 10 March and will take questions via sli.do.
This event is part of a series of sharing webinars organised by BPI. For information on the next sharing webinar, please sign up to the GSS newsletter or keep an eye on the events page on the GSS website.