RAP in reality: automating the production of high-quality norovirus and rotavirus statistics

Charlie Hunt

The need for norovirus statistics

Every year in the UK, there are an estimated 3 million cases of norovirus, also known as the “winter vomiting bug”. While most people recover quickly, norovirus can cause severe disease in vulnerable groups. The public health burden of norovirus is considerable: it causes ongoing disruption to health and social care, and costs NHS inpatient services an estimated £298m a year.

The UK Health Security Agency (UKHSA) undertakes norovirus surveillance to understand the impact on the population and monitor circulating strains. Since 2014, the national norovirus surveillance team has published an overview of norovirus activity on GOV.UK, alongside a similar overview for rotavirus, another cause of gastroenteritis. The national norovirus and rotavirus surveillance report is badged as a set of official statistics.

These statistics are valued by a wide variety of users, from the health and social care sector to local and national government, the food industry and the general public.

Why RAP?

Until recently, the process of producing the report was resource intensive. It involved compiling data extracts from three different surveillance systems (synthesising hospital laboratory reports, hospital outbreak reports, and genotyping results), analysis in Excel and Stata, and copying results into a Word template. This would take half a day or more – a significant time burden during weekly winter reporting, with multiple opportunities for errors to creep in.

Given the frequency of publication, standard format, and our commitment to the Code of Practice for Statistics, in 2023 the team decided to automate the process. With the support of UKHSA’s Analytical Quality Assurance and Standards team, we began developing a reproducible analytical pipeline (RAP).

Building the RAP

To set up the RAP, we first mapped out the data workflow, including data collection, processing, analysis and dissemination. We streamlined data collection processes, working closely with colleagues across the organisation. The pipeline was then built in R using software development principles and analytical best practice including modular design, writing functions for repeated actions such as data processing or chart visualisations, and version control (Git and GitHub). We also implemented comprehensive quality assurance checks using separate development branches, pull requests, and code reviews on GitHub. To automate the report itself, we used R Markdown.
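To illustrate what a modular design with built-in quality assurance can look like, here is a minimal sketch. It is written in Python rather than R purely for illustration. The record schema (a `Report` with `week` and `pathogen` fields) and the functions `validate`, `weekly_counts` and `run_pipeline` are hypothetical and are not taken from the team's pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One laboratory report (hypothetical schema for illustration)."""
    week: int       # reporting week number
    pathogen: str   # e.g. "norovirus" or "rotavirus"


def validate(reports):
    """QA step: fail fast on records that break simple, assumed rules."""
    for r in reports:
        if not 1 <= r.week <= 53:
            raise ValueError(f"invalid week: {r.week}")
        if r.pathogen not in {"norovirus", "rotavirus"}:
            raise ValueError(f"unexpected pathogen: {r.pathogen}")
    return reports


def weekly_counts(reports, pathogen):
    """Reusable processing step: count reports per week for one pathogen."""
    counts = defaultdict(int)
    for r in reports:
        if r.pathogen == pathogen:
            counts[r.week] += 1
    return dict(sorted(counts.items()))


def run_pipeline(reports):
    """Chain the steps; each function can be tested and reviewed on its own."""
    checked = validate(reports)
    return {p: weekly_counts(checked, p) for p in ("norovirus", "rotavirus")}
```

For example, `run_pipeline([Report(1, "norovirus"), Report(1, "norovirus"), Report(2, "rotavirus")])` returns `{"norovirus": {1: 2}, "rotavirus": {2: 1}}`. Keeping validation separate from processing means a new metric can be added as another small function without touching the checks.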

The RAP was validated against outputs from the old process and run in parallel for a short time, before we transitioned to the RAP full-time in July 2025. We have developed supporting materials such as a standard operating procedure and a repository README file, and are currently training other members of the wider team to use the RAP.
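A parallel run like this comes down to checking that both processes report the same figures. The sketch below, again in Python and using a hypothetical `compare_outputs` function with made-up metric names, assumes each process's headline figures can be collected into a dictionary mapping metric names to values. It lists any metric that is missing from either output or whose values disagree.

```python
import math


def compare_outputs(legacy, rap, rel_tol=1e-9):
    """Return a list of discrepancies between two {metric: value} dicts.

    Hypothetical parallel-run check: metrics missing from either output,
    or whose values differ beyond rel_tol, are reported for investigation.
    """
    problems = []
    for name in sorted(legacy.keys() | rap.keys()):
        if name not in rap:
            problems.append(f"{name}: missing from RAP output")
        elif name not in legacy:
            problems.append(f"{name}: missing from legacy output")
        elif not math.isclose(legacy[name], rap[name], rel_tol=rel_tol):
            problems.append(f"{name}: legacy={legacy[name]} rap={rap[name]}")
    return problems
```

An empty list means the two processes agree on every metric; anything else points to a specific figure to investigate before switching over.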

What’s changed, and what’s next

So far, the RAP has been a success, reducing production time from half a day to approximately an hour, and providing a more robust way of consistently producing accurate, high-quality outputs and building resilience across the team.

Staying attentive to user needs is a priority for the norovirus surveillance team. We have worked with the Analytical Quality Assurance and Standards team to implement a user survey to understand who our users are and their interest in the statistics we produce. Having a modular design and robust quality assurance process has made it easier for us to make changes to the RAP and include new features (such as additional metrics) in response to user feedback. We will be running the survey again towards the end of the reporting season to gather any new feedback and find out how the additional features were received.

We have also tried to make our product as accessible as possible by following Analysis Function best practice guidance on data visualisation and communication. We are keen to share our experience of setting up a RAP with colleagues in other analytical functions to encourage broader uptake of RAP principles. Please email NoroOBK@ukhsa.gov.uk for more information.


Danielle Hearn and David Elliott, UK Health Security Agency
Danielle is a Senior Data Scientist at the UK Health Security Agency (UKHSA). She has a background in mathematics and statistics with previous experience in public health and data analytics. In her current role, she works on automating, developing, and improving data workflows for the surveillance of gastrointestinal infections. David is an Epidemiologist at the UK Health Security Agency (UKHSA), with a background in infectious disease surveillance, outbreak investigation and data analytics. David has worked on gastrointestinal pathogens and healthcare-associated infections during his time at UKHSA, prior to which he worked in the development and humanitarian sector.