Vax to the future: reproducibility powered by epidemiological know-how

In September 2024, England’s respiratory syncytial virus (RSV) vaccination programme was launched, protecting those at highest risk of serious illness: newborns (through the maternal programme) and older adults. As an integral part of the epidemiological surveillance of any new vaccine programme, UKHSA monitors the uptake, safety and effectiveness of vaccines. Our job is to ensure that data flows are reliable, accurate, and timely, and to understand clinical pathways and coding practices so that data are correctly interpreted.
For the RSV programme, we needed to track uptake of the vaccine, linking records with demographic and clinical information, and to produce analysis that could be trusted by policymakers, clinicians, and the public.
Achieving this at scale required not only new data pipelines informed by experience from other vaccination programmes, but also reporting mechanisms and automated checks. Because of the complex features of the maternal programme, particularly the need to monitor uptake in women after delivery and the associated data lag, we selected GP-level aggregated data from the ImmForm website for uptake reporting, following the approach used for the prenatal pertussis programme (another vaccine programme given to mothers to protect their babies).
Automation as a foundation
We then created a reproducible analytical pipeline (RAP), using SQL to extract, aggregate, and transform data from large national datasets, and then R to apply validation rules, generate statistical summaries, and produce consistent outputs for monthly reporting.
These pipelines support consistency, reduce human error, and facilitate the analysis of larger and more complicated datasets. We don’t need to rebuild workflows each month; instead, we build on a robust backbone of code. Trends in uptake can be identified quickly across regions or age groups, allowing decision-makers to act promptly.
But RAP is only one part of the puzzle when producing high quality statistics.
The human layer: validation with context
While monitoring the maternal programme, we noticed that the monthly number of births was not comparable to the number recorded for the same cohort of women delivering in the same month in the maternal pertussis vaccination programme. After conducting additional quality assurance and validation processes for both programmes, we initially reported data from only one GP IT supplier (representing 40.4% of practices in England). We worked with the GP IT suppliers to identify the source of the errors and support detailed data quality assurance. This improved data quality enough for us to subsequently report national data covering around 98% of all GP practices across England.
Experiences like this remind us of the importance of human expertise in interpreting epidemiological data. Our RAP can flag an anomaly, but it’s on us to investigate whether it is due to data entry errors, a backlog of records, or the genuine impact of a local vaccination campaign. If uptake among eligible pregnant women appears unusually low, the explanation may lie in the specific context of antenatal clinics: how appointments are scheduled, how eligibility is coded, or how records are transferred between systems. In other vaccine programmes, such as MMR, we need clinical expertise to interpret individual results within otherwise automated pipelines.
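The division of labour between an automated flag and human investigation can be sketched as follows. Again, this is an illustrative Python example under stated assumptions, not the check used in the pipeline: the function name `flag_denominator_mismatches`, the month-keyed input format and the 10% tolerance are all hypothetical.

```python
def flag_denominator_mismatches(rsv_births, pertussis_births, tolerance=0.10):
    """Return the months whose birth counts differ by more than `tolerance`.

    Both arguments map a month label such as "2024-10" to the number of
    women recorded as delivering in that month. The function only flags;
    deciding why the counts differ is left to a human reviewer.
    """
    flagged = []
    # Only months reported by both programmes can be compared.
    for month in sorted(rsv_births.keys() & pertussis_births.keys()):
        reference = pertussis_births[month]
        if reference == 0:
            flagged.append(month)  # no ratio can be computed, so ask for review
            continue
        relative_difference = abs(rsv_births[month] - reference) / reference
        if relative_difference > tolerance:
            flagged.append(month)
    return flagged
```

The output is deliberately just a list of months to look at. Whether a flagged month reflects a coding problem, a backlog, or something real is exactly the judgement that the code cannot make.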
The first year of the RSV vaccination programme has demonstrated the value of an approach that blends RAP with subject matter experience to ensure outputs are consistent yet also meaningful, accurate, and grounded in real-world context.