Reproducible Analytical Pipelines (RAP) case studies
Reproducible Analytical Pipelines (RAP) use software engineering best practice to deliver efficient, high-quality and transparent analysis.
Developing RAPs can be difficult for teams, so we have shared some case studies on this page to show how teams have overcome challenges to produce them.
If you think you have a story that would make a good case study, please contact us.
Despite a challenging IT environment and limited access to software, our RAP champion and a small support team have developed publication-ready material that has saved considerable staff time and limited errors.
I cannot wait to get a wider deployment of software so we can develop a full RAP from data ingest to final report for many of our statistical products. This will save large amounts of statisticians’ time. We will be able to invest this time in statistical development and support projects that we have been unable to progress whilst producing statistical reports without use of RAP.
Dr Simon Clarke, Chief Statistician at the Health and Safety Executive (HSE)
Analysts at the Health and Safety Executive (HSE) have successfully started using RAP practices despite significant technical barriers.
The existing IT infrastructure made coding and developing RAP challenging. The production environment was not designed to meet different professional needs, and within it the team had no access to common coding languages or tools. Analysts in the statistics team could only use programming languages on shared off-network laptops. However, they could not update software or install packages on these laptops, or use them to communicate directly with colleagues.
A data scientist built a prototype RAP using openly available data on their home computer. Using this RAP, they recreated existing publications in a more automated way with RMarkdown. Although the prototypes only reproduced existing publications, they functioned as a proof of concept, showing how RMarkdown can be used to automate the creation of publications. Within 5 months of this work starting, they had created similar templates for 15 different publications.
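The template idea described above can be sketched in a few lines. This is only an illustration, written in Python rather than RMarkdown, and it is not the HSE team's code: the template text, the `render_report` function, the publication titles and the numbers are all hypothetical placeholders. It shows one shared template being filled with figures computed from each dataset, so that adding a publication means adding data rather than rewriting text.

```python
from string import Template

# Hypothetical template: one text skeleton shared by every publication.
REPORT_TEMPLATE = Template(
    "$title\n"
    "Records analysed: $n_records\n"
    "Mean value: $mean_value\n"
)


def render_report(title, values):
    """Fill the shared template with figures computed from the data."""
    mean_value = sum(values) / len(values)
    return REPORT_TEMPLATE.substitute(
        title=title,
        n_records=len(values),
        mean_value=f"{mean_value:.1f}",
    )


# Placeholder inputs, invented purely for illustration.
datasets = {
    "Example publication A": [2.0, 4.0, 6.0],
    "Example publication B": [10.0, 20.0],
}

# One pass over the datasets produces every report from the same template.
reports = {
    title: render_report(title, values)
    for title, values in datasets.items()
}
```

In a sketch like this, the reported figures are always derived from the data at render time, which is the property that reduces copy-and-paste errors in a templated publication.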
The analysis team then used this proof of concept to convince colleagues to start using RAP for regular publications. Since then, the analysis sections of around 75% of the statistics team’s publications have been automated, with development on others starting soon.
This progress has helped the team make the business case for getting more coding tools installed on their work computers, which will help reduce reliance on off-network machines. By demonstrating the benefits of RAP, the team has also gained the support it needs to start automating data pre-processing. RAP is currently saving approximately 20 hours of analysts’ work per document, per year, and efficiency savings are expected to increase when more of the pre-processing is automated.
The success of this work depended on demonstrating the value of RAP early on and throughout development. This was done through presentations, sharing seminars and demonstrations. By concentrating on the parts of the process that were easy to automate with the current IT infrastructure, analysts could quickly make the progress needed to get support for future development.
The analysts are now looking to get support for automated deployment of pipelines. This will enable them to use RAP as the default approach for new and ad-hoc analyses.