Reproducible Analytical Pipelines (RAP)

Sub-sections:

Reproducible Analytical Pipelines (RAPs) are automated statistical and analytical processes. They incorporate elements of software engineering best practice to ensure that the pipelines are reproducible, auditable, efficient, and high quality.

RAPs increase the efficiency of statistical and analytical processes, delivering value. Reproducibility and auditability increase trust in the statistics. The pipelines are easier to quality assure than manual processes, leading to higher quality.

Statisticians and analysts should look to implement the RAP principles in parts of their processes.

What a RAP does

A RAP will:

improve the quality of the analysis
increase trust in the analysis by producers, their managers and users
create a more efficient process
improve business continuity and knowledge management

In order to achieve these benefits, at a minimum a RAP must:

minimise manual steps, for example copy-paste, point-click or drag-drop operations. Where it is absolutely necessary to include a manual step in the process this must be documented as described below
be built using open source software which is available to anyone, preferably R or python
deepen technical and quality assurance processes with peer review to ensure that the process is reproducible and that the below requirements have been met
guarantee an audit trail using version control software, preferably Git
be open to anyone – this can be facilitated most easily through the use of file and code sharing platforms
follow existing good practice for quality assurance – guidance set by departments or organisations, and by Best Practice and Impact and Data Quality Hub for the Analysis Function and Government Statistical Service
contain well-commented code and have documentation embedded and version controlled within the product, rather than saved elsewhere

Things to note about RAP

There may be restrictions, such as access to databases, which stop analysis producers building a RAP for their full end-to-end process. In this case, the above requirements apply to the selected part of the process.
There may be restrictions, such as sensitive or confidential content, which stop analysis producers from sharing their RAP publicly. In this case, it may be possible to share the RAP within a department or organisation instead.
It is recommended that when possible a RAP should be built collaboratively – this will improve the quality of the final product and helps to facilitate knowledge sharing.

What you need for RAP

There is no specific tool that is required to build a RAP, but both R and Python provide the power and flexibility to carry out end-to-end analytical processes, from data source to final presentation.

Once the minimum RAP has been implemented statisticians and analysts should attempt to further develop their pipeline using:

functions or code modularity
unit testing of functions
error handling for functions
documentation of functions
packaging
code style
input data validation
logging of data and the analysis
continuous integration
dependency management

RAP blog posts

To read analysts’ experiences of RAP please see:

RAP champion network

RAP champions lead the development of reproducible analytical pipelines in their organisations. Find out more about the RAP champion network.