Who this strategy is for
We have focussed on helping leaders and managers of analysis create strategic plans.
This publication will help analysts and users of analysis understand why reproducibility is important and how to deliver it.
Our vision for analysis
Reproducible Analytical Pipelines (RAP) are the default way of conducting quantitative analysis. Analysis is then reproducible, transparent, trustworthy, efficient, and high quality.
Citizens, analysts, public servants, and leaders benefit from analysis being done this way.
Why RAPs are important
Our analysis must be straightforward to reproduce. Reproducible analysis is more efficient, open, and easier to quality assure and re-use.
The most effective way to make analysis reproducible is to build analysis in code using best practice from software engineering. RAPs enable us to deliver high quality and trustworthy research and analysis using good programming practices. We meet user needs by writing efficient and re-usable code for analysis.
What the strategy includes
We make the case for delivering RAP. We outline barriers that analysts face when implementing RAP and propose actions to tackle them. We then suggest how to deliver the strategy, recognising that organisations will approach it in line with local priorities.
The strategy’s goals
- Goal one: tools – give analysts the tools they need to do analysis using the RAP principles
- Goal two: capability – build analyst capability to do analysis using the RAP principles
- Goal three: culture – leaders and organisations see the value of RAP and encourage analysts to use RAP principles
How we will deliver the strategy
The strategy proposes actions to get the right tools, develop the right capability, and build the right culture for RAP. Some organisations have already achieved some of the proposed actions. Some actions may not be relevant to all organisations.
The Analysis Function will enable the strategy by supplying mentoring, guidance, and support to analysts. It will focus on training the teams responsible for building RAP capability and culture within their respective organisations.
Public sector organisations will choose leaders and RAP champions to be responsible for delivering mentoring, review, and communities of practice for RAP within their organisations.
Public sector organisations will develop plans to meet our common goals in line with local circumstances. They will monitor progress towards delivery. The Analysis Function will continue its existing qualitative research and annual programming skills survey to track how well we do.
“Reproducibility is the cornerstone of analysis. Analysts should get the same results as each other when using the same data and methods.
Since 2017, analysts in the UK public sector have made their analysis more efficient and reproducible using a methodology called Reproducible Analytical Pipelines (RAP). They advocate writing analysis as code and using open-source tools, version control software, and dependency management.
Within this strategy, we set out to continue their success and embed RAP across government. We are not starting something new but building upon proven successes.
The Office for National Statistics’ Coronavirus (COVID-19) Infection Survey shows how RAP can be a powerful tool to improve the efficiency and quality of analysis. Mentors helped the team make their nationally important analysis more efficient despite the tight deadlines and high pressure they faced throughout the coronavirus pandemic. From stories like this, we know that giving analysts the capability needed for RAP improves efficiency and the quality of their products.
Analysts have found that they can use the same digital capability they have gained to improve reproducibility to also deliver better analysis. Analysis can be higher quality through automated data validation and testing, more impactful through interactive data presentation, more prompt and less costly by removing manual steps, and more powerful by using advanced analytics. Driving forward the strategy will give analysts the right tools, help to build the required software capability, and begin the culture change needed for all analysis to benefit from this digital approach.
RAP not only delivers the capability needed for these better products but is also a core part of these new products. Explore education statistics, an open-source platform developed by the Department for Education, helps users interact with and better understand education statistics. The platform was designed to enable RAP from the ground up. Automated pipelines produced by statisticians supply data to the platform in the correct standardised format. The prospect of using the platform has, in turn, encouraged the statistics producers to adopt RAP and develop the capability they need to implement it.
Embedding RAP as the default approach to analysis in government is an essential step on the way to digital transformation of analysis. This publication sets out how to continue the success of RAP by identifying common barriers to use it and outlining actions to diminish them. This will be backed up by clear accountability and leadership.
I encourage leaders across the Civil Service to embed this strategy at the heart of delivery.”
Professor Sir Ian Diamond, Head of the Analysis Function and National Statistician
Citizens benefit from efficient, high-quality analysis delivered directly to them or used to improve services for them.
Leaders of analysis are empowered to advocate for further digital transformation of analysis.
Managers of analysis are advocates for Reproducible Analytical Pipelines (RAP) and feel confident managing analyses as software products.
Analytical teams in public sector organisations choose to deliver their analysis using the RAP principles by default.
Analysts have access to the tools and platforms they need to implement RAP principles.
Analysts feel comfortable writing and reading code.
Research software engineers are embedded in analytical areas to deliver digital expertise, build products and develop capability. Research software engineers are software engineers who understand the process of analysis and research. They work with other analysts to develop digital analytics products for end users.
What we want to achieve
As government analysts, we want to meet user needs by:
- making reproducible, re-usable, high quality and efficient analysis,
- using digital technology to improve our products,
- increasing trust in our work through transparent processes,
- improving the business continuity and knowledge management of our research.
Creating new products and re-developing existing products in line with the principles of Reproducible Analytical Pipelines (RAP) will deliver these aims.
This strategy will enable us to perform analysis in line with these principles by default. We will deliver it by achieving three goals:
- tools – ensure that analysts have the tools they need to implement RAP principles.
- capability – give analysts the guidance, support and learning to be confident implementing the Reproducible Analytical Pipelines (RAP) principles.
- culture – create a culture of robust analysis where the RAP principles are the default for analysis, leaders engage with managing analysis as software, and users of analysis understand why this is important.
We will work with our colleagues and users to deliver these goals.
We aim to deliver the RAP strategy over four years from 2022 to 2026.
Why we are doing this now
Improved efficiency and quality are Civil Service priorities. Government Functions support the delivery of these priorities. They enforce common standards, produce resources that benefit the whole of government, and ensure that projects and programmes are implemented effectively.
We have shown that RAP delivers improved efficiency and quality for analysis. RAP forms a core part of digital transformation and supports the delivery of other government initiatives.
We now have strong foundations to deliver RAP. There is a vibrant cross-government community of practice. The Analysis Function supports it through central guidance, standards and mentoring to build capability.
However, progress is uneven. Some organisations have embedded RAP principles in their work more than others. There are still technology, capability and cultural barriers to implementing RAP everywhere.
We must capitalise on the foundations we have built to deliver our vision across the Civil Service.
This strategy underpins other government strategies and guidance by making it easier to deliver them. Here, we outline those strategies and guidance.
The Analysis Functional Standard sets out principles to be achieved by all analysts across government. RAP helps analysts to deliver high quality analysis, transparent results, value for money, continuous improvement and ethical research.
RAP supports the work of the Analysis Function to improve the efficiency of government as recommended in the Lord Maude review.
The strategy delivers the missions of the 2022 to 2025 roadmap for digital and data for the Analysis Function by helping to improve data for decision makers, build digital skills for analysts and ease systemic challenges to digital transformation for analysts.
The Aqua Book helps analysts to deliver high quality analysis. Applying RAP principles helps analysts comply with Aqua Book requirements and demonstrate this.
The Code of Practice for Statistics sets the standards that producers of official statistics should commit to. The Office for Statistics Regulation endorses RAP as a method which helps analysts to follow the Code.
Mission three of the National Data Strategy is to transform government’s use of data to drive efficiency and deliver public services. Mission three is supported by five objectives. RAP is a key enabler for all of them.
The Goldacre review for better health research emphasises that RAP is foundational for good research. The review says, “RAPs reflect a modern, open, collaborative and software-driven approach to delivering high quality analytics that are reproducible, re-usable, auditable, efficient, high quality, and more likely to be free from error”.
The Data Quality Framework sets out guidance to help organisations use their data more effectively. RAP practices improve quality throughout the data lifecycle.
The Data Ethics Framework guides proper and responsible data use in government and the wider public sector. RAP supports the framework by improving transparency and quality. It is a recommended practice in the framework.
The Service Standard helps teams create and run great public services. RAP helps analysts deliver its requirements for open sourcing analysis, choosing the best tools, and using open standards and patterns. The Service Standard mandates the use of multidisciplinary teams – as outlined in this strategy.
The Technology Code of Practice criteria help government design, build and buy technology. RAP helps analyst teams comply with these criteria through open-source tools, open-source products, and enabling sharing and reuse.
The Turing Way is guidance for academic researchers to make reproducible, ethical and collaborative data science. RAP aligns with the practices it sets out.
Why Reproducible Analytical Pipelines matter
Analysis is reproducible when it reliably returns the same results using the same methods from the same data. The Aqua Book emphasises that reproducibility is a core part of quality analysis. Good peer review and audit rely on reproducibility. The best way to improve reproducibility is to write analysis as code with no manual steps.
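As an illustration only, the definition above can be sketched in a few lines of Python. The function name, the data and the seed value are hypothetical and are not taken from any government pipeline. The sketch assumes an analysis with a random element (a bootstrap estimate of a mean) and shows that fixing the seed in code, rather than relying on manual steps, gives identical results on every run.

```python
import random
import statistics


def bootstrap_mean(values, n_resamples=200, seed=2022):
    """Estimate the mean by resampling with replacement.

    The seed is an explicit parameter (a hypothetical choice for this
    sketch), so the same data and method always give the same result.
    """
    rng = random.Random(seed)  # local generator: no hidden global state
    resample_means = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))
        resample_means.append(statistics.fmean(resample))
    return statistics.fmean(resample_means)


# Illustrative data only.
data = [4.0, 7.5, 3.2, 9.1, 5.6]

first_run = bootstrap_mean(data)
second_run = bootstrap_mean(data)

# Same data, same method, same seed: the two runs agree exactly.
assert first_run == second_run
```

A reviewer or auditor who has the same data and code can run the same check, which is what makes peer review of the result practical.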
Reproducible Analytical Pipelines (RAP) use software engineering practices to improve reproducibility in analysis. RAP principles and practices help us to deliver sustainable analysis that is fit for purpose. Although RAP focuses on reproducibility, it enables more than reproducibility alone.
Analysts already create bespoke software to conduct analysis. This software often includes spreadsheets or processes that use proprietary analysis tools. Analysis made this way often needs extensive manual intervention, making it laborious to reproduce research. Analysis that is hard to reproduce is also difficult to reuse, review or audit. Poor reviews and audits can lead to, or hide, quality risks.
When we build analysis as a software product, we can draw on software engineering best practice with these immediate benefits:
- higher quality analysis and reduced risk, with quality assurance built into all parts of the process
- more efficient and reliable processes
- improved transparency and greater confidence in the analysis from producers, managers, and users
- improved business continuity and knowledge management
- analysis that is easier to adapt and reuse
Improving analytical software and automating manual steps creates more efficient processes. These processes are easier to adapt to answer related questions or to respond to updated input data. Teams have more time to focus on value-adding activities such as data visualisation, quality assurance, interpretation, user engagement and developing novel analysis.
Learning lessons from software engineering can speed up development for new or one-off analysis. Analysis that is quicker to produce is quicker to reproduce.
Figure 1: Reproducible practices make developing analysis faster
The infographic shows two example pipelines, depicted as arrows with different stages. One pipeline is made with reproducible practices and one without. It shows that although working with reproducible practices can take more time at the start, it is faster over the whole development process. This is because modifying and re-doing analysis in light of review is much quicker.
Automated processes can make multiple products to meet different user needs at little or no added cost. For instance, easy-read reports, data delivery through application programming interfaces (APIs), interactive dashboards and machine-readable CSVs can all be made at the same time as parts of one pipeline. Finally, developing software skills enables the use of innovative analytical methods such as machine learning. All forms of quantitative analysis can benefit from RAP.
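The idea of one pipeline producing several products can be sketched as follows. This is a minimal example under stated assumptions: the function names, the column names and the figures are hypothetical, and a real pipeline would write to files or a publishing platform rather than return strings.

```python
import csv
import io
import json


def to_csv(rows):
    """Machine-readable CSV output."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["region", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """JSON output of the kind an API might serve."""
    return json.dumps(rows, indent=2)


def to_summary(rows):
    """A one-line plain-language summary for a report."""
    total = sum(row["value"] for row in rows)
    return f"Total across {len(rows)} regions: {total}"


def run_pipeline(rows):
    """Produce every output from the same analysed results in one step."""
    return {
        "csv": to_csv(rows),
        "json": to_json(rows),
        "summary": to_summary(rows),
    }


# Illustrative results only.
results = [{"region": "North", "value": 12}, {"region": "South", "value": 4}]
outputs = run_pipeline(results)
print(outputs["summary"])
```

Because every product is derived from the same results object, the outputs cannot drift apart, and adding a new format means adding one function rather than a new manual process.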
Where we are now
We spoke with producers of analysis across government to understand how they are implementing RAP. We conducted a survey of government analysts to understand their coding practices. Our evidence showed that a lack of prioritisation, tools, and capability hinders the digital transformation of analysis.
Progress towards Reproducible Analytical Pipelines
Since the Department for Culture, Media and Sport and the Government Digital Service developed RAP in 2017, we have seen many excellent examples of how adopting RAP principles delivers better analysis. Analysts have used RAP to bring better quality, transparency and efficiency to statistical production workflows, single-use policy analysis, public dashboards and APIs, and machine learning models.
Some organisations have made considerable progress towards adopting RAP principles. Some have built teams to promote the rollout of RAP principles.
The award-winning RAP Champions practitioner network is an active and vibrant community. It is responsible for driving forward adoption of RAP principles. It has a thriving online presence where members support each other by discussing challenges and offering help. It meets several times a year to share successes and solutions. RAP Champions have implemented RAP in their organisations, set standards like the Minimum Viable Product and developed software tools for use by analysts across government.
The central Analysis Function team supports the RAP Champions. The team delivers training and consultancy, software products, and guidance to help analysts adopt RAP. They wrote the Quality Assurance of Code for Analysis and Research manual. They maintain tools like the govcookiecutter template to help analysts deliver high-quality analysis as code. They break down common barriers by sharing success stories from across government. Their consultancy service has delivered high-profile RAP products. Finally, they work with departments and run an annual survey to understand gaps in capability. They have developed courses and learning pathways to tackle these gaps.
Analysts from many organisations have started open sourcing their analysis using RAP to enable review and contribution from the public. The UK Coronavirus Dashboard and Explore Education Statistics teams publish their code and methods. The Office for National Statistics’ Centre for Crime and Justice and Public Health Scotland publish the code behind their statistics.
The Office for Statistics Regulation (OSR) recommends RAP but notes that no organisation has adopted RAP principles by default for all analysis. They explain that:
“…there are often common barriers for teams and organisations wishing to implement RAP”
Reproducible Analytical Pipelines: Overcoming barriers to adoption, Office for Statistics Regulation, 2021
They add that access to tools and training and having the time and support to carry out development work are significant barriers. We summarise these as tools, capability, and culture.
Most analysts do not have all the tools they need to implement RAP. Analysts also feel they cannot influence decisions about which tools they can use. Without access to the right tools, they cannot develop the capability they need.
Most organisations reported challenges in getting the tools needed to enable RAP. Many have a long and difficult process to have software installed. This often results in waiting months to have access to the necessary tools. Difficulties getting the right tools can also happen because of security or information assurance concerns. When tools are made available, they are often old versions or have restricted functionality because of this.
Legacy platforms can make doing good analysis more difficult because they do not support the tools for RAP. Often, RAP requirements were not part of the system specification. Legacy platforms can be both expensive to move away from and expensive to change.
Analysts are unsure about how to open source their code and tools. As there is no guidance on how to publish code safely, they are risk averse and do not do so. When they write code that has wider application, they do not share it to maximise reuse. This leads to inefficiency and duplication of effort because analysts cannot find similar earlier work. Teams in the same organisation might duplicate code deriving the same variables, deploying the same methods, or producing similar outputs.
Analysts said that frequently changing input data made writing pipelines difficult and prevented them from reproducing their work. When source data are not version controlled, they cannot reproduce earlier workflows. When source data change format and shape, analysts must rewrite code or configurations to ingest the new data. Many analysts find it difficult and cumbersome to adapt their analysis to upstream changes because they do not know how to write modular code. Many are also unaware of tidy data principles. Adopting RAP would help to mitigate both problems.
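One way modular code and tidy data can reduce the cost of upstream changes is sketched below. The column names, the expected layout and the function names are assumptions made for this example. The ingest step is kept in one small function that checks the incoming columns, and a second function reshapes the data into a tidy layout with one observation per row, so later analysis steps do not depend on the shape of the source file.

```python
import csv
import io

# Hypothetical layout of the incoming file: one column per year.
EXPECTED_COLUMNS = {"region", "2021", "2022"}


def read_source(text):
    """Read the source data and fail early if its format has changed."""
    reader = csv.DictReader(io.StringIO(text))
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Source data is missing columns: {sorted(missing)}")
    return list(reader)


def to_tidy(wide_rows, id_column="region"):
    """Reshape wide data into tidy rows: one region, year and value per row."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column != id_column:
                tidy.append(
                    {
                        id_column: row[id_column],
                        "year": int(column),
                        "value": float(value),
                    }
                )
    return tidy


# Illustrative data only.
source = "region,2021,2022\nNorth,10,14\nSouth,4,5\n"
tidy_rows = to_tidy(read_source(source))
```

If the supplier changes the file format, only `read_source` and `to_tidy` need to change, and the explicit column check reports the problem at ingest rather than letting it surface as a wrong number later in the analysis.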
Capability in teams
Teams need to develop the right capability before they can adopt RAP principles. This takes time. One organisation noted that it takes longer to produce a RAP product than a conventional one while analysts learn new skills. But successful RAP projects from across government have shown that the costs of building capability are quickly recouped, and benefits realised, when existing processes are manual, frequent, or high risk.
Most organisations noted that a lack of existing skills and concern about keeping the capability to use RAP products when staff move have held them back from adopting RAP more widely. Organisations said it could be difficult to hire graduates with the right programming skills. At the same time, they cannot recruit analysts who can implement RAP from other public sector organisations as the capability is not yet widespread enough. When organisations cannot recruit the right capability, they are reluctant to invest in RAP products they might not be able to sustain.
Managers prioritise delivery over development. We found that often teams felt they did not have support to develop the capability they need to implement RAP principles. Some could not find enough mentors with the right skills to improve capability. Others could not take enough time out from delivery work to build up skills. Many analysts were unaware of guidance for writing good code for analysis.
Many analysts tasked with delivering RAP are unaware of the skills they will need to implement it fully. Therefore, they can find it difficult to choose the right training. They do not know about existing communities of practice or how to engage with them.
RAP Champions do not have dedicated time to support analysts in their organisations and must carve out time from their day jobs. Many feel unable to meet the level of demand for their support.
Leaders and analysts do not understand the full range of benefits of RAP and therefore do not prioritise implementing it fully. When they implement RAP, they often focus on process efficiencies. This leads them to implement parts of RAP (like writing analysis as code) but omit rigorous version control and testing. Partial adoption delivers some efficiencies but means leaders do not see how the components of RAP work together to improve quality, efficiency and transparency beyond what any one component can deliver. The most powerful benefits of RAP, like improved quality and efficiency through reuse, come from applying all parts of the methodology.
Leaders and analysts see RAP as too complex and burdensome for ad-hoc analysis. They underestimate the risks to quality when teams conduct such analysis without using RAP principles. Therefore, analysts do not prioritise adoption of RAP for ad-hoc or one-off analysis.
Users are unwilling to see business-as-usual delivery reduced to enable teams to adopt RAP principles. They are unaware that using RAP principles can improve quality and delivery.
Many teams are monocultural and do not have the right mix of skills to deliver great analytics products. Multidisciplinary analysis teams have been rare. Digital and data teams are seen as service providers rather than partners in the delivery of analytical products. Therefore, analysis products are not made as valuable as they could be.
Most organisations mentioned conflicting prioritisation as a barrier to RAP. This became obvious during the coronavirus (COVID-19) pandemic response when limited analytical resource was in very high demand. It meant some teams had to delay building RAPs because their analysts faced intense workloads and had no time for development. Some teams had the right capability but technical debt from the early days of the pandemic made transformation difficult.
Goal one: the right tools
Analysts have access to the right tools and platforms to make their work reproducible, high quality and efficient.
Why having the right tools matters
We must supply the tools analysts need to adopt RAP principles.
We must help analysts to re-use each other’s work.
Our technology platforms for analysis must enable reproducible analysis.
How this goal will address the barriers to Reproducible Analytical Pipelines
Analysts will have access to the right tools so that they can build capability and implement Reproducible Analytical Pipelines (RAP).
Analyst leaders will work with digital and security teams so that the platforms analysts use support reproducible analysis.
Analysts will open source their products and tools wherever possible in line with Central Digital and Data Office guidance. Open sourcing analysis helps users to understand what was done.
Analysts will have the tools they need to open source their analysis. They will have clear policies and guidance to open source the tools and products they make so that other analysts can reuse and learn from them.
Analysts will have access to open-source tools developed to aid analysis.
How we will deliver the right tools
The Analysis Function will:
- write guidance to help analysts understand how to open source their analysis safely and securely
- maintain tools, like govcookiecutter, to help analysts share their code
- share case studies that show how departments have overcome challenges and delivered the right tools
Analyst leaders will:
- work with security and IT teams to give analysts access to the right tools
- work with security and IT teams to develop platforms that are easy for analysts to access, flexible and responsive to the needs of analysts
- work with security, IT, and data teams to make sure that the tools data analysts need are available in the right place and are easy to access
- use open-source tools when appropriate
- open source their code
- work with data engineers and architects to make sure that source data are versioned and stored so that analysis can be reproduced
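The last action, versioning source data so that analysis can be reproduced, could take many forms. The sketch below shows one simple possibility and is not a description of any existing government platform: it records a checksum for each named data extract in a manifest and later confirms that the data being analysed are the version that was recorded. The manifest structure, function names and extract name are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest that identifies this exact version of the data."""
    return hashlib.sha256(data).hexdigest()


def register_version(manifest, name, data):
    """Record the checksum of a named data extract in the manifest."""
    manifest[name] = checksum(data)
    return manifest


def verify_version(manifest, name, data):
    """Check that the data match the version recorded under this name."""
    return manifest.get(name) == checksum(data)


# Illustrative extract only.
manifest = {}
extract_v1 = b"region,value\nNorth,12\nSouth,4\n"
register_version(manifest, "survey_2022_v1", extract_v1)

unchanged = verify_version(manifest, "survey_2022_v1", extract_v1)
altered = verify_version(manifest, "survey_2022_v1", extract_v1 + b"East,7\n")
print(unchanged, altered)
```

Storing the manifest under version control alongside the analysis code would let a reviewer confirm which data a published result was based on.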
Goal two: the right capability
Analysts have capability and are supported to produce analysis in line with the Reproducible Analytical Pipelines (RAP) principles. Specialist analyst programmers are embedded in teams.
Why achieving the right capability matters
Analysts must have the right skills to implement high quality RAP.
Managers and leaders of analysis must be confident managing analytical software.
Organisations must be able to recruit the right people to develop and use RAP products.
How this goal will address the barriers to RAP
Analytical leaders will understand how and when to work with digital teams so they can deliver valuable analysis in the right format to users. Analysts will have the skills to collaborate with digital colleagues.
Analytical leaders and the Analysis Function will work with education providers and professional bodies so that we recruit the right skills into the Analysis Function.
Analytical leaders will promote a “RAP by default” approach so that we encourage more analysts to gain the capability to implement RAP. Having more analysts in the public sector with these skills will make it easier for organisations to recruit analysts who are familiar with this way of working. It will make it easier for other analysts to gain those skills through mentoring and peer review.
Managers will understand RAP so that they can deliver change in their teams. They will manage analysis as software so that analysis is efficient and high quality.
Managers will build in time to allow analysts to develop their capability. From our work with teams, we believe most analysts can learn to do RAP if they work at least one day per week on a RAP project with a mentor for three months.
Analysts and their managers will be familiar with programming good practice so they can write high quality analysis.
Analyst teams will have the right skills so they can implement RAP. All analysts in teams that implement RAP will be able to read and write good code. Some analysts will have stronger software engineering skills so that teams can produce high quality software. Large analytical groups may have a dedicated research software engineering function to deliver analytical products.
How we will deliver the right capability
The Analysis Function will:
- work with analyst leaders and RAP champions to help departments assess what they need to do and deliver strategic plans to implement this strategy
- include programming skills in analytical career frameworks, learning pathways and professional standards
- work with professional bodies and higher education institutions to encourage inclusion of RAP principles in curricula
- deliver training to teams across government
- find case studies and share best practice
- support teams when they are developing products
- promote existing training materials and learning pathways
- procure and develop training materials to help managers
- continue to support a RAP community of practice
Analyst leaders will:
- ensure their analysts build RAP learning and development time into work plans
- help their teams to work with DDaT professionals to share knowledge
Analyst managers will:
- build extra time into projects to adopt new skills and practices where appropriate
- learn the skills they need to manage software
The RAP Champions community will:
- deliver mentoring and peer review schemes in their organisations and share good practice across government
- learn the skills they need to implement RAP principles
Goal three: the right culture
Analysts carry out analysis in line with the Reproducible Analytical Pipelines (RAP) principles by default. Leaders give them the time to do so. Leaders expect analysis to be conducted using RAP principles where appropriate, and users see the benefits of this approach.
Why it matters to achieve the right culture
All our analysis must use RAP principles by default to make it as easy as possible to achieve excellent quality and maximise efficiency and transparency. Analysts must feel encouraged and supported to develop analysis products with RAP principles.
We must use RAP principles to deliver high-value analysis to users. Our culture must demand high-quality analysis. Our leaders and users must encourage continuous improvement.
We must work in multidisciplinary teams to deliver the most valuable analysis.
How this goal will address the barriers to RAP
Leaders will encourage their teams to adopt RAP principles so that we improve quality, business continuity, and resilience.
Leaders will promote “RAP by default” so that analysts feel empowered to adopt RAP.
Analyst teams will be multidisciplinary and will work with digital teams so that they can deliver high quality and valuable analysis products through application programming interfaces (APIs), interactive visualisations and regularly updating dashboards.
Analysts will recognise the ways that RAP principles help them to do analysis better so that they are motivated to build their capability. When they share successful RAP projects, they will encourage others to adopt RAP.
Users will understand how RAP delivers better quality, transparency and resilience so they will be informed customers and drive development of RAP.
How we will deliver the right culture
The Analysis Function will:
- manage a RAP accelerator programme to connect mentors with mentees across government
- choose leaders responsible for promoting RAP and monitoring progress towards this strategy within organisations
- form multidisciplinary teams that have the skills to make great analytical products, with some members specialised in developing analysis as software
Analyst leaders will:
- promote a “RAP by default” approach for all appropriate analysis
- write and implement strategic plans to develop new analyses with RAP principles, and to redevelop existing products with RAP principles
- lead their RAP champions to advise analysis teams on how to implement RAP
- help teams to incorporate RAP development into workplans
- identify the most valuable projects by looking at how much capability the team already has and how risky and time-consuming the existing process is
RAP champions, leaders and the Analysis Function will:
- communicate the benefits of RAP to analysts, managers, and users
RAP champions will:
- support leaders in their organisation in delivering this strategy by acting as mentors, advocates and reviewers
- manage peer review schemes in their organisation to facilitate mutual learning and quality assurance
Analyst managers will:
- evaluate RAP projects within organisations to understand and demonstrate the benefits of RAP
- mandate that their teams use RAP principles whenever possible
- engage with users of their analysis to demonstrate the value of RAP principles and build motivation for development
- deliver their analysis using RAP
How we will deliver this strategy
By December 2022, organisations must set out how they will meet the goals of the strategy and when they expect to deliver outcomes. They must set out what success looks like for their organisation and develop local strategic plans to implement the actions in this strategy against tools, capability, and culture.
The tools, skills and culture of analysts vary. So do the types of analysis they do. Some organisations have more RAP capability or access to more advanced tools than others. Some actions are not relevant to all organisations. We cannot outline a system-wide plan that will meet all needs, but we suggest that organisations consider the following steps for their local plans.
- Nominate the senior sponsor for RAP delivery. This will usually be the Departmental Director for Analysis or their nominated deputy.
- Nominate leaders and RAP champions to develop local strategic plans tackling the three priority areas of tools, capability and culture. They will assess the current capability of the organisation to deploy RAP and develop an implementation plan drawing on actions from this strategy that are outstanding.
- Get the minimum tools needed, if the organisation does not have them yet.
- Create training pathways for analysts, when tools are in place. We advocate just-in-time learning where analysts only receive training as they begin implementing these new capabilities in their work. Organisations will be able to draw upon the training pathway for RAP developed by the Analysis Function, but will need to tailor some training to the analytical environments the teams work in.
Once tools and training are in place, projects will be in a better position to show value to stakeholders quickly.
The Analysis Function will help organisations to deliver the strategy by continuing to promote and build capability for RAP through its Develop and Deliver Services workstream, sponsored by Departmental Directors of Analysis. It will focus on training the teams responsible for building RAP capability and culture within their respective organisations.
Monitoring and measuring success
Organisations should publish their plans to deliver the strategy and its goals. Organisational plans must include local evaluation frameworks to assess progress against the three strategic goals.
The Analysis Function will develop and publish a central evaluation framework once departmental plans are in place in early 2023. It will use this framework to monitor progress towards delivery of the strategy.
The Analysis Function team will continue to collate light-touch self-assessments from departments each year. They will also collect reports on progress against departmental strategic plans. They will deliver a yearly report to the Analysis Function Board on progress towards delivering the strategy. They will identify remaining barriers.
The Analysis Function will continue to undertake the annual Coding in Analysis and Research Survey to measure the capability of analysts to deliver RAP. The survey has already supplied great insight into where government has strong software expertise in analysis, and where further development is possible. Organisations should encourage their analysts to complete the survey.
Reproducible Analytical Pipelines principles
Reproducible Analytical Pipelines (RAPs) are automated analytical processes.
They incorporate elements of software engineering best practice to ensure that the pipelines are reproducible, auditable, efficient and high quality.
RAPs increase the efficiency of statistical and analytical processes, delivering value. Reproducibility and auditability increase trust in the statistics. The pipelines are easier to quality assure than manual processes, leading to higher quality.
Statisticians and analysts should look to implement the RAP principles in parts of their processes.
A RAP will:
- improve the quality of the analysis
- increase trust in the analysis by producers, their managers and users
- create a more efficient process
- improve business continuity and knowledge management
To achieve these benefits, at a minimum a RAP must:
- minimise manual steps, for example copy-paste, point-click, or drag-drop operations; where a manual step is necessary, it must be documented as described in the following bullet points
- be built using open-source software, which is available to anyone, preferably R or Python
- strengthen technical and quality assurance processes with peer review to ensure that the process is reproducible and that the requirements described in the following bullet points have been met
- guarantee an audit trail using version control software, preferably Git
- be open to anyone – this is most easily achieved using file and code sharing platforms
- follow existing good practice for quality assurance – guidance set by departments or organisations, and by Best Practice and Impact, and Data Quality Hub for the Analysis Function and Government Statistical Service
- contain well-commented code and have documentation embedded and version controlled within the product, rather than saved elsewhere
There may be restrictions, such as access to databases, which stop analysis producers building a RAP for their full end-to-end process. In this case, the previously described requirements apply to the selected part of the process.
There may be restrictions, such as sensitive or confidential content, which stop analysis producers from sharing their RAP publicly. In this case, it may be possible to share the RAP within a department or organisation instead.
Where possible, a RAP should be built collaboratively. This improves the quality of the final product and supports knowledge sharing.
There is no specific tool that is required to build a RAP, but both R and Python provide the power and flexibility to carry out end-to-end analytical processes, from data source to final presentation.
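As an illustration of an end-to-end process written in code, the following minimal sketch in Python reads data, calculates a summary and produces an output table with no manual steps. The data, column names and function names are hypothetical and chosen only for this example. A real pipeline would read from the organisation's own data sources and would usually be more involved.

```python
import csv
import io
from statistics import mean

# Hypothetical input data. A real pipeline would read from a file or database.
RAW_DATA = """region,year,value
North,2021,10.0
North,2022,12.0
South,2021,8.0
South,2022,9.0
"""


def load_rows(source):
    """Read CSV text into a list of dictionaries with typed values."""
    reader = csv.DictReader(source)
    return [
        {"region": row["region"], "year": int(row["year"]), "value": float(row["value"])}
        for row in reader
    ]


def summarise(rows):
    """Return the mean value for each region, ordered by region name."""
    regions = sorted({row["region"] for row in rows})
    return {
        region: mean(row["value"] for row in rows if row["region"] == region)
        for region in regions
    }


def render_table(summary):
    """Format the summary as CSV text, ready to save or publish."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["region", "mean_value"])
    for region, value in summary.items():
        writer.writerow([region, f"{value:.1f}"])
    return output.getvalue()


def run_pipeline(source):
    """Run every step in order, from data source to final table."""
    return render_table(summarise(load_rows(source)))


if __name__ == "__main__":
    print(run_pipeline(io.StringIO(RAW_DATA)))
```

Because each step is a function, the whole process can be re-run with one command when the data changes, and each step can be reviewed and tested separately.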
Once the minimum RAP has been implemented, statisticians and analysts should attempt to further develop their pipeline using:
- functions or code modularity
- unit testing of functions
- error handling for functions
- documentation of functions
- code style
- input data validation
- logging of data and the analysis
- continuous integration
- dependency management
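Several of these practices can be shown together in a short example. The sketch below, in Python, is one possible way to combine a documented function, input data validation, error handling, logging and unit tests. The function names, the logger name and the validation rules are assumptions made for illustration, not requirements of this strategy. Code style, continuous integration and dependency management are not covered here.

```python
import logging
import unittest

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rap_example")  # hypothetical logger name


def validate_counts(counts):
    """Check that the input is a non-empty list of non-negative whole numbers.

    Raises:
        ValueError: if the list is empty or contains an invalid value.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    for value in counts:
        # bool is a subclass of int in Python, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"invalid count: {value!r}")


def percentage_shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal place."""
    validate_counts(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("total of counts must be greater than zero")
    logger.info("Calculating shares for %d counts, total %d", len(counts), total)
    return [round(100 * value / total, 1) for value in counts]


class TestPercentageShares(unittest.TestCase):
    """Unit tests for the expected result and for the error handling."""

    def test_shares(self):
        self.assertEqual(percentage_shares([1, 1, 2]), [25.0, 25.0, 50.0])

    def test_rejects_negative(self):
        with self.assertRaises(ValueError):
            percentage_shares([1, -1])

    def test_rejects_empty(self):
        with self.assertRaises(ValueError):
            percentage_shares([])
```

If the code is saved in a file, the tests can be run with Python's built-in test runner, for example `python -m unittest` followed by the file name.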
Platforms for Reproducible Analysis
This list has been developed in consultation with data scientists, data engineers and analysts across government.
Members of the government analytical community agree that the following tools allow for better ways of working for most analysts. Security and IT teams must work with analysts to deliver these platforms so that analysts have the right level of control over the environments to conduct analysis effectively.
We have divided this list into software that analysts need to fulfil the minimum Reproducible Analytical Pipelines criteria (see the Reproducible Analytical Pipelines principles section) and software that analytical areas can use to develop further.
For Reproducible Analytical Pipelines that meet the minimum criteria:
- version control software, that is, Git
- package and environment managers for each of the available languages
- packages and libraries for open-source programming languages, either through direct access to well-known libraries, for example, npm, PyPI, CRAN, or through a proxy repository system, for example, Artifactory
- individual storage, for example, home directory
- shared storage, for example, S3, cloud storage, with fine-grained access control, accessible programmatically
- integrated development environments suitable for the available languages – RStudio for R, Visual Studio Code for Python and so on
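Package and environment managers help analysts record which software versions an analysis depends on. As a rough illustration only, and not a replacement for a proper package or environment manager, the sketch below uses Python's standard library to list the packages installed in the current environment together with their versions. The function name environment_snapshot is hypothetical.

```python
import platform
from importlib import metadata


def environment_snapshot():
    """Return the Python version followed by sorted 'name==version' lines."""
    packages = sorted(
        {
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip entries with no recorded name
        },
        key=str.lower,
    )
    return [f"# python {platform.python_version()}"] + packages


if __name__ == "__main__":
    print("\n".join(environment_snapshot()))
```

Saving this output alongside the results of an analysis gives reviewers a record of the environment the analysis ran in.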
For further development:
- source control platforms, for example, GitHub, GitLab or Bitbucket
- continuous integration tools, for example, GitHub Actions, GitLab CI, Travis CI, Jenkins, Concourse
- make-like tools for reproducible workflows, for example, GNU Make
- relational database management software, for example, PostgreSQL, that is available to users
- orchestration systems for pipelines and workflows, for example, Airflow, NiFi
- internal-facing servers to host HTML-rendered documentation
- external-facing servers with authentication to host end-products such as web applications or APIs
- big data tools, for example, Presto, Athena, Spark or Dask, or access to large memory capability
- reproducible infrastructure and containers, for example, Docker
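To show the idea behind make-like tools, the sketch below re-runs a pipeline step only when its output is missing or older than one of its inputs. It is a simplified illustration in Python, not a description of how any particular tool is implemented, and the function names needs_rebuild and run_step are hypothetical.

```python
import os


def needs_rebuild(target, sources):
    """Return True if the target file is missing or older than any source file."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_time for source in sources)


def run_step(target, sources, action):
    """Call action() only when the target is out of date. Return True if it ran."""
    if needs_rebuild(target, sources):
        action()
        return True
    return False
```

A pipeline built from steps like this skips work that is already up to date, which keeps re-runs quick while still letting the full process be reproduced from the original inputs.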