Tips for urgent quality assurance of ad-hoc statistical analysis
This guidance sits alongside “Tips for urgent quality assurance of data”. This guidance focuses on the analysis, while that guidance looks at the data used in the analysis.
Publication date: 1 April 2020
Owner: Data Quality Hub
Who this is for: Users and producers of statistics
There are well-established and documented Quality Assurance (QA) processes in place for GSS statistical outputs. However, there is often a need for additional ad-hoc analysis of GSS data, with a quick turnaround, to answer a specific question. Even when time and resources are limited, the analysis still requires good QA to avoid costly mistakes. The GSS Quality Centre have developed some top tips to improve the QA of ad-hoc analysis across the GSS.
If the analysis you are undertaking is a “business critical model” (one of our most important models, involving large amounts of money and high sensitivity, i.e. influential and widely used), then you should use the Aqua Book and its related tools instead of the top tips. The Aqua Book and related tools include standards for the QA of government models.
All other analysis that does not feed into a GSS business critical model or statistical output should follow the top tips. Use common sense and professional judgement to decide how much detail you need. If the analysis is a simple data extraction or presentation of simple analyses, you may not need to undertake all the steps.
Tip 1: Run pre-analysis checks to ensure you answer the right question and meet customer needs
Before undertaking any new piece of analysis, check whether the same or similar analysis has been done before. If it has then it may be possible to adjust the existing work to answer a new question.
Double check exactly what the customer is asking for. When undertaking analysis to short timescales, you may produce the wrong output if you don’t engage with your customer early on. In a worst-case scenario this could lead to the wrong analysis informing policy.
Undertake a “back of the envelope” calculation at the start to get a ballpark estimate of the result you expect from your analysis.
Tip 2: Check the data source used in your analysis
Does the analysis use the correct and most up-to-date data?
Check that the data used in the analysis matches the source data.
Do you understand how your source data was collected and any issues that may affect the quality of your analysis? This is particularly important when bringing unfamiliar new data sources into your analysis.
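As an illustration of checking that the data used in the analysis matches the source data, the sketch below compares the row count and a column total between two extracts. It is a minimal example under stated assumptions: the CSV contents, the column name `count` and the helper `summarise` are hypothetical, and real checks would usually cover more columns.

```python
import csv
import io


def summarise(csv_text, value_column):
    """Return (row count, column total) for one numeric column of a CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = sum(float(row[value_column]) for row in rows)
    return len(rows), total


# Hypothetical extracts: the source data and the copy used in the analysis.
source_csv = "region,count\nNorth,120\nSouth,95\nEast,143\n"
working_csv = "region,count\nNorth,120\nSouth,95\nEast,134\n"  # digits transposed

source_summary = summarise(source_csv, "count")
working_summary = summarise(working_csv, "count")

# Any difference in row count or total signals that the copy has drifted.
if source_summary != working_summary:
    print("Mismatch between source and working data:")
    print("  source :", source_summary)
    print("  working:", working_summary)
```

Matching counts and totals do not prove the two extracts are identical, but a mismatch is a quick signal that something was lost or altered on the way into the analysis.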
Tip 3: Check the processes used in your analysis
Where capability exists, it is easier to assure analysis by implementing Reproducible Analytical Pipelines (RAPs). Support on implementing RAP is available from the Best Practice and Impact (BPI) division and the RAP champions network.
For Excel data, always check that formulas feed through to the outputs correctly.
If applicable, carefully check through any code used to produce the outputs of the analysis.
If time allows, repeat the analysis from scratch in an alternative way (or using a different software package) and check you get the same output.
Check units (thousands, millions, billions) are correct for figures used throughout the analysis.
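Two of the checks above, repeating a calculation by an alternative route and confirming units, can be sketched in a few lines. This is only an illustration: the figures are hypothetical, and `UNIT_MULTIPLIERS` and `convert` are names invented for the example.

```python
import math
import statistics

# Hypothetical figures, recorded in thousands of pounds.
spend_thousands = [1250.0, 980.5, 1410.25, 1120.0]

# Route 1: library routine.
mean_a = statistics.fmean(spend_thousands)

# Route 2: independent calculation from first principles.
mean_b = sum(spend_thousands) / len(spend_thousands)

# The two routes should agree to within floating-point tolerance.
if not math.isclose(mean_a, mean_b, rel_tol=1e-9):
    raise ValueError(f"Methods disagree: {mean_a} vs {mean_b}")

# Explicit multipliers make unit conversions visible and checkable.
UNIT_MULTIPLIERS = {
    "units": 1,
    "thousands": 1_000,
    "millions": 1_000_000,
    "billions": 1_000_000_000,
}


def convert(value, from_unit, to_unit):
    """Convert a figure between the units listed in UNIT_MULTIPLIERS."""
    return value * UNIT_MULTIPLIERS[from_unit] / UNIT_MULTIPLIERS[to_unit]


mean_millions = convert(mean_a, "thousands", "millions")
print(f"Mean spend: {mean_a:.2f} thousand = {mean_millions:.4f} million")
```

Keeping the unit in the variable name (here `spend_thousands`) is one simple way to make a thousands-versus-millions slip easier to spot in review.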
Tip 4: Sense check the outputs of your analysis
Check that the figures look right and that they fall into sensible ranges. If you can, check against similar published numbers and your pre-analysis estimate to ensure you’re in the right ballpark. If the results don’t fit with your expectations, can you explain why?
Compare figures from your analysis with any similar earlier analysis.
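A sense check of this kind can be automated in a small way. The sketch below flags a figure that falls outside a plausible range or that differs from the pre-analysis estimate by more than a chosen tolerance. The function name `sense_check`, the 25% tolerance and the example values are assumptions made for illustration, and the comparison assumes the estimate is not zero.

```python
def sense_check(value, expected, lower, upper, rel_tol=0.25):
    """Return a list of warnings; an empty list means the figure passed."""
    warnings = []
    # Check 1: the figure should sit within a plausible range.
    if not lower <= value <= upper:
        warnings.append(
            f"{value} is outside the plausible range {lower} to {upper}"
        )
    # Check 2: the figure should be close to the pre-analysis estimate.
    if abs(value - expected) > rel_tol * abs(expected):
        warnings.append(
            f"{value} differs from the estimate {expected} "
            f"by more than {rel_tol:.0%}"
        )
    return warnings


# Hypothetical example: a proportion expected to be about 0.6.
ok = sense_check(0.62, expected=0.6, lower=0.0, upper=1.0)
suspect = sense_check(62.0, expected=0.6, lower=0.0, upper=1.0)

for message in suspect:
    print("Check needed:", message)
```

In the second call the figure looks like a percentage supplied where a proportion was expected, which is the kind of slip a range check catches quickly. A warning is a prompt to investigate, not proof of an error: a surprising result may still be correct if you can explain it.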
Tip 5: Always get sign-off and use peer review
All ad-hoc analysis should be peer reviewed by someone else in the team. This peer review should challenge how the analysis works and check that any assumptions used are realistic. The peer reviewer may also reproduce the analysis in a different way (e.g. using different software) to check they get the same output. This should help to avoid errors in the analysis.
All ad-hoc analysis should be signed off by the head of the team.
Tip 6: Communicate the results effectively to users
For any commentary you provide, check that it is consistent with the figures from the analysis.
Proof-read and spell check commentary to maintain the credibility of the analysis.
For any commentary, have you made it clear how to interpret the results and what limitations there are in the data? Explain the assumptions behind your analysis. Include uncertainty measures such as confidence intervals, and use wording such as “estimates”, so that users understand how much confidence they should have in the analysis.
Engage with your users after you have delivered the analysis to check their understanding and to ensure they use the analysis correctly.
Make ad-hoc analyses easy to find. For example, the Ministry of Housing, Communities and Local Government have created a webpage to publish all of their ad-hoc analyses during the coronavirus (COVID-19) pandemic.
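To illustrate the advice in this tip about reporting uncertainty, the sketch below computes a sample mean with an approximate 95% confidence interval and words the result as an estimate. It assumes a simple random sample and uses a normal approximation. The sample values and the function name `mean_with_ci` are hypothetical, and for a sample this small a t-based interval would be somewhat wider.

```python
import math
import statistics


def mean_with_ci(sample, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean from the sample standard deviation.
    std_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * std_error, mean + z * std_error


# Hypothetical sample values for illustration only.
sample = [52.1, 48.7, 50.3, 49.9, 51.4, 47.8,
          50.6, 49.2, 52.8, 48.5, 50.1, 51.0]

mean, lower, upper = mean_with_ci(sample)
print(
    f"Estimated mean: {mean:.1f} "
    f"(95% confidence interval: {lower:.1f} to {upper:.1f})"
)
```

Presenting the interval next to the point estimate, with the word "estimated", gives users a direct sense of how much weight the figure can bear.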
Tip 7: Document your analysis and the checks undertaken
Clearly record your assumptions and data sources.
Set out the purpose of the analysis clearly and concisely.
Record the name of the person responsible for the analysis and when it was done.
Use a version control system that makes clear what differs between versions and which version is the latest. If you can, use reproducible analysis methods with documentation and version control in GitLab to ensure that others can repeat the process and follow the steps that you took.
Share your code and methods with others e.g. through GitLab.
Record the checks that have been undertaken in a QA checklist.
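One lightweight way to keep such a record alongside the analysis is a small structured log saved with the outputs. The sketch below is only an illustration: the class `QARecord`, its field names and the example entries are hypothetical and do not represent an official GSS checklist format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class QARecord:
    """A minimal record of an ad-hoc analysis and its QA checks."""
    purpose: str
    analyst: str
    date_completed: str
    data_sources: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    checks: dict = field(default_factory=dict)  # check name -> done?

    def outstanding(self):
        """Return the names of checks not yet completed."""
        return [name for name, done in self.checks.items() if not done]


# Hypothetical entries for illustration only.
record = QARecord(
    purpose="Example: estimate of quarterly spend for a policy query",
    analyst="A. N. Analyst",
    date_completed=date(2020, 4, 1).isoformat(),
    data_sources=["Example extract, version 3"],
    assumptions=["Figures are in thousands of pounds"],
    checks={
        "Data matches source": True,
        "Units checked": True,
        "Figures sense checked": True,
        "Peer review completed": False,
        "Signed off by head of team": False,
    },
)

print(json.dumps(asdict(record), indent=2))
print("Outstanding checks:", record.outstanding())
```

Saving the JSON output under version control with the analysis keeps the purpose, the responsible person, the assumptions and the completed checks in one place.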
Tip 8: Things to avoid
When working with Excel, the following practices increase the risk of errors in your analysis and will make it much harder for others to understand and repeat the steps you took.
Do not link your spreadsheet to other Excel spreadsheets: linking to another spreadsheet introduces dependencies that may be outside your control. If the linked workbook is moved or is inaccessible to others because of security restrictions, or if the linked data are altered or deleted, then the results of the analysis can change. In the worst case, the analysis may fail completely. A better solution is to copy and paste the linked data into a tab within the same workbook so that everything needed for the analysis is in one place.
Do not hide rows and columns in Excel spreadsheets: hiding rows and columns makes it harder for others to understand how a spreadsheet works and can lead to unintended copying over hidden sections. It also makes it difficult for a peer reviewer to clearly understand the analysis.