Transforming energy data pipelines
For the last six months the Office for National Statistics (ONS) have been working on digital transformation projects for us in the Energy Statistics division of the Department for Business, Energy and Industrial Strategy (BEIS). These projects concentrated on modernising our old SAS data models into Python pipelines that follow better practice. This blog summarises our shared journey, the benefits it will create, and what we learned from the things that didn’t go as planned.
The problem we were solving
The wonderful energy statisticians at BEIS have collectively inherited data processing methods using a diverse range of tools including Excel, SAS, SPSS, SQL, STATA, R, Access and VBA. There are probably other pieces of software to add to this list too. These data processing methods developed organically over many years, resulting in systems that are hard to maintain, carry risk of error, and are inefficient to run.
Enter the Energy Statistics Improvement Programme (ESIP). Through this programme the Energy Statistics division has been starting to transform our methods of data processing to align with best practice Reproducible Analytical Pipelines (RAPs) as set out in the Government Statistical Service (GSS) Quality Strategy and the new Government Analysis Function’s RAP Strategy. We have decided to concentrate on developing RAPs in Python and SQL, using Git for collaboration and version control. This is because these tools are widely known and supported, and they produce RAPs that are relatively easy to maintain.
We have a small team of data scientists to support this work. Each data scientist in the team will support an average of ten analysts, who collectively author around a hundred published deliverables a year. So, it’s fair to say we have a lot of transformation work to do!
Working with the Office for National Statistics (ONS)
The worst-kept secret of digital transformation programmes is that they are hard. The biggest difficulty was finding time to work with analysts. They are busy producing ongoing publication and policy work, so it can be difficult for them to find the time to build new data pipelines and learn how to use new digital tools.
To help us make faster progress with this project, we secured funding so that we could contract out some specific transformation projects. We approached the ONS to help with these projects.
Usefully for us, the ONS started their digital transformation journey a few years earlier. They have built up a workforce of analysts, developers and business analysts in their Economic Statistics Change team who are experienced in building the types of pipeline we need.
The BEIS and ONS senior managers shared a common vision for using RAP principles. Thanks to their good working relationships we identified that the ONS could spare some of their valuable resource to work for BEIS on a contract basis.
As the contract manager, I found it an easy decision to engage the ONS rather than a private contractor. The ONS had the expertise needed for the task, I could avoid the added paperwork of using a private contractor, and it was logistically easy to get ONS staff set up on our cloud-based analytical system (CBAS). It also helped that the ONS offered excellent value for money!
The work we did
We started with high expectations of completing nine projects. Together with the ONS we did some discovery work on each of these projects, and quickly identified that each project was far more complex than we had originally thought. We realised that completing all nine projects would be impossible.
And so nine projects became three projects. We concentrated on the three projects that involved transforming unwieldy SAS programs into new Python and SQL pipelines. We did some detailed discovery work to help us understand the business rules the data processing would need to follow. This discovery work was led by an ONS business analyst working closely with the lead statisticians who held the knowledge of the data, processes and outputs needed.
Even before the discovery work was complete, we could see that delivering all three projects would take roughly twice the resource we had. So we reprioritised a second and final time. We committed to completing discovery work on all three projects, but we agreed that we would only develop new pipelines for the two smaller projects.
One project started with the 10,000-line SAS code used to prepare statistics on Downstream Oil. This is part of BEIS’ emergency response function. The original 26-stage SAS code was dismantled, restructured, and rewritten as a more logical 10-stage Python pipeline totalling 2,250 lines. As part of this, the ONS identified areas where methodological improvements could be made. BEIS were then able to review these recommendations.
The other project the ONS recoded was on the Renewable Heat Incentive scheme. The data cleaning for this project needed less restructuring and was migrated from 3,000 lines of SAS into 2,300 lines of Python. This new pipeline can be easily run from a single script, rather than needing a sequential list of SAS programs to be executed one by one. The new pipeline also introduces a streamlined approach to managing fixes and amendments to data. This avoids the need to store changes in multiple places in the code. It also avoids the need for analysts to repeat the same manual steps each month.
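To illustrate the idea of keeping data fixes in one place and running everything from a single script, here is a minimal sketch in Python. It is not the code the ONS wrote for BEIS: the record fields, the `Correction` structure, the sample corrections and the stage functions are all hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One amendment to a raw record (hypothetical structure)."""
    record_id: str
    field: str
    new_value: object
    reason: str


# Assumption: every fix lives in this one list. In practice it could
# equally be a CSV file or a SQL table that analysts maintain.
CORRECTIONS = [
    Correction("A002", "capacity_kw", 45.0, "Decimal point misplaced in return"),
    Correction("A003", "technology", "biomass", "Misspelt category"),
]


def apply_corrections(records, corrections):
    """Return corrected copies of the records, leaving the raw data untouched."""
    by_id = {r["record_id"]: dict(r) for r in records}
    for c in corrections:
        if c.record_id not in by_id:
            raise KeyError(f"Correction refers to unknown record {c.record_id}")
        by_id[c.record_id][c.field] = c.new_value
    return list(by_id.values())


def clean(records):
    """Drop records with no reported capacity."""
    return [r for r in records if r["capacity_kw"] is not None]


def summarise(records):
    """Total capacity by technology."""
    totals = {}
    for r in records:
        totals[r["technology"]] = totals.get(r["technology"], 0.0) + r["capacity_kw"]
    return totals


def run_pipeline(raw_records):
    """Single entry point: corrections, cleaning, then summary, in order."""
    corrected = apply_corrections(raw_records, CORRECTIONS)
    return summarise(clean(corrected))


if __name__ == "__main__":
    # Made-up sample data for demonstration only.
    raw = [
        {"record_id": "A001", "technology": "biomass", "capacity_kw": 20.0},
        {"record_id": "A002", "technology": "solar thermal", "capacity_kw": 4500.0},
        {"record_id": "A003", "technology": "biomas", "capacity_kw": 10.0},
        {"record_id": "A004", "technology": "heat pump", "capacity_kw": None},
    ]
    print(run_pipeline(raw))
```

Because each correction carries a reason and is applied by one function, a new amendment means adding one entry to the list rather than editing several scripts, and the raw data stays unchanged for audit.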
Neither development project was ready for operational use by BEIS at the end of the contract. This was because of some unexpected issues, including a response to fuel panic buying in the autumn and then the Ukraine war. Work is ongoing to upskill staff across project teams whilst the new pipelines are run in parallel and tested. We expect to see the full benefits of these new pipelines soon.
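Running an old and a new pipeline in parallel usually involves comparing their outputs for the same period. The sketch below shows one simple way this could be done in Python. It is an illustration only: the `compare_outputs` function, the tolerance and the sample figures are hypothetical and are not taken from the BEIS pipelines.

```python
import math


def compare_outputs(legacy, new, rel_tol=1e-9):
    """Return a list of (key, message) differences between two result sets.

    Both arguments are dictionaries mapping a key (for example a month)
    to a numeric value. Values are compared with a relative tolerance so
    that tiny floating point differences are not reported.
    """
    differences = []
    for key in sorted(set(legacy) | set(new)):
        if key not in new:
            differences.append((key, "missing from new pipeline"))
        elif key not in legacy:
            differences.append((key, "missing from legacy output"))
        elif not math.isclose(legacy[key], new[key], rel_tol=rel_tol):
            differences.append((key, f"legacy={legacy[key]} new={new[key]}"))
    return differences


if __name__ == "__main__":
    # Made-up figures for demonstration only.
    legacy_output = {"2022-01": 100.0, "2022-02": 250.5, "2022-03": 80.0}
    new_output = {"2022-01": 100.0, "2022-02": 250.7, "2022-04": 60.0}
    for key, message in compare_outputs(legacy_output, new_output):
        print(key, message)
```

An empty list means the two pipelines agree within the tolerance. Any entries give analysts a short, specific list of figures to investigate before the old process is retired.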
Both pipelines were left with comprehensive process documentation to help the teams that will run and maintain them. This is an essential requirement given how often people move jobs and departments in the Civil Service!
Discovery work was completed on BEIS’ subnational energy consumption data processing pipelines, which total 15,000 lines of SAS code. This process mapping has helped the team to start developing new pipelines. The process mapping activity was also a useful way of constructively challenging the way we currently do things.
Our code strategy is to optimise for maintainability. This aim has been met by the two pipelines developed by the ONS and by their thorough documentation. This will save lots of maintenance time in the long term.
By making maintenance easier, our pipelines are more likely to be kept up to date. This enables us to make continuous improvements in our statistics. The new streamlined pipelines are also faster to run than the old methods, which will save time for analysts.
Through these projects we have started RAP development for two analyst teams that would otherwise have struggled to find resource for these projects. Having external expertise working with the teams has also provided a useful learning opportunity for BEIS analysts in Python, SQL and Git.
By moving away from dependency on older software such as SAS we are aligning with the Government’s Technology Code of Practice to use open source methods that increase transparency and flexibility. This means we are getting ready for the near future when it will become increasingly difficult to recruit analysts with experience of SAS and other older software.
Working together across departments has helped encourage innovation. BEIS Energy Statistics are developing a custom code framework for building and running Python and SQL RAPs. By having the ONS use these tools we have successfully demonstrated how they can be used more widely, and uncovered areas for further development.
The ONS also found the opportunity to work with BEIS very useful. They welcomed the opportunity to demonstrate that they could successfully apply their approach to making change across government. In particular, the ONS learned that they could deliver digital transformation across departmental boundaries more effectively by being adaptive and dynamic in their approach. The ONS also improved their understanding of the significant challenges that teams across government can face when attempting digital transformation!
We learned that it would have been better to start the contract sooner. This would have allowed more time to complete the projects before the financial year end.
Unsurprisingly, we should have aimed to complete far fewer projects. This would have saved some time running discovery on projects that were later dropped.
We in BEIS learned the value of taking time to do discovery properly, and the value of using the professional skills of business analysts to achieve this. It is so tempting to go straight into writing code without clearly stating the goals, the user needs, and the constraints. Good discovery also provides the platform to restructure data pipelines. If pipelines have developed in fragments over many generations of analysts then they are likely to need streamlining.
We should have performed discovery using a tool that was accessible to both ONS and BEIS. This would have meant that both parties could make changes, which would have sped up our progress.
Once we discovered that ONS and BEIS staff were unable to message each other over Microsoft Teams, we should have used Slack to enable instant messaging. It was great to be able to get ONS onto CBAS to work together on writing code, but we suffered from a lack of flexible and informal dialogue.
Finally, in an ideal world, we would have invested time incrementally updating these pipelines in previous years. This would have helped us avoid the need for such substantial changes!
If you would like more information about the projects in this blog article, please contact: