Guide to GSS statistical techniques and tools

Policy details

Publication date: 15 January 2025
Owner: The Government Statistical Service (GSS) People Advisory Group
Who this is for: Statisticians and Statistical Data Scientists
Type: Guidance
Contact: GSS.Recruitment@ONS.gov.uk

Introduction

This document gives examples of the statistical tools and techniques used by statisticians, statistical data scientists, and others who produce statistics in the UK Civil Service, known collectively as the Government Statistical Service (GSS).

The techniques described here are a good starting point for new colleagues who want to learn additional methods. They also give candidates a sense of the breadth of methodologies that they can use to demonstrate their statistical knowledge at interview.

Statistics is a rapidly developing discipline, and there are many more techniques and skills that statisticians can use to draw insight from a dataset.

This list should not be used as a checklist for recruitment. Techniques must not be disregarded because they do not appear on this list. Candidates are not expected to have knowledge or experience of all techniques.

The statistical tools and techniques in this document are presented against the Statistical Strands of the Government Statistical Group (GSG) Competency Framework and should be used as a supplement to that document:

  1. acquiring data/understanding customer needs
  2. data analysis
  3. presenting and disseminating data effectively

Those developing their career in statistics or wishing to pursue a career within the GSS are expected to be aware of the advantages and disadvantages of different techniques, and to identify the most appropriate of these to meet user needs. This includes considering the type of data, the distribution of the data, whether the assumptions required for a particular technique have been met, and the resources available (such as software, computing power, and human resources).

They must demonstrate understanding of the statistical techniques that they choose, including limitations, assumptions, and relationships to other similar techniques.

Use of this guide for Government Statistical Group (GSG) interviews

To join the Government Statistical Group (GSG), candidates are expected to demonstrate knowledge of two techniques or tools under the Data analysis competency. At least one of these must be a statistical technique; the other could be a technology, a tool, or more general knowledge of best practice, such as the items in the “tools and knowledge” section. If necessary, assessors will use their judgement to determine whether a method meets the standard for a statistical technique.

This list can be used for all GSG interviews, especially for entry to the GSG at Statistical Officer through to Grade 6, and Fast Stream assessments.

Acquiring data/understanding customer needs

Statistical techniques involved in acquiring data/understanding customer needs include:

  • pros and cons of surveys (e.g. longitudinal and cross-sectional), censuses, administrative data, open data
  • questionnaire design (including techniques to maximise response rate)
  • data matching (e.g. exact matching, probabilistic/fuzzy matching)
  • sampling design (e.g. simple random sampling, stratified sampling, cluster sampling)
  • imputation, correction, and interpolation (e.g. Winsorisation)
  • synthetic data (e.g. for protecting confidentiality)
  • data quality, including use of metadata
  • legal issues around collecting and sharing data
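
As a rough illustration of the Winsorisation example in the list above, the sketch below caps extreme values at chosen cut-offs instead of removing them. The function name `winsorise`, the fixed cut-offs, and the turnover figures are assumptions made purely for illustration; methods used in practice typically derive the cut-offs from the data and the survey weights.

```python
def winsorise(values, lower, upper):
    """Replace values below `lower` with `lower` and values above `upper` with `upper`."""
    if lower > upper:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    return [min(max(v, lower), upper) for v in values]


# Hypothetical turnover figures containing one extreme value
turnover = [12, 15, 14, 13, 250, 16]
capped = winsorise(turnover, lower=10, upper=50)
print(capped)  # [12, 15, 14, 13, 50, 16]
```

The extreme value is kept in the dataset but its influence on estimates such as the mean is limited, which is the trade-off a candidate would be expected to discuss.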

Technical tools associated with acquiring data/understanding customer needs include:

  • digital-first data collection methods (e.g. web scraping, use of APIs)
  • data structures and database design and storage/presentation of that data (e.g. semantic or analytical layers, data normalisation)
  • structured versus unstructured data and their storage (e.g. SQL vs NoSQL)
  • big data tools (e.g. Hadoop, MongoDB)

Examples

  1. Department for Levelling Up, Housing and Communities: English Housing Survey (survey design)
  2. Department for Environment, Food and Rural Affairs: Annual Survey of Agriculture and Horticulture (survey design)
  3. Welsh Government: Shielded patients’ access to private outdoor space during the coronavirus (COVID-19) pandemic (data linkage)
  4. Ministry of Justice: Splink: MoJ’s open source library for probabilistic record linkage at scale (data linkage)
  5. Office for National Statistics: Item editing and imputation process for Census 2021, England and Wales (imputation)

Data analysis

Statistical techniques involved in data analysis include:

Descriptive statistics

For example, measures of central tendency, dispersion, skewness and kurtosis, demonstrating a genuine and thorough understanding of the shape of the data, and tests to confirm that shape (such as the Kolmogorov-Smirnov test).

Hypothesis testing
  • parametric tests (e.g. t-tests, chi-squared tests, Levene’s test for equal variance)
  • non-parametric equivalents (e.g. Kruskal-Wallis test)
  • Bayesian methods for updating probabilities and parameters
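
One simple instance of a Bayesian method for updating parameters is the conjugate Beta-Binomial update, sketched below. The function names `update_beta` and `beta_mean`, the uniform prior, and the data are illustrative assumptions and are not taken from any GSS source.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Return the posterior Beta parameters after observing binomial data."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1); hypothetical data: 7 successes in 10 trials
alpha, beta = update_beta(1, 1, successes=7, failures=3)
print(alpha, beta)             # 8 4
print(beta_mean(alpha, beta))  # posterior mean, 8 / 12
```

The posterior mean (about 0.67) sits between the prior mean (0.5) and the observed proportion (0.7), showing how the prior is pulled towards the data.
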
Regression
  • linear regression, including multiple linear regression, generalised linear models
  • logistic regression
  • ANOVA, MANOVA, ANCOVA
  • correlation
  • Bayesian regression modelling
  • multilevel/hierarchical modelling
  • penalised/regularised regression (e.g. LASSO, ridge)
Geospatial analysis
  • identifying spatial patterns among neighbouring locations (e.g. autocorrelation, geographic cluster analysis)
  • exploring relationships between multiple spatial variables (e.g. multivariate spatial analysis, geographically weighted regression)
Time series analysis
  • time series models (e.g. autocorrelation, ARIMA)
  • forecasting
  • seasonal adjustment (e.g. working day adjustments, decomposition)
  • signal processing (e.g. Fourier transform, spectral density estimation)
  • Bayesian time series methods
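
The autocorrelation named in the list above can be sketched as the sample autocorrelation at a given lag. The function name `autocorrelation` and the short series are hypothetical, and this uses one common estimator (dividing by the full-series sum of squares); other conventions exist.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series has zero variance")
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Hypothetical series that alternates between two levels
series = [5, 9, 5, 9, 5, 9, 5, 9]
print(autocorrelation(series, 1))  # -0.875
print(autocorrelation(series, 2))  # 0.75
```

A strongly negative value at lag 1 and a positive value at lag 2 reflect the period-two pattern in the series, the kind of structure that seasonal adjustment and ARIMA models are designed to capture.
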
Unsupervised learning
  • clustering techniques (e.g. Gaussian mixture analysis, k-means)
  • association rules
  • anomaly detection
  • methods for projecting data into another vector space (e.g. singular value decomposition)
Survey methodology
  • estimators (e.g. Horvitz-Thompson, ratio estimators, variance estimators)
  • weighting (e.g. design weights, post-stratification)
  • small area estimation (e.g. design-based and model-based estimators)
  • identifying and correcting for bias (e.g. non-response bias)
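
As a minimal sketch of the Horvitz-Thompson estimator mentioned above, the code below estimates a population total by weighting each sampled value by the inverse of its inclusion probability. The function name `horvitz_thompson_total` and the sample figures are assumptions for illustration only.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over sampled units."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical sample of three units with unequal inclusion probabilities
sample_values = [20, 35, 50]
inclusion_probs = [0.1, 0.25, 0.5]
estimate = horvitz_thompson_total(sample_values, inclusion_probs)
print(estimate)  # approximately 440 (200 + 140 + 100)
```

The inverse inclusion probabilities play the role of design weights: a unit sampled with probability 0.1 stands in for ten units in the population.
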
Text analysis/natural language processing
  • text classification
  • sentiment analysis
  • topic modelling
  • training embeddings from text
Classification
  • any statistical method that allocates an observation to a category (e.g. linear discriminant analysis, and non-linear extensions)
  • support vector machines
  • decision trees (including derivatives such as random forest, bagging, gradient-boosting)
  • kernel estimation (e.g. k-nearest neighbour)
Dimensionality reduction
  • principal component analysis
  • multi-dimensional scaling
  • t-distributed stochastic neighbour embedding (t-SNE)
Simulation
  • Monte Carlo
  • bootstrapping
  • agent-based modelling
  • Bayesian probability networks
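
Bootstrapping, listed above, can be sketched as a percentile confidence interval for a mean, obtained by resampling the data with replacement. The function name `bootstrap_mean_ci`, the default settings, and the data are hypothetical; other bootstrap intervals (for example bias-corrected ones) are also in common use.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements
data = [4.1, 5.3, 6.0, 4.8, 5.9, 7.2, 5.5, 6.4, 4.9, 5.7]
lower, upper = bootstrap_mean_ci(data)
print(f"95% bootstrap interval for the mean: ({lower:.2f}, {upper:.2f})")
```

The approach makes no distributional assumption about the data, at the cost of extra computation and reliance on the sample being representative.
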
Stochastic processes
  • Markov chains
  • queuing processes
  • Poisson processes
  • random walks
Quality and disclosure
  • statistical disclosure control (e.g. perturbation, record swapping, synthetic data generation)
  • sensitivity analysis
  • measures of uncertainty
Index numbers
  • unweighted indices (e.g. Carli, Jevons), weighted indices (e.g. Laspeyres, Paasche), superlative indices (e.g. Fisher, Törnqvist), and multilateral indices (e.g. GEKS-Törnqvist)
  • chain linking, deflators, hedonic method
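
To make the index number formulae above concrete, the following sketch computes Laspeyres, Paasche, and Fisher price indices for a two-item basket. The function names and the prices and quantities are invented for illustration and do not come from any published index.

```python
from math import sqrt


def laspeyres(p0, p1, q0):
    """Laspeyres price index: prices weighted by base-period quantities."""
    return 100 * sum(p * q for p, q in zip(p1, q0)) / sum(
        p * q for p, q in zip(p0, q0)
    )


def paasche(p0, p1, q1):
    """Paasche price index: prices weighted by current-period quantities."""
    return 100 * sum(p * q for p, q in zip(p1, q1)) / sum(
        p * q for p, q in zip(p0, q1)
    )


def fisher(p0, p1, q0, q1):
    """Fisher price index: geometric mean of Laspeyres and Paasche."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical basket: base-period (0) and current-period (1) prices and quantities
p0, p1 = [2, 5], [3, 5]
q0, q1 = [10, 4], [8, 5]

print(laspeyres(p0, p1, q0))            # 125.0
print(round(paasche(p0, p1, q1), 2))
print(round(fisher(p0, p1, q0, q1), 2))
```

In this example consumers buy less of the item whose price rose, so the Paasche index is lower than the Laspeyres index and the Fisher index lies between the two.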

As this list is not exhaustive, candidates may wish to use methods from (for example):

  • A/B testing (e.g. CUPED)
  • signal detection theory
  • extreme value theory
  • geometric morphometrics
  • meta-techniques (e.g. model stacking)
  • neural networks
  • Gaussian processes

Technical tools and knowledge associated with data analysis include:

  • standards for developing and maintaining reproducible analytical pipelines (e.g. version control, continuous integration, unit testing, dependency management)
  • choosing and applying suitable model metrics (e.g. mean square error, log-loss, information criteria) and diagnostics (e.g. residuals, leverage)
  • automation for data validation (e.g. statistical process control) and model monitoring
  • statistical programming languages (e.g. R, Python, Julia, SAS)
  • computational efficiency for large datasets or computationally expensive algorithms

Examples

  1. Department for Levelling Up, Housing and Communities: English indices of deprivation (factor analysis)
  2. Office for National Statistics: Consumer prices indices technical manual (section 3 for indices)
  3. HM Revenue and Customs / Office for National Statistics: Monthly earnings and employment estimates (time series methods)
  4. Department for Environment, Food and Rural Affairs: Farm Business Survey (survey analysis)

Presenting and disseminating data effectively

Statistical techniques associated with this competency include:

  • use of tables and charts, including understanding what chart types are most appropriate for depicting different relationships
  • accessibility standards (e.g. colour contrast, spreadsheet structure)
  • communicating uncertainty and change (e.g. visual methods, measures of uncertainty, revisions triangles)

Technical tools and knowledge associated with presenting and disseminating data effectively include:

  • dashboarding software (e.g. Tableau, R Shiny, PowerBI)
  • static visualisation tools (e.g. ggplot2, matplotlib)
  • report generation (e.g. Markdown, SSRS)
  • spreadsheets and tables (e.g. Excel, and open source equivalents)

Examples

  1. Department for Levelling Up, Housing and Communities: Visualising spatial data to support decision-making
  2. Welsh Government: Fuel poverty in Wales (interactive dashboard)
