Introduction to PySpark

Open to
Government analysts
Training category
Analytical, Data science
Type of training
2 days
Data Science Campus Faculty

This course will give you an understanding of PySpark, the Python interface to the distributed processing tool “Spark”. PySpark helps you to handle very large data sets by spreading the work across several machines. It also lets you process, query and manipulate data that is too large to handle comfortably on a single machine.

The course will:

  • cover distributed processing
  • give a strong introduction to the main data structure of PySpark
  • teach you how to investigate data, combine it, query it, and run complex transformations upon it
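
As a rough illustration of the idea behind distributed processing, the sketch below splits a data set into partitions, works on each partition independently, and then combines the partial results. It uses only the Python standard library, not PySpark, and threads stand in for the worker machines of a real cluster. The function names and the choice of four partitions are assumptions made for this example and are not taken from the course material.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal the rows round-robin into num_partitions lists."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum_and_count(partition):
    """Work done independently on a single partition."""
    return sum(partition), len(partition)


def distributed_mean(rows, num_partitions=4):
    """Compute a mean by combining per-partition sums and counts."""
    partitions = split_into_partitions(rows, num_partitions)
    # Threads are a stand-in here for the separate workers of a cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum_and_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))
mean_value = distributed_mean(values)
print(mean_value)
```

No partition needs to see the whole data set. Each one returns a small summary, and only those summaries are combined at the end.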

This is a practical course. You will write a lot of code and have plenty of opportunities to practise what you are learning. The course ends with a pair of case studies designed to combine everything you have learnt.

Who this course is for

To enrol on this course you will need experience with Python. You do not need any knowledge of PySpark or distributed processing.

Learning outcomes

On this course you will:

  • gain confidence in using PySpark
  • gain an understanding of distributed programming
  • learn to import and export data
  • learn to investigate data sets
  • learn to manipulate data sets
  • learn to draw conclusions from data
  • learn to perform basic visualisation
  • gain the knowledge to handle large data sets with efficient code

How to book

Please use your Learning Hub account to enrol on this course.

If you do not have a Learning Hub account, please contact


If you would like more information about this course, please email 

Related courses

Introduction to Python 

Introduction to R

Foundations of SQL