Introduction to Sparklyr

Open to
Government analysts
Training category
Type of training
2 days
Data Science Campus Faculty
Data Science Campus Faculty

This course will give you an understanding of Sparklyr, the R interface to the distributed processing engine Apache Spark. Sparklyr lets you process, query and manipulate data sets that are too large to handle comfortably in memory on a single machine with standard R tools.

The course will:

  • cover distributed processing
  • give a strong introduction to the main data structure of Sparklyr
  • teach you how to investigate, combine and query data, and how to run complex transformations on it

This is a practical course. You will write a lot of code and have plenty of opportunities to practise what you are learning.

Learning outcomes

On this course you will:

  • gain confidence using Sparklyr
  • gain an understanding of distributed programming
  • learn how to import and export data
  • learn how to investigate data sets
  • learn how to manipulate data sets
  • learn how to draw conclusions from data
  • learn how to perform basic visualisation
  • gain the knowledge to handle large data sets with efficient code

How to book

Please use your Learning Hub account to enrol on this course.

If you do not have a Learning Hub account, please contact


If you have any questions about this course, please contact us at

Related courses

Introduction to Pyspark