Technical requirements
In this chapter, we will be using the Databricks Community Edition to run our code. This can be found at https://community.cloud.databricks.com.
Sign-up instructions can be found at https://databricks.com/try-databricks.
The code used in this chapter can be downloaded from https://github.com/PacktPublishing/Essential-PySpark-for-Data-Analytics/tree/main/Chapter01.
The datasets used in this chapter can be found at https://github.com/PacktPublishing/Essential-PySpark-for-Data-Analytics/tree/main/data.
The original datasets can also be obtained directly from their sources, as follows:
- Online Retail: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
- Image Data: https://archive.ics.uci.edu/ml/datasets/Rice+Leaf+Diseases
- Census Data: https://archive.ics.uci.edu/ml/datasets/Census+Income
- Country Data: https://public.opendatasoft.com/explore/dataset/countries-codes/information/