One of the tools we will be using is Apache Spark. Spark is an open-source toolset for cluster computing. Although we will not be using a cluster ourselves, Spark is typically run on a larger set of machines, or cluster, that operates in parallel to analyze a big dataset. Installation instructions are available at https://www.dataquest.io/blog/pyspark-installation-guide.
Apache Spark
Installing Spark on macOS
Up-to-date instructions for installing Spark are available at https://medium.freecodecamp.org/installing-scala-and-apache-spark-on-mac-os-837ae57d283f. The main steps are:
- Get Homebrew from http://brew.sh. If you are doing software development on macOS, you will likely already have Homebrew.
- Install xcode-select: xcode...