How to Use Spark-Shell to Execute Scala File?
Last Updated: 06 Aug, 2024
Apache Spark is a fast analytics engine for cluster computing on very large datasets, such as those found in Big Data and Hadoop environments, and it can run programs in parallel across multiple nodes. Using the Spark shell, users can perform a wide range of work interactively: loading data, manipulating DataFrames and RDDs (Resilient Distributed Datasets), running transformations and actions, and examining results in real time. This interactive environment is especially valuable for prototyping, debugging, and exploring data-driven solutions, which makes it useful for data scientists and engineers working on big data applications. This article focuses on discussing how to use the Spark shell to execute Scala files.
What is Spark Shell?
Spark Shell is a command-line tool provided by Apache Spark. It is used for interactive data analysis and exploration.
- It allows users to interactively write and execute Spark code in Scala, Python (PySpark), or R (SparkR) directly in a shell.
- The Spark shell exposes the full power of Spark's distributed computing engine, letting users process large amounts of data efficiently across a cluster of machines.
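For example, a short interactive session in the Scala shell might look like the following (a minimal sketch; the data and variable names are purely illustrative):
val nums = sc.parallelize(1 to 100)  // sc is the SparkContext that spark-shell creates automatically
val evens = nums.filter(_ % 2 == 0)  // transformation: keep only the even values
println(evens.count())               // action: runs the job and prints 50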
Setting up Environment
1. To set up the environment for the Spark shell, install and configure Spark and its required dependencies on Windows.
2. Run the following command in the command prompt to check whether the environment is ready:
spark-shell
If the Spark shell starts and displays its banner followed by the scala> prompt, the environment is ready to execute Scala files.
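Inside the shell, a couple of quick commands can confirm that the session is working (a sketch; the exact output depends on the installed Spark version, and the built-in SparkSession named spark is available in Spark 2.x and later):
sc.version              // prints the version of the running SparkContext
spark.range(5).count()  // uses the built-in SparkSession; should return 5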
Writing a Scala File for Spark
Writing a Scala file for Spark is straightforward: the file contains a Scala script or application that uses the Spark API to perform distributed data processing. The Scala file (for example, GetRevenuePerOrder.scala below) can be compiled and run with spark-submit, as sketched after the example, or run directly in an IDE configured for Spark development.
Following is an example of a Scala file for Spark:
Scala
// GetRevenuePerOrder.scala
package retail_db

import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    // args(0) = Spark master URL, args(1) = input path, args(2) = output path
    val conf = new SparkConf().
      setMaster(args(0)).
      setAppName("Get revenue per order")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    // Each input line is a CSV record; field 1 is the order id and field 4 is the item subtotal
    val orderItems = sc.textFile(args(1))
    val revenuePerOrder = orderItems.
      map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat)).
      reduceByKey(_ + _).
      map(oi => oi._1 + "," + oi._2)

    // Write one "orderId,revenue" line per order to the output directory
    revenuePerOrder.saveAsTextFile(args(2))
    sc.stop()
  }
}
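For reference, once this class has been compiled and packaged into a jar (for example with sbt), it could also be launched with spark-submit; the jar name and the paths below are only placeholders:
spark-submit --class retail_db.GetRevenuePerOrder retail_db_2.12-1.0.jar local[*] C:\data\retail_db\order_items C:\data\retail_db\revenue_per_order
Here local[*] is passed as the first application argument because the program reads the master URL from args(0).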
The file above is only an example used to test executing Scala code with the Spark shell. To run the same logic through the Spark shell, we type (or paste) the corresponding code in the terminal or command prompt, and the shell executes it and shows the output.
Executing Scala Files in Spark-Shell
To execute a Scala file such as the one above, we write its code at the Spark shell prompt; the code can also be prepared in a text editor such as Notepad and then pasted into the command prompt.
To get the output of the above program, the following code can be executed in spark-shell:
val orderItems = sc.textFile("C:\\data\\retail_db\\order_items")
val revenuePerOrder = orderItems.
  map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat)).
  reduceByKey(_ + _).
  map(oi => oi._1 + "," + oi._2)
revenuePerOrder.take(10).foreach(println)
Once we paste the above code into the Spark shell, the complete output of the execution is displayed in the console.
This is how the Spark shell is used to execute Scala files.
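Besides pasting code interactively, spark-shell can also run a saved script file. The path below is only a placeholder, and the script should contain shell-style statements (like the snippet above) rather than a packaged object, because the shell simply replays its lines. Inside an already running shell, the REPL's :load command executes the file line by line:
:load C:\data\scripts\revenue.scala
Alternatively, the script can be passed when starting the shell, which runs it and then leaves the session open:
spark-shell -i C:\data\scripts\revenue.scala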
Conclusion
This method can be used for testing and running Spark applications written in Scala, and it provides an interactive environment for running and debugging code. The steps described in this article are easy to follow and can be used to execute a Scala file with spark-shell.