How to create an empty dataframe in Scala?
Last Updated: 29 Apr, 2024
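In this article, we will learn how to create an empty DataFrame in Scala with Apache Spark. The simplest way is the emptyDataFrame member of the SparkSession object, which returns a DataFrame with no rows and no columns. To create an empty DataFrame that has a specific schema, we can use the createDataFrame method of the SparkSession object with an empty RDD and the schema.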
Syntax to create an empty DataFrame:
val df = spark.emptyDataFrame
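To make the syntax above concrete, here is a minimal sketch of what spark.emptyDataFrame returns. The app name and the `local[*]` master are assumptions chosen for local experimentation and are not part of the original example. On a cluster the master is normally supplied by spark-submit.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for experimentation only
val spark = SparkSession.builder()
  .appName("EmptyDataFrameSketch")
  .master("local[*]")
  .getOrCreate()

// emptyDataFrame carries no rows and no columns
val df: DataFrame = spark.emptyDataFrame

println(df.columns.length) // 0, because there is no schema
println(df.count())        // 0, because there are no rows
```

Because this DataFrame has no columns, it is mostly useful as a placeholder. When the column names and types matter, the schema-based approach is the better fit.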
Example of how to create an empty DataFrame with a schema in Scala:
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Create SparkSession
val spark = SparkSession.builder()
  .appName("EmptyDataFrameExample")
  .getOrCreate()

// Define schema for the empty DataFrame
val schema = new StructType(Array(
  StructField("column_name", StringType, true)
))

// Create an empty DataFrame using createDataFrame
// method with an empty RDD of Row and the schema
val emptyDF: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// Show the schema of the empty DataFrame
emptyDF.printSchema()
Output:
root
 |-- column_name: string (nullable = true)

Explanation of the above example:
- Import the necessary classes: SparkSession, DataFrame, and Row from the org.apache.spark.sql package, and StructType, StructField, and StringType from the org.apache.spark.sql.types package.
- Create a SparkSession object named spark.
- Define a schema for the empty DataFrame. In this example, we're creating a DataFrame with a single column named "column_name" of type StringType. You can define your schema according to your requirements.
- Use the createDataFrame method of the SparkSession object (spark) to create an empty DataFrame. Pass an empty RDD of type Row and the schema you defined earlier.
- The resulting DataFrame (emptyDF) will have the schema defined earlier and no rows.
- Print the schema of the empty DataFrame using the printSchema method.
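The steps above can be checked with a short sketch that also shows one alternative way to get an empty DataFrame with named columns. The two-column schema, the `local[*]` master, and the names `fromRdd` and `fromSeq` are illustrative assumptions, not part of the original example.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumption: a local session for experimentation only
val spark = SparkSession.builder()
  .appName("EmptyDataFrameCheck")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // needed for toDF on a Seq

// Hypothetical two-column schema used only for this sketch
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Approach from the article: empty RDD of Row plus an explicit schema
val fromRdd: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// Alternative: an empty Seq of tuples; column types come from the tuple type
val fromSeq: DataFrame = Seq.empty[(String, Int)].toDF("name", "age")

println(fromRdd.count())                // no rows
println(fromRdd.columns.mkString(", ")) // column names from the schema
println(fromSeq.columns.mkString(", ")) // column names passed to toDF
```

The explicit-schema version gives full control over types and nullability. The tuple-based version is shorter, but its nullability is inferred from the Scala types, so a primitive such as Int yields a non-nullable column.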