Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.6.0
Description
Sedona does not correctly handle DataFrames that have 0 partitions when performing a spatial join. For example:
from pyspark.sql.functions import broadcast, expr
from pyspark.sql.types import IntegerType, StructField, StructType

# Assumes an active SparkSession named `spark` with Sedona registered
schema = StructType([
    StructField("id", IntegerType(), True)
])
# Create empty RDD (it has 0 partitions)
empty_rdd = spark.sparkContext.emptyRDD()
# Create empty DataFrame
empty_df = spark.createDataFrame(empty_rdd, schema)
df_point = spark.range(0, 10).toDF("id").withColumn("geom", expr("ST_Point(id, id)"))
df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom")
spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1")
df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()
This fails with the following error message:
Py4JJavaError: An error occurred while calling o107.showString.
: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.IterableLike.head(IterableLike.scala:109)
at scala.collection.IterableLike.head$(IterableLike.scala:108)
at scala.collection.AbstractIterable.head(Iterable.scala:56)
at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)
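The trace shows `IterableLike.head` being called on an empty collection inside `SpatialIndexExec.doExecuteBroadcast`. The plain-Python analogy below is only a sketch of that failure mode, not Sedona code: `head`, `head_option`, and the sample lists are hypothetical names chosen for illustration, and how the actual fix was implemented is not stated in this report.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def head(items: Iterable[T]) -> T:
    """Return the first element; raises StopIteration if `items` is empty."""
    return next(iter(items))


def head_option(items: Iterable[T]) -> Optional[T]:
    """Return the first element, or None if `items` is empty."""
    return next(iter(items), None)


# Hypothetical stand-ins for "one collected result per partition".
one_partition = ["index-0"]
zero_partitions = []

first = head(one_partition)            # works: there is a first element
guarded = head_option(zero_partitions)  # empty input handled explicitly

try:
    head(zero_partitions)              # analogous to "next on empty iterator"
    unguarded_failed = False
except StopIteration:
    unguarded_failed = True
```

The point of the sketch is that any code path taking "the first partition's result" needs an explicit empty case when the input can have 0 partitions.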
This is not limited to broadcast joins; range joins fail as well:
df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count()
24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10
24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0
Number of partitions must be >= 0
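The two warnings suggest how the range join ends up with a partition count of 0. The sketch below is reconstructed from those log messages alone, not from the Sedona source; `choose_partition_count` and its parameter names are hypothetical.

```python
def choose_partition_count(
    dominant_partitions: int,
    dominant_count: int,
    follower_partitions: int,
) -> int:
    """Pick a partition count the way the log messages describe.

    If the dominant side has more partitions than half of its row count,
    fall back to the follower side's partition count.
    """
    if dominant_partitions > dominant_count / 2:
        return follower_partitions
    return dominant_partitions


# Values taken from the log: 8 partitions, 10 rows, follower side has 0 partitions.
chosen = choose_partition_count(8, 10, 0)
```

With the logged values, 8 is larger than 10 / 2, so the follower side's count of 0 is selected, which is the value reported immediately before the failure.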