[Python Notes] Spark SQL from_json

Latest recommended article published on 2025-04-19 10:34:43

阳光快乐普信男

License: CC 4.0 BY-SA

Category: Python Notes

Original link: https://stackoverflow.com/questions/45003393/spark-from-json-structtype-and-arraytype

This post shows how to repair an incomplete JSON string so that the `from_json` function can parse it in a `Spark DataFrame`. The author points out the missing {}

I am trying to use from_json() to convert a JSON string into a DataFrame, but the result is null.

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._  // StructType, ArrayType, StringType

// Schema: a struct with one field named "" that holds an array of name structs
val schemaExample2 = new StructType()
  .add("", ArrayType(new StructType()
    .add("FirstName", StringType)
    .add("Surname", StringType)))

val dfExample2 = spark.sql("""select "[{ \"FirstName\":\"Johnny\", \"Surname\":\"Boy\" }, { \"FirstName\":\"Franky\", \"Surname\":\"Man\" }" as theJson""")

val dfICanWorkWith = dfExample2.select(from_json($"theJson", schemaExample2))

dfICanWorkWith.collect()

// Result
res22: Array[org.apache.spark.sql.Row] = Array([null])

The problem is that the string is not well-formed JSON that matches your schema. It is missing a few things:

First, the surrounding {} of the top-level object that the StructType schema expects.
Second, the field name: the schema declares a field named "", but the JSON never includes that key.
Lastly, the closing ] of the array.

Try replacing it with:

val dfExample2= spark.sql("""
select "{\"\":[{ \"FirstName\":\"Johnny\", \"Surname\":\"Boy\" }, { \"FirstName\":\"Franky\", \"Surname\":\"Man\" }]}" as theJson
""")

and you will get:

scala> dfICanWorkWith.collect()
res12: Array[org.apache.spark.sql.Row] = Array([[WrappedArray([Johnny,Boy], [Franky,Man])]])
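
If the data really arrives as a bare JSON array, another option is to leave the array at the top level and pass an ArrayType schema directly, which from_json accepts since Spark 2.2. The sketch below is not part of the original answer. It assumes a spark-shell session where `spark` is available, and the names personSchema, dfArray, and dfPeople are made up for illustration. The closing ] still has to be present.

```scala
import org.apache.spark.sql.functions.{explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import spark.implicits._ // already in scope in spark-shell; needed elsewhere for $ and toDF

// Schema of a single array element (hypothetical name, same fields as above)
val personSchema = new StructType()
  .add("FirstName", StringType)
  .add("Surname", StringType)

// A complete JSON array: note the closing ] that the original string lacked
val dfArray = Seq(
  """[{"FirstName":"Johnny","Surname":"Boy"},{"FirstName":"Franky","Surname":"Man"}]"""
).toDF("theJson")

// Parse the array with an ArrayType schema, then turn each element into its own row
val dfPeople = dfArray
  .select(explode(from_json($"theJson", ArrayType(personSchema))).as("person"))
  .select($"person.FirstName", $"person.Surname")

dfPeople.show()
```

With this approach there is no need for the empty-string field name, and the result should have one row per person with FirstName and Surname as ordinary columns.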

Reference: Spark from_json - StructType and ArrayType