Iterating via MapPartitionsRDD - Scala

This post analyzes the cause and traceback of a ClassCastException raised when making predictions with a KNN classification model built on Spark MLlib (the traceback's paths point to the third-party spark-knn package); the error stems from a data type mismatch.
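The original test.py is not included in the post; as context for the traceback below, here is a minimal hypothetical sketch of the kind of call that fails. The import path, the k parameter, and the column names are assumptions based on the spark-knn project layout, not code from the post. Note that an integer Python label becomes a LongType (bigint) column, which matches the java.lang.Long in the error:

    # Hypothetical reconstruction of the failing pattern; not from the post.
    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    # Assumed import path, based on the spark-knn repository layout:
    from pyspark_knn.ml.classification import KNNClassifier

    spark = SparkSession.builder.appName("knn-repro").getOrCreate()

    # Integer Python labels become a LongType (bigint) column, matching
    # the java.lang.Long in the ClassCastException.
    train = spark.createDataFrame(
        [(Vectors.dense([0.0, 0.0]), 0), (Vectors.dense([1.0, 1.0]), 1)],
        ["features", "label"])
    test = spark.createDataFrame([(Vectors.dense([0.9, 0.9]),)], ["features"])

    model = KNNClassifier(k=1).fit(train)   # k=1 is an assumed parameter
    prediction = model.transform(test)      # fails as in the traceback below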


The full traceback from the failing run:

    Traceback (most recent call last):
      File "/home/yhkwon/Desktop/knn/spark-knn/python/test.py", line 33, in <module>
        prediction = model.transform(test)
      File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 105, in transform
      File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 281, in _transform
      File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
      File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
      File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o99.transform.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 32, 192.168.0.18, executor 0): java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.Row
      at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
      at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:108)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace:
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
      at scala.Option.foreach(Option.scala:257)
      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:918)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:916)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
      at org.apache.spark.rdd.RDD.foreach(RDD.scala:916)
      at org.apache.spark.ml.classification.KNNClassificationModel.transform(KNNClassifier.scala:164)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
      at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
      at py4j.Gateway.invoke(Gateway.java:280)
      at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
      at py4j.commands.CallCommand.execute(CallCommand.java:79)
      at py4j.GatewayConnection.run(GatewayConnection.java:214)
      at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.Row
      at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
      at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:108)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      ... 1 more
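The decisive line is java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.Row, thrown from KNNClassificationModel.transform (KNNClassifier.scala:183): on the executor, a value that the Scala side expects to be a Row (a struct) is actually a Long, which is what the summary above means by a data type mismatch. A reasonable first step is to inspect the input schemas and cast any bigint column that should carry a different type. The sketch below continues the hypothetical example above; that the mismatched column is "label" is an assumption, not something the post confirms:

    # Continues the hypothetical sketch above; treating 'label' as the
    # mismatched column is an assumption -- substitute whatever column
    # printSchema() reports as bigint (Long) where another type is expected.
    from pyspark.sql.functions import col

    train.printSchema()   # e.g. label: long  -- the suspect column
    test.printSchema()

    # Casting integer labels to double is the usual remedy for Spark ML
    # type mismatches; whether it resolves this specific spark-knn error
    # is an assumption, not confirmed by the post.
    train_fixed = train.withColumn("label", col("label").cast("double"))
    model = KNNClassifier(k=1).fit(train_fixed)
    prediction = model.transform(test)

Casting integer labels to double before fitting is a common requirement across Spark ML estimators, which makes it the natural first thing to try here.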
