Fixing the net.jpountz.lz4.LZ4BlockInputStream error when using Python with Spark Streaming

While testing Spark Streaming against Kafka over the past few days, I kept hitting this error:

18/08/30 21:09:00 ERROR Utils: Uncaught exception in thread stdout writer for python
java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
        at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
        at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:163)
        at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:124)
        at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$2.apply(BlockStoreShuffleReader.scala:50)

Consuming data that a colleague had written into Kafka with Java worked fine, but going through our client or through NiFi both threw this same error.

At first I kept suspecting a jar version conflict, but even the latest spark-streaming-kafka-0-8-assembly_2.10-2.2.2.jar produced the same error.
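
When chasing a suspicion like this, it helps to confirm which lz4 jar the driver JVM actually loaded. A quick diagnostic sketch (my addition, not part of the original troubleshooting; it goes through PySpark's private sc._jvm py4j gateway, and executors may still resolve a different jar):

from pyspark import SparkContext

sc = SparkContext(appName="lz4-jar-check")
# Ask the JVM where net.jpountz.lz4.LZ4BlockInputStream was loaded from.
clazz = sc._jvm.java.lang.Class.forName("net.jpountz.lz4.LZ4BlockInputStream")
print(clazz.getProtectionDomain().getCodeSource().getLocation().toString())
# Spark 2.2 expects lz4 1.3.0+, which has the (InputStream, boolean)
# constructor; the older lz4 1.2.0 (pulled in by some Kafka 0.8 setups) does not.
sc.stop()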

Eventually I found a document saying that setting this parameter fixes it. Spark's spark.io.compression.codec defaults to lz4, so switching it to snappy avoids the broken lz4 code path entirely:

.config("spark.io.compression.codec", "snappy")

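For reference, that .config(...) call is the SparkSession builder style in Spark 2.x. A minimal sketch of that variant, assuming you build a session rather than a bare SparkContext:

from pyspark.sql import SparkSession

# Same codec override, applied through the SparkSession builder (Spark 2.x).
spark = SparkSession.builder \
    .appName("PythonStreamingKafkaWordCount") \
    .config("spark.io.compression.codec", "snappy") \
    .getOrCreate()
sc = spark.sparkContext  # the underlying SparkContext, reusable for streaming
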
Applying the same setting to the word-count example made it work:

from __future__ import print_function  # needed on Python 2 for print(..., file=...)
import sys

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        exit(-1)

    # sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    # Force snappy instead of the default lz4 compression codec.
    conf = SparkConf().setAppName("PythonStreamingKafkaWordCount") \
                      .set('spark.io.compression.codec', 'snappy')
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 20)  # 20-second batches

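The snippet above stops after creating the StreamingContext; the rest of the job is unchanged from Spark's stock examples/src/main/python/streaming/kafka_wordcount.py, reproduced here so the script runs end to end:

    # Read the topic via the ZooKeeper quorum passed on the command line.
    zkQuorum, topic = sys.argv[1:]
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
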
Submit command:

bin/spark-submit --jars /pythontest/scripts/spark-streaming-kafka-0-8-assembly_2.10-2.2.2.jar examples/src/main/python/streaming/kafka_wordcount30.py testnode:2181 sengtest

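If you would rather not edit the script, the same property can be set at submit time with spark-submit's generic --conf flag (equivalent to the .set() call above):

bin/spark-submit --conf spark.io.compression.codec=snappy \
    --jars /pythontest/scripts/spark-streaming-kafka-0-8-assembly_2.10-2.2.2.jar \
    examples/src/main/python/streaming/kafka_wordcount30.py testnode:2181 sengtest
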
That finally solved it.
