- countByKey
- reduce
- fold
- first
- take: returns the first n elements (by position, not by sorted order)
- top
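As a rough illustration of the operators listed above, the sketch below models an RDD as a plain Python list of partitions. It does not call Spark: the names `partitions`, `model_reduce`, and `model_fold` are hypothetical, and the three-way split is hand-written (Spark decides the real split). The main point it tries to show is that `fold` applies its zero value once per partition and once more when the partial results are merged.

```python
from collections import Counter
from functools import reduce
from operator import add

# Hand-written stand-in for an RDD with 3 partitions (assumption, not Spark's real split).
partitions = [[1, 3], [2, 4], [7, 9, 6]]

def model_reduce(parts, op):
    """Model of rdd.reduce(op): reduce each non-empty partition, then merge the partials."""
    partials = [reduce(op, p) for p in parts if p]
    return reduce(op, partials)

def model_fold(parts, zero, op):
    """Model of rdd.fold(zero, op): zero seeds every partition and the final merge."""
    partials = [reduce(op, p, zero) for p in parts]
    return reduce(op, partials, zero)

reduce_result = model_reduce(partitions, add)   # 32
fold_zero = model_fold(partitions, 0, add)      # 32
fold_ten = model_fold(partitions, 10, add)      # 32 + 10 * (3 partitions + 1 merge) = 72

# Models of first / take / top on the same data, flattened in partition order.
flat = [x for p in partitions for x in p]
first_elem = flat[0]                            # 1
take_three = flat[:3]                           # [1, 3, 2] (positional, unsorted)
top_three = sorted(flat, reverse=True)[:3]      # [9, 7, 6] (largest first)

# Model of countByKey on hypothetical (key, value) pairs: count records per key.
pairs = [("a", 1), ("b", 1), ("a", 1)]
key_counts = dict(Counter(k for k, _ in pairs))  # {"a": 2, "b": 1}

print(reduce_result, fold_zero, fold_ten, first_elem, take_three, top_three, key_counts)
```

With a real SparkContext the corresponding calls would be `rdd.reduce(add)`, `rdd.fold(10, add)`, `rdd.first()`, `rdd.take(3)`, `rdd.top(3)`, and `rdd.countByKey()`.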
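7. takeSample
8. takeOrdered
# Assumes an existing SparkContext named sc.
rdd = sc.parallelize([1, 3, 2, 4, 7, 9, 6], 1)
print(rdd.takeOrdered(3))  # [1, 2, 3]
print(rdd.takeOrdered(3, lambda x: -x))  # [9, 7, 6]: the key negates each value, so the largest come first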
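To make the role of the key function in `takeOrdered` more concrete, here is a plain-Python model built on `heapq`. It is a sketch only and does not call Spark; `model_take_ordered` and `model_top` are hypothetical helper names. It suggests why `takeOrdered(3, lambda x: -x)` and `top(3)` give the same result on this data.

```python
import heapq

data = [1, 3, 2, 4, 7, 9, 6]

def model_take_ordered(items, n, key=None):
    """Model of rdd.takeOrdered(n, key): the n smallest items under the key, ascending."""
    return heapq.nsmallest(n, items, key=key)

def model_top(items, n, key=None):
    """Model of rdd.top(n, key): the n largest items under the key, descending."""
    return heapq.nlargest(n, items, key=key)

smallest = model_take_ordered(data, 3)                           # [1, 2, 3]
largest_via_key = model_take_ordered(data, 3, key=lambda x: -x)  # [9, 7, 6]
largest_via_top = model_top(data, 3)                             # [9, 7, 6]

print(smallest, largest_via_key, largest_via_top)
```

Negating the key reverses the ordering, so the "smallest" three under the key are the largest three values.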
9. foreach
10. saveAsTextFile
11. foreachPartition
rdd = sc.parallelize([1, 3, 2, 4, 7, 9, 6], 3)
def rid10(data):
    # Called once per partition; data iterates over that partition's elements.
    print("-------------------")
    result = list()
    for i in data:
        result.append(i * 10)
    print(result)
rdd.foreachPartition(rid10)
groupByKey and reduceByKey
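A commonly cited difference between the two is that `reduceByKey` combines values inside each partition before the shuffle, while `groupByKey` ships every pair across it. The sketch below is a plain-Python model of that idea, not Spark code: the partitions are hand-written, and `model_group_by_key` and `model_reduce_by_key` are hypothetical names. It counts how many records would cross the shuffle in each case.

```python
from collections import defaultdict

# Hand-written partitions of (key, value) pairs (hypothetical data).
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def model_group_by_key(parts):
    """Every pair crosses the shuffle; values are only collected per key afterwards."""
    shuffled = [pair for p in parts for pair in p]
    groups = defaultdict(list)
    for k, v in shuffled:
        groups[k].append(v)
    return dict(groups), len(shuffled)

def model_reduce_by_key(parts, op):
    """Pairs are pre-combined inside each partition, so fewer records are shuffled."""
    shuffled = []
    for p in parts:
        local = {}
        for k, v in p:
            local[k] = op(local[k], v) if k in local else v
        shuffled.extend(local.items())
    merged = {}
    for k, v in shuffled:
        merged[k] = op(merged[k], v) if k in merged else v
    return merged, len(shuffled)

groups, grouped_shuffle = model_group_by_key(partitions)
sums, reduced_shuffle = model_reduce_by_key(partitions, lambda x, y: x + y)
sums_from_groups = {k: sum(vs) for k, vs in groups.items()}

print(sums, sums_from_groups)            # same totals either way
print(grouped_shuffle, reduced_shuffle)  # 6 records shuffled vs. 4
```

Both routes reach the same per-key totals; in this model only the number of shuffled records differs, and the gap grows as keys repeat more often within a partition.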
Summary:
- partitionBy
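As a sketch of what `partitionBy` does with a key-value RDD, the model below routes each pair to a bucket chosen by a partition function applied to the key, modulo the number of partitions. This is plain Python rather than Spark, and `model_partition_by` is a hypothetical name. PySpark's default partition function is `portable_hash`; the built-in `hash` stands in for it here, with integer keys so the result is deterministic.

```python
def model_partition_by(pairs, num_partitions, partition_func=hash):
    """Model of rdd.partitionBy(num_partitions, partition_func): route each pair by its key."""
    buckets = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        buckets[partition_func(k) % num_partitions].append((k, v))
    return buckets

pairs = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

# Default-style hashing: for small ints, hash(k) == k, so the bucket is k % 3.
by_hash = model_partition_by(pairs, 3)

# Custom partition function (assumption for illustration): even keys to 0, odd keys to 1.
by_parity = model_partition_by(pairs, 2, partition_func=lambda k: k % 2)

print(by_hash)    # [[(3, 'c')], [(1, 'a'), (4, 'd')], [(2, 'b'), (5, 'e')]]
print(by_parity)  # [[(2, 'b'), (4, 'd')], [(1, 'a'), (3, 'c'), (5, 'e')]]
```

In PySpark the matching call would be `rdd.partitionBy(3)` or `rdd.partitionBy(2, lambda k: k % 2)` on an RDD of pairs.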