PySpark DataFrame
1. Sum different columns of a Spark DataFrame
df = df.withColumn('sum1', sum([df[col] for col in ["A.p1","B.p1"]]))
2. Select several columns
color_df.select('length','color').show()
3. The when operation
from pyspark.sql.functions import when
# 1.case when age=2 then 3 else 4
df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).show()
# 2.case when age=2 then age+1 (no else branch, so other rows become NULL)
df.select(when(df.age == 2, df.age + 1).alias("age")).show()
# 3.case when age=2 then age+1 else age end
df.withColumn('age', when(df.age == 2, df.age + 1).otherwise(df['age'])).show()
4. Set values greater than 1 to 1