Data preparation
// Table 1
scala> val df1 = spark.createDataFrame(Seq(("aaa", 14, 1), ("bbb", 30, 2), ("ccc", 45, 3), ("bbb", 56, 4)) ).toDF("R1","R2","R3")
scala> df1.show
+---+---+---+
| R1| R2| R3|
+---+---+---+
|aaa| 14| 1|
|bbb| 30| 2|
|ccc| 45| 3|
|bbb| 56| 4|
+---+---+---+
// Table 2
scala> val df2 = spark.createDataFrame(Seq(("eee", 140, 1), ("fff", 300, 2), ("ccc", 450, 3), ("ggg", 560, 9)) ).toDF("R1","R4","R3")
scala> df2.show
+---+---+---+
| R1| R4| R3|
+---+---+---+
|eee|140| 1|
|fff|300| 2|
|ccc|450| 3|
|ggg|560| 9|
+---+---+---+
Horizontal join: the two tables must have the same number of rows
// When the two tables share column names, a horizontal merge would produce
// two columns with the same name, so first collect the duplicated names
scala> import scala.collection.mutable.ArrayBuffer

val df2_col = df2.columns
val repeat = new ArrayBuffer[String]()
for (i <- df1.columns) {
  if (df2_col.contains(i)) {
    repeat.append(i)
  }
}
// Inspect the duplicated column names
scala> repeat
res6: scala.collection.mutable.ArrayBuffer[String] = ArrayBuffer(R1, R3)
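The same duplicate list can be computed more concisely with `intersect` on the column-name arrays. The snippet below is plain Scala collection code (the arrays stand in for `df1.columns` and `df2.columns`), so it runs without a Spark session:

```scala
// Column name arrays as they would come from df1.columns / df2.columns
val df1Cols = Array("R1", "R2", "R3")
val df2Cols = Array("R1", "R4", "R3")

// intersect keeps the elements present in both arrays,
// preserving their order in the first array
val repeat = df1Cols.intersect(df2Cols)
println(repeat.mkString(", "))  // R1, R3
```

This avoids the mutable `ArrayBuffer` and the explicit loop while producing the same `R1, R3` result.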
// Pick one of the tables, extract the columns with duplicated names into a
// new DataFrame, and rename each to "_" + the original column name
scala> val df2_repeat = df2.select(repeat.map(c => df2(c).alias("_" + c)): _*)
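To finish the horizontal merge, one sketch (assuming the duplicated columns of `df2` have been renamed into `df2_repeat` as described above) is to attach a stable row index to both DataFrames and join on it. Note that `monotonically_increasing_id` is not contiguous across partitions, so it cannot reliably pair rows between two DataFrames; the RDD `zipWithIndex` method does assign contiguous indices. `withRowIndex` below is a hypothetical helper, not part of the Spark API:

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField}

// Hypothetical helper: append a 0-based row-index column "idx"
// using RDD zipWithIndex, which numbers rows contiguously
def withRowIndex(df: DataFrame): DataFrame = {
  val rdd = df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
  val schema = df.schema.add(StructField("idx", LongType, nullable = false))
  df.sparkSession.createDataFrame(rdd, schema)
}

// Join the two equal-length tables row by row, then drop the index
val merged = withRowIndex(df1)
  .join(withRowIndex(df2_repeat), Seq("idx"))
  .drop("idx")
merged.show
```

Because both tables have the same number of rows, every index matches exactly once and the result has one row per original row, with `df1`'s columns followed by the renamed `_R1`, `R4`, `_R3` columns from `df2_repeat`.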