Hive and Spark SQL execution plans for over and lateral view

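This article examines the execution plans that Hive and Spark SQL produce for `over` and `lateral view`. Starting from the table creation statement, it walks through how the `over` clause is executed in Spark and in Hive, then compares how the two SQL engines execute `lateral view` in the same way.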

Table creation statement

create table test_over 
(
    user_id string,
    login_date string
)  COMMENT 'Test table for trying out functions, safe to delete'
row format delimited
fields terminated by '\t';

over execution plan

spark

spark-sql> explain select
         >   user_id
         >   ,login_date
         >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
         > from test_over;
22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
== Physical Plan ==
Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
+- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#33, 200)
      +- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
Time taken: 0.098 seconds, Fetched 1 row(s)
22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)
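
Read bottom-up, the plan above scans the table, shuffles rows by user_id (Exchange hashpartitioning), sorts each partition by user_id and login_date, and then evaluates lag over a one-row-back frame (RowFrame, -1, -1). The following is a minimal Python sketch of those three stages, not Spark's actual implementation. It assumes no NULL values, uses 4 partitions instead of 200, and substitutes zlib.crc32 for Spark's internal hash; the function names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # stand-in for the 200 shuffle partitions shown in the plan


def exchange(rows, num_partitions=NUM_PARTITIONS):
    """Mimic Exchange hashpartitioning(user_id, n): route rows by a hash of user_id."""
    partitions = defaultdict(list)
    for user_id, login_date in rows:
        # crc32 is only an illustrative hash; Spark uses its own hash function.
        pid = zlib.crc32(user_id.encode("utf-8")) % num_partitions
        partitions[pid].append((user_id, login_date))
    return list(partitions.values())


def sort_partition(partition):
    """Mimic Sort [user_id ASC, login_date ASC] inside a single partition."""
    return sorted(partition)  # tuples compare by user_id first, then login_date


def window_lag(sorted_partition, default="0001-01-01"):
    """Mimic Window lag(login_date, 1, default) partitioned by user_id."""
    out = []
    prev_user, prev_date = None, None
    for user_id, login_date in sorted_partition:
        # The previous row only counts if it belongs to the same user_id.
        lagged = prev_date if user_id == prev_user else default
        out.append((user_id, login_date, lagged))
        prev_user, prev_date = user_id, login_date
    return out


def lag_plan(rows):
    """Run the three stages in the order the physical plan executes them."""
    result = []
    for partition in exchange(rows):
        result.extend(window_lag(sort_partition(partition)))
    return result


if __name__ == "__main__":
    sample = [
        ("u1", "2022-03-02"),
        ("u2", "2022-03-05"),
        ("u1", "2022-03-01"),
    ]
    for row in sorted(lag_plan(sample)):
        print(row)
```

Because all rows of a given user_id land in the same partition, the window stage never has to look across partitions, which is why the plan needs exactly one shuffle before the sort.
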
spark-sql> 
         > explain 
         > select
         >   user_id
         >   ,login_date
         >   ,first_value(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
+- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#38, 200)
      +- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
Time taken: 0.077 seconds, Fetched 1 row(s)
22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)
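
In this plan the window has no order by, so the Sort step orders by user_id only and the frame spans the whole partition (unboundedpreceding to unboundedfollowing). Every row of a user therefore receives the same value: whichever login_date happens to come first within that user's rows. The sketch below illustrates this in plain Python under the assumption that rows keep their arrival order within a user; the function name is hypothetical, and a real cluster makes no such ordering guarantee after a shuffle, so the chosen value may differ between runs.

```python
from itertools import groupby
from operator import itemgetter


def first_value_over_partition(rows):
    """Emulate first_value(login_date) over (partition by user_id).

    rows: list of (user_id, login_date) in the order they arrive.
    """
    # Sort by user_id only. Python's sort is stable, so rows that share a
    # user_id stay in arrival order (an assumption of this sketch).
    by_user = sorted(rows, key=itemgetter(0))
    out = []
    for _, group in groupby(by_user, key=itemgetter(0)):
        group = list(group)
        # The frame covers the whole partition, so every row sees the same first row.
        first = group[0][1]
        out.extend((u, d, first) for u, d in group)
    return out


if __name__ == "__main__":
    sample = [("u1", "2022-03-02"), ("u2", "2022-03-05"), ("u1", "2022-03-01")]
    print(first_value_over_partition(sample))
    # Reversing the arrival order changes which date counts as "first" for u1.
    print(first_value_over_partition(sample[::-1]))
```

If a deterministic result is needed, adding an order by inside the over clause is the usual remedy, at the cost of a wider sort key.
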
spark-sql> 
         > 
         > explain select
         >   user_id
         >   ,login_date
         >   ,max(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
W