详解窗口函数中的RANGE BETWEEN子句


RANGE BETWEEN子句,它与ROWS BETWEEN不同,RANGE BETWEEN是基于数值范围来定义窗口的。具体来说,RANGE BETWEEN会根据ORDER BY子句指定的列(必须是数值类型或可以转换为数值的类型,如日期)的值的范围来确定窗口边界。

基本语法:

SUM(column) OVER (
    PARTITION BY partition_expression
    ORDER BY order_expression [ASC|DESC]
    RANGE BETWEEN lower_bound AND upper_bound
)

RANGE BETWEEN的边界选项:

UNBOUNDED PRECEDING: 窗口从分区的第一行开始
UNBOUNDED FOLLOWING: 窗口到分区的最后一行结束
CURRENT ROW: 当前行(对于RANGE,它表示与当前行在排序表达式上值相同的所有行)
expression PRECEDING: 在排序表达式上,值比当前行小一定数量的行(该表达式必须是非负常量,并且与排序表达式类型兼容)
expression FOLLOWING: 在排序表达式上,值比当前行大一定数量的行

注:如果不指定窗口子句,则默认采用以下的窗口定义
—>若不指定 ORDER BY,默认使用分区内所有行 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING;
—>若指定了 ORDER BY,默认使用分区内第一行到当前值 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

示例:

1.假设我们有一个销售表sales,包含字段sale_date(日期类型)和amount(销售额)。我们想要计算每个销售记录前7天(包括当天)的累计销售额。

with temp_1 as (
select 
'01' as employee_id ,'2023-01-01' as sale_date ,1000.00 as amount 
union all  
select 
'02' as employee_id ,'2023-01-01' as sale_date ,500.00 as amount 
union all 
select 
'02' as employee_id ,'2023-01-02' as sale_date,800.00 as amount 
union all 
select 
'01' as employee_id ,'2023-01-03' as sale_date,500.00 as amount 
union all 
select 
'03' as employee_id ,'2023-01-03' as sale_date,500.00 as amount 
union all 
select 
'02' as employee_id,'2023-01-04' as sale_date,1200.00 as amount 
union all 
select 
'01' as employee_id,'2023-01-05' as sale_date,500.00 as amount 
union all 
select 
'03' as employee_id,'2023-01-05' as sale_date,700.00 as amount 
union all 
select 
'01' as employee_id,'2023-01-06' as sale_date,600.00 as amount 
union all 
select 
'02' as employee_id,'2023-01-07' as sale_date,720.00 as amount 
union all 
select 
'01' as employee_id,'2023-01-08' as sale_date,1500.00 as amount 
union all 
select 
'03' as employee_id,'2023-01-08' as sale_date,900.00 as amount 
union all 
select 
'01' as employee_id,'2023-01-09' as sale_date,1000.00 as amount 
)

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY datediff(sale_date, '1970-01-01')
        RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS running_total,
	SUM(amount) OVER (
        ORDER BY CAST(sale_date AS TIMESTAMP) 
        RANGE BETWEEN INTERVAL 6 DAYS PRECEDING AND CURRENT ROW
    ) AS running_total_0
FROM temp_1;

2.假设我们有一个按分数排序的学生成绩表scores,我们想要计算每个学生分数上下10分范围内的平均分(包含当前行)。

sql
SELECT
    student_id,
    score,
    AVG(score) OVER (
        ORDER BY score
        RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING
    ) AS avg_score_in_range
FROM scores;

注意事项

排序表达式类型:对于RANGE BETWEEN,排序表达式必须是数值类型、日期类型或时间戳类型。如果是日期或时间戳,则边界必须使用INTERVAL来指定;如果是数值,则直接使用数字。
边界定义:RANGE窗口是基于排序表达式的值来确定的。例如,RANGE BETWEEN 10 PRECEDING意味着所有排序表达式值在[当前行值-10, 当前行值]范围内的行(对于数值类型)。
重复值处理:在同一分区内,如果有多行具有相同的排序表达式值,那么它们都会被包含在窗口中。因此,RANGE窗口的大小可能会动态变化,特别是当有重复值时。

与ROWS BETWEEN的区别

ROWS是按物理行数划分窗口,而RANGE是按逻辑值范围划分窗口。
例如,ROWS BETWEEN 2 PRECEDING AND CURRENT ROW会包括当前行及前两行(物理位置上的前两行),而RANGE BETWEEN 2 PRECEDING AND CURRENT ROW会包括所有排序值在[当前值-2, 当前值]范围内的行(可能有更多行,因为值重复或值在范围内)。

总结

RANGE BETWEEN是一种基于数值或日期范围的窗口定义方式,非常适合于基于时间序列或数值范围的累积计算。使用时需要确保排序表达式的类型与边界表达式的类型兼容(数值类型用数字,日期类型用INTERVAL)。同时,注意重复值会导致窗口行数增多。

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值