presto结合grouping、grouping set实现统计宽表

最新推荐文章于 2026-02-16 15:53:46 发布

原创最新推荐文章于 2026-02-16 15:53:46 发布 · 628 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#sql #数据库 #hive

博学谷学习记录专栏收录该内容

23 篇文章

订阅专栏

本文介绍了一个关于Hive SQL的实战案例，通过创建表、加载数据并使用Grouping Sets和Grouping函数来实现不同维度的数据分组汇总，最终得出不同渠道在各层级区域上的订单总额。

数据准备

准备一张表t_order，字段有：

province：省
city：市
area：区
channel：渠道，1为小程序，2为app
price：价格

该表表示某个区域的某个渠道的订单交易额，这里忽略时间维度。

创建表

create table demo.t_order (
    province string,
    city string,
    area string,
    channel int,
    price int
)
row format delimited fields terminated by ',';

t_order.txt

01,0101,010101,1,60
01,0101,010101,2,55
01,0101,010102,1,70
01,0101,010102,2,85
01,0101,010103,1,66
01,0101,010103,2,73
01,0102,010201,1,64
01,0102,010201,2,41
01,0102,010202,1,33
01,0102,010202,2,28
01,0102,010203,1,75
01,0102,010203,2,61
01,0103,010301,1,34
01,0103,010301,2,96
01,0103,010302,1,88
01,0103,010302,2,71
01,0103,010303,1,70
01,0103,010303,2,64

hive加载数据

load data local inpath '/export/data/t_order.txt' overwrite into table demo.t_order;

Grouping set

根据(省、市、区)，（省、市），（省）进行分组

select
    province,
    city,
    area

from demo.t_order
group by
    grouping sets (
    province,
    (province, city),
    (province, city, area)
    )

返回的数据为
在这里插入图片描述
即有多个组合，根据不同城市进行组合

Grouping

往上面的sql语句中添加Grouping操作，增加一个grouping_type字段，表示是根据某个维度进行分组

select
    province,
    city,
    area,
    case when grouping (province, city, area) = 0
            then 'area'
         when grouping (province, city) = 0
            then 'city'
         when grouping (province) = 0
            then 'province'
    end as group_type
from demo.t_order
group by
    grouping sets (
    province,
    (province, city),
    (province, city, area)
    )

得到的结果为：
在这里插入图片描述
以 grouping (province, city, area) = 0为例，当分组的数据中包含了同个区域的省、市、区时，grouping 返回的值为二进制的000，即数值为0，以此类推

根据对应的分组，返回不同渠道的订单总额

select
    province,
    city,
    area,
    case when grouping (province, city, area) = 0
            then 'area'
         when grouping (province, city) = 0
            then 'city'
         when grouping (province) = 0
            then 'province'
        end as group_type,
    case when grouping (province, city, area) = 0
            then sum(case when channel = 1 then price else 0 end)
         when grouping (province, city) = 0
            then sum(case when channel = 1 then price else 0 end)
         when grouping (province) = 0
            then sum(case when channel = 1 then price else 0 end)
        end as mp_price,
    case when grouping (province, city, area) = 0
             then sum(case when channel = 2 then price else 0 end)
         when grouping (province, city) = 0
             then sum(case when channel = 2 then price else 0 end)
         when grouping (province) = 0
             then sum(case when channel = 2 then price else 0 end)
        end as app_price
from demo.t_order
group by
    grouping sets (
        province,
        (province, city),
        (province, city, area)
    )

返回结果为：
在这里插入图片描述