SQL函数及报错问题合集
一、sql函数
1.日期函数(Hive)
注:只能识别“yyyy-mm-dd”日期格式
1.1日期减少函数: date_sub
语法: date_sub (string startdate, int days)
返回值: string
说明:返回开始日期startdate减少days天后的日期。
举例:
hive> select date_sub('2012-12-08',10) from lxw_dual;
小思考:同一份代码中,date_sub 有些stringdate需要引号,有些不需要
select s1.city_name,s1.base_sku_id,s1.link,count(s1.new_dt) `负毛利天数`
from
(select s.city_name,s.base_sku_id,s.link,date_sub(datekey2date(s.dt),s.rank) new_dt
from
(select
t.city_name,t.base_sku_id,t.link,t.dt ,row_number() over(PARTITION BY t.link order by t.dt) rank
from
(SELECT a.dt,
concat(a.city_name,a.base_sku_id ) link,
a.city_name,a.category1_name,a.category2_name,a.category3_name,a.base_sku_id,sum(a.sku_gmv) sku_gmv,sum(a.sku_init_gross_income_amt) sku_init_gross_income_amt
from mart_mall.app_cube_wide_view a
-- WHERE datekey2date(a.dt) between date_sub($$begindate,7) and $$begindate
where datekey2date(a.dt) BETWEEN date_sub('$$begindate',7) and date_sub('$$begindate',1)
1.2日期增加函数:date_add
语法: date_add (string startdate, int days)
返回值:string
说明:返回开始日期startdate增加days天后的日期。
注:也可以date_add (string startdate, -int days)
表示减少天数。
1.3日期格式转化函数:date2datekey/datekey2date
说明:date类型与datekey类型转换函数
select date2datekey('2012-12-29'),datekey2date(20121229)
from dim.city
limit 1;
输出结果:20121229, '2012-12-29'
2.字段去重
2.1 单字段去重(distinct)
select distinct(sku_name)from df
where dt from "20211009" between "20211011";
2.2 多字段去重
实际是按name+id 去重
在Access和SQL Server同时支持
select distinct name, id from A
或者可以使用拼接字段进行去重
--在mysql中
select distinct name,age,sex a name,age,sex from t_user
--在Oracle中
select distinct name||age||sex a name,age,sex from t_user
--在SQL Server中,必须为同种类型
select distinct name+age+sex a name,age,sex from t_user
--不同类型
select distinct concat(name,age,sex) a name,age,sex from t_user
注意:distinct需要放在开头
select id, distinct name from A; --会提示错误,因为distinct必须放在开头
3.众数计算
例取一列数据中出现频次最高的数据(类似于excel中mode函数用法)
FIRST_VALUE(xx_sku_sell_price) over (partition by dt,cpt_type,city_id,xx_sku_id order by count(1) DESC,xx_sku_sell_price DESC) xx_high_freq_price
4.case when 两种方式比较
A. 使用带有简单 CASE 表达式的 SELECT 语句
在一个SELECT语句中,一个简单的CASE表达式只允许进行相等检查;没有进行其他比较。
select skuid,
case city_name when 'BJ' then '北京'
when 'SH' then '上海'
when 'GZ' then '广州'
else '中国'
end '城市',
skuname,skuprice
from xxxx;
B. 使用带有搜索的 CASE 表达式的 SELECT 语句
在SELECT语句中,搜索CASE表达式允许基于比较值替换结果集中的值。
select skuname,skuid,
case when skuprice <=1 then '低价'
when skuprice <=5 and skuprice >1 then '中低价'
when skuprice >5 and skuprice <=15 then '中高价'
when skuprice >15 then '高价'
else '特价'
end as '价格带'
from xxxx
where yyyyy
group by 1,2,3
5.排名函数
row_number()
排名逻辑
1.二级分类为肉禽蛋、酒水饮料、粮油副食的按三级分类排名
2.二级分类为’日配’,‘休闲零食’,‘海鲜水产’,‘蔬菜’,‘水果’,‘家居厨卫’,'个护美妆’按二级分类排名
3.一级分类为加工食品的按一级分类排名
select * ,ROW_NUMBER() over(parititon by city,cat1,cat2,cat3 order by sales desc) as row_num FROM mart_mall_dev.mt_all where cat2 in("肉禽蛋","酒水饮料","粮油副食")
union
select * ,ROW_NUMBER() over(parititon by city,cat1,cat2 order by sales desc) as row_num FROM mart_mall_dev.mt_all where cat2 in ('日配','休闲零食','海鲜水产','蔬菜','水果','家居厨卫','个护美妆')
union
select * ,ROW_NUMBER() over(parititon by city,cat1 order by sales desc) as row_num FROM mart_mall_dev.mt_all where cat1="加工食品";
6.窗口函数
**窗口函数用法:
<窗口函数> OVER ( [PARTITION BY <列清单> ]
ORDER BY <排序用列清单> )
partition by 能够设定排序的对象范围,类似于group by语句,(非必须 )
可作为窗口函数
1.聚合函数(sum,avg,count,max,min)
2.专用窗口函数rank,dense_rank、row_number
窗口函数的适用范围:只能在select子句中使用**
用于排序的专用窗口函数
1、RANK()
在计算排序时,若存在相同位次,会跳过之后的位次。
例如,有3条排在第1位时,排序为:1,1,1,4······
2、DENSE_RANK()
在计算排序时,若存在相同位次,不会跳过之后的位次。
例如,有3条排在第1位时,排序为:1,1,1,2······
3、ROW_NUMBER()
这个函数赋予唯一的连续位次。
例如,有3条排在第1位时,排序为:1,2,3,4······
累计求和
sum
移动平均
avg
指定框架(汇总范围):这里使用的rows(行)和preceding(之前)两个关键字,将框架指定为截止到之前n行,因此rows 2 preceding就是将框架指定为截止到之前2行,也就是将作为汇总对象的记录限定为如下的最靠近3行
avg(***) over (order by (***) rows 2 preceding)#将前两行的数据汇总平均
avg(***) over (order by (***) rows 1 preceding and 1 following) #将本行的上下两行数据汇总平均
二、sql报错
1.重复嵌套聚合
--报错提示
SQL_SYNTAX_ERROR(USER_ERROR): From line 22, column 16 to line 22, column 55: Aggregate expressions cannot be nested
以下代码犯了两个错误:
1:计算中重用列别名;
2:不能将已聚合的列再聚合
SELECT a.dt,
case when a.city_name is null then '0_全国'
else a.city_name
end as city_name,
case when c.m_category1_name is null then '0_汇总'
else c.m_category1_name
end as category1_name,
case when c.m_category2_name is null then '0_汇总'
else c.m_category2_name
end as category2_name,
case when b.stratified_type is null then '0_汇总'
else b.stratified_type
end as stratified_type,
case when a.base_sku_id is null then '0_汇总'
else a.base_sku_id
end as base_sku_id,
-- a.poi_id,
count(distinct app_oos_sku_id) as oos_num,
count(distinct app_normal_spd_code) as high_spd_num,
if(sum(high_spd_num)<>0,sum(oos_num)*1.0/sum(high_spd_num),null) as oos_sku_rate
From topic_wms_poi_sku_oos_dt_hour a
解决方法:增加一层嵌套
select dt,city_name,category1_name,category2_name,stratified_type,base_sku_id, if(sum(high_spd_num)<>0,sum(oos_num)*1.0/sum(high_spd_num),null) as oos_sku_rate
from
(SELECT a.dt,
case when a.city_name is null then '0_全国'
else a.city_name
end as city_name,
case when c.m_category1_name is null then '0_汇总'
else c.m_category1_name
end as category1_name,
case when c.m_category2_name is null then '0_汇总'
else c.m_category2_name
end as category2_name,
case when b.stratified_type is null then '0_汇总'
else b.stratified_type
end as stratified_type,
case when a.base_sku_id is null then '0_汇总'
else a.base_sku_id
end as base_sku_id,
-- a.poi_id,
count(distinct app_oos_sku_id) as oos_num,
count(distinct app_normal_spd_code) as high_spd_num
-- if(sum(count(distinct app_normal_spd_code))<>0,sum(count(distinct app_oos_sku_id))*1.0/sum(count(distinct app_normal_spd_code)),null) as oos_sku_rate
From topic_wms_poi_sku_oos_dt_hour a