Optimizing Queries with Partitioned Tables - CSDN Blog

Article link: https://blog.csdn.net/qq_39664250/article/details/106456444

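This article explains how partitioned tables make database queries more efficient by avoiding full table scans. It covers partitioning for both internal and external tables, and shows how to use Hive SQL to create a partitioned table, add partitions, and drop partitions.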


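The benefit of a partitioned table is that queries do not have to scan the whole table: when a query specifies a partition, only the data under that partition is read.
A partitioned table can be either an internal (managed) table or an external table.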
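To illustrate why specifying a partition avoids a full scan, here is a small Python model (not Hive code). Each partition is a hypothetical `Partition` object holding its partition-column values and the rows in its directory, and `scan` reads only the partitions whose values match the filter. In Hive itself this pruning is done by the query planner when the WHERE clause filters on partition columns; the names below are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical model of one partition: its key/value spec and its rows."""
    spec: dict
    rows: list = field(default_factory=list)


def scan(partitions, partition_filter=None):
    """Return (rows, specs_read).

    With no filter every partition is read, i.e. a full table scan.
    With a filter such as {"day": "20141117"}, only partitions whose spec
    matches every key/value pair are read; the rest are skipped entirely.
    """
    rows, specs_read = [], []
    for p in partitions:
        if partition_filter is None or all(
            p.spec.get(k) == v for k, v in partition_filter.items()
        ):
            specs_read.append(p.spec)
            rows.extend(p.rows)
    return rows, specs_read


# Usage: three partitions of a table partitioned by (day, hour).
table = [
    Partition({"day": "20141117", "hour": "00"}, ["row1", "row2"]),
    Partition({"day": "20141117", "hour": "01"}, ["row3"]),
    Partition({"day": "20141118", "hour": "00"}, ["row4"]),
]
rows, specs_read = scan(table, {"day": "20141117"})
print(rows)             # ['row1', 'row2', 'row3']
print(len(specs_read))  # 2 of the 3 partitions were read
```

In HiveQL the same effect comes from filtering on the partition columns, e.g. `select * from par_test where day='20141117';`.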

Table creation syntax
CREATE [EXTERNAL] TABLE par_test(
col_name data_type ...)
COMMENT 'This is the par_test table'   -- descriptive comment
PARTITIONED BY(day STRING, hour STRING)     -- partition columns
[ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'] -- field delimiter
[LINES TERMINATED BY '\n'] -- line terminator
[LOCATION '/user/hainiu/data/'] -- directory location

All those uppercase keywords are a bit tiring to read, so here is the same template in lowercase:

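Table creation syntax (lowercase)
create [external] table table_name(
column_name data_type ...)
comment 'This is the par_test table'   -- descriptive comment
partitioned by(partition_col1 STRING, partition_col2 STRING)     -- partition columns
[row format delimited fields terminated by '\t'] -- field delimiter
[lines terminated by '\n'] -- line terminator
[location '/user/hainiu/data/'] -- directory location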
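One rule the template implies but does not spell out: partition columns are declared only in `partitioned by`, not in the regular column list, because Hive stores their values in directory names rather than in the data files, and Hive rejects a definition that repeats a column in both places. The Python sketch below renders the lowercase template from column lists and checks that rule. The function name `build_create_table` and the example data columns `id` and `name` are hypothetical.

```python
def build_create_table(name, columns, partition_cols, external=False,
                       field_delim=r"\t", location=None):
    """Render a partitioned-table CREATE statement (hypothetical helper).

    columns and partition_cols are lists of (column_name, type) pairs.
    """
    # Hive column names are case-insensitive, so compare them lowercased.
    repeated = {c.lower() for c, _ in columns} & {c.lower() for c, _ in partition_cols}
    if repeated:
        raise ValueError(f"partition columns also in column list: {sorted(repeated)}")
    cols = ", ".join(f"{c} {t}" for c, t in columns)
    parts = ", ".join(f"{c} {t}" for c, t in partition_cols)
    lines = [
        f"create {'external ' if external else ''}table {name}({cols})",
        f"partitioned by({parts})",
        f"row format delimited fields terminated by '{field_delim}'",
    ]
    if location is not None:
        lines.append(f"location '{location}'")
    return "\n".join(lines) + ";"


print(build_create_table(
    "par_test",
    [("id", "INT"), ("name", "STRING")],      # hypothetical data columns
    [("day", "STRING"), ("hour", "STRING")],  # partition columns
    external=True,
    location="/user/hainiu/data/",
))
```

Keeping the two column lists separate in the helper mirrors the DDL itself: the data files only contain `id` and `name`, while `day` and `hour` come from the directory path.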

Syntax for adding a single partition

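Add a single partition to the table, corresponding to a directory such as hdfs:'/……/table_name/20141117/00'
alter table partitioned_table add if not exists partition(partition_col1='value',partition_col2='value') location 'partition_directory_on_HDFS';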
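The mapping from a partition to its HDFS directory can be sketched in Python. Without a `location` clause, Hive's default layout puts a partition under the table directory as `key=value` subdirectories, e.g. `day=20141117/hour=00`. With a relative location such as `'20141117/00'`, this sketch assumes, as the article's path example implies, that the path is resolved against the table directory. The function name `partition_dir` and the warehouse path are hypothetical, and escaping of special characters in partition values is ignored.

```python
import posixpath


def partition_dir(table_dir, spec, location=None):
    """Return the directory holding a partition's data (simplified model).

    spec maps partition columns to values, in partition-column order.
    """
    if location is None:
        # Default layout: table_dir/col1=value1/col2=value2
        return posixpath.join(table_dir, *(f"{k}={v}" for k, v in spec.items()))
    if location.startswith("/") or "://" in location:
        # Absolute path or full URI (e.g. hdfs://...): used as given.
        return location
    # Relative location: assumed to be resolved against the table directory.
    return posixpath.join(table_dir, location)


table_dir = "/user/hive/warehouse/par_test"  # hypothetical table directory
print(partition_dir(table_dir, {"day": "20141117", "hour": "00"}))
# /user/hive/warehouse/par_test/day=20141117/hour=00
print(partition_dir(table_dir, {"day": "20141117", "hour": "00"}, "20141117/00"))
# /user/hive/warehouse/par_test/20141117/00
```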

Syntax for adding multiple partitions

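Add multiple partitions to the table in a single statement. The example below creates two partitions, corresponding to
hdfs:'/……/table_name/20141117/00'
hdfs:'/……/table_name/20141117/01'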
alter table par_test add partition(day='20141117',hour='00') location '20141117/00' partition(day='20141117',hour='01') location '20141117/01';
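When there are many partitions to register, a statement like the one above can be generated instead of typed by hand. A minimal Python sketch follows; `add_partitions_sql` is a hypothetical helper, and it does not escape quotes inside partition values or locations.

```python
def add_partitions_sql(table, partitions, if_not_exists=False):
    """Render one ALTER TABLE ... ADD statement covering several partitions.

    partitions is a list of (spec, location) pairs: spec maps partition
    columns to values; location may be None to use the default directory.
    """
    clauses = []
    for spec, location in partitions:
        kv = ",".join(f"{k}='{v}'" for k, v in spec.items())
        clause = f"partition({kv})"
        if location is not None:
            clause += f" location '{location}'"
        clauses.append(clause)
    head = f"alter table {table} add " + ("if not exists " if if_not_exists else "")
    return head + " ".join(clauses) + ";"


print(add_partitions_sql("par_test", [
    ({"day": "20141117", "hour": "00"}, "20141117/00"),
    ({"day": "20141117", "hour": "01"}, "20141117/01"),
]))
```

With these inputs the helper reproduces the two-partition `alter table par_test add ...` statement shown above, character for character.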

Show a table's partitions

show partitions tableName;
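`show partitions` prints one line per partition in the form `day=20141117/hour=00`. If you capture that output as plain text in a script, a small Python helper can turn it into dictionaries. `parse_show_partitions` is hypothetical and assumes the values contain no escaped characters.

```python
def parse_show_partitions(output):
    """Parse 'show partitions' text (one 'k1=v1/k2=v2' line per partition)."""
    partitions = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        spec = {}
        for pair in line.split("/"):
            key, _, value = pair.partition("=")
            spec[key] = value
        partitions.append(spec)
    return partitions


sample = "day=20141117/hour=00\nday=20141117/hour=01\n"
print(parse_show_partitions(sample))
# [{'day': '20141117', 'hour': '00'}, {'day': '20141117', 'hour': '01'}]
```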

Drop a partition

alter table table_name drop if exists partition(col1='value1',col2='value2');
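What a drop does can also be modeled in Python. This sketch assumes two pieces of Hive behavior: a partial spec such as `partition(day='20141117')` drops every partition with that value, and for an internal (managed) table the partition's data directory is deleted along with its metadata, while for an external table only the metadata is removed and the files stay on HDFS. `drop_partitions` and the paths are hypothetical.

```python
def drop_partitions(partitions, spec, external=False):
    """Simulate 'alter table ... drop if exists partition(spec)'.

    partitions is a list of (spec_dict, directory) pairs. Returns
    (remaining_partitions, directories_whose_data_is_deleted).
    """
    if not spec:
        raise ValueError("a partition spec is required")
    remaining, deleted_dirs = [], []
    for part_spec, directory in partitions:
        if all(part_spec.get(k) == v for k, v in spec.items()):
            # Matched: metadata is removed; data is deleted only for internal tables.
            if not external:
                deleted_dirs.append(directory)
        else:
            remaining.append((part_spec, directory))
    return remaining, deleted_dirs


parts = [
    ({"day": "20141117", "hour": "00"}, "/w/par_test/day=20141117/hour=00"),
    ({"day": "20141117", "hour": "01"}, "/w/par_test/day=20141117/hour=01"),
    ({"day": "20141118", "hour": "00"}, "/w/par_test/day=20141118/hour=00"),
]
remaining, deleted = drop_partitions(parts, {"day": "20141117"})
print(len(remaining), deleted)
```

Because of `if exists`, a spec that matches nothing simply leaves the table unchanged in this model.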
