Hive Basics - 1
Hive is a top-level Apache project, originally contributed by Facebook. It was created to solve the problem of computing statistics over massive amounts of structured log data: writing the business logic directly as Hadoop MapReduce jobs is tedious and painful for developers, and development cycles are long. Hive translates SQL into MapReduce jobs, greatly reducing the developers' workload.
Hive started out as a subproject of Hadoop and only later became a standalone project.
A few points worth noting:
- Hive is a data warehouse built on top of Hadoop, suited to offline (batch) processing; latency is high.
- Hive is a client, not a cluster: it submits SQL to a Hadoop cluster for execution.
- Hive is a SQL-like framework; its query language is HQL.
- By default, HQL does not support row-level insert/update/delete; this must be enabled manually.
- Hive does not support transactions by default; they must be configured explicitly.
- Partitioned storage is supported.
- Hive's job: SQL ==> MR/Spark
- Execution engines supported underneath: MR/Spark/Tez
- Some SQL-on-Hadoop frameworks: Spark SQL / Hive / Impala / Presto
Deployment Architecture
Hive's data falls into two categories:
- Metadata (data that describes the data), usually stored in MySQL.
- The data itself, stored on HDFS.
- By default, Hive data lives on HDFS under /user/hive/warehouse; to change this, set the hive.metastore.warehouse.dir property in hive-site.xml.
- Properties configured in hive-site.xml are global.
- Session-scoped properties can be set with the set command:
  - `set <property>;` shows the property's current value
  - `set <property>=<value>;` sets the property to the given value
- Properties can also be passed on the command line with --hiveconf: `hive --hiveconf k=v --hiveconf k=v`
- Hive data on HDFS is always stored as directories/files.
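Inside the hive CLI, the session-scoped commands above look like this in practice (the property names are real Hive settings; the override value is just an example):

```sql
-- Show the current value of a property
set hive.cli.print.header;
-- Override it for this session only
set hive.cli.print.header=true;
-- Inspect where table data lands on HDFS
set hive.metastore.warehouse.dir;
```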
1. Installing Hive

```shell
wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz
# Extract under /opt so the symlink target below exists
tar -zxvf ./hive-1.1.0-cdh5.7.0.tar.gz -C /opt
sudo ln -s /opt/hive-1.1.0-cdh5.7.0 /hive
sudo chown -R hadoop:hadoop /hive/
sudo chmod -R 777 /hive/*
sudo vi /etc/profile
# append the following two lines:
#   export HIVE_HOME=/hive
#   export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
# the config templates live under conf/
cd /hive/conf
cp ./hive-log4j.properties.template hive-log4j.properties
cp ./hive-env.sh.template hive-env.sh
```
2. Configuration

```shell
# Hive needs the MySQL JDBC driver on its classpath to reach the metastore
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.37/mysql-connector-java-5.1.37.jar
mv ./mysql-connector-java-5.1.37.jar /hive/lib
chown -R hadoop:hadoop /hive/lib
chmod -R 664 /hive/lib/mysql*
# still in /hive/conf
vi ./hive-site.xml
```
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
        <description>
            Use false if a production metastore server is used.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop001:3307/test?createDatabaseIfNotExist=true</value>
        <description>
            The JDBC connection URL.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>
            Driver class name for a JDBC metastore
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root@1234%</value>
    </property>
</configuration>
```
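One assumption worth making explicit: createDatabaseIfNotExist=true lets Hive create the test database on first use, but the MySQL account in the config above must already exist and hold privileges on it. A sketch of the grant, run in the MySQL client (user, host, and password mirror the hive-site.xml values above):

```sql
-- Executed in MySQL, not Hive; MySQL 5.x syntax.
GRANT ALL PRIVILEGES ON test.* TO 'root'@'%' IDENTIFIED BY 'root@1234%';
FLUSH PRIVILEGES;
```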
3. Startup

```shell
start-dfs.sh        # HDFS
start-yarn.sh       # YARN
mysqld_multi start  # MySQL (metastore)
hive                # Hive CLI
```
4. Basic Operations

- Creating a database

Standard syntax:

```sql
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
```

```sql
-- Create a database
-- Without LOCATION, the default path is /user/hive/warehouse/hive_create_test.db
CREATE DATABASE IF NOT EXISTS hive_create_test COMMENT 'hive test';
-- With LOCATION /db/hive/test
-- This shows that a Hive database is just a directory
CREATE DATABASE IF NOT EXISTS hive_create_test2 COMMENT 'hive test' LOCATION '/db/hive/test';
show databases;
desc database hive_create_test2;
-- After dropping the database, the /db/hive directory still exists
drop database hive_create_test2;
use hive_create_test;
```
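One caveat to the drop above: DROP DATABASE defaults to RESTRICT and fails if the database still contains tables; CASCADE drops the tables first. A quick sketch (database name reuses the example above):

```sql
-- Fails if hive_create_test2 still holds tables (RESTRICT is the default)
DROP DATABASE hive_create_test2;
-- Drops any remaining tables, then the database itself
DROP DATABASE IF EXISTS hive_create_test2 CASCADE;
```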
- Creating a table

Standard syntax:

```sql
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)       -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]  -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION  -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY     -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP  -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL    -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE       -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR    -- (Note: Available in Hive 0.12.0 and later)
  | CHAR       -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
  : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]  -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE   -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE     -- (Note: Available in Hive 0.6.0 and later)
  | ORC        -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET    -- (Note: Available in Hive 0.13.0 and later)
  | AVRO       -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE   -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
```
```sql
-- Create a table whose rows are split on commas
CREATE TABLE IF NOT EXISTS hive_hello(id int, data string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

Prepare a sample file with `sudo vi /opt/test.log`:

```
9,hello
29,world
77,hive
```

Then load and query it:

```sql
-- Import the data (LOCAL reads from the local filesystem, not HDFS)
load data local inpath '/opt/test.log' OVERWRITE INTO TABLE hive_hello;
-- Query the data
select * from hive_hello;
```
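The EXTERNAL keyword from the grammar above is worth a quick sketch: for an external table, Hive manages only the metadata, so dropping the table leaves the HDFS files untouched (the table name and location here are hypothetical):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS hive_hello_ext(id int, data string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/db/hive/ext_data';  -- hypothetical HDFS directory

-- Removes only the metastore entry; the files under /db/hive/ext_data remain
DROP TABLE hive_hello_ext;
```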