Hive Lab
Table of Contents
Hive Lab Assignment
Hive – Execution
Scenario 1: Create a managed table and load the data from LFS
Scenario 2: Create a managed table and load the data from HDFS
Scenario 3: Create an external table and load the data from LFS
Scenario 4: Create an external table and load the data from HDFS
Scenario 5: Drop a managed table and check the result in HDFS
Scenario 6: Drop an external table and check the data from HDFS
Programming in Hive Script
JOINS using Hive
Static Partitioning using Hive
Dynamic Partitioning using Hive
Bucketing using Hive
Complex data type in Hive: Array
Complex data type in Hive: Struct
Complex data type in Hive: Map
Hive UDF
Integration of Pig with HBase
Hive – Execution
hive>
Scenario 1: Create a managed table and load the data from LFS
Flat file: One.txt
1,sriram
2,raj
Table Creation
hive> create table sri_cust(cid int,cname string) row format delimited fields terminated by ',';
OK
OK
OK
1 sriram
2 raj
For managed tables, the data files are stored under the Hive warehouse directory in HDFS, and the table definition is kept in the Hive metastore.
Scenario 2: Create a managed table and load the data from HDFS
Flat file: One.txt
1,sriram
2,raj
Table Creation
OK
OK
Retrieving data
OK
1 sriram
2 raj
Scenario 3: Create an external table and load the data from LFS
Sritwo.txt
US,1,United States
CHN,2,China
hive> create external table sri_ext1(cname string,cid int,des string) row format delimited fields terminated by ','
OK
OK
OK
US 1 United States
CHN 2 China
Scenario 4: Create an external table and load the data from HDFS
hive> create external table sri_ext2(cname string,cid int,des string) row format delimited fields terminated by ','
OK
OK
OK
US 1 United States
CHN 2 China
Scenario 5: Drop a managed table and check the result in HDFS
OK
1 sriram
2 raj
If you drop an internal (managed) table, both the metadata and the actual data are deleted.
OK
OK
US 1 United States
CHN 2 China
Scenario 6: Drop an external table and check the data from HDFS
If you drop an external table, only the metadata is deleted; the actual data remains in HDFS.
OK
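The load and drop commands for the scenarios above are not reproduced in this text. The following is a minimal sketch of what they might look like. The table name sri_cust comes from the lab; the table sri_ext_demo, every file path, and the location directory are assumptions made for illustration only.

```sql
-- Scenario 1 (sketch): load a managed table from the local file system (LFS).
-- LOCAL copies the file into the table directory. The path is an assumption.
load data local inpath '/home/cloudera/Desktop/One.txt' into table sri_cust;

-- Scenario 2 (sketch): load from HDFS. Without LOCAL the file is moved, not copied.
-- The HDFS path is an assumption.
load data inpath '/user/cloudera/One.txt' into table sri_cust;

-- Scenarios 3 and 4 (sketch): an external table usually names its own directory.
-- The table name and location are hypothetical.
create external table sri_ext_demo(cname string, cid int, des string)
row format delimited fields terminated by ','
location '/user/cloudera/sri_ext_demo';

-- Scenarios 5 and 6 (sketch): dropping the two kinds of table.
drop table sri_cust;      -- managed: metadata and data files are removed
drop table sri_ext_demo;  -- external: only the metadata is removed
```

Listing the two directories with hadoop fs -ls after the drops is one way to confirm the difference.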
Programming in Hive Script
Product_sri.txt
Code:
Productscript.sql
create table product_tab(pid int,pname string,price float,des string) row format delimited fields terminated by '\t';
Execution
OK
OK
OK
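Only the first statement of the script survives in this text. As a rough sketch, a script of this kind might hold the create, load, and select statements together and be run in one pass with hive -f. The load path below is an assumption, and the three-statement layout is only a guess based on the three OK lines.

```sql
-- Productscript.sql (sketch): statements run top to bottom in one session.
-- Run from the shell with:  hive -f Productscript.sql
create table product_tab(pid int, pname string, price float, des string)
row format delimited fields terminated by '\t';

-- The local path is an assumption, not taken from the lab.
load data local inpath '/home/cloudera/Desktop/Product_sri.txt' into table product_tab;

-- Print the loaded rows.
select * from product_tab;
```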
JOINS using Hive
empdataset.txt (tab-separated)
1	Ram	US
2	Diya	US
3	Sriram	IND
4	Jana	IND
deptdataset.txt (tab-separated)
1	IT
2	IT
3	Analyst
4	Admin
Table Creation
hive> create table empjoin(eid INT,ename STRING,address STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
OK
OK
1 Ram US
2 Diya US
3 Sriram IND
4 Jana IND
5 Raj CHN
hive> create table deptjoin(eid INT,dept STRING) row format delimited fields terminated by '\t';
OK
OK
OK
1 IT
2 IT
3 Analyst
4 Admin
7 HR
Inner JOIN
OK
1 Ram US 1 IT
2 Diya US 2 IT
3 Sriram IND 3 Analyst
4 Jana IND 4 Admin
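The inner join query itself is missing from this text. Since the result carries all columns of both tables, it was plausibly a select * over the two tables; the statement below is a sketch under that assumption, not the original command.

```sql
-- Inner join (sketch): only eids present in both tables are returned,
-- so eid 5 (empjoin only) and eid 7 (deptjoin only) do not appear.
select * from empjoin e join deptjoin d on (e.eid = d.eid);
```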
LEFT OUTER JOIN
hive> select e.eid,ename,dept from empjoin e LEFT OUTER JOIN deptjoin d ON(e.eid=d.eid);
OK
1 Ram IT
2 Diya IT
3 Sriram Analyst
4 Jana Admin
5 Raj NULL
RIGHT OUTER JOIN
hive> select e.eid,ename,dept from empjoin e RIGHT OUTER JOIN deptjoin d ON(e.eid=d.eid);
OK
1 Ram IT
2 Diya IT
3 Sriram Analyst
4 Jana Admin
FULL OUTER JOIN
hive> select e.eid,ename,dept from empjoin e FULL OUTER JOIN deptjoin d ON(e.eid=d.eid);
OK
1 Ram IT
2 Diya IT
3 Sriram Analyst
4 Jana Admin
5 Raj NULL
NULL NULL HR
Static Partitioning using Hive
user_info.txt
satyam,kumar,89
prateek,kumar,78
diya,anand,76
ashu,singh,74
user_info1.txt
manish,kumar,76
sohail,tanvir,89
lovely,choudhary,4
Table Creation
OK
OK
fname varchar(20)
lname varchar(20)
eid int
country varchar(20)
state varchar(20)
# Partition Information
country varchar(20)
state varchar(20)
hive> load data local inpath '/home/cloudera/Desktop/user_info.txt' into table part_user partition (country='US',state='FL');
OK
OK
satyam kumar 89 US FL
prateek kumar 78 US FL
diya anand 76 US FL
ashu singh 74 US FL
hive> load data local inpath '/home/cloudera/Desktop/user_info1.txt' into table part_user partition (country='CA',state='AU');
OK
OK
manish kumar 76 CA AU
sohail tanvir 89 CA AU
lovely choudhary 4 CA AU
satyam kumar 89 US FL
prateek kumar 78 US FL
diya anand 76 US FL
ashu singh 74 US FL
OK
satyam kumar 89 US FL
prateek kumar 78 US FL
diya anand 76 US FL
ashu singh 74 US FL
OK
manish kumar 76 CA AU
sohail tanvir 89 CA AU
lovely choudhary 4 CA AU
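The create statement and the per-partition queries are not shown above. The sketch below reconstructs them from the describe output and the load commands; the comma delimiter is inferred from the input files, and the exact statements used in the lab may have differed.

```sql
-- Partitioned table (sketch): partition columns are declared separately
-- from the data columns and are not stored inside the data files.
create table part_user(fname varchar(20), lname varchar(20), eid int)
partitioned by (country varchar(20), state varchar(20))
row format delimited fields terminated by ',';

-- List the partitions created by the two static loads.
show partitions part_user;

-- Filtering on the partition columns reads only the matching directory.
select * from part_user where country = 'US' and state = 'FL';
select * from part_user where country = 'CA' and state = 'AU';
```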
Dynamic Partitioning using Hive
Table Creation
> (country string,state string) row format delimited fields terminated by ',' stored as textfile;
OK
> string) row format delimited fields terminated by ',' stored as textfile;
OK
User_info2.txt
Ram,Durai,89,US,FL
Sri,Ram,56,US,FL
Raghu,Patel,45,US,FL
Prasad,Kumar,23,CA,AU
Kumar,Singh,55,CA,AU
OK
Setting of Parameters for dynamic partitioning
hive> insert into table par_user1 partition(country, state) select fname, lname, eid, country, state from user1;
Query ID = cloudera_20160928224848_758afcb0-ab41-4cda-8763-3310e9d7f021
Total jobs = 3
2016-09-28 22:57:19,580 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec
Stage-Stage-1: Map: 1 Cumulative CPU: 1.46 sec HDFS Read: 3711 HDFS Write: 219 SUCCESS
OK
OK
Ram Durai 89 US FL
Sri Ram 56 US FL
Raghu Patel 45 US FL
Prasad Kumar 23 CA AU
Kumar Singh 55 CA AU
OK
Prasad Kumar 23 CA AU
Kumar Singh 55 CA AU
Ram Durai 89 US FL
Sri Ram 56 US FL
Raghu Patel 45 US FL
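The parameter settings for dynamic partitioning are not reproduced in this text. A minimal sketch of the usual session settings, followed by the insert from the lab, is given here; the show partitions check at the end is an addition for illustration.

```sql
-- Enable dynamic partitioning for the current session.
set hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic;
-- strict mode requires at least one static partition value.
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the select list,
-- in the same order as in the partition clause.
insert into table par_user1 partition(country, state)
select fname, lname, eid, country, state from user1;

-- One partition per distinct (country, state) pair should now exist.
show partitions par_user1;
```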
Bucketing using Hive
empbucket_old.txt
1,Ram,34,63000,HR
2,Sriram,32,75000,IT
3,Jana,28,45000,HCLS
4,Diya,22,23000,BNFS
5,sudhir,32,10000,INS
6,raju,24,30000,MF
7,sanjay,22,14000,SE
8,ajay,34,50000,SE
9,soman,21,50000,IT
10,suresh,31,60000,ES
11,john,32,30000,IT
$ hive
hive> create table empbucketmain (id int,name string,age int,salary float,dept string)
OK
OK
Table Creation
hive> create table emp_bucket (id int,name string,age int,salary float,dept string) clustered by (id) into 5 buckets
OK
Enforcing Bucketing
Query ID = cloudera_20160927230505_7e89b0c6-98b7-4fbc-97d9-dc876b7058b8
Total jobs = 1
set hive.exec.reducers.bytes.per.reducer=<number>
set hive.exec.reducers.max=<number>
set mapreduce.job.reduces=<number>
2016-09-27 23:09:08,332 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec
2016-09-27 23:09:47,206 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 4.93 sec
2016-09-27 23:09:52,674 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.51 sec
Stage-Stage-1: Map: 1 Reduce: 5 Cumulative CPU: 9.51 sec HDFS Read: 20297 HDFS Write: 616 SUCCESS
OK
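The statements behind the "Enforcing Bucketing" step are missing from this text. The sketch below shows one plausible sequence, assuming empbucketmain is the staging table loaded from empbucket_old.txt. The sampling query at the end is an illustrative addition.

```sql
-- On Hive releases before 2.0, bucketing has to be switched on explicitly.
-- From 2.0 onward it is always enforced and this property is gone.
set hive.enforce.bucketing=true;

-- Populate the bucketed table from the staging table.
-- Hive runs one reducer per bucket, which matches "Reduce: 5" in the log.
insert overwrite table emp_bucket select * from empbucketmain;

-- Read a single bucket out of the five.
select * from emp_bucket tablesample(bucket 1 out of 5 on id);
```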
Note:
Complex data type in Hive: Array
Flat File Creation
arrayinput.txt
1,326362$3443$23432$875665$3443$43534$234$342
2,123$323$546$546$5476
3,435$345$678$122$98987
4,234$7234$65242$6272
$ hive
Table Creation
hive> create table array_test1 ( id int,all_nums array<int>) row format delimited fields terminated by ','
OK
OK
OK
1 [326362,3443,23432,875665,3443,43534,234,342]
2 [123,323,546,546,5476]
3 [435,345,678,122,98987]
4 [234,7234,65242,6272]
hive> select id,all_nums[1] from array_test1;
OK
1 3443
2 323
3 345
4 7234
OK
1 3443
2 5476
3 98987
4 NULL
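The create statement above is cut off, and the query behind the second result set is missing. The sketch below assumes that '$' is the collection delimiter, as the input file suggests. An index of 4 is consistent with the second result set, including the NULL for the row with only four elements. The size() query is an illustrative addition.

```sql
-- Assumed completion of the DDL: '$' separates the array elements.
create table array_test1 (id int, all_nums array<int>)
row format delimited fields terminated by ','
collection items terminated by '$';

-- Array indexes start at 0, so index 4 is the fifth element.
-- Rows with fewer than five elements return NULL.
select id, all_nums[4] from array_test1;

-- size() gives the number of elements per row.
select id, size(all_nums) from array_test1;
```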
Complex data type in Hive: Struct
Weather.txt
1,32$65$moderate
2,37$78$humid
3,43$55$hot
4,23$45$cold
Table Creation
>row format delimited fields terminated by ',' collection items terminated by '$' stored as textfile;
OK
OK
Verify data
OK
1 {"temp":32,"humidity":65,"comment":"moderate"}
2 {"temp":37,"humidity":78,"comment":"humid"}
3 {"temp":43,"humidity":55,"comment":"hot"}
4 {"temp":23,"humidity":45,"comment":"cold"}
hive> select id,weather_reading.temp from struct_test;
OK
1 32
2 37
3 43
4 23
OK
1 65
2 78
3 55
4 45
OK
1 moderate
2 humid
3 hot
4 cold
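Only the tail of the create statement and the first field query survive above. A sketch of the full sequence follows; the field names come from the verified output, while the int and string field types are assumptions.

```sql
-- Struct table (sketch): '$' separates the struct fields inside the second column.
-- Field types are assumed from the sample data.
create table struct_test (
  id int,
  weather_reading struct<temp:int, humidity:int, comment:string>
)
row format delimited fields terminated by ','
collection items terminated by '$' stored as textfile;

-- Struct fields are read with dot notation.
select id, weather_reading.humidity from struct_test;
select id, weather_reading.comment from struct_test;
```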
Complex data type in Hive: Map
Comments.txt
Table Creation
> row format delimited fields terminated by ',' collection items terminated by '#'
OK
OK
Verify data
OK
hive> select id,comments_map[1] from map_test;
OK
1 india is great
2 we are awesome
3 hurray we won
4 hectic day
OK
2 i like cricket
4 NULL
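The input file and most of the create statement are missing from this section. The sketch below is therefore largely assumed: the int key type is inferred from comments_map[1], and the ':' map-key delimiter is a guess. Only the table name, the column name, and the '#' collection delimiter come from the lab.

```sql
-- Map table (sketch): '#' separates the entries, ':' (assumed) separates key from value.
create table map_test (id int, comments_map map<int,string>)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';

-- Look up a value by key; a key that is absent in a row yields NULL.
select id, comments_map[2] from map_test;

-- map_keys() and map_values() return the keys and the values as arrays.
select id, map_keys(comments_map), map_values(comments_map) from map_test;
```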
Hive UDF
JARS Used
And /usr/lib/hive/hive-exec.jar
Java Code
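package com;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
// The class name and method signature are not shown in the source; these are placeholders.
public class ReplaceUDF extends UDF {
    private final Text result = new Text();
    public Text evaluate(String str, String str1, String str2) {
        String rep = str.replace(str1, str2); // replace every str1 with str2
        result.set(rep);
        return result;
    }
}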
Hive Code
$ hive
Table Creation
hive> create table customer (fname STRING,lname STRING) row format delimited fields terminated by '\t'
> stored as textfile;
OK
OK
Creating function
OK
Names.txt
sri ram
Retrieving table
OK
raj
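The statements under "Creating function" and "Retrieving table" are not reproduced in this text. The sketch below shows how a UDF of this kind is typically registered and called. The jar path, the function name replace_str, the class name com.ReplaceUDF, and the argument values are all hypothetical placeholders.

```sql
-- Make the compiled UDF available to the session (hypothetical jar path).
add jar /home/cloudera/Desktop/replace_udf.jar;

-- Bind a function name to the UDF class (both names are placeholders).
create temporary function replace_str as 'com.ReplaceUDF';

-- Call the function like any built-in; the arguments are illustrative.
select replace_str(fname, 'sri', 'raj') from customer;
```

A temporary function exists only for the current session, so the add jar and create temporary function steps are repeated in each new session.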
Integration of Pig with HBase
cust_info.txt
1,Sriram,IT
2,Ram,LT
3,Jana,RCT
Copying file from LFS to HDFS
Table Creation
Table Scan
ROW COLUMN+CELL
Open another terminal and type the following commands:
Success!
Input(s)
Output(s)
Counters:
Job DAG:
job_1475648951424_0002
ROW COLUMN+CELL