
CLOUDERA

HIVE LAB

Table of Contents
Hive Lab Assignment
Hive – Execution
Scenario 1: Create a managed table and load the data from LFS
Scenario 2: Create a managed table and load the data from HDFS
Scenario 3: Create an external table and load the data from LFS
Scenario 4: Create an external table and load the data from HDFS
Scenario 5: Drop a managed table and check the result in HDFS
Scenario 6: Drop an external table and check the data from HDFS
Programming in Hive Script
JOINS using Hive
Static Partitioning using Hive
Dynamic Partitioning using Hive
Bucketing using Hive
Complex data type in Hive: Array
Complex data type in Hive: Struct
Complex data type in Hive: Map
Hive UDF
Integration of Pig with HBase



Hive Lab Assignment

Hive – Execution

To start the hive terminal


[cloudera@quickstart ~]$ hive

2016-10-31 23:49:51,032 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.

hive>

Hive Oriented Scenarios


Scenario 1: Create a managed table and load the data from LFS

Scenario 2: Create a managed table and load the data from HDFS

Scenario 3: Create an external table and load the data from LFS

Scenario 4: Create an external table and load the data from HDFS

Scenario 5: Drop a managed table and check the result in HDFS

Scenario 6: Drop an external table and check the data from HDFS



Scenario 1: Create a managed table and load the data from LFS
Flat File Creation

Flat file: One.txt

1,sriram

2,raj

Table Creation

hive> create table sri_cust(cid int,cname string) row format delimited fields terminated by ',';

OK

Time taken: 2.866 seconds

Loading the data from LFS

hive> load data local inpath '/home/cloudera/Desktop/one.txt' into table sri_cust;

Loading data to table default.sri_cust

Table default.sri_cust stats: [numFiles=1, totalSize=15]

OK

Time taken: 1.185 seconds

Retrieving the data

hive> select * from sri_cust;

OK

1 sriram

2 raj

Time taken: 0.708 seconds, Fetched: 2 row(s)

Browse the Directory / Check the result in HDFS

For managed tables, the data is stored under the Hive warehouse directory in HDFS (by default /user/hive/warehouse), and the table schema is kept in the metastore.
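
A quick way to verify this (a sketch, assuming the default warehouse location /user/hive/warehouse):

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/sri_cust

The loaded file (one.txt) should be listed inside this directory.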



Scenario 2: Create a managed table and load the data from HDFS
Flat File Creation

Flat file: One.txt

1,sriram

2,raj

Open a new Terminal

[cloudera@quickstart ~]$ hadoop fs -put /home/cloudera/Desktop/one.txt /

Table Creation

hive> create table cust_sri(cid int,cname string) row format delimited fields terminated by ',';

OK

Time taken: 0.11 seconds

Loading data from HDFS

hive> load data inpath '/one.txt' into table cust_sri;

Loading data to table default.cust_sri

Table default.cust_sri stats: [numFiles=1, totalSize=15]

OK

Time taken: 0.56 seconds

Retrieving data

hive> select * from cust_sri;

OK

1 sriram

2 raj

Time taken: 0.526 seconds, Fetched: 2 row(s)

Browse the Directory / Check the result in HDFS



Scenario 3: Create an external table and load the data from LFS
Flat File Creation

Sritwo.txt

US,1,United States

CHN,2,China

Creating an external table

hive> create external table sri_ext1(cname string,cid int,des string) row format delimited fields terminated by ','

> location '/user/cloudera/result_ext';

OK

Time taken: 0.264 seconds

Loading data from LFS for external table

hive> load data local inpath '/home/cloudera/Desktop/sritwo.txt' into table sri_ext1;

Loading data to table default.sri_ext1

Table default.sri_ext1 stats: [numFiles=1, totalSize=31]

OK

Time taken: 0.327 seconds

Retrieving table information

hive> select * from sri_ext1;

OK

US 1 United States

CHN 2 China

Time taken: 0.09 seconds, Fetched: 2 row(s)

Browse the Directory / Check the result in HDFS
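
Because the table was created with location '/user/cloudera/result_ext', the loaded file should appear under that directory rather than under the warehouse path (a quick check, using the location from the create statement):

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/result_ext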



Scenario 4: Create an external table and load the data from HDFS
Creating external table

hive> create external table sri_ext2(cname string,cid int,des string) row format delimited fields terminated by ','

> location '/user/cloudera/result_ext1';

OK

Time taken: 0.259 seconds

Loading the data from HDFS

hive> load data inpath '/sricountry.txt' into table sri_ext2;

Loading data to table default.sri_ext2

Table default.sri_ext2 stats: [numFiles=1, totalSize=31]

OK

Time taken: 0.233 seconds

Retrieving table information

hive> select * from sri_ext2;

OK

US 1 United States

CHN 2 China

Time taken: 0.09 seconds, Fetched: 2 row(s)

Browse the Directory / Check the result in HDFS



Scenario 5: Drop a managed table and check the result in HDFS
Dropping scenario

hive> select * from cust_sri;

OK

1 sriram

2 raj

Time taken: 1.291 seconds, Fetched: 2 row(s)

Dropping internal table

If you drop a managed (internal) table, both the metadata and the actual data are deleted.

hive> drop table cust_sri;

OK

Time taken: 0.729 seconds
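
To confirm that dropping the managed table also removed its data (a sketch, assuming the default warehouse location), list its former directory; the command should report that the path no longer exists:

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/cust_sri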

hive> select * from sri_ext1;

OK

US 1 United States

CHN 2 China

Time taken: 0.088 seconds, Fetched: 2 row(s)

Scenario 6: Drop an external table and check the data from HDFS

If you drop an external table, only the metadata is deleted; the actual data remains in HDFS.

hive> drop table sri_ext1;

OK

Time taken: 0.128 seconds
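
Since sri_ext1 was an external table, its files should still be present at the table's location even after the drop (assuming the location used in Scenario 3):

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/result_ext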



Programming in Hive Script

Loading data file through Hive Script

Flat File Creation

Product_sri.txt

1 BigBooks 20.1 stationery

2 pens 45.6 stationery

3 Furniture 67.8 Householditems

Code:

Productscript.sql

create table product_tab(pid int,pname string,price float,des string) row format delimited fields terminated by '\t';

load data local inpath '/home/cloudera/Desktop/product_sri.txt' into table product_tab;

select * from product_tab;

Execution

[cloudera@quickstart ~]$ hive -f /home/cloudera/Desktop/productscript.sql

2016-11-02 03:17:43,908 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties

OK

Time taken: 1.239 seconds

Loading data to table default.product_tab

Table default.product_tab stats: [numFiles=1, totalSize=82]

OK

Time taken: 0.835 seconds

OK

1 BigBooks 20.1 stationery

2 pens 45.6 stationery

3 Furniture 67.8 Householditems

Time taken: 0.657 seconds, Fetched: 3 row(s)
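
For quick checks, the same statements can also be run inline with the -e option instead of a script file, for example:

[cloudera@quickstart ~]$ hive -e 'select * from product_tab;'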



JOINS using Hive
Flat File Creation

empdataset.txt

1 Ram US

2 Diya US

3 Sriram IND

4 Jana IND

deptdataset.txt

1 IT

2 IT

3 Analyst

4 Admin

Invoke the hive terminal


$ hive

Table Creation

hive> create table empjoin(eid INT,ename STRING,address STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

OK

Time taken: 0.848 seconds

Load the data

hive> load data local inpath '/home/cloudera/Desktop/empdataset.txt' into table empjoin;

Loading data to table default.empjoin

Table default.empjoin stats: [numFiles=1, totalSize=43]

OK

Time taken: 0.606 seconds

Verify the loaded records

hive> select * from empjoin;

OK

1 Ram US

2 Diya US

3 Sriram IND

4 Jana IND

5 Raj CHN

Time taken: 0.451 seconds, Fetched: 5 row(s)


Table Creation

hive> create table deptjoin(eid INT,dept STRING) row format delimited fields terminated by '\t';

OK

Time taken: 0.089 seconds

Load the data

hive> load data local inpath '/home/cloudera/Desktop/deptdataset.txt' into table deptjoin;

Loading data to table default.deptjoin

Table default.deptjoin stats: [numFiles=1, totalSize=29]

OK

Time taken: 0.243 seconds

Verify the loaded records

hive> select * from deptjoin;

OK

1 IT

2 IT

3 Analyst

4 Admin

7 HR

Time taken: 0.09 seconds, Fetched: 5 row(s)

Inner JOIN

hive> select * from empjoin JOIN deptjoin ON (empjoin.eid=deptjoin.eid);

OK

1 Ram US 1 IT

2 Diya US 2 IT

3 Sriram IND 3 Analyst

4 Jana IND 4 Admin

Time taken: 33.448 seconds, Fetched: 4 row(s)

LEFT OUTER JOIN

hive> select e.eid,ename,dept from empjoin e LEFT OUTER JOIN deptjoin d ON(e.eid=d.eid);

OK

1 Ram IT

2 Diya IT

3 Sriram Analyst

4 Jana Admin

5 Raj NULL

RIGHT OUTER JOIN

hive> select e.eid,ename,dept from empjoin e RIGHT OUTER JOIN deptjoin d ON(e.eid=d.eid);

OK

1 Ram IT

2 Diya IT

3 Sriram Analyst

4 Jana Admin

FULL OUTER JOIN

hive> select e.eid,ename,dept from empjoin e FULL OUTER JOIN deptjoin d ON(e.eid=d.eid);

OK

1 Ram IT

2 Diya IT

3 Sriram Analyst

4 Jana Admin

5 Raj NULL

NULL NULL HR

Time taken: 32.245 seconds, Fetched: 6 row(s)

Static Partitioning using Hive

Flat File Creation

user_info.txt

satyam,kumar,89

prateek,kumar,78

diya,anand,76

ashu,singh,74

user_info1.txt

manish,kumar,76

sohail,tanvir,89

lovely,choudhary,4

Invoke the hive terminal


$hive

Table Creation

hive> create table part_user(fname varchar(20),lname varchar(20),eid int) partitioned by (country varchar(20),state varchar(20)) row format delimited fields terminated by ',' stored as textfile;

OK

Time taken: 1.653 seconds

hive> desc part_user;

OK

fname varchar(20)

lname varchar(20)

eid int

country varchar(20)

state varchar(20)

# Partition Information

# col_name data_type comment

country varchar(20)

state varchar(20)

Time taken: 0.515 seconds, Fetched: 11 row(s)

Load the data

hive> load data local inpath '/home/cloudera/Desktop/user_info.txt' into table part_user partition (country='US',state='FL');

Loading data to table default.part_user partition (country=US, state=FL)

Partition default.part_user{country=US, state=FL} stats: [numFiles=1, numRows=0, totalSize=61, rawDataSize=0]

OK

Time taken: 0.874 seconds

Verify the loaded records

hive> select * from part_user;

OK

satyam kumar 89 US FL

prateek kumar 78 US FL

diya anand 76 US FL

ashu singh 74 US FL

Time taken: 0.464 seconds, Fetched: 4 row(s)

Load the data

hive> load data local inpath '/home/cloudera/Desktop/user_info1.txt' into table part_user partition (country='CA',state='AU');

Loading data to table default.part_user partition (country=CA, state=AU)

Partition default.part_user{country=CA, state=AU} stats: [numFiles=1, numRows=0, totalSize=52, rawDataSize=0]

OK

Time taken: 1.302 seconds

Verify the loaded records

hive> select * from part_user;

OK

manish kumar 76 CA AU

sohail tanvir 89 CA AU
lovely choudhary 4 CA AU

satyam kumar 89 US FL

prateek kumar 78 US FL

diya anand 76 US FL

ashu singh 74 US FL

Time taken: 0.572 seconds, Fetched: 7 row(s)

Retrieving information: Verify the loaded records

hive> select * from part_user where part_user.country='US' and part_user.state='FL';

OK

satyam kumar 89 US FL

prateek kumar 78 US FL

diya anand 76 US FL

ashu singh 74 US FL

Time taken: 1.252 seconds, Fetched: 4 row(s)

hive> select * from part_user where part_user.country='CA' and part_user.state='AU';

OK

manish kumar 76 CA AU

sohail tanvir 89 CA AU

lovely choudhary 4 CA AU

Time taken: 0.126 seconds, Fetched: 3 row(s)
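
Each static partition is stored as its own subdirectory under the table directory, named after the partition columns. A way to confirm this (a sketch, assuming the default warehouse location):

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/part_user/country=US/state=FL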

Dynamic Partitioning using Hive

Table Creation

hive> create table par_user1(fname string,lname string,eid int) partitioned by

> (country string,state string) row format delimited fields terminated by ',' stored as textfile;

OK

Time taken: 0.882 seconds

hive> create table user1(fname string,lname string,eid int,country string,state

> string) row format delimited fields terminated by ',' stored as textfile;

OK

Time taken: 0.174 seconds

Flat File Creation

User_info2.txt

Ram,Durai,89,US,FL

Sri,Ram,56,US,FL

Raghu,Patel,45,US,FL

Prasad,Kumar,23,CA,AU

Kumar,Singh,55,CA,AU

Loading the data

hive> load data local inpath '/home/cloudera/Desktop/user_info2.txt' into table user1;

Loading data to table default.user1

Table default.user1 stats: [numFiles=1, totalSize=100]

OK

Time taken: 0.966 seconds

Setting of Parameters for dynamic partitioning

hive> set hive.exec.dynamic.partition=true;

hive> set hive.exec.dynamic.partition.mode=nonstrict;

Retrieving data from the partitioned table

hive> insert into table par_user1 partition(country, state) select fname, lname, eid, country, state from user1;

Query ID = cloudera_20160928224848_758afcb0-ab41-4cda-8763-3310e9d7f021

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1475038007099_0002, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1475038007099_0002/

Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1475038007099_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2016-09-28 22:57:06,378 Stage-1 map = 0%, reduce = 0%

2016-09-28 22:57:19,580 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec

MapReduce Total cumulative CPU time: 1 seconds 460 msec

Ended Job = job_1475038007099_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/par_user1/.hive-staging_hive_2016-09-28_22-56-51_418_4721036283115133219-1/-ext-10000

Loading data to table default.par_user1 partition (country=null, state=null)

Time taken for load dynamic partitions : 349

Loading partition {country=CA, state=AU}

Loading partition {country=US, state=FL}

Time taken for adding to write entity : 3

Partition default.par_user1{country=CA, state=AU} stats: [numFiles=1, numRows=2, totalSize=31, rawDataSize=29]

Partition default.par_user1{country=US, state=FL} stats: [numFiles=1, numRows=3, totalSize=39, rawDataSize=36]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.46 sec HDFS Read: 3711 HDFS Write: 219 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 460 msec

OK

Time taken: 31.294 seconds


Retrieving information: Verify the loaded records

hive> select * from user1;

OK

Ram Durai 89 US FL

Sri Ram 56 US FL

Raghu Patel 45 US FL

Prasad Kumar 23 CA AU

Kumar Singh 55 CA AU

Time taken: 0.474 seconds, Fetched: 5 row(s)

hive> select * from par_user1;

OK

Prasad Kumar 23 CA AU

Kumar Singh 55 CA AU

Ram Durai 89 US FL

Sri Ram 56 US FL

Raghu Patel 45 US FL

Time taken: 0.244 seconds, Fetched: 5 row(s)

Browse the Directory / Check the result in HDFS
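
The dynamically created partitions can also be listed from Hive itself; based on the loaded data, this should show country=CA/state=AU and country=US/state=FL:

hive> show partitions par_user1;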

Bucketing using Hive

Flat File Creation

empbucket_old.txt

1,Ram,34,63000,HR

2,Sriram,32,75000,IT

3,Jana,28,45000,HCLS

4,Diya,22,23000,BNFS

5,sudhir,32,10000,INS

6,raju,24,30000,MF

7,sanjay,22,14000,SE

8,ajay,34,50000,SE

9,soman,21,50000,IT

10,suresh,31,60000,ES

11,john,32,30000,IT

Copying Local File System (LFS) data into HDFS

$ hadoop fs -put /home/cloudera/Desktop/empbucket_old.txt /

Invoke hive terminal

$hive

hive> create table empbucketmain (id int,name string,age int,salary float,dept string)

>row format delimited fields terminated by ',';

OK

Time taken: 0.784 seconds

Loading the data

hive> load data inpath '/empbucket_old.txt' into table empbucketmain;

Loading data to table default.empbucketmain

Table default.empbucketmain stats: [numFiles=1, totalSize=224]

OK

Time taken: 0.644 seconds

Table Creation

hive> create table emp_bucket (id int,name string,age int,salary float,dept string) clustered by (id) into 5 buckets

> row format delimited fields terminated by ',' stored as textfile;

OK

Time taken: 0.095 seconds

Enforcing Bucketing

hive> set hive.enforce.bucketing=true;

Inserting table with bucket

hive> insert overwrite table emp_bucket select * from empbucketmain;

Query ID = cloudera_20160927230505_7e89b0c6-98b7-4fbc-97d9-dc876b7058b8

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 5

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1475038007099_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1475038007099_0001/

Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1475038007099_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 5

2016-09-27 23:08:59,071 Stage-1 map = 0%, reduce = 0%

2016-09-27 23:09:08,332 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec

2016-09-27 23:09:47,206 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 4.93 sec

2016-09-27 23:09:52,674 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.51 sec

MapReduce Total cumulative CPU time: 9 seconds 510 msec

Ended Job = job_1475038007099_0001

Loading data to table default.emp_bucket

Table default.emp_bucket stats: [numFiles=5, numRows=11, totalSize=246, rawDataSize=235]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 5 Cumulative CPU: 9.51 sec HDFS Read: 20297 HDFS Write: 616 SUCCESS

Total MapReduce CPU Time Spent: 9 seconds 510 msec

OK

Time taken: 69.862 seconds

Browse the Directory / Check the result in HDFS

Note:

The bucket for each row is determined by: hash(clustered-by column) MOD (number of buckets).
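
As a quick illustration (assuming Hive hashes an INT column to the integer value itself), the row with id=7 goes to bucket 7 MOD 5 = 2, i.e. the third bucket file (000002_0). Individual buckets can then be queried with table sampling, for example:

hive> select * from emp_bucket tablesample(bucket 1 out of 5 on id);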

Complex data type in Hive: Array
Flat File Creation

arrayinput.txt

1,326362$3443$23432$875665$3443$43534$234$342

2,123$323$546$546$5476

3,435$345$678$122$98987

4,234$7234$65242$6272

Invoke Hive Terminal

$hive

Table Creation

hive> create table array_test1 ( id int,all_nums array<int>) row format delimited fields terminated by ','

>collection items terminated by '$' stored as textfile;

OK

Time taken: 0.215 seconds

Load the data

hive> load data local inpath '/home/cloudera/Desktop/arrayinput.txt' into table array_test1;

Loading data to table default.array_test1

Table default.array_test1 stats: [numFiles=1, totalSize=115]

OK

Time taken: 0.648 seconds

Retrieving information: Verify the loaded records

hive> select id,all_nums from array_test1;

OK

1 [326362,3443,23432,875665,3443,43534,234,342]

2 [123,323,546,546,5476]

3 [435,345,678,122,98987]

4 [234,7234,65242,6272]

Time taken: 0.144 seconds, Fetched: 4 row(s)

hive> select id,all_nums[1] from array_test1;

OK

1 3443

2 323

3 345

4 7234

Time taken: 0.108 seconds, Fetched: 4 row(s)

hive> select id,all_nums[4] from array_test1;

OK

1 3443

2 5476

3 98987

4 NULL

Time taken: 0.086 seconds, Fetched: 4 row(s)
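
Arrays also work with Hive's built-in collection functions; for example, size() returns the number of elements per row (a minimal sketch on the same table):

hive> select id,size(all_nums) from array_test1;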

Complex data type in Hive: Struct

Flat File Creation

Weather.txt

1,32$65$moderate

2,37$78$humid

3,43$55$hot

4,23$45$cold

Table Creation

hive> create table struct_test ( id int,weather_reading struct<temp:int,humidity:int,comment:string>)

>row format delimited fields terminated by ',' collection items terminated by '$' stored as textfile;

OK

Time taken: 0.232 seconds

Load the data

hive> load data local inpath '/home/cloudera/Desktop/weather.txt' into table struct_test;

Loading data to table default.struct_test

Table default.struct_test stats: [numFiles=1, totalSize=56]

OK

Time taken: 0.424 seconds

Verify data

hive> select id,weather_reading from struct_test;

OK

1 {"temp":32,"humidity":65,"comment":"moderate"}

2 {"temp":37,"humidity":78,"comment":"humid"}

3 {"temp":43,"humidity":55,"comment":"hot"}

4 {"temp":23,"humidity":45,"comment":"cold"}

Time taken: 0.092 seconds, Fetched: 4 row(s)

hive> select id,weather_reading.temp from struct_test;

OK

1 32

2 37

3 43

4 23

Time taken: 0.087 seconds, Fetched: 4 row(s)

hive> select id,weather_reading.humidity from struct_test;

OK

1 65

2 78

3 55

4 45

Time taken: 0.097 seconds, Fetched: 4 row(s)

hive> select id,weather_reading.comment from struct_test;

OK

1 moderate

2 humid

3 hot

4 cold

Time taken: 0.127 seconds, Fetched: 4 row(s)
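
Struct fields can be used in the WHERE clause as well as the select list; for example, to keep only the hotter readings:

hive> select id,weather_reading.comment from struct_test where weather_reading.temp > 35;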

Complex data type in Hive: Map

Flat File Creation

Comments.txt

1,1@india is great#2@india won icc t20#3@jai hind

2,1@we are awesome#2@i like cricket

3,1@hurray we won#2@what a great match#3@watching cricket all day

4,1@hectic day#3@irctc rocks

Table Creation

hive> create table map_test (id int,comments_map Map<int,string>)

> row format delimited fields terminated by ',' collection items terminated by '#'

> map keys terminated by '@' stored as textfile;

OK

Time taken: 0.136 seconds

Load the data

hive> load data local inpath '/home/cloudera/Desktop/comments.txt' into table map_test;

Loading data to table default.map_test

Table default.map_test stats: [numFiles=1, totalSize=181]

OK

Time taken: 0.367 seconds

Verify data

hive> select id,comments_map from map_test;

OK

1 {1:"india is great",2:"india won icc t20",3:"jai hind"}

2 {1:"we are awesome",2:"i like cricket"}

3 {1:"hurray we won",2:"what a great match",3:"watching cricket all day"}

4 {1:"hectic day",3:"irctc rocks"}

Time taken: 0.075 seconds, Fetched: 4 row(s)

hive> select id,comments_map[1] from map_test;

OK

1 india is great

2 we are awesome

3 hurray we won

4 hectic day

Time taken: 0.087 seconds, Fetched: 4 row(s)

hive> select id,comments_map[2] from map_test;

OK

1 india won icc t20

2 i like cricket

3 what a great match

4 NULL

Time taken: 0.087 seconds, Fetched: 4 row(s)
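
The map's keys and size can be inspected with Hive's built-in map functions (a short sketch on the same table):

hive> select id,size(comments_map),map_keys(comments_map) from map_test;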

Hive UDF

JARS Used

Add the following jars in the build path.

all Hadoop JARs under /usr/lib/hadoop/

and /usr/lib/hive/hive-exec.jar

Java Code

package com;

import org.apache.hadoop.hive.ql.exec.UDF;

import org.apache.hadoop.io.Text;

public class ReplaceCase extends UDF {

    private Text result = new Text();

    public Text evaluate(String str, String str1, String str2) {
        String rep = str.replace(str1, str2);
        result.set(rep);
        return result;
    }
}

Export JAR file

Hive Code

$ hive

hive> add jar /home/cloudera/Desktop/replaceudf.jar;

Added [/home/cloudera/Desktop/replaceudf.jar] to class path

Added resources: [/home/cloudera/Desktop/replaceudf.jar]

Table Creation

hive> create table customer (fname STRING,lname STRING) row format delimited fields terminated by '\t'

> stored as textfile;

OK

Time taken: 0.981 seconds

Load the data

hive> load data local inpath '/home/cloudera/Desktop/names.txt' into table customer;

Loading data to table default.customer

Table default.customer stats: [numFiles=1, totalSize=8]

OK

Time taken: 1.014 seconds

Creating function

hive> create temporary function replaceword as 'com.ReplaceCase';

OK

Time taken: 0.026 seconds

Flat File Creation

Names.txt

sri ram

Retrieving table

hive> select replaceword(fname,"sri","raj") from customer;

OK

raj

Time taken: 0.619 seconds, Fetched: 1 row(s)
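
The temporary function works on any string expression, not just a single column; a hypothetical example on the concatenated name:

hive> select replaceword(concat(fname,' ',lname),'sri','raj') from customer;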

Integration of Pig with HBase

Flat File Creation

cust_info.txt

1,Sriram,IT

2,Ram,LT

3,Jana,RCT

Copying file from LFS to HDFS

hadoop fs -put /home/cloudera/Desktop/cust_info.txt /

Open a hbase terminal and type the following

[cloudera@quickstart ~]$ hbase shell

2016-10-05 02:44:55,145 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.0-cdh5.7.0, rUnknown, Wed Mar 23 11:39:14 PDT 2016

Table Creation

hbase(main):001:0> create 'cust_table','cust_data'

0 row(s) in 1.7130 seconds

=> Hbase::Table - cust_table

Table Scan

hbase(main):002:0> scan 'cust_table'

ROW COLUMN+CELL

0 row(s) in 0.3630 seconds

Open another terminal type the following commands

[cloudera@quickstart ~]$ export PIG_CLASSPATH=/home/hadoop/HADOOP/hbase-0.98.4-hadoop2/lib/hbase-server-0.98.4-hadoop2:/home/hadoop/HADOOP/hbase-0.98.4-hadoop2/lib/hbase-*.jar

[cloudera@quickstart ~]$ pig

grunt> rawd = LOAD '/cust_info.txt' using PigStorage(',') as (cust_id:int,cust_name:chararray,cust_sector:chararray);

grunt> STORE rawd INTO 'hbase://cust_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cust_data:cust_id,cust_data:cust_name,cust_data:cust_sector');

2016-10-05 02:49:30,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2016-10-05 02:49:55,369 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete

2016-10-05 02:50:01,551 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2016-10-05 02:50:02,523 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

2016-10-05 02:50:02,614 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2016-10-05 02:50:02,619 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features

2.6.0-cdh5.7.0 0.12.0-cdh5.7.0 cloudera 2016-10-05 02:49:23 2016-10-05 02:50:02 UNKNOWN

Success!

Job Stats (time in seconds):

JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs

job_1475648951424_0002 1 0 8 8 8 8 n/a n/a n/a n/a rawd MAP_ONLY hbase://cust_table,

Input(s)

Successfully read 3 records (396 bytes) from: "/cust_info.txt"

Output(s)

Successfully stored 3 records in: "hbase://cust_table"

Counters:

Total records written : 3

Total bytes written : 0

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

Job DAG:

job_1475648951424_0002

2016-10-05 02:50:02,709 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

Check the result in hbase terminal

hbase(main):003:0> scan 'cust_table'

ROW COLUMN+CELL

1 column=cust_data:cust_id, timestamp=1475660994141, value=Sriram

1 column=cust_data:cust_name, timestamp=1475660994141, value=IT

2 column=cust_data:cust_id, timestamp=1475660994151, value=Ram

2 column=cust_data:cust_name, timestamp=1475660994151, value=LT

3 column=cust_data:cust_id, timestamp=1475660994151, value=Jana

3 column=cust_data:cust_name, timestamp=1475660994151, value=RCT

3 row(s) in 0.1130 seconds
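
Note that the stored values appear shifted by one column: with HBaseStorage, the first field of each tuple (cust_id here) is used as the HBase row key rather than stored as a cell, so cust_name lands in cust_data:cust_id and cust_sector in cust_data:cust_name, leaving the last listed column empty. A sketch that lists only the two non-key fields (assuming the same relation rawd):

grunt> STORE rawd INTO 'hbase://cust_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cust_data:cust_name,cust_data:cust_sector');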

