
Basic HDFS and Hive Commands
By Dr. R. Satya Krishna Sharma
Some Important Commands of HDFS
 ls
◦ Description: Lists the contents of a directory in HDFS.
◦ Syntax: hadoop fs -ls <path>
 mkdir
◦ Description: Creates a new directory in HDFS.
◦ Syntax: hadoop fs -mkdir <path>
 copyFromLocal
◦ Description: Copies files or directories from the local file system to HDFS.
◦ Syntax: hadoop fs -copyFromLocal <local-source> <hdfs-destination>
 copyToLocal
◦ Description: Copies files or directories from HDFS to the local file system.
◦ Syntax: hadoop fs -copyToLocal <hdfs-source> <local-destination>
 rm
◦ Description: Deletes files or directories in HDFS.
◦ Syntax: hadoop fs -rm <path>
 mv
◦ Description: Moves files or directories within HDFS.
◦ Syntax: hadoop fs -mv <source> <destination>
 cat
◦ Description: Displays the contents of a file in HDFS.
◦ Syntax: hadoop fs -cat <file-path>
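For instance, a short session with these commands might look like the following (the directory /user/hive/input and the file sales.csv are hypothetical names used only for illustration):

hadoop fs -mkdir /user/hive/input
hadoop fs -copyFromLocal sales.csv /user/hive/input/
hadoop fs -ls /user/hive/input
hadoop fs -cat /user/hive/input/sales.csv
hadoop fs -mv /user/hive/input/sales.csv /user/hive/input/sales_2024.csv
hadoop fs -rm /user/hive/input/sales_2024.csv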
Some Important Commands of HDFS (continued)
 chown
◦ Description: Changes the owner of a file or directory in HDFS.
◦ Syntax: hadoop fs -chown [-R] <owner>[:<group>] <path>
 get
◦ Description: Copies files or directories from HDFS to the local file system.
◦ Syntax: hadoop fs -get <hdfs-source> <local-destination>
 put
◦ Description: Copies files or directories from the local file system to HDFS.
◦ Syntax: hadoop fs -put <local-source> <hdfs-destination>
 chmod
◦ Description: Changes the permissions of files or directories in HDFS.
◦ Syntax: hadoop fs -chmod [mode] <path>
 appendToFile
◦ Description: Appends data from a local file to an existing file in HDFS (a standalone command, not an option of copyFromLocal).
◦ Syntax: hadoop fs -appendToFile <local-source> <hdfs-destination>
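A quick illustration of these commands (the user hiveuser, the group hadoop, and the file names are hypothetical):

hadoop fs -put report.txt /user/hive/input/
hadoop fs -get /user/hive/input/report.txt ./backup/
hadoop fs -chown -R hiveuser:hadoop /user/hive/input
hadoop fs -chmod 750 /user/hive/input
hadoop fs -appendToFile extra.txt /user/hive/input/report.txt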
 Create Database:
Syntax: CREATE DATABASE [IF NOT EXISTS] database_name;
Description: Creates a new database.
 Use Database:
Syntax: USE database_name;
Description: Sets the current database context.
 Create Table:
◦ Syntax: CREATE TABLE [IF NOT EXISTS] table_name
◦ ( column1 data_type,
◦ column2 data_type, ... )
◦ [COMMENT 'table_comment']
◦ [PARTITIONED BY (col_name data_type, ...)]
◦ [ROW FORMAT row_format]
◦ [STORED AS file_format]
◦ [LOCATION 'hdfs_path'];
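For example, the statements below create a hypothetical database and a managed, partitioned table inside it (the names retail_db and sales are illustrative only):

CREATE DATABASE IF NOT EXISTS retail_db
COMMENT 'Demo database for the examples in this deck';

USE retail_db;

CREATE TABLE IF NOT EXISTS sales (
  product_id INT,
  sale_date STRING,
  amount DOUBLE
)
COMMENT 'Raw sales records'
PARTITIONED BY (region STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;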
 Show Databases:
Syntax: SHOW DATABASES;
Description: Lists all databases.
 Show Tables:
Syntax: SHOW TABLES;
Description: Lists all tables in the current database.
 Describe Table:
Syntax: DESCRIBE [EXTENDED] table_name;
Description: Displays the schema of a table.
 Select Query:
◦ Syntax: SELECT [ALL | DISTINCT] column1, column2, ...
FROM table_name WHERE condition;
◦ Description: Retrieves data from a table based on the specified
conditions.
 Insert Into Table:
◦ Syntax: INSERT INTO TABLE table_name [PARTITION
(partition_key = 'value', ...)] VALUES (value1, value2, ...);
◦ Description: Inserts data into a table.
 Alter Table:
◦ Syntax (each variant is a separate statement):
ALTER TABLE table_name ADD COLUMNS (new_column data_type [COMMENT 'column_comment'], ...);
ALTER TABLE table_name CHANGE column_name new_column data_type [COMMENT 'new_column_comment'];
ALTER TABLE table_name REPLACE COLUMNS (col_name data_type, ...);
ALTER TABLE table_name RENAME TO new_table_name;
◦ Description: Modifies an existing table. Note that Hive has no DROP COLUMN clause; columns are removed by redefining the column list with REPLACE COLUMNS. A combined worked example follows below.
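A minimal sketch that exercises these commands, continuing with the hypothetical retail_db and sales table created above:

SHOW DATABASES;
SHOW TABLES;
DESCRIBE EXTENDED sales;

INSERT INTO TABLE sales PARTITION (region = 'North')
VALUES (1, '2024-01-01', 100.0);

SELECT DISTINCT product_id, amount
FROM sales
WHERE amount > 50.0;

ALTER TABLE sales ADD COLUMNS (channel STRING COMMENT 'online or in-store');
ALTER TABLE sales RENAME TO sales_raw;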
Partitioning
Partitioning involves dividing a table into smaller, more manageable
parts based on one or more columns. Each partition represents a subset
of the data, and these partitions are stored separately. This is useful
when you often query data based on certain criteria, as it allows Hive to
skip irrelevant partitions during query execution.
 Example of Partitioning:
◦ Let's say you have a table named sales with the following columns: product_id, date,
amount, and region. You can partition this table by the region column.
CREATE TABLE sales_partitioned (
  product_id INT,
  date STRING,
  amount DOUBLE
)
PARTITIONED BY (region STRING);

Now, when you insert data into this table, Hive will automatically create separate directories for each region in the Hadoop Distributed File System (HDFS).

INSERT INTO TABLE sales_partitioned PARTITION (region='North')
VALUES (1, '2024-01-01', 100.0);

INSERT INTO TABLE sales_partitioned PARTITION (region='South')
VALUES (2, '2024-01-02', 150.0);
 The data will be stored in HDFS like this:
◦ /user/hive/warehouse/sales_partitioned/region=North/
   - 0001
◦ /user/hive/warehouse/sales_partitioned/region=South/
   - 0002
 When querying this table, if you filter by region, Hive
will only scan the relevant partition, leading to improved
performance.
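For example, the following query reads only the region=North directory; the region=South partition is skipped entirely (partition pruning):

SELECT product_id, amount
FROM sales_partitioned
WHERE region = 'North';

Depending on the Hive version, EXPLAIN DEPENDENCY can be prefixed to the query to confirm which partitions will actually be read.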
Bucketing in Hive:

Bucketing involves dividing data within each partition into a fixed number
of buckets. This helps distribute data evenly and can be beneficial when
performing join operations, as it reduces the amount of data that needs
to be shuffled and processed.
CREATE TABLE sales_bucketed (
  product_id INT,
  date STRING,
  amount DOUBLE
)
PARTITIONED BY (region STRING)
CLUSTERED BY (product_id) INTO 4 BUCKETS;
In this example, the sales_bucketed table is bucketed by the product_id column
into 4 buckets. When inserting data, Hive will distribute the data evenly across
these buckets.
INSERT INTO TABLE sales_bucketed PARTITION (region='North')
VALUES (1, '2024-01-01', 100.0);

INSERT INTO TABLE sales_bucketed PARTITION (region='South')
VALUES (2, '2024-01-02', 150.0);

The data will be stored in HDFS like this:

/user/hive/warehouse/sales_bucketed/region=North/
  - bucket_00000
  - bucket_00001
  - bucket_00002
  - bucket_00003
/user/hive/warehouse/sales_bucketed/region=South/
  - bucket_00000
  - bucket_00001
  - bucket_00002
  - bucket_00003
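A brief sketch of how a bucketed table is typically used, reusing the sales_bucketed table above (on Hive versions before 2.0 the hive.enforce.bucketing setting must be enabled so that inserts actually honor the bucket definition):

-- Older Hive versions: make inserts respect CLUSTERED BY ... INTO n BUCKETS
SET hive.enforce.bucketing = true;

-- Read only one of the four buckets, e.g. for sampling or debugging
SELECT product_id, amount
FROM sales_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON product_id)
WHERE region = 'North';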
Data Models in Hive
 Table:
◦ The basic building block in Hive is the table. Tables in Hive define the structure of the
data and how it is stored. They are similar to tables in relational databases and can be
partitioned for better performance.
 Partitioning:
◦ Hive allows you to partition data in a table based on one or more columns. This is
particularly useful when dealing with large datasets, as it helps optimize queries by
reducing the amount of data that needs to be scanned.
 Bucketing:
◦ Bucketing is another technique in Hive for organizing data. It involves dividing data
into buckets based on a hash function applied to one or more columns. Bucketing can
improve query performance by reducing the number of files that need to be read.
 External Tables:
◦ Hive supports external tables, where the data is stored outside of the Hive warehouse
directory. This is useful when you want to manage data that is generated or updated by
processes outside of Hive.
 SerDe (Serializer/Deserializer):
◦ Hive uses a SerDe to process data when reading and writing it. A SerDe lets Hive work with various data formats such as JSON, XML, Avro, etc., by defining how rows are serialized and deserialized (a combined external-table/SerDe sketch follows this list).
 Data Modeling with HiveQL:
◦ Hive uses HiveQL, a SQL-like language, for querying data. Through HiveQL, users
can define and manipulate the data model, including creating tables, altering their
structures, and performing various transformations
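As a combined illustration of external tables and SerDes, the sketch below defines an external table over JSON files. The path /data/raw/events and the column names are hypothetical, and depending on the installation the hive-hcatalog-core jar may need to be on the classpath for the JsonSerDe class:

CREATE EXTERNAL TABLE IF NOT EXISTS events_json (
  event_id INT,
  event_type STRING,
  payload STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/raw/events';

-- Dropping an external table removes only the metadata;
-- the files under /data/raw/events are left untouched.
DROP TABLE events_json;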
Joins with Hive
 CREATE TABLE employees (
  emp_id INT,
  emp_name STRING,
  dept_id INT
);
INSERT INTO employees VALUES
  (1, 'John', 101),
  (2, 'Alice', 102),
  (3, 'Bob', 101),
  (4, 'Charlie', 103);
 CREATE TABLE departments (
  dept_id INT,
  dept_name STRING
);
INSERT INTO departments VALUES
  (101, 'HR'),
  (102, 'Finance'),
  (104, 'Marketing');
 Left outer join:
SELECT e.emp_id, e.emp_name, e.dept_id, d.dept_name
FROM employees e
LEFT OUTER JOIN departments d ON e.dept_id = d.dept_id;

| emp_id | emp_name | dept_id | dept_name |
|      1 | John     |     101 | HR        |
|      2 | Alice    |     102 | Finance   |
|      3 | Bob      |     101 | HR        |
|      4 | Charlie  |     103 | NULL      |
 Inner join:
SELECT e.emp_id, e.emp_name, e.dept_id, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

| emp_id | emp_name | dept_id | dept_name |
|      1 | John     |     101 | HR        |
|      2 | Alice    |     102 | Finance   |
|      3 | Bob      |     101 | HR        |
A right outer join is the mirror of a left outer join: it returns every row from the right table (departments), filling the employee columns with NULL where there is no match.
SELECT e.emp_id, e.emp_name, e.dept_id, d.dept_name
FROM employees e
RIGHT OUTER JOIN departments d ON e.dept_id = d.dept_id;

| emp_id | emp_name | dept_id | dept_name |
|      1 | John     |     101 | HR        |
|      3 | Bob      |     101 | HR        |
|      2 | Alice    |     102 | Finance   |
|   NULL | NULL     |     104 | Marketing |
 A LEFT JOIN followed by a filter that removes NULL department keys behaves like an INNER JOIN:
SELECT e.emp_id, e.emp_name, e.dept_id, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NOT NULL;

| emp_id | emp_name | dept_id | dept_name |
|      1 | John     |     101 | HR        |
|      2 | Alice    |     102 | Finance   |
|      3 | Bob      |     101 | HR        |