UNIT III
Introduction to Pig: Key Features of pig, The Anatomy of Pig, Pig on Hadoop, Pig
Philosophy, Pig Latin Overview, Data Types in Pig, Running Pig, Execution Modes of Pig,
Relational Operators.
Introduction to HIVE: HIVE features, HIVE architecture, HIVE datatypes, HIVE File
Formats, HIVE Query Language.
WHAT IS PIG?
Apache Pig is a platform for data analysis. It is an alternative to MapReduce programming. Pig
was developed as a research project at Yahoo.
Key Features of Pig
1. It provides an engine for executing data flows (how your data should flow). Pig processes data
in parallel on the Hadoop cluster.
2. It provides a language called "Pig Latin" to express data flows.
3. Pig Latin contains operators for many of the traditional data operations such as join, filter,
sort, etc.
4. It allows users to develop their own functions (User Defined Functions) for reading,
processing, and writing data.
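As a sketch of how these features come together in practice, the following hypothetical Pig Latin data flow loads two files, applies traditional data operations in parallel on the cluster, and stores the result (the file names and field names are assumptions for illustration):

    -- Load two hypothetical input files from HDFS, declaring a schema for each
    users  = LOAD 'users.txt'  USING PigStorage(',') AS (id:int, name:chararray, age:int);
    orders = LOAD 'orders.txt' USING PigStorage(',') AS (oid:int, uid:int, amount:double);

    -- Traditional data operations expressed as Pig Latin operators
    adults  = FILTER users BY age >= 18;           -- filter
    joined  = JOIN adults BY id, orders BY uid;    -- join
    ordered = ORDER joined BY amount DESC;         -- sort

    -- Write the result back to HDFS
    STORE ordered INTO 'output' USING PigStorage(',');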
The Anatomy of PIG
The main components of Pig are as follows:
1. Data flow language (Pig Latin).
2. Interactive shell (Grunt), where you can type Pig Latin statements.
3. Pig interpreter and execution engine.
PIG on Hadoop
Pig runs on Hadoop. Pig uses both Hadoop Distributed File System and MapReduce
Programming. By default, Pig reads input files from HDFS. Pig stores the intermediate data (data
produced by MapReduce jobs) and the output in HDFS. However, Pig can also read input from
and place output to other sources.
Pig supports the following:
1. HDFS commands.
2. UNIX shell commands.
3. Relational operators.
4. Positional parameters.
5. Common mathematical functions.
6. Custom functions.
7. Complex data structures.
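A hypothetical Grunt session illustrating a few of these, where fs runs an HDFS command, sh runs a UNIX shell command, and $0 and $2 are positional references to fields (the paths and field positions are assumptions):

    grunt> fs -ls /user/data
    grunt> sh date
    grunt> A = LOAD '/user/data/input.txt';
    grunt> B = FOREACH A GENERATE $0, $2;    -- positional parameters
    grunt> DUMP B;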
Pig Philosophy
Figure 10.2 describes the Pig philosophy.
1. Pigs Eat Anything: Pig can process different kinds of data such as structured and unstructured
data.
2. Pigs Live Anywhere: Pig not only processes files in HDFS, it also processes files in other
sources such as files in the local file system.
3. Pigs are Domestic Animals: Pig allows you to develop user-defined functions, which can be
included in a script for complex operations.
4. Pigs Fly: Pig processes data quickly.
Pig Latin Overview
Pig Latin is the data flow language of Pig. A Pig Latin program is a series of statements; each statement loads data, applies a transformation to a relation, or stores a result.
Data Types in Pig
1 Simple Data Types
Table 10.3 describes the simple data types supported in Pig: int, long, float, double, chararray, boolean, and bytearray. In Pig, a field whose type is not specified is treated as an array of bytes, known as bytearray.
Null: In Pig Latin, NULL denotes a value that is unknown or non-existent.
2 Complex Data Types
Table 10.4 describes the complex data types in Pig: tuple (an ordered set of fields), bag (a collection of tuples), and map (a set of key-value pairs).
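A sketch of a LOAD statement declaring both simple and complex types (the file and field names are hypothetical):

    student = LOAD 'student.txt' AS (
        id:int,                                   -- simple types
        name:chararray,
        gpa:double,
        contact:map[],                            -- map: key-value pairs
        address:tuple(city:chararray, zip:int),   -- tuple: ordered set of fields
        courses:bag{t:(course:chararray)}         -- bag: collection of tuples
    );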
Running Pig
You can run Pig in two ways:
1. Interactive Mode: Pig Latin statements are typed one at a time at the Grunt shell prompt and are executed immediately.
2. Batch Mode: Pig Latin statements are placed in a script file, and the whole script is submitted to Pig for execution.
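For example, the same statements can be typed interactively at the Grunt prompt or saved in a script file (here a hypothetical myscript.pig) and run in batch with the pig command:

    -- Interactive mode: typed at the Grunt prompt
    grunt> A = LOAD 'input.txt';
    grunt> DUMP A;

    -- Batch mode: the same statements saved in myscript.pig, then run as:
    --   pig myscript.pig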
Execution Modes of Pig
Pig has two execution modes:
1. Local Mode: Pig runs in a single JVM and accesses the local file system. This mode is suitable for small datasets and for testing; it is invoked with pig -x local.
2. MapReduce Mode: The default mode. Pig translates Pig Latin statements into MapReduce jobs and runs them on the Hadoop cluster, reading input from and writing output to HDFS. It is invoked with pig or pig -x mapreduce.
Relational Operators
Pig Latin provides relational operators to transform data. The commonly used operators are LOAD, STORE, DUMP, FILTER, FOREACH...GENERATE, GROUP, JOIN, ORDER BY, DISTINCT, UNION, SPLIT, and LIMIT.
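A short sketch chaining several of these operators (the file, field, and relation names are hypothetical):

    emp       = LOAD 'emp.txt' USING PigStorage(',')
                AS (name:chararray, dept:chararray, salary:double);
    well_paid = FILTER emp BY salary > 50000;          -- FILTER
    by_dept   = GROUP well_paid BY dept;               -- GROUP
    avg_sal   = FOREACH by_dept
                GENERATE group, AVG(well_paid.salary); -- FOREACH...GENERATE
    top_dept  = ORDER avg_sal BY $1 DESC;              -- ORDER BY
    few       = LIMIT top_dept 5;                      -- LIMIT
    DUMP few;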
Introduction to HIVE
Hive is a data warehousing tool that sits on top of Hadoop (refer to Figure 9.1). Hive is used to
process structured data in Hadoop. The three main tasks performed by Apache Hive are:
1. Summarization
2. Querying
3. Analysis
Facebook initially created Hive to manage its ever-growing volumes of log data. Later, the
Apache Software Foundation developed it as open source, and it came to be known as Apache
Hive.
Hive makes use of the following:
1. HDFS for Storage.
2. MapReduce for execution.
3. An RDBMS for storing metadata/schemas.
Hive provides HQL (Hive Query Language), also known as HiveQL, which is similar to SQL. Hive compiles HiveQL queries into MapReduce jobs and then runs them on the Hadoop cluster. It is designed to support Online Analytical Processing (OLAP).
HIVE Features
1. It is similar to SQL.
2. HQL is easy to code.
3. Hive supports rich data types such as structs, lists and maps.
4. Hive supports SQL filters, group-by and order-by clauses.
5. Custom types and custom functions can be defined.
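A hypothetical HiveQL query illustrating SQL-style filters, group-by, and order-by (the table and column names are assumptions):

    SELECT dept, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary > 40000          -- filter
    GROUP BY dept                 -- group-by
    ORDER BY avg_salary DESC;     -- order-by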
HIVE Architecture
Hive Architecture is depicted in Figure 9.7. The various parts are as follows:
1. Hive Command-Line Interface (Hive CLI): The most commonly used interface to interact
with Hive.
2. Hive Web Interface: A simple graphical user interface to interact with Hive and to execute
queries.
3. Hive Server: An optional server that can be used to submit Hive jobs from a remote client.
4. JDBC/ODBC: Jobs can be submitted from a JDBC Client. One can write a Java code to
connect to Hive and submit jobs on it.
5. Driver: Hive queries are sent to the driver for compilation, optimization and execution.
6. Metastore: Hive table definitions and mappings to the data are stored in a Metastore. A
Metastore consists of the following:
• Metastore service: Offers an interface to Hive.
• Database: Stores data definitions, mappings to the data, and so on.
The metadata stored in the metastore includes IDs of databases, tables, and indexes, the time of
creation of a table, the input format used for a table, the output format used for a table, and so
on. The metastore is updated whenever a table is created or deleted in Hive. There are three
kinds of metastore:
1. Embedded Metastore: This metastore is mainly used for unit tests. Here, only one process is
allowed to connect to the metastore at a time. This is the default metastore for Hive and uses the
Apache Derby database. In this mode, both the database and the metastore service run embedded
in the main Hive server process. Figure 9.8 shows an embedded metastore.
2. Local Metastore: Metadata can be stored in an external RDBMS such as MySQL. A local
metastore allows multiple connections at a time. In this mode, the Hive metastore service runs in
the main Hive server process, but the metastore database runs in a separate process and can be
on a separate host. Figure 9.9 shows a local metastore.
3. Remote Metastore: In this mode, the Hive driver and the metastore interface run in different
JVMs (which can be on different machines as well), as in Figure 9.10. This way the database can
be fire-walled from the Hive users, and the database credentials are completely isolated from the
users of Hive.
HIVE Datatypes
Hive supports primitive data types such as TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING, and TIMESTAMP, as well as collection (complex) data types: ARRAY, MAP, STRUCT, and UNIONTYPE.
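A sketch of a table declaration using both primitive and complex types (the table and column names are hypothetical):

    CREATE TABLE employee (
        name    STRING,
        salary  DOUBLE,
        skills  ARRAY<STRING>,                  -- list of values
        phones  MAP<STRING, STRING>,            -- key-value pairs
        address STRUCT<city:STRING, zip:INT>    -- named fields
    );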
HIVE File Formats
Hive supports several file formats for storing table data, including TEXTFILE (the default), SEQUENCEFILE, RCFILE, and ORC. The format is specified with the STORED AS clause when a table is created.
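For example, a hypothetical table stored in the ORC columnar format instead of the default TEXTFILE:

    CREATE TABLE sales (id INT, amount DOUBLE)
    STORED AS ORC;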
HIVE Query Language
The Hive query language provides basic SQL-like operations. Here are a few of the tasks that
HQL can do easily:
1. Create and manage tables and partitions.
2. Support various Relational, Arithmetic, and Logical Operators.
3. Evaluate functions.
4. Download the contents of a table to a local directory, or store the results of queries in an HDFS directory.
1 DDL (Data Definition Language) Statements
These statements are used to build and modify the tables and other objects in the database. The
DDL commands are as follows:
1. Create/Drop/Alter Database
2. Create/Drop/Truncate Table
3. Alter Table/Partition/Column
4. Create/Drop/Alter View
5. Create/Drop/Alter Index
6. Show
7. Describe
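A sketch of several of these DDL statements in HiveQL (the database, table, and column names are hypothetical):

    CREATE DATABASE IF NOT EXISTS college;        -- Create Database
    USE college;

    CREATE TABLE student (id INT, name STRING, gpa DOUBLE)   -- Create Table
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    ALTER TABLE student ADD COLUMNS (dept STRING);   -- Alter Table/Column
    SHOW TABLES;                                     -- Show
    DESCRIBE student;                                -- Describe
    DROP TABLE student;                              -- Drop Table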
2 DML (Data Manipulation Language) Statements
These statements are used to retrieve, store, modify, delete, and update data in the database. The
DML commands are as follows:
1. Loading files into tables.
2. Inserting data into Hive tables from queries.
Note: Hive 0.14 supports update, delete, and transaction operations.
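A sketch of the two DML operations (the path and table names are hypothetical, and the target tables are assumed to already exist):

    -- 1. Loading a file into a table
    LOAD DATA LOCAL INPATH '/tmp/student.csv' INTO TABLE student;

    -- 2. Inserting data into a Hive table from a query
    INSERT INTO TABLE toppers
    SELECT id, name FROM student WHERE gpa > 9.0;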