Lab 5
Introduction to Pig
Introduction
• In this lab, we are going to practice analyzing large amounts of data as data flows using Apache Pig.
• Pig uses the Pig Latin scripting language to perform ad hoc data analysis in an iterative fashion.
• Pig sits on top of MapReduce, so all Pig scripts run as Map and Reduce tasks.
Installation
• Online reference: https://www.edureka.co/blog/apache-pig-installation
• wget http://www-us.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz
• tar -xzf pig-0.16.0.tar.gz
• mv pig-0.16.0 /home/{yourname}/pig/
• In .bashrc (note: the pig executable lives in the bin/ subdirectory):
• export PATH=$PATH:/home/{yourname}/pig/bin
• export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
• Execute pig
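• To confirm the setup (a quick check; -x local runs Pig in local mode, without a Hadoop cluster):
• pig -version
• pig -x local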
Load Data
• Write a Pig script:
• batting = LOAD '/user/hdfs/batting.csv' USING PigStorage(',');
• raw_runs = FILTER batting BY $1 > 0;
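• Since no schema was declared, fields are referenced by position ($0, $1, ...). To peek at a few loaded rows (a minimal check, assuming batting.csv has already been uploaded to HDFS):
• sample_rows = LIMIT batting 10;
• DUMP sample_rows;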
Filter Data
• A key Pig characteristic is that it is iterative: we can step into and inspect each intermediate relation. Example:
• runs = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
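• Because every step yields a named relation, each one can be inspected before moving on, e.g.:
• DESCRIBE runs;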
Aggregate Data
• Data can be grouped on a field, e.g. by year, so that grp_data is indexed by year. Example:
• grp_data = GROUP runs BY (year);
• max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
• DUMP max_runs;
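• After GROUP, each record pairs the group key with a bag of all matching runs tuples; MAX is applied to the runs column inside that bag. To see this structure:
• DESCRIBE grp_data;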
Join Data
• We now have the maximum for each year, but we need to join it back to the runs relation to find who achieved it.
• We want the output in the form (year, playerID, max runs). Example:
• join_max_run = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);
• join_data = FOREACH join_max_run GENERATE $0 AS year, $2 AS playerID, $1 AS runs;
• DUMP join_data;
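• The join matches each year's maximum back to the row(s) that scored it, so a tie within a year yields multiple rows. To present the result in year order (an optional extra step):
• ordered_data = ORDER join_data BY year;
• DUMP ordered_data;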
Another example: Movie data
1. Download movies_data.csv and upload it to HDFS.
2. Run the following script in Pig:
• movies = LOAD '/user/hdfs/movies_data.csv' USING PigStorage(',') AS (id, name, year, rating, duration);
• DUMP movies;
3. Filter data iteratively (find movies that are worth watching, i.e. rated higher than 4.0):
• movies_greater_than_four = FILTER movies BY (float)rating > 4.0;
• DUMP movies_greater_than_four;
4. Write the outcome to persistent storage:
• STORE movies_greater_than_four INTO '/user/hdfs/movies_greater_than_four';
5. Look for classic movies released between 1950 and 1960:
• movies_between_50_60 = FILTER movies BY year > 1950 AND year < 1960;
6. Retrieve movies whose names start with the character 'A':
• movies_starting_with_A = FILTER movies BY name MATCHES 'A.*';
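• The whole movie exercise can also be run as a single script. Below is a sketch with an explicit typed schema (the column types are assumptions about the CSV), which makes the (float) cast in step 3 unnecessary:
• movies = LOAD '/user/hdfs/movies_data.csv' USING PigStorage(',') AS (id:int, name:chararray, year:int, rating:double, duration:int);
• good_movies = FILTER movies BY rating > 4.0; -- step 3 without the cast
• fifties_movies = FILTER movies BY year > 1950 AND year < 1960;
• a_movies = FILTER movies BY name MATCHES 'A.*';
• STORE good_movies INTO '/user/hdfs/movies_greater_than_four';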