
MapReduce Working and Advantages
MapReduce
MapReduce is a programming framework that allows us to perform
distributed and parallel processing on large data sets. It is built
around the following three classes.

Mapper Class
• The first stage in data processing using MapReduce is
the Mapper Class. Here, the RecordReader processes each input
record and generates the corresponding key-value pair. Hadoop
saves this intermediate mapper output on the local disk. A minimal
Mapper sketch follows this list.
• Input Split
It is the logical representation of the input data: a block of
work that is processed by a single map task in the MapReduce
program.
• RecordReader
It interacts with the Input Split and converts the obtained data
into key-value pairs.
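As an illustration, here is a minimal Mapper for the word-count job
walked through below, adapted from the standard Apache Hadoop
WordCount example. The class name TokenizerMapper and the fields one
and word are illustrative choices, not from these slides.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits the key-value pair (word, 1) for every token in the input record.
public class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // The RecordReader has already turned the raw input into
        // (offset, line) pairs; we tokenize the line here.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one); // hardcoded value 1 per word
        }
    }
}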
Reducer Class
The intermediate output generated by the mapper is fed to
the reducer, which processes it and generates the final output,
which is then saved in HDFS. A matching Reducer sketch follows
below.
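A matching Reducer sketch, again following the standard Hadoop
WordCount example; the class name IntSumReducer is an illustrative
choice.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives (word, [1, 1, ...]) after shuffle and sort, sums the list,
// and writes the final (word, total) pair, e.g. (Bear, 2).
public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result); // written to HDFS by the output format
    }
}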
Driver Class
The major component in a MapReduce job is the Driver Class. It
is responsible for setting up a MapReduce job to run in
Hadoop. Here we specify the names of the Mapper and Reducer
classes, along with the input/output data types and the job name.
A driver sketch follows below.
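A minimal Driver sketch that wires the two classes above into a
runnable job, again modeled on the standard Hadoop WordCount example;
the job name "word count" and the command-line paths are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Configures the MapReduce job and submits it to the Hadoop cluster.
public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // job name
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);   // Mapper class
        job.setReducerClass(IntSumReducer.class);    // Reducer class
        job.setOutputKeyClass(Text.class);           // output key type
        job.setOutputValueClass(IntWritable.class);  // output value type
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Assuming the classes are packaged into a jar named wordcount.jar (a
hypothetical name) and the input already sits in HDFS, the job could
be launched with: hadoop jar wordcount.jar WordCount /input/path /output/path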
• First, we divide the input into three splits. This will distribute
the work among all the map nodes.
• Then, we tokenize the words in each of the mappers and give
a hardcoded value (1) to each of the tokens or words. The
rationale behind giving a hardcoded value equal to 1 is that
every word, in itself, will occur once.
• Now, a list of key-value pairs is created, where each key is an
individual word and each value is one. So, for the
first line (Dear Bear River) we have 3 key-value pairs: Dear, 1;
Bear, 1; River, 1. The mapping process remains the same on all
the nodes.
• After the mapper phase, a partition process takes place where
sorting and shuffling happen so that all the tuples with the
same key are sent to the corresponding reducer.
• So, after the sorting and shuffling phase, each key arrives at a
reducer together with the list of values corresponding to that
very key. For example: Bear, [1,1]; Car, [1,1,1]; and so on.
• Now, each reducer counts the values in its list. For example, the
reducer for the key Bear gets the list of values [1,1]; it counts
the number of ones in that list and gives the final output
Bear, 2.
• Finally, all the output key/value pairs are then collected and
written in the output file.
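Putting the steps together, here is a compact trace of the whole
pipeline. The first input line (Dear Bear River) comes from these
slides; the other two lines are an assumption, reconstructed from the
reducer inputs Bear, [1,1] and Car, [1,1,1] quoted above.

Input splits:   Dear Bear River | Car Car River | Deer Car Bear
Map output:     (Dear,1)(Bear,1)(River,1) | (Car,1)(Car,1)(River,1) | (Deer,1)(Car,1)(Bear,1)
Shuffle/sort:   Bear,[1,1]  Car,[1,1,1]  Dear,[1]  Deer,[1]  River,[1,1]
Reduce output:  (Bear,2) (Car,3) (Dear,1) (Deer,1) (River,2)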
Advantages of MapReduce

1. Parallel Processing:
In MapReduce, we divide the job among multiple nodes,
and each node works on its part of the job simultaneously.
MapReduce is thus based on the Divide and Conquer paradigm,
which helps us process the data using different machines.
As the data is processed by multiple machines in parallel
instead of a single machine, the time taken to process the data
is reduced by a tremendous amount.
2. Data Locality:
• Instead of moving data to the processing unit, we move
the processing unit to the data in the MapReduce framework.
• In the traditional system, we used to bring data to the
processing unit and process it there. But as the data grew
very large, bringing this huge amount of data to the
processing unit posed the following issues:
• Moving huge data to processing is costly and degrades
network performance.
• Processing takes time, as the data is processed by a single unit,
which becomes the bottleneck.
• The master node can get over-burdened and may fail.
MapReduce allows us to overcome these issues by
bringing the processing unit to the data: the data is distributed
among multiple nodes, and each node processes the part of the
data residing on it. This gives us the following advantages:

• It is very cost-effective to move the processing unit to the data.
• The processing time is reduced, as all the nodes work
with their part of the data in parallel.
• Every node gets a part of the data to process and therefore
no single node gets overburdened.
MapReduce Example
Twitter receives around 500 million tweets per day, which is
roughly 5,800 tweets per second (500,000,000 ÷ 86,400 seconds).
