Anatomy of a MapReduce Job Run
There are five independent entities:
The client, which submits the MapReduce job.
The YARN resource manager, which coordinates the allocation of
compute resources on the cluster.
The YARN node managers, which launch and monitor the compute
containers on machines in the cluster.
The MapReduce application master, which coordinates the tasks
running the MapReduce job. The application master and the MapReduce tasks
run in containers that are scheduled by the resource manager and managed by
the node managers.
The distributed filesystem, which is used for sharing job files between
the other entities.
Task Assignment:
If the job does not qualify for running as an uber task, then the application
master requests containers for all the map and reduce tasks in the job from
the resource manager.
Requests for map tasks are made first and with a higher priority than those
for reduce tasks, since all the map tasks must complete before the sort
phase of the reduce can start.
Task Execution:
Once a task has been assigned resources for a container on a particular node
by the resource manager’s scheduler, the application master starts the
container by contacting the node manager.
The task is executed by a Java application whose main class is YarnChild.
Before it can run the task, it localizes the resources that the task needs,
including the job configuration and JAR file, and any files from the
distributed cache.
Finally, it runs the map or reduce task.
Streaming:
Streaming runs special map and reduce tasks for the purpose of launching the
user-supplied executable and communicating with it.
The Streaming task communicates with the process (which may be written in
any language) using standard input and output streams.
During execution of the task, the Java process passes input key-value pairs to
the external process, which runs them through the user-defined map or reduce
function and passes the output key-value pairs back to the Java process.
From the node manager’s point of view, it is as if the child process ran the
map or reduce code itself.
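To make the Streaming contract concrete, here is a minimal word-count mapper in Python. This is an illustrative sketch, not code from these notes: the Streaming task writes input records to the executable's standard input and reads tab-separated key-value pairs back from its standard output.

```python
import sys

def map_stream(lines):
    """Yield a tab-separated (word, 1) pair for every word in the input
    lines, following the Streaming contract: records arrive one per line
    on stdin, and each output line is "key<TAB>value"."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def run_mapper():
    # Entry point when launched by the Streaming task, e.g. (hypothetical
    # invocation): hadoop jar hadoop-streaming.jar -mapper mapper.py ...
    for pair in map_stream(sys.stdin):
        sys.stdout.write(pair + "\n")
```

A corresponding reducer would read the sorted key-value lines on stdin and aggregate the counts per key in the same way.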
Progress and Status Updates:
• MapReduce jobs are long-running batch jobs, taking anything from tens of
seconds to hours to run.
• A job and each of its tasks have a status, which includes such things as the
state of the job or task (e.g., running, successfully completed, failed), the
progress of maps and reduces, the values of the job’s counters, and a status
message or description (which may be set by user code).
• When a task is running, it keeps track of its progress (i.e., the proportion
of the task completed).
• For map tasks, this is the proportion of the input that has been processed.
• For reduce tasks, it’s a little more complex, but the system can still
estimate the proportion of the reduce input processed.
It does this by dividing the total progress into three parts, corresponding to
the three phases of the shuffle (copy, sort, and reduce).
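As an illustration (a sketch of the idea, not Hadoop's actual implementation), the reduce-side estimate can be modeled by giving each of the three phases an equal third of the total:

```python
def reduce_progress(copy_frac, sort_frac, reduce_frac):
    """Estimate overall reduce-task progress by weighting the three
    shuffle phases (copy, sort, reduce) equally, since the total
    progress is divided into three parts, one per phase.
    Each argument is that phase's completed fraction in [0, 1]."""
    return (copy_frac + sort_frac + reduce_frac) / 3.0

# Example: copy and sort are finished, and the reduce function is
# halfway through its input -> 1/3 + 1/3 + 1/6 = 5/6 overall.
```

The real phase fractions come from internal counters (e.g., bytes copied, records reduced); the equal weighting is what makes a rough whole-task estimate possible even before the reduce function has started.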
• As the map or reduce task runs, the child process communicates with its
parent application master through the umbilical interface.
• The task reports its progress and status (including counters) back to its
application master, which has an aggregate view of the job, every three
seconds over the umbilical interface.
• The resource manager web UI displays all the running applications with links
to the web UIs of their respective application masters, each of which displays
further details on the MapReduce job, including its progress.
• During the course of the job, the client receives the latest status by polling
the application master every second (the interval is set via
mapreduce.client.progressmonitor.pollinterval).
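The client-side polling loop can be sketched as follows. The helper names here are hypothetical; in the real client the interval is read from the mapreduce.client.progressmonitor.pollinterval property (in milliseconds).

```python
import time

def wait_for_completion(poll_status, poll_interval_ms=1000):
    """Poll the application master for job status every poll_interval_ms
    milliseconds until the job reaches a terminal state, mimicking the
    behaviour of the client's waitForCompletion() loop.
    `poll_status` is a caller-supplied function returning the job state."""
    while True:
        state = poll_status()
        if state in ("SUCCEEDED", "FAILED", "KILLED"):
            return state
        time.sleep(poll_interval_ms / 1000.0)
```

In the sketch, `poll_status` stands in for the RPC call to the application master; once the job is done it returns a terminal state and the loop exits, just as the client then prints its final message and returns.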
Job Completion:
• When the application master receives a notification that the last task for a
job is complete, it changes the status for the job to Successful.
• Then, when the Job polls for status, it learns that the job has completed
successfully, so it prints a message to tell the user and then returns from
the waitForCompletion() method.
• Finally, on job completion, the application master and the task containers
clean up their working state and the OutputCommitter’s commitJob()
method is called.
• Job information is archived by the job history server to enable later
interrogation by users if desired.