Unit 5. MapReduce and YARN
YARN architecture
Topics
• Introduction to MapReduce
• Hadoop v1 and MapReduce v1 architecture and limitations
• YARN architecture
• Hadoop and MapReduce v1 compared to v2
YARN
• Acronym for Yet Another Resource Negotiator.
• A new resource manager that is included in Hadoop 2.x and later.
• Decouples Hadoop workload management from resource management.
• Introduces a general-purpose application container.
• Hadoop 2.2.0 includes the first generally available (GA) version of YARN.
• Most Hadoop vendors support YARN.
Figure: The YARN stack: Apache MapReduce v2 (batch), Tez (interactive), HBase (online), Spark (in memory), and others (varied) run on YARN (cluster resource management), which runs on HDFS.
Figure: A YARN cluster: the ResourceManager runs at node132, and NodeManagers run at node134, node135, and node136.
Figure: Application 1 (analyze the lineitem table) is submitted, and the ResourceManager launches Application Master 1 in a container on the NodeManager at node135.
Figure: Application Master 1 sends a resource request to the ResourceManager and receives container IDs in return.
Figure: Application Master 1 launches App 1 containers on the NodeManagers at node134 and node136.
Figure: Application 2 (analyze the customer table) is submitted, and the ResourceManager launches Application Master 2 on the NodeManager at node136 while Application 1 continues to run.
Figure: The ResourceManager grants containers to Application Master 2, and App 2 containers run on node134 and node135 alongside the App 1 containers.
Figure: How YARN runs an application: (1) the YARN client submits the application to the resource manager; (2a, 2b) the resource manager has a node manager launch the application master in a container; (3) the application master allocates resources from the resource manager (heartbeat); (4a, 4b) node managers start additional containers that run application processes.
To run an application on YARN, a client contacts the resource manager and prompts it to run an application master process (step 1). The resource manager then finds a node manager that can launch the application master in a container (steps 2a and 2b). Precisely what the application master does after it is running depends on the application. It might simply run a computation in the container it is running in and return the result to the client, or it might request more containers from the resource manager (step 3) and use them to run a distributed computation (steps 4a and 4b). For more information, see White, T. (2015). Hadoop: The Definitive Guide (4th ed.). Sebastopol, CA: O'Reilly Media, p. 80.
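The same flow can be traced from the client side through the YARN client API. The following Java sketch is illustrative only: the class name, the application name, the com.example.MyApplicationMaster command, and the resource sizes are assumptions made for this example, not values from the course environment. It performs step 1 (contacting the resource manager and submitting the application) and describes the container in which the resource manager should launch the application master (steps 2a and 2b).

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApplication {
    public static void main(String[] args) throws Exception {
        // Step 1: the client contacts the ResourceManager.
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application ID.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-application");

        // Describe the container in which the application master should run.
        // com.example.MyApplicationMaster is a hypothetical AM class.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "$JAVA_HOME/bin/java com.example.MyApplicationMaster"));
        appContext.setAMContainerSpec(amContainer);

        // Resources requested for the application master container.
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore

        // Submit; the ResourceManager finds a NodeManager that launches the AM
        // in a container (steps 2a and 2b).
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appId);
    }
}
```

Frameworks such as MapReduce v2 perform this kind of submission on your behalf when you run a job.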
YARN features
• Scalability
• Multi-tenancy
• Compatibility
• Serviceability
• Higher cluster utilization
• Reliability and availability
YARN lifts the scalability ceiling in Hadoop by splitting the roles of the Hadoop JobTracker into two processes: a ResourceManager controls access to the cluster's resources (memory, CPU, and other components), and an ApplicationMaster (one per job) controls task execution.
YARN can run on larger clusters than MapReduce v1. MapReduce v1 reaches scalability
bottlenecks in the region of 4,000 nodes and 40,000 tasks, which stems from the fact that the
JobTracker must manage both jobs and tasks. YARN overcomes these limitations by using its split
ResourceManager / ApplicationMaster architecture: It is designed to scale up to 10,000 nodes and
100,000 tasks.
In contrast to the JobTracker, each instance of an application has a dedicated ApplicationMaster,
which runs for the duration of the application. This model is closer to the original Google
MapReduce paper, which describes how a master process is started to coordinate Map and
Reduce tasks running on a set of workers.
Multi-tenancy generally refers to a set of features that enable multiple business users and processes to share a common set of resources, such as an Apache Hadoop cluster, through policy rather than physical separation, without negatively impacting service-level agreements (SLAs), violating security requirements, or even revealing the existence of each party.
YARN decouples Hadoop workload management from resource management, which means that multiple applications can share a common infrastructure pool. Although this idea is not new, it is new to Hadoop. Earlier versions of Hadoop consolidated both workload and resource management functions into a single JobTracker, which limited customers that hoped to run multiple applications on the same cluster infrastructure.
To borrow from object-oriented programming terminology, multi-tenancy is an overloaded term: it means different things to different people depending on their orientation and context. To say that a solution is multi-tenant is not helpful unless you are specific about the meaning.
Some interpretations of multi-tenancy in big data environments are:
• Support for multiple concurrent Hadoop jobs
• Support for multiple lines of business on a shared infrastructure
• Support for multiple application workloads of different types (Hadoop and non-Hadoop)
• Provisions for security isolation between tenants
• Contract-oriented service level guarantees for tenants
• Support for multiple versions of applications and application frameworks concurrently
Organizations that are sophisticated in their view of multi-tenancy need all these capabilities and more. YARN addresses some of these requirements, and does so in large measure; future releases of Hadoop will add other approaches that provide other forms of multi-tenancy.
Although YARN is an important technology, the world is not suffering from a shortage of resource managers: some Hadoop providers support YARN, and others support Apache Mesos.
To ease the transition from Hadoop v1 to YARN, a major goal of YARN and the MapReduce
framework implementation on top of YARN was to ensure that existing MapReduce applications
that were programmed and compiled against previous MapReduce APIs (MRv1 applications) can
continue to run with little or no modification on YARN (MRv2 applications).
For users of the org.apache.hadoop.mapred APIs, MapReduce on YARN ensures full binary compatibility. These existing applications can run on YARN directly without recompilation: you can take the JAR files of an existing application that codes against the mapred APIs and use bin/hadoop to submit them directly to YARN.
Unfortunately, it was difficult to ensure full binary compatibility with existing applications that compiled against the MRv1 org.apache.hadoop.mapreduce APIs. These APIs have gone through many changes; for example, several classes stopped being abstract classes and changed to interfaces. Therefore, the YARN community compromised by supporting source compatibility only for the org.apache.hadoop.mapreduce APIs. Existing applications that use these APIs are source-compatible and can run on YARN either with no changes, with simple recompilation against the MRv2 .jar files that are included with Hadoop 2, or with minor updates.
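To make the binary-compatibility point concrete, here is a minimal mapper written against the old org.apache.hadoop.mapred API. The class name and the word-count logic are illustrative assumptions, not code from the course; the point is that a JAR compiled against Hadoop v1 containing classes like this can be submitted to YARN without recompiling.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// A word-count mapper coded against the old mapred (MRv1) interfaces.
public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        // Emit (word, 1) for every whitespace-separated token in the line.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                output.collect(word, ONE);
            }
        }
    }
}
```

Because MapReduce on YARN is binary-compatible with this API, the unmodified MRv1 JAR can be submitted to a Hadoop 2 cluster with a command such as bin/hadoop jar wordcount.jar.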
The NodeManager is a more generic and efficient version of the TaskTracker. Instead of having a
fixed number of Map and Reduce slots, the NodeManager has several dynamically created
resource containers. The size of a container depends upon the amount of resources it contains,
such as memory, CPU, disk, and network I/O.
Currently, only memory and CPU are supported (YARN-3); cgroups might be used to control disk
and network I/O in the future.
The number of containers on a node is determined by configuration parameters and by the total node resources (such as total CPU and total memory) that remain after the resources dedicated to the other daemons and the OS are set aside.
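As a rough sketch of that relationship, the fragment below reads the standard NodeManager and scheduler properties through the YarnConfiguration API and derives an upper bound on the container count from memory alone. The class name and the simple division are illustrative assumptions; the actual number also depends on vcores, the scheduler in use, and the sizes that applications request.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Rough illustration of how node resources and configuration bound the
// number of containers a NodeManager can host.
public class ContainerEstimate {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();

        // Memory and vcores the NodeManager advertises to the ResourceManager
        // (what is left for containers after the OS and other daemons).
        int nodeMemMb = conf.getInt(YarnConfiguration.NM_PMEM_MB,
                                    YarnConfiguration.DEFAULT_NM_PMEM_MB);
        int nodeVcores = conf.getInt(YarnConfiguration.NM_VCORES,
                                     YarnConfiguration.DEFAULT_NM_VCORES);

        // Smallest memory allocation the scheduler hands out per container.
        int minAllocMb = conf.getInt(
                YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
                YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);

        int maxContainersByMemory = nodeMemMb / minAllocMb;
        System.out.printf(
                "Node offers %d MB and %d vcores; at the minimum allocation of "
                + "%d MB, that is at most %d containers.%n",
                nodeMemMb, nodeVcores, minAllocMb, maxContainersByMemory);
    }
}
```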
5.4. Hadoop and MapReduce v1 compared to v2
The original Hadoop (v1) and MapReduce (v1) had limitations, and several issues surfaced over
time. We review these issues in preparation for looking at the differences and changes that were
introduced with Hadoop v2 and MapReduce v2.
Topics
• Introduction to MapReduce
• Hadoop v1 and MapReduce v1 architecture and limitations
• YARN architecture
• Hadoop and MapReduce v1 compared to v2
Hadoop v1 to Hadoop v2
The most notable change from Hadoop v1 to Hadoop v2 is the separation of cluster and resource
management from the execution and data processing environment. This change allows for many
new application types to run on Hadoop, including MapReduce v2.
HDFS is common to both versions. MapReduce is the only execution engine in Hadoop v1. The
YARN framework provides work scheduling that is neutral to the nature of the work that is
performed. Hadoop v2 supports many execution engines, including a port of MapReduce that is
now a YARN application.
The fundamental idea of YARN and MRv2 is to split the two major functions of the JobTracker,
resource management and job scheduling / monitoring, into separate daemons. The idea is to have
a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is
either a single job in the classical sense of MapReduce jobs or a DAG of jobs.
The ResourceManager and per-node worker, the NodeManager (NM), form the data-computation
framework. The ResourceManager is the ultimate authority that arbitrates resources among all the
applications in the system.
The per-application ApplicationMaster is, in effect, a framework-specific library that is tasked with
negotiating resources from the ResourceManager and working with the NodeManagers to run and
monitor the tasks.
The ResourceManager has two main components: the Scheduler and the ApplicationsManager:
• The Scheduler is responsible for allocating resources to the various running applications. It is a pure scheduler in the sense that it performs no monitoring or tracking of application status, and it offers no guarantees about restarting tasks that fail because of an application error or a hardware failure. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk, network, and other resources. In the first version, only memory is supported.
The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster
resources among the various queues, applications, and other items. The current MapReduce
schedulers, such as the CapacityScheduler and the FairScheduler, are some examples of the
plug-in.
The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of
cluster resources.
• The ApplicationsManager is responsible for accepting job submissions and negotiating the first
container for running the application-specific ApplicationMaster. It provides the service for
restarting the ApplicationMaster container on failure.
The NodeManager is the per-machine framework agent that is responsible for containers,
monitoring their resource usage (CPU, memory, disk, and network), and reporting the same to
the ResourceManager / Scheduler.
The per-application ApplicationMaster has the task of negotiating appropriate resource
containers from the Scheduler, tracking their status, and monitoring for progress.
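That negotiation can be sketched with the AMRMClient library. Everything in the sketch below is an assumption made for illustration (class name, container count, resource sizes, and priority), and the code is only meaningful when it runs inside an ApplicationMaster container that YARN launched, because registration relies on the AM's security tokens and environment.

```java
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Simplified container-negotiation loop inside an ApplicationMaster.
public class NegotiateContainers {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();

        // Register this ApplicationMaster with the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask the Scheduler for two containers of 1 GB and 1 vcore each,
        // expressed through the abstract Resource notion described above.
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);
        for (int i = 0; i < 2; i++) {
            rmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, priority));
        }

        // Heartbeat the ResourceManager until the containers are granted.
        int granted = 0;
        while (granted < 2) {
            AllocateResponse response = rmClient.allocate(0.1f);
            List<Container> allocated = response.getAllocatedContainers();
            granted += allocated.size(); // each Container names a node and its granted resources
            Thread.sleep(1000);
        }

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```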
MRv2 maintains API compatibility with the previous stable release (hadoop-1.x), which means that all MapReduce jobs should still run unchanged on top of MRv2 with just a recompile.
Architecture of MRv1
Figure: Classic version of MapReduce (MRv1), showing the client and the TaskTrackers that run the Map and Reduce tasks.
In MapReduce v1, there is only one JobTracker that is responsible for allocation of resources, task
assignment to data nodes (as TaskTrackers), and ongoing monitoring ("heartbeat") as each job is
run (the TaskTrackers constantly report back to the JobTracker on the status of each running task).
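A minimal MRv1-style driver makes the JobTracker's role visible from the client side. The class below is an illustrative sketch that reuses the hypothetical WordCountMapper from the earlier compatibility example; in MapReduce v1, JobClient.runJob submits the job to the JobTracker, which then assigns Map and Reduce tasks to TaskTrackers and tracks them through their heartbeats.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;

// Classic MRv1 driver: configure the job, then hand it to the JobTracker.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(WordCountMapper.class); // mapper from the earlier sketch
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Blocks until the job finishes, printing progress reported back
        // by the TaskTrackers through the JobTracker.
        JobClient.runJob(conf);
    }
}
```

Under MRv2, the same driver still works, but the submission goes to the ResourceManager and a per-job MapReduce ApplicationMaster instead of a JobTracker.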
YARN architecture
Figure: High-level architecture of YARN, showing the ResourceManager (RM) and the NodeManagers (NM).
The NodeManager is a more generic and efficient version of the TaskTracker. Instead of having a
fixed number of Map and Reduce slots, the NodeManager has several dynamically created
resource containers. The size of a container depends upon the amount of resources it contains,
such as memory, CPU, disk, and network I/O. Currently, only memory and CPU (YARN-3) are
supported; cgroups might be used to control disk and network I/O in the future. The number of containers on a node is determined by configuration parameters and by the total node resources (such as total CPU and total memory) that remain after the resources dedicated to the other daemons and the OS are set aside.
The ApplicationMaster can run any type of task inside a container. For example, the MapReduce
ApplicationMaster requests a container to start a Map or a Reduce task, and the Giraph
ApplicationMaster requests a container to run a Giraph task. You can also implement a custom
ApplicationMaster that runs specific tasks and invent a new distributed application framework. I
encourage you to read about Apache Twill, which aims to make it easier to write distributed
applications sitting on top of YARN.
In YARN, MapReduce is reduced to the role of one distributed application among many (but still a useful one) and is now called MRv2. MRv2 is simply the reimplementation of the classic MapReduce engine, now called MRv1, that runs on top of YARN.
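To make the "any type of task" point concrete, here is an illustrative sketch of how an ApplicationMaster uses the NMClient library to start an arbitrary command inside a container that the ResourceManager has already granted (for example, one returned by the allocation loop sketched earlier). The method name and the command are assumptions for the example, and the call only succeeds inside a real ApplicationMaster that holds the container's token.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

// The ApplicationMaster asks a NodeManager to run a process in a granted container.
public class LaunchInContainer {
    static void launchTask(Container container) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // The launch context describes what runs inside the container. Here it
        // is a shell command, but it could equally be a Map task, a Reduce
        // task, a Giraph worker, or any other process.
        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(Collections.singletonList(
                "echo hello-from-a-yarn-container"));

        nmClient.startContainer(container, ctx);
    }
}
```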
A rough mapping of YARN components to their MapReduce v1 counterparts:
• ApplicationMaster (but dedicated and short-lived) ↔ JobTracker
• NodeManager ↔ TaskTracker
• Container ↔ Slot