Traditional Way Vs MapReduce Way and Steps in MapReduce (Word Count)

Big Data Analytics Unit II

MapReduce Tutorial: Traditional Way

In the traditional approach, a large data set is split across several machines, each machine processes its share of the data, and the partial results are then combined. Let us look at the challenges associated with this traditional approach:

1. Critical path problem: It is the amount of time taken to finish the job without delaying
the next milestone or actual completion date. So, if any of the machines delays its part of
the job, the whole work gets delayed.
2. Reliability problem: What if any of the machines working with a part of the data fails?
Managing this failover becomes a challenge.
3. Equal split issue: How do we divide the data into smaller chunks so that each machine
gets an even share of the data to work with? In other words, how do we divide the data so
that no individual machine is overloaded or underutilized?
4. Single split may fail: If any machine fails to provide its output, we will not be able to
calculate the result. So, there should be a mechanism to ensure the fault-tolerance
capability of the system.
5. Aggregation of results: There should be a mechanism to aggregate the results generated
by each of the machines to produce the final output.

To overcome these issues, we have the MapReduce framework, which allows us to perform
such parallel computations without bothering about issues like reliability, fault tolerance,
etc. Therefore, MapReduce gives you the flexibility to write your code logic without caring
about the design issues of the system.

What is MapReduce?

MapReduce is a programming framework that allows us to perform distributed and parallel
processing on large data sets in a distributed environment.

• MapReduce consists of two distinct tasks – Map and Reduce.


• As the name MapReduce suggests, the reducer phase takes place after the mapper phase has
been completed.
• So, the first is the map job, where a block of data is read and processed to produce key-
value pairs as intermediate outputs.
• The output of a Mapper or map job (key-value pairs) is input to the Reducer.
• The reducer receives the key-value pairs from multiple map jobs.
• Then, the reducer aggregates those intermediate data tuples (intermediate key-value
pairs) into a smaller set of tuples or key-value pairs, which is the final output. A minimal
sketch of these two functions is given below.
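To make the two roles concrete, here is a minimal sketch in plain Python (not the Hadoop API; the function names map_fn and reduce_fn are made up for illustration):

```python
# Minimal sketch of the two functions a MapReduce job supplies.
# Plain Python for illustration only, not the Hadoop API.

def map_fn(record):
    """Map job: read one record and emit intermediate (key, value) pairs."""
    for word in record.split():
        yield (word, 1)            # each word counts once for itself

def reduce_fn(key, values):
    """Reduce job: aggregate all values seen for one key."""
    return (key, sum(values))      # e.g. ("Bear", [1, 1]) -> ("Bear", 2)
```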

MapReduce Tutorial: A Word Count Example of MapReduce

Let us understand how MapReduce works by taking an example where we have a text file
called example.txt whose contents are as follows:

Dear, Bear, River, Car, Car, River, Deer, Car and Bear

Now, suppose we have to perform a word count on example.txt using MapReduce. So, we
will be finding the unique words and the number of occurrences of those unique words.

• First, we divide the input into three splits as shown in the figure. This will distribute the
work among all the map nodes.
• Then, we tokenize the words in each of the mappers and give a hardcoded value (1) to
each of the tokens or words. The rationale behind giving a hardcoded value equal to 1
is that every word, in itself, will occur once.
• Now, a list of key-value pairs will be created where the key is nothing but the individual
word and the value is one. So, for the first line (Dear Bear River) we have 3 key-value
pairs – Dear, 1; Bear, 1; River, 1. The mapping process remains the same on all the
nodes.
• After the mapper phase, a partition process takes place where sorting and shuffling happen
so that all the tuples with the same key are sent to the corresponding reducer.
• So, after the sorting and shuffling phase, each reducer will have a unique key and a list
of values corresponding to that key. For example, Bear, [1,1]; Car, [1,1,1], etc.
• Now, each reducer counts the values which are present in its list of values. As shown
in the figure, the reducer gets the list of values [1,1] for the key Bear. Then, it counts
the number of ones in that list and gives the final output as – Bear, 2.

• Finally, all the output key-value pairs are collected and written to the output file. A
plain-Python sketch of the whole pipeline is given below.
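The whole walk-through above can be simulated in a few lines of plain Python. This is only a single-process sketch of the three phases (map, shuffle and sort, reduce), not Hadoop code; the three splits are the ones used in the walk-through above:

```python
from collections import defaultdict

# Single-process simulation of the word-count pipeline described above
# (map -> shuffle/sort -> reduce); it is not Hadoop code.

splits = [
    "Dear Bear River",
    "Car Car River",
    "Deer Car Bear",
]  # the three input splits used in the walk-through

# Map phase: tokenize each split and emit (word, 1) for every word.
intermediate = []
for split in splits:
    for word in split.split():
        intermediate.append((word, 1))

# Shuffle and sort phase: group all values belonging to the same key,
# so that a reducer sees, for example, ("Bear", [1, 1]).
grouped = defaultdict(list)
for word, count in intermediate:
    grouped[word].append(count)

# Reduce phase: count (sum) the list of ones for every key.
result = {word: sum(counts) for word, counts in sorted(grouped.items())}

print(result)  # {'Bear': 2, 'Car': 3, 'Dear': 1, 'Deer': 1, 'River': 2}
```

Running it prints the same final counts as the walk-through: Bear 2, Car 3, Dear 1, Deer 1, River 2.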

MapReduce Tutorial: Advantages of MapReduce

The two biggest advantages of MapReduce are:

1. Parallel Processing:

In MapReduce, we divide the job among multiple nodes and each node works with a part
of the job simultaneously. So, MapReduce is based on the Divide and Conquer paradigm, which
helps us process the data using different machines. As the data is processed by multiple
machines in parallel instead of a single machine, the time taken to process the data is reduced
by a tremendous amount, as shown in the figure below. A toy code sketch of this idea follows the figure.

Fig.: Traditional Way Vs. MapReduce Way – MapReduce Tutorial
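As a toy illustration of this divide-and-conquer idea, the sketch below counts each split in a separate local process and then merges the partial results. It only mimics parallelism on one machine, whereas a real MapReduce job would run the mappers on different nodes of the cluster:

```python
from collections import Counter
from multiprocessing import Pool

def count_split(split):
    """The work of a single mapper node: word count for its own split."""
    return Counter(split.split())

# The same three splits as in the word-count example above.
splits = ["Dear Bear River", "Car Car River", "Deer Car Bear"]

if __name__ == "__main__":
    # Each split is handled by its own local process, in parallel.
    with Pool(processes=len(splits)) as pool:
        partial_counts = pool.map(count_split, splits)

    # Aggregate the partial results into the final output.
    total = sum(partial_counts, Counter())
    print(total)  # e.g. Counter({'Car': 3, 'Bear': 2, 'River': 2, 'Dear': 1, 'Deer': 1})
```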

2. Data Locality:

Instead of moving data to the processing unit, we move the processing unit to the data in the
MapReduce framework. In the traditional system, we used to bring the data to the processing unit
and process it. But as the data grew and became very huge, bringing this huge amount of data
to the processing unit posed the following issues:

• Moving huge amounts of data to the processing unit is costly and deteriorates the network performance.
• Processing takes time as the data is processed by a single unit, which becomes the
bottleneck.
• The master node can get overburdened and may fail.

Now, MapReduce allows us to overcome the above issues by bringing the processing unit to the
data. So, as you can see in the above image, the data is distributed among multiple nodes
where each node processes the part of the data residing on it. This allows us to have the
following advantages:

• It is very cost-effective to move the processing unit to the data.


• The processing time is reduced as all the nodes are working with their part of the data
in parallel.
• Every node gets a part of the data to process and therefore, there is no chance of a node
getting overburdened.
