Map Reduce and Its Phases With Numerical Example. - GeeksforGeeks

The document explains the MapReduce framework, which allows for the parallel processing of large data sets across clusters of hardware. It details the three main phases of MapReduce: Mapping, Shuffling and Sorting, and Reducing, along with an optional Combining phase to optimize performance. A numerical example using MovieLens data illustrates the mapping, shuffling, and reducing processes, accompanied by sample Python code for implementing the mapper and reducer.

Uploaded by

ramyatech25

Map Reduce and its Phases with numerical example.
Last Updated : 18 May, 2023

Map Reduce :-
MapReduce is a framework for writing applications that process huge
amounts of data in parallel, across large clusters of commodity
hardware, in a reliable manner.
Different Phases of MapReduce:-
The MapReduce model has three major phases and one optional phase.

Mapping
Shuffling and Sorting
Reducing
Combining (optional)

Mapping :- It is the first phase of MapReduce programming. The Mapping
phase accepts key-value pairs (k, v) as input, where the key
represents the address of each record and the value represents the
entire record content. The output of the Mapping phase is also in
key-value format (k’, v’).
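
As a minimal sketch of the Mapping phase in plain Python (not Hadoop's own API), the hypothetical function `map_record` below takes a record's position as the key and the raw tab-separated line as the value, and returns a new pair (k’, v’). Using the offset as the input key and the user ID as the output key are assumptions made only for illustration.

```python
def map_record(offset, line):
    """Map one input pair (k, v) to one output pair (k', v').

    offset: position of the record in the input (assumed input key)
    line:   the whole record as tab-separated text (input value)
    """
    user_id, movie_id, rating, timestamp = line.split("\t")
    # Assumed choice for illustration: key by user, keep the movie as value.
    return user_id, movie_id


pair = map_record(0, "196\t242\t3\t881250949")
print(pair)  # ('196', '242')
```

A real mapper may emit zero, one, or many pairs per record; this sketch returns exactly one to keep the idea visible.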

Shuffling and Sorting :- The output of the various mappers (k’, v’)
then goes into the Shuffling and Sorting phase. The pairs are sorted by
key, and all values that share the same key are grouped together.
The output of the Shuffling and Sorting phase is again key-value pairs,
each made of a key and an array of values (k, v[ ]).
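
The grouping step can be sketched in a few lines of plain Python. The helper name `shuffle_and_sort` and the toy pairs are hypothetical; the sketch assumes everything fits in memory, whereas a real cluster moves the pairs between machines. Duplicate values are kept in the group, which is how the shuffle behaves in Hadoop.

```python
from collections import defaultdict


def shuffle_and_sort(mapped_pairs):
    """Group values by key, then return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)  # duplicate values are kept
    return sorted(groups.items())


mapped = [("b", 1), ("a", 2), ("b", 1), ("a", 5)]
grouped = shuffle_and_sort(mapped)
print(grouped)  # [('a', [2, 5]), ('b', [1, 1])]
```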

Reducer :- The output of the Shuffling and Sorting phase (k, v[ ]) is
the input of the Reducer phase. In this phase the reducer function’s
logic is executed, and all the values collected for each key are
processed together. The reducer consolidates the outputs of the various
mappers and computes the final output.
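
A reducer can be pictured as a function that receives one key with its list of values and returns one result for that key. In the sketch below the name `reduce_group` and the choice of summing the values are assumptions; the reducer logic depends entirely on the job being written.

```python
def reduce_group(key, values):
    """Collapse all values collected for one key into a single output pair."""
    # Assumed reducer logic for illustration: add the values up.
    return key, sum(values)


grouped = [("a", [2, 5]), ("b", [1, 1])]
final_output = [reduce_group(key, values) for key, values in grouped]
print(final_output)  # [('a', 7), ('b', 2)]
```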
Combining :- It is an optional phase of MapReduce. The combiner is used
to optimize the performance of a MapReduce job: it runs on the output
of each mapper and aggregates values locally, so less data has to pass
through the Shuffling and Sorting phase, which makes that phase
quicker.
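
The effect of a combiner can be sketched as local pre-aggregation of a single mapper's output. The function name `combine` and the sample pairs are hypothetical, and the sketch assumes the job is a count, where adding partial counts early does not change the final result.

```python
from collections import Counter


def combine(mapper_output):
    """Pre-aggregate one mapper's (key, count) pairs before the shuffle."""
    local_totals = Counter()
    for key, count in mapper_output:
        local_totals[key] += count
    return sorted(local_totals.items())


# Output of a single (hypothetical) mapper: one pair per record.
mapper_output = [("186", 1), ("196", 1), ("186", 1), ("186", 1)]
combined = combine(mapper_output)
print(len(mapper_output), "pairs before combining")
print(combined)  # [('186', 3), ('196', 1)]
```

Four pairs shrink to two here, so fewer pairs need to be shuffled. A combiner is only safe when the operation can be applied to partial results, as addition can.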

[Figure: flow chart of the MapReduce phases]

Numerical:-
MovieLens Data

USER_ID   MOVIE_ID   RATING   TIMESTAMP
196       242        3        881250949
186       302        3        891717742
196       377        1        878887116
244       51         2        880606923
166       346        1        886397596
186       474        4        884182806
186       265        2        881171488

Solution :-
Step 1 – First we map the values. This happens in the first phase of
the MapReduce model; here each record is mapped to a USER_ID:MOVIE_ID
pair.

196:242 ; 186:302 ; 196:377 ; 244:51 ; 166:346 ; 186:474 ; 186:265

Step 2 – After mapping, we shuffle and sort the values, grouping the
movie IDs under each user ID.

166:346 ; 186:302,474,265 ; 196:242,377 ; 244:51

Step 3 – After completing steps 1 and 2, we reduce each key’s values.

Now, put all values together

Solution
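
The reduced result itself is not reproduced in the text here, so the following is only a sketch of the three steps on the seven rows of the MovieLens table. It assumes that "reducing each key's values" means counting how many movies each user rated; a different reducer would give a different final output.

```python
from collections import defaultdict

# (USER_ID, MOVIE_ID, RATING, TIMESTAMP) rows from the MovieLens table.
rows = [
    (196, 242, 3, 881250949),
    (186, 302, 3, 891717742),
    (196, 377, 1, 878887116),
    (244, 51, 2, 880606923),
    (166, 346, 1, 886397596),
    (186, 474, 4, 884182806),
    (186, 265, 2, 881171488),
]

# Step 1: map each row to a (user_id, movie_id) pair.
mapped = [(user_id, movie_id) for user_id, movie_id, _, _ in rows]

# Step 2: shuffle and sort, grouping movie IDs under each user ID.
groups = defaultdict(list)
for user_id, movie_id in mapped:
    groups[user_id].append(movie_id)
shuffled = sorted(groups.items())

# Step 3 (assumed reducer): count the movies rated by each user.
reduced = [(user_id, len(movie_ids)) for user_id, movie_ids in shuffled]

print(shuffled)  # [(166, [346]), (186, [302, 474, 265]), (196, [242, 377]), (244, [51])]
print(reduced)   # [(166, 1), (186, 3), (196, 2), (244, 1)]
```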

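CODE FOR MAPPER AND REDUCER TOGETHER:

Python3

from mrjob.job import MRJob
from mrjob.step import MRStep


class RatingsBreak(MRJob):
    def steps(self):
        # A single step: one mapper followed by one reducer.
        return [
            MRStep(mapper=self.mapper_get_ratings,
                   reducer=self.reducer_count_ratings)
        ]

    # MAPPER CODE
    def mapper_get_ratings(self, _, line):
        # Each input line is tab-separated: user, movie, rating, timestamp.
        (user_id, movie_id, rating, timestamp) = line.split('\t')
        # Emit (rating, 1) so the reducer can count each rating value.
        yield rating, 1

    # REDUCER CODE
    def reducer_count_ratings(self, key, values):
        # Sum the 1s emitted for this rating to get its total count.
        yield key, sum(values)


if __name__ == '__main__':
    RatingsBreak.run()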
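
To see what such a job would compute on the seven sample rows, the same map and reduce logic can be imitated in plain Python, without mrjob or a cluster. This is only a sketch: it assumes the mapper emits (rating, 1) for every record, and the names `mapper`, `counts`, and `result` are made up for the illustration.

```python
from collections import defaultdict

# The seven sample records as tab-separated lines.
lines = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
    "186\t474\t4\t884182806",
    "186\t265\t2\t881171488",
]


def mapper(line):
    user_id, movie_id, rating, timestamp = line.split("\t")
    return rating, 1  # assumed mapper output: one count per rating


# Shuffle: collect the 1s under each rating value.
counts = defaultdict(list)
for line in lines:
    key, value = mapper(line)
    counts[key].append(value)

# Reduce: sum the 1s for each rating.
result = {rating: sum(ones) for rating, ones in sorted(counts.items())}
print(result)  # {'1': 2, '2': 2, '3': 2, '4': 1}
```

The keys are strings because the ratings are read from text and never converted. Note that this job counts how often each rating value occurs, which is a different question from the per-user grouping in the numerical example.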

@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved
