
UNDERSTANDING INPUTS AND OUTPUTS OF MapReduce

DIVYA PANTA
21109
MapReduce Theory
• Map and Reduce functions consume and produce key-value pairs
  – Input and output types can range from Text to complex data structures
  – Specified via the Job's configuration
  – Relatively easy to implement your own
• Generally we can treat the flow as
  map: (K1, V1) → list(K2, V2)
  reduce: (K2, list(V2)) → list(K3, V3)
  – Reduce input types are the same as map output types (see the Java sketch below)
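
A sketch of how this flow lines up with Hadoop's Java API: the generic parameters of Mapper and Reducer correspond to (K1, V1), (K2, V2), and (K3, V3). The concrete types below are illustrative assumptions (a word count over text input), not fixed by the slide:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map: (K1, V1) -> list(K2, V2)
// here (K1, V1) = (LongWritable, Text): byte offset and line of input,
// and (K2, V2) = (Text, IntWritable): word and count
class ExampleMapper extends Mapper<LongWritable, Text, Text, IntWritable> { }

// reduce: (K2, list(V2)) -> list(K3, V3)
// here (K3, V3) = (Text, IntWritable): word and total count
class ExampleReducer extends Reducer<Text, IntWritable, Text, IntWritable> { }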
Map Reduce Flow of Data
[Figure: on each node, a Data Split feeds a Mapper Task, which emits Map Output; the Map Outputs from all nodes feed the Reduce Task, which writes the Reduce output]
Key and Value Types
• Utilizes Hadoop's serialization mechanism for writing data in and out of the network, databases, or files
  – Optimized for network serialization
  – A set of basic types is provided
  – Easy to implement your own (see the sketch below)
• Extends the Writable interface
  – The framework's serialization mechanism
  – Defines how to read and write fields
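
A minimal sketch of a custom value type, assuming only the standard Hadoop I/O API; the class PointWritable and its fields are hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PointWritable implements Writable {
    private int x;
    private int y;

    // The framework requires a no-arg constructor to re-create instances
    public PointWritable() { }

    @Override
    public void write(DataOutput out) throws IOException {
        // Defines how fields are written out
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Defines how fields are read back, in the same order
        x = in.readInt();
        y = in.readInt();
    }
}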
Key and Value Types
• Keys must implement the WritableComparable interface
  – Extends Writable and java.lang.Comparable
  – Required because keys are sorted prior to the reduce phase
• Hadoop ships with many default implementations of WritableComparable
  – Wrappers for primitives (Text for String, IntWritable for Integer, etc.)
  – Or you can implement your own (sketched below)
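
To use a custom type as a key, it must also define a sort order. A minimal sketch, assuming the standard Hadoop I/O API; the class WordKey is hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class WordKey implements WritableComparable<WordKey> {
    private String word = "";

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
    }

    @Override
    public int compareTo(WordKey other) {
        // Defines the sort order applied to keys before the reduce phase
        return word.compareTo(other.word);
    }
}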
Inputs and Outputs
 The MapReduce framework operates on <key, value> pairs; that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
 The key and value classes have to be serializable by the framework and hence need to implement the Writable interface.
 Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
Input and Output types of a MapReduce job: (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output). A configuration sketch follows the table.

            INPUT               OUTPUT
MAP         <k1, v1>            list(<k2, v2>)
REDUCE      <k2, list(v2)>      list(<k3, v3>)
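
The key and value classes are declared on the Job configuration. A sketch assuming the standard org.apache.hadoop.mapreduce API; the driver class name and the word-count types are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        // <k2, v2>: the mapper's output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // <k3, v3>: the job's final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    }
}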
Example
 Let us understand how MapReduce works by taking an example where we have a text file called example.txt whose contents are as follows:
Dear Bear River
Car Car River
Deer Car Bear
 Now, suppose we have to perform a word count on example.txt using MapReduce. So, we will be finding the unique words and the number of occurrences of those unique words.
Cont...
 First, we divide the input into three splits as shown in the figure. This will distribute the work among all the map nodes.
 Then, we tokenize the words in each of the mappers and give a hardcoded value (1) to each of the tokens or words. The rationale behind giving a hardcoded value equal to 1 is that every word, in itself, will occur once.
 Now, a list of key-value pairs will be created where the key is nothing but the individual word and the value is one. So, for the first line (Dear Bear River) we have 3 key-value pairs: (Dear, 1), (Bear, 1), (River, 1). The mapping process remains the same on all the nodes. A sketch of such a mapper is shown below.
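
A minimal sketch of the tokenizing mapper just described, assuming the standard org.apache.hadoop.mapreduce API; the class name is illustrative:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (word, 1) for every token in the line,
        // e.g. "Dear Bear River" -> (Dear, 1), (Bear, 1), (River, 1)
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}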
Cont....
 After the mapper phase, a partition process takes place where sorting and shuffling happen so that all the tuples with the same key are sent to the corresponding reducer (see the sketch after this list).
 So, after the sorting and shuffling phase, each reducer will have a unique key and a list of values corresponding to that very key. For example, Bear, [1,1]; Car, [1,1,1]; etc.
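
How tuples with the same key end up at the same reducer can be sketched like this; the logic mirrors Hadoop's default hash-based partitioning, and the class name here is hypothetical:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class HashLikePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Equal keys hash to the same partition, so all of a key's
        // tuples are sent to the same reducer
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}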
Cont....
 Now, each Reducer counts the values present in its list of values. As shown in the figure, the reducer gets the list of values [1,1] for the key Bear. Then, it counts the number of ones in that list and gives the final output as (Bear, 2). A sketch of such a reducer follows.
 Finally, all the output key-value pairs are collected and written to the output file.
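
A minimal sketch of the counting reducer just described, assuming the standard org.apache.hadoop.mapreduce API; the class name is illustrative:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Count the ones in the list, e.g. (Bear, [1, 1]) -> (Bear, 2)
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}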
Thank you
