
Developing a MapReduce Application:

Writing a program in MapReduce follows a certain pattern. You start by writing your map and
reduce functions, ideally with unit tests to make sure they do what you expect. Then you write a
driver program to run a job, which you can run from your IDE against a small subset of the data
to check that it is working. If it fails, you can use your IDE's debugger to find the source of the
problem. When the program runs as expected against the small dataset, you are ready to unleash
it on a cluster. Running against the full dataset is likely to expose some more issues, which you
can fix by expanding your tests and altering your mapper or reducer to handle the new cases.
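For concreteness, a minimal driver program might look like the following sketch. The mapper and reducer class names (MyMapper, MyReducer) and the Text/IntWritable output types are assumptions for illustration, not part of any standard template:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "my job");
            job.setJarByClass(MyJobDriver.class);
            job.setMapperClass(MyMapper.class);       // hypothetical mapper class
            job.setReducerClass(MyReducer.class);     // hypothetical reducer class
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Submit the job and wait; exit non-zero if the job fails.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Run from the IDE against a small local input directory, a driver like this lets you step through the mapper and reducer with the debugger before submitting to a cluster.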

After the program is working, you may wish to do some tuning:

- First, by running through some standard checks for making MapReduce programs faster.
- Second, by doing task profiling.

Profiling distributed programs is not easy, but Hadoop has hooks to aid in the process.

Before we start writing a MapReduce program, we need to set up and configure the development
environment. Components in Hadoop are configured using Hadoop's own configuration API. An
instance of the Configuration class (in the org.apache.hadoop.conf package) represents a
collection of configuration properties and their values. Each property is named by a String, and
the type of a value may be one of several, including Java primitives such as boolean, int, long,
and float; other useful types such as String, Class, and java.io.File; and collections of Strings.
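As a brief illustration of this API, here is a minimal sketch; the property names "color", "size", and "debug" are made up for the example and are not standard Hadoop properties:

    import org.apache.hadoop.conf.Configuration;

    public class ConfigurationExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("color", "yellow");   // a String-valued property
            conf.setInt("size", 10);       // an int-valued property

            String color = conf.get("color");                // "yellow"
            int size = conf.getInt("size", 0);               // 10; 0 is the default
            boolean debug = conf.getBoolean("debug", false); // unset, so default false
            System.out.println(color + " " + size + " " + debug);
        }
    }

Typed accessors such as getInt() and getBoolean() take a default value, which is returned when the property has not been set.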

Unit Tests with MRUnit:

Hadoop MapReduce jobs have a unique code architecture that follows a specific template with
specific constructs. This architecture raises interesting issues when doing test-driven
development (TDD) and writing unit tests. With MRUnit, you can craft test input, push it
through your mapper and/or reducer, and verify its output, all in a JUnit test.

As with other JUnit tests, this allows you to debug your code using the JUnit test as a driver. A
map/reduce pair can be tested using MRUnit's MapReduceDriver, a combiner can also be tested
using MapReduceDriver, and a PipelineMapReduceDriver allows you to test a workflow of
map/reduce jobs. Currently, partitioners do not have a test driver under MRUnit.

MRUnit allows you to do TDD (Test-Driven Development) and write lightweight unit tests
which accommodate Hadoop's specific architecture and constructs.
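A minimal sketch of such a test might look like the following, assuming a hypothetical MyMapper that reads LongWritable/Text records and emits Text/IntWritable pairs; the input record and expected output are placeholders:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class MyMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            // Wrap the (hypothetical) mapper under test in an MRUnit driver.
            mapDriver = MapDriver.newMapDriver(new MyMapper());
        }

        @Test
        public void mapperEmitsExpectedOutput() throws Exception {
            mapDriver
                .withInput(new LongWritable(0), new Text("some input record"))
                .withOutput(new Text("some-key"), new IntWritable(1))
                .runTest(); // fails the JUnit test if the actual output differs
        }
    }

ReduceDriver and MapReduceDriver follow the same withInput/withOutput/runTest pattern for reducers and full map/reduce pairs.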

Example: Suppose we are processing road surface data used to create maps. The input contains
both linear surfaces and intersections. The mapper takes a collection of these mixed surfaces as
input, discards anything that isn't a linear road surface (i.e., intersections), and then processes
each road surface and writes it out to HDFS. We can keep count and eventually print out how
many non-road surfaces were in the input. For debugging purposes, we can additionally print
out how many road surfaces were processed.
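A hedged sketch of such a mapper is shown below. The record format (the surface type as the first comma-separated field) and the class name are assumptions for illustration, and Hadoop counters stand in for the "keep count and print" bookkeeping:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RoadSurfaceMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Counters appear in the job's counter summary when the job completes.
        enum SurfaceCounters { ROADS, NON_ROADS }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            // Hypothetical format: the surface type is the first comma-separated field.
            String type = record.split(",", 2)[0];
            if (!"ROAD".equals(type)) {
                // Discard intersections and other non-road surfaces, but count them.
                context.getCounter(SurfaceCounters.NON_ROADS).increment(1);
                return;
            }
            context.getCounter(SurfaceCounters.ROADS).increment(1);
            context.write(new Text(record), NullWritable.get()); // keep road surfaces
        }
    }

Because the counts are kept as counters rather than local variables, they are aggregated across all map tasks and reported when the job finishes, which covers both the non-road count and the debugging count of processed road surfaces.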
