
Developing a MapReduce Application:

Writing a program in MapReduce follows a certain pattern. You start by writing your map and
reduce functions, ideally with unit tests to make sure they do what you expect. Then you write a
driver program to run a job, which you can run from your IDE against a small subset of the data
to check that it is working. If it fails, you can use your IDE's debugger to find the source of the
problem. When the program runs as expected against the small dataset, you are ready to unleash
it on a cluster. Running against the full dataset is likely to expose some more issues, which you
can fix by expanding your tests and altering your mapper or reducer to handle the new cases.
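For concreteness, a minimal driver program might look like the following sketch. The mapper and reducer class names (MyMapper, MyReducer) and the Text/IntWritable output types are assumptions for illustration, not part of any standard template:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "my job");
            job.setJarByClass(MyJobDriver.class);
            job.setMapperClass(MyMapper.class);       // hypothetical mapper class
            job.setReducerClass(MyReducer.class);     // hypothetical reducer class
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Submit the job and wait; exit non-zero if the job fails.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Run from the IDE against a small local input directory, a driver like this lets you step through the mapper and reducer with the debugger before submitting to a cluster.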

After the program is working, you may wish to do some tuning:

- First, by running through some standard checks for making MapReduce programs faster.
- Second, by doing task profiling.

Profiling distributed programs is not easy, but Hadoop has hooks to aid in the process.

Before we start writing a MapReduce program, we need to set up and configure the development
environment. Components in Hadoop are configured using Hadoop's own configuration API. An
instance of the Configuration class (in the org.apache.hadoop.conf package) represents a
collection of configuration properties and their values. Each property is named by a String, and
the type of a value may be one of several, including Java primitives such as boolean, int, long,
and float; other useful types such as String, Class, and java.io.File; and collections of Strings.
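As a brief illustration of this API, here is a minimal sketch; the property names "color", "size", and "debug" are made up for the example and are not standard Hadoop properties:

    import org.apache.hadoop.conf.Configuration;

    public class ConfigurationExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("color", "yellow");   // a String-valued property
            conf.setInt("size", 10);       // an int-valued property

            String color = conf.get("color");                // "yellow"
            int size = conf.getInt("size", 0);               // 10; 0 is the default
            boolean debug = conf.getBoolean("debug", false); // unset, so default false
            System.out.println(color + " " + size + " " + debug);
        }
    }

Typed accessors such as getInt() and getBoolean() take a default value, which is returned when the property has not been set.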

Unit Tests with MRUnit:

Hadoop MapReduce jobs have a unique code architecture that follows a specific template with
specific constructs. This architecture raises interesting issues when doing test-driven
development (TDD) and writing unit tests. With MRUnit, you can craft test input, push it
through your mapper and/or reducer, and verify its output, all in a JUnit test.

As with other JUnit tests, this allows you to debug your code using the JUnit test as a driver. A
map/reduce pair can be tested using MRUnit's MapReduceDriver, a combiner can also be tested
using MapReduceDriver, and a PipelineMapReduceDriver allows you to test a workflow of
map/reduce jobs. Currently, partitioners do not have a test driver under MRUnit.

MRUnit allows you to do TDD (Test-Driven Development) and write lightweight unit tests
which accommodate Hadoop's specific architecture and constructs.
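A minimal sketch of such a test might look like the following, assuming a hypothetical MyMapper that reads LongWritable/Text records and emits Text/IntWritable pairs; the input record and expected output are placeholders:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class MyMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            // Wrap the (hypothetical) mapper under test in an MRUnit driver.
            mapDriver = MapDriver.newMapDriver(new MyMapper());
        }

        @Test
        public void mapperEmitsExpectedOutput() throws Exception {
            mapDriver
                .withInput(new LongWritable(0), new Text("some input record"))
                .withOutput(new Text("some-key"), new IntWritable(1))
                .runTest(); // fails the JUnit test if the actual output differs
        }
    }

ReduceDriver and MapReduceDriver follow the same withInput/withOutput/runTest pattern for reducers and full map/reduce pairs.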

Example: Suppose we are processing road surface data used to create maps. The input contains
both linear surfaces and intersections. The mapper takes a collection of these mixed surfaces as
input, discards anything that isn't a linear road surface (i.e., intersections), and then processes
each road surface and writes it out to HDFS. We can keep count and eventually print out how
many non-road surfaces were in the input. For debugging purposes, we can additionally print
out how many road surfaces were processed.
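A hedged sketch of such a mapper is shown below. The record format (the surface type as the first comma-separated field) and the class name are assumptions for illustration, and Hadoop counters stand in for the "keep count and print" bookkeeping:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RoadSurfaceMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Counters appear in the job's counter summary when the job completes.
        enum SurfaceCounters { ROADS, NON_ROADS }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            // Hypothetical format: the surface type is the first comma-separated field.
            String type = record.split(",", 2)[0];
            if (!"ROAD".equals(type)) {
                // Discard intersections and other non-road surfaces, but count them.
                context.getCounter(SurfaceCounters.NON_ROADS).increment(1);
                return;
            }
            context.getCounter(SurfaceCounters.ROADS).increment(1);
            context.write(new Text(record), NullWritable.get()); // keep road surfaces
        }
    }

Because the counts are kept as counters rather than local variables, they are aggregated across all map tasks and reported when the job finishes, which covers both the non-road count and the debugging count of processed road surfaces.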
