2 MapReduce continue
Advantages
MapReduce
MapReduce is a programming framework for distributed, parallel
processing of large data sets across a cluster of machines.
A MapReduce program is built mainly from the following three
classes.
Mapper Class
• The Mapper class is the first stage of data processing in
MapReduce. Here, the RecordReader processes each input
record and generates the corresponding key-value pair. Hadoop
stores the Mapper's intermediate data on the local disk.
• Input Split
It is the logical representation of the data. It represents the
block of work that is processed by a single map task in the
MapReduce program.
• RecordReader
It interacts with the input split and converts the obtained data
into key-value pairs.
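The flow from input split to RecordReader to Mapper can be illustrated with a minimal sketch in plain Python. This is not Hadoop's Java API: the names record_reader, word_count_mapper and run_map_task are hypothetical, and the word-count logic is only an assumed example job. The (byte offset, line) key-value form mirrors what Hadoop's default text input produces.

```python
from typing import Iterator, List, Tuple


def record_reader(split_text: str, start_offset: int = 0) -> Iterator[Tuple[int, str]]:
    """Hypothetical RecordReader: turn one input split into (byte offset, line) pairs."""
    offset = start_offset
    for line in split_text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))  # advance by the raw line length in bytes


def word_count_mapper(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical Mapper: emit (word, 1) for every word in the record."""
    for word in value.split():
        yield word.lower(), 1


def run_map_task(split_text: str) -> List[Tuple[str, int]]:
    """One map task handles exactly one input split."""
    intermediate: List[Tuple[str, int]] = []
    for key, value in record_reader(split_text):
        intermediate.extend(word_count_mapper(key, value))
    # In Hadoop this intermediate data would be written to the local disk.
    return intermediate


split = "Hadoop stores data\nHadoop runs MapReduce\n"
records = list(record_reader(split))
pairs = run_map_task(split)
print(records)
print(pairs)
```

Each record enters the Mapper as one key-value pair and may leave as several; the list returned by run_map_task stands in for the intermediate data that a later Reducer stage would consume.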
Reducer Class
1. Parallel Processing:
In MapReduce, a job is divided among multiple nodes, and
each node works on its part of the job simultaneously.
MapReduce thus follows the divide-and-conquer paradigm,
which lets us process the data on different machines.
Because the data is processed by multiple machines in parallel
instead of by a single machine, the time taken to process it
is reduced substantially.
2. Data Locality: