Matrix Multiplication Using Hadoop Map-Reduce
Step 1: Install Hadoop in Stand-Alone Mode
java -version
Hadoop requires that you set the path to Java, either as an environment variable or
in the Hadoop configuration file.
Output (the Java installation path on this system):
/usr/lib/jvm/java-11-openjdk-amd64/
/usr/local/hadoop/bin/hadoop
If this command prints Hadoop's usage help, we've successfully configured Hadoop to run in
stand-alone mode.
We'll ensure that it is functioning properly by running the example MapReduce
program it ships with. To do so, create a directory called input in our home
directory and copy Hadoop's configuration files into it to use those files as our
data.
mkdir ~/input
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
Next, run the hadoop-mapreduce-examples program, a Java archive that bundles several example
jobs. We'll invoke its grep example and pass it the input directory (~/input), the output
directory (~/grep_example), and a regular expression. The MapReduce grep program counts the
matches of a literal word or regular expression. Here the expression matches the word
principal, optionally followed by periods, so it finds occurrences within or at the end of a
declarative sentence. The expression is case-sensitive, so it would not match the word if it
were capitalized at the beginning of a sentence:
/usr/local/hadoop/bin/hadoop jar \
  /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep \
  ~/input ~/grep_example 'principal[.]*'
When the job completes, it prints a summary of what was processed and any errors it
encountered, but the summary does not contain the actual results.
Results are stored in the output directory and can be checked by running cat on the output
directory:
cat ~/grep_example/*
Step 2: Matrix Multiplication Using MapReduce Programming
2.1. In mathematics, matrix multiplication or the matrix product is a binary operation that
produces a matrix from two matrices. The definition is motivated by linear equations and linear
transformations on vectors, which have numerous applications in applied mathematics, physics, and
engineering. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix product
AB is an n × p matrix, in which the m entries across a row of A are multiplied with the m entries
down a column of B and summed to produce an entry of AB. When two linear transformations are
represented by matrices, then the matrix product represents the composition of the two
transformations.
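The entry-wise definition can be written compactly as follows. The symbols a_ij, b_jk and the
2 × 2 example matrices are chosen here for illustration only; they are not taken from the data
files used later.

```latex
% Entry (i,k) of the product of an n x m matrix A and an m x p matrix B
(AB)_{ik} = \sum_{j=1}^{m} a_{ij}\, b_{jk},
\qquad 1 \le i \le n,\quad 1 \le k \le p

% Worked 2 x 2 example (illustrative values)
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix}
1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\
3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8
\end{pmatrix}
=
\begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
```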
Algorithm for Map Function.
The input matrices are called M and N here; they play the roles of A and B above.
a. for each element mij of M do
  produce (key, value) pairs ((i,k), (M,j,mij)) for k = 1, 2, 3, ... up to the number of
  columns of N
b. for each element njk of N do
  produce (key, value) pairs ((i,k), (N,j,njk)) for i = 1, 2, 3, ... up to the number of
  rows of M
c. return the set of (key, value) pairs, in which each key (i,k) has a list of values
  (M,j,mij) and (N,j,njk) for all possible values of j.
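The map step can be sketched in plain Python, without Hadoop, to see which pairs it emits.
This is a minimal in-memory illustration, not the code inside MatrixMultiply.jar: the function
name map_matrices, the nested-list matrix representation, and the sample values are all
assumptions made for this sketch, and indices start at 0 rather than 1.

```python
from typing import Iterator, List, Tuple

Key = Tuple[int, int]            # (i, k): a cell of the result matrix
Value = Tuple[str, int, int]     # (matrix name, j, element value)


def map_matrices(M: List[List[int]], N: List[List[int]]) -> Iterator[Tuple[Key, Value]]:
    """Emit ((i, k), (name, j, value)) pairs for dense matrices M and N."""
    rows_m = len(M)       # number of rows of M
    cols_n = len(N[0])    # number of columns of N
    # a. each element m_ij is sent to every result cell in row i
    for i, row in enumerate(M):
        for j, m_ij in enumerate(row):
            for k in range(cols_n):
                yield (i, k), ("M", j, m_ij)
    # b. each element n_jk is sent to every result cell in column k
    for j, row in enumerate(N):
        for k, n_jk in enumerate(row):
            for i in range(rows_m):
                yield (i, k), ("N", j, n_jk)


# Hypothetical 2 x 2 sample data (not the contents of the files M and N)
M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
pairs = list(map_matrices(M, N))

# c. the values that share the key (0, 0)
cell_00 = [value for key, value in pairs if key == (0, 0)]
print(cell_00)
```

Each element is emitted once per result cell that needs it, so the key (0, 0) collects row 0 of
M and column 0 of N, which is exactly what the reducer needs to compute that cell.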
Algorithm for Reduce Function.
for each key (i,k) do
  sort the values that begin with M by j into listM
  sort the values that begin with N by j into listN
  multiply mij and njk for the jth value of each list
  sum up the products mij x njk
  return ((i,k), Σj mij x njk)
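The reduce step for a single key can be sketched the same way in plain Python. This is an
illustration under assumptions, not the code inside MatrixMultiply.jar: the function name
reduce_cell and the sample values are hypothetical, indices start at 0, and the matrices are
assumed to be dense, so both lists contain an entry for every j.

```python
from typing import List, Tuple

Key = Tuple[int, int]            # (i, k): a cell of the result matrix
Value = Tuple[str, int, int]     # (matrix name, j, element value)


def reduce_cell(key: Key, values: List[Value]) -> Tuple[Key, int]:
    """Compute one result cell from the values grouped under key (i, k)."""
    # Split the values by matrix name and sort each list by j
    list_m = sorted((j, v) for name, j, v in values if name == "M")
    list_n = sorted((j, v) for name, j, v in values if name == "N")
    # Multiply the j-th entries pairwise and sum the products
    total = sum(m_ij * n_jk for (_, m_ij), (_, n_jk) in zip(list_m, list_n))
    return key, total


# Values for key (0, 0), written by hand in arbitrary order, as a
# shuffle phase might deliver them (hypothetical sample data)
values = [("N", 1, 7), ("M", 0, 1), ("N", 0, 5), ("M", 1, 2)]
result = reduce_cell((0, 0), values)
print(result)  # ((0, 0), 19), since 1*5 + 2*7 = 19
```

Sorting by j matters because the framework does not guarantee the order in which values arrive
at the reducer. For sparse input, where some j may be missing from one list, matching entries
by j (for example with a dictionary) would be needed instead of zip.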
ls -R operation/
2.9 Upload the files M and N, which contain the matrix data to be multiplied, to HDFS.
Refer File ‘M’
Refer File ‘N’
2.10 Execute the jar file with the hadoop command. The job fetches its records from HDFS
and stores its output in HDFS.
hadoop jar MatrixMultiply.jar MatrixMultiply Matrix result