Bda Exp2 Chinmay
Program 1
THEORY: A conventional Word Count program reads, parses, and counts words sequentially on a
single machine, whereas Hadoop's MapReduce distributes these tasks across a cluster for
parallel processing, improving efficiency and scalability.
The core idea behind MapReduce is to divide a large dataset into smaller chunks and
distribute them across a cluster of commodity hardware. The processing of data is
divided into two main phases: the Map phase and the Reduce phase.
● Map Phase: In this phase, the input data is split into smaller parts, and a mapping
function is applied to each chunk independently. This function transforms the
data into a set of key-value pairs, where the key is often used for grouping related
data.
● Shuffling and Sorting: After the Map phase, the framework shuffles and sorts the
generated key-value pairs to group data with the same key together. This step is
critical for the efficiency of the Reduce phase.
● Reduce Phase: In this phase, another function, the reducing function, is applied to
each group of key-value pairs. This function aggregates, filters, or processes the
data to produce the final output, as traced on a toy input below.
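To make these phases concrete, here is a hand trace on a toy two-line input (the
sentences are invented for illustration):
Input lines: "deer bear river" and "car car river"
Map output: (deer,1) (bear,1) (river,1) and (car,1) (car,1) (river,1)
After shuffle and sort: (bear,[1]) (car,[1,1]) (deer,[1]) (river,[1,1])
Reduce output: (bear,1) (car,2) (deer,1) (river,2)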
Hadoop provides a distributed file system, the Hadoop Distributed File System
(HDFS), that stores data across multiple nodes in the cluster. The framework also
manages the distribution of Map and Reduce tasks, monitors task progress, and
reschedules failed tasks, ensuring fault tolerance.
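Input data is normally staged into HDFS before a job runs. A minimal sketch of doing
this with Hadoop's Java FileSystem API (the file name and HDFS path are assumptions
chosen for illustration):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);     // handle to the configured file system (HDFS)
        // Copy a local file into an HDFS directory (both paths are assumed examples)
        fs.copyFromLocalFile(new Path("input.txt"),
                             new Path("/user/hadoop/input/input.txt"));
        fs.close();
    }
}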
Key advantages of MapReduce using Hadoop include:
● Scalability: Hadoop can handle datasets of virtually any size by adding more
nodes to the cluster.
● Fault Tolerance: Hadoop automatically replicates data and tasks to ensure that if
a node fails, processing can continue on another node.
● Data Locality: Hadoop tries to process data on the same node where it resides,
reducing network overhead.
● Programming Flexibility: Developers can write Map and Reduce functions in
various programming languages, making it accessible to a wide range of users.
● Ecosystem: Hadoop has a rich ecosystem of tools and libraries for various data
processing tasks, including Hive, Pig, and Spark.
CODE:
import java.io.IOException;       // needed by the map/reduce method signatures
import java.util.StringTokenizer; // used to split each input line into words

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
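// NOTE: the original listing stops at the imports above. What follows is a
// minimal sketch of the rest of the program, following the standard Hadoop
// WordCount structure; class names such as TokenizerMapper and IntSumReducer
// are conventional choices, not requirements.
public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // key = word, value = 1
            }
        }
    }

    // Reducer: sums the 1s grouped under each word by the shuffle phase.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result); // key = word, value = total count
        }
    }

    // Driver: configures and submits the job.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Once compiled and packaged into a jar, the job would typically be launched with
hadoop jar wordcount.jar WordCount <input> <output>, where both arguments are HDFS paths.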
In conclusion, the Hadoop Word Count program demonstrates the power of MapReduce by
enabling distributed, parallel processing of vast datasets. This approach greatly improves
performance and scalability compared to traditional sequential methods, making MapReduce a
cornerstone of big data analytics.
REFERENCES:
● https://www.youtube.com/watch?v=6sK3LDY7Pp4
● https://www.projectpro.io/hadoop-tutorial/hadoop-mapreduce-wordcount-tutorial
● https://www.youtube.com/watch?v=WoZ2KSAfujQ