Running Jar Program

Steps to run a Hadoop Jar file using Eclipse

1. Download and open Eclipse (version: Mars) in Linux.
2. Create a Java project.
3. Create a Java class named WordCount as follows:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits a (word, 1) pair per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures and submits the job; args[0] is the HDFS input path,
    // args[1] is the HDFS output path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
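
To make the data flow concrete (this trace is illustrative, not captured program output), consider the two sample input lines used later in this document. TokenizerMapper emits one (word, 1) pair per token:

"Hello World Bye World"        ->  (Hello, 1) (World, 1) (Bye, 1) (World, 1)
"Hello Hadoop Goodbye Hadoop"  ->  (Hello, 1) (Hadoop, 1) (Goodbye, 1) (Hadoop, 1)

IntSumReducer (also used as the combiner) then sums the values for each key, producing the final counts: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2.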

4. Right-click the project, select Run As > Run Configurations, and create a new launch configuration under Java Application. Give the configuration a name, select the project and its main class, click Apply, and then Close to close the window.
5. Right-click the project and click Export. Select Runnable JAR file under Java. Select the launch configuration created above and the destination directory (e.g., dest/wordcount.jar) in which to store the jar file.
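
As a quick check (assuming the example destination above), the exported jar should now be present on the local file system:

$ ls dest/
wordcount.jar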
6. Make sure that the $jps command lists all the running daemons, as follows (a sample $jps output is shown after this list):

NodeManager
NameNode
DataNode
SecondaryNameNode
ResourceManager
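
A typical listing looks like this (the process IDs are illustrative and will differ on your machine):

$ jps
4368 NameNode
4542 DataNode
4763 SecondaryNameNode
4921 ResourceManager
5103 NodeManager
5420 Jps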

7. Create the input directory in HDFS as follows:

$hdfs dfs -mkdir -p /user/dbda/input
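
Optionally, confirm that the directory was created:

$ hdfs dfs -ls /user/dbda

/user/dbda/input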

8. Copy the local directory containing the input files into HDFS as follows:

$hdfs dfs -copyFromLocal <local file directory> /user/dbda/input
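
For example, if the input files sit in a local directory named ~/wordcount_input (a hypothetical path used here only for illustration), the command would be:

$ hdfs dfs -copyFromLocal ~/wordcount_input/* /user/dbda/input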

9. Run the Hadoop program using the jar file as follows:

$yarn jar <jar path>/wordcount.jar /user/dbda/input /user/dbda/output

The program starts running.

10. To see the results, list the contents of the output directory (see the listing below). It has two files, _SUCCESS and part-r-00000.
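
A listing of the output directory should show the two files:

$ hdfs dfs -ls /user/dbda/output

/user/dbda/output/_SUCCESS
/user/dbda/output/part-r-00000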
11. The results can be displayed by using the following command:

$hdfs dfs -cat /user/dbda/output/part-r-00000

The result will be shown in the command window.


12. The results can also be downloaded using a browser.
13. Open the browser at the URL localhost:50070. The NameNode information page will be displayed.
14. Go to the Utilities tab and select Browse the file system. Navigate to the input and output folders created in HDFS. The input data as well as the generated output can be downloaded.
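
Alternatively, the output can be copied from HDFS to the local file system on the command line (the local destination directory here is just an example):

$ hdfs dfs -get /user/dbda/output ./wordcount_output
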
Steps to run a Hadoop Jar file using Command Line

Compile WordCount.java and create a jar:

$ javac -cp $(hadoop classpath) WordCount.java

$ jar cf wc.jar WordCount*.class
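
Compilation produces three class files (the two inner classes are compiled into separate files), which is why the jar command above uses the WordCount*.class wildcard:

$ ls WordCount*.class

WordCount.class
WordCount$IntSumReducer.class
WordCount$TokenizerMapper.class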

Assuming that:

- /user/dbda/wordcount/input is the input directory in HDFS
- /user/dbda/wordcount/output is the output directory in HDFS

Sample text files as input:

$ hdfs dfs -ls /user/dbda/wordcount/input/

/user/dbda/wordcount/input/file01
/user/dbda/wordcount/input/file02

$ hdfs dfs -cat /user/dbda/wordcount/input/file01

Hello World Bye World

$ hdfs dfs -cat /user/dbda/wordcount/input/file02

Hello Hadoop Goodbye Hadoop

Run the application:

$ yarn jar wc.jar WordCount /user/dbda/wordcount/input /user/dbda/wordcount/output

Output:

$ hdfs dfs -cat /user/dbda/wordcount/output/part-r-00000

Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
