
Apache Hadoop –

A course for undergraduates

Homework Labs, Lecture 2


   

Copyright © 2010-2014 Cloudera, Inc. All rights reserved.
Not to be reproduced without prior written consent.
Lab: Running a MapReduce Job
Files and Directories Used in this Exercise

Source directory: ~/workspace/wordcount/src/stubs

Files:
WordCount.java: A simple MapReduce driver class.
WordMapper.java: A mapper class for the job.
SumReducer.java: A reducer class for the job.
wc.jar: The compiled, assembled WordCount program.

In this lab you will compile Java files, create a JAR, and run MapReduce jobs.

In addition to manipulating files in HDFS, the wrapper program hadoop is used to
launch MapReduce jobs. The code for a job is contained in a compiled JAR file.
Hadoop loads the JAR into HDFS and distributes it to the worker nodes, where the
individual tasks of the MapReduce job are executed.
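
The general form of the command, as used later in this lab, is:

$ hadoop jar <jar file> <main class> [args ...]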

One simple example of a MapReduce job is to count the number of occurrences of
each word in a file or set of files. In this lab you will compile and submit a
MapReduce job to count the number of occurrences of every word in the works of
Shakespeare.

Compiling and Submitting a MapReduce Job


1. In a terminal window, change to the lab source directory, and list the contents:

$ cd ~/workspace/wordcount/src
$ ls

List the files in the stubs package directory:

$ ls stubs

The package contains the following Java files:

WordCount.java: A simple MapReduce driver class.
WordMapper.java: A mapper class for the job.
SumReducer.java: A reducer class for the job.

Examine these files if you wish, but do not change them. Remain in this
directory while you execute the following commands.
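
For reference, here is a minimal sketch of what a mapper class such as
WordMapper.java typically contains; the actual stub code in this lab may differ.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch; the stub files in this lab may differ.
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split each input line on non-word characters and emit a
        // (word, 1) pair for every word found.
        for (String token : value.toString().split("\\W+")) {
            if (!token.isEmpty()) {
                word.set(token.toLowerCase());
                context.write(word, one);
            }
        }
    }
}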

2. Before compiling, examine the classpath Hadoop is configured to use:

$ hadoop classpath
 
This lists the locations where the Hadoop core API classes are installed.

3. Compile the three Java classes:

$ javac -classpath `hadoop classpath` stubs/*.java

Note: in the command above, the quotes around hadoop classpath are
backquotes. This runs the hadoop classpath command and uses its
output as part of the javac command.

The compiled (.class) files are placed in the stubs directory.
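
If you prefer, most shells also accept the equivalent $( ) command-substitution
syntax, which is easier to read:

$ javac -classpath $(hadoop classpath) stubs/*.java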

4. Collect your compiled Java files into a JAR file:

$ jar cvf wc.jar stubs/*.class  

5. Submit a MapReduce job to Hadoop using your JAR file to count the occurrences
of each word in Shakespeare:

$ hadoop jar wc.jar stubs.WordCount \
    shakespeare wordcounts

This hadoop jar command names the JAR file to use (wc.jar), the class
whose main method should be invoked (stubs.WordCount), and the HDFS
input and output directories to use for the MapReduce job.

Your job reads all the files in your HDFS shakespeare directory, and places its
output in a new HDFS directory called wordcounts.
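
For reference, a minimal sketch of a driver along these lines is shown below;
the actual WordCount.java stub may differ in its details.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch of a WordCount driver; the stub in this lab may differ.
public class WordCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(WordCount.class);
        job.setJobName("Word Count");

        // args[0] and args[1] are the HDFS input and output directories,
        // e.g. shakespeare and wordcounts.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}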

6. Try running this same command again without any change:

$ hadoop jar wc.jar stubs.WordCount \
    shakespeare wordcounts

Your job halts right away with an exception, because Hadoop refuses to run a
job that would write its output into an existing directory. This is by design;
since the result of a MapReduce job may be expensive to reproduce, Hadoop
prevents you from accidentally overwriting previously existing files.
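
To re-run a job with the same output directory name, first remove the existing
directory (as you will do in step 10 below), or choose a different name:

$ hadoop fs -rm -r wordcounts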

7. Review the result of your MapReduce job:

$ hadoop fs -ls wordcounts

This lists the output files for your job. (Your job ran with only one Reducer, so
there should be one file, named part-r-00000, along with a _SUCCESS file
and a _logs directory.)
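
Each Reducer writes one part-r-NNNNN file. For reference, here is a minimal
sketch of what a reducer such as SumReducer.java typically contains; the
actual stub code in this lab may differ.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sketch; the stub files in this lab may differ.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all of the counts emitted for this word and write the total.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}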

8. View the contents of the output for your job:

$ hadoop fs -cat wordcounts/part-r-00000 | less

You can page through a few screens to see words and their frequencies in the
works of Shakespeare. (The spacebar will scroll the output by one screen; the
letter 'q' will quit the less utility.) Note that you could have specified
wordcounts/* just as well in this command.

Wildcards in HDFS file paths

Take care when using wildcards (e.g. *) to specify HDFS filenames;
because of how the Linux shell works, it will attempt to expand the wildcard
before invoking hadoop, and may then pass incorrect references to local files
instead of HDFS files. You can prevent this by enclosing the wildcarded HDFS
filenames in single quotes, e.g. hadoop fs -cat 'wordcounts/*'

9. Try running the WordCount job against a single file:

$ hadoop jar wc.jar stubs.WordCount \
    shakespeare/poems pwords

When the job completes, inspect the contents of the pwords HDFS directory.

10. Clean up the output files produced by your job runs:

$ hadoop fs -rm -r wordcounts pwords
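
Note that if HDFS trash is enabled on your cluster, removed files are moved to
a .Trash directory rather than deleted immediately; you can bypass the trash
with the -skipTrash option:

$ hadoop fs -rm -r -skipTrash wordcounts pwords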

Stopping MapReduce Jobs


It is important to be able to stop jobs that are already running. This is useful if, for
example, you accidentally introduced an infinite loop into your Mapper. An
important point to remember is that pressing ^C to kill the current process (which
is displaying the MapReduce job's progress) does not actually stop the job itself.

A MapReduce job, once submitted to Hadoop, runs independently of the initiating
process, so losing the connection to the initiating process does not kill the job.
Instead, you need to tell the Hadoop JobTracker to stop the job.

1. Start another word count job as you did in the previous section:

$ hadoop jar wc.jar stubs.WordCount shakespeare \
    count2

2. While this job is running, open another terminal window and enter:

$ mapred job -list

This lists the job ids of all running jobs. A job id looks something like:
job_200902131742_0002

3. Copy the job id, and then kill the running job by entering:

$ mapred job -kill jobid

The JobTracker kills the job, and the program running in the original terminal
completes.
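
(On clusters running YARN rather than MapReduce v1 there is no JobTracker; in
that environment the equivalent approach is to kill the application, e.g. with
yarn application -kill <application id>.)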

This is the end of the lab.

