Hadoop Installation on Ubuntu (Single Node Cluster)
Step 1: Check Java and Hadoop Versions
Before installing Hadoop, it is important to check whether Java is installed, because
Hadoop is built on Java and requires it to run. The java -version command verifies the
Java installation and reports the installed version. Similarly, hadoop version checks whether
Hadoop is already installed, to avoid conflicts when setting up a new version. If Java is
missing, we must install it before proceeding with the Hadoop installation.
Before installing Hadoop, ensure that Java is installed.
java -version
hadoop version
Step 2: Update and Upgrade the System
Running sudo apt update and sudo apt upgrade -y ensures that all system packages are up
to date. This step prevents dependency issues while installing new software like Java and
Hadoop. Updating the package list ensures we get the latest versions, and upgrading
applies security patches and software improvements.
Updating ensures that all the installed packages are up to date.
sudo apt update
sudo apt upgrade -y
The -y flag automatically confirms updates.
Step 3: Install Java
Hadoop requires Java to execute its processes. OpenJDK 11 is a stable, widely used version
that works well with Hadoop 3.x. By installing it with sudo apt install openjdk-11-jdk -y, we
ensure that Hadoop has the necessary Java runtime environment. This step is crucial
because, without Java, Hadoop will not function.
Hadoop requires Java to run. Install OpenJDK 11 using:
sudo apt install openjdk-11-jdk -y
java -version
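The JAVA_HOME path used in Step 5 depends on where the package installs the JDK; on Ubuntu it is usually /usr/lib/jvm/java-11-openjdk-amd64. An optional way to confirm the actual location on your machine:
readlink -f "$(which java)"
update-alternatives --list java
The first command prints the full path to the java binary (drop the trailing /bin/java to get JAVA_HOME); the second lists all installed JDK alternatives.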
Step 4: Download and Extract Hadoop
Hadoop is downloaded from Apache’s official website using wget. The command fetches
the Hadoop package (hadoop-3.3.6.tar.gz), which is then extracted using tar -xvzf. This
unpacks Hadoop into a directory. Finally, the extracted folder is moved to
/usr/local/hadoop, a common location for system-wide software installations. This makes
Hadoop easily accessible to all users on the system.
Download Hadoop from the official Apache website.
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
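Optionally, the download can be verified before extraction. Apache publishes a .sha512 checksum file alongside the tarball (assuming the same download URL with a .sha512 suffix); compare its value with the locally computed hash:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum hadoop-3.3.6.tar.gz
cat hadoop-3.3.6.tar.gz.sha512
The two hashes should match; if they do not, re-download the archive.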
Extract the downloaded file:
tar -xvzf hadoop-3.3.6.tar.gz
Move Hadoop to the /usr/local directory for system-wide access:
sudo mv hadoop-3.3.6 /usr/local/hadoop
Step 5: Configure Environment Variables
After installation, we need to configure environment variables to make Hadoop and Java
easily executable from any terminal session. This is done by adding the Hadoop and Java
paths to ~/.bashrc. We define JAVA_HOME, HADOOP_HOME, PATH, and
HADOOP_CONF_DIR, ensuring that the system recognizes Hadoop commands without
requiring full paths.
Edit the ~/.bashrc file to set up Hadoop and Java paths.
nano ~/.bashrc
Add the following lines at the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save and exit (Ctrl + X, then Y, then Enter).
Once the environment variables are added, they need to be applied to the current session.
Running source ~/.bashrc reloads the bash profile so that the changes take effect
immediately, without restarting the terminal. This ensures that Hadoop-related commands
work as expected.
Apply the changes:
source ~/.bashrc
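A quick optional check that the new variables are active in the current shell:
echo $JAVA_HOME
echo $HADOOP_HOME
hadoop version
If hadoop version now runs without a full path, the PATH entries were picked up correctly.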
Step 6: Enable SSH for Hadoop
Hadoop requires SSH (Secure Shell) for communication between nodes in a distributed
environment. Even in a single-node setup, SSH is needed to start and stop Hadoop services
without manually logging in each time. This step is essential because Hadoop’s daemons
interact over SSH.
To enable password-less SSH login, we generate an SSH key pair using ssh-keygen -t rsa -P
"" -f ~/.ssh/id_rsa. The public key is then added to the authorized_keys file using cat
~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys. This setup allows Hadoop daemons to
communicate securely without repeatedly asking for passwords, which is crucial for
automation.
Hadoop requires passwordless SSH access.
ssh localhost
If SSH is not installed, install it using:
sudo apt install ssh -y
Generate SSH keys and configure passwordless SSH:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Now, verify SSH:
ssh localhost
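If ssh localhost still prompts for a password or fails to connect, an optional check (assuming a systemd-based Ubuntu) is to confirm that the SSH service is enabled and running:
sudo systemctl enable --now ssh
sudo systemctl status ssh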
Step 7: Configure Hadoop Files
Core-Site Configuration:
The core-site.xml file specifies Hadoop’s core settings. The fs.defaultFS property is set to
hdfs://localhost:9000, defining the default Hadoop filesystem as HDFS. The
hadoop.tmp.dir property sets a temporary directory for Hadoop’s intermediate
operations. This configuration is necessary to initialize and manage HDFS correctly.
Edit the core-site.xml file:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following content:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base directory for HDFS and other temporary files.</description>
  </property>
</configuration>
Save and exit.
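Because hadoop.tmp.dir points inside /usr/local/hadoop, which was moved there with sudo and is therefore owned by root, it helps to create the directory now and hand ownership of the Hadoop tree to the user that will run the daemons (a minimal sketch, assuming you run Hadoop as your current non-root user):
sudo mkdir -p /usr/local/hadoop/tmp
sudo chown -R $USER:$USER /usr/local/hadoop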
HDFS-Site Configuration:
The hdfs-site.xml file configures the Hadoop Distributed File System (HDFS). The
dfs.replication property is set to 1, meaning each file block is stored only once, which is
ideal for a single-node setup. The dfs.namenode.name.dir and dfs.datanode.data.dir properties
specify directories for storing NameNode metadata and actual file data, ensuring proper data
organization.
Edit the hdfs-site.xml file:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following content:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Number of replicas for HDFS blocks (set to 1 for single-node cluster).</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/namenode</value>
    <description>Directory for NameNode metadata.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/datanode</value>
    <description>Directory for DataNode storage.</description>
  </property>
</configuration>
Save and exit
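The NameNode and DataNode directories referenced above do not exist yet; they can be created in advance (assuming ownership of /usr/local/hadoop was already transferred to your user in the previous step):
mkdir -p /usr/local/hadoop/hdfs/namenode /usr/local/hadoop/hdfs/datanode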
MapReduce Configuration:
This file configures the MapReduce framework in Hadoop. The
mapreduce.framework.name property is set to yarn, meaning Hadoop will use YARN to
manage computational resources. The mapreduce.jobhistory.address property is set to
localhost:10020, enabling the job history server to track completed MapReduce jobs. This
configuration is essential for executing and monitoring Hadoop jobs.
Edit the mapred-site.xml file:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following content:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
</configuration>
Save and exit
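The job history server configured above is not started by the start-dfs.sh or start-yarn.sh scripts used in Step 9. Once the cluster is running, it can be started separately with the mapred command shipped with Hadoop 3.x:
mapred --daemon start historyserver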
YARN Configuration:
The yarn-site.xml file sets up YARN, the resource management layer of Hadoop. The
yarn.resourcemanager.hostname property is set to localhost, defining where the
ResourceManager will run. The yarn.nodemanager.aux-services property is set to mapreduce_shuffle,
enabling data shuffling for MapReduce jobs. These settings ensure that YARN efficiently
schedules and executes tasks.
Edit the yarn-site.xml file:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following content:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Save and exit
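As an optional sanity check that Hadoop is reading the edited files from HADOOP_CONF_DIR, individual keys can be queried once the environment variables from Step 5 are loaded:
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication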
Step 8: Format the NameNode
Before starting Hadoop for the first time, the NameNode must be formatted using hdfs
namenode -format. This command initializes the HDFS metadata and clears any previous
data. Without formatting, the system might face inconsistencies, preventing Hadoop from
functioning correctly. This step is only required for the first setup.
Before starting Hadoop, format the HDFS Namenode:
hdfs namenode -format
Step 9: Start Hadoop Services
To launch Hadoop, we run start-dfs.sh to start HDFS services (NameNode and DataNode)
and start-yarn.sh to start YARN (ResourceManager and NodeManager). These scripts
initialize the distributed storage and resource management layers of Hadoop. Running
them ensures that the cluster is up and ready for processing tasks.
Start the HDFS services, followed by YARN:
start-dfs.sh
start-yarn.sh
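To shut the cluster down later, the matching stop scripts from the same sbin directory reverse these commands:
stop-yarn.sh
stop-dfs.sh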
Step 10: Verify Running Services
After setting up Hadoop, we use the jps command to list all running Java processes. This
helps verify if essential Hadoop daemons like NameNode, DataNode, ResourceManager,
and NodeManager are running properly. If any service is missing, troubleshooting is
needed before proceeding.
Check if Hadoop processes are running:
jps
Expected output (each entry is preceded by a process ID, and a Jps entry also appears):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
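As an optional end-to-end test once all daemons are listed, a small HDFS round trip confirms that the NameNode and DataNode are working together (the file and directory names here are arbitrary examples):
hdfs dfs -mkdir -p /user/$USER
echo "hello hadoop" > test.txt
hdfs dfs -put test.txt /user/$USER/
hdfs dfs -ls /user/$USER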
Step 11: Hadoop Web Interfaces
You can access the following Hadoop web UIs:
Service             | URL                    | Description
NameNode UI         | http://localhost:9870/ | Shows HDFS file system status.
ResourceManager UI  | http://localhost:8088/ | Monitors running applications in YARN.
DataNode UI         | http://localhost:9864/ | Displays DataNode status.
NodeManager UI      | http://localhost:8042/ | Shows NodeManager details.
Hadoop provides web interfaces for real-time monitoring:
NameNode UI (http://localhost:9870/): Shows HDFS status, including storage
capacity and active nodes.
ResourceManager UI (http://localhost:8088/): Displays running and completed
YARN applications.
DataNode UI (http://localhost:9864/): Monitors individual DataNode health.
NodeManager UI (http://localhost:8042/): Shows the status of compute nodes.
These web UIs are useful for troubleshooting and observing cluster activity.
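As a final optional check, the example job bundled with Hadoop can be submitted to YARN and then observed in the ResourceManager UI (assuming the examples jar version matches the installed 3.3.6 release; depending on the environment, additional MapReduce classpath settings may be required for the job to complete):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 5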