Installing a Standalone and Pseudo-Distributed Hadoop Cluster
1. Setting up the VMware virtual machine
a. Download VMware Player from the following URL:
https://my.vmware.com/en/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/12_0
b. Download the CentOS 6.4 image file CentOS-6.4-x86_64-bin-DVD1.iso from the following URL:
http://vault.centos.org/6.4/isos/x86_64/
c. Start VMware Player and create a new virtual machine.
d. Browse to the ISO file location and select it.
e. Name the virtual machine and create a user with a password.
f. Select the location to store the virtual disk of the VM; this can be any location on the Windows file system.
g. Select the disk size and type as shown in the screenshot below.
h. Customize the hardware settings: set the RAM to 4 GB (adjust per your machine configuration) and the network adapter to Custom: VMnet8 (NAT), then click Finish.
i. Press Enter to start the installation.
2. Setting up passwordless SSH:
a. Set the hostname of each node in the cluster by editing the network file:
vi /etc/sysconfig/network
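On CentOS 6 this file holds the hostname; for the single-node setup used here the sketch below assumes the hostname master (an assumption carried through the rest of this guide):
NETWORKING=yes
HOSTNAME=master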
b. Find out the IP address of the VM as shown below.
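For example (eth0 is the usual interface name on CentOS 6, but yours may differ):
ifconfig eth0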
c. Update the /etc/hosts file on each machine in the cluster with the IP/hostname mapping, as in the example below.
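For example, assuming a hypothetical NAT address of 192.168.159.130 (substitute the address found in the previous step):
192.168.159.130 master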
d. Restart the network service and verify that the hostname is set correctly:
service network restart
hostname
e. Run ssh-keygen and press Enter at all the prompts.
f. Navigate to the ~/.ssh folder and check the RSA key pair (id_rsa and id_rsa.pub).
g. Copy the public key to the authorized_keys file:
cat id_rsa.pub >> authorized_keys
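SSH requires strict permissions on this file; tighten them and then confirm that you can log in without a password prompt:
chmod 600 ~/.ssh/authorized_keys
ssh master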
h. Disable the firewall:
/etc/init.d/iptables stop
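To keep the firewall disabled across reboots as well (acceptable for this sandbox VM, not for production):
chkconfig iptables off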
3. Installing Java:
a. Download the Java 8 JDK RPM from the following URL:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
b. Using WinSCP, move the .rpm file to the /usr/local/src folder on the VM.
c. Install Java using the following yum command:
yum localinstall -y jdk-8u231-linux-x64.rpm
d. Check the installation by issuing the following command
java -version
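The output should report the installed version, roughly like the following (build details will vary):
java version "1.8.0_231"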
4. Setting up Hadoop:
a. Download Hadoop 2.7.1 into the /usr/local/src folder from the URL below.
https://archive.apache.org/dist/hadoop/core/hadoop-2.7.1
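If the VM has internet access, the tarball can also be fetched directly, for example:
cd /usr/local/src
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz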
b. Create a folder called /apache:
mkdir /apache
c. Extract the tarball into this folder by running the following command:
tar -xvzf /usr/local/src/hadoop-2.7.1.tar.gz -C /apache
d. Create a soft link in the /apache folder as follows; this allows switching between Hadoop versions without changing the environment settings:
ln -s /apache/hadoop-2.7.1 /apache/hadoop
e. Set the Linux environment variables in the ~/.bashrc file as follows (JAVA_HOME must match the JDK version actually installed in step 3):
export HADOOP_HOME=/apache/hadoop
export HADOOP_CONF_DIR=/apache/hadoop/etc/hadoop
export JAVA_HOME=/usr/java/jdk1.8.0_231-amd64
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
f. Source the .bashrc file:
source ~/.bashrc
g. The standalone Hadoop installation is now complete; verify it with the command below. (In standalone mode the default filesystem is the local one, so this lists the local root directory.)
hdfs dfs -ls /
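You can also confirm that the binaries and JAVA_HOME resolve correctly:
hadoop version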
h. Update the configuration files in $HADOOP_CONF_DIR to switch to a pseudo-distributed setup:
1. Update core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
2. Update the hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/grid/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/grid/hadoop/hdfs/dn</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3. Create the mapred-site.xml file from the mapred-site.xml.template file:
cp mapred-site.xml.template mapred-site.xml
4. Update the mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. Update the yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/grid/hadoop/yarn/tmp/</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>master:8042</value>
</property>
</configuration>
6. Update the masters file with ‘master’
7. Update the slaves file with ‘master’ (see the commands after this list)
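Both files live in $HADOOP_CONF_DIR; for this single-node setup they can be written in one step:
echo master > $HADOOP_CONF_DIR/masters
echo master > $HADOOP_CONF_DIR/slaves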
i. Create the required folders
mkdir -p /grid/hadoop/hdfs/nn
mkdir -p /grid/hadoop/hdfs/dn
mkdir -p /grid/hadoop/yarn/tmp
j. Format the NameNode
hdfs namenode -format
k. Check the NameNode metadata directory as shown below.
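After a successful format, the directory should contain a current/ subdirectory with fsimage and VERSION files:
ls /grid/hadoop/hdfs/nn/current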
l. Start the Hadoop daemons
$HADOOP_HOME/sbin/start-all.sh
m. Check the Hadoop daemons with the jps command; expected output is shown below.
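On a single node running all daemons, jps should list something like the following (process IDs omitted):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps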
n. Check the NameNode web UI on port 50070 (http://master:50070)
o. Check the ResourceManager web UI on port 8088 (http://master:8088)
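As a final smoke test, run one of the bundled example jobs (the jar path below matches the 2.7.1 tarball layout):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 5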