BDA Lab
Open the Ubuntu terminal and enter the following commands to install and configure Hadoop and to work with files in HDFS.
1. Install Java JDK 8
sudo apt install openjdk-8-jdk -y
Add the following environment variables to ~/.bashrc (then run source ~/.bashrc to apply them):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin
export HADOOP_HOME=~/hadoop-3.2.4/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.4.jar
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh
Edit $HADOOP_HOME/etc/hadoop/core-site.xml and place the following inside it:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.dataflair.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.dataflair.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.server.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.server.groups</name>
    <value>*</value>
  </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml; only the classpath value appears in the original, the rest is the standard pseudo-distributed configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
14. To start Hadoop
start-all.sh
Then open the NameNode web UI at http://localhost:9870
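To confirm the daemons came up, run jps; it should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
jps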
15. To stop
stop-all.sh
2. Implement the following file management tasks in Hadoop: adding files and directories, retrieving files, and deleting files.
1. Create a Directory
hdfs dfs -mkdir -p tdata
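The remaining tasks from the exercise statement appear to be cut off; the following commands complete them (the file names are illustrative):
2. Add a file to the directory
hdfs dfs -put /home/veeranna/sample.txt tdata/
3. Retrieve a file from HDFS to the local file system
hdfs dfs -get tdata/sample.txt /home/veeranna/
4. Delete the file and then the directory
hdfs dfs -rm tdata/sample.txt
hdfs dfs -rm -r tdata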
3. Write a basic WordCount MapReduce program to understand the MapReduce paradigm.
➔ WordCountMapper.java
package org.myorg.Demo;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.io.LongWritable;
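The body of the mapper is cut off above; a minimal sketch of the standard word-count mapper, consistent with the imports shown (variable names are illustrative):
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every token in the input line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}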
➔ WordCountReducer.java
package org.myorg.Demo;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
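The reducer body is likewise cut off; a minimal sketch that sums the counts for each word:
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sum the 1s emitted by the mapper for each word.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}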
➔ WordCount.java
package org.myorg.Demo;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
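The driver body is also cut off; a minimal sketch that wires the mapper and reducer together (the job name is illustrative):
public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] = HDFS input path, args[1] = HDFS output path (must not already exist)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}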
To run the project from the command-line interface, follow these steps:
1. Start Hadoop
start-all.sh
2. Create a Directory
hdfs dfs -mkdir -p test
3. Insert the input file into the directory
hdfs dfs -put /home/veeranna/input.txt test/
input.txt
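The remaining steps (compile, package, run) appear to be cut off; a typical continuation, assuming the three source files are in the current directory (the jar and output names are illustrative):
mkdir classes
javac -classpath $(hadoop classpath) -d classes WordCountMapper.java WordCountReducer.java WordCount.java
jar -cvf wordcount.jar -C classes/ .
hadoop jar wordcount.jar org.myorg.Demo.WordCount test/input.txt output
hdfs dfs -cat output/part-r-00000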
4. Write a MapReduce program to find the maximum temperature for each year in a weather dataset.
5. Create a class that performs the map job
➔ MaxTemperatureMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text k = new Text();

    // Each input line holds a year followed by a temperature reading;
    // emit a (year, temperature) pair for every reading.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            String year = tokenizer.nextToken();
            k.set(year);
            String temp = tokenizer.nextToken().trim();
            int v = Integer.parseInt(temp);
            context.write(k, new IntWritable(v));
        }
    }
}
6. Create another class that performs the reduce job
➔ MaxTemperatureReducer.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
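The reducer body is cut off here as well; a minimal sketch that keeps the maximum temperature per year:
public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Keep the largest temperature seen for each year key.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable val : values) {
            max = Math.max(max, val.get());
        }
        context.write(key, new IntWritable(max));
    }
}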
➔ MaxTemperature.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        // Class and job wiring reconstructed around the surviving job-setup lines.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "max temperature");
        job.setJarByClass(MaxTemperature.class);
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
To run the project from the command-line interface, follow these steps:
1. Start Hadoop
start-all.sh
2. Create a Directory
hdfs dfs -mkdir -p test
3. Insert the input file into the directory
hdfs dfs -put /home/veeranna/Temperature.txt test/
Temperature.txt
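As with WordCount, the final steps appear to be cut off; a typical continuation (the jar and output names are illustrative; these classes declare no package, so the bare class name is used):
mkdir classes
javac -classpath $(hadoop classpath) -d classes MaxTemperatureMapper.java MaxTemperatureReducer.java MaxTemperature.java
jar -cvf maxtemp.jar -C classes/ .
hadoop jar maxtemp.jar MaxTemperature test/Temperature.txt out1
hdfs dfs -cat out1/part-r-00000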
5. Install Apache Pig and run Pig Latin commands.
Add the following environment variables to ~/.bashrc:
#java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
#pig
export PIG_HOME=$HOME/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
5. Start Pig (this opens the Grunt shell)
→ pig
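By default pig starts in MapReduce mode and reads from HDFS, so copy passwd into the HDFS home directory first (use pig -x local to read local files instead):
hdfs dfs -put /etc/passwd passwd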
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
OUTPUT
(John,18,4.0F)
(Mary,19,3.7F)
(Bill,20,3.9F)
(Joe,22,3.8F)
(Jill,20,4.0F)
OUTPUT
(John,18,4.0F)
(Joe,22,3.8F)
(Jill,20,4.0F)
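The two OUTPUT blocks above come from a different example than the passwd dump: a student relation whose commands appear to have been cut. A sketch that would reproduce them, assuming a comma-separated file student.txt with name, age, and gpa fields (the file name and the name-starts-with-J filter are inferred, not from the source):
grunt> A = load 'student.txt' using PigStorage(',') as (name:chararray, age:int, gpa:float);
grunt> dump A;
grunt> B = filter A by name matches 'J.*';
grunt> dump B;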