ASSIGNMENT

Hands-on with HDFS Task:

Install Hadoop in pseudo-distributed mode or use an online simulator.
Upload and retrieve a sample file using HDFS commands.

Deliverable: Screenshots of steps + command list.
Evaluation Criteria: Execution, clarity of explanation.

Steps to Install Hadoop in Pseudo-Distributed Mode (Conceptual with Command Examples):

1. Install Java: Hadoop requires Java. Let's assume you have it installed.
You can check with:
```bash
java -version
```
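If Java isn't installed, here is a minimal sketch for Debian/Ubuntu systems (assuming the `openjdk-11-jdk` package; Hadoop 3.x runs on Java 8 or 11):
```bash
# Install OpenJDK 11 (package names differ on other distributions)
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk
java -version   # confirm the install
```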
2. Download and Extract Hadoop:
Let's say you download the Hadoop binary (e.g., `hadoop-3.3.6.tar.gz`) to your home directory.
```bash
tar -xzvf hadoop-3.3.6.tar.gz
cd hadoop-3.3.6
```

3. Set Environment Variables: You'll need to configure your `~/.bashrc`
or `~/.zshrc` file. Add the following lines (adjust the path if your Hadoop
directory is different):
```bash
export HADOOP_HOME=/home/$USER/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
Then, apply the changes:
```bash
source ~/.bashrc
```
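To confirm the variables took effect, check that the Hadoop binaries are now on your PATH:
```bash
# Prints the Hadoop version banner if HADOOP_HOME/bin is on PATH
hadoop version
```
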
4. Configure Hadoop Configuration Files: Navigate to the `etc/hadoop`
directory within your Hadoop installation. You'll need to edit a few key
files:
`hadoop-env.sh`: Set the `JAVA_HOME` variable.
```bash
nano etc/hadoop/hadoop-env.sh
```
Add or uncomment the line similar to:
```bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```
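If you're unsure of the Java path on your machine, one way to find it on Linux (assuming `java` is on your PATH):
```bash
# Resolve the real java binary; JAVA_HOME is the directory above its bin/
readlink -f "$(which java)"
# e.g. /usr/lib/jvm/java-11-openjdk-amd64/bin/java
#      -> JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```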
`core-site.xml`: Configure the HDFS default name node.
```bash
nano etc/hadoop/core-site.xml
```
Add the following within the `<configuration>` tags:
```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
```
`hdfs-site.xml`: Configure replication and the NameNode/DataNode storage directories (the `/tmp` paths below are fine for a demo, but note that `/tmp` is typically cleared on reboot).
```bash
nano etc/hadoop/hdfs-site.xml
```
Add the following within the `<configuration>` tags:
```xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/tmp/hadoop-data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/tmp/hadoop-name</value>
</property>
```
`mapred-site.xml`: Configure the MapReduce execution framework. In Hadoop 3.x this file ships directly; on older 2.x releases you need to copy the template first:
```bash
# The cp is only needed on Hadoop 2.x, where just the template is shipped:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
nano etc/hadoop/mapred-site.xml
```
Add the following within the `<configuration>` tags:
```xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
```
`yarn-site.xml`: Configure YARN's shuffle service and ResourceManager host. Edit `etc/hadoop/yarn-site.xml` the same way, and add the following within the `<configuration>` tags:
```xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
```
5. Format the NameNode: this initializes the HDFS file system (do this only once; reformatting wipes HDFS metadata).
```bash
hdfs namenode -format
```
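Note: `start-dfs.sh` launches the daemons over SSH, so passwordless SSH to localhost is typically required first. A minimal setup sketch (the standard commands from the Hadoop single-node guide; adjust key type and paths as needed):
```bash
# Generate a passphrase-less key and authorize it for localhost logins
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost true   # should succeed without a password prompt
```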
6. Start Hadoop Services:
```bash
start-dfs.sh
start-yarn.sh
```
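To confirm the daemons came up, `jps` (shipped with the JDK) should list the Hadoop processes:
```bash
jps
# Expect entries like: NameNode, DataNode, SecondaryNameNode,
# ResourceManager, NodeManager
```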
7. Access Hadoop Web UIs (optional but useful for monitoring):
NameNode: `http://localhost:9870` (or `http://localhost:50070` on older 2.x versions)
ResourceManager: `http://localhost:8088`

8. Create a sample local file to upload:
```bash
echo "This is a sample file for Hadoop HDFS." > sample.txt
```

Now, let's use HDFS commands:


1. Create a directory in HDFS (optional but good practice):
```bash
hdfs dfs -mkdir -p /user/$USER/input   # -p creates /user/$USER if it doesn't exist yet
```
2. Upload the local file to HDFS:
```bash
hdfs dfs -put sample.txt /user/$USER/input/
```
3. List the files in the HDFS directory:
```bash
hdfs dfs -ls /user/$USER/input/
```
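You can also inspect the file directly in HDFS without downloading it, which makes a good screenshot for the deliverable:
```bash
# Print the file's contents straight from HDFS
hdfs dfs -cat /user/$USER/input/sample.txt
```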
4. Retrieve the file from HDFS to your local filesystem:
```bash
hdfs dfs -get /user/$USER/input/sample.txt retrieved_sample.txt
```
5. Verify the contents of the retrieved file:
```bash
cat retrieved_sample.txt
```
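As an extra check, comparing the original and retrieved files should show no differences:
```bash
# No output means the round trip through HDFS preserved the file
diff sample.txt retrieved_sample.txt
```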
6. Stop the Hadoop services when you're done:
```bash
stop-yarn.sh
stop-dfs.sh
```
