Experiment 1
Hadoop
Apache Hadoop is an open-source framework intended to make working with big
data easier. For those who are not acquainted with this technology, the first
question that arises is: what is big data? Big data is a term for data sets that
cannot be processed efficiently with traditional tools such as an RDBMS.
Hadoop has made its place in industries and companies that need to work on
large, sensitive data sets that require efficient handling. Hadoop is a framework
that enables processing of large data sets distributed across clusters of machines.
Being a framework, Hadoop is made up of several modules that are supported by a
large ecosystem of technologies.
1. Hadoop Distributed File System (HDFS): This is the storage component of Hadoop,
designed to store large files across multiple machines in a distributed manner.
The Hadoop ecosystem is continuously evolving, with new tools and technologies
being developed to address different aspects of big data processing, storage,
and analysis. Its flexibility and scalability make it a popular choice for organizations
dealing with large volumes of data.
Experiment 2
Basic HDFS Commands
a. ls: This command is used to list all the files in a directory. Use lsr for a recursive
listing; it is useful when we want the hierarchy of a folder.
Syntax:
bin/hdfs dfs -ls <path>
Example:
bin/hdfs dfs -ls /
It will print all the directories present in HDFS. The bin directory contains executables,
so bin/hdfs means we want the hdfs executable, specifically its dfs (Distributed File
System) commands.
b. copyFromLocal (or) put: Copies files/folders from the local file system to the HDFS store.
This is one of the most important commands. Local file system here means the files present on the OS.
Syntax:
bin/hdfs dfs -copyFromLocal <local file path> <dest(present on hdfs)>
Example: Let’s suppose we have a file AI.txt on Desktop which we want to copy to
folder geeks present on hdfs.
bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks
(OR)
bin/hdfs dfs -put ../Desktop/AI.txt /geeks
c. copyToLocal (or) get: Copies files/folders from the HDFS store to the local file system.
Syntax:
bin/hdfs dfs -copyToLocal <srcfile(on hdfs)> <local file dest>
Example:
bin/hdfs dfs -copyToLocal /geeks ../Desktop/hero
(OR)
bin/hdfs dfs -get /geeks ../Desktop/hero
d. cp: This command is used to copy files within HDFS. Let's copy the folder geeks to
geeks_copied.
Syntax:
bin/hdfs dfs -cp <src(on hdfs)> <dest(on hdfs)>
Example:
bin/hdfs dfs -cp /geeks /geeks_copied
e. mv: This command is used to move files within HDFS. Let's cut-paste the file myfile.txt
from the geeks folder to geeks_copied.
Syntax:
bin/hdfs dfs -mv <src(on hdfs)> <dest(on hdfs)>
Example:
bin/hdfs dfs -mv /geeks/myfile.txt /geeks_copied
f. rmr: This command deletes a file or directory from HDFS recursively. It is a very useful
command when you want to delete a non-empty directory.
Syntax:
bin/hdfs dfs -rmr <filename/directoryName>
Example:
bin/hdfs dfs -rmr /geeks_copied
This will delete all the content inside the directory and then the directory itself.
Note: There are more commands in HDFS but we discussed the commands which are
commonly used when working with Hadoop. You can check out the list of dfs commands
using the following command: bin/hdfs dfs
Experiment 3
Hadoop filesystem navigation and manipulation using commands
To use HDFS commands, start the Hadoop services with the following command, and then
verify that the daemons are running with jps:
sbin/start-all.sh
jps
The sections below cover several basic HDFS commands; a list of further file system
commands can be displayed by running a command with the -help option.
mkdir:
To create a directory in HDFS, similar to the Unix mkdir command.
Options:
-p : Do not fail if the directory already exists
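A quick example (the path /user/hadoop/demo is only illustrative; use a directory appropriate for your cluster):
$ hadoop fs -mkdir -p /user/hadoop/demo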
ls:
List directories present under a specific directory in HDFS, similar to the Unix ls command.
The -lsr command can be used for a recursive listing of directories and files.
Options:
-d : List the directories as plain files
-h : Format the sizes of files to a human-readable manner instead of number of bytes
-R : Recursively list the contents of directories
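For instance, to list everything under the (illustrative) /user/hadoop directory recursively with human-readable sizes:
$ hadoop fs -ls -h -R /user/hadoop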
copyFromLocal:
Copy files from the local file system to HDFS, similar to the -put command. This command will
fail if the file already exists at the destination. To overwrite the destination if the file already
exists, add the -f flag to the command.
Options:
-f : Overwrite the destination if it already exists
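For example, assuming a local file data.txt and the illustrative HDFS directory /user/hadoop/demo from above:
$ hadoop fs -copyFromLocal -f data.txt /user/hadoop/demo/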
copyToLocal:
Copy files from HDFS to the local file system, similar to the -get command.
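For example (paths are illustrative):
$ hadoop fs -copyToLocal /user/hadoop/demo/data.txt ./data_local.txt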
cat:
Display contents of a file, similar to Unix cat command.
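For example, displaying the illustrative file copied earlier:
$ hadoop fs -cat /user/hadoop/demo/data.txt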
cp:
Copy files from one directory to another within HDFS, similar to Unix cp command.
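For example, copying the illustrative file into a backup directory:
$ hadoop fs -cp /user/hadoop/demo/data.txt /user/hadoop/backup/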
mv:
Move files from one directory to another within HDFS, similar to Unix mv command.
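For example (the archive directory is only illustrative):
$ hadoop fs -mv /user/hadoop/demo/data.txt /user/hadoop/archive/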
rm:
Remove a file from HDFS, similar to the Unix rm command. This command does not delete
directories. For a recursive delete, use -rm -r.
Options:
-r : Recursively remove directories and files
-skipTrash : Bypass the trash and immediately delete the file
-f : Do not report an error if the file does not exist
-R : Same as -r; recursively delete directories
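For example, removing the illustrative backup directory recursively while bypassing the trash:
$ hadoop fs -rm -r -skipTrash /user/hadoop/backup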
getmerge:
Merge a list of files in one directory on HDFS into a single file on the local file system. This is
one of the most useful commands when trying to read the output files of a MapReduce or
Pig job.
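For example, assuming a job wrote its output to the illustrative directory /user/hadoop/output:
$ hadoop fs -getmerge /user/hadoop/output ./merged_output.txt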
setrep:
Change the replication factor of a file to a specific value instead of the default replication
factor used for the rest of HDFS. If the path is a directory, the command recursively changes
the replication factor of all files in the directory tree rooted at the given path.
Options:
-w : Request that the command wait for the replication to be completed (potentially takes a long
time)
-r : Accepted for backwards compatibility; has no effect
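For example, setting the replication factor of the illustrative file to 2 and waiting for replication to finish:
$ hadoop fs -setrep -w 2 /user/hadoop/demo/data.txt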
touchz:
Creates an empty file in HDFS.
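For example:
$ hadoop fs -touchz /user/hadoop/demo/empty.txt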
test:
Test whether an HDFS path exists, whether it is a directory, or whether it is an empty file.
Options:
-e : Return 0 if the path exists
-d : Return 0 if the path is a directory
-f : Return 0 if the path is a regular file
-z : Return 0 if the file is zero length
-s : Return 0 if the path is not empty
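For example, checking whether the illustrative file exists (the command reports its result through the exit code):
$ hadoop fs -test -e /user/hadoop/demo/data.txt
$ echo $?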
appendToFile:
Appends the contents of all given local files to the provided destination file on HDFS. The
destination file will be created if it doesn’t already exist.
$ hadoop fs -appendToFile <localsrc> ... <dst>
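For example, appending two local files to an HDFS file (all file names are illustrative):
$ hadoop fs -appendToFile local1.txt local2.txt /user/hadoop/demo/data.txt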
chmod:
Change the permissions of a file, similar to the Linux shell's chmod command but with a few exceptions.
<MODE> : Same as the mode used for the shell's command; the only letters recognized are
'rwxXt'.
<OCTALMODE> : Mode specified in 3 or 4 digits. Unlike the shell command, it is not possible
to specify only part of the mode.
Options:
-R : Modify the files recursively
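For example, recursively granting rwx to the owner and r-x to everyone else on the illustrative directory:
$ hadoop fs -chmod -R 755 /user/hadoop/demo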
chown:
Change the owner and group of a file, similar to the Linux shell's chown command but with a
few exceptions.
Options:
-R : Modify the files recursively
$ hadoop fs -chown [-R] [OWNER][:[GROUP]] PATH
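For example, assuming a user and group named hadoop exist on the cluster:
$ hadoop fs -chown -R hadoop:hadoop /user/hadoop/demo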
df:
Show the capacity (free and used space) of the file system. If the file system has multiple
partitions and no path is specified, the status of the root partitions is provided.
Options:
-h : Format the sizes of files to a human-readable manner instead of number of bytes
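For example:
$ hadoop fs -df -h /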
du:
Show size of each file in the directory.
Options:
-s : Show total summary size
-h : Format the sizes of files to a human-readable manner instead of number of bytes
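For example, summarizing the total size of the illustrative home directory:
$ hadoop fs -du -s -h /user/hadoop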
tail:
Show the last 1KB of the file.
Options:
-f : Show appended data as the file grows
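For example:
$ hadoop fs -tail /user/hadoop/demo/data.txt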
In the Hadoop Distributed File System (HDFS), you can perform various file management tasks
using command-line interfaces or programming languages like Java. Let's cover how you
can achieve these tasks:
A) Adding Files and Directories:
1. Add File:
Use the hdfs dfs -put command to copy a file from your local file system into HDFS:
bash
hdfs dfs -put localfile.txt /user/hadoop/destination_directory/
Replace localfile.txt with the local file path and /user/hadoop/destination_directory/ with the
HDFS directory where you want to place the file.
2. Create Directory:
Use the hdfs dfs -mkdir command to create a new directory in HDFS:
bash
hdfs dfs -mkdir /user/hadoop/new_directory
This will create a new directory named new_directory under the /user/hadoop/ directory.
B) Retrieving Files:
Use the hdfs dfs -get command to retrieve files from HDFS to your local file system:
bash
hdfs dfs -get /user/hadoop/source_directory/file.txt localfile.txt
Replace /user/hadoop/source_directory/file.txt with the HDFS file path and localfile.txt with
the destination path in your local file system.
C) Deleting Files:
For deleting files or directories in HDFS, you can use the hdfs dfs -rm command for files or
hdfs dfs -rm -r for directories:
1. Delete File:
bash
hdfs dfs -rm /user/hadoop/file_to_delete.txt
Replace /user/hadoop/file_to_delete.txt with the path of the file you want to delete.
2. Delete Directory:
bash
hdfs dfs -rm -r /user/hadoop/directory_to_delete
Replace /user/hadoop/directory_to_delete with the directory path you want to delete along
with its contents.
Make sure to exercise caution while performing delete operations, especially for directories,
as the -r flag removes them recursively.
These commands can be executed in the terminal or command prompt when connected to a
machine with Hadoop installed and configured, and the appropriate permissions are granted
for file manipulation in HDFS. Adjust paths and filenames as per your specific HDFS
directory structure and file names.
Experiment 6
Process different datasets using pig.
Pig is a powerful tool for processing various datasets using its data flow language, Pig Latin.
Let's consider a scenario where you have multiple datasets, and you want to perform
operations on them using Pig.
Sample Datasets:
Let's say you have two datasets, users.csv and transactions.csv:
pig
-- Load the users data
users = LOAD 'users.csv' USING PigStorage(',') AS (user_id:int, name:chararray, age:int,
gender:chararray);
-- Load the transactions data
transactions = LOAD 'transactions.csv' USING PigStorage(',') AS (user_id:int,
product:chararray, amount:int);
-- Join the datasets on user_id
joined_data = JOIN users BY user_id, transactions BY user_id;
-- Group the joined data by user_id and calculate the total amount spent by each user
grouped = GROUP joined_data BY users::user_id;
total_spent = FOREACH grouped GENERATE
    group AS user_id,
    SUM(joined_data.transactions::amount) AS total_amount_spent;
-- Store the results back to HDFS
STORE total_spent INTO 'output/total_spent_by_user' USING PigStorage(',');
Running the Pig Script:
To execute this Pig script, save it as data_processing.pig and use the following command in
the terminal or command prompt:
bash
pig data_processing.pig
This script will join the two datasets based on the user_id, calculate the total amount spent
by each user, and store the results in the HDFS directory output/total_spent_by_user.
Adjust file paths, delimiter, and column names in the script according to your actual dataset
structure and locations in HDFS.
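Once the script completes, you can inspect the result directly in HDFS; a part-* glob is used here because the exact part file names depend on how the job ran:
bash
hdfs dfs -cat 'output/total_spent_by_user/part-*'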