Chapter 3 Hadoop
Traditional Approach
In this approach, an enterprise has a single computer to store and process big data. For storage,
programmers rely on the database vendor of their choice, such as Oracle or IBM. The user
interacts with the application, which in turn handles data storage and analysis.
Limitation
This approach works fine for applications that process modest volumes of data that a standard
database server can accommodate, or up to the limit of the processor handling the data. But
when it comes to huge amounts of scalable data, pushing everything through a single database
becomes a bottleneck.
Google’s Solution
Google solved this problem with an algorithm called MapReduce. The algorithm
divides the task into small parts, assigns them to many computers, and collects
their results, which, when integrated, form the result dataset.
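The idea can be sketched in plain Java, outside Hadoop itself: split the input into pieces, process each piece independently (map), and merge the partial results (reduce). The input lines, class name, and thread pool below are purely illustrative assumptions, not part of any Hadoop API.

import java.util.*;
import java.util.concurrent.*;

// Illustrative only: "maps" each input line on a separate thread to a partial
// word count, then "reduces" the partial results into one final map.
public class MiniMapReduce {
    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList(
                "the quick brown fox", "the lazy dog", "the quick dog");

        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<Map<String, Integer>>> partials = new ArrayList<>();

        // Map phase: each piece of the input is counted independently.
        for (String line : lines) {
            partials.add(pool.submit(() -> {
                Map<String, Integer> counts = new HashMap<>();
                for (String word : line.split("\\s+")) {
                    counts.merge(word, 1, Integer::sum);
                }
                return counts;
            }));
        }

        // Reduce phase: merge the partial counts into the final result.
        Map<String, Integer> result = new TreeMap<>();
        for (Future<Map<String, Integer>> partial : partials) {
            partial.get().forEach((w, c) -> result.merge(w, c, Integer::sum));
        }
        pool.shutdown();
        System.out.println(result); // {brown=1, dog=2, fox=1, lazy=1, quick=2, the=3}
    }
}

Hadoop applies the same pattern, but splits the data across machines rather than threads and moves the computation to wherever the data is stored.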
Hadoop
Using the solution provided by Google, Doug Cutting and his team developed an Open Source Project
called HADOOP.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel with others.
In short, Hadoop is used to develop applications that could perform complete statistical analysis on huge
amounts of data.
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large
datasets across clusters of computers using simple programming models. A Hadoop framework
application works in an environment that provides distributed storage and computation across clusters of
computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering
local computation and storage.
Hadoop Architecture
At its core, Hadoop has two major layers namely −
•Processing/Computation layer (MapReduce), and
•Storage layer (Hadoop Distributed File System); a short HDFS usage sketch follows this list.
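As a rough illustration of the storage layer, the sketch below writes a small file into HDFS and reads it back through Hadoop's Java FileSystem API. The NameNode URI and the /demo path are placeholders for whatever a real cluster would use.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Writes a file into HDFS and reads it back through the Java API.
public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode URI

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt"); // placeholder path

            // The storage layer splits this file into blocks and replicates them.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("Hello, HDFS");
            }
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
        }
    }
}

On a real cluster the file would be split into blocks and replicated across DataNodes behind this same API.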
MapReduce
MapReduce is a parallel programming model for writing distributed applications devised at Google for
efficient processing of large amounts of data (multi-terabyte data-sets), on large clusters (thousands of
nodes) of commodity hardware in a reliable, fault-tolerant manner. The MapReduce program runs on
Hadoop, which is an Apache open-source framework.
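The classic illustration of the model is word counting. The sketch below follows the usual Hadoop Mapper/Reducer pattern: the mapper emits (word, 1) pairs, and the reducer sums the counts that the framework has grouped by word. The class names are arbitrary choices for this example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, 1) for every word in the input split.
public class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce: sum the counts that the framework has grouped by word.
class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

Between the map and reduce stages, the framework sorts the mapper output and groups it by key, so each reduce call sees all the counts for one word.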
Apart from the two core components mentioned above, the Hadoop framework also includes the following
two modules −
•Hadoop Common − These are Java libraries and utilities required by other Hadoop modules.
•Hadoop YARN − This is a framework for job scheduling and cluster resource management; a job-submission sketch follows this list.
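A driver class ties such a job together and submits it; on a cluster, YARN then schedules the map and reduce tasks onto available nodes. The sketch below assumes the TokenizerMapper and IntSumReducer classes from the earlier sketch and takes input and output paths from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Configures the word-count job and submits it to the cluster, where YARN
// schedules its map and reduce tasks onto available nodes.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);   // mapper from the earlier sketch
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);    // reducer from the earlier sketch
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The two arguments are an HDFS input directory and a not-yet-existing output directory; packaged into a jar, the class is typically run with the hadoop jar command.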
How Does Hadoop Work?
It is quite expensive to build bigger servers with heavy configurations to handle large-scale
processing. As an alternative, you can tie together many commodity single-CPU computers
into a single functional distributed system.
The clustered machines can read the dataset in parallel and provide a much higher throughput.
Moreover, such a cluster is cheaper than one high-end server.
So this is the first motivational factor behind using Hadoop: it runs code across a cluster of
low-cost machines.
This process includes the following core tasks that Hadoop performs −
•Data is initially divided into directories and files. Files are divided into uniformly sized blocks of 128 MB
or 64 MB (preferably 128 MB); see the block-settings sketch after this list.
•These files are then distributed across various cluster nodes for further processing.
•HDFS, being on top of the local file system, supervises the processing.
•Blocks are replicated for handling hardware failure.
•Checking that the code was executed successfully.
•Performing the sort that takes place between the map and reduce stages.
•Sending the sorted data to a certain computer.
•Writing the debugging logs for each job.
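To make the block and replication items above concrete, the sketch below sets a 128 MB block size and a replication factor of 3 before writing a file, then asks HDFS where the resulting blocks landed. The property names (dfs.blocksize, dfs.replication) are standard HDFS settings; the NameNode URI, file path, and values are placeholders.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sets block size and replication, writes a file, then lists which hosts
// hold each of its blocks.
public class BlockSettings {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");  // assumed NameNode URI
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks
        conf.setInt("dfs.replication", 3);                 // 3 copies of each block

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/input.dat"); // placeholder path
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeBytes("data that HDFS stores as replicated blocks\n");
            }

            long length = fs.getFileStatus(file).getLen();
            for (BlockLocation block : fs.getFileBlockLocations(file, 0, length)) {
                System.out.println("block on hosts: " + Arrays.toString(block.getHosts()));
            }
        }
    }
}

Block size and replication can also be set cluster-wide in hdfs-site.xml; the per-client Configuration values shown here simply override those defaults for files this client creates.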
Advantages of Hadoop
•The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient,
and it automatically distributes the data and work across the machines, in turn utilizing the
underlying parallelism of the CPU cores.
•Hadoop does not rely on hardware to provide fault-tolerance and high availability (FTHA);
rather, the Hadoop library itself has been designed to detect and handle failures at the application
layer.
•Servers can be added or removed from the cluster dynamically and Hadoop continues to
operate without interruption.
•Another big advantage of Hadoop is that, apart from being open source, it is compatible with all
platforms since it is Java based.
Computing Model of Hadoop