
Advanced Hadoop

By: Siddhant Singhal


2021BCYX1046
Table of contents
• Introduction

• What is Hadoop?

• Hadoop Architecture

• Hadoop Distributed File System (HDFS)

• MapReduce

• Hadoop Use Cases

• Conclusion
Introduction
• Hadoop is an open-source software framework that is used to store and process large
datasets. It is designed to scale up from single servers to thousands of machines, each
offering local computation and storage.

• Hadoop is used by many organizations to store and process large amounts of data,
including Facebook, Yahoo!, and eBay. It is also used in many scientific applications,
including the Large Hadron Collider at CERN.

• Hadoop has become an essential tool for big data analytics and is widely used in the
industry.
What is Hadoop?
• Hadoop is an open-source, distributed computing platform used to store, process, and
analyze large datasets. It is designed to handle the challenges of big data: high volume,
high velocity, and high variety of data.

• Hadoop is built on top of two main components: Hadoop Distributed File System (HDFS) and
MapReduce.

• HDFS is a distributed file system that is used to store large datasets across a cluster of
commodity hardware. MapReduce is a programming model that allows developers to write
code to process large datasets in parallel across the cluster.
Hadoop Architecture
HDFS
• Hadoop Distributed File System (HDFS) is a distributed file system designed to store and
manage large amounts of data in a distributed computing environment. It is part of the
Hadoop architecture and is used to store MapReduce jobs’ input and output data.

• HDFS is designed to run on commodity hardware and provides fault tolerance by replicating
data across multiple nodes in a cluster. Data is broken into blocks and distributed across the
cluster, with each block being replicated to multiple nodes to ensure high availability and data
durability.

• Components:
- NameNode: the master server that manages the file system namespace and tracks which DataNodes hold each block
- DataNode: a worker node that stores the actual data blocks and serves read/write requests
- Block: the unit of storage; files are split into fixed-size blocks (128 MB by default)
- Namespace: the hierarchy of files and directories maintained by the NameNode
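The block-and-replication scheme above can be sketched in a few lines. This is a simplified illustration, not HDFS's actual placement logic: real HDFS uses a rack-aware policy, while the round-robin assignment here is a hypothetical stand-in. The block size (128 MB) and replication factor (3) match common HDFS defaults.

```python
# Simplified sketch: split a file into fixed-size blocks and assign
# each block to `replication` distinct DataNodes.
import math

def place_blocks(file_size, datanodes, block_size=128 * 1024 * 1024, replication=3):
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(num_blocks):
        # Round-robin placement (illustrative only; HDFS is rack-aware).
        placement[b] = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
    return placement

# A 300 MB file on a 4-node cluster needs 3 blocks, each stored on 3 nodes,
# so the file survives the loss of any single DataNode.
layout = place_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in layout.items():
    print(f"block {block_id}: {nodes}")
```

Because every block lives on three different nodes, the NameNode can re-replicate from a surviving copy whenever a DataNode fails.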
MapReduce
• MapReduce is a programming model for processing large data sets in a distributed computing
environment. It consists of two phases: the map phase and the reduce phase.

• In the map phase, data is processed in parallel across multiple nodes in the cluster. Each node
performs a specific operation on the data and generates intermediate results.

• In the reduce phase, the intermediate results are combined and processed to generate the final
output. The reduce phase is also performed in parallel across multiple nodes in the cluster.

• MapReduce is a key component of the Hadoop architecture and is widely used for processing
large-scale data sets.
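The two phases above can be demonstrated with the classic word-count example. This is a single-process sketch: in real Hadoop the map and reduce tasks run on different nodes and the framework performs the shuffle, whereas here the shuffle is simulated with an in-memory dictionary.

```python
# Minimal single-process sketch of MapReduce word count.
from collections import defaultdict

def map_phase(line):
    # Emit an intermediate (key, value) pair for every word.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(word, counts):
    # Combine all intermediate values for one key into a final result.
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: each input record is processed independently (in parallel on a cluster).
intermediate = [pair for line in lines for pair in map_phase(line)]

# Shuffle: group intermediate pairs by key (done by the framework in Hadoop).
grouped = defaultdict(list)
for word, count in intermediate:
    grouped[word].append(count)

# Reduce: aggregate each group (also parallel across keys on a cluster).
result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)
```

Note that each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key's values, which is exactly what lets Hadoop distribute both phases across many nodes.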
Hadoop Use Cases
1. Data Warehousing
2. Log Processing
3. Recommendation Systems
4. Social Media Analysis
Conclusion
• Hadoop is a powerful technology that enables organizations to process and analyze
large-scale data sets in a cost-effective and scalable way. With its distributed architecture
and fault-tolerance capabilities, Hadoop has become a key component of the big data
ecosystem and is widely used across various industries.

• Hadoop's ability to handle big data makes it an essential tool for organizations looking to
process and analyze large-scale data sets. With the right expertise and resources,
organizations can leverage Hadoop to gain valuable insights into their data, make better
decisions, and drive innovation.
References
1. Hadoop - Introduction (geeksforgeeks.org)
2. Hadoop Architecture in Big Data Explained: A Complete Guide with Its Components (simplilearn.com)
3. What is MapReduce in Hadoop? Big Data Architecture (guru99.com)


Thank you
