
Advanced Hadoop

By: Siddhant Singhal


2021BCYX1046
Table of contents
• Introduction

• What is Hadoop?

• Hadoop Architecture

• Hadoop Distributed File System (HDFS)

• MapReduce

• Hadoop Use Cases

• Conclusion
Introduction
• Hadoop is an open-source software framework that is used to store and process large
datasets. It is designed to scale up from single servers to thousands of machines, each
offering local computation and storage.

• Hadoop is used by many organizations to store and process large amounts of data,
including Facebook, Yahoo!, and eBay. It is also used in many scientific applications,
including the Large Hadron Collider at CERN.

• Hadoop has become an essential tool for big data analytics and is widely used in the
industry.
What is Hadoop?
• Hadoop is an open-source, distributed computing platform used to store, process, and
analyze large datasets. It is designed to handle the challenges of big data: high volume,
high velocity, and high variety of data.

• Hadoop is built on top of two main components: Hadoop Distributed File System (HDFS) and
MapReduce.

• HDFS is a distributed file system that is used to store large datasets across a cluster of
commodity hardware. MapReduce is a programming model that allows developers to write
code to process large datasets in parallel across the cluster.
Hadoop Architecture
HDFS
• Hadoop Distributed File System (HDFS) is a distributed file system designed to store and
manage large amounts of data in a distributed computing environment. It is part of the
Hadoop architecture and is used to store MapReduce jobs’ input and output data.

• HDFS is designed to run on commodity hardware and provides fault tolerance by replicating
data across multiple nodes in a cluster. Data is broken into blocks and distributed across the
cluster, with each block being replicated to multiple nodes to ensure high availability and data
durability.

• Components:
- NameNode: the master server that manages the file system namespace and tracks which DataNodes hold each block
- DataNode: a worker node that stores the actual data blocks and serves read/write requests
- Block: the unit of storage; files are split into fixed-size blocks (128 MB by default)
- Namespace: the hierarchy of files and directories maintained by the NameNode
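The block-and-replication scheme above can be sketched in a few lines. This is a simplified illustration, not HDFS's actual placement logic: real HDFS uses a rack-aware policy, while the round-robin assignment here is a hypothetical stand-in. The block size (128 MB) and replication factor (3) match common HDFS defaults.

```python
# Simplified sketch: split a file into fixed-size blocks and assign
# each block to `replication` distinct DataNodes.
import math

def place_blocks(file_size, datanodes, block_size=128 * 1024 * 1024, replication=3):
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(num_blocks):
        # Round-robin placement (illustrative only; HDFS is rack-aware).
        placement[b] = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
    return placement

# A 300 MB file on a 4-node cluster needs 3 blocks, each stored on 3 nodes,
# so the file survives the loss of any single DataNode.
layout = place_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in layout.items():
    print(f"block {block_id}: {nodes}")
```

Because every block lives on three different nodes, the NameNode can re-replicate from a surviving copy whenever a DataNode fails.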
MapReduce
• MapReduce is a programming model for processing large data sets in a distributed computing
environment. It consists of two phases: the map phase and the reduce phase.

• In the map phase, data is processed in parallel across multiple nodes in the cluster. Each node
performs a specific operation on the data and generates intermediate results.

• In the reduce phase, the intermediate results are combined and processed to generate the final
output. The reduce phase is also performed in parallel across multiple nodes in the cluster.

• MapReduce is a key component of the Hadoop architecture and is widely used for processing
large-scale data sets.
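The two phases above can be demonstrated with the classic word-count example. This is a single-process sketch: in real Hadoop the map and reduce tasks run on different nodes and the framework performs the shuffle, whereas here the shuffle is simulated with an in-memory dictionary.

```python
# Minimal single-process sketch of MapReduce word count.
from collections import defaultdict

def map_phase(line):
    # Emit an intermediate (key, value) pair for every word.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(word, counts):
    # Combine all intermediate values for one key into a final result.
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: each input record is processed independently (in parallel on a cluster).
intermediate = [pair for line in lines for pair in map_phase(line)]

# Shuffle: group intermediate pairs by key (done by the framework in Hadoop).
grouped = defaultdict(list)
for word, count in intermediate:
    grouped[word].append(count)

# Reduce: aggregate each group (also parallel across keys on a cluster).
result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)
```

Note that each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key's values, which is exactly what lets Hadoop distribute both phases across many nodes.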
Hadoop Use Cases
1. Data Warehousing
2. Log Processing
3. Recommendation Systems
4. Social Media Analysis
Conclusion
• Hadoop is a powerful technology that enables organizations to process and analyze
large-scale data sets in a cost-effective and scalable way. With its distributed architecture
and fault-tolerance capabilities, Hadoop has become a key component of the big data
ecosystem and is widely used across various industries.

• Hadoop's ability to handle big data makes it an essential tool for organizations looking to
process and analyze large-scale data sets. With the right expertise and resources,
organizations can leverage Hadoop to gain valuable insights into their data, make better
decisions, and drive innovation.
References
1. Hadoop - Introduction (geeksforgeeks.org)
2. Hadoop Architecture in Big Data Explained: A Complete Guide with Its Components (simplilearn.com)
3. What is MapReduce in Hadoop? Big Data Architecture (guru99.com)


Thank you
