Hadoop Component Overview
Explain the components of Hadoop in detail.

Hadoop is an open-source framework designed to store and process large volumes of data across clusters of commodity hardware. It consists of several key components that work together to enable distributed storage and processing of big data. Here are the main components of Hadoop:

1. Hadoop Distributed File System (HDFS):
● HDFS is the primary storage system used by Hadoop for distributed
storage of large datasets.
● It follows a master-slave architecture, with a single NameNode serving as
the master node that manages the file system namespace and metadata,
and multiple DataNodes serving as slave nodes that store the actual data
blocks.
● Data is stored redundantly across multiple DataNodes to ensure fault
tolerance and high availability.
● HDFS supports high throughput and streaming access to data, making it
suitable for storing large files and batch processing workloads.
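The block-and-replica idea above can be illustrated with a small, self-contained sketch. This is not the HDFS implementation or API; all names (`split_into_blocks`, `place_blocks`, the tiny block size) are hypothetical, chosen only to show how a NameNode-style map from block to DataNodes might look.

```python
# Illustrative sketch only, NOT real HDFS code: split a file's bytes into
# fixed-size blocks and place each block on several DataNodes, mimicking
# HDFS's default 3x replication. Real HDFS uses 128 MB blocks and
# rack-aware placement.

BLOCK_SIZE = 4        # tiny for illustration; HDFS defaults to 128 MB
REPLICATION = 3       # HDFS's default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Chop raw bytes into fixed-size blocks, as HDFS does to large files."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, datanodes, replication=REPLICATION):
    """Return a NameNode-style map: block id -> DataNodes holding a replica."""
    placement = {}
    for block_id in range(len(blocks)):
        # Simple round-robin placement; real HDFS is rack-aware.
        placement[block_id] = [datanodes[(block_id + r) % len(datanodes)]
                               for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs world")          # 16 bytes -> 4 blocks
placement = place_blocks(blocks, ["dn1", "dn2", "dn3", "dn4"])
```

Because every block lives on three different nodes, losing any single DataNode leaves at least two replicas of each block available, which is the mechanism behind HDFS's fault tolerance.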
2. Yet Another Resource Negotiator (YARN):
● YARN is the resource management and job scheduling framework in
Hadoop.
● It decouples resource management from job execution, allowing multiple
processing frameworks to run on the same cluster.
● YARN consists of two main components: ResourceManager, which
manages cluster resources and schedules jobs, and NodeManagers,
which run on each node in the cluster and manage resource usage and
execution of tasks.
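The ResourceManager/NodeManager split can be sketched as a toy scheduler. This is a hypothetical simulation, not the YARN API: the class names mirror YARN's roles, but the single-resource (memory-only), first-fit logic is an illustrative simplification.

```python
# Hypothetical sketch of YARN's division of labor: a global ResourceManager
# schedules container requests against per-node capacity tracked by
# NodeManagers. Not real YARN code; real YARN also schedules vcores,
# queues requests, and supports pluggable schedulers.

class NodeManager:
    """Tracks the free capacity of one cluster node."""
    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb

class ResourceManager:
    """Grants containers on whichever node can satisfy the request."""
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, request_mb):
        # First-fit: pick the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= request_mb:
                node.free_mb -= request_mb
                return node.name
        return None  # real YARN would queue the request instead

rm = ResourceManager([NodeManager("nm1", 2048), NodeManager("nm2", 1024)])
grants = [rm.allocate(mb) for mb in (1024, 1024, 1024)]
```

The point of the decoupling is visible even in this sketch: the ResourceManager never runs tasks itself, so any framework (MapReduce, Spark, and others) can request containers through the same interface.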
3. MapReduce:
● MapReduce is a programming model and processing engine used for
distributed processing of large datasets in Hadoop.
● It divides processing tasks into two phases: Map phase, where input data
is processed and transformed into intermediate key-value pairs, and
Reduce phase, where intermediate results are aggregated and combined
to produce final output.
● MapReduce provides fault tolerance, scalability, and parallelism by
automatically handling task scheduling, data partitioning, and task
re-execution in case of failures.
4. Hadoop Common:
● Hadoop Common contains libraries and utilities that are used by other
Hadoop components.
● It includes common utilities, such as file I/O, networking, and security, as
well as Java APIs and tools for interacting with Hadoop clusters.
5. Hadoop Ecosystem Components:
● In addition to the core components mentioned above, the Hadoop
ecosystem includes a wide range of additional components and tools that
extend the functionality of the platform.
● These components include data ingestion tools (e.g., Apache Flume,
Apache Kafka), data processing frameworks (e.g., Apache Spark, Apache
Flink), data storage and warehousing systems (e.g., Apache HBase,
Apache Hive), and data management and governance tools (e.g., Apache
Ranger, Apache Atlas).

By leveraging these components, Hadoop provides a scalable, reliable, and cost-effective platform for storing, processing, and analyzing large volumes of data across distributed clusters.
