
Lecture 14

The document outlines the course objectives and outcomes for the Big Data Engineering course at the Apex Institute of Technology for the academic session 2025-26. It covers key topics such as the Hadoop ecosystem, Apache Ambari, YARN architecture, and stream computing, along with suggested readings and e-resources. The course aims to equip students with foundational knowledge and practical skills in Big Data and Data Analytics.

Uploaded by

muskaan.targotra

Academic Session 2025-26

ODD Semester Jul-Dec 2025

UNIVERSITY INSTITUTE OF ENGINEERING


APEX INSTITUTE OF TECHNOLOGY
B.E CSE/ Big Data
(5th Semester)
Big Data Engineering
(22CSH-342)
Unit No. 1 Chapter No. 3 Lecture No. 13 and 14
Topic :
Dr. Monica Luthra (E9836), Associate Professor
Big Data Engineering : COURSE OBJECTIVES

The Course aims to:


1. Make students understand Big Data and Data Analytics, the Hortonworks Data
Platform (HDP), and Apache Ambari
2. Make students understand storing and querying data, ZooKeeper, Slider, and
Knox, and loading data with Sqoop
3. Enable students to develop and implement stream computing
4. Introduce students to the fundamentals of Apache Flink and its role in real-time
stream processing

COURSE OUTCOMES
On completion of this course, the students shall be able to:-

CO1 Understand the fundamentals of Big Data and Data Analytics, including the components of Hortonworks Data Platform (HDP) and the significance of Apache Hadoop

CO2 Explain the process of loading, visualizing, and pre-processing data models in the context of Big Data and Analytics

CO3 Apply simple learning strategies using stream computing to identify and implement effective data processing techniques

CO4 Evaluate the security and optimization measures for Big SQL environments

CO5 Create the functionalities of Watson Studio, detailing the process of creating and managing projects, adding collaborators, and efficiently handling data within the platform
Unit-1 Syllabus
Unit-1 Introduction to Big Data Analytics
Chapter-1 Gain comprehensive knowledge of the open-source Hadoop ecosystem, evaluate the major
distributions, and acquire hands-on experience with key components for building big data solutions.
Chapter-2 Explore Apache Ambari's functions, manage Hadoop clusters, understand HDFS, and execute
MapReduce and YARN jobs for efficient cluster operation.
Chapter-3 Learn Apache Spark principles, RDD usage, various data file formats, NoSQL data stores, Pig, Hive,
ZooKeeper, Apache Slider, and data loading techniques using Sqoop and Flume in the Hadoop environment.

SUGGESTED READINGS

TEXT BOOKS:
T1: Data Science Handbook, O'Reilly (2016).
T2: Hadoop: The Definitive Guide, Tom White.
T3: Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger
and Kenneth Cukier.
T4: Data Engineering Teams, Mike Barlow.

REFERENCE BOOKS:

R1: Big Data: Principles and Best Practices of Scalable Realtime Data Systems, Nathan Marz and James Warren.
R2: Big Data: A Very Short Introduction, Dawn E. Holmes.
R3: Mastering Apache Spark 2.x: Scalable Machine Learning and Big Data Analytics, Romeo Kienzler.
R4: Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka, Raul Estrada and Isaac Ruiz.

Learning Outcome of this lecture
Unit: I
Name: Introduction to Big Data Analytics
Outcome:
• Big Data introduction
• Hadoop ecosystem
• Apache Ambari and cluster management
• Understand the fundamentals of Big Data and Data Analytics, including the
components of Hortonworks Data Platform (HDP) and the significance of
Apache Hadoop. (CO1)
• Explain the process of loading, visualizing, and pre-processing data models
in the context of Big Data and Analytics. (CO2)
APEX INSTITUTE OF TECHNOLOGY COMPUTER SCIENCE AND ENGINEERING
Introduction to YARN
• YARN = Yet Another Resource Negotiator
• Introduced in Hadoop 2.x
• Decouples cluster resource management from job scheduling and monitoring, which the Hadoop 1.x JobTracker handled together
Why YARN?
• Overcomes the limitations of Hadoop 1.x MapReduce, where a single JobTracker was a scalability bottleneck
• Supports multiple processing engines, not only MapReduce
• Improves resource utilization and flexibility
YARN Architecture Overview
• ResourceManager (RM)
• NodeManager (NM)
• ApplicationMaster (AM)
• Container
YARN Components
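1. ResourceManager – Global resource scheduler for the whole cluster
2. NodeManager – Manages containers on each node and monitors their resource usage
3. ApplicationMaster – Manages execution of a single application (one AM per application)
4. Container – A bundle of resources (such as memory and CPU) on one node in which a task executes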
YARN Workflow
1. User (client) submits the job to the RM
2. RM assigns a container in which the AM is started
3. AM requests further resources (containers) from the RM
4. NMs launch the granted containers to run tasks
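The four workflow steps can be traced with a small in-memory toy model. This is only a sketch for intuition, not the Hadoop API: the class names mirror the YARN components, but every method, the memory-only resource model, and the node and task names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A bundle of resources granted on one node (memory only, for brevity)."""
    container_id: int
    node: str
    memory_mb: int


@dataclass
class NodeManager:
    """Per-node agent: records which containers it has launched."""
    name: str
    free_mb: int
    running: list = field(default_factory=list)

    def launch(self, container: Container, work: str) -> None:
        self.running.append((container.container_id, work))


class ResourceManager:
    """Global scheduler: grants containers on nodes with enough free memory."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self._next_id = 1

    def allocate(self, memory_mb: int):
        """Reserve memory on the first node with room; None if the cluster is full."""
        for node in self.nodes.values():
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container = Container(self._next_id, node.name, memory_mb)
                self._next_id += 1
                return container
        return None


class ApplicationMaster:
    """Per-application coordinator: asks the RM for one container per task."""

    def __init__(self, rm: ResourceManager, tasks, task_memory_mb: int):
        self.rm = rm
        self.tasks = tasks
        self.task_memory_mb = task_memory_mb

    def run(self):
        launched = []
        for task in self.tasks:
            container = self.rm.allocate(self.task_memory_mb)   # step 3
            if container is None:
                break  # toy model: a real AM would wait and ask again
            self.rm.nodes[container.node].launch(container, task)  # step 4
            launched.append(task)
        return launched


def submit_job(rm, tasks, am_memory_mb=512, task_memory_mb=1024):
    """Step 1: the client submits a job. Step 2: the RM grants a container for the AM."""
    am_container = rm.allocate(am_memory_mb)
    if am_container is None:
        raise RuntimeError("no capacity for the ApplicationMaster")
    rm.nodes[am_container.node].launch(am_container, "ApplicationMaster")
    return ApplicationMaster(rm, tasks, task_memory_mb).run()


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 2048)])
    print(submit_job(rm, ["map-0", "map-1", "map-2", "reduce-0"]))
    # ['map-0', 'map-1', 'map-2']  (reduce-0 does not fit in the remaining memory)
```

In this toy run the AM itself occupies a container, so only three of the four tasks fit. That is the point of the model: the AM competes for the same pool of resources as the tasks it manages.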
Benefits of YARN
• Better scalability and resource utilization
• Supports multiple frameworks
• Fault tolerance and isolation
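Resource utilization can be observed on a running cluster through the ResourceManager's REST web service, which exposes cluster-wide counters at /ws/v1/cluster/metrics. The sketch below assumes the RM web address is reachable (port 8088 is the usual default); the host name, the helper function names, and the sample numbers are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_address: str) -> dict:
    """Fetch the clusterMetrics object from the RM web service.

    rm_address is host:port of the RM web UI, e.g. "rm-host:8088"
    (a placeholder, not a real host).
    """
    url = f"http://{rm_address}/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as response:
        return json.load(response)["clusterMetrics"]


def memory_utilization(metrics: dict) -> float:
    """Fraction of cluster memory currently allocated to containers."""
    total = metrics["totalMB"]
    return metrics["allocatedMB"] / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample shaped like the clusterMetrics object, so the
    # calculation can be tried without a cluster.
    sample = {"totalMB": 16384, "allocatedMB": 4096, "availableMB": 12288}
    print(memory_utilization(sample))  # 0.25
```

On a live cluster the call would look like memory_utilization(fetch_cluster_metrics("rm-host:8088")), with the placeholder replaced by the actual ResourceManager address.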
YARN Use Cases
• Big Data (Spark, Hive, Pig)
• Machine Learning
• Real-time streaming (Storm, Flink)
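One way to see several frameworks sharing a cluster is to group the running applications by type. The ResourceManager REST service lists applications at /ws/v1/cluster/apps, and each entry carries applicationType and state fields. The sketch below only parses such a response; the function name and the sample payload are made up for illustration.

```python
from collections import Counter


def running_apps_by_type(apps_response: dict) -> Counter:
    """Count RUNNING applications per applicationType.

    apps_response is the parsed JSON body of /ws/v1/cluster/apps.
    The "apps" value can be null when nothing has been submitted.
    """
    apps = (apps_response.get("apps") or {}).get("app", [])
    return Counter(
        app["applicationType"] for app in apps if app["state"] == "RUNNING"
    )


if __name__ == "__main__":
    # Hypothetical payload with only the fields used above.
    sample = {
        "apps": {
            "app": [
                {"id": "app_1", "applicationType": "SPARK", "state": "RUNNING"},
                {"id": "app_2", "applicationType": "MAPREDUCE", "state": "RUNNING"},
                {"id": "app_3", "applicationType": "SPARK", "state": "RUNNING"},
                {"id": "app_4", "applicationType": "SPARK", "state": "FINISHED"},
            ]
        }
    }
    print(running_apps_by_type(sample))
    # Counter({'SPARK': 2, 'MAPREDUCE': 1})
```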
Conclusion
• Flexible, scalable resource management
• Backbone of the modern Hadoop ecosystem
Summary of the lecture

• Learn about the basics of Big Data
• Learn about the nature of the architecture
• Learn about YARN

Questions of this lecture

• Define the Hadoop ecosystem
• Define the role of YARN
• Define the role of analytics

Big Data Computing

• https://archive.nptel.ac.in/courses/106/104/106104189/

E-Resources
1. https://www.geeksforgeeks.org/hadoop-an-introduction/
2. https://cloud.google.com/learn/what-is-hadoop
3. https://aws.amazon.com/what-is/hadoop/
4. https://www.tableau.com/learn/articles/big-data-hadoop-explained

REFERENCES
1. Kumar, Abhay & Jothimani, Dhanya. (2017). Big Data: Challenges, Opportunities and Realities. 10.48550/arXiv.1705.04928.
2. Beakta, Rahul. (2015). Big Data And Hadoop: A Review Paper. International Journal of Computer Science & Information te.

Class Session Review

Thank You
For queries
Email: [email protected]
