Lecture 14

The document outlines the course objectives and outcomes for the Big Data Engineering course at the Apex Institute of Technology for the academic session 2025-26. It covers key topics such as the Hadoop ecosystem, Apache Ambari, YARN architecture, and stream computing, along with suggested readings and e-resources. The course aims to equip students with foundational knowledge and practical skills in Big Data and Data Analytics.

Uploaded by

muskaan.targotra


Academic Session 2025-26

ODD Semester Jul-Dec 2025

UNIVERSITY INSTITUTE OF ENGINEERING

APEX INSTITUTE OF TECHNOLOGY
B.E CSE/ Big Data
(5th Semester)
Big Data Engineering
(22CSH-342)
Unit No. 1, Chapter No. 3, Lecture No. 13 and 14
Dr. Monica Luthra (E9836), Associate Professor
Big Data Engineering : COURSE OBJECTIVES

The Course aims to:

1. Help students understand Big Data and Data Analytics, the Hortonworks Data
Platform (HDP), and Apache Ambari
2. Help students understand storing and querying data, ZooKeeper, Slider, and
Knox, and loading data with Sqoop
3. Enable students to develop and implement stream computing
4. Introduce students to the fundamentals of Apache Flink and its role in real-time
stream processing

COURSE OUTCOMES
On completion of this course, the students shall be able to:-

CO1 Understand the fundamentals of Big Data and Data Analytics, including the components of Hortonworks Data Platform (HDP) and the significance of Apache Hadoop

CO2 Explain the process of loading, visualizing, and pre-processing data models in the context of Big Data and Analytics

CO3 Apply simple learning strategies using stream computing to identify and implement effective data processing techniques

CO4 Evaluate the security and optimization measures for Big SQL environments

CO5 Create the functionalities of Watson Studio, detailing the process of creating and managing projects, adding collaborators, and efficiently handling data within the platform
Unit-1 Syllabus
Unit-1 Introduction to Big Data Analytics
Chapter-1 Gain comprehensive knowledge of the open-source Hadoop ecosystem, evaluate the major distributions, and acquire hands-on experience with key components for building big data solutions.
Chapter-2 Explore Apache Ambari's functions, manage Hadoop clusters, understand HDFS, and execute MapReduce and YARN jobs for efficient cluster operation.
Chapter-3 Learn Apache Spark principles, RDD usage, various data file formats, NoSQL data stores, Pig, Hive, ZooKeeper, Apache Slider, and data loading techniques using Sqoop and Flume in the Hadoop environment.
SUGGESTED READINGS

TEXT BOOKS:
T1: Data Science Handbook, O'Reilly (2016).
T2: Hadoop: The Definitive Guide, Tom White.
T3: Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier.
T4: Data Engineering Teams, Mike Barlow.

REFERENCE BOOKS:

R1: Big Data: Principles and Best Practices of Scalable Realtime Data Systems, Nathan Marz and James Warren.
R2: Big Data: A Very Short Introduction, Dawn E. Holmes.
R3: Mastering Apache Spark 2.x: Scalable Machine Learning and Big Data Analytics, Romeo Kienzler.
R4: Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka, Raul Estrada and Isaac Ruiz.
Learning Outcome of this lecture
Unit: I
Name: Introduction to Big Data Analytics
Outcome:
• Big Data introduction
• Hadoop ecosystem
• Apache Ambari and cluster management
• Understand the fundamentals of Big Data and Data Analytics, including the
components of Hortonworks Data Platform (HDP) and the significance of
Apache Hadoop. (CO1)
• Explain the process of loading, visualizing, and pre-processing data models
in the context of Big Data and Analytics. (CO2)
APEX INSTITUTE OF TECHNOLOGY COMPUTER SCIENCE AND ENGINEERING
Introduction to YARN
• YARN = Yet Another Resource Negotiator
• Introduced in Hadoop 2.x
• Decouples resource management from job scheduling and monitoring
Why YARN?
• Overcomes MapReduce limitations
• Supports multiple processing engines
• Improves resource utilization and flexibility
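One way to picture the resource-utilization point is the following minimal sketch. It assumes the commonly described Hadoop 1.x limitation that map slots and reduce slots were fixed and not interchangeable, whereas YARN hands out generic containers. The slot and container counts are hypothetical numbers chosen for illustration, not values from this lecture.

```python
def concurrent_tasks_fixed_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Hadoop 1.x style (assumed model): a map task can only use a map slot,
    and a reduce task can only use a reduce slot."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def concurrent_tasks_containers(map_tasks, reduce_tasks, total_containers):
    """YARN style (assumed model): any free container can run any task."""
    return min(map_tasks + reduce_tasks, total_containers)


# Hypothetical node: 8 map slots + 4 reduce slots, or 12 generic containers.
# The workload is map-heavy: 12 map tasks and no reduce tasks yet.
fixed = concurrent_tasks_fixed_slots(map_tasks=12, reduce_tasks=0,
                                     map_slots=8, reduce_slots=4)
pooled = concurrent_tasks_containers(map_tasks=12, reduce_tasks=0,
                                     total_containers=12)
print(fixed, pooled)
```

In this toy model the four reduce slots sit idle under the fixed-slot scheme, while the pooled containers can all be put to work.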
YARN Architecture Overview
• ResourceManager (RM)
• NodeManager (NM)
• ApplicationMaster (AM)
• Container
YARN Components
1. ResourceManager – Global resource scheduler
2. NodeManager – Manages containers on each node
3. ApplicationMaster – Manages execution of an application
4. Container – Executes a task
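The roles listed above can be sketched as a small data model. This is only an illustration, not YARN's actual API: the class names mirror the component names, but every field, the first-fit placement rule, and the node names and memory sizes are assumptions made for the example. The ApplicationMaster is left out here to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    """A slice of one node's resources in which a task runs."""
    container_id: int
    node: str
    memory_mb: int


@dataclass
class NodeManager:
    """Tracks the containers running on a single node."""
    name: str
    total_memory_mb: int
    containers: List[Container] = field(default_factory=list)

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - sum(c.memory_mb for c in self.containers)


@dataclass
class ResourceManager:
    """Global scheduler: decides which node hosts each new container."""
    nodes: List[NodeManager]
    next_id: int = 1

    def allocate(self, memory_mb: int) -> Optional[Container]:
        # First-fit placement (an assumption; real schedulers are richer).
        for node in self.nodes:
            if node.free_memory_mb() >= memory_mb:
                container = Container(self.next_id, node.name, memory_mb)
                self.next_id += 1
                node.containers.append(container)
                return container
        return None  # the cluster has no room for this request


# Hypothetical two-node cluster and four 2 GB container requests.
rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])
placed = [rm.allocate(2048) for _ in range(4)]
print([c.node if c else None for c in placed])
```

The last request returns `None` because both nodes are full, which is the situation in which an application would have to wait for resources.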
YARN Workflow
1. User submits job
2. RM assigns container for AM
3. AM requests resources
4. NM launches containers to run tasks
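The four workflow steps can be traced with a toy simulation. This is a minimal sketch under stated assumptions: the function `run_job`, the job name, the node names, and the memory sizes are all hypothetical, and the placement rule (first node with enough free memory) stands in for whatever scheduler a real cluster uses.

```python
def run_job(job_name, num_tasks, node_free_mb, am_mb=1024, task_mb=2048):
    """Trace the four workflow steps for one job on a toy cluster.

    node_free_mb maps a hypothetical node name to its free memory in MB.
    """
    log = [f"1. user submits {job_name}"]
    free = dict(node_free_mb)  # work on a copy so the caller's dict is untouched

    def allocate(mb):
        # RM role: pick the first node with enough free memory.
        for node, avail in free.items():
            if avail >= mb:
                free[node] = avail - mb
                return node
        return None

    am_node = allocate(am_mb)
    if am_node is None:
        log.append("2. RM has no room for the AM; job waits")
        return log
    log.append(f"2. RM assigns AM container on {am_node}")

    # AM role: ask the RM for one container per task.
    granted = []
    for _ in range(num_tasks):
        node = allocate(task_mb)
        if node is not None:
            granted.append(node)
    log.append(f"3. AM requests {num_tasks} containers, RM grants {len(granted)}")

    # NM role: each node launches the containers placed on it.
    for i, node in enumerate(granted):
        log.append(f"4. NM on {node} launches container for task {i}")
    return log


trace = run_job("wordcount", num_tasks=3,
                node_free_mb={"node1": 4096, "node2": 4096})
print("\n".join(trace))
```

Keeping the steps as a plain log makes it easy to see which component acts at each stage; a real cluster interleaves these steps and handles failures, which this sketch ignores.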
Benefits of YARN
• Better scalability and resource utilization
• Supports multiple frameworks
• Fault tolerance and isolation
YARN Use Cases
• Big Data (Spark, Hive, Pig)
• Machine Learning
• Real-time streaming (Storm, Flink)
Conclusion
• Flexible, scalable resource management
• Backbone of the modern Hadoop ecosystem
Summary of the lecture

• Learn about the basics of Big Data

• Learn about the YARN architecture
• Learn about YARN

Questions of this lecture

• Define the Hadoop ecosystem

• Define the role of YARN
• Define the role of analytics

Big Data Computing

• https://archive.nptel.ac.in/courses/106/104/106104189/

E-Resources
1. https://www.geeksforgeeks.org/hadoop-an-introduction/
2. https://cloud.google.com/learn/what-is-hadoop
3. https://aws.amazon.com/what-is/hadoop/
4. https://www.tableau.com/learn/articles/big-data-hadoop-explained

REFERENCES
1. Kumar, Abhay & Jothimani, Dhanya. (2017). Big Data: Challenges, Opportunities and Realities. 10.48550/arXiv.1705.04928.
2. Beakta, Rahul. (2015). Big Data And Hadoop: A Review Paper. international journal of computer science & information te.

Class Session Review

Thank You
For queries
Email: [email protected]
