Big Data & Hadoop - Course Curriculum

This 10-module course provides in-depth knowledge of Big Data and Hadoop concepts. It covers Hadoop architecture, HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper and Oozie, and how these components work together in a Hadoop implementation. Students learn through hands-on demos and projects that solve Big Data problems using the Hadoop ecosystem.

Uploaded by

manish

Course Curriculum: Your 10 Module Learning Plan

Big Data and Hadoop

About Edureka

Edureka is a leading e-learning platform providing live instructor-led interactive online
training. We cater to professionals and students across the globe in categories like Big Data
& Hadoop, Business Analytics, NoSQL Databases, Java & Mobile Technologies, System
Engineering, Project Management and Programming.

We have an easy and affordable learning solution that is accessible to millions of learners.
With our students spread across countries like the US, India, UK, Canada, Singapore,
Australia, Middle East, Brazil and many others, we have built a community of over 1 million
learners across the globe.

About The Course

The Big Data and Hadoop training course is designed to provide the knowledge and skills needed
to become a successful Hadoop Developer. The course covers, in depth, concepts such as the Hadoop
Distributed File System, Hadoop clusters (single and multi node), Hadoop 2.x, Flume, Sqoop,
MapReduce, Pig, Hive, HBase, ZooKeeper and Oozie.

Module 1
Understanding Big Data and Hadoop
Learning Objectives - In this module, you will understand Big Data, the limitations of the existing solutions for Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop Architecture, HDFS, Anatomy of File Write and Read, Rack Awareness.

Topics
Big Data
Limitations and Solutions of existing Data Analytics Architecture
Hadoop
Hadoop Features
Hadoop Ecosystem
Hadoop 2.x core components
Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
Anatomy of File Write and Read, Rack Awareness.

Module 2
Hadoop Architecture and HDFS
Learning Objectives - In this module, you will learn the Hadoop Cluster Architecture, Important Configuration files in a Hadoop Cluster, Data Loading Techniques.

Topics
Hadoop 2.x Cluster Architecture - Federation and High Availability
A Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Password-Less SSH
MapReduce Job Execution
Data Loading Techniques: Hadoop Copy Commands, FLUME, SQOOP

Module 3
Hadoop MapReduce Framework - I
Learning Objectives - In this module, you will understand Hadoop MapReduce framework and the working of MapReduce on data stored in HDFS. You will learn about YARN concepts in MapReduce.

Topics
MapReduce Use Cases
Traditional way Vs MapReduce way
Why MapReduce
Hadoop 2.x MapReduce Architecture
Hadoop 2.x MapReduce Components
YARN MR Application Execution Flow
YARN Workflow
Anatomy of MapReduce Program
Demo on MapReduce.

Module 4
Hadoop MapReduce Framework - II
Learning Objectives - In this module, you will understand concepts like Input Splits in MapReduce, Combiner & Partitioner and Demos on MapReduce using different data sets.

Topics
Input Splits
Relation between Input Splits and HDFS Blocks
MapReduce Job Submission Flow
Demo of Input Splits
MapReduce: Combiner & Partitioner
Demo on de-identifying Health Care Data set
Demo on Weather Dataset
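
The Module 4 topics on input splits, combiners and partitioners can be illustrated without a cluster. The following is only a minimal single-process sketch, not the course's demo code: the sample weather records, the function names and the hash-based partitioner are all assumptions made for illustration. It finds the maximum temperature per year, a computation where a combiner is safe because taking a maximum is associative and commutative.

```python
from collections import defaultdict

# Hypothetical sample records, one list per input split: (year, temperature).
SPLIT_A = [(1949, 11), (1950, 22), (1949, 7)]
SPLIT_B = [(1950, -1), (1949, 20), (1951, 15)]


def map_phase(records):
    """Mapper: emit one (year, temperature) pair per input record."""
    for year, temp in records:
        yield year, temp


def combine(pairs):
    """Combiner: keep only the local maximum per year for one input split."""
    local_max = {}
    for year, temp in pairs:
        if year not in local_max or temp > local_max[year]:
            local_max[year] = temp
    return list(local_max.items())


def partition(year, num_reducers):
    """Partitioner (illustrative): route a key to a reducer by hashing it."""
    return hash(year) % num_reducers


def run_job(splits, num_reducers=2):
    """Simulate map -> combine -> partition/shuffle -> reduce in one process."""
    # One dictionary of grouped values per simulated reducer.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for year, temp in combine(map_phase(split)):
            reducer_inputs[partition(year, num_reducers)][year].append(temp)
    # Reducer: compute the global maximum per year.
    result = {}
    for grouped in reducer_inputs:
        for year, temps in grouped.items():
            result[year] = max(temps)
    return result


if __name__ == "__main__":
    print(run_job([SPLIT_A, SPLIT_B]))
```

The combiner shrinks each split's output before the shuffle, and the partitioner guarantees that every value for a given year reaches the same reducer, so the result does not depend on the number of reducers.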

Module 5
Advance MapReduce
Learning Objectives - In this module, you will learn Advance MapReduce concepts such as Counters, Distributed Cache, MRunit, Reduce Join, Custom Input Format, Sequence Input Format and how to deal with complex MapReduce programs.

Topics
Counters
Distributed Cache
MRunit
Reduce Join
Custom Input Format
Sequence Input Format

Module 6
Pig
Learning Objectives - In this module, you will learn Pig, types of use case we can use Pig, tight coupling between Pig and MapReduce, and Pig Latin scripting.

Topics
About Pig
MapReduce Vs Pig
Pig Use Cases
Programming Structure in Pig
Pig Running Modes
Pig components
Pig Execution
Pig Latin Program
Data Models in Pig
Pig Data Types
Pig Latin : Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators
Pig UDF
Pig Demo on Healthcare Data set

Module 7
Hive
Learning Objectives - This module will help you in understanding Hive concepts, Loading and Querying Data in Hive and Hive UDF.

Topics
Hive Background
Hive Use Case
About Hive
Hive Vs Pig
Hive Architecture and Components
Metastore in Hive
Limitations of Hive
Comparison with Traditional Database
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data
Managing Outputs
Hive Script
Hive UDF
Hive Demo on Healthcare Data set

Module 8
Advance Hive and HBase
Learning Objectives - In this module, you will understand Advance Hive concepts such as UDF, dynamic Partitioning. You will also acquire in-depth knowledge of HBase, Hbase Architecture and its components.

Topics
Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive : Thrift Server, User Defined Functions
HBase: Introduction to NoSQL Databases and HBase, HBase v/s RDBMS, HBase Components, HBase Architecture, HBase Cluster Deployment.

Module 9
Advance HBase
Learning Objectives - This module covers advanced HBase concepts, with demos on Bulk Loading and Filters. You will also
learn what ZooKeeper is, how it helps in monitoring a cluster, and why HBase uses ZooKeeper.

Topics
HBase Data Model
HBase Shell
HBase Client API
Data Loading Techniques
ZooKeeper Data Model
Zookeeper Service
Zookeeper
Demos on Bulk Loading
Getting and Inserting Data
Filters in HBase

Module 10
Oozie and Hadoop Project
Learning Objectives - In this module, you will understand how multiple Hadoop ecosystem components work together in a
Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and the specifications of the project. This
module also covers a Flume and Sqoop demo and the Apache Oozie workflow scheduler for Hadoop jobs.

Topics
Flume and Sqoop Demo
Oozie
Oozie Components
Oozie Workflow
Scheduling with Oozie
Demo on Oozie Workflow
Oozie Co-ordinator
Oozie Commands
Oozie Web Console
Hadoop Project Demo

Project Work

Towards the end of the course, you will work on a live project in which you use Pig, Hive, HBase and MapReduce to
perform Big Data analytics.
Here are a few industry-wise Big Data case studies (Finance, Retail, Media, Aviation, etc.) that you can take up as your
project work:

Project #1: Analyze social bookmarking sites to find insights

Industry: Social Media
Data: Information gathered from bookmarking sites like reddit.com, stumbleupon.com, etc. A bookmarking site
allows you to bookmark, review, rate and search various links on any topic. The data is in XML format and contains the URLs of
various links/posts, the categories defining them and the ratings linked with them.
Problem Statement: Analyze the data in Hadoop Eco-system to:
1. Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig and Hive to find the top
rated links based on the user comments, likes, etc.
2. Using MapReduce, convert the semi-structured format (XML data) into a structured format and categorize the user rating as
positive or negative for each of the thousand links.
3. Push the output to HDFS and then feed it into Pig, which splits the data into two parts: Category data and Ratings data.
4. Write a Hive query to analyze the data further and push the output into a relational database (RDBMS) using Sqoop.
5. Use a web server running on Grails/Java/Ruby/Python that renders the result on a website in real time.
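
Steps 2 and 3 of this problem statement (XML to structured rows, a positive/negative label, then a split into category and ratings data) can be prototyped locally before writing the MapReduce and Pig jobs. The sketch below is only an illustration: the XML element names (`post`, `url`, `category`, `rating`), the sample URLs and the rating threshold are assumptions, since the brochure does not describe the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real dataset's element names may differ.
SAMPLE_XML = """
<posts>
  <post>
    <url>http://example.com/a</url>
    <category>technology</category>
    <rating>4</rating>
  </post>
  <post>
    <url>http://example.com/b</url>
    <category>travel</category>
    <rating>2</rating>
  </post>
</posts>
"""

# Assumed cut-off: ratings at or above this value count as positive.
POSITIVE_THRESHOLD = 3


def parse_posts(xml_text, threshold=POSITIVE_THRESHOLD):
    """Turn semi-structured XML into (url, category, rating, label) rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for post in root.iter("post"):
        rating = int(post.findtext("rating"))
        label = "positive" if rating >= threshold else "negative"
        rows.append((post.findtext("url"), post.findtext("category"), rating, label))
    return rows


def split_rows(rows):
    """Split structured rows into category data and ratings data, keyed by URL."""
    category_data = [(url, category) for url, category, _, _ in rows]
    ratings_data = [(url, rating, label) for url, _, rating, label in rows]
    return category_data, ratings_data


if __name__ == "__main__":
    categories, ratings = split_rows(parse_posts(SAMPLE_XML))
    print(categories)
    print(ratings)
```

In the project itself, the body of `parse_posts` would correspond to the mapper logic, and `split_rows` to the split performed in Pig.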

Project #2: Customer Complaints Analysis

Industry: Retail
Data: Publicly available dataset, containing a few hundred thousand observations with attributes like: CustomerId, Payment Mode, Product
Details, Complaint, Location, Status of the complaint, etc.
Problem Statement: Analyze the data in Hadoop Eco-system to:
1. Get the number of complaints filed for each product
2. Get the total number of complaints filed from a particular location
3. Get the list of complaints that received no timely response, grouped by location
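
The three questions above are all group-and-count aggregations, so it can help to check the expected answers on a small sample before running them on the cluster. This is a minimal sketch under stated assumptions: the record layout, the field names (`customer_id`, `product`, `location`, `timely_response`) and the sample values are hypothetical and not taken from the actual dataset.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; the real dataset has more attributes.
COMPLAINTS = [
    {"customer_id": 101, "product": "Laptop", "location": "Pune", "timely_response": True},
    {"customer_id": 102, "product": "Phone", "location": "Delhi", "timely_response": False},
    {"customer_id": 103, "product": "Laptop", "location": "Delhi", "timely_response": False},
    {"customer_id": 104, "product": "Laptop", "location": "Pune", "timely_response": False},
]


def complaints_per_product(records):
    """Question 1: number of complaints filed for each product."""
    return Counter(r["product"] for r in records)


def complaints_from(records, location):
    """Question 2: total number of complaints filed from one location."""
    return sum(1 for r in records if r["location"] == location)


def untimely_by_location(records):
    """Question 3: customer ids of complaints without a timely response, per location."""
    grouped = defaultdict(list)
    for r in records:
        if not r["timely_response"]:
            grouped[r["location"]].append(r["customer_id"])
    return dict(grouped)


if __name__ == "__main__":
    print(complaints_per_product(COMPLAINTS))
    print(complaints_from(COMPLAINTS, "Delhi"))
    print(untimely_by_location(COMPLAINTS))
```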

Project #3: Tourism Data Analysis

Industry: Tourism
Data: The dataset comprises attributes like: City pair (Combination of from and to), Adults traveling, Seniors traveling, Children
traveling, Air booking price, Car booking price, etc.
Problem Statement: Find the following insights from the data:
1. Top 20 destinations people travel to most: the most popular destinations, ranked by the number of trips booked for each
destination
2. Top 20 locations from which most trips start, based on booked trip count
3. Top 20 high air-revenue destinations, i.e. the 20 cities that generate the highest airline revenue for travel, so that discount offers
can be given to attract more bookings for these destinations
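
Each of these insights is a "top N by count or by sum" query. The following is a small local sketch of that pattern, not the project solution: the representation of a city pair as an (origin, destination) tuple, the city codes and the prices are assumptions made for illustration, and N is a parameter so the sample data can stay short.

```python
from collections import Counter, defaultdict

# Hypothetical sample bookings: ((from_city, to_city), air_booking_price).
BOOKINGS = [
    (("DEL", "BOM"), 100.0),
    (("DEL", "BOM"), 150.0),
    (("BLR", "BOM"), 120.0),
    (("DEL", "GOI"), 400.0),
    (("BLR", "GOI"), 300.0),
    (("BOM", "DEL"), 90.0),
]


def top_destinations(bookings, n=20):
    """Destinations ranked by number of booked trips."""
    counts = Counter(to_city for (_, to_city), _ in bookings)
    return counts.most_common(n)


def top_origins(bookings, n=20):
    """Starting locations ranked by number of booked trips."""
    counts = Counter(from_city for (from_city, _), _ in bookings)
    return counts.most_common(n)


def top_air_revenue(bookings, n=20):
    """Destinations ranked by total air booking price."""
    revenue = defaultdict(float)
    for (_, to_city), air_price in bookings:
        revenue[to_city] += air_price
    return sorted(revenue.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    print(top_destinations(BOOKINGS, n=2))
    print(top_origins(BOOKINGS, n=2))
    print(top_air_revenue(BOOKINGS, n=2))
```

In Hive, the same pattern is usually written as a GROUP BY followed by ORDER BY and LIMIT.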

Project #4: Airline Data Analysis
Industry: Aviation
Data: Publicly available dataset which contains the flight details of various airlines like : Airport id, Name of the airport, Main city
served by airport, Country or territory where airport is located, Code of Airport, Decimal degrees, Hours offset from UTC,
Timezone, etc.
Problem Statement: Analyze the airlines data to:
1. Find the list of airports operating in the country
2. Find the list of airlines having zero stops
3. Find the list of airlines operating with code share
4. Find which country or territory has the highest number of airports
5. Find the list of active airlines in the United States

Project #5: Analyze Loan Dataset

Industry: Banking and Finance
Data: Publicly available dataset which contains complete details of all the loans issued, including the current loan status (Current,
Late, Fully Paid, etc.) and latest payment information.
Problem Statement: Find the number of cases per location, categorize the count by the reason for taking the loan, and
display the average risk score

Project #6: Analyze Movie Ratings

Industry: Media
Data: Publicly available data from sites like Rotten Tomatoes, IMDb, etc.
Problem Statement: Analyze the movie ratings by different users to:
1. Get the user who has rated the most movies
2. Get the user who has rated the fewest movies
3. Get the total number of movies rated by users belonging to a specific occupation
4. Get the number of underage users

Project #7: Analyze YouTube data

Industry: Social Media
Data: Information about YouTube videos, with attributes like: VideoID, Uploader, Age, Category, Length, views, ratings,
comments, etc.
Problem Statement: Find the top 5 categories in which the most videos are uploaded, the top 10 rated videos and the
top 10 most viewed videos
Apart from these, there are some twenty more use cases to choose from:
Market Data Analysis
Twitter Data Analysis
Olympics Data Analysis, etc.
