Big Data & Hadoop - Course Curriculum
About Edureka
We have an easy and affordable learning solution that is accessible to millions of learners. With our students spread across countries and regions such as the US, India, UK, Canada, Singapore, Australia, the Middle East, Brazil and many others, we have built a community of over 1 million learners across the globe.
Module 1
Understanding Big Data and Hadoop
Learning Objectives - In this module, you will understand Big Data, the limitations of the existing solutions for the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop Architecture, HDFS, Anatomy of File Write and Read, and Rack Awareness.
Topics
Big Data
Limitations and Solutions of existing Data Analytics Architecture
Hadoop
Hadoop Features
Hadoop Ecosystem
Hadoop 2.x core components
Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
Anatomy of File Write and Read, Rack Awareness

Module 2
Hadoop Architecture and HDFS
Learning Objectives - In this module, you will learn the Hadoop Cluster Architecture, the important configuration files in a Hadoop Cluster, and Data Loading Techniques.
Topics
Hadoop 2.x Cluster Architecture - Federation and High Availability
A Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Password-Less SSH
MapReduce Job Execution
Data Loading Techniques: Hadoop Copy Commands
Flume
Sqoop

Module 3
Hadoop MapReduce Framework - I
Learning Objectives - In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will learn about YARN concepts in MapReduce.
Topics
MapReduce Use Cases
Traditional way Vs MapReduce way
Why MapReduce
Hadoop 2.x MapReduce Architecture
Hadoop 2.x MapReduce Components
YARN MR Application Execution Flow
YARN Workflow
Anatomy of MapReduce Program
Demo on MapReduce

Module 4
Hadoop MapReduce Framework - II
Learning Objectives - In this module, you will understand concepts like Input Splits in MapReduce and the Combiner & Partitioner, with demos on MapReduce using different data sets.
Topics
Input Splits
Relation between Input Splits and HDFS Blocks
MapReduce Job Submission Flow
Demo of Input Splits
MapReduce: Combiner & Partitioner
Demo on de-identifying Health Care Data set
Demo on Weather Dataset
Module 5
Advanced MapReduce
Learning Objectives - In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format and how to deal with complex MapReduce programs.
Topics
Counters
Distributed Cache
MRUnit
Reduce Join
Custom Input Format
Sequence Input Format

Module 6
Pig
Learning Objectives - In this module, you will learn Pig, the types of use cases where Pig can be used, the tight coupling between Pig and MapReduce, and Pig Latin scripting.
Topics
About Pig
MapReduce Vs Pig
Pig Use Cases
Programming Structure in Pig
Pig Running Modes
Pig Components
Pig Execution
Pig Latin Program
Data Models in Pig
Pig Data Types
Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators
Pig UDF
Pig Demo on Healthcare Data set

Module 7
Hive
Learning Objectives - This module will help you understand Hive concepts, loading and querying data in Hive, and Hive UDFs.
Topics
Hive Background
Hive Use Case
About Hive
Hive Vs Pig
Hive Architecture and Components
Metastore in Hive
Limitations of Hive
Comparison with Traditional Database
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data
Managing Outputs
Hive Script
Hive UDF
Hive Demo on Healthcare Data set

Module 8
Advanced Hive and HBase
Learning Objectives - In this module, you will understand advanced Hive concepts such as UDFs and dynamic partitioning. You will also acquire in-depth knowledge of HBase, the HBase Architecture and its components.
Topics
Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive: Thrift Server, User Defined Functions
HBase: Introduction to NoSQL Databases and HBase, HBase v/s RDBMS, HBase Components, HBase Architecture, HBase Cluster Deployment
Module 9
Advanced HBase
Learning Objectives - This module will cover advanced HBase concepts. We will see demos on Bulk Loading and Filters. You will also learn what ZooKeeper is, how it helps in monitoring a cluster, and why HBase uses ZooKeeper.
Topics
HBase Data Model
HBase Shell
HBase Client API
Data Loading Techniques
ZooKeeper Data Model
ZooKeeper Service
ZooKeeper
Demos on Bulk Loading
Getting and Inserting Data
Filters in HBase
Module 10
Oozie and Hadoop Project
Learning Objectives - In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and the specifications of the project. This module will also cover Flume & Sqoop demos and Apache Oozie, the workflow scheduler for Hadoop jobs.
Topics
Flume and Sqoop Demo
Oozie
Oozie Components
Oozie Workflow
Scheduling with Oozie
Demo on Oozie Workflow
Oozie Coordinator
Oozie Commands
Oozie Web Console
Hadoop Project Demo
Project Work
Towards the end of the course, you will work on a live project using Pig, Hive, HBase and MapReduce to perform Big Data analytics.
Here are a few industry-wise Big Data case studies (e.g. Finance, Retail, Media, Aviation) which you can take up as your project work:
Project #4: Airline Data Analysis
Industry: Aviation
Data: A publicly available dataset containing the flight details of various airlines, with fields such as: Airport ID, name of the airport, main city served by the airport, country or territory where the airport is located, airport code, decimal degrees, hours offset from UTC, timezone, etc.
Problem Statement: Analyze the airlines data to:
1. Find the list of airports operating in the country
2. Find the list of airlines having zero stops
3. Find the list of airlines operating with code share
4. Find which country or territory has the highest number of airports
5. Find the list of active airlines in the United States
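As an illustration of problem 4, the sketch below imitates the map, shuffle and reduce phases of a MapReduce job in plain Python rather than on a Hadoop cluster. It assumes each airport record is one comma-separated line with the country or territory in the fourth field, following the field order listed above; the function names and sample records are hypothetical.

```python
import csv
from collections import defaultdict
from io import StringIO

# Assumption: 0-based index of "country or territory" in each airport record.
COUNTRY_FIELD = 3


def map_phase(lines):
    """Mapper: emit (country, 1) for every airport record."""
    for record in csv.reader(lines):  # csv handles quoted names containing commas
        if len(record) > COUNTRY_FIELD:
            yield record[COUNTRY_FIELD], 1


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the 1s emitted for each country."""
    return {country: sum(values) for country, values in groups.items()}


def country_with_most_airports(lines):
    """Return (country, airport_count) for the country with the most airports, or None."""
    counts = reduce_phase(shuffle_phase(map_phase(lines)))
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically for deterministic output.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample records, not taken from the real dataset.
sample = StringIO(
    '1,"Airport A","City A","Country X","AAA"\n'
    '2,"Airport B","City B","Country Y","BBB"\n'
    '3,"Airport C, Terminal 1","City C","Country X","CCC"\n'
)
print(country_with_most_airports(sample))
```

In a real Hadoop job the mapper and reducer would run on separate nodes and the framework would perform the shuffle; a combiner could pre-sum the 1s on each mapper to reduce network traffic.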