Big Data Analytics
This course provides practical, foundation-level training that enables immediate and effective
participation in big data projects. At the end of this course, the student will become familiar with the
fundamental concepts of Big Data management and analytics; will become competent in recognizing
challenges faced by applications dealing with very large volumes of data as well as in proposing
scalable solutions for them; and will be able to understand how Big Data impacts business intelligence,
scientific discovery, and our day-to-day life.
Core topics:
Introduction to the Big Data problem. Current challenges, trends, and applications
Technologies for Big Data management
Hands-on prototype projects to understand the actual working of these technologies
Module 1: Introduction to Big Data Analytics
The Evolution of Data Management, Defining Big Data, Traditional and advanced analytics. History of big data,
its elements, career-related knowledge, advantages, and disadvantages. Application perspective of Big Data
covering topics such as using big data in marketing, analytics, retail, hospitality, consumer goods, defense, etc.
Module 2: Introduction to Big Data and the Hadoop ecosystem
This module focuses on Data Explosion, Types of Data, Need for Big Data, Big Data and Its Sources,
Characteristics of Big Data Technology, Leveraging Multiple Sources of Data, Hadoop/Spark based technologies
for Handling Big Data.
Module 3: Interactive analysis
Hive: Introducing Hive, Getting Started with Hive, Hive Variables, Hive Properties, Data types in Hive, Loading
Files into Tables, Application in Hive, Inserting Data into Tables, Update in Hive.
Introduction to schema on write, dimensional models to exploit/analyse business metrics, data pond for
analysis, metrics and KPIs, drill-down/roll-up, slice/dice for big data, implementing a sales analysis system
with Hive
Module 4: Advanced Analytics (Structured and Time Series Analysis)
Introduction to Analysis Base Tables, Dimension reduction, ETL for analysis, data pond for analytics,
Implementing a customer profiling system with SparkSQL
HBase: HBase Introduction, Characteristics of HBase, Companies Using HBase, HBase Architecture, Storage
Model of HBase, Row Distribution of Data between Region Servers, Data Storage in HBase, Data Model, HBase
vs. RDBMS, Implementing a time series analysis system for IoT
Module 5: Text Analytics
Pig: Introducing Pig, the Pig Architecture, Benefits of Pig, Installing Pig, Properties of Pig, Running Pig, Running
Pig Programs, Pig Latin Structure, Application Flow.
Text analysis, tokenizing, filtering, scoring, corpus creation, implementing a sentiment analysis system for
tweets
Module 6: Real-time Analytics
Introduction to Lambda architecture, aggregations, anomaly detection, complex event processing (CEP), batch layer models, real-time
thresholds, implementing a real-time analytics system for impressions
Dr. Sridhar Vaithianathan