Big Data Engineer Complete Syllabus
HADOOP | AWS | AZURE | DATABRICKS | MACHINE LEARNING
Presented By
Contact us @ +91 9715 010 010
Our Policy
▪ No Fast Track (30% Theory + 70% Hands-on)
▪ 100% Refund if you are not satisfied
▪ Interactive, concept-based sessions
▪ Latest version upgrades (Hadoop 3.x & Spark 3.x)
▪ Private WhatsApp Group for your queries
Session Details
▪ Course Duration: approx. 60 hours
▪ Mode of Training: Online
▪ Programming: Python 3.x
▪ Session Timing: Weekends (Saturday and Sunday)
▪ Per Class: 3 hours
Course Materials
▪ Practice Hadoop VM Cluster (with Latest Version)
▪ Recorded Videos
▪ Practice Materials
▪ Online eLibraries
▪ Sample Interview Questions with Answers
Course Overview
Topic 1 – Python 3 Fundamentals
Topic 2 – Hadoop 3.x
Topic 3 – Hive Query Language
Topic 4 – Sqoop (Recorded Session)
Topic 5 – Spark (RDD, DataFrame, SQL & ML)
Topic 6 – Kafka Streaming
Topic 7 – HBase (NoSQL)
Topic 8 – Airflow Scheduler
Topic 9 – AWS (Amazon Web Services)
Topic 10 – Azure Services
Topic 11 – Databricks Cluster
Topic 12 – Statistics Fundamentals
Topic 13 – Machine Learning
Topic 14 – Certification & Interview Tips
Topic - 1
Python Fundamentals Hands-on
▪ Python 3 Installation
▪ Variables
▪ Standard Data Types: Int, Float, String (including single characters), Boolean
▪ Collections: List, Tuple, Dictionary, Set
▪ Classes: Objects, Methods, Inheritance
▪ Functions: Lambda, Built-in Functions, UDFs (user-defined functions)
▪ Control Statements: if/else, elif, for, while
▪ String Slicing
▪ NumPy
▪ Pandas
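The Topic 1 constructs can be previewed in one short Python 3 script. This is a minimal sketch rather than course material: every variable name and the `Employee`/`Engineer` classes are hypothetical, chosen only to show each listed construct once.

```python
# Illustrative Python 3 basics; all names here are hypothetical examples.

# Collections
skills = ["python", "hadoop", "spark"]      # list: ordered, mutable
version = (3, 12)                           # tuple: ordered, immutable
hours = {"python": 6, "hadoop": 9}          # dict: key -> value
tools = {"hive", "hive", "kafka"}           # set: the duplicate "hive" is dropped

# Lambda passed to a built-in function (sorted is stable for equal keys)
by_length = sorted(skills, key=lambda s: len(s))

# String slicing: text[start:stop:step]
word = "bigdata"
first_three = word[:3]        # 'big'
reversed_word = word[::-1]    # 'atadgib'


def level(h):
    """Control statements: if / elif / else."""
    if h >= 8:
        return "deep"
    elif h >= 4:
        return "core"
    else:
        return "intro"


# for loop over dictionary values
total_hours = 0
for h in hours.values():
    total_hours += h           # ends at 15

# while loop
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1                     # countdown ends as [3, 2, 1]


class Employee:
    def __init__(self, name):
        self.name = name       # instance attribute

    def role(self):            # method
        return "employee"


class Engineer(Employee):      # inheritance: an Engineer is an Employee
    def role(self):            # overrides Employee.role
        return "data engineer"


if __name__ == "__main__":
    print(by_length)                        # ['spark', 'python', 'hadoop']
    print(first_three, reversed_word)       # big atadgib
    print(level(total_hours), countdown)    # deep [3, 2, 1]
    print(Engineer("Asha").role())          # data engineer
```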
Topic - 2
Hadoop Introduction
▪ Introduction to Big Data
▪ Why Is Big Data Needed Now?
▪ 4 V's of Big Data
▪ Brief Overview of Google Concepts
▪ History of Google Architecture
▪ Hadoop vs Databases
▪ History of Hadoop & Its Ecosystem
▪ Hadoop Layers
▪ Hadoop 1.x vs 2.x vs 3.x
▪ Hadoop Daemons: Name Node
▪ Secondary Name Node
▪ Data Node
▪ Resource Manager / Job Tracker
▪ Node Manager / Task Tracker
▪ Heartbeat
▪ Block Report
▪ High Availability (HA)
▪ Replication versus Erasure Coding
▪ Block Size
▪ Input Split
▪ Special File Formats
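The block size, input split, and replication versus erasure coding items reduce to simple arithmetic. The sketch below assumes the Hadoop 2.x/3.x defaults of a 128 MB `dfs.blocksize` and a `dfs.replication` of 3, plus an idealized Reed-Solomon RS(6,3) layout (the RS-6-3 scheme shipped with Hadoop 3). Real HDFS striping adds extra overhead for small files, so the erasure-coding figure is an approximation, and the helper names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # assumed default dfs.blocksize (Hadoop 2.x/3.x)
REPLICATION = 3       # assumed default dfs.replication


def num_blocks(file_mb, block_mb=BLOCK_SIZE_MB):
    """HDFS blocks needed for one file; the last block may be only partly filled."""
    return math.ceil(file_mb / block_mb)


def split_size(block_mb, min_mb=0.0, max_mb=float("inf")):
    """Mirrors FileInputFormat's max(minSize, min(maxSize, blockSize)) rule.

    By default the minimum is effectively 1 byte and the maximum is unbounded,
    so the split size equals the block size.
    """
    return max(min_mb, min(max_mb, block_mb))


def raw_storage_replication(file_mb, replication=REPLICATION):
    """Disk used when every block is stored `replication` times."""
    return file_mb * replication


def raw_storage_erasure(file_mb, data_units=6, parity_units=3):
    """Idealized RS(data, parity): parity_units extra cells per data_units cells."""
    return file_mb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    size = 1000  # a hypothetical 1000 MB file
    print(num_blocks(size))                # 8 blocks (7 full + 1 partial)
    print(split_size(BLOCK_SIZE_MB))       # 128 -> one split per block
    print(raw_storage_replication(size))   # 3000 MB on disk (200% overhead)
    print(raw_storage_erasure(size))       # 1500.0 MB on disk (50% overhead)
```

Under these assumptions erasure coding cuts raw storage from 3x to 1.5x of the logical file size, at the cost of extra CPU and network work whenever lost cells must be reconstructed.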
Topic - 2
Big Data Hadoop & YARN (cont.)
▪ MR1 vs MR2
▪ Mapper
▪ Reducer
▪ Combiner
▪ Hadoop Commands Hands-on
▪ Application Master
▪ Container
▪ YARN
▪ Hadoop Job Opportunities
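The mapper, combiner, and reducer roles can be illustrated without a cluster. The following is a plain-Python simulation of a word-count job, not the Hadoop MapReduce API: each inner list stands in for one input split handled by one map task, the combiner pre-aggregates on the map side, and sorting plus grouping stands in for the shuffle. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Map-side local aggregation, which shrinks the data sent over the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle_and_sort(map_outputs):
    """Merge all map outputs and group them by key before the reduce phase."""
    pairs = sorted((kv for output in map_outputs for kv in output), key=itemgetter(0))
    return groupby(pairs, key=itemgetter(0))


def reducer(word, counts):
    """Sum the partial counts for one key."""
    return word, sum(counts)


def run_job(splits):
    # One simulated map task per split: map every line, then combine locally.
    map_outputs = [combiner(kv for line in split for kv in mapper(line)) for split in splits]
    # Each group is fully consumed by reducer() before groupby advances.
    return dict(reducer(word, (c for _, c in group))
                for word, group in shuffle_and_sort(map_outputs))


if __name__ == "__main__":
    splits = [["big data big"], ["data hadoop", "Big"]]  # hypothetical input
    print(run_job(splits))  # {'big': 3, 'data': 2, 'hadoop': 1}
```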
Topic - 3
Hive Introduction
▪ Introduction to Hive
▪ Hive Architecture
▪ Hive Metastore
▪ HiveServer1 vs HiveServer2
▪ Beeline vs Hive CLI
▪ Create Table with ORC
▪ Create Table with Parquet
▪ Create Table using Avro
▪ Managed vs External Table
▪ Connect Hive Thrift Server via Beeline
▪ Partitioning
▪ Static Partition
▪ Dynamic Partition
▪ Bucketing
▪ SerDe
▪ Hive Joins
▪ Complex Data Types (JSON)
▪ Top 10 Hive Performance Tuning Techniques
▪ Sample Project 1
Topic - 5
Apache Spark Introduction
▪ Introduction to Spark
▪ Spark Architecture
▪ Spark Components
▪ RDD Transformations
▪ RDD Actions
▪ RDD to Spark DataFrame
▪ DataFrameReader
▪ DataFrameWriter
▪ DataFrame Transformations
▪ DataFrame Actions
▪ Create a DataFrame via JDBC
▪ Create a DataFrame using a Parquet File
▪ Create a DataFrame using ORC
▪ Create a DataFrame using Avro
▪ Create a DataFrame using Hive Tables
▪ Create a DataFrame using JSON
▪ Sample Project 2
Topic - 5
Apache Spark Introduction (cont.)
▪ Spark cache()
▪ Persist and Storage Levels
▪ Spark SQL Introduction
▪ Spark SQL Transformations
▪ Spark Streaming (DStreams)
▪ Spark Structured Streaming
▪ Stateful Transformations
▪ Stateless Transformations
▪ Spark – Kafka Integration
▪ Spark ML Libraries
▪ Spark Job Submission via Local Mode
▪ Spark Job Submission via Client Mode
▪ Spark Job Submission via Cluster Mode
▪ Top 10 Performance Tuning Techniques in Spark
▪ Sample Project 3
Topic - 6
Kafka Introduction
▪ What Is Kafka?
▪ Producer
▪ Kafka APIs & Connectors
▪ Kafka vs Flume
▪ Kafka Architecture
▪ Offset
▪ Consumer
▪ Broker
▪ In-Sync Replicas (ISR)
▪ Kafka Serialization
▪ Kafka Topic Creation
▪ Kafka – Spark Integration
▪ Sample Project 3
Topic - 7
HBase Introduction
▪ Introduction to HBase
▪ CAP Theorem
▪ HMaster
▪ HRegionServer
▪ Row Key
▪ Column Family
▪ WAL (Write-Ahead Log)
▪ HQuorumPeer
▪ Data Model Operations
▪ Sample Project 4
Topic - 8
Airflow Introduction
▪ What Is Airflow?
▪ How Airflow Works
▪ DAG
▪ Web Server
▪ Scheduler
▪ Task Instances
▪ Bitshift (>> / <<) Upstream and Downstream Dependencies
▪ Sensors
▪ Executors
▪ Data Profiling
▪ Ad hoc Queries
▪ Operators
▪ Sqoop Operator
▪ Spark Submit Operator
▪ Hive Operator
▪ Sample Project 5
Topic - 9
Amazon Web Services
▪ Introduction to AWS
▪ Data Analytics Services
▪ Simple Storage Service (S3) Creation
▪ Serverless Computing (Lambda)
▪ AWS Redshift
▪ Elastic MapReduce (EMR)
▪ AWS RDS
▪ Elastic Compute Cloud (EC2)
▪ Sample Project 6
Topic - 10
Azure Services
▪ Introduction to Azure Services
▪ Data Analytics Services
▪ Blob Storage
▪ Virtual Machine (VM)
▪ Azure Data Lake Gen 2
▪ Azure Data Lake Gen 1
▪ SQL Databases
▪ HDInsight
▪ Sample Project 7
Topic - 11
Databricks
▪ Introduction to Databricks
▪ Integrate with Azure
▪ Create an Automated Cluster
▪ DBFS Storage
▪ Integrate with AWS
▪ Schedule a Spark Job
▪ Create an Interactive Cluster
▪ DBFS Magic Functions
▪ Sample Project 8
Topic - 12
Statistics Fundamentals
▪ Why Is Statistics Needed?
▪ Types of Data
▪ Univariate Analysis
▪ Bivariate Analysis
▪ Descriptive Statistics
▪ Mean, Median, and Mode
▪ Variance, Standard Deviation, and IQR
▪ Skewness
▪ Kurtosis
▪ Inferential Statistics
▪ Central Limit Theorem
▪ Probability Distribution
▪ Hypothesis Testing
▪ Correlation
▪ Summarize Data
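The descriptive-statistics items can be computed with Python's standard `statistics` module (Python 3.8+ for `quantiles`). This is a minimal sketch under stated assumptions: the data is hypothetical, variance and standard deviation use population formulas, quartiles use the module's default "exclusive" method, and skewness and kurtosis are the plain moment-based (biased) versions, which assume the data has at least two distinct values.

```python
import statistics


def describe(values):
    """Descriptive statistics for a list of numbers (population formulas)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)                 # population standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)   # default 'exclusive' method
    n = len(values)
    # Moment-based (biased) shape measures; std must be non-zero.
    skewness = sum((x - mean) ** 3 for x in values) / n / std ** 3
    kurtosis = sum((x - mean) ** 4 for x in values) / n / std ** 4
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.pvariance(values),
        "std_dev": std,
        "iqr": q3 - q1,
        "skewness": skewness,   # > 0 means a longer right tail
        "kurtosis": kurtosis,   # a normal distribution gives 3
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
    for name, value in describe(sample).items():
        print(f"{name:>9}: {value}")
```

Library functions such as pandas' `skew()` and `kurt()` apply a sample bias correction (and `kurt()` reports excess kurtosis), so their values will differ from these moment-based figures.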
Topic - 13
Machine Learning
▪ Introduction to Machine Learning
▪ ML Jargon (Key Terms)
▪ Types of Machine Learning
▪ Data Preprocessing
▪ Handling Missing Values
▪ EDA (Exploratory Data Analysis)
▪ Uniform Scaling
▪ Overfitting
▪ Underfitting
▪ Confusion Matrix
▪ Project 9: Build a Model Using Python Libraries
▪ Project 10: Build a Model Using Spark ML Libraries
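A confusion matrix for a binary classifier is just four counts, and the usual scores follow from them. The plain-Python sketch below is illustrative only: the labels, predictions, and the `confusion_matrix`/`scores` helper names are hypothetical, and class `1` is assumed to be the positive class.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


def scores(cm):
    """Accuracy, precision, recall and F1 derived from the four counts."""
    total = sum(cm.values())
    accuracy = (cm["TP"] + cm["TN"]) / total
    precision = cm["TP"] / (cm["TP"] + cm["FP"]) if cm["TP"] + cm["FP"] else 0.0
    recall = cm["TP"] / (cm["TP"] + cm["FN"]) if cm["TP"] + cm["FN"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical predictions
    cm = confusion_matrix(y_true, y_pred)
    print(cm)          # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
    print(scores(cm))  # every score is 0.75 for this data
```

For labels 0 and 1, scikit-learn's `sklearn.metrics.confusion_matrix` returns the same four counts as a 2x2 array laid out as `[[TN, FP], [FN, TP]]`.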
Topic - 14
Certification & Interview Tips
❑ Tips and tricks for the Cloudera Certified Associate (CCA) Spark and Hadoop Developer exam (CCA 175).
❑ Big Data interview tips and tricks, plus Big Data interview questions with answers.
Please feel free to reach us if you have any queries.
+91 9715 010 010