BigData Engineer Complete Syllabus

This document outlines a BigData Engineer syllabus covering Hadoop, AWS, Azure, Databricks, and machine learning. The course spans 14 topics delivered in approximately 60 hours of online weekend training. Topic areas include Python fundamentals, Hadoop, Hive, Sqoop, Spark, Kafka, HBase, Airflow, AWS, Azure, Databricks, statistics, and machine learning. The training is hands-on, and the course materials include a practice Hadoop VM cluster and recorded videos.

Uploaded by Chepuri Sravan Kumar


HADOOP | AWS | AZURE | DATABRICKS | MACHINE LEARNING

Presented By

Contact us @ +91 9715 010 010

Our Policy

▪ No Fast Track (30% Theory + 70% Hands-on)
▪ 100% Refund if you are not satisfied
▪ Interactive and concept-based
▪ Latest Version upgrades (Hadoop 3x & Spark 3x)
▪ Private WhatsApp Group for your queries

Session Details

▪ Course Duration : approx. 60 Hours
▪ Mode of Training : Online
▪ Programming : Python 3x
▪ Session Timing : Weekend (Saturday and Sunday)
▪ Per Class : 3 Hours

Course Materials

▪ Practice Hadoop VM Cluster (with Latest Version)
▪ Recorded Videos
▪ Practice Materials
▪ Online eLibraries
▪ Sample Interview Questions with Answers

Course Overview

Topic 1 – Python3 Fundamentals
Topic 2 – Hadoop 3x
Topic 3 – Hive Query Language
Topic 4 – SQOOP (Recording)
Topic 5 – Spark (RDD, DF, SQL & ML)
Topic 6 – Kafka Streaming
Topic 7 – HBase (NoSQL)
Topic 8 – Airflow Scheduler
Topic 9 – AWS (Amazon Web Services)
Topic 10 – Azure Services
Topic 11 – Databricks Cluster
Topic 12 – Statistics Fundamentals
Topic 13 – Machine Learning
Topic 14 – Certification & Interview Tips

Topic - 1
Python Fundamentals Hands-on

▪ Python 3 Installation
▪ Variables
▪ Data Types: Standard, Int, String, Float, Char, Boolean
▪ Collections: List, Tuple, Dictionary, Set
▪ Classes: Objects, Method, Inheritance
▪ Functions: Lambda, Built-in Functions, UDF
▪ Control Statements: if else, elif, for, while
▪ String Slicing
▪ NumPy
▪ Pandas

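As a taste of the hands-on part, here is a minimal sketch touching several of the Topic 1 items (collections, control statements, lambda, string slicing, and a class with inheritance). It is illustrative only and not taken from the course materials; every name in it (`topics`, `Course`, `OnlineCourse`, and so on) is hypothetical.

```python
# Illustrative sketch only; all names below are hypothetical.

# Collections: list, tuple, dictionary, set
topics = ["Hadoop", "Hive", "Spark"]
session_days = ("Saturday", "Sunday")
hours = {"course": 60, "per_class": 3}
clouds = {"AWS", "Azure", "AWS"}  # duplicates collapse in a set

# Control statements: for, if, elif, else
labels = []
for name in topics:
    if name == "Spark":
        labels.append(name.upper())
    elif name.startswith("H"):
        labels.append(name.lower())
    else:
        labels.append(name)

# Lambda combined with string slicing: keep the first three characters
shorten = lambda text: text[:3]
short_names = [shorten(name) for name in topics]


# Classes: a base class and a subclass that overrides a method
class Course:
    def __init__(self, title):
        self.title = title

    def describe(self):
        return f"Course: {self.title}"


class OnlineCourse(Course):
    def describe(self):
        # Reuse the parent implementation and extend it
        return super().describe() + " (online)"


summary = OnlineCourse("BigData Engineer").describe()
number_of_classes = hours["course"] // hours["per_class"]
```

The same snippet can be pasted into a Python 3 shell and inspected variable by variable.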
Topic - 2
Hadoop Introduction

▪ Brief Introduction to Big Data
▪ Why Bigdata needed now?
▪ 4V's of Bigdata
▪ Google Concepts
▪ History of Databases
▪ Hadoop VS Google Architecture
▪ Hadoop 1x vs 2x vs 3x
▪ Hadoop Layers
▪ History of Hadoop & Ecosystems
▪ Hadoop Daemons: Name Node, Secondary Name Node, Data Node, Resource Manager / Job Tracker, Node Manager / Task Tracker
▪ Heart Beat
▪ Block Report
▪ High Availability (HA)
▪ Replication versus Erasure Coding
▪ Block size
▪ Input Split
▪ Special File format

Topic - 2
BigData Hadoop & YARN (cont.)

▪ MR1 VS MR2
▪ Mapper
▪ Reducer
▪ Combiner
▪ Hadoop Commands Hands-on
▪ Container
▪ YARN
▪ Application Master
▪ Hadoop Job Opportunities

Topic - 3
Hive Introduction

▪ Introduction on Hive
▪ Hive Architecture
▪ Hive Meta Store
▪ Hive Server 1 vs 2
▪ Beeline VS Hive CLI
▪ Thrift Server
▪ Create Table with ORC
▪ Create Table with Parquet
▪ Create Table using Avro
▪ Managed VS External Table
▪ Connect hive via Beeline
▪ Partitioning
▪ Static Partition
▪ Dynamic Partition
▪ Bucketing
▪ SerDe
▪ Hive Joins
▪ Complex Data Types (JSON)
▪ Top 10 Hive Performance Tuning
▪ Sample Project 1

Topic - 5
Apache Spark Introduction

▪ Introduction of Spark
▪ Spark Architecture
▪ Spark Components
▪ RDD Transformation
▪ RDD Actions
▪ RDD to DataFrame
▪ Spark DataFrames
▪ DataFrameReader
▪ DataFrameWriter
▪ DataFrame Transformation
▪ DataFrame Actions
▪ Create a DataFrame via JDBC
▪ Create a DataFrame using Parquet file
▪ Create a DataFrame using ORC
▪ Create a DataFrame using Avro
▪ Create a DataFrame using Hive Tables
▪ Create a DataFrame using JSON
▪ Sample Project 2

Topic - 5
Apache Spark Introduction (cont.)

▪ Spark Cache()
▪ Storage Level Persist
▪ Spark-SQL introduction
▪ Spark-SQL Transformations
▪ Spark-Streaming (DStreams)
▪ Spark-Structured Streaming
▪ Stateful Transformation
▪ Stateless Transformation
▪ Spark – Kafka Integration
▪ Spark ML Libraries
▪ Spark Job Submission via Local
▪ Spark Job Submission via Client Mode
▪ Spark Job Submission via Cluster Mode
▪ Top 10 Performance Tuning in Spark
▪ Sample Project 3

Topic - 6
Kafka Introduction

▪ What is Kafka?
▪ Kafka API Connector
▪ Kafka VS Flume
▪ Kafka Architecture
▪ Producer
▪ Offset
▪ Consumer
▪ Broker
▪ In Sync Replica
▪ Kafka Serialization
▪ Kafka Topic Creation
▪ Kafka - Spark Integration
▪ Sample Project 3

Topic - 7
HBase Introduction

▪ Introduction of HBase
▪ CAP Theorem
▪ HMaster
▪ HRegionServer
▪ Row Key
▪ Column Family
▪ WAL
▪ HQuorumPeer
▪ Data Model Operations
▪ Sample Project 4

Topic - 8
Airflow Introduction

▪ What is Airflow?
▪ How a DAG works
▪ WebServer
▪ Scheduler
▪ Task Instances
▪ Bit shift upstream and downstream
▪ Sensors
▪ Executors
▪ Data Profiling
▪ Adhoc Queries
▪ Operators
▪ Hive Operator
▪ Sqoop Operator
▪ Spark Submit Operator
▪ Sample Project 5

Topic - 9
Amazon Web Services

▪ Introduction of AWS
▪ Data Analytics Services
▪ Simple Storage Service (S3) Creation
▪ Serverless computing (Lambda)
▪ AWS Redshift
▪ Elastic MapReduce (EMR)
▪ AWS RDS
▪ Elastic Compute Cloud (EC2)
▪ Sample Project 6

Topic - 10
Azure Services

▪ Introduction of Azure services
▪ Data Analytics Services
▪ Blob Storage
▪ Virtual Machine (VM)
▪ Azure Data Lake Gen 2
▪ Azure Data Lake Gen 1
▪ SQL Databases
▪ HDInsight
▪ Sample Project 7

Topic - 11
Databricks

▪ Introduction of Databricks
▪ Create an Automated Cluster
▪ Integrate with Azure
▪ Integrate with AWS
▪ DBFS Storages
▪ Schedule a Spark Job
▪ Create an Interactive Cluster
▪ DBFS Magic functions
▪ Sample Project 8

Topic - 12
Statistics Fundamentals

▪ Why Statistics needed?
▪ Types of Data
▪ Uni-variate Analysis
▪ Bi-variate Analysis
▪ Descriptive Statistics
▪ Mean, Median, Mode
▪ VAR, Std Dev and IQR
▪ Skewness
▪ Kurtosis
▪ Inferential Statistics
▪ Central Limit Theorem
▪ Probability Distribution
▪ Hypothesis Testing
▪ Summarize data
▪ Correlation

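For readers new to the descriptive measures listed above, the following minimal sketch computes the mean, median, mode, variance, standard deviation and IQR with Python's standard `statistics` module. The sample values in `scores` are made up for illustration and are not part of the course materials; the IQR shown here relies on the module's default quartile method, which is only one of several conventions.

```python
import statistics

# Hypothetical sample, chosen only to illustrate the measures
scores = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of spread (sample versions, dividing by n - 1)
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)

# Quartiles with the module's default ("exclusive") method,
# then the interquartile range as Q3 - Q1
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(f"mean={mean} median={median} mode={mode}")
print(f"variance={variance} std_dev={std_dev:.3f} iqr={iqr}")
```

Pandas and NumPy expose equivalent functions, though their default quartile interpolation may give slightly different IQR values on small samples.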
Topic - 13
Machine Learning

▪ Introduction of Machine Learning
▪ Types of Machine Learning
▪ ML Jargons
▪ Data Preprocessing
▪ Missing Values
▪ EDA
▪ Handling Uniform Scaling
▪ Overfitting
▪ Underfitting
▪ Confusion Matrix
▪ Project 9: Build a Model using Python Libraries
▪ Project 10: Build a Model using Spark ML Libraries

Topic - 14
Certification & Interview Tips

❑ Tips and tricks for the Cloudera Certification for Spark and Hadoop Developer (CCA 175).
❑ Bigdata interview tips and tricks, and Bigdata interview questions with answers.

Please feel free to reach us if you have any queries.
+91 9715 010 010
