edureka!
Discover Learning
Data Engineer Masters Program
About Edureka
Edureka is one of the world’s largest and most effective online education platforms for
technology professionals. In a span of 10 years, 100,000+ students from over 176 countries
have upskilled themselves with the help of our online courses. Since our inception, we have
been dedicated to helping technology professionals from all corners of the world learn
Programming, Data Science, Big Data, Cloud Computing, DevOps, Business Analytics, Java &
Mobile Technologies, Software Testing, Web Development, System Engineering, Project
Management, Digital Marketing, Business Intelligence, Cybersecurity, RPA and more.
We have an easy and affordable learning solution that is accessible to millions of learners. With
our learners spread across countries like the US, India, UK, Canada, Singapore, Australia, Middle
East, Brazil, and many others, we have built a community of over 1 million learners across the
globe.
About the Program
Edureka’s Data Engineer Masters program is curated by industry experts to provide learners
with a deep understanding of the principles and practices of data engineering through its
extensive coursework and hands-on projects. The well-researched curriculum enables learners
to design and build data pipelines, manage databases, and develop data infrastructure to meet
the requirements of any organization. Unleash the power of data and accelerate your career—
join the global revolution now!
www.edureka.co © Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
Index
1 Linux Fundamentals
2 Apache Spark and Scala Certification Training Course
3 MongoDB Certification Training Course
4 Azure Fundamentals
5 Big Data Hadoop Certification Training Course
6 PySpark Certification Training Course
7 Microsoft SQL Server Certification Course
8 DP 203: Data Engineering on Microsoft Azure
9 Microsoft Power BI Certification Training Course
*Depending on industry requirements, Edureka may make changes to the course curriculum
Linux Fundamentals (Self-paced)
Course Curriculum
Course Outline
Module 1: Linux Fundamentals
Topics:
• History of Linux
• Linux vs Unix
• Features of Linux
• Components of Linux OS
• Architecture of Linux OS
• Linux Distribution
• Shell Scripting
• User Interface in Linux
• Linux Commands
Module 2: User Administration
Topics:
• File Systems and its Types
• Software Package Management
• Users in Linux
• User Groups in Linux
• File/Folder Permissions
• Special Permissions
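As a small illustration of the file/folder and special permission topics above, the following Python sketch decodes numeric mode values into the symbolic form that `ls -l` prints. The example mode values and the helper name `describe` are chosen for illustration and are not taken from the course material.

```python
import stat

# Example mode values (illustrative, not from the course material).
# The leading digits encode the file type; the last four octal digits
# encode special bits (setuid/setgid/sticky) and owner/group/other permissions.
EXAMPLE_MODES = {
    "regular file, rwxr-xr--": 0o100754,
    "setuid executable": 0o104755,
    "sticky world-writable directory": 0o041777,
}

def describe(mode: int) -> str:
    """Return the ls-style permission string for a raw st_mode value."""
    return stat.filemode(mode)

for label, mode in EXAMPLE_MODES.items():
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    print(f"{label}: {oct(stat.S_IMODE(mode))} -> {describe(mode)}")
```

The setuid and sticky bits show up as `s` and `t` in the execute positions, which is how special permissions are usually spotted in a directory listing.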
Module 3: Shell Scripting
Topics:
• Process Management
• Process Synchronization
• Some Basic Linux Commands
• Scripting
• BASH Scripting
• Expect Scripting
Module 4: Networking
Topics:
• OSI Layers
• Protocols
• DNS
• ICMP
• Packet Capturing Tools
• Linux Firewalls
• Iptables
• Linux Security
Apache Spark and Scala Certification Training (Self-paced)
Course Curriculum
Course Outline
Module 1: Introduction to Big Data Hadoop and Spark
Topics:
• What is Big Data?
• Big Data Customer Scenarios
• Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
• How Hadoop Solves the Big Data Problem?
• What is Hadoop?
• Hadoop’s Key Characteristics
• Hadoop Ecosystem and HDFS
• Hadoop Core Components
• Rack Awareness and Block Replication
• YARN and its Advantage
• Hadoop Cluster and its Architecture
• Hadoop: Different Cluster Modes
• Big Data Analytics with Batch & Real-time Processing
• Why Spark is needed?
• What is Spark?
• How Spark differs from other frameworks?
• Spark at Yahoo!
Module 2: Introduction to Scala and Apache Spark
Topics:
• What is Scala?
• Scala in other Frameworks
• Basic Scala Operations
• Control Structures in Scala
• Collections in Scala - Array
• Why Scala for Spark?
• Introduction to Scala REPL
• Variable Types in Scala
• Foreach loop, Functions, and Procedures
• ArrayBuffer, Map, Tuples, Lists, and more
Module 3: Functional Programming and OOPs Concepts in Scala
Topics:
• Functional Programming
• Anonymous Functions
• Getters and Setters
• Properties with only Getters
• Singletons
• Overriding Methods
• Higher Order Functions
• Class in Scala
• Custom Getters and Setters
• Auxiliary Constructor and Primary Constructor
• Extending a Class
• Traits as Interfaces
• Layered Traits
Module 4: Deep Dive into Apache Spark Framework
Topics:
• Spark’s Place in Hadoop Ecosystem
• Spark Components & its Architecture
• Spark Deployment Modes
• Introduction to Spark Shell
• Writing your first Spark Job Using SBT
• Submitting Spark Job
• Spark Web UI
• Data Ingestion using Sqoop
Module 5: Playing with Spark RDDs
Topics:
• Challenges in Existing Computing Methods
• Probable Solution & How RDD Solves the Problem
• What is RDD, Its Functions, Transformations & Actions?
• Data Loading and Saving Through RDDs
• Key-Value Pair RDDs
• Other Pair RDDs
• RDD Lineage
• RDD Persistence
• WordCount Program Using RDD Concepts
• RDD Partitioning & How It Helps Achieve Parallelization
• Passing Functions to Spark
Module 6: DataFrames and Spark SQL
Topics:
• Need for Spark SQL
• What is Spark SQL?
• Spark SQL Architecture
• SQL Context in Spark SQL
• User Defined Functions
• Data Frames & Datasets
• Interoperating with RDDs
• JSON and Parquet File Formats
• Loading Data through Different Sources
• Spark – Hive Integration
Module 7: Machine Learning using Spark MLlib
Topics:
• Why Machine Learning?
• What is Machine Learning?
• Where Machine Learning is Used?
• Face Detection: USE CASE
• Different Types of Machine Learning Techniques
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib
Module 8: Deep Dive into Spark MLlib
Topics:
• Supervised Learning - Linear Regression, Logistic Regression, Decision Tree, Random
Forest
• Unsupervised Learning - K-Means Clustering & How It Works with MLlib
• Analysis on US Election Data using MLlib (K-Means)
Module 9: Understanding Apache Kafka & Apache Flume
Topics:
• Need for Kafka
• Core Concepts of Kafka
• Where is Kafka Used?
• What is Kafka?
• Kafka Architecture
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Need of Apache Flume
• What is Apache Flume?
• Flume Sources
• Flume Channels
• Integrating Apache Flume and Apache Kafka
• Basic Flume Architecture
• Flume Sinks
• Flume Configuration
Module 10: Apache Spark Streaming - Processing Multiple Batches
Topics:
• Drawbacks in Existing Computing Methods
• Why Streaming is Necessary?
• What is Spark Streaming?
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They Are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators
Module 11: Apache Spark Streaming - Data Sources
Topics:
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
• Perform Twitter Sentiment Analysis Using Spark Streaming
Module 12: In Class Project
Learning Objectives
Work on an end-to-end Financial domain project covering all the major concepts of Spark
taught during the course.
Module 13: Spark GraphX (Self-Paced)
Learning Objectives
In this module, you will be learning the key concepts of Spark GraphX programming and
operations along with different GraphX algorithms and their implementations.
MongoDB Certification Training Course
(Self-paced)
Course Curriculum
Course Outline
Module 1: Introduction to MongoDB - Architecture and Installation
Topics:
• Understanding the basic concepts of a Database
• Database categories: What is NoSQL? Why NoSQL? Benefits over RDBMS
• Types of NoSQL Databases, NoSQL vs. SQL Comparison, ACID & BASE Properties
• CAP Theorem, implementing NoSQL and what is MongoDB?
• Overview of MongoDB, Design Goals for MongoDB Server and Database, MongoDB tools
• Understanding the following: Collection, Documents and Key/ Values, etc.
• Introduction to JSON and BSON documents
• Environment setup (live Hands-on) and using various MongoDB tools available in the
MongoDB Package
• Case study discussion
Module 2: Schema Design and Data Modelling
Topics:
• Data Modelling Concepts
• Why Data Modelling? Data Modelling Approach
• Analogy between RDBMS & MongoDB Data Model, MongoDB Data Model (Embedding
& Linking)
• Challenges for Data Modelling in MongoDB
• Data Model Examples and Patterns
• Model Relationships between Documents
• Model Tree Structures
• Model Specific Application Contexts
• Use Case Discussion of Data Modelling
Module 3: CRUD Operations
Topics:
• MongoDB Development Architecture
• MongoDB Production Architecture
• MongoDB CRUD Introduction, MongoDB CRUD Concepts
• MongoDB CRUD Concerns (Read & Write Operations)
• Concern Levels, Journaling, etc.
• Cursor Query Optimizations, Query Behavior in MongoDB
• Distributed Read & Write Queries
• MongoDB Datatypes
• MongoDB CRUD Syntax & Queries (Live Hands on)
Module 4: Indexing and Aggregation Framework
Topics:
• Index Introduction, Index Concepts, Index Types, Index Properties
• Index Creation and Indexing Reference
• Introduction to Aggregation
• Approach to Aggregation
• Types of Aggregation (Pipeline, MapReduce & Single Purpose)
• Performance Tuning
Module 5: MongoDB Administration
Topics:
• Administration concepts in MongoDB
• Monitoring issues related to Database
• Monitoring at Server, Database, Collection level, and various Monitoring tools related to
MongoDB
• Database Profiling, Locks, Memory Usage, Number of Connections, Page Faults, etc.
• Backup and Recovery Methods for MongoDB
• Export and Import of Data to and from MongoDB
• Run time configuration of MongoDB
• Production notes/ best practices
• Data Management in MongoDB (Capped Collections/Expired Data from TTL), Hands-on
Administrative Tasks
Module 6: Scalability and Availability
Topics:
• Introduction to Replication (High Availability)
• Concepts around Replication
• What is Replica Set and Master Slave Replication?
• Type of Replication in MongoDB
• How to set up a replicated cluster and manage replica sets
• Introduction to Sharding (Horizontal Scaling)
• Concepts around Sharding: what shards and shard keys are
• Config Server, Query Router, etc.
• How to set up Sharding
• Type of Sharding (Hash Based, Range Based etc.), and Managing Shards
Module 7: MongoDB Security
Topics:
• Security Introduction
• Security Concepts
• Integration of MongoDB with Jaspersoft
• Integration of MongoDB with Pentaho
• Integration of MongoDB with Hadoop/Hive
• Integration of MongoDB with Java
• Integration of MongoDB with GUI Tool Robomongo
• Case Study MongoDB and Java
Module 8: Application Engineering and MongoDB Tools
Topics:
• MongoDB Package Components
• Configuration File Options
• MongoDB Limits and Thresholds
• Connection String URI Format/ Integration of any compatible tool with MongoDB API
and Drivers for MongoDB
• MMS (MongoDB Monitoring Service)
• HTTP and Rest Interface
• Integration of MongoDB with Hadoop and Data Migration MongoDB with Hadoop
(MongoDB to Hive)
• Integration with R
Module 9: MongoDB on the Cloud
Topics:
• Overview of MongoDB Cloud products
• Using Cloud Manager to monitor MongoDB deployments
• Introduction to MongoDB Stitch
• MongoDB Cloud Atlas
• MongoDB Cloud Manager
• Working with MongoDB Ops Manager
Module 10: Diagnostics and Fixes
Topics:
• Overview of tools
• MongoDB Diagnostic Tools
• Diagnostics Commands
• MongoDB Deployment
• Setup & Configuration, Scalability, Management & Security
• Slow Queries
• Connectivity
Azure Fundamentals
Course Curriculum
Course Outline
Module 1: Introduction to Azure and Azure VM
Topics:
• What is Cloud?
• Cloud Computing Patterns
• Service Models
• What is Azure
• Azure Features
• Azure Platform
• Azure Services
• What is Virtual Machine?
• Why Virtual Networks?
• Virtual Networks and its Components
• Azure Portal
Hands-On:
• Exploring Microsoft Azure Portal
Module 2: Azure Storage
Topics:
• Why Storage?
• What is Azure Storage?
• Components of Azure Storage
• Blobs
• Queues
• File System
• Tables
Hands-On:
• Creating a Storage Account
• Creating Blobs
• Creating Queues
Module 3: Azure Virtual Network
Topics:
• Why Virtual Networks?
• What is a Virtual Network?
• Azure Subnets
• Network Security Groups
• Virtual Network Architecture
Hands-On:
• Creating Network Security Groups
• Create a Virtual Network
• Create a Webserver VM and Database VM
• Configure the Network Security Groups for respective VMs
Module 4: Azure Comparison
Topics:
• What is AWS?
• What is Azure?
• AWS vs Azure
• AWS vs Azure vs GCP
• General Cloud Questions
• General Azure Questions
• Azure Interview Questions
Big Data Hadoop Certification Training Course
Course Curriculum
Course Outline
Module 1: Understanding Big Data and Hadoop
Topics:
• Introduction to Big Data & Big Data Challenges
• Limitations & Solutions of Big Data Architecture
• Data types and Operations
• Hadoop Storage: HDFS (Hadoop Distributed File System)
• Hadoop & its Features
• Hadoop Processing: MapReduce Framework
• Different Hadoop Distributions
• Hadoop Ecosystem
• Hadoop 2.x Core Components
Module 2: Hadoop Architecture and HDFS
Topics:
• Typical Production Hadoop Cluster
• Common Hadoop Shell Commands
• Hadoop 2.x Cluster Architecture
• Hadoop Cluster Modes
• Federation and High Availability Architecture
• Hadoop 2.x Configuration Files
• Single Node Cluster & Multi-Node Cluster set up
• Basic Hadoop Administration
Module 3: Hadoop MapReduce Framework
Topics:
• Traditional way vs MapReduce way
• Why MapReduce
• YARN Components
• YARN Architecture
• YARN MapReduce Application Execution Flow
• YARN Workflow
• Anatomy of MapReduce Program
• MapReduce: Combiner & Partitioner
• Input Splits, Relation between Input Splits and HDFS Blocks
• Demo of Health Care Dataset
• Demo of Weather Dataset
Module 4: Advanced Hadoop MapReduce
Topics:
• Counters
• Distributed Cache
• MRUnit
• Reduce Join
• Custom Input Format
• Sequence Input Format
• XML file Parsing using MapReduce
Module 5: Apache Pig
Topics:
• Introduction to Apache Pig
• MapReduce vs Pig
• Pig Components & Pig Execution
• Pig Latin Programs
• Pig Data Types & Data Models in Pig
• Shell and Utility Commands
• Pig UDF & Pig Streaming
• Testing Pig scripts with PigUnit
• Aviation use case in Pig
• Pig Demo of Healthcare Dataset
Module 6: Apache Hive
Topics:
• Introduction to Apache Hive
• Hive vs Pig
• Hive Architecture and Components
• Hive Metastore
• Limitations of Hive
• Hive Partition
• Comparison with Traditional Database
• Hive Data Types and Data Models
• Hive Tables (Managed Tables & External Tables)
• Hive Bucketing
• Importing Data Hive Script & Hive UDF
• Querying Data & Managing Outputs
• Retail use case in Hive
• Hive Demo on Healthcare Dataset
Module 7: Advanced Apache Hive and HBase
Topics:
• Hive QL: Joining Tables, Dynamic Partitioning
• Custom MapReduce Scripts
• Hive Indexes and views
• Hive Query Optimizers
• Hive Thrift Server
• Hive UDF
• HBase vs RDBMS
• HBase Components
• HBase Architecture
• HBase Run Modes
• HBase Configuration
• HBase Cluster Deployment
• Apache HBase: Introduction to NoSQL Databases and HBase
Module 8: Advanced Apache HBase
Topics:
• HBase Data Model
• HBase Shell
• HBase Client API
• Hive Data Loading Techniques
• Apache ZooKeeper
• Introduction to ZooKeeper
• Data Model
• ZooKeeper Service
• HBase Bulk Loading
• Getting and Inserting Data
• HBase Filters
Module 9: Processing Distributed Data with Apache Spark
Topics:
• What is Spark
• Spark Ecosystem
• Spark Components
• What is Scala
• Why Scala
• SparkContext
• Spark RDD
Module 10: Oozie and Hadoop Project
Topics:
• Oozie
• Topics
• Oozie Components
• Oozie Workflow
• Demo of Oozie Workflow
• Scheduling Jobs with Oozie Scheduler
• Oozie Coordinator
• Oozie Commands
• Oozie Web Console
• Oozie for MapReduce
• Hive in Oozie
• Combining flow of MapReduce Jobs
• Hadoop Project Demo
• Hadoop Talend Integration
Module 11: Certification Project
Topics:
• Find out the frequency of books published each year. (Hint: Sample dataset provided)
• Find out in which year the maximum number of books was published
• Find out how many books were published based on ranking in the year 2002
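The three certification tasks above can be sketched in plain Python as a starting point. The records below are made-up placeholders, and the field names (`title`, `year`, `rank`) are assumptions, since the actual sample dataset and its schema are not shown here. The third function is only one possible reading of the ranking task. In the course itself, the same logic would presumably be expressed with Hadoop tools such as MapReduce, Pig, or Hive.

```python
from collections import Counter

# Hypothetical sample records; the real dataset and its columns are assumptions.
BOOKS = [
    {"title": "Book A", "year": 2001, "rank": 1},
    {"title": "Book B", "year": 2002, "rank": 1},
    {"title": "Book C", "year": 2002, "rank": 2},
    {"title": "Book D", "year": 2002, "rank": 1},
    {"title": "Book E", "year": 2003, "rank": 3},
]

def books_per_year(books):
    """Task 1: frequency of books published each year."""
    return Counter(book["year"] for book in books)

def busiest_year(books):
    """Task 2: the year with the maximum number of published books."""
    year, _count = books_per_year(books).most_common(1)[0]
    return year

def books_per_rank_in_year(books, year):
    """Task 3 (one reading): count of books per ranking within a given year."""
    return Counter(book["rank"] for book in books if book["year"] == year)

print(dict(books_per_year(BOOKS)))
print(busiest_year(BOOKS))
print(dict(books_per_rank_in_year(BOOKS, 2002)))
```

Each function is a group-and-count step, which maps naturally onto a MapReduce job (emit the key, sum the ones) or a `GROUP BY` query in Hive.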
PySpark Certification Training Course
Course Curriculum
Course Outline
Module 1: Introduction to Big Data Hadoop and Spark
Topics:
• What is Big Data?
• Big Data Customer Scenarios
• Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
• How Hadoop Solves the Big Data Problem?
• What is Hadoop?
• Hadoop’s Key Characteristics
• Hadoop Ecosystem and HDFS
• Hadoop Core Components
• Rack Awareness and Block Replication
• YARN and its Advantage
• Hadoop Cluster and its Architecture
• Hadoop: Different Cluster Modes
• Big Data Analytics with Batch & Real-Time Processing
• Why is Spark Needed?
• What is Spark?
• How Spark Differs from its Competitors?
• Spark at eBay
• Spark’s Place in Hadoop Ecosystem
Module 2: Introduction to Python for Apache Spark
Topics:
• Overview of Python
• Different Applications where Python is Used
• Values, Types, Variables
• Operands and Expressions
• Conditional Statements
• Loops
• Command Line Arguments
• Writing to the Screen
• Python files I/O Functions
• Numbers
• Strings and related operations
• Tuples and related operations
• Lists and related operations
• Dictionaries and related operations
• Sets and related operations
Module 3: Functions, OOPS, and Modules in Python
Topics:
• Spark Components & its Architecture
• Spark Deployment Modes
• Introduction to PySpark Shell
• Submitting PySpark Job
• Spark Web UI
• Writing your first PySpark Job Using Jupyter Notebook
• Data Ingestion using Sqoop
Module 4: Playing with Spark RDDs
Topics:
• Challenges in Existing Computing Methods
• Probable Solution & How RDD Solves the Problem
• What is RDD, Its Operations, Transformations & Actions
• Data Loading and Saving Through RDDs
• Key-Value Pair RDDs
• Other Pair RDDs, Two Pair RDDs
• RDD Lineage
• RDD Persistence
• WordCount Program Using RDD Concepts
• RDD Partitioning & How it Helps Achieve Parallelization
• Passing Functions to Spark
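To give a feel for the WordCount topic in this module, here is a minimal sketch in plain Python. It deliberately does not use PySpark, so it runs without a cluster. The three stages mirror the RDD operations commonly used for word count (`flatMap`, `map`, `reduceByKey`). The input lines and the function name `word_count` are hypothetical.

```python
from functools import reduce
from itertools import groupby

# Hypothetical input lines standing in for a text file loaded into an RDD.
LINES = ["spark makes big data simple", "big data needs spark"]

def word_count(lines):
    # flatMap-like step: split every line into words.
    words = [word for line in lines for word in line.split()]
    # map-like step: pair each word with an initial count of 1.
    pairs = [(word, 1) for word in words]
    # reduceByKey-like step: sort and group pairs by word, then sum the counts.
    pairs.sort(key=lambda pair: pair[0])
    return {
        word: reduce(lambda a, b: a + b, (count for _, count in group))
        for word, group in groupby(pairs, key=lambda pair: pair[0])
    }

counts = word_count(LINES)
print(counts)
```

In Spark the grouping step is distributed across partitions, which is where the partitioning and parallelization topics listed above come into play.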
Module 5: DataFrames and Spark SQL
Topics:
• Need for Spark SQL
• What is Spark SQL
• Spark SQL Architecture
• SQL Context in Spark SQL
• Schema RDDs
• User Defined Functions
• Data Frames & Datasets
• Interoperating with RDDs
• JSON and Parquet File Formats
• Loading Data through Different Sources
• Spark-Hive Integration
Module 6: Machine Learning using Spark MLlib
Topics:
• Why Machine Learning
• What is Machine Learning
• Where Machine Learning is used
• Face Detection: USE CASE
• Different Types of Machine Learning Techniques
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib
Module 7: Deep Dive into Spark MLlib
Topics:
• Supervised Learning: Linear Regression, Logistic Regression, Decision Tree, Random
Forest
• Unsupervised Learning: K-Means Clustering & How It Works with MLlib
• Analysis of US Election Data using MLlib (K-Means)
Module 8: Understanding Apache Kafka and Apache Flume
Topics:
• Need for Kafka
• What is Kafka
• Core Concepts of Kafka
• Kafka Architecture
• Where is Kafka Used
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Kafka Producer and Consumer Java API
• Need of Apache Flume
• What is Apache Flume
• Basic Flume Architecture
• Flume Sources
• Flume Sinks
• Flume Channels
• Flume Configuration
• Integrating Apache Flume and Apache Kafka
Module 9: Apache Spark Streaming - Processing Multiple Batches
Topics:
• Drawbacks in Existing Computing Methods
• Why Streaming is Necessary
• What is Spark Streaming
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They Are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators
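To make the windowed operators above more concrete, here is a plain-Python sketch of a sliding-window reduction over micro-batches, which is the idea behind operators such as ReduceByWindow. It does not use Spark. The batch values, the window length, the slide interval, and the function name are all made up for illustration, and only complete windows are emitted, which is a simplification.

```python
# Each inner list stands for one micro-batch of values (hypothetical data).
BATCHES = [[1, 2], [3], [4, 5], [6], [7, 8]]

def reduce_by_window(batches, window_length, slide_interval):
    """Sum all values in each window of `window_length` batches,
    advancing the window by `slide_interval` batches each time."""
    results = []
    for end in range(window_length, len(batches) + 1, slide_interval):
        window = batches[end - window_length:end]
        results.append(sum(value for batch in window for value in batch))
    return results

print(reduce_by_window(BATCHES, window_length=3, slide_interval=2))
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap by one batch, so the same data contributes to more than one result.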
Module 10: Apache Spark Streaming - Data Sources
Topics:
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
Module 11: Spark GraphX (Self-Paced)
Topics:
• Introduction to Spark GraphX
• Information about a Graph
• GraphX Basic APIs and Operations
• Spark GraphX Algorithm - PageRank, Personalized PageRank, Triangle Count, Shortest
Paths, Connected Components, Strongly Connected Components, Label Propagation
Microsoft SQL Server Certification Course
Course Curriculum
Course Outline
Module 1: Introduction to RDBMS and SQL Server
Topics:
• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Management Studio (SSMS)
Module 2: Database Normalization, DDL, and DML Commands
Topics:
• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Management Studio (SSMS)
Module 3: Querying Data using Built-in Functions and T-SQL
Topics:
• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Management Studio (SSMS)
Module 4: Working with Advanced SQL
Topics:
• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Management Studio (SSMS)
Module 5: UDFs, Backup & Restore in SQL Server
Topics:
• Ranking Functions
• Date and Time Functions
• UDFs (User Defined Functions)
• Backup and Restore Databases
• Triggers
• Index
Module 6: SQL Server Optimization and Performance
Topics:
• Introduction to Optimization
• Understanding Performance
• Optimizing Queries
• Indexing for Performance
• Performance Tuning
Module 7: MS SQL User Administration
Topics:
• Architecture of Security Model
• Server Authentication Modes
• Managing Users, Roles, and Logins
• Permissions (GRANT, DENY, REVOKE)
• Understanding Server Agents
• Server Agent Jobs and Schedules
Module 8: Advanced SQL Server Administration
Topics:
• Database Mails via Server Agents
• Activity Monitor
• Log Shipping
• Configuring Log Shipping
• Transparent Data Encryption
Module 9: Introduction to Azure
Topics:
• Introduction to Azure
• Creating Azure Account
• Creating and configuring Azure VMs
• Azure SQL Database
• Accessing Azure Services
• Query SQL database in Azure
Module 10: Migrating SQL Workloads to Azure
Topics:
• Introduction to Microsoft Data Migration Assistant
• Setting up Migration Assistant
• Migrate Local SQL Server Database to Azure SQL Database
• Migration Data Checks
• Best Practices
DP 203: Data Engineering on Microsoft Azure
Course Curriculum
Course Outline
Module 1: Introduction to Microsoft Azure and its Services
Topics:
• Azure Subscriptions
• Azure Resources
• Azure Free Tier Account
• Azure Resource Manager
• Azure Resource Manager Template
• Azure Storage
• Types of Azure Storage
Module 2: Introduction to Azure Data Engineering
Topics:
• Understand the evolving world of data
• Data abundance
• Understanding the Data Engineering Problem
• Understand job responsibilities
• Understanding Data Engineering Processing - Extract Transform and Load
• Overview of Azure Data Engineering Services
• Understand data storage in Azure Storage
• Understand data storage in Azure Data Lake Storage
• Understand Azure Cosmos DB
• Understand Azure SQL Database
• Understand Azure Synapse Analytics
• Understand Azure Stream Analytics
• Understand Azure HDInsight
• Understand other Azure data services
Module 3: Storing Data in Azure
Topics:
• How to choose an Azure Storage service
• Create an Azure Storage Account
• Connect an app to Azure Storage API
• Connect to your Azure storage account
• Explore Azure Storage security features
• Understand storage account keys
• Understand shared access signatures
• Control network access to your storage account
• Understand Advanced Threat Protection for Azure Storage
• Explore Azure Data Lake Storage security features
• Introduction to Blob storage
• What are blobs?
• Design a storage organization strategy
Module 4: Azure Data Factory Part - I
Topics:
• Integrate data with Azure Data Factory or Azure Synapse Pipeline
• Understand Azure Data Factory
• Describe data integration patterns
• Explain the data factory process
• Understand Azure Data Factory components
• Azure Data Factory security
• Set-up Azure Data Factory
• Create linked services
• Create datasets
• Create data factory activities and pipelines
• Manage integration runtimes
• Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline
• List the data factory ingestion methods
• Describe data factory connectors
• Understand data ingestion security considerations
Module 5: Azure Data Factory Part - II
Topics:
• Explain Data Factory transformation methods
• Describe Data Factory transformation types
• Debug mapping data flow
• Describe slowly changing dimensions
• Choose between slowly changing dimension types
• Understand data factory control flow
• Work with data factory pipelines
• Debug data factory pipelines
• Add parameters to data factory components
• Execute data factory packages
• Describe SQL Server Integration Services
• Understand the Azure-SSIS integration runtime
• Set up the Azure-SSIS integration runtime
• Run SSIS packages in Azure Data Factory
• Migrate SSIS packages to Azure Data Factory
• Configure a git repository with a development factory
• Create and merge a feature branch
• Deploy a release pipeline
• Visually monitor pipeline runs
• Integrate with Azure Monitor
• Set up alerts
• Rerun pipeline runs
Module 6: Azure Synapse Analytics Part - I
Topics:
• What is Azure Synapse Analytics
• How Azure Synapse Analytics works
• When to use Azure Synapse Analytics
• Create Azure Synapse Analytics workspace
• Describe Azure Synapse Analytics SQL
• Explain Apache Spark in Azure Synapse Analytics
• Orchestrate data integration with Azure Synapse pipelines
• Visualize your analytics with Power BI
• Understand hybrid transactional analytical processing with Azure Synapse Link
• Use Azure Synapse Studio
Module 7: Azure Synapse Analytics Part - II
Topics:
• Functions
• Function Parameters
• Global variables
• Variable scope and Returning Values
• Lambda Functions
• Object Oriented Concepts
• Standard Libraries
• Modules Used in Python (OS, Sys, Date and Time etc.)
• The Import statements
• Module search path
• Package installation ways
• Errors and Exception Handling
• Handling multiple exceptions
Module 8: Work with Data Warehouses using Azure Synapse Analytics - Part I
Topics:
• Describe a modern data warehouse
• Define a modern data warehouse architecture
• Exercise - Identify modern data warehouse architecture components
• Design ingestion patterns for a modern data warehouse
• Understand data storage for a modern data warehouse
• Understand file formats and structure for a modern data warehouse
• Prepare and transform data with Azure Synapse Analytics
Module 9: Work with Data Warehouses using Azure Synapse Analytics - Part II
Topics:
• Understand data load design goals
• Explain load methods into Azure Synapse Analytics
• Manage source data files
• Manage singleton updates
• Set-up dedicated data load accounts
• Implement workload management
• Simplify ingestion with the Copy Activity
• Understand performance issues related to tables
Module 10: Optimizing Data Queries in Azure
Topics:
• Understand table distribution design
• Use indexes to improve query performance
• Understand query plans
• Create statistics to improve query performance
• Improve query performance with materialized views
• Use read committed snapshot for data consistency
• How do statistics affect a query plan?
• Describe the integration methods between SQL and Spark pools in Azure Synapse
Analytics
• Understand the use cases for SQL and Spark pools integration
• Exercise: Integrate SQL and Spark pools in Azure Synapse Analytics
• Externalize the use of Spark pools within Azure Synapse Workspace
• Transfer data outside the Synapse workspace using the PySpark connector
• Explore the development tools for Azure Synapse Analytics
• Understand Transact-SQL language capabilities for Azure Synapse Analytics
Module 11: Managing Workloads in Azure Synapse Analytics
Topics:
• Scale compute resources in Azure Synapse Analytics
• Pause compute in Azure Synapse Analytics
• Manage workloads in Azure Synapse Analytics
• Use Azure Advisor to review recommendations
• Use dynamic management views to identify and troubleshoot query performance
• Understand skewed data and space usage
• Understand network security options for Azure Synapse Analytics
• Configure Conditional Access
• Configure authentication
• Manage authorization through column and row level security
• Manage sensitive data with Dynamic Data Masking
• Implement encryption in Azure Synapse Analytics
Module 12: Deep Dive into Azure Databricks
Topics:
• Get started with Azure Databricks
• Identify Azure Databricks workloads
• Understand key concepts
• Use Apache Spark in Azure Databricks
• Create a Spark cluster
• Use Spark in notebooks
• Use Spark to work with data files
• Visualize data
• Get Started with Delta Lake
• Create Delta Lake tables
• Create and query catalog tables
• Use Delta Lake for streaming data
• Get started with SQL Warehouses
• Create databases and tables
• Create queries and dashboards
• Understand Azure Databricks notebooks and pipelines
• Create a linked service for Azure Databricks
• Use a Notebook activity in a pipeline
• Use parameters in a notebook
Microsoft Power BI Certification Training Course
*Depending on industry requirements, Edureka may make changes to the course curriculum
Course Curriculum
Course Outline
Module 1: Introduction to Power BI
Topics:
• Introduction to Business Intelligence
• Self-Service Business Intelligence (SSBI)
• Introduction to Power BI
• Traditional BI vs. Power BI
• Power BI vs. Tableau vs. QlikView
• Uses of Power BI
• The Flow of Work in Power BI
• Working with Power BI
• Basic Components of Power BI
• Comparison of Power BI Versions
• Introduction to Building Blocks of Power BI
• Data model and importance of Data Modeling
Module 2: Power BI Desktop and Data Transformation
Topics:
• Data Sources in Power BI Desktop
• Loading Data in Power BI Desktop
• Views in Power BI Desktop
• Query Editor in Power BI
• Transform, Clean, Shape, and Model Data
• Manage Data Relationships
• Editing a Relationship
• Cross Filter Direction
• Saving a Work File
• Measures
Module 3: Data Analysis Expressions (DAX)
Topics:
• Introduction to DAX
• Importance of DAX
• Data Types in DAX
• DAX Operators
• DAX Calculation Types
• Steps to Create Calculated Columns
• Steps to Create Calculated Tables
• Measures in DAX
• DAX Syntax
• DAX Functions
• DAX Tables and Filtering
Module 4: Data Visualization
Topics:
• Introduction to Visuals in Power BI
• Visualization Charts in Power BI
• Matrixes and Tables
• Slicers and Map Visualizations
• Gauges and Single Number Cards
• Modifying Colors in Charts and Visuals
• Shapes, Text Boxes, and Images
• Custom Visuals
• Page Layout and Formatting
• Bookmarks and Selection Pane
• KPI Visuals
• Z-order
• Grouping and Binning
Module 5: Power BI Service
Topics:
• Introduction to Power BI Service
• Creating a Dashboard
• Quick Insights in Power BI
• Configuring a Dashboard
• Power BI Q&A
• Ask Questions about your Data
• Power BI Embedded
• Bookmarks and buttons
Module 6: Connectivity Modes
Topics:
• Data Sources Supported in Power BI
• Exploring Live Connections to Data Sources
• Connecting Directly to SQL Azure
• Connecting Directly to SQL Server Analysis Services/MySQL
• Import Power View and Power Pivot
• Data Gateways
• DirectQuery vs. Import Connectivity Modes
• Connecting Power BI in Excel
Module 7: Power BI Report Server
Topics:
• What is Power BI Report Server?
• Key Features of Report Server
• The architecture of the Report Server
• Limitations of Report Server
• Power BI Report Server vs. Power BI Service
• Acquiring and Installing Power BI Report Server
• What is a Web Portal?
• Paginated Reports
• Row Level Security
Module 8: R & Python in Power BI
Topics:
• Brief concepts about R
• R Programming Concepts
• Create R Scripts for BI
• Python Programming
• Python Scripts in BI
• Python integration with Power BI
Module 9: Advanced Analytics in Power BI
Topics:
• Use Parameters
• Create a dataflow
• Introduction to Anomaly Detection
• Introduction to Smart Narrative
• Introduction to Sensitivity labels in Power BI
• Deployment Pipeline
• Hands-on:
• Connecting with Power BI service
• Creating a dataflow
• Creating scorecard
Module 10: In-Class Project
Industry - 1: Retail Sector
Problem Statement:
Global Super Store is a large online store with worldwide operations. It takes orders, delivers
products across the globe, and deals in all the major product categories, such as furniture,
office supplies, and technology. As a sales manager for this store, you want to analyze product
sales using the provided historical data. This analysis will help you plan your inventory and
business processes accordingly, and understand product performance and customer
behavior.
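To make the kind of analysis in this problem statement concrete, here is a minimal Python sketch that totals sales per product category and ranks the categories. It is only an illustration, not part of the project itself: the sample rows, the field names category and sales, and the function sales_by_category are all assumptions made for this example, and the project is expected to be built in Power BI with the provided dataset.

```python
from collections import defaultdict

# Hypothetical sample rows; the real project uses the provided historical data.
orders = [
    {"category": "Furniture", "sales": 200.0},
    {"category": "Technology", "sales": 500.0},
    {"category": "Office Supplies", "sales": 50.0},
    {"category": "Technology", "sales": 250.0},
]


def sales_by_category(rows):
    """Sum sales per category and return (category, total) pairs, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["sales"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranked = sales_by_category(orders)
for category, total in ranked:
    print(f"{category}: {total:.2f}")
```

The same grouping corresponds to what a bar chart of sales by category would display in a report.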
Topics:
• Power BI Service
• Row Level Security
• Visuals and Charts
• Power BI Desktop
• Handling Workspaces
Industry - 2: Sales and Finance
Problem Statement:
PEW Retail Inc. Ltd has subsidiaries across the globe that sell products to customers
spread across different geographies. The company wants a single consolidated
dashboard.
Topics:
• Power BI Gateway
• Power BI Service
• Data Visualization
• Dashboard Management