Azure Databricks End-to-End Project with Unity Catalog and CI/CD
Author: Shanmukh Sattiraju
https://www.linkedin.com/in/shanmukh-sattiraju/
Project Architecture
(diagram) Source datasets (Pedal Cycle, Two Wheeler, LGV) arrive in the /landing container, flow through the Bronze, Silver, and Gold layers (each backed by its own schema), are stored in Azure Data Lake Storage Gen2, and are governed by Unity Catalog.
Continuous Integration + Continuous Deployment
Prerequisites
• No Azure Databricks experience needed; we start from scratch
• An Azure account for hands-on practice
• Basic knowledge of Python and SQL
• Basic knowledge of the Azure cloud environment
What you’ll get from this course
• 15+ hours of updated learning content
• A hands-on end-to-end project
• A practical understanding of Delta Lake
• Implementing CI/CD in Databricks
• Lifetime access to this course
• A certificate of completion at the end of the course
Environment Setup
(diagram) A Databricks Access Connector is granted the Storage Blob Data Contributor role on the Azure Data Lake Gen2 storage (container and folders), which lets Unity Catalog in the Azure Databricks workspace access the storage.
Azure Databricks – An Introduction
Big data approach
(diagram) A single computer for data storage and processing (monolithic), with one machine's RAM and storage, versus a distributed approach: adding multiple machines, each with its own RAM and storage, to achieve parallel processing.
Drawbacks of MapReduce
(diagram) Traditional Hadoop MapReduce processing: each iteration reads data from HDFS disk, processes it, and writes the result back to HDFS disk, so every iteration pays the full cost of disk I/O.
Emergence of Spark
(diagram) Data is read once from HDFS (or any cloud storage), and successive iterations analyze it in RAM, avoiding the repeated disk reads and writes of MapReduce.
Apache Spark
Apache Spark is an open-source, in-memory application framework for distributed data processing and iterative analysis on massive data volumes.
In simple terms, Spark is a
• Compute engine
• Unified data processing system
Apache Spark Ecosystem
(diagram) From top to bottom:
• Higher-level APIs: Spark SQL (interactive queries), Spark Streaming, Spark ML (MLlib), Spark Graph (graph computation), SparkR (R on Spark)
• DataFrame / Dataset APIs
• Spark Core API (Scala, Java, Python, SQL, R) with the RDD (Resilient Distributed Dataset) APIs
• Spark Engine: the distributed compute engine
• Cluster or resource manager: YARN, Mesos, Standalone, Kubernetes
• Distributed storage: Azure Storage, Amazon S3, GCP
What is Databricks?
• Unified interface
• Open analytics platform
• Compute management
• Notebooks
• Integrates with cloud storage
• MLflow for ML modeling
• Git integration
• SQL warehouses
How does Databricks work with Azure?
• Unified billing
• Integration with data services
• Microsoft Entra ID (previously Azure Active Directory)
• Azure Data Factory
• Power BI
• Azure DevOps
Azure Databricks Architecture
(diagram) The Azure Databricks service spans two planes. The control plane (managed by Databricks) hosts the Databricks web application, notebooks, jobs & queries, and the cluster manager; users authenticate with SSO through Microsoft Entra ID (previously Azure Active Directory). The compute plane (in your Azure subscription) contains the cluster of virtual machines inside a vNet, plus Azure Storage and Azure Data Lake Gen2, and gets/shares data with external data sources. Between the planes flow cluster launch requests, logs, job results, and metadata.
Azure Databricks Compute
• A cluster is a set of computation resources and configurations on which you run your workloads
• Workloads can be:
1. A set of commands in a notebook
2. A job that you run as an automated workflow
• Cluster types:
1. All-purpose cluster
• To execute a set of commands in a notebook
2. Job cluster
• To execute a job that you run as an automated workflow
Cluster Types
1. All-purpose cluster
▪ To interactively run the commands in your notebook
▪ Multiple users can share such clusters for collaborative interactive analysis
▪ You can terminate and restart these clusters, and attach or detach them from multiple notebooks
▪ You can choose
▪ Multi-node cluster: the driver node and worker nodes run on separate machines
▪ Single-node cluster: a single machine that acts as the driver node only
2. Job cluster
▪ To run a job as an automated workflow
▪ Databricks creates a new job cluster for the run and terminates it automatically when the job is complete
▪ You cannot restart a job cluster
To create a new cluster, you choose:
• The policy
• The access mode, which controls the security features used when interacting with data
• The runtime version
• The cluster worker and driver node types
Cluster Access Modes

Single user
• Visible to user: Always
• UC support: Yes
• Supported languages: Python, SQL, Scala, R
• Notes: Can be assigned to and used by a single user.

Shared
• Visible to user: Always (Premium plan or above required)
• UC support: Yes
• Supported languages: Python (on Databricks Runtime 11.1 and above), SQL, Scala (on Unity Catalog-enabled clusters using Databricks Runtime 13.3 and above)
• Notes: Can be used by multiple users, with data isolation among users.

No Isolation Shared
• Visible to user: Yes; admins can hide this cluster type by enforcing user isolation in the admin settings page
• UC support: No
• Supported languages: Python, SQL, Scala, R
• Notes: There is a related account-level setting for No Isolation Shared clusters.
Cluster Runtime version:
• Databricks Runtime is the set of core components that run on your clusters
So which version to use?
• For all purpose compute:
• Databricks recommends using the latest Databricks Runtime version.
• Using the most current version will ensure you have the latest optimizations and most up-to-date
compatibility between your code and preloaded packages.
• For Job compute:
• As these will be operational workloads, consider using the Long Term Support (LTS) Databricks
Runtime version.
• Using the LTS version will ensure you don’t run into compatibility issues and can thoroughly test
your workload before upgrading.
• For ML Workloads:
• For advanced machine learning use cases, consider the specialized ML Runtime version.
Cluster Policies (in Unity Catalog)
• Policies are a set of rules configured by admins
• They are used to limit the configuration options available to users when they create a cluster
• Policies have access control lists that regulate which users and groups have access to them
• Any user with the Unrestricted policy can create any type of cluster
Cluster Pools (in Unity Catalog)
• Refer to the documentation
• See also the videos from Ramesh and Scholarnest
• https://www.databricks.com/blog/2019/11/11/databricks-pools-speed-up-data-pipelines.html
Magic Commands
• You can use multiple languages in one notebook
• You need to specify the language magic command at the beginning of a cell
• By default, the entire notebook works in the language that you choose at the top

Magic command | Language | Description
%python | Python | Execute a Python query against the Spark context
%scala | Scala | Execute a Scala query against the Spark context
%sql | Spark SQL | Execute a Spark SQL query against the Spark context
%r | R | Execute an R query against the Spark context
DBUtils
• Azure Databricks provides a set of utilities (DBUtils) to interact efficiently with your environment from notebooks
• The most commonly used DBUtils are:
• File system utilities
• Widget utilities
• Notebook utilities
File System Utilities
dbutils.fs provides utilities for working with file systems.
Below are the available utilities:
cp: Copies a file or directory, possibly across file systems
head: Returns up to the first given number of bytes of a file (64 KB by default)
ls: Lists the contents of a directory
mkdirs: Creates the given directory if it does not exist, also creating any necessary parent directories
mv: Moves a file or directory, possibly across file systems
put: Writes the given string out to a file
rm: Removes a file or directory
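A minimal sketch of these utilities in a notebook cell; the /tmp/demo paths are illustrative placeholders:

dbutils.fs.mkdirs("/tmp/demo")                                # create a directory (and parents)
dbutils.fs.put("/tmp/demo/hello.txt", "hello", True)          # write a string (overwrite=True)
print(dbutils.fs.head("/tmp/demo/hello.txt"))                 # read up to the first 64 KB
display(dbutils.fs.ls("/tmp/demo"))                           # list directory contents
dbutils.fs.cp("/tmp/demo/hello.txt", "/tmp/demo/copy.txt")    # copy a file
dbutils.fs.rm("/tmp/demo", True)                              # remove the directory recursively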
Widgets Utilities
dbutils.widgets helps notebooks accept input values through parameters.
Widget types are:
• combobox: Creates a combobox input widget with a given name, default value and choices
• dropdown: Creates a dropdown input widget with a given name, default value and choices
• get: Retrieves the current value of an input widget
• multiselect: Creates a multiselect input widget with a given name, default value and choices
• remove: Removes an input widget from the notebook
• removeAll: Removes all widgets in the notebook
• text: Creates a text input widget with a given name and default value
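A minimal sketch of widget usage; the widget names and values are illustrative:

dbutils.widgets.text("env", "dev")                                        # text widget, default "dev"
dbutils.widgets.dropdown("layer", "bronze", ["bronze", "silver", "gold"]) # dropdown with choices
env = dbutils.widgets.get("env")                                          # read the current value
print(f"Running for environment: {env}")
dbutils.widgets.remove("layer")                                           # remove one widget
dbutils.widgets.removeAll()                                               # remove all widgets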
Notebook Utilities
• exit: This method lets you exit a notebook with a value
• run: This method runs a notebook and returns its exit value
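A minimal sketch of chaining notebooks; the child notebook path and parameter are illustrative:

# In the parent notebook: run a child notebook with a 60-second timeout and parameters
result = dbutils.notebook.run("./child_notebook", 60, {"env": "dev"})
print(result)  # whatever the child passed to dbutils.notebook.exit

# In the child notebook: return a value to the caller
dbutils.notebook.exit("done")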
Delta Lake
Drawbacks of ADLS
(diagram) ADLS != Database: unlike a relational database, ADLS does not give you the ACID guarantees: Atomicity, Consistency, Isolation, Durability.
Drawbacks of ADLS
• No ACID properties
• Job failures leave inconsistent data
• Simultaneous writes to the same folder produce incorrect results
• No schema enforcement
• No support for updates
• No support for versioning
• Data quality issues
What is Delta Lake?
• An open-source storage framework that brings reliability to data lakes
• Brings transactional capabilities to data lakes
• Runs on top of your existing data lake and stores data as Parquet
• Enables the lakehouse architecture
Lakehouse Architecture
(diagram) Evolution: data warehouse → modern data warehouse (uses a data lake) → lakehouse architecture.
Lakehouse Architecture
(diagram) The best elements of the data lake + the best elements of the data warehouse = the lakehouse.
Lakehouse Architecture
(diagram) BI reports, data science, and ML sit on a metadata and caching layer over the data lake, which holds structured, semi-structured, and unstructured data.
How to create a Delta Lake?
Instead of Parquet:
dataframe.write \
    .format("parquet") \
    .save("/data/")
Replace with Delta:
dataframe.write \
    .format("delta") \
    .save("/data/")
Delta format
(diagram) Delta = Parquet data files + a transaction log, stored in Azure Data Lake Storage.
delta/
  _delta_log/          <- contains the transaction information applied on the actual data
    0000.json
    0001.json
  <partition directory (if applied)>/
    file01.parquet     <- contains the actual data
Understanding the Transaction Log File (Delta Log)
• Contains a record of every transaction performed on the Delta table
• Files under _delta_log are stored in JSON format
• The single source of truth
Transaction log contents
Each JSON file is the result of a set of actions:
• Metadata – the table's name, schema, partitioning, etc.
• Add – information about an added file (with optional statistics)
• Remove – information about a removed file
• Set transaction – records the transaction ID
• Change protocol – contains the protocol version in use
• Commit info – records which operation was performed in this commit
Schema Enforcement
(diagram) A write of new data with columns Col1–Col5 into a Delta table whose schema has only Col1–Col4 is rejected.
How does schema enforcement work?
Delta Lake validates the schema on writes.
Schema enforcement rules:
1. The write cannot contain any additional columns that are not present in the target table's schema.
2. The write cannot have column data types that differ from the column data types in the target table.
Schema Evolution
(diagram) With schema evolution enabled, the same write succeeds and Col5 is added to the Delta table's schema.
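A minimal sketch of opting in to schema evolution on a write, using Delta Lake's mergeSchema option (the path is an illustrative placeholder):

(dataframe.write
    .format("delta")
    .option("mergeSchema", "true")   # evolve the table schema to include new columns
    .mode("append")
    .save("/data/"))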
Audit Data Changes & Time Travel
• Delta automatically versions every operation that you perform
• You can time travel to historical versions
• This versioning makes it easy to audit data changes, roll back data in
case of accidental bad writes or deletes, and reproduce experiments
and reports.
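A minimal sketch of time travel reads; the path, version, and timestamp values are illustrative:

# Read an earlier version of a Delta table by version number
df_v0 = spark.read.format("delta").option("versionAsOf", 0).load("/data/")

# Or by timestamp
df_old = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load("/data/")

# Inspect the table's version history
spark.sql("DESCRIBE HISTORY delta.`/data/`").show()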
Vacuum in Delta Lake
• VACUUM removes Parquet files that are no longer referenced by the latest state of the transaction log
• It skips files whose names start with an underscore (_), which includes _delta_log
• It deletes files that are older than the retention threshold
• The default retention threshold is 7 days
• If you run VACUUM on a Delta table, you lose the ability to time travel back to versions older than the specified data retention period
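A minimal sketch of running VACUUM from a notebook; the table name is an illustrative placeholder:

spark.sql("VACUUM bronze.raw_traffic")                    # default: remove files older than 7 days
spark.sql("VACUUM bronze.raw_traffic RETAIN 168 HOURS")   # explicit retention threshold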
Optimize in Delta Lake
Operation | Parquet file | _delta_log | Line number | State
CREATE TABLE | – | 000.json | – | –
WRITE | aabb.parquet | 001.json | 100 | Active
WRITE | ccdd.parquet | 002.json | 101 | Inactive
WRITE | eeff.parquet | 003.json | 102 | Inactive
DELETE 101 | – | 004.json | – | –
UPDATE 102 | iijj.parquet | 005.json | 99 | Active
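Small writes accumulate many small Parquet files, several of which end up inactive; OPTIMIZE compacts them. A minimal sketch, with an illustrative table and column name:

spark.sql("OPTIMIZE bronze.raw_traffic")                             # compact small files
spark.sql("OPTIMIZE bronze.raw_traffic ZORDER BY (Count_point_id)")  # optionally co-locate related data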
UPSERT (MERGE) in Delta Lake
• We can UPSERT (UPDATE + INSERT) data using the MERGE command
• If a matching row is found, it is updated
• If no matching row is found, the source row is inserted as a new row

MERGE INTO <Destination_Table> AS Dest
USING <Source_Table> AS Source
ON Dest.Col2 = Source.Col2
WHEN MATCHED THEN
  UPDATE SET
    Dest.Col1 = Source.Col1,
    Dest.Col2 = Source.Col2
WHEN NOT MATCHED THEN
  INSERT VALUES (Source.Col1, Source.Col2)
Unity Catalog
(diagram) A single Azure Databricks workspace bundles its own user management, Hive metastore, access controls, and clusters / SQL warehouses, on top of Azure Data Lake Gen2.
(diagram) With a second workspace, everything is duplicated: each Azure Databricks workspace carries its own user management, Hive metastore, access controls, and clusters / SQL warehouses, even though both sit on the same Azure Data Lake Gen2.
Without Unity Catalog vs. With Unity Catalog
(diagram) Without Unity Catalog, workspaces 1 and 2 each maintain their own user management, Hive metastore, and access controls. With Unity Catalog, user management, the metastore, and access controls move into a central Unity Catalog, and each workspace keeps only its clusters and SQL warehouses.
Databricks Unity Catalog
(diagram) Provides access control, lineage, discovery, monitoring, auditing, and sharing on top of centralized metadata management (tables | notebooks | dashboards).
(diagram) Recap: a Databricks Access Connector with the Storage Blob Data Contributor role on the Azure Data Lake Gen2 container is how Unity Catalog in the Azure Databricks workspace reaches the storage.
To use Unity Catalog
(diagram) Start from a Databricks Premium workspace, configure a metastore, and attach the workspace to the metastore.
Unity Catalog and Azure
(diagram) One Databricks account, managed through the account console, maps to one Microsoft Entra ID (AAD) tenant. Under the tenant sit Azure subscriptions, and each subscription contains Databricks workspaces.
Unity Catalog Object Model
(diagram) A metastore contains storage credentials, external locations, connections, catalogs, shares, recipients, and providers. A catalog contains schemas, and a schema contains tables, views, functions, volumes, and models.
Roles in Unity Catalog
Account admin
• Creates metastores and links workspaces
• User and group management
• Billing and cost
Metastore admin
• Creates and manages catalogs
• Creates and manages external locations
Workspace admin
• Creates and manages workspaces
• Creates and manages clusters
Workspace users
• Can create tables, schemas, and other objects
User and Group Management
• Invite and add users to Unity Catalog
• Create groups
• Workspace admins
• Developers
• Assign users to groups
• Workspace admins – Jarvis
• Developers – Steve
• Assign roles to groups
• Workspace admin – Workspace admins group
• Workspace user – Developers group
Cluster Policy
• Controls users' ability to configure clusters, based on a set of rules
• These rules specify which attributes or attribute values can be used during cluster creation
• Cluster policies have ACLs that limit their use to specific users and groups
• A user who has unrestricted cluster-create permission can select the Unrestricted policy and create fully configurable clusters
Without Cluster Pools
(diagram) Every workflow, job, or notebook run acquires new virtual machines directly from Azure, paying the full cluster start-up time each time.
With Cluster Pools
(diagram) Workflows, jobs, and notebooks draw ready-to-use instances from a Databricks pool, which in turn acquires virtual machines from Azure; this reduces cluster start and scale-up times.
Catalogs and Schemas
(diagram) One metastore (P_Org) holds one catalog per project and environment: P1_Dev, P1_UAT, P1_Prod, P2_Dev, P2_UAT, P2_Prod, P3_Dev, P3_UAT, P3_Prod, and so on. Each catalog contains its own Bronze, Silver, and Gold schemas with their tables.
Unity Catalog Privileges
• Privileges are permissions that we assign on objects to principals
• They can be granted with SQL commands or through the Unity Catalog UI
E.g.:
GRANT privilege_type ON securable_object TO principal
privilege_type: a Unity Catalog permission such as SELECT or CREATE
securable_object: any object, such as a SCHEMA or TABLE
principal: a user, group, etc.
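A minimal sketch of granting privileges from a notebook; the catalog, schema, table, and group names are illustrative:

spark.sql("GRANT USE CATALOG ON CATALOG s_dev TO `developers`")       # let the group see the catalog
spark.sql("GRANT USE SCHEMA ON SCHEMA s_dev.sales TO `developers`")   # let the group use the schema
spark.sql("GRANT SELECT ON TABLE s_dev.sales.products TO `developers`")  # let the group query the table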
Unity Catalog - Three-Level Namespace
SELECT * FROM `catalog`.`schema`.table
E.g.: SELECT * FROM `s_dev`.`sales`.products
(diagram) Under the metastore: catalog (level 1) → schema (level 2) → tables, views, functions (level 3).
(diagram) The access connector with Storage Blob Data Contributor covers the Azure Data Lake Gen2 account behind the metastore, but how does the workspace reach a second Azure Data Lake Gen2 account? That is the job of storage credentials and external locations.
Unity Catalog Object Model
(diagram) Revisited: storage credentials and external locations are metastore-level objects, alongside connections and catalogs (catalog → schema → table / view / function / volume / model).
(diagram) A storage credential wraps a managed identity that has the Storage Blob Data Contributor role on the storage account. An external location pairs that storage credential with the path of a container.
Storage Credential
• An authentication and authorization mechanism for accessing stored data
• Stores the access credentials that provide access to external locations
• Credentials can be managed identities or service principals

External Location
• Serves as a reference point for external storage
• Stores the path of the external storage that you want to access
• Uses a storage credential to get access to the external storage
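A minimal sketch of creating an external location, assuming a storage credential named sc_demo already exists and using an illustrative container URL:

spark.sql("""
CREATE EXTERNAL LOCATION IF NOT EXISTS el_landing
URL 'abfss://landing@databricksdevstg.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL sc_demo)
""")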
• Managed tables
• Can be defined without a specified location
• The data files are stored in managed storage, in Delta format
• Dropping the table removes its metadata from the catalog and deletes the actual data, although in Unity Catalog the underlying data is retained for 30 days
• External tables
• You need an EXTERNAL LOCATION and a STORAGE CREDENTIAL to access the external storage
• Can be defined with a custom file location, outside the managed storage
• Dropping the table deletes the metadata from the catalog but does not affect the data files
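A minimal sketch contrasting the two table types; the catalog, schema, table names, and path are illustrative:

# Managed: data lives in metastore-managed storage
spark.sql("CREATE TABLE dev_catalog.bronze.managed_demo (id INT)")

# External: data lives at a path covered by an external location
spark.sql("""
CREATE TABLE dev_catalog.bronze.external_demo (id INT)
LOCATION 'abfss://bronze@databricksdevstg.dfs.core.windows.net/external_demo'
""")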
Spark Structured Streaming
Spark Structured Streaming
(diagram) An incoming data stream is treated as an unbounded table to which new records are continuously appended.
Spark Structured Streaming flow
(diagram) Read a micro-batch from the streaming source, transform it, and write it to the sink.
Spark Structured Streaming flow
(diagram) A streaming background query repeats the read → transform → write cycle for each micro-batch from the streaming source.
Supported Sources and Sinks
Sources: File source, Kafka source, Socket source, Rate source, Table
Sinks: File sink, Kafka sink, Foreach sink, Console sink, Table
StreamWriter
<StreamingDataframe>.writeStream
    .option("checkpointLocation", <Location>)
    .outputMode("append")
    .toTable("<TableName>")

Checkpoint
• Makes Spark applications fault-tolerant and resilient
• Maintains intermediate state on fault-tolerant file systems such as HDFS, ADLS, or S3 so the stream can recover from failures
• Must be unique to each stream
Output Modes
OutputMode | Usage | Description
Append | outputMode('append') | Records from the incoming stream are appended to the destination
Complete | outputMode('complete') | All the processed rows are written out
Update | outputMode('update') | Only updated rows are written out; valid only when there are aggregation results, otherwise behaves like append mode
Triggers
Trigger | Usage | Description
Unspecified (default) | – | Triggers a micro-batch every 500 ms (half a second)
processingTime (fixed interval) | .trigger(processingTime='2 minutes') | Sets the time interval between executions
availableNow (one-time) | .trigger(availableNow=True) | Consumes all records available since the previous execution as an incremental batch
Continuous (experimental) | .trigger(continuous='1 second') | For ~1 ms latency
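Putting these pieces together, a minimal streaming-write sketch, assuming df_str is a streaming DataFrame; the checkpoint path and table name are illustrative:

(df_str.writeStream
    .option("checkpointLocation", "abfss://checkpoints@databricksdevstg.dfs.core.windows.net/raw_traffic")
    .outputMode("append")
    .trigger(availableNow=True)    # process everything available, then stop
    .toTable("dev_catalog.bronze.raw_traffic"))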
Autoloader
(diagram) Auto Loader ingests arriving files into the lakehouse's bronze layer, from which data flows on to silver and gold.
Autoloader
• Auto Loader is an optimized data ingestion tool, built into the Databricks Lakehouse, that incrementally and efficiently processes new data files as they arrive in cloud storage, without any additional setup
• Auto Loader can load data files from cloud storage without being vendor-specific (AWS S3, Azure ADLS, Google Cloud Storage, DBFS)
• Auto Loader can ingest JSON, CSV, PARQUET, AVRO, ORC, TEXT, and BINARYFILE file formats
• Auto Loader is especially beneficial when ingesting data into your lakehouse, particularly into the bronze layer as a streaming query
Implementing Autoloader
df_str = (spark.readStream
    .format("cloudFiles")                # tells Spark to use Auto Loader
    .option("cloudFiles.format", "csv")  # tells Auto Loader to expect CSV files
    .option("header", "true")
    .schema(schema)
    .load(f"{source_dir}")
)
Schema evolution
• Schema evolution is the process of managing changes in a data schema as it evolves over time, often due to software updates or changing business requirements, which can cause schema drift
• Ways to handle schema changes:
• Fail the stream
• Manually change the existing schema
• Evolve the schema automatically as it changes
Schema validation
(diagram) On the first run, Auto Loader infers the schema of the incoming data (Col1: int, Col2: string, Col3: int) and records it under /schemaLocation.
Schema validation
(diagram) When a file arrives with a new column (Col4), Auto Loader validates it against the schema stored in /schemaLocation (Col1: int, Col2: string, Col3: int) and detects the mismatch.
Schema Evolution Modes
• addNewColumns: The stream fails; new columns are added to the schema; existing columns do not evolve data types
• failOnNewColumns: The stream fails and does not restart unless the provided schema is updated, or the offending data file is removed
• rescue: The schema is never evolved and the stream does not fail due to schema changes; all new columns are recorded in the rescued data column
• none: Does not evolve the schema; new columns are ignored and data is not rescued unless the rescuedDataColumn option is set; the stream does not fail due to schema changes
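A minimal sketch of wiring these options into an Auto Loader read; the paths are illustrative:

df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/schemaLocation")      # where inferred schemas are tracked
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")   # evolve when new columns appear
    .option("cloudFiles.rescuedDataColumn", "_rescued_data")     # capture non-conforming data
    .load("/landing/raw_traffic"))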
Project Overview
Medallion Architecture
(diagram) Sources such as Kafka, databases, and the data lake feed the bronze layer; data is refined through silver to gold, serving BI reporting and data science.
Project Architecture
(diagram) Recap: source datasets (Pedal Cycle, Two Wheeler, LGV) arrive in the /landing container, flow through the Bronze, Silver, and Gold layers (each backed by its own schema) in Azure Data Lake Storage Gen2, all governed by Unity Catalog.
Raw Traffic counts dataset
(diagram) Vehicle types counted: Pedal Cycle, Two-Wheeled Motor Vehicles, Buses and Coaches, LGV (Large Goods Vehicle), HGV (Heavy Goods Vehicle), Electric Vehicles.
Data Dictionary
Vehicle flow point:
1. Record ID
2. Count point id
3. Direction of travel
4. Year
5. Count date
6. hour
Travel info of vehicle:
7. Region id
8. Region name
9. Local authority name
10. Road name
11. Road Category ID
12. Start junction road name
13. End junction road name
14. Latitude
15. Longitude
16. Link length km
Count of types of vehicle:
17. Pedal cycles
18. Two wheeled motor vehicles
19. Cars and taxis
20. Buses and coaches
21. LGV Type
22. HGV Type
23. EV Car
24. EV Bike
Data Dictionary
1. Record ID = Uniquely identifies a record
2. Count point id = A unique reference for the road link
3. Direction of travel = Direction of travel
4. Year = The year the count took place
5. Count date = The date when the actual count took place
6. hour = The hour of the count; 7 represents 7am to 8am, and 17 represents 5pm to 6pm
7. Region id = Website region identifier
8. Region name = The name of the region where the travel took place
9. Local authority name = The local authority of that region
10. Road name = The road name (for instance M25 or A3)
11. Road Category ID = Uniquely identifies the road category
12. Start junction road name = The road name of the start junction of the link
13. End junction road name = The road name of the end junction of the link
14. Latitude = Latitude of the location
15. Longitude = Longitude of the location
16. Link length km = Total length of the network road link
17. Pedal cycles = Counts for pedal cycles
18. Two wheeled motor vehicles = Counts of two-wheeled motor vehicles
19. Cars and taxis = Counts of cars and taxis
20. Buses and coaches = Counts of buses and coaches
21. LGV Type = Counts of LGV type
22. HGV Type = Counts of HGV type
23. EV Car = Counts of EV cars
24. EV Bike = Counts of EV bikes
Raw Roads dataset
(diagram) Describes road categories and road types.
Project Setup
Expected Setup
(diagram) A Dev workspace with a Dev catalog containing three schemas: Bronze (raw_traffic, raw_roads), Silver (silver_traffic, silver_roads), and Gold (gold_traffic, gold_roads).
Project Architecture
(diagram) Recap: the /landing container feeds the Bronze, Silver, and Gold layers (each with its own schema) in Azure Data Lake Storage Gen2, governed by Unity Catalog.
Containers and Folders
• landing container: raw_traffic and raw_roads folders
• medallion container: bronze, silver, and gold folders
• checkpoints container

External locations:
1. Landing
2. Checkpoints
3. Bronze
4. Silver
5. Gold
Ingesting Raw Traffic dataset
Ingestion to Bronze
Project Architecture
(diagram) Recap: the /landing container feeds the Bronze, Silver, and Gold layers (each with its own schema) in Azure Data Lake Storage Gen2, governed by Unity Catalog.
Ingesting data to the Bronze Layer
(diagram) Data flows from the data lake into the bronze schema, producing two tables: raw_traffic and raw_roads.
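A minimal sketch of the bronze ingestion step, combining Auto Loader with a streaming write; the catalog, container, and storage account names are illustrative:

landing = "abfss://landing@databricksdevstg.dfs.core.windows.net/raw_traffic"
checkpoint = "abfss://checkpoints@databricksdevstg.dfs.core.windows.net/raw_traffic"

df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", checkpoint)  # track the inferred schema
    .option("header", "true")
    .load(landing))

(df.writeStream
    .option("checkpointLocation", checkpoint)
    .outputMode("append")
    .trigger(availableNow=True)
    .toTable("dev_catalog.bronze.raw_traffic"))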
Silver Layer Transformations
Transforming data in the Silver Layer
(diagram) Data flows from bronze to the silver schema, producing two tables: silver_traffic and silver_roads.
Transforming Raw Traffic dataset
Renaming Columns
1. Record ID → Record_ID
2. Count point id → Count_point_id
3. Direction of travel → Direction_of_travel
4. Year → Year
5. Count date → Count_date
6. hour → hour
7. Region id → Region_id
8. Region name → Region_name
9. Local authority name → Local_authority_name
10. Road name → Road_name
11. Road Category ID → Road_Category_ID
… (and so on for the remaining columns)
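A minimal sketch of the renames in PySpark, assuming df is the raw_traffic DataFrame (only a subset of the mappings is shown):

renames = {
    "Record ID": "Record_ID",
    "Count point id": "Count_point_id",
    "Direction of travel": "Direction_of_travel",
    "Count date": "Count_date",
    "Region id": "Region_id",
    "Region name": "Region_name",
    "Local authority name": "Local_authority_name",
    "Road name": "Road_name",
    "Road Category ID": "Road_Category_ID",
}
for old_name, new_name in renames.items():
    df = df.withColumnRenamed(old_name, new_name)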
Creating Electric_Vehicles_Count
(diagram) The table keeps columns 1–24 (Record_ID … EV_Bike) and adds column 25, Electric_Vehicles_Count.
Creating Motor_Vehicles_Count
(diagram) The table keeps columns 1–25 (Record_ID … Electric_Vehicles_Count) and adds column 26, Motor_Vehicles_Count:
Motor_Vehicles_Count = Two_wheeled_motor_vehicles + Cars_and_taxis + Buses_and_coaches + LGV_Type + HGV_Type + Electric_Vehicles_Count
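A minimal sketch of both derived columns; Electric_Vehicles_Count = EV_Car + EV_Bike is an assumption inferred from the column list, not stated on the slides:

from pyspark.sql.functions import col

df = df.withColumn("Electric_Vehicles_Count", col("EV_Car") + col("EV_Bike"))  # assumed formula
df = df.withColumn(
    "Motor_Vehicles_Count",
    col("Two_wheeled_motor_vehicles") + col("Cars_and_taxis")
    + col("Buses_and_coaches") + col("LGV_Type") + col("HGV_Type")
    + col("Electric_Vehicles_Count"),
)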
Transforming Raw Roads dataset
Raw Roads dataset
(diagram) Recap: road categories and road types.
Renaming Columns
1. Road ID → Record_ID
2. Road category id → Road_category_id
3. Road category → Road_category
4. Region id → Region_id
5. Region name → Region_name
6. Total link length km → Total_link_length_km
7. Total link length miles → Total_link_length_miles
8. All motor vehicles → All_motor_vehicles
Creating Road_Category_Name
(diagram) Adds column 9, Road_Category_Name, derived from Road_Category:
WHEN Road_Category = 'TA' THEN 'Class A Trunk Road'
WHEN Road_Category = 'TM' THEN 'Class A Trunk Motor'
WHEN Road_Category = 'PA' THEN 'Class A Principal road'
WHEN Road_Category = 'PM' THEN 'Class A Principal Motorway'
WHEN Road_Category = 'M' THEN 'Class B road'
Creating Road_Type
(diagram) Adds column 10, Road_Type, derived from Road_Category_Name:
WHEN Road_Category_Name contains 'Class A' THEN 'Major'
WHEN Road_Category_Name contains 'Class B' THEN 'Minor'
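A minimal sketch of both road transformations using when/otherwise, assuming df_roads is the renamed roads DataFrame:

from pyspark.sql.functions import when, col

df_roads = (df_roads
    .withColumn("Road_Category_Name",
        when(col("Road_category") == "TA", "Class A Trunk Road")
        .when(col("Road_category") == "TM", "Class A Trunk Motor")
        .when(col("Road_category") == "PA", "Class A Principal road")
        .when(col("Road_category") == "PM", "Class A Principal Motorway")
        .when(col("Road_category") == "M", "Class B road"))
    .withColumn("Road_Type",
        when(col("Road_Category_Name").contains("Class A"), "Major")
        .when(col("Road_Category_Name").contains("Class B"), "Minor")))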
Transforming & Loading Silver datasets
Creating Vehicle_Intensity
(diagram) The table keeps columns 1–26 (Record_ID … Motor_Vehicles_Count) and adds column 27, Vehicle_Intensity:
Vehicle_Intensity = Motor_Vehicles_Count / Link_length_km
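A minimal sketch of the derived column:

from pyspark.sql.functions import col

df = df.withColumn("Vehicle_Intensity", col("Motor_Vehicles_Count") / col("Link_length_km"))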
Loading to Gold Layer
Loading data to the Gold Layer
(diagram) Data flows from silver to the gold schema, producing two tables: gold_traffic and gold_roads.
Orchestrating with Workflows
Reporting data to Power BI
Delta Live Tables (DLT)
Delta Live Tables (DLT) Origin
(diagram) The familiar bronze → silver → gold pipeline, serving BI reporting and data science.
Medallion/Lakehouse Architecture Tables
(diagram) The pipeline's tables serve BI reporting and data science.
Considerations in Lakehouse Architecture
(diagram) Around the pipeline sit discovery, quality checks, version control, checkpointing, dependency management, and governance.
Declarative programming
Declarative programming says what should be done, not how to do it.

Procedural programming:
numbers = [...]
total = 0
for n in numbers:
    total = total + n
print(total)

Declarative programming:
SELECT SUM(n)
FROM numbers
Declarative ETL with DLT
Declarative programming says what should be done, not how to do it.
Procedural ETL: Apache Airflow, Azure Data Factory
Declarative ETL: Delta Live Tables
Delta Live Tables (DLT)
Delta Live Tables (DLT) is a declarative ETL framework for
the Databricks Data Intelligence Platform that helps data teams
simplify streaming and batch ETL cost-effectively.
Simply define the transformations to perform on your data and let DLT
pipelines automatically manage task orchestration, cluster management,
monitoring, data quality and error handling.
Delta Live Tables Execution
• Requires a Premium workspace
• Supports only the Python and SQL languages
• Can’t be run interactively
• No support for magic commands like %run
Expectations in a DLT pipeline
Action | Usage | Result
warn (default) | – | Invalid records are written to the target; the failure is reported as a metric for the dataset
drop | ON VIOLATION DROP ROW | Invalid records are dropped before data is written to the target; the failure is reported as a metric for the dataset
fail | ON VIOLATION FAIL UPDATE | Invalid records prevent the update from succeeding; manual intervention is required before re-processing
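A minimal sketch of a DLT pipeline notebook with an expectation; the table names and landing path are illustrative:

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw traffic ingested with Auto Loader")
def raw_traffic():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/landing/raw_traffic"))

@dlt.table(comment="Cleaned traffic")
@dlt.expect_or_drop("valid_record", "Record_ID IS NOT NULL")  # ON VIOLATION DROP ROW
def silver_traffic():
    return dlt.read_stream("raw_traffic").where(col("hour").isNotNull())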
Continuous Integration and Continuous Deployment
Expected Setup
(diagram) One metastore spanning Dev, UAT, and Prod workspaces. Each workspace is attached to its own catalog (Dev, UAT, Prod), and each catalog contains Bronze, Silver, and Gold schemas.
Continuous Integration
(diagram) A user pushes code from the Dev Databricks workspace to the main branch in Azure DevOps Git; the CI pipeline stores the latest available code in a Live folder in the workspace.
Continuous Deployment
(diagram) Release pipelines promote the code from DEV to UAT and then to PROD, with an approval gate before each stage; each environment has its own data lake.
Creating UAT resources in Azure
• Resource Group: databricks-uat-rg
• Databricks workspace: databricks-uat-ws
• Storage Account: databricksuatstg
Continuous Integration
(diagram) Recap: a user pushes code from the Dev Databricks workspace to the main branch in Azure DevOps Git; the CI pipeline stores the latest available code in a Live folder in the workspace.
Continuous Deployment
(diagram) The release pipeline deploys from DEV to UAT after approval; each environment has its own data lake.
Congratulations