0% found this document useful (0 votes)

9 views47 pages

Building A Production-Ready Data Pipeline With Azure - Complete Guide To Medallion Architecture - by Yasar Kocyigit - Jun, 2025 - Medium

This document provides a comprehensive guide on building a production-ready data pipeline using Azure's Medallion Architecture. It outlines the architecture's three layers—Bronze, Silver, and Gold—each serving different data quality and processing needs, and details the Azure services involved, including Azure Data Factory, Azure Databricks, and Azure Data Lake. The tutorial includes steps for setting up infrastructure, deploying linked services, and managing metadata for efficient data processing and transformation.

Uploaded by

sasa332138

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views47 pages

Building A Production-Ready Data Pipeline With Azure - Complete Guide To Medallion Architecture - by Yasar Kocyigit - Jun, 2025 - Medium

Uploaded by

sasa332138

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 47

Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Building a Production-Ready Data

Pipeline with Azure: Complete
Guide to Medallion Architecture
Yasar Kocyigit 15 min read · 11 hours ago

Introduction: The Modern Data Challenge

In today’s data-driven world, organizations are drowning in data but starving
for insights. Traditional ETL processes are too rigid, too slow, and too
expensive to scale. What if I told you there’s a better way to build data
pipelines that are:

• Metadata-driven (no code changes for new tables)

1 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Scalable (handles growing data volumes automatically)

• Reliable (comprehensive error handling and monitoring)

• Cost-effective (optimized resource usage)

Today, I’ll walk you through building a complete Medallion Architecture data
pipeline using Azure services. By the end of this article, you’ll have a
production-ready solution that can process terabytes of data with minimal
maintenance.

What is Medallion Architecture?

The Medallion Architecture is a data design pattern that organizes data in
layers of increasing quality and refinement:

Bronze Layer (Raw Data)

• Purpose: Exact copy of source system data

• Format: Parquet files for efficient storage

• Characteristics: Preserves original data types, includes duplicates

• Use Case: Data archival, compliance, re-processing scenarios

2 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Silver Layer (Cleaned & Enriched)

• Purpose: Business-ready data with quality controls

• Format: Delta Lake for ACID transactions

• Characteristics: Deduplicated, validated, enriched with metadata

• Use Case: Analytics, reporting, machine learning

Gold Layer (Analytics Ready)

• Purpose: Highly refined, aggregated data

• Format: Delta Lake with business models

• Characteristics: KPIs, dimensions, domain-specific views

• Use Case: Executive dashboards, regulatory reporting

In this tutorial, we’ll implement Bronze → Silver transformation, which covers

80% of most data pipeline requirements.

Architecture Overview

3 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Here’s what we’re building:

• Orchestration:
Azure Data Factory
Pipeline management and data movement

4 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Transformation:
Azure Databricks
Data processing with Apache Spark

• Storage:
Azure Data Lake Gen2
Scalable data storage

• Metadata:
Azure SQL Database
Control tables and processing logs

• Security:
Azure Key Vault
Secrets management

Why This Stack?

Azure Data Factory

• Metadata-driven: Configure once, process hundreds of tables

• Visual interface: Easy to understand and maintain

• Built-in connectors: Support for 90+ data sources

5 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Cost-effective: Pay only for pipeline runs

Azure Databricks
• Apache Spark: Distributed processing for large datasets

• Delta Lake: ACID transactions, schema evolution, time travel

• Auto-scaling: Clusters scale based on workload

• Optimization: Automatic performance tuning

Delta Lake Deep Dive

Delta Lake is the secret sauce that makes our Silver layer powerful:

ACID Transactions

# Concurrent writes don't corrupt data

df.write.format("delta").mode("append").save("/path/to/table")

Time Travel

6 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

-- View data as it existed yesterday

SELECT * FROM delta.`/mnt/silver/customers/` TIMESTAMP AS OF '2025-01-14'

Schema Evolution

# Add new columns without breaking existing queries

df.withColumn("new_column", lit("default_value")) \
.write.format("delta").option("mergeSchema", "true").save("/path")

Upsert Operations

# Merge new data with existing records

silver_table.alias("target").merge(
new_data.alias("source"),
"target.customer_id = source.customer_id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

7 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Step 1: Setting Up the Foundation

Prerequisites
• Azure subscription with appropriate permissions

• PowerShell 7+ with Az modules

• SQL Server Management Studio

Repository Structure

azure-data-pipeline/
├── sql/ # Database setup scripts
│ ├── 01-create-control-tables.sql
│ ├── 02-create-stored-procedures.sql
│ └── 03-sample-data-setup.sql
├── databricks/
│ ├── notebooks/
│ │ └── bronze_to_silver.py # Main transformation logic
│ └── cluster-config.json
├── deployment/ # Infrastructure as Code
│ ├── 01-deploy-integration-runtime.ps1
│ ├── 02-deploy-datasets.ps1
│ └── 03-deploy-pipelines.ps1
└── config/

8 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

└── config-template.json

Control Database Schema

The heart of our metadata-driven approach is the control database:

-- Source Systems Configuration

CREATE TABLE ctl.SourceSystems (
SourceSystemId INT IDENTITY(1,1) PRIMARY KEY,
SourceSystemName NVARCHAR(100) NOT NULL UNIQUE,
SourceSystemType NVARCHAR(50) NOT NULL DEFAULT 'SqlServer',
ConnectionString NVARCHAR(500) NOT NULL,
IsActive BIT DEFAULT 1
);

-- Tables Configuration
CREATE TABLE ctl.Tables (
TableId INT IDENTITY(1,1) PRIMARY KEY,
SourceSystemId INT NOT NULL,
SchemaName NVARCHAR(50) NOT NULL,
TableName NVARCHAR(100) NOT NULL,
LoadType NVARCHAR(20) CHECK (LoadType IN ('Full',
'Incremental')),
PrimaryKeyColumns NVARCHAR(500),
BronzePath NVARCHAR(500) NOT NULL,
SilverPath NVARCHAR(500) NOT NULL,
IsActive BIT DEFAULT 1,
Priority INT DEFAULT 100

9 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

);

-- Processing Status Tracking

CREATE TABLE ctl.ProcessingStatus (
StatusId BIGINT IDENTITY(1,1) PRIMARY KEY,
TableId INT NOT NULL,
Layer NVARCHAR(20) CHECK (Layer IN ('Bronze', 'Silver', 'Gold')),
ProcessingDate DATE NOT NULL,
Status NVARCHAR(20) CHECK (Status IN ('Running', 'Success',
'Failed')),
StartTime DATETIME,
EndTime DATETIME,
RecordsProcessed BIGINT,
ErrorMessage NVARCHAR(MAX)
);

This schema enables us to:

• Configure new tables without code changes

• Track processing status in real-time

• Handle errors with detailed logging

• Support multiple source systems with different connection types

Step 2: Infrastructure Deployment

Now comes the exciting part — deploying our infrastructure using

10 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

PowerShell scripts.

Script 1: Integration Runtime and Linked Services

# deployment/01-deploy-integration-runtime.ps1

param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,

[Parameter(Mandatory=$true)]
[string]$ResourceGroupName,

[Parameter(Mandatory=$true)]
[string]$DataFactoryName,

[Parameter(Mandatory=$true)]
[string]$KeyVaultName,

[Parameter(Mandatory=$true)]
[string]$StorageAccountName,

[Parameter(Mandatory=$true)]
[string]$DatabricksWorkspaceUrl
)

Set-AzContext -SubscriptionId $SubscriptionId

Write-Host "Deploying Integration Runtime and Linked Services" -

ForegroundColor Green

11 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

# Create Self-Hosted Integration Runtime

try {
$existingIR = Get-AzDataFactoryV2IntegrationRuntime -
ResourceGroupName $ResourceGroupName -DataFactoryName
$DataFactoryName -Name "SelfHostedIR" -ErrorAction SilentlyContinue

if (-not $existingIR) {
Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName
$ResourceGroupName -DataFactoryName $DataFactoryName -Name
"SelfHostedIR" -Type SelfHosted -Force
Write-Host "Integration Runtime created successfully" -
ForegroundColor Green

$irKeys = Get-AzDataFactoryV2IntegrationRuntimeKey -
ResourceGroupName $ResourceGroupName -DataFactoryName
$DataFactoryName -Name "SelfHostedIR"
Write-Host "Authentication Keys:" -ForegroundColor Cyan
Write-Host "Key1: $($irKeys.AuthKey1)"
Write-Host "Key2: $($irKeys.AuthKey2)"
}
} catch {
Write-Host "Error creating Integration Runtime:
$($_.Exception.Message)" -ForegroundColor Red
exit 1
}

# Key Vault Linked Service

$lsKeyVault = @{
name = "LS_KeyVault"
properties = @{
type = "AzureKeyVault"
typeProperties = @{
baseUrl = "https://$KeyVaultName.vault.azure.net/"
}
}
} | ConvertTo-Json -Depth 10

12 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

# Control Database Linked Service

$lsControlDB = @{
name = "LS_SQL_Control"
properties = @{
type = "AzureSqlDatabase"
typeProperties = @{
connectionString = "Integrated
Security=False;Encrypt=True;Data
Source=YOUR_SQL_SERVER.database.windows.net;Initial
Catalog=YOUR_CONTROL_DATABASE;User ID=YOUR_USERNAME"
password = @{
type = "AzureKeyVaultSecret"
store = @{
referenceName = "LS_KeyVault"
type = "LinkedServiceReference"
}
secretName = "sql-admin-password"
}
}
}
} | ConvertTo-Json -Depth 10

# Data Lake Linked Service

$lsDataLake = @{
name = "LS_ADLS_DeltaLake"
properties = @{
type = "AzureBlobFS"
typeProperties = @{
url = "https://$StorageAccountName.dfs.core.windows.net"
accountKey = @{
type = "AzureKeyVaultSecret"
store = @{
referenceName = "LS_KeyVault"
type = "LinkedServiceReference"
}
secretName = "storage-account-key"
}

13 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

}
}
} | ConvertTo-Json -Depth 10

# Databricks Linked Service

$lsDatabricks = @{
name = "LS_Databricks"
properties = @{
type = "AzureDatabricks"
typeProperties = @{
domain = $DatabricksWorkspaceUrl
accessToken = @{
type = "AzureKeyVaultSecret"
store = @{
referenceName = "LS_KeyVault"
type = "LinkedServiceReference"
}
secretName = "databricks-token"
}
existingClusterId = "YOUR_CLUSTER_ID"
}
}
} | ConvertTo-Json -Depth 10

# Deploy Linked Services

$linkedServices = @{
"LS_KeyVault" = $lsKeyVault
"LS_SQL_Control" = $lsControlDB
"LS_ADLS_DeltaLake" = $lsDataLake
"LS_Databricks" = $lsDatabricks
}

foreach ($service in $linkedServices.GetEnumerator()) {

Write-Host "Deploying $($service.Key)..." -ForegroundColor Yellow
$service.Value | Out-File "$($service.Key).json" -Encoding UTF8
Set-AzDataFactoryV2LinkedService -ResourceGroupName
$ResourceGroupName -DataFactoryName $DataFactoryName -Name
$service.Key -DefinitionFile "$($service.Key).json" -Force
Remove-Item "$($service.Key).json" -Force

14 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Write-Host "Integration Runtime and Linked Services deployment

completed" -ForegroundColor Green

What this script does:

• Creates a Self-Hosted Integration Runtime for on-premises connectivity

• Deploys Key Vault linked service for secure credential management

• Sets up Control Database linked service for metadata operations

• Configures Data Lake linked service for Bronze/Silver storage

• Creates Databricks linked service for data transformation

Script 2: Dataset Deployment

# deployment/02-deploy-datasets.ps1

param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,

15 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

[Parameter(Mandatory=$true)]
[string]$ResourceGroupName,

[Parameter(Mandatory=$true)]
[string]$DataFactoryName
)

Set-AzContext -SubscriptionId $SubscriptionId

Write-Host "=== DATASETS DEPLOYMENT ===" -ForegroundColor Cyan

# Bronze Parquet Dataset - Parameterized for dynamic file paths

$dsBronzeParquet = @{
name = "DS_Bronze_Parquet"
properties = @{
linkedServiceName = @{
referenceName = "LS_ADLS_DeltaLake"
type = "LinkedServiceReference"
}
parameters = @{
FilePath = @{ type = "string" }
FileName = @{ type = "string" }
}
type = "Parquet"
typeProperties = @{
location = @{
type = "AzureBlobFSLocation"
fileName = @{
value = "@dataset().FileName"
type = "Expression"
}
folderPath = @{
value = "@dataset().FilePath"
type = "Expression"
}
fileSystem = "datalake"
}
compressionCodec = "snappy"

16 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

}
}
} | ConvertTo-Json -Depth 15

# Control Database Dataset

$dsControlDatabase = @{
name = "DS_Control_Database"
properties = @{
linkedServiceName = @{
referenceName = "LS_SQL_Control"
type = "LinkedServiceReference"
}
type = "AzureSqlTable"
typeProperties = @{
schema = "ctl"
}
}
} | ConvertTo-Json -Depth 15

# Generic SQL Dataset - Parameterized for any source table

$dsSqlGeneric = @{
name = "DS_SQL_Generic"
properties = @{
linkedServiceName = @{
referenceName = "LS_SQL_SourceSystem"
type = "LinkedServiceReference"
}
parameters = @{
TableName = @{ type = "string" }
SchemaName = @{ type = "string"; defaultValue = "dbo" }
}
type = "SqlServerTable"
typeProperties = @{
schema = @{
value = "@dataset().SchemaName"
type = "Expression"
}
table = @{

17 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

value = "@dataset().TableName"
type = "Expression"
}
}
}
} | ConvertTo-Json -Depth 15

# Deploy All Datasets

$datasets = @{
"DS_Bronze_Parquet" = $dsBronzeParquet
"DS_Control_Database" = $dsControlDatabase
"DS_SQL_Generic" = $dsSqlGeneric
}

foreach ($dataset in $datasets.GetEnumerator()) {

Write-Host "Deploying $($dataset.Key)..." -ForegroundColor Yellow
$dataset.Value | Out-File "$($dataset.Key).json" -Encoding UTF8
Set-AzDataFactoryV2Dataset -ResourceGroupName $ResourceGroupName
-DataFactoryName $DataFactoryName -Name $dataset.Key -DefinitionFile
"$($dataset.Key).json" -Force
Remove-Item "$($dataset.Key).json" -Force
Write-Host "✓ $($dataset.Key) deployed successfully" -
ForegroundColor Green
}

Write-Host "All datasets deployed successfully!" -ForegroundColor

Green

Key Dataset Features:

• Parameterized datasets for dynamic file paths

• Generic SQL dataset that works with any source table

18 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Control database dataset for metadata operations

• Compression optimization with Snappy codec

Script 3: Complete Pipeline Deployment

# deployment/03-deploy-pipelines.ps1

param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,

[Parameter(Mandatory=$true)]
[string]$ResourceGroupName,

[Parameter(Mandatory=$true)]
[string]$DataFactoryName
)

Set-AzContext -SubscriptionId $SubscriptionId

Write-Host "Deploying Complete Pipeline Suite" -ForegroundColor Green

# Bronze Ingestion Pipeline

$bronzeIngestionPipeline = @{
name = "PL_Bronze_Ingestion"
properties = @{
activities = @(
@{
name = "Build Source Query"

19 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

type = "SetVariable"
typeProperties = @{
variableName = "SourceQuery"
value = @{
value =
"@{if(equals(pipeline().parameters.TableConfig.LoadType,
'Incremental'), concat('SELECT * FROM ',
pipeline().parameters.TableConfig.SchemaName, '.',
pipeline().parameters.TableConfig.TableName, ' WHERE ',
pipeline().parameters.TableConfig.IncrementalColumn, ' > ''',
if(empty(pipeline().parameters.TableConfig.WatermarkValue),
'1900-01-01', pipeline().parameters.TableConfig.WatermarkValue),
''''), concat('SELECT * FROM ',
pipeline().parameters.TableConfig.SchemaName, '.',
pipeline().parameters.TableConfig.TableName))}"
type = "Expression"
}
}
}
@{
name = "Copy to Bronze"
type = "Copy"
dependsOn = @(@{ activity = "Build Source Query";
dependencyConditions = @("Succeeded") })
typeProperties = @{
source = @{
type = "SqlServerSource"
sqlReaderQuery = @{ value =
"@variables('SourceQuery')"; type = "Expression" }
}
sink = @{
type = "ParquetSink"
storeSettings = @{ type =
"AzureBlobFSWriteSettings" }
}
}
inputs = @(@{
referenceName = "DS_SQL_Generic"

20 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

type = "DatasetReference"
parameters = @{
SchemaName =
"@pipeline().parameters.TableConfig.SchemaName"
TableName =
"@pipeline().parameters.TableConfig.TableName"
}
})
outputs = @(@{
referenceName = "DS_Bronze_Parquet"
type = "DatasetReference"
parameters = @{
FilePath = "@{concat('bronze/',
toLower(pipeline().parameters.TableConfig.SourceSystemName), '/',
toLower(pipeline().parameters.TableConfig.SchemaName), '/',
pipeline().parameters.TableConfig.TableName, '/',
pipeline().parameters.ProcessingDate)}"
FileName =
"@{concat(pipeline().parameters.TableConfig.TableName, '_',
pipeline().parameters.ProcessingDate, '.parquet')}"
}
})
}
)
parameters = @{
TableConfig = @{ type = "object" }
ProcessingDate = @{ type = "string" }
}
variables = @{
SourceQuery = @{ type = "String" }
}
}
} | ConvertTo-Json -Depth 20

# Silver Processing Pipeline

$silverProcessingPipeline = @{
name = "PL_Silver_Processing"
properties = @{

21 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

activities = @(
@{
name = "Get Tables for Silver"
type = "Lookup"
typeProperties = @{
source = @{
type = "AzureSqlSource"
sqlReaderQuery = @{
value = "SELECT DISTINCT t.TableId,
t.SchemaName, t.TableName, t.PrimaryKeyColumns, t.LoadType, CONCAT('/
mnt/bronze/', LOWER(ss.SourceSystemName), '/', LOWER(t.SchemaName),
'/', t.TableName) as BronzePath, CONCAT('/mnt/silver/',
LOWER(ss.SourceSystemName), '/', LOWER(t.SchemaName), '/',
t.TableName) as SilverPath FROM ctl.Tables t INNER JOIN
ctl.SourceSystems ss ON t.SourceSystemId = ss.SourceSystemId INNER
JOIN ctl.ProcessingStatus ps ON t.TableId = ps.TableId WHERE
ps.ProcessingDate = '@{pipeline().parameters.ProcessingDate}' AND
ps.Layer = 'Bronze' AND ps.Status = 'Success' AND t.IsActive = 1"
}
}
dataset = @{ referenceName =
"DS_Control_Database"; type = "DatasetReference" }
firstRowOnly = $false
}
}
@{
name = "ForEach Silver Table"
type = "ForEach"
dependsOn = @(@{ activity = "Get Tables for Silver";
dependencyConditions = @("Succeeded") })
typeProperties = @{
items = "@activity('Get Tables for
Silver').output.value"
isSequential = $false
batchCount = 4
activities = @(
@{
name = "Process Bronze to Silver"

22 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

type = "DatabricksNotebook"
typeProperties = @{
notebookPath = "/Shared/
bronze_to_silver"
baseParameters = @{
table_name =
"@{item().TableName}"
silver_path =
"@{item().SilverPath}"
load_type = "@{item().LoadType}"
schema_name =
"@{item().SchemaName}"
processing_date =
"@pipeline().parameters.ProcessingDate"
bronze_path =
"@{item().BronzePath}"
table_id = "@{item().TableId}"
primary_keys =
"@{item().PrimaryKeyColumns}"
}
}
linkedServiceName = @{ referenceName =
"LS_Databricks" }
}
)
}
}
)
parameters = @{
ProcessingDate = @{ type = "string" }
}
}
} | ConvertTo-Json -Depth 20

# Master Orchestrator Pipeline

$masterPipeline = @{
name = "PL_Master_Orchestrator"
properties = @{

23 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

activities = @(
@{
name = "Set Processing Date"
type = "SetVariable"
typeProperties = @{
variableName = "ProcessingDate"
value =
"@{if(equals(pipeline().parameters.ProcessingDate, ''),
formatDateTime(utcnow(), 'yyyy-MM-dd'),
pipeline().parameters.ProcessingDate)}"
}
}
@{
name = "Get Active Tables"
type = "Lookup"
dependsOn = @(@{ activity = "Set Processing Date";
dependencyConditions = @("Succeeded") })
typeProperties = @{
source = @{
type = "AzureSqlSource"
sqlReaderQuery = "SELECT t.*,
ss.ConnectionString, ss.SourceSystemName, w.WatermarkValue FROM
ctl.Tables t INNER JOIN ctl.SourceSystems ss ON t.SourceSystemId =
ss.SourceSystemId LEFT JOIN ctl.Watermarks w ON t.TableId = w.TableId
WHERE t.IsActive = 1 AND ss.IsActive = 1 ORDER BY t.Priority,
t.TableId"
}
dataset = @{ referenceName =
"DS_Control_Database"; type = "DatasetReference" }
firstRowOnly = $false
}
}
@{
name = "ForEach Table"
type = "ForEach"
dependsOn = @(@{ activity = "Get Active Tables";
dependencyConditions = @("Succeeded") })
typeProperties = @{

24 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

items = "@activity('Get Active

Tables').output.value"
isSequential = $false
batchCount = 5
activities = @(
@{
name = "Execute Bronze Ingestion"
type = "ExecutePipeline"
typeProperties = @{
pipeline = @{ referenceName =
"PL_Bronze_Ingestion" }
waitOnCompletion = $true
parameters = @{
TableConfig = "@item()"
ProcessingDate =
"@variables('ProcessingDate')"
}
}
}
)
}
}
@{
name = "Execute Silver Processing"
type = "ExecutePipeline"
dependsOn = @(@{ activity = "ForEach Table";
dependencyConditions = @("Succeeded") })
typeProperties = @{
pipeline = @{ referenceName =
"PL_Silver_Processing" }
waitOnCompletion = $true
parameters = @{
ProcessingDate =
"@variables('ProcessingDate')"
}
}
}
)

25 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

parameters = @{
ProcessingDate = @{ type = "string"; defaultValue = "" }
SourceSystemName = @{ type = "string"; defaultValue = ""
}
TableName = @{ type = "string"; defaultValue = "" }
}
variables = @{
ProcessingDate = @{ type = "String"; defaultValue = "" }
}
}
} | ConvertTo-Json -Depth 20

# Deploy Pipelines in Dependency Order

Write-Host "1. Deploying Bronze Ingestion Pipeline..." -
ForegroundColor Yellow
$bronzeIngestionPipeline | Out-File "PL_Bronze_Ingestion.json" -
Encoding UTF8
Set-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -
DataFactoryName $DataFactoryName -Name "PL_Bronze_Ingestion" -
DefinitionFile "PL_Bronze_Ingestion.json" -Force
Remove-Item "PL_Bronze_Ingestion.json" -Force

Write-Host "2. Deploying Silver Processing Pipeline..." -

ForegroundColor Yellow
$silverProcessingPipeline | Out-File "PL_Silver_Processing.json" -
Encoding UTF8
Set-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -
DataFactoryName $DataFactoryName -Name "PL_Silver_Processing" -
DefinitionFile "PL_Silver_Processing.json" -Force
Remove-Item "PL_Silver_Processing.json" -Force

Write-Host "3. Deploying Master Orchestrator Pipeline..." -

ForegroundColor Yellow
$masterPipeline | Out-File "PL_Master_Orchestrator.json" -Encoding
UTF8
Set-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -
DataFactoryName $DataFactoryName -Name "PL_Master_Orchestrator" -
DefinitionFile "PL_Master_Orchestrator.json" -Force
Remove-Item "PL_Master_Orchestrator.json" -Force

26 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Write-Host "All pipelines deployed successfully!" -ForegroundColor

Green

Pipeline Architecture:

1. Bronze Ingestion Pipeline: Extracts data from source systems to Bronze

layer

2. Silver Processing Pipeline: Transforms Bronze data to Silver with quality

controls

3. Master Orchestrator Pipeline: Coordinates the entire process with error

handling

Step 3: Databricks Transformation Logic

The heart of our Silver layer processing happens in Databricks:

# databricks/notebooks/bronze_to_silver.py

# Parameter Configuration
dbutils.widgets.text("table_id", "")

27 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

dbutils.widgets.text("schema_name", "")
dbutils.widgets.text("table_name", "")
dbutils.widgets.text("bronze_path", "")
dbutils.widgets.text("silver_path", "")
dbutils.widgets.text("processing_date", "")
dbutils.widgets.text("primary_keys", "")
dbutils.widgets.text("load_type", "Full")

# Extract Parameters
table_id = int(dbutils.widgets.get("table_id"))
schema_name = dbutils.widgets.get("schema_name")
table_name = dbutils.widgets.get("table_name")
bronze_path = dbutils.widgets.get("bronze_path")
silver_path = dbutils.widgets.get("silver_path")
processing_date = dbutils.widgets.get("processing_date")
primary_keys = dbutils.widgets.get("primary_keys").split(",") if
dbutils.widgets.get("primary_keys") else []
load_type = dbutils.widgets.get("load_type")

print(f"Processing: {schema_name}.{table_name}")
print(f"Load Type: {load_type}")

# Import Libraries
from pyspark.sql.functions import *
from delta.tables import *
import json

# Read Bronze Data with Fallback Strategy

bronze_file_path = f"{bronze_path}/{processing_date}/{table_name}
_{processing_date}.parquet"

try:
bronze_df = spark.read.parquet(bronze_file_path)
bronze_count = bronze_df.count()
print(f"Successfully read {bronze_count} records")
except Exception as e:
# Fallback to wildcard pattern
bronze_wildcard_path = f"{bronze_path}/{processing_date}/
*.parquet"

28 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

try:
bronze_df = spark.read.parquet(bronze_wildcard_path)
bronze_count = bronze_df.count()
print(f"Read {bronze_count} records from wildcard path")
except Exception as e2:
error_result = {"status": "failed", "error": str(e2)}
dbutils.notebook.exit(json.dumps(error_result))

# Add Metadata Columns for Data Lineage

silver_df = bronze_df \
.withColumn("_bronze_loaded_at", current_timestamp()) \
.withColumn("_processing_date", lit(processing_date)) \
.withColumn("_record_source", lit(f"{schema_name}.{table_name}"))
\
.withColumn("_is_deleted", lit(False)) \
.withColumn("_silver_loaded_at", current_timestamp())

# Data Quality: Remove Duplicates

duplicates_removed = 0
if primary_keys and primary_keys[0].strip():
initial_count = silver_df.count()
silver_df = silver_df.dropDuplicates(primary_keys)
final_count = silver_df.count()
duplicates_removed = initial_count - final_count
print(f"Removed {duplicates_removed} duplicates")

# Ensure Silver Directory Exists

try:
dbutils.fs.ls(silver_path)
except:
dbutils.fs.mkdirs(silver_path)

# Write to Silver Layer with Delta Lake

if load_type == "Full":
silver_df.write \
.format("delta") \
.mode("overwrite") \
.option("overwriteSchema", "true") \
.partitionBy("_processing_date") \

29 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

.save(silver_path)
print(f"Full load completed: {silver_df.count()} records")

elif load_type == "Incremental":

if DeltaTable.isDeltaTable(spark, silver_path):
silver_table = DeltaTable.forPath(spark, silver_path)
merge_condition = " AND ".join([f"source.{pk.strip()} =
target.{pk.strip()}" for pk in primary_keys])

silver_table.alias("target").merge(
silver_df.alias("source"),
merge_condition
).whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()
print("Incremental merge completed")
else:
silver_df.write \
.format("delta") \
.mode("overwrite") \
.partitionBy("_processing_date") \
.save(silver_path)
print("Initial delta table created")

# Optimize Table for Performance

try:
spark.sql(f"OPTIMIZE delta.`{silver_path}`")
print("Table optimized")
except Exception as e:
print(f"Optimization skipped: {e}")

# Return Success Status

dbutils.notebook.exit("SUCCESS")

Key Transformation Features:

30 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Metadata Enrichment: Adds lineage tracking columns

• Data Quality: Duplicate removal based on primary keys

• Delta Lake Integration: ACID transactions and performance optimization

• Flexible Loading: Supports both full and incremental patterns

• Error Handling: Comprehensive error capture and reporting

Step 4: Configuration and Deployment

Configuration Template

{
"environment": "production",
"azure": {
"subscriptionId": "YOUR_SUBSCRIPTION_ID",
"resourceGroupName": "data-pipeline-prod-rg",
"dataFactoryName": "adf-data-pipeline-prod",
"keyVaultName": "kv-data-pipeline-prod",
"storageAccountName": "datapipelineprodstore",
"databricksWorkspaceUrl": "https://adb-WORKSPACE_ID.azuredatabricks.net"
},
"database": {
"server": "sql-data-pipeline-prod.database.windows.net",

31 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

"database": "DataPipelineControl"
},
"processing": {
"defaultBatchSize": 8,
"maxRetryAttempts": 3,
"processingTimeout": "04:00:00"
}
}

Complete Deployment Process

# 1. Clone repository and configure

git clone https://github.com/yourusername/azure-data-pipeline.git
cd azure-data-pipeline
copy config/config-template.json config/config.json
# Edit config.json with your Azure resource details

# 2. Deploy infrastructure
.\deployment\01-deploy-integration-runtime.ps1 -SubscriptionId
"YOUR_SUB_ID" -ResourceGroupName "YOUR_RG" -DataFactoryName
"YOUR_ADF" -KeyVaultName "YOUR_KV" -StorageAccountName "YOUR_STORAGE"
-DatabricksWorkspaceUrl "YOUR_DATABRICKS_URL"

.\deployment\02-deploy-datasets.ps1 -SubscriptionId "YOUR_SUB_ID" -

ResourceGroupName "YOUR_RG" -DataFactoryName "YOUR_ADF"

.\deployment\03-deploy-pipelines.ps1 -SubscriptionId "YOUR_SUB_ID" -

ResourceGroupName "YOUR_RG" -DataFactoryName "YOUR_ADF"

32 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

# 3. Setup database
# Execute SQL scripts in your Azure SQL Database:
# .\sql\01-create-control-tables.sql
# .\sql\02-create-stored-procedures.sql
# .\sql\03-sample-data-setup.sql

# 4. Configure Databricks
# Upload bronze_to_silver.py to /Shared/bronze_to_silver
# Configure storage mount points
# Update cluster configuration

Step 5: Testing and Validation

Adding Your First Source System

-- 1. Register source system

INSERT INTO ctl.SourceSystems (SourceSystemName, SourceSystemType, ConnectionString)
VALUES ('MY_ERP_SYSTEM', 'SqlServer', 'Server=myserver;Database=erp;...');

-- 2. Configure tables for processing

INSERT INTO ctl.Tables (
SourceSystemId, SchemaName, TableName, LoadType,
PrimaryKeyColumns, BronzePath, SilverPath
) VALUES (
1, 'dbo', 'Customers', 'Incremental', 'CustomerID',
'/mnt/bronze/erp/dbo/Customers',
'/mnt/silver/erp/dbo/Customers'

33 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

);

-- 3. Run the master pipeline

-- This will automatically process your configured tables

Monitoring and Validation

-- Check processing status

SELECT
t.TableName,
ps.Layer,
ps.ProcessingDate,
ps.Status,
ps.RecordsProcessed,
ps.StartTime,
ps.EndTime
FROM ctl.ProcessingStatus ps
JOIN ctl.Tables t ON ps.TableId = t.TableId
WHERE ps.ProcessingDate = '2025-01-15'
ORDER BY ps.StartTime DESC;

-- View Silver layer data

SELECT * FROM delta.`/mnt/silver/erp/dbo/Customers`
WHERE _processing_date = '2025-01-15'
LIMIT 10;

34 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Performance and Optimization

Databricks Cluster Configuration

{
"cluster_name": "data-pipeline-cluster",
"spark_version": "13.3.x-scala2.12",
"node_type_id": "Standard_DS3_v2",
"autoscale": {
"min_workers": 2,
"max_workers": 8
},
"auto_termination_minutes": 60,
"spark_conf": {
"spark.databricks.delta.preview.enabled": "true",
"spark.sql.adaptive.enabled": "true",
"spark.sql.adaptive.coalescePartitions.enabled": "true"
}
}

Performance Optimization Tips

1. Partition Strategy: Partition by processing date for time-based queries

2. Z-Ordering: Use Z-ORDER on frequently queried columns

35 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

3. File Sizes: Target 128MB-1GB file sizes for optimal performance

4. Cluster Sizing: Start with 2–4 workers, scale based on data volume

5. Delta Optimization: Enable auto-optimization for regular maintenance

Real-World Results
After implementing this architecture for several enterprise clients, here are
the typical results:

Performance Metrics
• Cost Reduction: 60–70% reduction compared to traditional ETL tools

• Reliability: 99.9% success rate with automatic retry mechanisms

• Scalability: Linear scaling from GB to TB without code changes

Business Impact
• Time to Market: New data sources onboarded in hours instead of weeks

• Data Quality: 95% reduction in data quality issues

• Operational Efficiency: 80% reduction in manual intervention

36 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Cost Savings: $50K+ annual savings on licensing and infrastructure

What’s Next? Extending to Gold Layer

The architecture we’ve built provides a solid foundation for Gold layer
development:

Gold Layer Characteristics

• Business-Specific Models: Customer 360, Product Analytics, Financial
KPIs

• Aggregated Data: Pre-calculated metrics for dashboard performance

• Domain Data Marts: Sales, Marketing, Finance-specific views

• API-Ready: Optimized for consumption by applications and reports

Implementation Approach

# Future Gold layer transformation example

def create_customer_360(silver_customers, silver_orders, silver_interactions):
customer_360 = silver_customers \
.join(silver_orders, "customer_id") \
.join(silver_interactions, "customer_id") \

37 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

.groupBy("customer_id") \
.agg(
sum("order_amount").alias("total_spent"),
count("order_id").alias("total_orders"),
max("last_interaction_date").alias("last_activity")
)
return customer_360

Conclusion
We’ve built a production-ready data pipeline that:

Scales automatically with your data growth

Requires minimal maintenance through metadata-driven configuration
Provides enterprise-grade reliability with comprehensive error handling
Optimizes costs through efficient resource utilization
Ensures data quality with built-in validation and monitoring

Key Takeaways:
1. Medallion Architecture provides a clear separation of concerns and data
quality layers

2. Delta Lake is essential for reliable, performant data processing at scale

3. Metadata-driven approaches dramatically reduce development and

38 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

maintenance overhead

4. Infrastructure as Code enables repeatable, reliable deployments

5. Proper monitoring and error handling are critical for production

systems

Screenshots From ADF

39 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Complete Repository
You can find the complete implementation with all scripts, notebooks, and
documentation at:

40 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

GitHub Repository: Azure Data Pipeline — Medallion Architecture

What You Get:

• Complete SQL scripts for control database setup

• Production-ready Databricks notebooks

• PowerShell deployment scripts (all 3 covered in this article)

• Sample configurations and examples

• Comprehensive documentation

• Integration testing scripts

Ready to transform your organization’s data processing capabilities? Clone

the repository and start building your own medallion architecture pipeline
today!

Next Steps
1. Share the Gold Layer:
I will provide details and implementation for the gold layer.

41 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

2. Create LDW on Synapse Serverless SQL Pool:

I’ll build a Logical Data Warehouse (LDW) on top of the gold layer using
Serverless SQL Pool.

3. Migrate Report Sources:

The next phase will be migrating report data sources from Serverless SQL
Pool to Fabric Shortcuts and SQL Endpoints.

4. Enable Databricks UC for Reporting:

We’ll leverage Databricks Unity Catalog to access the gold layer directly
for reporting purposes.

5. ADF to Fabric Data Factory Migration:

I am also planning to migrate the Azure Data Factory pipelines to Fabric
Data Factory for a fully modern, end-to-end architecture.

Stay tuned for updates!

The goal is to showcase the most effective orchestration strategies while
adopting the latest technologies available in the data engineering ecosystem.

Have questions about implementing this architecture? Found this helpful? Leave a
comment below or connect with me on LinkedIn. I’d love to hear about your data
pipeline challenges and successes!

42 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Azure Databricks Power Bi Etl Data Engineering

Written by Yasar Kocyigit

10 followers · 22 following

Senior Data Engineer | Cloud & Data Enthusiast | Designing the Future of
Scalable Analytics

No responses yet

Sava Matic

43 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Yasar Kocyigit Yasar Kocyigit

How to Connect Paginated Defining Governed Business

Reports/SSRS to Databricks SQL… Metrics with Unity Catalog Metric…
Introduction

5d ago 53 2 4d ago

44 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Yasar Kocyigit

Neon Joins Databricks: Ushering in

the Future of AI-Native Data…

4d ago

See all from Yasar Kocyigit

Recommended from Medium

45 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Yıldıray Kuru Mayurkumar Surani

From 7.6GB CSV to 0.42GB Mastering PySpark: Your

Parquet: Production-Ready Big… Complete Guide to 46 Essential…
The Problem: When “Out of Memory” Collection of PySpark Functions
Strikes…

Jun 2 2 1 Jun 3 33

Aakeef In Towards Dev by Islam Taha

Data Engineering Architectures— Beyond the Dashboard: Building a

Part 2 Robust Data Quality Framework i…
This is a continued part of Data Engineering In today’s data-driven world, organizations
Architectures. (previous part 1 blog) lean heavily on platforms like Microsoft Pow…

Jun 4 9 Jun 1 5

46 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Prateek Jain Thomas F McGeehan V

7 Open Source Diagram-as-Code The Lakehouse Is Dead. Long Live

Tools You Should Try the Lakehouse
A hands-on guide to 7 open-source tools that How complexity became the product
let you draw cloud or application architectur…

Jun 5 1.2K 25 4d ago 145 2

See more recommendations

Help Status About Careers Press Blog Privacy Rules Terms Text to speech

47 of 47 6/14/2025, 1:35 AM

How To Design AWS Data Architectures - by Narjes Karmeni - The Startup - Medium
No ratings yet
How To Design AWS Data Architectures - by Narjes Karmeni - The Startup - Medium
22 pages
The Complete Guide To Setting Up Azure Databricks: From Zero To Production-Ready in 2025
No ratings yet
The Complete Guide To Setting Up Azure Databricks: From Zero To Production-Ready in 2025
58 pages
End To End Project ADF
100% (1)
End To End Project ADF
73 pages
Unity Abhishek-1
No ratings yet
Unity Abhishek-1
6 pages
Azure de QSN and Ans
No ratings yet
Azure de QSN and Ans
16 pages
Systems Analysis and Design 3
No ratings yet
Systems Analysis and Design 3
5 pages
Azure Data Factory Guide & Tutorials
No ratings yet
Azure Data Factory Guide & Tutorials
1,158 pages
Pass Sqlsaturday Melbourne Azure Data Pipelines v0 1 PDF
No ratings yet
Pass Sqlsaturday Melbourne Azure Data Pipelines v0 1 PDF
41 pages
Azure Project Overview ADF+DBX+CICD Updated
No ratings yet
Azure Project Overview ADF+DBX+CICD Updated
9 pages
Azure Data Factory For Beginners
No ratings yet
Azure Data Factory For Beginners
250 pages
Aiesec X Aws Workshop
No ratings yet
Aiesec X Aws Workshop
45 pages
Data Platform Best Practices
No ratings yet
Data Platform Best Practices
27 pages
Data Pipelines From Zero To Solid
No ratings yet
Data Pipelines From Zero To Solid
58 pages
Streaming Data Pipelines Guide
No ratings yet
Streaming Data Pipelines Guide
9 pages
Data Pipelines From Zero To Solid
No ratings yet
Data Pipelines From Zero To Solid
16 pages
Week 4 - Azure-AWSStorage
No ratings yet
Week 4 - Azure-AWSStorage
97 pages
ST Open Source Data Pipelines Oreilly f22568 202003 en PDF
No ratings yet
ST Open Source Data Pipelines Oreilly f22568 202003 en PDF
79 pages
Azure Project Execution Plan ADF+DBX+CICD
No ratings yet
Azure Project Execution Plan ADF+DBX+CICD
5 pages
Azure Modern Data Warehouse Solutions
No ratings yet
Azure Modern Data Warehouse Solutions
92 pages
O Reilly Data Lake Bootcamp Day 11694182865124
No ratings yet
O Reilly Data Lake Bootcamp Day 11694182865124
46 pages
Building Medallion Architectures 1742969743
No ratings yet
Building Medallion Architectures 1742969743
18 pages
ADF Notes
No ratings yet
ADF Notes
1 page
Azure Data Factory - Pratap - Qbex Technologies - 8886230001
No ratings yet
Azure Data Factory - Pratap - Qbex Technologies - 8886230001
4 pages
DataOps AWS Architecture Blueprint
100% (1)
DataOps AWS Architecture Blueprint
11 pages
Intro To Cloud
No ratings yet
Intro To Cloud
20 pages
Azure Application Architecture Guide
100% (2)
Azure Application Architecture Guide
1,420 pages
7 Snowflake Reference Architectures For Application Builders
No ratings yet
7 Snowflake Reference Architectures For Application Builders
13 pages
Azure Data Factory Guide
No ratings yet
Azure Data Factory Guide
43 pages
Introduction To Data Engineering Daniel Beach PDF Version
100% (5)
Introduction To Data Engineering Daniel Beach PDF Version
70 pages
Architecture Design PDF
No ratings yet
Architecture Design PDF
10 pages
Data Factory
100% (2)
Data Factory
26 pages
Modernization of Databases in The Cloud Era: Building Databases That Run Like Legos
No ratings yet
Modernization of Databases in The Cloud Era: Building Databases That Run Like Legos
12 pages
Start To Finish With Azure Data Factory
100% (2)
Start To Finish With Azure Data Factory
30 pages
Azure PDF
No ratings yet
Azure PDF
2,825 pages
7 Snowflake Reference Architectures For Application Builders
No ratings yet
7 Snowflake Reference Architectures For Application Builders
13 pages
I&A Tech Solution Architecture Guidelines
No ratings yet
I&A Tech Solution Architecture Guidelines
321 pages
Azure Data Superstore Pipeline - End-to-End Data Engineering and Visualization Report
No ratings yet
Azure Data Superstore Pipeline - End-to-End Data Engineering and Visualization Report
23 pages
Big Data Analytics
100% (1)
Big Data Analytics
14 pages
Cloud Analytics for Cargo Firms
No ratings yet
Cloud Analytics for Cargo Firms
45 pages
Big Data Arch
No ratings yet
Big Data Arch
2 pages
Cloud Data CHP 1 Et CHP 2
No ratings yet
Cloud Data CHP 1 Et CHP 2
5 pages
OReilly Report Designing A Modern Application Data Stack
No ratings yet
OReilly Report Designing A Modern Application Data Stack
44 pages
Data Pipeline Architecture
No ratings yet
Data Pipeline Architecture
6 pages
Azure Data Factory
100% (4)
Azure Data Factory
16 pages
Real Time Event Processing With Microsoft Azure Stream Analytics
100% (1)
Real Time Event Processing With Microsoft Azure Stream Analytics
31 pages
Big Data
No ratings yet
Big Data
86 pages
Azure Data Factory Microsoft Fabric
No ratings yet
Azure Data Factory Microsoft Fabric
14 pages
Unit 1
No ratings yet
Unit 1
9 pages
Azure Data Platform End2End - 2day
100% (2)
Azure Data Platform End2End - 2day
108 pages
Azure Project
No ratings yet
Azure Project
13 pages
Cloud Computing: Post Graduate Program in
100% (1)
Cloud Computing: Post Graduate Program in
20 pages
Ade Ci - CD
No ratings yet
Ade Ci - CD
15 pages
Documentation Project
No ratings yet
Documentation Project
56 pages
Big Data Architectures
No ratings yet
Big Data Architectures
8 pages
Solution Design Document: ABC Ltd. Group Limited
100% (2)
Solution Design Document: ABC Ltd. Group Limited
21 pages
CS 525 Advanced Distributed Systems Spring 2010: Ravenshaw Management Centre, Cuttack
No ratings yet
CS 525 Advanced Distributed Systems Spring 2010: Ravenshaw Management Centre, Cuttack
27 pages
Az Questions
No ratings yet
Az Questions
11 pages
Exam 70-445 Prep
No ratings yet
Exam 70-445 Prep
56 pages
TraderFeed - A Different Kind of Trading Psychology Workshop
100% (1)
TraderFeed - A Different Kind of Trading Psychology Workshop
3 pages
Real Time Fraud Detection Using Apache Flink - Part 2 - by Yugen - Ai - Yugen - Ai Technology Blog - Medium
No ratings yet
Real Time Fraud Detection Using Apache Flink - Part 2 - by Yugen - Ai - Yugen - Ai Technology Blog - Medium
36 pages
TouristGuideChurches - Greece MATHRA - Version2 - Fin - EN
No ratings yet
TouristGuideChurches - Greece MATHRA - Version2 - Fin - EN
89 pages
How To Build An End-To-End Testing Pipeline With DBT On Databricks - by Databricks SQL SME - DBSQL SME Engineering - Medium
No ratings yet
How To Build An End-To-End Testing Pipeline With DBT On Databricks - by Databricks SQL SME - DBSQL SME Engineering - Medium
27 pages
500 Issues Mac Ios Financial
No ratings yet
500 Issues Mac Ios Financial
12 pages
Building Real-Time Streaming Pipelines With Apache Flink & PyFlink - by Yousef Yousefi - Medium
No ratings yet
Building Real-Time Streaming Pipelines With Apache Flink & PyFlink - by Yousef Yousefi - Medium
15 pages
Bab1460 Sup 0004 Sup - 4
No ratings yet
Bab1460 Sup 0004 Sup - 4
5 pages
Global Primer Series - Credit Default Swaps - AlphaPicks
No ratings yet
Global Primer Series - Credit Default Swaps - AlphaPicks
37 pages
Building A Real-Time E-Commerce Data Pipeline With Kafka, Flink, PostgreSQL, and Elasticsearch - by Shijun Ju - Apr, 2025 - Medium
No ratings yet
Building A Real-Time E-Commerce Data Pipeline With Kafka, Flink, PostgreSQL, and Elasticsearch - by Shijun Ju - Apr, 2025 - Medium
42 pages
Bab1460 Sup 0006 Sup - 6
No ratings yet
Bab1460 Sup 0006 Sup - 6
1 page
StockGPT A GenAI Model For Stock Prediction and Trading
No ratings yet
StockGPT A GenAI Model For Stock Prediction and Trading
26 pages
Why Probability Probably Doesn't Exist (But It Is Useful To Act Like It Does)
No ratings yet
Why Probability Probably Doesn't Exist (But It Is Useful To Act Like It Does)
13 pages
Building An AI Agent For Transportation Applicatio... - Databricks Community - 119459
No ratings yet
Building An AI Agent For Transportation Applicatio... - Databricks Community - 119459
14 pages
Goal-Setting Mastery
No ratings yet
Goal-Setting Mastery
1 page
"Same-Weekday Stock Momentum Study"
No ratings yet
"Same-Weekday Stock Momentum Study"
39 pages
MACD Cheat Sheet
100% (1)
MACD Cheat Sheet
1 page
Introduction To Kubernetes
100% (6)
Introduction To Kubernetes
182 pages
ST4300USB3 - QSG - Rev2 Usb Hub
No ratings yet
ST4300USB3 - QSG - Rev2 Usb Hub
2 pages
What's The Difference - RANK Vs - DENSE - RANK Vs - ROW - NUMBER - by Lori Lu - Medium
No ratings yet
What's The Difference - RANK Vs - DENSE - RANK Vs - ROW - NUMBER - by Lori Lu - Medium
18 pages
USB Hub for IT Professionals
No ratings yet
USB Hub for IT Professionals
4 pages
Apple Acsp11 6 5 1 Introduction To Macos Recovery
No ratings yet
Apple Acsp11 6 5 1 Introduction To Macos Recovery
2 pages
How To Get Into FAANG
67% (3)
How To Get Into FAANG
26 pages
Cleanblend Blender Classic Manual PDF
No ratings yet
Cleanblend Blender Classic Manual PDF
30 pages
Comparing Fuel Costs by Bartok
No ratings yet
Comparing Fuel Costs by Bartok
1 page
Certain To Win: Or, Any Position Other Than First Is A Tie For Last
No ratings yet
Certain To Win: Or, Any Position Other Than First Is A Tie For Last
48 pages
Quiz 1 Automation Anywhere
85% (13)
Quiz 1 Automation Anywhere
49 pages
Global Cache: Installation and Operations Guide
No ratings yet
Global Cache: Installation and Operations Guide
26 pages
JAXB
No ratings yet
JAXB
39 pages
Error 196 Emm Eeror
No ratings yet
Error 196 Emm Eeror
6 pages
ch4 Notes (Security Part II-Auditing Database Systems)
No ratings yet
ch4 Notes (Security Part II-Auditing Database Systems)
3 pages
Vali Drive
No ratings yet
Vali Drive
2 pages
FPGA UG 02039 1 1 LMMI LINTR User Guide
No ratings yet
FPGA UG 02039 1 1 LMMI LINTR User Guide
23 pages
Neogenesis V21-Readme
No ratings yet
Neogenesis V21-Readme
23 pages
C Faq2
No ratings yet
C Faq2
65 pages
Encapsulation in OOP
No ratings yet
Encapsulation in OOP
9 pages
Linux Disk Management & Formatting
No ratings yet
Linux Disk Management & Formatting
3 pages
Extra Notes DFC3033 - CHAPTER 1
No ratings yet
Extra Notes DFC3033 - CHAPTER 1
17 pages
Linux Kernel Module Management
No ratings yet
Linux Kernel Module Management
7 pages
Salv Tree To Excel
No ratings yet
Salv Tree To Excel
11 pages
Madhav Singal-Ism File Complete
No ratings yet
Madhav Singal-Ism File Complete
117 pages
SQL Data Definition: Database Systems Lecture 5 Natasha Alechina
No ratings yet
SQL Data Definition: Database Systems Lecture 5 Natasha Alechina
26 pages
Lacie 2big Nas Os 3.1 Lacie 5big Nas Pro User Manual
No ratings yet
Lacie 2big Nas Os 3.1 Lacie 5big Nas Pro User Manual
394 pages
NetApp AFF A-Series
No ratings yet
NetApp AFF A-Series
6 pages
Intercompany SLA Setup Guide
No ratings yet
Intercompany SLA Setup Guide
51 pages
Database Normalization
No ratings yet
Database Normalization
12 pages
Database Entities & System Scope
No ratings yet
Database Entities & System Scope
7 pages
CHAPTER 4 ER Diagram Lecture Note
No ratings yet
CHAPTER 4 ER Diagram Lecture Note
59 pages
Integrating Active Directory With PHP
No ratings yet
Integrating Active Directory With PHP
9 pages
Programming Pearls
No ratings yet
Programming Pearls
4 pages
1.A. Sonet Frame 1.B. Atm Cell 1.C. Ip Packet 1.D. Fiber Channel Frame
No ratings yet
1.A. Sonet Frame 1.B. Atm Cell 1.C. Ip Packet 1.D. Fiber Channel Frame
17 pages
The Basic Syntax of The SELECT Statement
No ratings yet
The Basic Syntax of The SELECT Statement
19 pages
How To Replicate Condition Records For Pricing To External System From SAP
No ratings yet
How To Replicate Condition Records For Pricing To External System From SAP
12 pages
Dexing NDS3975
100% (1)
Dexing NDS3975
5 pages
4Q08 Databook ComputingMemory
No ratings yet
4Q08 Databook ComputingMemory
25 pages
LC-3 Assembly TRAPs & Subroutines Lab
0% (1)
LC-3 Assembly TRAPs & Subroutines Lab
4 pages