0% found this document useful (0 votes)
9 views47 pages

Building A Production-Ready Data Pipeline With Azure - Complete Guide To Medallion Architecture - by Yasar Kocyigit - Jun, 2025 - Medium

This document provides a comprehensive guide on building a production-ready data pipeline using Azure's Medallion Architecture. It outlines the architecture's three layers—Bronze, Silver, and Gold—each serving different data quality and processing needs, and details the Azure services involved, including Azure Data Factory, Azure Databricks, and Azure Data Lake. The tutorial includes steps for setting up infrastructure, deploying linked services, and managing metadata for efficient data processing and transformation.

Uploaded by

sasa332138
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views47 pages

Building A Production-Ready Data Pipeline With Azure - Complete Guide To Medallion Architecture - by Yasar Kocyigit - Jun, 2025 - Medium

This document provides a comprehensive guide on building a production-ready data pipeline using Azure's Medallion Architecture. It outlines the architecture's three layers—Bronze, Silver, and Gold—each serving different data quality and processing needs, and details the Azure services involved, including Azure Data Factory, Azure Databricks, and Azure Data Lake. The tutorial includes steps for setting up infrastructure, deploying linked services, and managing metadata for efficient data processing and transformation.

Uploaded by

sasa332138
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 47

Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Building a Production-Ready Data


Pipeline with Azure: Complete
Guide to Medallion Architecture
Yasar Kocyigit 15 min read · 11 hours ago

Introduction: The Modern Data Challenge


In today’s data-driven world, organizations are drowning in data but starving
for insights. Traditional ETL processes are too rigid, too slow, and too
expensive to scale. What if I told you there’s a better way to build data
pipelines that are:

• Metadata-driven (no code changes for new tables)

1 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Scalable (handles growing data volumes automatically)

• Reliable (comprehensive error handling and monitoring)

• Cost-effective (optimized resource usage)

Today, I’ll walk you through building a complete Medallion Architecture data
pipeline using Azure services. By the end of this article, you’ll have a
production-ready solution that can process terabytes of data with minimal
maintenance.

What is Medallion Architecture?


The Medallion Architecture is a data design pattern that organizes data in
layers of increasing quality and refinement:

Bronze Layer (Raw Data)


• Purpose: Exact copy of source system data

• Format: Parquet files for efficient storage

• Characteristics: Preserves original data types, includes duplicates

• Use Case: Data archival, compliance, re-processing scenarios

2 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Silver Layer (Cleaned & Enriched)


• Purpose: Business-ready data with quality controls

• Format: Delta Lake for ACID transactions

• Characteristics: Deduplicated, validated, enriched with metadata

• Use Case: Analytics, reporting, machine learning

Gold Layer (Analytics Ready)


• Purpose: Highly refined, aggregated data

• Format: Delta Lake with business models

• Characteristics: KPIs, dimensions, domain-specific views

• Use Case: Executive dashboards, regulatory reporting

In this tutorial, we’ll implement Bronze → Silver transformation, which covers


80% of most data pipeline requirements.

Architecture Overview

3 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Here’s what we’re building:

• Orchestration:
Azure Data Factory
Pipeline management and data movement

4 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Transformation:
Azure Databricks
Data processing with Apache Spark

• Storage:
Azure Data Lake Gen2
Scalable data storage

• Metadata:
Azure SQL Database
Control tables and processing logs

• Security:
Azure Key Vault
Secrets management

Why This Stack?

Azure Data Factory


• Metadata-driven: Configure once, process hundreds of tables

• Visual interface: Easy to understand and maintain

• Built-in connectors: Support for 90+ data sources

5 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Cost-effective: Pay only for pipeline runs

Azure Databricks
• Apache Spark: Distributed processing for large datasets

• Delta Lake: ACID transactions, schema evolution, time travel

• Auto-scaling: Clusters scale based on workload

• Optimization: Automatic performance tuning

Delta Lake Deep Dive


Delta Lake is the secret sauce that makes our Silver layer powerful:

ACID Transactions

# Concurrent writes don't corrupt data


df.write.format("delta").mode("append").save("/path/to/table")

Time Travel

6 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

-- View data as it existed yesterday


SELECT * FROM delta.`/mnt/silver/customers/` TIMESTAMP AS OF '2025-01-14'

Schema Evolution

# Add new columns without breaking existing queries


df.withColumn("new_column", lit("default_value")) \
.write.format("delta").option("mergeSchema", "true").save("/path")

Upsert Operations

# Merge new data with existing records


silver_table.alias("target").merge(
new_data.alias("source"),
"target.customer_id = source.customer_id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

7 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Step 1: Setting Up the Foundation

Prerequisites
• Azure subscription with appropriate permissions

• PowerShell 7+ with Az modules

• SQL Server Management Studio

Repository Structure

azure-data-pipeline/
├── sql/ # Database setup scripts
│ ├── 01-create-control-tables.sql
│ ├── 02-create-stored-procedures.sql
│ └── 03-sample-data-setup.sql
├── databricks/
│ ├── notebooks/
│ │ └── bronze_to_silver.py # Main transformation logic
│ └── cluster-config.json
├── deployment/ # Infrastructure as Code
│ ├── 01-deploy-integration-runtime.ps1
│ ├── 02-deploy-datasets.ps1
│ └── 03-deploy-pipelines.ps1
└── config/

8 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

└── config-template.json

Control Database Schema


The heart of our metadata-driven approach is the control database:

-- Source Systems Configuration


CREATE TABLE ctl.SourceSystems (
SourceSystemId INT IDENTITY(1,1) PRIMARY KEY,
SourceSystemName NVARCHAR(100) NOT NULL UNIQUE,
SourceSystemType NVARCHAR(50) NOT NULL DEFAULT 'SqlServer',
ConnectionString NVARCHAR(500) NOT NULL,
IsActive BIT DEFAULT 1
);

-- Tables Configuration
CREATE TABLE ctl.Tables (
TableId INT IDENTITY(1,1) PRIMARY KEY,
SourceSystemId INT NOT NULL,
SchemaName NVARCHAR(50) NOT NULL,
TableName NVARCHAR(100) NOT NULL,
LoadType NVARCHAR(20) CHECK (LoadType IN ('Full',
'Incremental')),
PrimaryKeyColumns NVARCHAR(500),
BronzePath NVARCHAR(500) NOT NULL,
SilverPath NVARCHAR(500) NOT NULL,
IsActive BIT DEFAULT 1,
Priority INT DEFAULT 100

9 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

);

-- Processing Status Tracking


CREATE TABLE ctl.ProcessingStatus (
StatusId BIGINT IDENTITY(1,1) PRIMARY KEY,
TableId INT NOT NULL,
Layer NVARCHAR(20) CHECK (Layer IN ('Bronze', 'Silver', 'Gold')),
ProcessingDate DATE NOT NULL,
Status NVARCHAR(20) CHECK (Status IN ('Running', 'Success',
'Failed')),
StartTime DATETIME,
EndTime DATETIME,
RecordsProcessed BIGINT,
ErrorMessage NVARCHAR(MAX)
);

This schema enables us to:

• Configure new tables without code changes

• Track processing status in real-time

• Handle errors with detailed logging

• Support multiple source systems with different connection types

Step 2: Infrastructure Deployment


Now comes the exciting part — deploying our infrastructure using

10 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

PowerShell scripts.

Script 1: Integration Runtime and Linked Services

# deployment/01-deploy-integration-runtime.ps1

param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,

[Parameter(Mandatory=$true)]
[string]$ResourceGroupName,

[Parameter(Mandatory=$true)]
[string]$DataFactoryName,

[Parameter(Mandatory=$true)]
[string]$KeyVaultName,

[Parameter(Mandatory=$true)]
[string]$StorageAccountName,

[Parameter(Mandatory=$true)]
[string]$DatabricksWorkspaceUrl
)

Set-AzContext -SubscriptionId $SubscriptionId

Write-Host "Deploying Integration Runtime and Linked Services" -


ForegroundColor Green

11 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

# Create Self-Hosted Integration Runtime


try {
$existingIR = Get-AzDataFactoryV2IntegrationRuntime -
ResourceGroupName $ResourceGroupName -DataFactoryName
$DataFactoryName -Name "SelfHostedIR" -ErrorAction SilentlyContinue

if (-not $existingIR) {
Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName
$ResourceGroupName -DataFactoryName $DataFactoryName -Name
"SelfHostedIR" -Type SelfHosted -Force
Write-Host "Integration Runtime created successfully" -
ForegroundColor Green

$irKeys = Get-AzDataFactoryV2IntegrationRuntimeKey -
ResourceGroupName $ResourceGroupName -DataFactoryName
$DataFactoryName -Name "SelfHostedIR"
Write-Host "Authentication Keys:" -ForegroundColor Cyan
Write-Host "Key1: $($irKeys.AuthKey1)"
Write-Host "Key2: $($irKeys.AuthKey2)"
}
} catch {
Write-Host "Error creating Integration Runtime:
$($_.Exception.Message)" -ForegroundColor Red
exit 1
}

# Key Vault Linked Service


$lsKeyVault = @{
name = "LS_KeyVault"
properties = @{
type = "AzureKeyVault"
typeProperties = @{
baseUrl = "https://$KeyVaultName.vault.azure.net/"
}
}
} | ConvertTo-Json -Depth 10

12 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

# Control Database Linked Service


$lsControlDB = @{
name = "LS_SQL_Control"
properties = @{
type = "AzureSqlDatabase"
typeProperties = @{
connectionString = "Integrated
Security=False;Encrypt=True;Data
Source=YOUR_SQL_SERVER.database.windows.net;Initial
Catalog=YOUR_CONTROL_DATABASE;User ID=YOUR_USERNAME"
password = @{
type = "AzureKeyVaultSecret"
store = @{
referenceName = "LS_KeyVault"
type = "LinkedServiceReference"
}
secretName = "sql-admin-password"
}
}
}
} | ConvertTo-Json -Depth 10

# Data Lake Linked Service


$lsDataLake = @{
name = "LS_ADLS_DeltaLake"
properties = @{
type = "AzureBlobFS"
typeProperties = @{
url = "https://$StorageAccountName.dfs.core.windows.net"
accountKey = @{
type = "AzureKeyVaultSecret"
store = @{
referenceName = "LS_KeyVault"
type = "LinkedServiceReference"
}
secretName = "storage-account-key"
}

13 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

}
}
} | ConvertTo-Json -Depth 10

# Databricks Linked Service


$lsDatabricks = @{
name = "LS_Databricks"
properties = @{
type = "AzureDatabricks"
typeProperties = @{
domain = $DatabricksWorkspaceUrl
accessToken = @{
type = "AzureKeyVaultSecret"
store = @{
referenceName = "LS_KeyVault"
type = "LinkedServiceReference"
}
secretName = "databricks-token"
}
existingClusterId = "YOUR_CLUSTER_ID"
}
}
} | ConvertTo-Json -Depth 10

# Deploy Linked Services


$linkedServices = @{
"LS_KeyVault" = $lsKeyVault
"LS_SQL_Control" = $lsControlDB
"LS_ADLS_DeltaLake" = $lsDataLake
"LS_Databricks" = $lsDatabricks
}

foreach ($service in $linkedServices.GetEnumerator()) {


Write-Host "Deploying $($service.Key)..." -ForegroundColor Yellow
$service.Value | Out-File "$($service.Key).json" -Encoding UTF8
Set-AzDataFactoryV2LinkedService -ResourceGroupName
$ResourceGroupName -DataFactoryName $DataFactoryName -Name
$service.Key -DefinitionFile "$($service.Key).json" -Force
Remove-Item "$($service.Key).json" -Force

14 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Write-Host "Integration Runtime and Linked Services deployment


completed" -ForegroundColor Green

What this script does:

• Creates a Self-Hosted Integration Runtime for on-premises connectivity

• Deploys Key Vault linked service for secure credential management

• Sets up Control Database linked service for metadata operations

• Configures Data Lake linked service for Bronze/Silver storage

• Creates Databricks linked service for data transformation

Script 2: Dataset Deployment

# deployment/02-deploy-datasets.ps1

param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,

15 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

[Parameter(Mandatory=$true)]
[string]$ResourceGroupName,

[Parameter(Mandatory=$true)]
[string]$DataFactoryName
)

Set-AzContext -SubscriptionId $SubscriptionId

Write-Host "=== DATASETS DEPLOYMENT ===" -ForegroundColor Cyan

# Bronze Parquet Dataset - Parameterized for dynamic file paths


$dsBronzeParquet = @{
name = "DS_Bronze_Parquet"
properties = @{
linkedServiceName = @{
referenceName = "LS_ADLS_DeltaLake"
type = "LinkedServiceReference"
}
parameters = @{
FilePath = @{ type = "string" }
FileName = @{ type = "string" }
}
type = "Parquet"
typeProperties = @{
location = @{
type = "AzureBlobFSLocation"
fileName = @{
value = "@dataset().FileName"
type = "Expression"
}
folderPath = @{
value = "@dataset().FilePath"
type = "Expression"
}
fileSystem = "datalake"
}
compressionCodec = "snappy"

16 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

}
}
} | ConvertTo-Json -Depth 15

# Control Database Dataset


$dsControlDatabase = @{
name = "DS_Control_Database"
properties = @{
linkedServiceName = @{
referenceName = "LS_SQL_Control"
type = "LinkedServiceReference"
}
type = "AzureSqlTable"
typeProperties = @{
schema = "ctl"
}
}
} | ConvertTo-Json -Depth 15

# Generic SQL Dataset - Parameterized for any source table


$dsSqlGeneric = @{
name = "DS_SQL_Generic"
properties = @{
linkedServiceName = @{
referenceName = "LS_SQL_SourceSystem"
type = "LinkedServiceReference"
}
parameters = @{
TableName = @{ type = "string" }
SchemaName = @{ type = "string"; defaultValue = "dbo" }
}
type = "SqlServerTable"
typeProperties = @{
schema = @{
value = "@dataset().SchemaName"
type = "Expression"
}
table = @{

17 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

value = "@dataset().TableName"
type = "Expression"
}
}
}
} | ConvertTo-Json -Depth 15

# Deploy All Datasets


$datasets = @{
"DS_Bronze_Parquet" = $dsBronzeParquet
"DS_Control_Database" = $dsControlDatabase
"DS_SQL_Generic" = $dsSqlGeneric
}

foreach ($dataset in $datasets.GetEnumerator()) {


Write-Host "Deploying $($dataset.Key)..." -ForegroundColor Yellow
$dataset.Value | Out-File "$($dataset.Key).json" -Encoding UTF8
Set-AzDataFactoryV2Dataset -ResourceGroupName $ResourceGroupName
-DataFactoryName $DataFactoryName -Name $dataset.Key -DefinitionFile
"$($dataset.Key).json" -Force
Remove-Item "$($dataset.Key).json" -Force
Write-Host "✓ $($dataset.Key) deployed successfully" -
ForegroundColor Green
}

Write-Host "All datasets deployed successfully!" -ForegroundColor


Green

Key Dataset Features:

• Parameterized datasets for dynamic file paths

• Generic SQL dataset that works with any source table

18 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Control database dataset for metadata operations

• Compression optimization with Snappy codec

Script 3: Complete Pipeline Deployment

# deployment/03-deploy-pipelines.ps1

param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,

[Parameter(Mandatory=$true)]
[string]$ResourceGroupName,

[Parameter(Mandatory=$true)]
[string]$DataFactoryName
)

Set-AzContext -SubscriptionId $SubscriptionId

Write-Host "Deploying Complete Pipeline Suite" -ForegroundColor Green

# Bronze Ingestion Pipeline


$bronzeIngestionPipeline = @{
name = "PL_Bronze_Ingestion"
properties = @{
activities = @(
@{
name = "Build Source Query"

19 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

type = "SetVariable"
typeProperties = @{
variableName = "SourceQuery"
value = @{
value =
"@{if(equals(pipeline().parameters.TableConfig.LoadType,
'Incremental'), concat('SELECT * FROM ',
pipeline().parameters.TableConfig.SchemaName, '.',
pipeline().parameters.TableConfig.TableName, ' WHERE ',
pipeline().parameters.TableConfig.IncrementalColumn, ' > ''',
if(empty(pipeline().parameters.TableConfig.WatermarkValue),
'1900-01-01', pipeline().parameters.TableConfig.WatermarkValue),
''''), concat('SELECT * FROM ',
pipeline().parameters.TableConfig.SchemaName, '.',
pipeline().parameters.TableConfig.TableName))}"
type = "Expression"
}
}
}
@{
name = "Copy to Bronze"
type = "Copy"
dependsOn = @(@{ activity = "Build Source Query";
dependencyConditions = @("Succeeded") })
typeProperties = @{
source = @{
type = "SqlServerSource"
sqlReaderQuery = @{ value =
"@variables('SourceQuery')"; type = "Expression" }
}
sink = @{
type = "ParquetSink"
storeSettings = @{ type =
"AzureBlobFSWriteSettings" }
}
}
inputs = @(@{
referenceName = "DS_SQL_Generic"

20 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

type = "DatasetReference"
parameters = @{
SchemaName =
"@pipeline().parameters.TableConfig.SchemaName"
TableName =
"@pipeline().parameters.TableConfig.TableName"
}
})
outputs = @(@{
referenceName = "DS_Bronze_Parquet"
type = "DatasetReference"
parameters = @{
FilePath = "@{concat('bronze/',
toLower(pipeline().parameters.TableConfig.SourceSystemName), '/',
toLower(pipeline().parameters.TableConfig.SchemaName), '/',
pipeline().parameters.TableConfig.TableName, '/',
pipeline().parameters.ProcessingDate)}"
FileName =
"@{concat(pipeline().parameters.TableConfig.TableName, '_',
pipeline().parameters.ProcessingDate, '.parquet')}"
}
})
}
)
parameters = @{
TableConfig = @{ type = "object" }
ProcessingDate = @{ type = "string" }
}
variables = @{
SourceQuery = @{ type = "String" }
}
}
} | ConvertTo-Json -Depth 20

# Silver Processing Pipeline


$silverProcessingPipeline = @{
name = "PL_Silver_Processing"
properties = @{

21 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

activities = @(
@{
name = "Get Tables for Silver"
type = "Lookup"
typeProperties = @{
source = @{
type = "AzureSqlSource"
sqlReaderQuery = @{
value = "SELECT DISTINCT t.TableId,
t.SchemaName, t.TableName, t.PrimaryKeyColumns, t.LoadType, CONCAT('/
mnt/bronze/', LOWER(ss.SourceSystemName), '/', LOWER(t.SchemaName),
'/', t.TableName) as BronzePath, CONCAT('/mnt/silver/',
LOWER(ss.SourceSystemName), '/', LOWER(t.SchemaName), '/',
t.TableName) as SilverPath FROM ctl.Tables t INNER JOIN
ctl.SourceSystems ss ON t.SourceSystemId = ss.SourceSystemId INNER
JOIN ctl.ProcessingStatus ps ON t.TableId = ps.TableId WHERE
ps.ProcessingDate = '@{pipeline().parameters.ProcessingDate}' AND
ps.Layer = 'Bronze' AND ps.Status = 'Success' AND t.IsActive = 1"
}
}
dataset = @{ referenceName =
"DS_Control_Database"; type = "DatasetReference" }
firstRowOnly = $false
}
}
@{
name = "ForEach Silver Table"
type = "ForEach"
dependsOn = @(@{ activity = "Get Tables for Silver";
dependencyConditions = @("Succeeded") })
typeProperties = @{
items = "@activity('Get Tables for
Silver').output.value"
isSequential = $false
batchCount = 4
activities = @(
@{
name = "Process Bronze to Silver"

22 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

type = "DatabricksNotebook"
typeProperties = @{
notebookPath = "/Shared/
bronze_to_silver"
baseParameters = @{
table_name =
"@{item().TableName}"
silver_path =
"@{item().SilverPath}"
load_type = "@{item().LoadType}"
schema_name =
"@{item().SchemaName}"
processing_date =
"@pipeline().parameters.ProcessingDate"
bronze_path =
"@{item().BronzePath}"
table_id = "@{item().TableId}"
primary_keys =
"@{item().PrimaryKeyColumns}"
}
}
linkedServiceName = @{ referenceName =
"LS_Databricks" }
}
)
}
}
)
parameters = @{
ProcessingDate = @{ type = "string" }
}
}
} | ConvertTo-Json -Depth 20

# Master Orchestrator Pipeline


$masterPipeline = @{
name = "PL_Master_Orchestrator"
properties = @{

23 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

activities = @(
@{
name = "Set Processing Date"
type = "SetVariable"
typeProperties = @{
variableName = "ProcessingDate"
value =
"@{if(equals(pipeline().parameters.ProcessingDate, ''),
formatDateTime(utcnow(), 'yyyy-MM-dd'),
pipeline().parameters.ProcessingDate)}"
}
}
@{
name = "Get Active Tables"
type = "Lookup"
dependsOn = @(@{ activity = "Set Processing Date";
dependencyConditions = @("Succeeded") })
typeProperties = @{
source = @{
type = "AzureSqlSource"
sqlReaderQuery = "SELECT t.*,
ss.ConnectionString, ss.SourceSystemName, w.WatermarkValue FROM
ctl.Tables t INNER JOIN ctl.SourceSystems ss ON t.SourceSystemId =
ss.SourceSystemId LEFT JOIN ctl.Watermarks w ON t.TableId = w.TableId
WHERE t.IsActive = 1 AND ss.IsActive = 1 ORDER BY t.Priority,
t.TableId"
}
dataset = @{ referenceName =
"DS_Control_Database"; type = "DatasetReference" }
firstRowOnly = $false
}
}
@{
name = "ForEach Table"
type = "ForEach"
dependsOn = @(@{ activity = "Get Active Tables";
dependencyConditions = @("Succeeded") })
typeProperties = @{

24 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

items = "@activity('Get Active


Tables').output.value"
isSequential = $false
batchCount = 5
activities = @(
@{
name = "Execute Bronze Ingestion"
type = "ExecutePipeline"
typeProperties = @{
pipeline = @{ referenceName =
"PL_Bronze_Ingestion" }
waitOnCompletion = $true
parameters = @{
TableConfig = "@item()"
ProcessingDate =
"@variables('ProcessingDate')"
}
}
}
)
}
}
@{
name = "Execute Silver Processing"
type = "ExecutePipeline"
dependsOn = @(@{ activity = "ForEach Table";
dependencyConditions = @("Succeeded") })
typeProperties = @{
pipeline = @{ referenceName =
"PL_Silver_Processing" }
waitOnCompletion = $true
parameters = @{
ProcessingDate =
"@variables('ProcessingDate')"
}
}
}
)

25 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

parameters = @{
ProcessingDate = @{ type = "string"; defaultValue = "" }
SourceSystemName = @{ type = "string"; defaultValue = ""
}
TableName = @{ type = "string"; defaultValue = "" }
}
variables = @{
ProcessingDate = @{ type = "String"; defaultValue = "" }
}
}
} | ConvertTo-Json -Depth 20

# Deploy Pipelines in Dependency Order


Write-Host "1. Deploying Bronze Ingestion Pipeline..." -
ForegroundColor Yellow
$bronzeIngestionPipeline | Out-File "PL_Bronze_Ingestion.json" -
Encoding UTF8
Set-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -
DataFactoryName $DataFactoryName -Name "PL_Bronze_Ingestion" -
DefinitionFile "PL_Bronze_Ingestion.json" -Force
Remove-Item "PL_Bronze_Ingestion.json" -Force

Write-Host "2. Deploying Silver Processing Pipeline..." -


ForegroundColor Yellow
$silverProcessingPipeline | Out-File "PL_Silver_Processing.json" -
Encoding UTF8
Set-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -
DataFactoryName $DataFactoryName -Name "PL_Silver_Processing" -
DefinitionFile "PL_Silver_Processing.json" -Force
Remove-Item "PL_Silver_Processing.json" -Force

Write-Host "3. Deploying Master Orchestrator Pipeline..." -


ForegroundColor Yellow
$masterPipeline | Out-File "PL_Master_Orchestrator.json" -Encoding
UTF8
Set-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName -
DataFactoryName $DataFactoryName -Name "PL_Master_Orchestrator" -
DefinitionFile "PL_Master_Orchestrator.json" -Force
Remove-Item "PL_Master_Orchestrator.json" -Force

26 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Write-Host "All pipelines deployed successfully!" -ForegroundColor


Green

Pipeline Architecture:

1. Bronze Ingestion Pipeline: Extracts data from source systems to Bronze


layer

2. Silver Processing Pipeline: Transforms Bronze data to Silver with quality


controls

3. Master Orchestrator Pipeline: Coordinates the entire process with error


handling

Step 3: Databricks Transformation Logic


The heart of our Silver layer processing happens in Databricks:

# databricks/notebooks/bronze_to_silver.py

# Parameter Configuration
dbutils.widgets.text("table_id", "")

27 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

dbutils.widgets.text("schema_name", "")
dbutils.widgets.text("table_name", "")
dbutils.widgets.text("bronze_path", "")
dbutils.widgets.text("silver_path", "")
dbutils.widgets.text("processing_date", "")
dbutils.widgets.text("primary_keys", "")
dbutils.widgets.text("load_type", "Full")

# Extract Parameters
table_id = int(dbutils.widgets.get("table_id"))
schema_name = dbutils.widgets.get("schema_name")
table_name = dbutils.widgets.get("table_name")
bronze_path = dbutils.widgets.get("bronze_path")
silver_path = dbutils.widgets.get("silver_path")
processing_date = dbutils.widgets.get("processing_date")
primary_keys = dbutils.widgets.get("primary_keys").split(",") if
dbutils.widgets.get("primary_keys") else []
load_type = dbutils.widgets.get("load_type")

print(f"Processing: {schema_name}.{table_name}")
print(f"Load Type: {load_type}")

# Import Libraries
from pyspark.sql.functions import *
from delta.tables import *
import json

# Read Bronze Data with Fallback Strategy


bronze_file_path = f"{bronze_path}/{processing_date}/{table_name}
_{processing_date}.parquet"

try:
bronze_df = spark.read.parquet(bronze_file_path)
bronze_count = bronze_df.count()
print(f"Successfully read {bronze_count} records")
except Exception as e:
# Fallback to wildcard pattern
bronze_wildcard_path = f"{bronze_path}/{processing_date}/
*.parquet"

28 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

try:
bronze_df = spark.read.parquet(bronze_wildcard_path)
bronze_count = bronze_df.count()
print(f"Read {bronze_count} records from wildcard path")
except Exception as e2:
error_result = {"status": "failed", "error": str(e2)}
dbutils.notebook.exit(json.dumps(error_result))

# Add Metadata Columns for Data Lineage


silver_df = bronze_df \
.withColumn("_bronze_loaded_at", current_timestamp()) \
.withColumn("_processing_date", lit(processing_date)) \
.withColumn("_record_source", lit(f"{schema_name}.{table_name}"))
\
.withColumn("_is_deleted", lit(False)) \
.withColumn("_silver_loaded_at", current_timestamp())

# Data Quality: Remove Duplicates


duplicates_removed = 0
if primary_keys and primary_keys[0].strip():
initial_count = silver_df.count()
silver_df = silver_df.dropDuplicates(primary_keys)
final_count = silver_df.count()
duplicates_removed = initial_count - final_count
print(f"Removed {duplicates_removed} duplicates")

# Ensure Silver Directory Exists


try:
dbutils.fs.ls(silver_path)
except:
dbutils.fs.mkdirs(silver_path)

# Write to Silver Layer with Delta Lake


if load_type == "Full":
silver_df.write \
.format("delta") \
.mode("overwrite") \
.option("overwriteSchema", "true") \
.partitionBy("_processing_date") \

29 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

.save(silver_path)
print(f"Full load completed: {silver_df.count()} records")

elif load_type == "Incremental":


if DeltaTable.isDeltaTable(spark, silver_path):
silver_table = DeltaTable.forPath(spark, silver_path)
merge_condition = " AND ".join([f"source.{pk.strip()} =
target.{pk.strip()}" for pk in primary_keys])

silver_table.alias("target").merge(
silver_df.alias("source"),
merge_condition
).whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()
print("Incremental merge completed")
else:
silver_df.write \
.format("delta") \
.mode("overwrite") \
.partitionBy("_processing_date") \
.save(silver_path)
print("Initial delta table created")

# Optimize Table for Performance


try:
spark.sql(f"OPTIMIZE delta.`{silver_path}`")
print("Table optimized")
except Exception as e:
print(f"Optimization skipped: {e}")

# Return Success Status


dbutils.notebook.exit("SUCCESS")

Key Transformation Features:

30 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Metadata Enrichment: Adds lineage tracking columns

• Data Quality: Duplicate removal based on primary keys

• Delta Lake Integration: ACID transactions and performance optimization

• Flexible Loading: Supports both full and incremental patterns

• Error Handling: Comprehensive error capture and reporting

Step 4: Configuration and Deployment

Configuration Template

{
"environment": "production",
"azure": {
"subscriptionId": "YOUR_SUBSCRIPTION_ID",
"resourceGroupName": "data-pipeline-prod-rg",
"dataFactoryName": "adf-data-pipeline-prod",
"keyVaultName": "kv-data-pipeline-prod",
"storageAccountName": "datapipelineprodstore",
"databricksWorkspaceUrl": "https://adb-WORKSPACE_ID.azuredatabricks.net"
},
"database": {
"server": "sql-data-pipeline-prod.database.windows.net",

31 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

"database": "DataPipelineControl"
},
"processing": {
"defaultBatchSize": 8,
"maxRetryAttempts": 3,
"processingTimeout": "04:00:00"
}
}

Complete Deployment Process

# 1. Clone repository and configure


git clone https://github.com/yourusername/azure-data-pipeline.git
cd azure-data-pipeline
copy config/config-template.json config/config.json
# Edit config.json with your Azure resource details

# 2. Deploy infrastructure
.\deployment\01-deploy-integration-runtime.ps1 -SubscriptionId
"YOUR_SUB_ID" -ResourceGroupName "YOUR_RG" -DataFactoryName
"YOUR_ADF" -KeyVaultName "YOUR_KV" -StorageAccountName "YOUR_STORAGE"
-DatabricksWorkspaceUrl "YOUR_DATABRICKS_URL"

.\deployment\02-deploy-datasets.ps1 -SubscriptionId "YOUR_SUB_ID" -


ResourceGroupName "YOUR_RG" -DataFactoryName "YOUR_ADF"

.\deployment\03-deploy-pipelines.ps1 -SubscriptionId "YOUR_SUB_ID" -


ResourceGroupName "YOUR_RG" -DataFactoryName "YOUR_ADF"

32 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

# 3. Setup database
# Execute SQL scripts in your Azure SQL Database:
# .\sql\01-create-control-tables.sql
# .\sql\02-create-stored-procedures.sql
# .\sql\03-sample-data-setup.sql

# 4. Configure Databricks
# Upload bronze_to_silver.py to /Shared/bronze_to_silver
# Configure storage mount points
# Update cluster configuration

Step 5: Testing and Validation

Adding Your First Source System

-- 1. Register source system


INSERT INTO ctl.SourceSystems (SourceSystemName, SourceSystemType, ConnectionString)
VALUES ('MY_ERP_SYSTEM', 'SqlServer', 'Server=myserver;Database=erp;...');

-- 2. Configure tables for processing


INSERT INTO ctl.Tables (
SourceSystemId, SchemaName, TableName, LoadType,
PrimaryKeyColumns, BronzePath, SilverPath
) VALUES (
1, 'dbo', 'Customers', 'Incremental', 'CustomerID',
'/mnt/bronze/erp/dbo/Customers',
'/mnt/silver/erp/dbo/Customers'

33 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

);

-- 3. Run the master pipeline


-- This will automatically process your configured tables

Monitoring and Validation

-- Check processing status


SELECT
t.TableName,
ps.Layer,
ps.ProcessingDate,
ps.Status,
ps.RecordsProcessed,
ps.StartTime,
ps.EndTime
FROM ctl.ProcessingStatus ps
JOIN ctl.Tables t ON ps.TableId = t.TableId
WHERE ps.ProcessingDate = '2025-01-15'
ORDER BY ps.StartTime DESC;

-- View Silver layer data


SELECT * FROM delta.`/mnt/silver/erp/dbo/Customers`
WHERE _processing_date = '2025-01-15'
LIMIT 10;

34 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Performance and Optimization

Databricks Cluster Configuration

{
"cluster_name": "data-pipeline-cluster",
"spark_version": "13.3.x-scala2.12",
"node_type_id": "Standard_DS3_v2",
"autoscale": {
"min_workers": 2,
"max_workers": 8
},
"auto_termination_minutes": 60,
"spark_conf": {
"spark.databricks.delta.preview.enabled": "true",
"spark.sql.adaptive.enabled": "true",
"spark.sql.adaptive.coalescePartitions.enabled": "true"
}
}

Performance Optimization Tips


1. Partition Strategy: Partition by processing date for time-based queries

2. Z-Ordering: Use Z-ORDER on frequently queried columns

35 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

3. File Sizes: Target 128MB-1GB file sizes for optimal performance

4. Cluster Sizing: Start with 2–4 workers, scale based on data volume

5. Delta Optimization: Enable auto-optimization for regular maintenance

Real-World Results
After implementing this architecture for several enterprise clients, here are
the typical results:

Performance Metrics
• Cost Reduction: 60–70% reduction compared to traditional ETL tools

• Reliability: 99.9% success rate with automatic retry mechanisms

• Scalability: Linear scaling from GB to TB without code changes

Business Impact
• Time to Market: New data sources onboarded in hours instead of weeks

• Data Quality: 95% reduction in data quality issues

• Operational Efficiency: 80% reduction in manual intervention

36 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

• Cost Savings: $50K+ annual savings on licensing and infrastructure

What’s Next? Extending to Gold Layer


The architecture we’ve built provides a solid foundation for Gold layer
development:

Gold Layer Characteristics


• Business-Specific Models: Customer 360, Product Analytics, Financial
KPIs

• Aggregated Data: Pre-calculated metrics for dashboard performance

• Domain Data Marts: Sales, Marketing, Finance-specific views

• API-Ready: Optimized for consumption by applications and reports

Implementation Approach

# Future Gold layer transformation example


def create_customer_360(silver_customers, silver_orders, silver_interactions):
customer_360 = silver_customers \
.join(silver_orders, "customer_id") \
.join(silver_interactions, "customer_id") \

37 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

.groupBy("customer_id") \
.agg(
sum("order_amount").alias("total_spent"),
count("order_id").alias("total_orders"),
max("last_interaction_date").alias("last_activity")
)
return customer_360

Conclusion
We’ve built a production-ready data pipeline that:

Scales automatically with your data growth


Requires minimal maintenance through metadata-driven configuration
Provides enterprise-grade reliability with comprehensive error handling
Optimizes costs through efficient resource utilization
Ensures data quality with built-in validation and monitoring

Key Takeaways:
1. Medallion Architecture provides a clear separation of concerns and data
quality layers

2. Delta Lake is essential for reliable, performant data processing at scale

3. Metadata-driven approaches dramatically reduce development and

38 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

maintenance overhead

4. Infrastructure as Code enables repeatable, reliable deployments

5. Proper monitoring and error handling are critical for production


systems

Screenshots From ADF

39 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Complete Repository
You can find the complete implementation with all scripts, notebooks, and
documentation at:

40 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

GitHub Repository: Azure Data Pipeline — Medallion Architecture

What You Get:


• Complete SQL scripts for control database setup

• Production-ready Databricks notebooks

• PowerShell deployment scripts (all 3 covered in this article)

• Sample configurations and examples

• Comprehensive documentation

• Integration testing scripts

Ready to transform your organization’s data processing capabilities? Clone


the repository and start building your own medallion architecture pipeline
today!

Next Steps
1. Share the Gold Layer:
I will provide details and implementation for the gold layer.

41 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

2. Create LDW on Synapse Serverless SQL Pool:


I’ll build a Logical Data Warehouse (LDW) on top of the gold layer using
Serverless SQL Pool.

3. Migrate Report Sources:


The next phase will be migrating report data sources from Serverless SQL
Pool to Fabric Shortcuts and SQL Endpoints.

4. Enable Databricks UC for Reporting:


We’ll leverage Databricks Unity Catalog to access the gold layer directly
for reporting purposes.

5. ADF to Fabric Data Factory Migration:


I am also planning to migrate the Azure Data Factory pipelines to Fabric
Data Factory for a fully modern, end-to-end architecture.

Stay tuned for updates!


The goal is to showcase the most effective orchestration strategies while
adopting the latest technologies available in the data engineering ecosystem.

Have questions about implementing this architecture? Found this helpful? Leave a
comment below or connect with me on LinkedIn. I’d love to hear about your data
pipeline challenges and successes!

42 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Azure Databricks Power Bi Etl Data Engineering

Written by Yasar Kocyigit


10 followers · 22 following

Senior Data Engineer | Cloud & Data Enthusiast | Designing the Future of
Scalable Analytics

No responses yet

Sava Matic

43 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

More from Yasar Kocyigit

Yasar Kocyigit Yasar Kocyigit

How to Connect Paginated Defining Governed Business


Reports/SSRS to Databricks SQL… Metrics with Unity Catalog Metric…
Introduction

5d ago 53 2 4d ago

44 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Yasar Kocyigit

Neon Joins Databricks: Ushering in


the Future of AI-Native Data…

4d ago

See all from Yasar Kocyigit

Recommended from Medium

45 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Yıldıray Kuru Mayurkumar Surani

From 7.6GB CSV to 0.42GB Mastering PySpark: Your


Parquet: Production-Ready Big… Complete Guide to 46 Essential…
The Problem: When “Out of Memory” Collection of PySpark Functions
Strikes…

Jun 2 2 1 Jun 3 33

Aakeef In Towards Dev by Islam Taha

Data Engineering Architectures— Beyond the Dashboard: Building a


Part 2 Robust Data Quality Framework i…
This is a continued part of Data Engineering In today’s data-driven world, organizations
Architectures. (previous part 1 blog) lean heavily on platforms like Microsoft Pow…

Jun 4 9 Jun 1 5

46 of 47 6/14/2025, 1:35 AM
Building a Production-Ready Data Pipeline with Azure: Complete Guide to Medallion Archite... https://medium.com/@kocyigityasar/building-a-production-ready-data-pipeline-with-azure-c...

Prateek Jain Thomas F McGeehan V

7 Open Source Diagram-as-Code The Lakehouse Is Dead. Long Live


Tools You Should Try the Lakehouse
A hands-on guide to 7 open-source tools that How complexity became the product
let you draw cloud or application architectur…

Jun 5 1.2K 25 4d ago 145 2

See more recommendations

Help Status About Careers Press Blog Privacy Rules Terms Text to speech

47 of 47 6/14/2025, 1:35 AM

You might also like