Azure Data Factory Full Notes

Azure Data Factory (ADF) is a cloud-based data integration service for creating ETL and ELT pipelines that move and transform data across various data stores. Key components include pipelines, activities, datasets, and linked services, with different types of integration runtimes available for cloud and on-premises data movement. ADF supports parameterization, control flow activities, debugging, monitoring, and integration with other Azure services, along with best practices and security measures.

Azure Data Factory (ADF) - Complete Notes (Basic to Advanced)

1. Introduction to ADF

- Azure Data Factory is a cloud-based data integration service that enables creating ETL and ELT pipelines.

- Used to move data between different data stores and transform it as needed.

2. Core Components

- Pipeline: A logical grouping of activities that together perform a task

- Activity: An individual processing step (Copy, Stored Procedure, Data Flow, etc.)

- Dataset: A named view of the data (tables/files) used as activity input or output

- Linked Service: Connection information for an external data store or compute service

- Trigger: A schedule or event that starts a pipeline run

- Integration Runtime: The compute infrastructure used for data movement and activity dispatch

3. Types of Integration Runtime

- Azure IR: Handles data movement between cloud data stores

- Self-hosted IR: Securely accesses data in on-premises or private networks

- Azure-SSIS IR: Lifts and shifts existing SSIS packages into Azure

4. Creating Your First Pipeline (Step-by-Step)

Step 1: Create a Linked Service (e.g., Azure Blob Storage)

Step 2: Create a Dataset (e.g., CSV file in Blob)

Step 3: Create a Pipeline

Step 4: Add Copy Data Activity

Step 5: Configure Source and Sink

Step 6: Debug and Trigger
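
Behind the authoring UI, steps 3-5 produce a pipeline JSON definition. A trimmed sketch of what that looks like (the pipeline, activity, and dataset names here are placeholders):

```json
{
  "name": "CopyBlobToSqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyCsvToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "BlobCsvDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AzureSqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}
```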


5. Copy Data Activity Example

- Source: Blob Storage (CSV)

- Sink: Azure SQL Table

- Map columns manually or rely on automatic schema mapping

6. Parameterization

- Use parameters in Linked Services, Datasets, and Pipelines

- Pass values dynamically using expressions

Example:

@pipeline().parameters.filename
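
In pipeline JSON, a parameter is declared under properties.parameters and referenced with an expression. A trimmed sketch with placeholder names, showing the pipeline parameter being passed into a dataset parameter:

```json
{
  "name": "ParameterizedPipeline",
  "properties": {
    "parameters": {
      "filename": { "type": "String", "defaultValue": "sales.csv" }
    },
    "activities": [
      {
        "name": "CopyFile",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "BlobCsvDataset",
            "type": "DatasetReference",
            "parameters": {
              "fileName": {
                "value": "@pipeline().parameters.filename",
                "type": "Expression"
              }
            }
          }
        ]
      }
    ]
  }
}
```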

7. Variables and Expressions

- Define pipeline variables and modify them with the Set Variable and Append Variable activities

- Use expressions for dynamic content

Examples:

- concat(), formatDateTime(), pipeline().parameters.name
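
As a rough illustration of what these expression functions evaluate to, here is a plain-Python approximation (these stand-ins are not the real ADF runtime; the date and file name are made-up examples):

```python
from datetime import datetime, timezone

# Illustrative stand-ins for ADF expression functions (not the real runtime).
def concat(*parts: str) -> str:
    # ADF's concat() joins its string arguments in order.
    return "".join(parts)

def format_date_time(dt: datetime, fmt: str = "yyyy-MM-dd") -> str:
    # ADF uses .NET-style format specifiers; map the common ones to strftime.
    mapping = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
    for adf, py in mapping.items():
        fmt = fmt.replace(adf, py)
    return dt.strftime(fmt)

# Building a dated file name, much like
# concat('sales_', formatDateTime(utcnow(), 'yyyy-MM-dd'), '.csv') would:
run_date = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(concat("sales_", format_date_time(run_date), ".csv"))  # sales_2024-01-15.csv
```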

8. Control Flow Activities

- If Condition

- Switch

- ForEach

- Until

- Execute Pipeline
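
The same control-flow semantics can be sketched in plain Python (the file list and the inner "copy" step are invented for illustration):

```python
# Plain-Python sketch of ForEach + If Condition semantics; in ADF the loop body
# would typically contain an inner Copy Data activity.
files = ["sales.csv", "refunds.csv", "notes.txt"]

copied = []
for name in files:                # ForEach: iterate over an array of items
    if name.endswith(".csv"):     # If Condition: evaluate an expression per item
        copied.append(name)       # inner activity (e.g., Copy Data)
# Until would instead repeat an activity until an expression becomes true.

print(copied)  # ['sales.csv', 'refunds.csv']
```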

9. Data Flow (Mapping Data Flows)


- Visually designed data transformation logic

- Source > Derived Column > Filter > Sink

Example:

- Read CSV > Clean Data > Load to SQL
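
A plain-Python sketch of the Source > Derived Column > Filter > Sink pattern (real Mapping Data Flows execute on Spark clusters; the sample data and column names below are made up):

```python
import csv
import io

# Illustrative stand-in for a Mapping Data Flow pipeline.
raw = "name,amount\nAlice, 100 \nBob,\nCara,250\n"

rows = list(csv.DictReader(io.StringIO(raw)))     # Source: read the CSV
for r in rows:
    r["amount"] = r["amount"].strip()             # Derived Column: trim whitespace
cleaned = [r for r in rows if r["amount"]]        # Filter: drop rows with no amount
# Sink: in ADF this step would write the rows to an Azure SQL table.

print(cleaned)
```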

10. Debugging and Monitoring

- Use Debug mode to test pipeline

- Monitor tab for pipeline run history, status, errors

11. Triggers

- Schedule Trigger (e.g., daily at 12 AM)

- Tumbling Window Trigger (fixed-size, non-overlapping intervals)

- Event-Based Trigger (e.g., on blob created/deleted events)
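
A tumbling window trigger carves time into fixed, contiguous, non-overlapping windows, each producing one pipeline run. The sketch below computes such windows in plain Python (the start time and window size are example values):

```python
from datetime import datetime, timedelta

def tumbling_windows(start: datetime, end: datetime, size: timedelta):
    # Each run receives a [windowStart, windowEnd) pair; windows never overlap.
    windows = []
    cursor = start
    while cursor < end:
        windows.append((cursor, cursor + size))
        cursor += size
    return windows

wins = tumbling_windows(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3),
                        timedelta(hours=1))
print(len(wins))  # 3
```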

12. CI/CD with ADF

- Use Git integration (Azure DevOps or GitHub) for source control

- Deploy to other environments via exported ARM templates

13. Real-Time Use Case Example

ETL Pipeline:

- Extract: Copy data from Blob Storage

- Transform: Clean with Data Flow

- Load: Write to Azure SQL Database

14. Best Practices

- Parameterize wherever possible


- Reuse Linked Services and Datasets

- Use ForEach for parallel processing

- Monitor pipeline performance

15. Security in ADF

- Use Managed Identity for authentication

- Secure Linked Services using Azure Key Vault
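
A linked service can pull its secret from Key Vault instead of storing the credential inline. A trimmed JSON sketch with placeholder names:

```json
{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "KeyVaultLinkedService",
          "type": "LinkedServiceReference"
        },
        "secretName": "SqlConnectionString"
      }
    }
  }
}
```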

16. Integration with Other Services

- Azure Synapse Analytics

- Azure Databricks

- Power BI

17. ADF Pricing

- Based on pipeline orchestration and Data Movement/Data Flow usage

18. Useful Resources

- Microsoft Learn: https://learn.microsoft.com/en-us/azure/data-factory/

- ADF Templates Gallery
