06.introduction To Data Factory

Azure Data Factory (ADF) is a hybrid data integration service that enables users to create automated data pipelines without coding. It has over 80 connectors and can move, transform, and save data using a drag-and-drop interface. ADF components include pipelines containing activities, datasets representing referenced data, linked services defining data connections, integration runtimes providing compute resources, and triggers executing pipelines on schedules or events. Together these components provide a managed service for orchestrating ETL/ELT workflows across on-premises and cloud data stores and platforms.

Uploaded by

Sharvaree Taware
Copyright
© All Rights Reserved

Introduction to Azure Data Factory

Learning Objectives
• Azure Data Factory
• Considerations
• Components
• Demos
What can you do in Azure Data Factory?
Definition
Azure Data Factory (ADF) is a hybrid data integration service
that enables you to quickly and efficiently create automated
data pipelines – without having to write any code!
Azure Data Factory
• Hybrid Data Integration Service
• Simplifies ETL at scale
• Enables modern data integration
• Drag and drop interface
• Over 80 connectors available
• Move, transform and save data
• Managed Service
• Create data-driven workflows
• Orchestrate and automate data movement
• Transform and store data
• Operationalize the process
• ETL or ELT scenarios
Data Factory Considerations
Azure Data Factory Components
Data Factory Pipeline
• A data factory can contain one or more pipelines
• Logical group of Activities
• Manage Activities as a set
• One Pipeline can have one or more activities
Azure Data Factory Activities
• Represents a processing step in the pipelines
• Actions to perform on data
• Ingest data
• Transform data
• Store data
• Can be linked
• Execute sequentially or
• Run in parallel
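The linking of activities can be sketched in Python. This minimal example assumes the general shape of an ADF pipeline definition (a `name`, plus `properties.activities` in which each activity may list `dependsOn` entries). The pipeline and activity names are hypothetical, and `execution_waves` is an illustrative helper, not part of any ADF SDK.

```python
from typing import Dict, List

# Hypothetical pipeline definition, shaped like an ADF pipeline JSON document.
pipeline = {
    "name": "DailySalesPipeline",
    "properties": {
        "activities": [
            {"name": "IngestOrders", "type": "Copy", "dependsOn": []},
            {"name": "IngestCustomers", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformSales",
                "type": "DatabricksNotebook",
                "dependsOn": [
                    {"activity": "IngestOrders", "dependencyConditions": ["Succeeded"]},
                    {"activity": "IngestCustomers", "dependencyConditions": ["Succeeded"]},
                ],
            },
        ]
    },
}


def execution_waves(pipeline_def: Dict) -> List[List[str]]:
    """Group activity names into waves; activities in the same wave can run in parallel."""
    remaining = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in pipeline_def["properties"]["activities"]
    }
    done = set()
    waves: List[List[str]] = []
    while remaining:
        # An activity is ready once everything it depends on has finished.
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency between activities")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


waves = execution_waves(pipeline)
print(waves)
```

The two ingest activities have no dependencies, so they land in the same wave and could run in parallel. The transform activity depends on both and therefore runs after them.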
Activity Types
• Data movement activities
• Copy data among data stores located on-premises and in the cloud
• Data stores: Blob storage, Cosmos DB, Amazon Redshift, MariaDB, etc.
• Data transformation activities
• Transform and enrich data e.g. Hive, Pig, MapReduce, Spark or Databricks
• Control activities
• Control pipeline flow e.g. ForEach, Web
Data Flows
• Data Flow is a new feature of Azure Data Factory (ADF) that allows
you to develop graphical data transformation logic that can be
executed as activities within ADF pipelines.
• Two types:
• Mapping
• Wrangling
DataSet
• Simply points to or references the data
• Reference data used in an Activity
• Files
• Folders
• Documents
• Tables
Linked Service
• Similar to connection string
• Represent the connection information to connect to external
resources
• Data stores, e.g. Azure SQL Database
• Compute resource e.g. Spark Cluster
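The relationship between a dataset and a linked service can be illustrated with two small definitions. This is a sketch that assumes the usual JSON layout of these objects (a dataset naming its linked service through a `LinkedServiceReference`). All names, the container, the paths, and the connection string placeholder are hypothetical, and `resolve_linked_service` is an illustrative helper rather than an ADF API.

```python
from typing import Dict

# Hypothetical linked service: the connection information for a Blob storage account.
linked_service = {
    "name": "BlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # Placeholder only; real secrets belong in a secret store.
            "connectionString": "<connection-string-placeholder>"
        },
    },
}

# Hypothetical dataset: points at one file and names the linked service to reach it.
dataset = {
    "name": "OrdersCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "folderPath": "orders",
                "fileName": "orders.csv",
            }
        },
    },
}


def resolve_linked_service(dataset_def: Dict, linked_services: Dict[str, Dict]) -> Dict:
    """Look up the linked service that a dataset refers to by name."""
    reference = dataset_def["properties"]["linkedServiceName"]["referenceName"]
    return linked_services[reference]


registry = {linked_service["name"]: linked_service}
resolved = resolve_linked_service(dataset, registry)
print(dataset["name"], "is reached through", resolved["name"])
```

The dataset holds no connection details itself. It only describes where the data sits and which linked service supplies the connection.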
ADF Components
Integration Runtimes
• Provides fully managed, serverless compute infrastructure
• You don't have to worry about infrastructure provision, software
installation, patching, or capacity scaling.
• Pay only for duration of actual use
• Bridges between the activity and linked service
• Activity defines the action
• Linked service defines the location
Integration Runtimes
• Data Integration Capabilities
• Data Flow
• Data Movement
• Format conversion, column mapping, serialization/ deserialization etc.
• Provides the native compute to move data between cloud data stores in a secure,
reliable, and high performance manner.
• Activity dispatch (e.g. Databricks Notebook, HDInsight Hive, Pig, and
Spark activities, stored procedure, ADL Analytics U-SQL activity)
• SSIS Package execution
Types of Integration Runtime
Specify the infrastructure to run activities
1. Azure Integration Runtime
• Works on public networks
• Responsible for data flows, data movement, and activity dispatch
2. Self-hosted Integration Runtime
• Works on public and private networks
• Provides data movement and activity dispatch capabilities
• Must be installed on an on-premises machine or on a virtual machine inside a private network
3. Azure-SSIS Integration Runtime
• Supports SSIS package execution
• Works on public and private networks
Integration Runtimes
IR Type      | Public Network                               | Private Network
Azure        | Data Flow, Data movement, Activity dispatch  | -
Self-hosted  | Data movement, Activity dispatch             | Data movement, Activity dispatch
Azure-SSIS   | SSIS package execution                       | SSIS package execution
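The capability matrix in this section can be expressed as a small lookup. The sketch below encodes only what the slide states; the dictionary layout and the `candidate_runtimes` helper are illustrative assumptions, not an ADF API.

```python
from typing import List

# Capability matrix as listed on the slide: IR type -> network -> capabilities.
IR_CAPABILITIES = {
    "Azure": {
        "public": {"data flow", "data movement", "activity dispatch"},
        "private": set(),
    },
    "Self-hosted": {
        "public": {"data movement", "activity dispatch"},
        "private": {"data movement", "activity dispatch"},
    },
    "Azure-SSIS": {
        "public": {"ssis package execution"},
        "private": {"ssis package execution"},
    },
}


def candidate_runtimes(capability: str, network: str) -> List[str]:
    """Return the IR types that offer a capability on a 'public' or 'private' network."""
    if network not in ("public", "private"):
        raise ValueError("network must be 'public' or 'private'")
    wanted = capability.lower()
    return [ir for ir, networks in IR_CAPABILITIES.items() if wanted in networks[network]]


print(candidate_runtimes("data movement", "private"))
print(candidate_runtimes("Data Flow", "public"))
```

For example, moving data out of a store that is reachable only inside a private network leaves the self-hosted runtime as the only candidate in this matrix.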
Integration Runtimes
• Default IR: AutoResolveIntegrationRuntime
• Create an Azure IR
• When you want to explicitly define the location of the IR
• When you want to logically group activity executions on different IRs for
management purposes
Triggers
• Execute pipeline
• Many-to-many relationship between pipelines and triggers
• Three types of trigger
➢ Schedule Trigger – invokes a pipeline on a wall-clock schedule
➢ Tumbling Window Trigger – fires on a periodic interval and retains state
   ➢ One-to-one relationship with a pipeline
   ➢ Advanced configuration options: dependencies, delay, retry, concurrency
   ➢ Properties: trigger().outputs.windowStartTime, trigger().outputs.windowEndTime
➢ Event-based Trigger – runs a pipeline in response to an event
   ➢ e.g. arrival or deletion of a file in Blob storage
   ➢ Integrates with the Azure Event Grid service
   ➢ Properties: triggerBody().folderPath, triggerBody().fileName
• https://docs.microsoft.com/en-us/azure/data-factory/tumbling-window-trigger-dependency
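The idea behind a tumbling window trigger (fixed-size, contiguous, non-overlapping intervals) can be sketched in plain Python. This example does not call any ADF API. The `tumbling_windows` function, the dates, and the six-hour interval are hypothetical, and each yielded pair stands in for the window start and end times that the trigger exposes to a pipeline.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def tumbling_windows(
    start: datetime, interval: timedelta, until: datetime
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (window_start, window_end) pairs that are contiguous and never overlap."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    window_start = start
    # Only emit windows that have fully elapsed by `until`.
    while window_start + interval <= until:
        yield window_start, window_start + interval
        window_start += interval


windows = list(
    tumbling_windows(
        start=datetime(2024, 1, 1, 0, 0),
        interval=timedelta(hours=6),
        until=datetime(2024, 1, 2, 0, 0),
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window ends exactly where the next one begins, which is why a pipeline run can be tied to one specific slice of time and retried for that slice alone.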
Demo
Summary
Thanks
