
Incremental Load in Azure Data Factory

Interview questions

Uploaded by

Sumanth Kulkarni
Copyright
© All Rights Reserved

Scenario 2

Incremental Load

What is it?
It’s a data-loading technique in the ETL process where only new or updated data is loaded into the destination rather than the complete dataset.

Other ways of loading data.


Full load: when the complete data is erased and loaded again along with the updated data, it is called a full load.

There are various ways to implement incremental load in Azure Data Factory. I have used the approach below.

My source is an SFTP connection to an application and the destination is Blob storage.


The internet is filled with incremental-load examples for SQL Server, so I wanted to take a different approach and give it a twist.

Step 1: Make a successful SFTP connection.
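As a sketch, an SFTP connection is defined as a linked service. The JSON below is a minimal example; the host name, user name, and Key Vault references are placeholders, not values from this pipeline:

```json
{
  "name": "SftpLinkedService",
  "properties": {
    "type": "Sftp",
    "typeProperties": {
      "host": "sftp.example.com",
      "port": 22,
      "authenticationType": "Basic",
      "userName": "adf_user",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "MyKeyVault", "type": "LinkedServiceReference" },
        "secretName": "sftp-password"
      }
    }
  }
}
```

Storing the password as a Key Vault secret reference keeps credentials out of the pipeline definition.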

Step 2: Use a Get Metadata activity to retrieve all the files, using Child Items as the field-list argument.
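A sketch of this activity's JSON, assuming a folder-level dataset named SftpFolderDataset (a placeholder name):

```json
{
  "name": "Get Metadata1",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": { "referenceName": "SftpFolderDataset", "type": "DatasetReference" },
    "fieldList": [ "childItems" ]
  }
}
```

With childItems in the field list, the activity's output contains an array of the files and subfolders in the dataset's folder, each with a name and a type.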
Step 3: Use a ForEach activity to iterate through all the files.

I have checked the Sequential option; the other option is Batch. Difference: with Sequential the files are iterated one by one, while with Batch the files are processed in parallel. If you use Batch you need to set a batch count, which is the number of files processed at once.

Items: here we pass the list of files that our Get Metadata activity returned. This value must be a dynamic expression rather than a static value.
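Putting Steps 2 and 3 together, the ForEach configuration might look like this sketch, with the childItems output of the first Get Metadata activity wired in as the Items expression:

```json
{
  "name": "ForEach1",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": [ ]
  }
}
```

If you switch to Batch mode, you would set isSequential to false and add a batchCount property instead.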

Step 4: We must add activities within the ForEach. Here I chose another Get Metadata activity; for this, the dataset must be parameterized, and the ForEach iterator's current item is passed in as the parameter value. This Get Metadata activity gives us each file's name along with its last-modified date, which we use to compare the files and find the latest one.
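A sketch of the inner activity, assuming a file-level dataset named SftpFileDataset with a fileName parameter (both placeholder names):

```json
{
  "name": "Get Metadata2",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "SftpFileDataset",
      "type": "DatasetReference",
      "parameters": { "fileName": "@item().name" }
    },
    "fieldList": [ "itemName", "lastModified" ]
  }
}
```

Here @item() refers to the current element of the ForEach iteration, so each loop pass inspects one file.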
Step 5: We further use an If Condition activity to compare the timestamps of the files with a logical function; I took the greaterOrEquals function.
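The condition expression might be sketched as below, assuming a pipeline variable named LatestModifiedDate (a placeholder name, defined in Step 6):

```json
{
  "name": "If Condition1",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@greaterOrEquals(activity('Get Metadata2').output.lastModified, variables('LatestModifiedDate'))",
      "type": "Expression"
    }
  }
}
```

Because lastModified is returned as an ISO-format timestamp string, comparing the values with greaterOrEquals orders them chronologically.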

Step 6: We will use Set Variable activities to store the last-modified date and the item name. But before that, the variables must be defined at the pipeline level.
Set Variable 1 stores the file name.
Another Set Variable stores the last-modified date.
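Inside the True branch of the If Condition, the two Set Variable activities might be sketched as follows; the variable names LatestFileName and LatestModifiedDate are placeholders for whatever you defined at pipeline level:

```json
[
  {
    "name": "Set LatestFileName",
    "type": "SetVariable",
    "typeProperties": {
      "variableName": "LatestFileName",
      "value": {
        "value": "@activity('Get Metadata2').output.itemName",
        "type": "Expression"
      }
    }
  },
  {
    "name": "Set LatestModifiedDate",
    "type": "SetVariable",
    "typeProperties": {
      "variableName": "LatestModifiedDate",
      "value": {
        "value": "@activity('Get Metadata2').output.lastModified",
        "type": "Expression"
      }
    }
  }
]
```

After the ForEach finishes, the variables hold the name and timestamp of the most recently modified file.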

Step 7: Drag in a Copy activity. Set the source to the latest-file variable and the sink to the destination as per your requirement.
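A sketch of the final Copy activity, reusing the parameterized SFTP dataset from Step 4 and assuming a Blob sink dataset named BlobSinkDataset (a placeholder):

```json
{
  "name": "Copy latest file",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "SftpFileDataset",
      "type": "DatasetReference",
      "parameters": { "fileName": "@variables('LatestFileName')" }
    }
  ],
  "outputs": [
    { "referenceName": "BlobSinkDataset", "type": "DatasetReference" }
  ],
  "typeProperties": {
    "source": { "type": "BinarySource" },
    "sink": { "type": "BinarySink" }
  }
}
```

Binary source and sink types are used here on the assumption that the files are copied as-is; if you need to parse the contents, use the matching format-specific source and sink instead.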
