
CPD # 1311

ETL Vs ELT

Description: This CPD presents a detailed study of the key conceptual differences between
the ETL and ELT approaches to data warehousing.
Author: Divya Pai
Reviewed by: Rahul Mehta, Nitin Ambare, Tej Rawat
Keywords: ETL, ELT
Version 1.0
October 2017

This document is confidential and contains proprietary information, including trade secrets of CitiusTech. Neither the document nor any of the information
contained in it may be reproduced or disclosed to any unauthorized person under any circumstances without the express written permission of CitiusTech.
Contents

▪ Background Summary

▪ Data Warehouse Architecture using ETL Approach

▪ ELT Approach for Data Warehouse Architecture

▪ Comparison between ETL and ELT

▪ When to use?

▪ References

Background Summary
In a traditional data warehouse, data is loaded into a central, schema-driven “repository of truth”
for analytics and reporting through an Extract, Transform, and Load (ETL) process.
ETL has always been an important part of the data warehousing world.
However, several factors and newer trends have made ETL more challenging to implement
today:
▪ Significant bandwidth consumption
▪ Long development cycles
▪ High costs for scalability
▪ Introduction of a variety of data (semi-structured, unstructured)

With the technologies and toolsets available today, an alternative approach can be
implemented: ELT (Extract, Load and Transform). ELT can challenge the established ETL method
and prove to be the better option in suitable scenarios. The choice should be based on the
objectives set for the project and on the respective strengths and weaknesses of both approaches.

Data Warehouse Architecture using ETL Approach (1/2)
[Figure: ETL architecture. Source systems (EMRs, HIS, LIS, Source DB) → E (Extract) → Staging → T (Transformations: Data Quality, Reconciliation, Normalization) → L (Load) → DW]
** Detailed description of the architecture is on the next slide

Data Warehouse Architecture using ETL Approach (2/2)
EXTRACT
▪ The Extract step covers data extraction from the source systems and makes the data
accessible for further processing in the staging area
▪ Extraction into the staging area may be conditional (for example, only records changed
since the previous run)
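The conditional extraction mentioned above is commonly implemented as an incremental extract driven by a "high-watermark" value. The sketch below is one possible illustration, not the document's method: it assumes a hypothetical `source_orders` table with an `updated_at` column and uses SQLite from the Python standard library as a stand-in for the source system.

```python
import sqlite3

def extract_incremental(conn, last_watermark):
    """Extract only rows changed since last_watermark from the hypothetical
    source_orders table, and return them with the new watermark to store."""
    rows = conn.execute(
        "SELECT order_id, amount, updated_at FROM source_orders "
        "WHERE updated_at > ? ORDER BY updated_at",  # the extraction condition
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the latest change seen; keep it if nothing changed
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# In-memory SQLite database standing in for a source system
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (order_id INTEGER, amount TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?, ?)",
    [(1, "10.50", "2017-10-01"), (2, "7.25", "2017-10-05"), (3, "3.00", "2017-10-09")],
)
rows, watermark = extract_incremental(conn, "2017-10-03")
print(rows)       # [(2, '7.25', '2017-10-05'), (3, '3.00', '2017-10-09')]
print(watermark)  # 2017-10-09
```

ISO-8601 date strings compare correctly as text, which is why a plain `>` works here; a real source would more likely use a timestamp column or a change-tracking feature.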
TRANSFORM
▪ The Transform step applies a set of rules to transform the data from the source format
to the target format
▪ All the activities listed below are performed using the capabilities of the ETL tool
▪ Types of transformation: cleansing, deduplication, conversion, filtering, joining, data
validation, data quality checks and data reconciliation
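To make the transformation types listed above concrete, here is a minimal sketch of several of them (cleansing, validation, conversion, filtering, deduplication) running inside the ETL process, outside the target database. The record layout (`patient_id`, `gender`, `charge`) and the rules are hypothetical; a real ETL tool would express them as configured components rather than hand-written code.

```python
from decimal import Decimal, InvalidOperation

def _text(value):
    """Cleansing helper: None becomes '', everything else is trimmed text."""
    return "" if value is None else str(value).strip()

def transform(rows):
    """Apply ETL-style rules to extracted rows in the ETL process itself
    (outside the target database). Returns (clean_rows, rejected_rows)."""
    clean, rejected, seen = [], [], set()
    for row in rows:  # row-by-row, as ETL engines typically process data
        # Cleansing: trim whitespace and standardise case
        patient_id = _text(row.get("patient_id"))
        gender = _text(row.get("gender")).upper()
        # Data validation: the business key is mandatory
        if not patient_id:
            rejected.append((row, "missing patient_id"))
            continue
        # Conversion: text charge -> Decimal; unparseable values are rejected
        try:
            charge = Decimal(_text(row.get("charge")))
        except InvalidOperation:
            rejected.append((row, "invalid charge"))
            continue
        # Filtering: zero-value records are not needed in the target
        if charge == 0:
            continue
        # Deduplication on the business key (first occurrence wins)
        if patient_id in seen:
            continue
        seen.add(patient_id)
        clean.append({"patient_id": patient_id, "gender": gender, "charge": charge})
    return clean, rejected

raw = [
    {"patient_id": " P1 ", "gender": "f", "charge": "120.00"},
    {"patient_id": "P1", "gender": "F", "charge": "120.00"},  # duplicate after cleansing
    {"patient_id": "", "gender": "m", "charge": "50"},         # fails validation
    {"patient_id": "P2", "gender": "m", "charge": "abc"},      # fails conversion
    {"patient_id": "P3", "gender": "m", "charge": "0"},        # filtered out
    {"patient_id": "P4", "gender": " M", "charge": "75.5"},
]
clean, rejected = transform(raw)
print([r["patient_id"] for r in clean])    # ['P1', 'P4']
print([reason for _, reason in rejected])  # ['missing patient_id', 'invalid charge']
```

Returning rejected rows with a reason, instead of silently dropping them, is what makes row-level debugging and reconciliation possible later.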

LOAD
▪ Load is the process of writing the data into the target database
▪ The loading phase is the last step of the ETL process
▪ The load starts only when the transformed data is available in the pipeline
▪ During the load step, it is necessary to ensure that the load is performed correctly
and with as few resources as possible
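A minimal sketch of the load step, assuming a hypothetical `dw_charges` target table and using SQLite from the Python standard library as a stand-in target database. Writing the whole batch in one transaction is one way to ensure the load is performed correctly: either every row is written or none is.

```python
import sqlite3

def load(conn, rows):
    """Write transformed rows into the target table as one transaction.
    `with conn` commits on success and rolls back if any insert fails,
    so a failed batch leaves the target unchanged."""
    with conn:
        conn.executemany(
            "INSERT INTO dw_charges (patient_id, gender, charge) VALUES (?, ?, ?)",
            [(r["patient_id"], r["gender"], r["charge"]) for r in rows],
        )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_charges (patient_id TEXT PRIMARY KEY, gender TEXT, charge REAL)")
print(load(conn, [{"patient_id": "P1", "gender": "F", "charge": 120.0}]))  # 1
try:
    # P2 is new, but P1 violates the primary key, so the whole batch is rolled back
    load(conn, [{"patient_id": "P2", "gender": "M", "charge": 75.5},
                {"patient_id": "P1", "gender": "F", "charge": 1.0}])
except sqlite3.IntegrityError as exc:
    print("batch rejected:", exc)
print(conn.execute("SELECT COUNT(*) FROM dw_charges").fetchone()[0])  # 1
```

`executemany` sends the batch through a single prepared statement, which keeps the load reasonably light on resources compared with issuing one statement per row from separate calls.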

ELT Approach for Data Warehouse Architecture (1/2)

[Figure: ELT architecture. Source systems (EMRs, HIS, LIS, Source DB) → E (Extract) → L (Load) into Staging/Data Lake inside the RDBMS → T (Stored procedures/T-SQL tasks: Transformations, Data Quality, Reconciliation, Normalization) → DW]
** Detailed description of the architecture is on the next slide


ELT Approach for Data Warehouse Architecture (2/2)
EXTRACT
▪ The Extract step covers the extraction of raw data from the source systems
▪ The raw data is extracted and a complete copy is dumped into the staging area or data lake
LOAD
▪ Here, data is LOADED first into the staging area without any changes to column names
or data fields
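The "load as-is" behaviour described above can be sketched as follows. This is an illustration under assumptions: the CSV content, the `stg_charges` table name and the choice to store every value as raw text are hypothetical, and SQLite stands in for the staging database.

```python
import csv
import io
import sqlite3

def load_raw(conn, table, csv_text):
    """Dump a CSV extract into a staging table unchanged: same column names,
    every value kept as raw text, no cleansing or type conversion."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column names come straight from the file header; validate them in real use
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    with conn:
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return header

source = "PatientID,Gender,Charge\n P1 ,f,120.00\nP1,F,120.00\nP4,M,75.5\n"
conn = sqlite3.connect(":memory:")
print(load_raw(conn, "stg_charges", source))  # ['PatientID', 'Gender', 'Charge']
print(conn.execute("SELECT * FROM stg_charges").fetchall())
# [(' P1 ', 'f', '120.00'), ('P1', 'F', '120.00'), ('P4', 'M', '75.5')]
```

Note that the untrimmed `' P1 '` and the duplicate row are deliberately kept; in ELT, fixing them is the job of the later in-database Transform step.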
TRANSFORM
▪ The Transform step applies a set of rules to transform the data from the source format
to the target format
▪ All the activities given below are performed on the data where it sits in the database,
before it is loaded into the target
▪ These activities are performed using T-SQL or stored procedures, utilizing the MPP
(Massively Parallel Processing) capabilities of the database
▪ Types of transformation: cleansing, deduplication, conversion, filtering, joining, data
validation, data quality checks and data reconciliation
▪ The transformed data is then loaded into the data warehouse
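The in-database transformation described above replaces row-by-row code with a set-based SQL statement executed where the data already sits. In a real ELT setup this would be T-SQL or a stored procedure on the MPP warehouse; in this sketch SQLite stands in for the database, and the `stg_charges`/`dw_charges` tables and rules are hypothetical.

```python
import sqlite3

TRANSFORM_SQL = """
INSERT INTO dw_charges (patient_id, gender, charge)
SELECT DISTINCT                          -- deduplication of identical cleansed rows
       TRIM(PatientID),                  -- cleansing
       UPPER(TRIM(Gender)),              -- standardisation
       CAST(TRIM(Charge) AS REAL)        -- conversion
FROM stg_charges
WHERE TRIM(PatientID) <> ''              -- validation
  AND CAST(TRIM(Charge) AS REAL) > 0     -- filtering (non-numeric text casts to 0 in SQLite)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_charges (PatientID TEXT, Gender TEXT, Charge TEXT);
CREATE TABLE dw_charges (patient_id TEXT, gender TEXT, charge REAL);
INSERT INTO stg_charges VALUES (' P1 ', 'f', '120.00'), ('P1', 'F', '120.00'),
                               ('', 'm', '50'), ('P3', 'm', '0'), ('P4', 'M', '75.5');
""")
with conn:  # the whole transformation runs as one statement inside the database
    conn.execute(TRANSFORM_SQL)
print(conn.execute("SELECT * FROM dw_charges ORDER BY patient_id").fetchall())
# [('P1', 'F', 120.0), ('P4', 'M', 75.5)]
```

Because no data leaves the database, there is no extra network hop; the trade-off is that the statement itself carries no built-in run-time statistics or lineage.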

Real-Life Example of Data Warehouse Architecture using ELT

1. Data is sourced from standard messaging files
2. Data is sourced from standard CSV files
3. Data from standard messages is extracted by message parsers
4. Data from CSV files is extracted by an ETL process
5. Data from all files lands in a common staging area
6. Pre-built stored procedures load the data, driven by metadata tables
7. The stored procedures also perform transformations, data quality checks, reconciliation
and normalization
8. Data is loaded into the dimensional warehouse model
9. Job scheduling and audit logging are performed outside the ELT
framework
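Steps 6 and 7 above describe loading driven by metadata tables. The sketch below shows one way such a mechanism could work; it is not the framework from the example. The `load_metadata` layout, table names and expressions are all hypothetical, SQLite stands in for the warehouse, and a Python function plays the role of the generic stored procedure. Since the SQL is assembled from metadata, the metadata table must be trusted and access-controlled.

```python
import sqlite3

def run_metadata_load(conn, target_table):
    """Build and run an INSERT ... SELECT for target_table from the hypothetical
    load_metadata table, so a new feed needs metadata rows rather than new code."""
    mappings = conn.execute(
        "SELECT source_table, target_column, source_expression FROM load_metadata "
        "WHERE target_table = ? ORDER BY column_order",
        (target_table,),
    ).fetchall()
    if not mappings:
        raise ValueError(f"no metadata for {target_table}")
    sources = {m[0] for m in mappings}
    if len(sources) != 1:
        raise ValueError("this sketch supports one source table per target")
    target_cols = ", ".join(m[1] for m in mappings)
    select_exprs = ", ".join(m[2] for m in mappings)  # transformations live in metadata
    sql = f"INSERT INTO {target_table} ({target_cols}) SELECT {select_exprs} FROM {sources.pop()}"
    with conn:
        cur = conn.execute(sql)
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE load_metadata (target_table TEXT, column_order INTEGER, source_table TEXT,
                            target_column TEXT, source_expression TEXT);
INSERT INTO load_metadata VALUES
  ('dim_patient', 1, 'stg_patient', 'patient_id', 'TRIM(pid)'),
  ('dim_patient', 2, 'stg_patient', 'gender', 'UPPER(TRIM(sex))');
CREATE TABLE stg_patient (pid TEXT, sex TEXT);
INSERT INTO stg_patient VALUES (' P1', 'f'), ('P2 ', 'm');
CREATE TABLE dim_patient (patient_id TEXT, gender TEXT);
""")
print(run_metadata_load(conn, "dim_patient"))  # 2
print(conn.execute("SELECT * FROM dim_patient ORDER BY patient_id").fetchall())
# [('P1', 'F'), ('P2', 'M')]
```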

ETL - Pros and Cons

Pros
▪ ETL can balance the workload and share it with the RDBMS by utilizing PUSH-DOWN
PROCESSING
▪ It can handle PARTITIONING and PARALLELISM independent of the data model and database
▪ It processes information ROW-BY-ROW, hence debugging is easy
▪ It is capable of capturing RUN-TIME STATISTICS and DATA LINEAGE
▪ ETL tools have a very USER-FRIENDLY interface, and the process flow itself is sufficient
to give an insight into the functionality

Cons
▪ ETL jobs consume significant CPU, MEMORY, DISK SPACE and NETWORK BANDWIDTH, so it is
difficult to run these jobs more than once daily
▪ Extra COST of building an ETL system or LICENSING ETL tools
▪ Need to wait, especially for large data sizes: as data grows, transformation TIME increases
▪ Possible REDUCED PERFORMANCE of the row-based approach in scenarios with huge data volumes
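The row-by-row processing, run-time statistics and data lineage listed as ETL strengths can be illustrated with a small sketch. The job name, rule function and statistics keys are hypothetical; commercial ETL tools capture this information through their own built-in mechanisms rather than code like this.

```python
import time

def run_job(source_name, rows, rule):
    """Process rows one at a time, capturing run-time statistics and simple
    row-level lineage (source name + row number) for every output and reject."""
    stats = {"read": 0, "written": 0, "rejected": 0}
    output, rejects = [], []
    started = time.perf_counter()
    for row_number, row in enumerate(rows, start=1):
        stats["read"] += 1
        try:
            value = rule(row)
        except (ValueError, KeyError) as exc:
            # A failing row is pinpointed exactly, which is what eases debugging
            stats["rejected"] += 1
            rejects.append({"source": source_name, "row": row_number, "error": repr(exc)})
            continue
        output.append({"value": value, "lineage": f"{source_name}#{row_number}"})
        stats["written"] += 1
    stats["elapsed_s"] = time.perf_counter() - started
    return output, rejects, stats

out, rejects, stats = run_job(
    "charges.csv", [{"charge": "10"}, {"charge": "x"}, {}], lambda r: float(r["charge"])
)
print(out)                          # [{'value': 10.0, 'lineage': 'charges.csv#1'}]
print([r["row"] for r in rejects])  # [2, 3]
```

The per-row loop is also the source of the ETL drawback noted above: the cost grows linearly with every row that passes through the process.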

ELT - Pros and Cons

Pros
▪ It can achieve 3x to 4x throughput rates owing to the MPP (MASSIVELY PARALLEL
PROCESSING) RDBMS platform
▪ The transformation process is INDEPENDENT of the extract and load; this makes it possible
to store data for future needs without transforming it
▪ No additional hardware or skill set is required, since it runs on the SAME HARDWARE AS
THE DATABASE ENGINE where the warehouse resides
▪ RISK is REDUCED, since there is less dependency between stages

Cons
▪ Few tools are available with full support for ELT
▪ There is NO IN-BUILT MECHANISM for capturing RUN-TIME STATISTICS and data LINEAGE;
separate coding effort is required to achieve it
▪ ELT engines REQUIRE extra RDBMS space to transform data, particularly when dealing with
VLDBs (very large databases)
▪ DEBUGGING is NOT EASY, since the code has to be checked in detail, especially in data
reconciliation scenarios
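Because ELT has no built-in mechanism for run-time statistics, that logging has to be coded separately, as noted above. A minimal sketch of such coding, assuming a hypothetical `etl_audit_log` table and using SQLite as a stand-in database: each in-database step is wrapped so that its row count, status and timestamps are recorded.

```python
import sqlite3
from datetime import datetime, timezone

def run_step(conn, step_name, sql):
    """Run one in-database transformation step and record it in the
    hypothetical etl_audit_log table: step, rows affected, status, timestamps."""
    started = datetime.now(timezone.utc).isoformat()
    status, rows = "SUCCESS", 0
    try:
        with conn:  # the step's changes are rolled back if it fails
            rows = conn.execute(sql).rowcount
    except sqlite3.Error as exc:
        status = f"FAILED: {exc}"
    finished = datetime.now(timezone.utc).isoformat()
    with conn:  # the audit row is committed separately, even for failed steps
        conn.execute(
            "INSERT INTO etl_audit_log (step, rows_affected, status, started_at, finished_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (step_name, rows, status, started, finished),
        )
    return status

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_audit_log (step TEXT, rows_affected INTEGER, status TEXT,
                            started_at TEXT, finished_at TEXT);
CREATE TABLE stg (v TEXT);
INSERT INTO stg VALUES ('1'), ('2'), ('x');
CREATE TABLE dw (v REAL);
""")
run_step(conn, "load_dw", "INSERT INTO dw SELECT CAST(v AS REAL) FROM stg WHERE v GLOB '[0-9]*'")
run_step(conn, "bad_step", "INSERT INTO missing_table SELECT 1")
for entry in conn.execute("SELECT step, rows_affected, status FROM etl_audit_log"):
    print(entry)
```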

Comparative Study ETL vs ELT (1/2)

TAT (Turnaround Time)
▪ ETL: Data load takes longer, since the data moves between the database and the ETL
server, requiring an additional hop
▪ ELT: Load time is reduced, since the staging and target areas reside in the same database

Data lineage & audit logging
▪ ETL: ETL tools are typically equipped with built-in data lineage and audit logging
mechanisms
▪ ELT: These features are not available out of the box; additional coding effort is needed
to incorporate data lineage and audit logging

Cost effectiveness
▪ ETL: ETL tool licensing, server costs and database engine costs add up to a significant
amount
▪ ELT: ELT may be more cost-effective, since there is a one-time cost for the database
engine and license

Data quality
▪ ETL: Most ETL tools have their own in-suite data quality components, which can be
integrated with the ETL process to perform data quality checks
▪ ELT: No in-suite data quality component is readily available in ELT, hence it needs to be
coded separately
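As the comparison notes, data quality checks in ELT have to be coded separately. One common pattern, sketched here under assumptions, is a set of SQL queries that each count the rows violating a rule. The rule names, the `dw_charges` table and the sample data are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical data quality rules: each query counts rows that violate the rule
DQ_CHECKS = {
    "patient_id_not_null":
        "SELECT COUNT(*) FROM dw_charges WHERE patient_id IS NULL OR TRIM(patient_id) = ''",
    "patient_id_unique":
        "SELECT COUNT(*) FROM (SELECT patient_id FROM dw_charges "
        "GROUP BY patient_id HAVING COUNT(*) > 1)",
    "charge_non_negative":
        "SELECT COUNT(*) FROM dw_charges WHERE charge < 0",
}

def run_dq_checks(conn, checks=DQ_CHECKS):
    """Return {check_name: violation_count}, evaluated inside the database."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_charges (patient_id TEXT, charge REAL);
INSERT INTO dw_charges VALUES ('P1', 120.0), ('P1', 15.0), ('', 10.0), ('P4', -5.0);
""")
results = run_dq_checks(conn)
print(results)  # {'patient_id_not_null': 1, 'patient_id_unique': 1, 'charge_non_negative': 1}
failed = [name for name, count in results.items() if count > 0]
```

Keeping each rule as data (a name plus a query) makes it easy to add checks without changing the runner, which partly compensates for the missing in-suite component.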

Comparative Study ETL vs ELT (2/2)

User friendliness
▪ ETL: The graphical user interface of ETL tools is very user-friendly, and developing
jobs/process flows is easier to learn
▪ ELT: Since no UI is available, the entire code has to be written manually

Transformation time
▪ ETL: Transformation time increases significantly as data volumes grow
▪ ELT: Massively parallel processing (MPP) capabilities handle huge data volumes with ease

Data warehouse support
▪ ETL: Supports the prevalent legacy data warehouse model for relational and structured data
▪ ELT: Supports modern data warehouse architectures and is equipped to support
semi-structured sources

MDM (Master Data Management) implementation
▪ ETL: MDM implementation in ETL is a complex process; it needs to be handled separately,
preferably in the database
▪ ELT: MDM implementation is easier, since it can be done in the RDBMS itself

When to use?
Use ELT when:
▪ A strong database engine with MPP (Massively Parallel Processing) capabilities and good
hardware that supports heavy processing is available
▪ Source data is semi-structured
▪ The volume of data is huge (hundreds or thousands of terabytes)
▪ Data can remain in the repository, unprocessed, for future use
▪ The target is a high-end data engine, such as a data appliance, Hadoop cluster or cloud
installation

Use ETL when:
▪ A very strong RDBMS is not available, but a powerful ETL engine and hardware and network
resources for parallel processing are available
▪ Source data is on-premise, structured and relational
▪ The volume of data is smaller (several hundred gigabytes or a low number of terabytes)
▪ Data must be processed in-stream or in a pipeline process
▪ Expertise in ETL tools is available and there is a need to stick to a traditional, mature
and tested approach
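The checklist above can be expressed as a simple tally, which may help when comparing candidate projects side by side. This is only a rough aid under stated assumptions: the field names are hypothetical, the 100 TB cut-off is an assumed boundary between the slide's "huge" and "low number of terabytes", and "not semi-structured" is used as a proxy for structured, relational sources. The final choice should still rest on the project's objectives.

```python
from dataclasses import dataclass

@dataclass
class ProjectProfile:
    mpp_database: bool        # strong MPP database engine and hardware available
    strong_etl_engine: bool   # powerful ETL engine, hardware and network available
    semi_structured: bool     # source data is semi-structured
    volume_tb: float          # expected data volume in terabytes
    keep_raw_for_later: bool  # data may stay unprocessed in the repository
    in_stream_required: bool  # processing must happen in-stream or in a pipeline
    high_end_target: bool     # data appliance, Hadoop cluster or cloud target
    etl_expertise: bool       # team expertise in ETL tools

LARGE_VOLUME_TB = 100  # assumed cut-off, not taken from the slide

def suggest_approach(p: ProjectProfile) -> str:
    """Tally the checklist criteria favouring each approach; a rough aid, not a rule."""
    elt_score = sum([p.mpp_database, p.semi_structured, p.volume_tb >= LARGE_VOLUME_TB,
                     p.keep_raw_for_later, p.high_end_target])
    etl_score = sum([p.strong_etl_engine and not p.mpp_database,
                     not p.semi_structured,  # proxy for structured, relational sources
                     p.volume_tb < LARGE_VOLUME_TB, p.in_stream_required, p.etl_expertise])
    if elt_score == etl_score:
        return "either (decide on project objectives)"
    return "ELT" if elt_score > etl_score else "ETL"

cloud_lake = ProjectProfile(True, False, True, 500, True, False, True, False)
print(suggest_approach(cloud_lake))  # ELT
```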

References

▪ https://www.ironsidegroup.com/2015/03/01/etl-vs-elt-whats-the-big-difference/

▪ http://blogs.perficient.com/delivery/blog/2016/07/14/elt-vs-etl-data-warehousing/

▪ https://www.softwareadvice.com/resources/etl-vs-elt-for-your-data-warehouse/

THANK YOU
