CPD # 1311
ETL Vs ELT
Description: This CPD presents the key conceptual differences between the
ETL and ELT approaches to data warehousing.
Author: Divya Pai
Reviewed by: Rahul Mehta, Nitin Ambare, Tej Rawat
Keywords: ETL, ELT
Version 1.0
October 2017
This document is confidential and contains proprietary information, including trade secrets of CitiusTech. Neither the document nor any of the information
contained in it may be reproduced or disclosed to any unauthorized person under any circumstances without the express written permission of CitiusTech.
Contents
▪ Background Summary
▪ Data Warehouse Architecture using ETL Approach
▪ ELT Approach for Data Warehouse Architecture
▪ Comparison between ETL and ELT
▪ When to use?
▪ References
Background Summary
In a traditional data warehouse, data is loaded into a central, schema-driven “repository of truth”
for analytics and reporting through an Extract, Transform, and Load (ETL) process.
ETL has always been an important part of the data warehousing world.
However, several factors and new trends have made ETL more challenging to implement:
▪ Significant bandwidth consumption
▪ Long development cycles
▪ High costs for scalability
▪ Introduction of a variety of data (semi-structured, unstructured)
Given the technologies and toolsets now available, a different approach can be
implemented. ELT (Extract, Load, and Transform) can challenge the existing ETL method and prove
to be better. The choice should be based on the objectives set for the project and on
the strengths and weaknesses of both approaches.
Data Warehouse Architecture using ETL Approach (1/2)
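[Figure: ETL architecture. Source Systems (EMRs, HIS, LIS, Source DB) are extracted (E) into Staging; Transformations (T: Data Quality, Reconciliation, Normalization) are applied; the result is loaded (L) into the DW. The stages are numbered 1 to 3 in the figure.]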
** Detailed description of the architecture is on the next slide
Data Warehouse Architecture using ETL Approach (2/2)
EXTRACT
▪ The Extract step covers data extraction from the source system
and makes the data accessible for further processing in the staging area
▪ Data may be extracted into the staging area selectively, on the basis of defined
conditions
TRANSFORM
▪ The Transform step applies a set of rules to transform the data from
the source to the target
▪ All the activities listed below are performed using the capabilities of
the ETL tool
▪ Types of transformation: Cleansing, Joining, Deduplication, Data validation,
Conversion, Data quality, Filtering, Data reconciliation
LOAD
▪ Load is the process of writing the data into the target database
▪ The loading phase is the last step of the ETL process
▪ The load starts only when the transformed data is available in the
pipeline
▪ During the load step, it is necessary to ensure that the load is
performed correctly and with as few resources as possible
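The three ETL steps above can be illustrated with a minimal Python sketch. It is not a real ETL tool: the sample rows, the function names and the dim_patient table are hypothetical, and an in-memory SQLite database stands in for the target. The point it shows is that transformation happens row by row outside the database, and only transformed rows are loaded.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from a source system.
SOURCE_ROWS = [
    {"patient_id": "P1", "name": " alice ", "age": "34"},
    {"patient_id": "P1", "name": " alice ", "age": "34"},        # duplicate
    {"patient_id": "P2", "name": "Bob", "age": "not-a-number"},  # fails validation
    {"patient_id": "P3", "name": "carol", "age": "51"},
]


def extract(rows):
    """EXTRACT: make the source rows available for further processing."""
    return list(rows)


def transform(rows):
    """TRANSFORM: cleanse, convert, validate and deduplicate, row by row."""
    seen = set()
    result = []
    for row in rows:
        name = row["name"].strip().title()   # cleansing
        try:
            age = int(row["age"])            # conversion
        except ValueError:
            continue                         # data validation: skip bad rows
        if row["patient_id"] in seen:        # deduplication
            continue
        seen.add(row["patient_id"])
        result.append((row["patient_id"], name, age))
    return result


def load(conn, rows):
    """LOAD: write the already-transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_patient "
        "(patient_id TEXT PRIMARY KEY, name TEXT, age INTEGER)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO dim_patient VALUES (?, ?, ?)", rows)


def run_etl(conn, source_rows):
    load(conn, transform(extract(source_rows)))
```

Calling run_etl(sqlite3.connect(":memory:"), SOURCE_ROWS) leaves only the cleansed, deduplicated rows in dim_patient. The raw rows are never stored in the target database.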
ELT Approach for Data Warehouse Architecture (1/2)
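[Figure: ELT architecture. Source Systems (EMRs, HIS, LIS, Source DB) are extracted (E) and loaded (L) into the Staging/Data Lake area of the RDBMS; stored procedures/TSQL tasks then transform (T) the data (Transformations, Data Quality, Reconciliation, Normalization) into the DW. The stages are numbered 1 to 3 in the figure.]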
** Detailed description of the architecture is on the next slide
ELT Approach for Data Warehouse Architecture (2/2)
EXTRACT
▪ The Extract step covers the extraction of raw data from the source
system
LOAD
▪ The raw data is extracted and the entire copy is dumped to the staging
area or data lake
▪ Here, data is loaded first to the staging area without any column
name or data field changes
TRANSFORM
▪ The Transform step applies a set of rules to transform the data from the
source to the target
▪ All the activities given below are performed on the data where it sits in
the database, before it is loaded to the target
▪ These activities are performed using T-SQL or stored procedures,
utilizing MPP capabilities
▪ Types of transformation: Cleansing, Joining, Deduplication, Data validation,
Conversion, Data quality, Filtering, Data reconciliation
▪ This transformed data is then loaded to the data warehouse
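The ELT sequence above can be sketched in the same way. This is an illustration under stated assumptions: in-memory SQLite stands in for an MPP RDBMS, plain SQL issued from Python stands in for T-SQL or stored procedures, and the stg_patient and dim_patient tables are hypothetical. Raw data lands in staging unchanged, and the transformation is a set-based statement that runs inside the database.

```python
import sqlite3

# Hypothetical raw extract; every value is kept as text, exactly as received.
RAW_ROWS = [
    ("P1", " alice ", "34"),
    ("P1", " alice ", "34"),   # duplicate
    ("P2", "Bob", "n/a"),      # non-numeric age
    ("P3", "carol", "51"),
]


def extract_and_load(conn, raw_rows):
    """EXTRACT + LOAD: copy the raw data into staging without changing it."""
    conn.execute("CREATE TABLE stg_patient (patient_id TEXT, name TEXT, age TEXT)")
    with conn:
        conn.executemany("INSERT INTO stg_patient VALUES (?, ?, ?)", raw_rows)


def transform_in_database(conn):
    """TRANSFORM: run set-based SQL where the data sits, then fill the target."""
    conn.execute(
        "CREATE TABLE dim_patient "
        "(patient_id TEXT PRIMARY KEY, name TEXT, age INTEGER)"
    )
    with conn:
        conn.execute(
            """
            INSERT INTO dim_patient (patient_id, name, age)
            SELECT DISTINCT patient_id,               -- deduplication
                   TRIM(name),                        -- cleansing
                   CAST(age AS INTEGER)               -- conversion
            FROM stg_patient
            WHERE age <> ''
              AND age NOT GLOB '*[^0-9]*'             -- validation: digits only
            """
        )
```

After both functions run, stg_patient still holds all four raw rows, so they remain available for future needs, while dim_patient holds only the transformed result.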
Real Life Example of Data Warehouse Architecture using ELT
1. Data is sourced from standard messaging files
2. Data is sourced from standard CSV files
3. Data from standard messages is extracted by message parsers
4. Data from CSV files is extracted by an ETL process
5. Data from all files lands in a common staging area
6. Pre-built stored procedures are used for loading data, driven by metadata tables
7. The stored procedures also perform transformations, data quality, reconciliation and normalization
8. Data is loaded into the dimensional warehouse model
9. Job scheduling and audit logging are performed outside the ELT framework
ETL - Pros and Cons
Pros
▪ ETL can balance the workload and share it with the RDBMS by utilizing PUSH-DOWN PROCESSING
▪ It can handle PARTITIONING and PARALLELISM independent of the data model and database
▪ It processes information ROW-BY-ROW, hence debugging is easy
▪ Capable of capturing RUN-TIME STATISTICS and DATA LINEAGE
▪ ETL tools have a very USER-FRIENDLY interface, and the process flow itself is sufficient to give an insight into the functionality
Cons
▪ ETL jobs consume significant CPU, MEMORY, DISK SPACE, and NETWORK BANDWIDTH, so it is difficult to accommodate running these jobs more than once daily
▪ Extra COST of building an ETL system or LICENSING ETL tools
▪ Need to wait, especially for big data sizes: as data grows, transformation TIME increases
▪ Possible REDUCED PERFORMANCE of the row-based approach in scenarios with huge data volume
ELT - Pros and Cons
Pros
▪ It can achieve 3x to 4x throughput rates due to the MPP (MASSIVELY PARALLEL PROCESSING) RDBMS platform
▪ The transformation process is INDEPENDENT of the extract and load; this can help in storing data for future needs without transforming it
▪ No additional hardware or skill set is required since it runs on the SAME HARDWARE AS THE DATABASE ENGINE where the warehouse is placed
▪ RISK is REDUCED since there is less dependency between stages
Cons
▪ Limited tools are available with full support for ELT
▪ NO IN-BUILT MECHANISM for capturing RUN-TIME STATISTICS and data LINEAGE. Separate coding effort is required to achieve it
▪ ELT engines REQUIRE extra RDBMS space to transform data, particularly when dealing with VLDB (very large databases)
▪ DEBUGGING is NOT EASY since the code has to be checked in detail, especially in data reconciliation scenarios
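The "separate coding effort" for run-time statistics in ELT can be as small as a wrapper around each transformation step. The sketch below is one possible shape, not a standard mechanism: the elt_audit_log table and the run_audited_step function are hypothetical names, and SQLite stands in for the warehouse RDBMS.

```python
import sqlite3
import time


def run_audited_step(conn, step_name, sql):
    """Run one transformation statement and record its outcome in an audit table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elt_audit_log "
        "(step_name TEXT, status TEXT, rows_affected INTEGER, duration_seconds REAL)"
    )
    started = time.perf_counter()
    try:
        with conn:  # commits on success, rolls back on error
            rows_affected = conn.execute(sql).rowcount
        status = "SUCCESS"
    except sqlite3.Error:
        rows_affected = 0
        status = "FAILED"
    duration = time.perf_counter() - started
    with conn:
        conn.execute(
            "INSERT INTO elt_audit_log VALUES (?, ?, ?, ?)",
            (step_name, status, rows_affected, duration),
        )
    return status
```

Each step then leaves one row in elt_audit_log with its status, row count and duration. This is the kind of information an ETL tool would normally capture out of the box.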
Comparative Study ETL vs ELT (1/2)
TAT (Turnaround Time)
▪ ETL: Data load takes longer since the data moves between the database and the ETL server, requiring an additional hop
▪ ELT: Load time is reduced since the staging and target areas reside in the same database
Data lineage & Audit logging
▪ ETL: All ETL tools are equipped with in-house data lineage and audit logging mechanisms
▪ ELT: These features are not available out-of-the-box. Additional coding effort is needed to incorporate data lineage and audit logging
Cost Effectiveness
▪ ETL: ETL tool licensing, server costs and database engine costs add up to a significant amount
▪ ELT: ELT may be more cost effective since there is a one-time cost for the database engine and license
Data Quality
▪ ETL: Most ETL tools have their own in-suite data quality components, which can be integrated with the ETL process to perform data quality checks
▪ ELT: There is no in-suite data quality component readily available in ELT, hence it needs to be coded separately
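For the Data Quality row, coding the checks separately in ELT could look like the sketch below: each rule is a SQL query that counts violating rows in staging. The rule names, the stg_patient table and the age range are hypothetical, and SQLite stands in for the RDBMS.

```python
import sqlite3

# Hypothetical rules: check name -> SQL that counts violating rows in staging.
QUALITY_CHECKS = {
    "patient_id_not_null": (
        "SELECT COUNT(*) FROM stg_patient WHERE patient_id IS NULL"
    ),
    "age_in_range": (
        "SELECT COUNT(*) FROM stg_patient WHERE age < 0 OR age > 130"
    ),
    "patient_id_unique": (
        "SELECT COUNT(*) FROM ("
        "  SELECT patient_id FROM stg_patient"
        "  WHERE patient_id IS NOT NULL"
        "  GROUP BY patient_id HAVING COUNT(*) > 1"
        ")"
    ),
}


def run_quality_checks(conn, checks):
    """Return {check_name: number_of_violations} for every rule."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
```

A load job could stop, or route rows to a reject table, whenever any count is non-zero. That policy is left out of the sketch.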
Comparative Study ETL vs ELT (2/2)
User Friendliness
▪ ETL: The graphical user interface of ETL tools is very user friendly. Developing jobs/process flows is easier to learn
▪ ELT: Since no UI is available, the entire code has to be written manually
Transformation time
▪ ETL: Transformation time increases significantly with increase in data volumes
▪ ELT: Massively parallel processing (MPP) capabilities handle huge data volumes with ease
Data Warehouse support
▪ ETL: Supports the prevalent legacy data warehouse model for relational and structured data
▪ ELT: Supports modern data warehouse architecture and is equipped to support semi-structured sources
MDM implementation
▪ ETL: MDM (Master Data Management) implementation in ETL is a complex process. It needs to be handled separately, preferably in the database
▪ ELT: MDM implementation is easier since it can be done in the RDBMS itself
When to use?
ELT
▪ A strong database engine with MPP (Massively Parallel Processing) capabilities and good hardware which supports heavy processing is available
▪ Source data is semi-structured
▪ Volume of data is huge (hundreds or thousands of terabytes)
▪ Data can remain in the repository, unprocessed, for future use
▪ The target is a high-end data engine, such as a data appliance, Hadoop cluster, or cloud installation
ETL
▪ A very strong RDBMS system is not available, but a powerful ETL engine, hardware and network resources for parallel processing are available
▪ Source data is on-premise, structured and relational
▪ Volume of data is smaller (several hundred gigabytes or a low number of terabytes)
▪ Processing of data must happen in-stream, or in a pipeline process
▪ Expertise in ETL tools is available and there is a need to stick to a traditional, mature and tested approach
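The criteria above can be read as a checklist. The sketch below encodes them as boolean flags and counts how many favour each approach. The field names and the simple majority rule are assumptions made for illustration, and a real decision would weigh the criteria against the project's objectives.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    # Criteria that favour ELT
    has_mpp_database: bool = False
    source_is_semi_structured: bool = False
    volume_is_huge: bool = False                 # hundreds or thousands of TB
    keep_raw_data_for_later: bool = False
    target_is_high_end_engine: bool = False      # appliance, Hadoop, cloud
    # Criteria that favour ETL
    has_strong_etl_engine: bool = False
    source_is_structured_relational: bool = False
    volume_is_moderate: bool = False             # hundreds of GB to a few TB
    needs_in_stream_processing: bool = False
    has_etl_tool_expertise: bool = False


def recommend_approach(profile):
    """Count the criteria met on each side and return the side with more."""
    elt_votes = sum([
        profile.has_mpp_database,
        profile.source_is_semi_structured,
        profile.volume_is_huge,
        profile.keep_raw_data_for_later,
        profile.target_is_high_end_engine,
    ])
    etl_votes = sum([
        profile.has_strong_etl_engine,
        profile.source_is_structured_relational,
        profile.volume_is_moderate,
        profile.needs_in_stream_processing,
        profile.has_etl_tool_expertise,
    ])
    if elt_votes > etl_votes:
        return "ELT"
    if etl_votes > elt_votes:
        return "ETL"
    return "no clear preference"
```

For example, a project with an MPP database and a huge data volume, and none of the ETL criteria, comes out as "ELT".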
References
▪ https://www.ironsidegroup.com/2015/03/01/etl-vs-elt-whats-the-big-difference/
▪ http://blogs.perficient.com/delivery/blog/2016/07/14/elt-vs-etl-data-warehousing/
▪ https://www.softwareadvice.com/resources/etl-vs-elt-for-your-data-warehouse/
THANK YOU