InfoSphere DataStage Balanced Optimization
Figure: Parallel Processing, Connectivity, Metadata, Deployment, and Administration Services
Information Management Software
InfoSphere DataStage
Data Transformation
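Balanced Optimization Workflow
• Design the original job in DataStage Designer.
• Compile and run the original job in DataStage, and verify the job results.
• Optimize: Balanced Optimization rewrites the original job into an optimized job.
• Manually review and edit the optimized job.
• Compile and run the optimized job, and verify the job results.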
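The verify step compares the results of the original and the optimized job. A minimal Python sketch, assuming the output rows of both runs are available as lists of tuples: each result is treated as a multiset, so row order (which a database does not guarantee) is ignored while duplicate counts still have to match. The function name `results_match` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def results_match(original: Iterable[Sequence[Hashable]],
                  optimized: Iterable[Sequence[Hashable]]) -> bool:
    """Hypothetical check: True if both runs produced the same rows.

    Rows are compared as multisets (Counter), so row order is ignored
    but the number of occurrences of each row must be identical.
    """
    return Counter(map(tuple, original)) == Counter(map(tuple, optimized))


# Hypothetical outputs of the original and the optimized job.
original_run = [("A", 10), ("B", 5), ("A", 10)]
optimized_run = [("A", 10), ("A", 10), ("B", 5)]

print(results_match(original_run, optimized_run))      # same rows, different order
print(results_match(original_run, optimized_run[:2]))  # one row missing
```

Counting rows rather than comparing sets matters here: a rewritten job that silently drops or duplicates rows would otherwise pass the check.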
Supported Transformations

                   Push to source   Push to target
Transformation     (1)              (1)
Sorting            (1)
Aggregation        (1)              (1, 2)
Join, Lookup
Funnel

(1) As supported by the database
(2) Involving data already in the target

• Drop unnecessary processing (e.g., sorting)
• Use bulk staging operations (load)
• Push everything into the (target) database
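The pushdown idea can be illustrated with a small Python sketch. It does not show how Balanced Optimization actually generates SQL; it assumes a hypothetical job with a Sort stage followed by an Aggregator stage and uses the standard-library `sqlite3` module as a stand-in for Teradata. The same result is computed once row by row in the "ETL engine" and once as a single `GROUP BY` statement inside the database, with the separate sort dropped as unnecessary processing.

```python
import sqlite3
from collections import defaultdict

# Hypothetical source data: (customer, amount) rows.
ORDERS = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 7.0)]


def run_in_etl_engine(rows):
    """Original (hypothetical) job: every stage runs in the ETL engine."""
    rows = sorted(rows, key=lambda r: r[0])   # Sort stage
    totals = defaultdict(float)               # Aggregator stage: SUM(amount)
    for customer, amount in rows:
        totals[customer] += amount
    return sorted(totals.items())


def run_pushed_down(conn):
    """Pushed-down version: one SQL statement executed by the database.

    The explicit Sort stage is dropped because GROUP BY does not need
    pre-sorted input; ORDER BY only makes the output order deterministic.
    """
    sql = ("SELECT customer, SUM(amount) FROM orders "
           "GROUP BY customer ORDER BY customer")
    return conn.execute(sql).fetchall()


# sqlite3 in-memory database standing in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

print(run_in_etl_engine(ORDERS))
print(run_pushed_down(conn))
```

Both functions return `[('c1', 12.5), ('c2', 5.0), ('c3', 7.0)]`; the difference is where the work happens, which is exactly what the verify step of the workflow should confirm.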
Architecture Overview
• DataStage performs a one-to-one load of the source data from the applications (Appl 1, Appl 2, …) into a staging area.
• DataStage loads the staged data into the LDWH, which is based on the AT Financial Services physical data model (subject areas: Party, Finance, Asset, Campaign, Agreement, IT, Channel, Event/Claim, Internal Org., Product, Location).
• DataStage delivers data from the LDWH to four target systems.
• Database platform: Teradata RDBMS V2R6.2
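The staging pattern above (a one-to-one load, followed by transformation inside the warehouse) can be sketched in Python, again with `sqlite3` standing in for Teradata. The table names, the Appl 1 sample rows, and the Party mapping are all hypothetical; the point is that the second step is a single `INSERT ... SELECT` executed by the database instead of row-by-row processing in the ETL engine.

```python
import sqlite3

# Hypothetical extract from "Appl 1": (customer id, name, country code).
APPL1_CUSTOMERS = [(1, "Alice", "AT"), (2, "Bob", "DE"), (1, "Alice", "AT")]

conn = sqlite3.connect(":memory:")

# Step 1: one-to-one load into the staging area (no transformation).
conn.execute(
    "CREATE TABLE stg_appl1_customer (cust_id INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO stg_appl1_customer VALUES (?, ?, ?)", APPL1_CUSTOMERS)

# Step 2: transform inside the database into a hypothetical LDWH Party table.
# DISTINCT removes the duplicate source row; the key gets a source prefix.
conn.execute(
    "CREATE TABLE ldwh_party (party_id TEXT PRIMARY KEY, party_name TEXT, country TEXT)")
conn.execute("""
    INSERT INTO ldwh_party (party_id, party_name, country)
    SELECT DISTINCT 'APPL1-' || cust_id, UPPER(name), country
    FROM stg_appl1_customer
""")

rows = conn.execute("SELECT * FROM ldwh_party ORDER BY party_id").fetchall()
print(rows)
```

Keeping the staging load free of transformation logic makes it a straightforward bulk operation, and the set-based step 2 is the kind of processing that can run where the data already resides.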
• Teradata server
• 2 x 3 nodes (Teradata 5500C)
• 2 x 6.7 TB perm space
• 4 logical environments
Example Job
67% improvement in job runtime

Metric            Before optimization   After optimization
Overall runtime   6:45 min              2:25 min
Sum TD CPU        956.76                1,962.22
Sum TD IO         1,123,456.00          1,735,983.00
Sum TD Spool      8,534,112,347.00      10,832,294,222.00
Sum ETL Temp      ~3 GB                 ~1.5 GB

Note: A complete PushAllToDB was not possible because of nested IF THEN ELSE expressions. In the GA version, this already works as a complete push to the database.
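The before/after figures of the example job can be compared as ratios. A minimal Python sketch using only the numbers listed for this job; the ETL temp values are the approximate "~3 GB" and "~1.5 GB" from the slide, and the helper `to_seconds` is hypothetical.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an 'm:ss' runtime string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# Figures from the example job; ETL temp values are approximate (~).
before = {"runtime_s": to_seconds("6:45"), "td_cpu": 956.76,
          "td_io": 1_123_456.00, "td_spool": 8_534_112_347.00,
          "etl_temp_gb": 3.0}
after = {"runtime_s": to_seconds("2:25"), "td_cpu": 1_962.22,
         "td_io": 1_735_983.00, "td_spool": 10_832_294_222.00,
         "etl_temp_gb": 1.5}

# Ratio < 1 means the optimized job used less of that resource.
for metric in before:
    ratio = after[metric] / before[metric]
    print(f"{metric:12s} after/before = {ratio:.2f}")
```

With these figures, the runtime ratio is about 0.36 (a reduction of roughly 64% computed from the overall runtimes), Teradata CPU is about 2.05, I/O about 1.55, spool about 1.27, and ETL temp space 0.50. The optimized job shifts work into Teradata in exchange for a shorter overall runtime and less temp space on the ETL side.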
Runtime statistics
• Runtime reduction: optimized jobs ran in between 23% and 60% of their previous runtime
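Expressing "fraction of previous runtime" as a percentage reduction and a speedup factor is simple arithmetic. This sketch assumes the range means the optimized job took 23% to 60% of the original time, a reading consistent with the example job (2:25 versus 6:45, about 36%); the function `describe` is hypothetical.

```python
def describe(fraction_of_previous: float) -> tuple[float, float]:
    """Given optimized runtime as a fraction of the original runtime,
    return (percent reduction, speedup factor)."""
    reduction_pct = (1.0 - fraction_of_previous) * 100.0
    speedup = 1.0 / fraction_of_previous
    return reduction_pct, speedup


for f in (0.23, 0.60):
    reduction, speedup = describe(f)
    print(f"{f:.0%} of previous runtime -> "
          f"{reduction:.0f}% shorter, {speedup:.1f}x faster")
```

Under that reading, the range corresponds to runtimes that are 40% to 77% shorter, or roughly 1.7x to 4.3x faster.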
Summary
• Delivers optimal runtime performance through a scalable parallel architecture
• Harnesses available capacity and computing power in Teradata and DataStage
• Unparalleled productivity gains through simplification and consolidation of development resources
• Comprehensive lineage from source to target delivering trusted information
Thank you