InfoSphere DataStage Balanced Optimization

© 2008 IBM Corporation


Information Management Software

IBM InfoSphere Information Server
Delivering information you can trust

• Understand: Discover, model, and govern information structure and content
• Cleanse: Standardize, merge, and correct information
• Transform: Combine and restructure information for new uses
• Deliver: Synchronize, virtualize and move information for in-line delivery

Platform Services
• Parallel Processing Services
• Connectivity Services
• Metadata Services
• Administration Services
• Deployment Services

InfoSphere DataStage
Data Transformation

Transform and aggregate any volume of information in batch or real time through visually designed logic

• Provides codeless visual design of data flows with hundreds of built-in transformation functions
• Optimized reuse of integration objects
• Supports batch & real-time operations
• Produces reusable components that can be shared across projects
• Supports team-based development and collaboration
• Provides heterogeneous integration across the broadest range of sources
• Delivers massive scalability through support for large SMP, MPP and GRID

[Figure: Developers, Architects. Caption: Hundreds of Built-in Transformation Functions]


InfoSphere DataStage Balanced Optimization
Data Transformation

Transform and aggregate any volume of information in batch or real time through visually designed logic

• Provides automatic optimization of data flows, mapping transformation logic to SQL
• Leverages investments in DBMS hardware by executing data integration tasks with and within the DBMS
• Optimizes job run-time by allowing the developer to control where the job or various parts of the job will execute

[Figure: Developers, Architects. Caption: Optimizing run time through intelligent use of DBMS hardware]
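
To make "mapping transformation logic to SQL" concrete, here is a minimal sketch in Python. It is not how Balanced Optimization is implemented: the `job` dictionary, the `to_sql` helper, and the `orders` table are hypothetical names invented for illustration, and SQLite stands in for the DBMS. It only shows the general idea of folding a filter stage and a derivation into one SELECT that the database executes.

```python
import sqlite3

# Hypothetical job description; these keys are illustrative, not DataStage metadata.
job = {
    "source": "orders",
    "filter": "amount > 100",                          # filter stage
    "derive": {"amount_with_fee": "amount + 5"},       # transformer derivation
    "keep": ["order_id"],                              # columns passed through
}

def to_sql(job):
    """Fold the filter and derivation stages into a single SELECT on the source."""
    cols = list(job["keep"])
    cols += [f"{expr} AS {name}" for name, expr in job["derive"].items()]
    return f"SELECT {', '.join(cols)} FROM {job['source']} WHERE {job['filter']}"

# SQLite is only a stand-in database for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50), (2, 200), (3, 150)])

sql = to_sql(job)
rows = conn.execute(sql + " ORDER BY order_id").fetchall()
print(sql)
print(rows)
```

With this shape, the database applies the filter and the derivation, so only the qualifying rows leave it.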


Leveraging best-of-breed systems

• Optimization is not constrained to a single implementation style, such as ELT
• InfoSphere DataStage Balanced Optimization fully harnesses available capacity and computing power in Teradata and DataStage
• Delivering unlimited scalability and performance through parallel execution everywhere, all the time


Elements of Balanced Optimization

• Minimize I/O and data copying/movement
  • source data reductions
  • move the processing to the data
  • keep data in the database(s) - avoid target extractions
• Maximize optimization within sources or targets
  • indices, native optimizations, database-specific features
• Maximize parallelism
  • I/O from/to databases
  • in the DataStage parallel engine
  • inside the database(s)
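
The "source data reductions" element can be illustrated with a small, self-contained Python sketch. The `sales` table and its contents are made up, and SQLite stands in for the source database. The sketch compares extracting every row and processing it in the ETL engine against pushing the filter and aggregation into the source query, and counts how many rows cross the database boundary in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the source database
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 20), ("south", 5), ("south", 7), ("west", 1)],
)

# Unoptimized: extract every row, then filter and aggregate outside the database.
extracted = conn.execute("SELECT region, amount FROM sales").fetchall()
totals_in_engine = {}
for region, amount in extracted:
    if region != "west":
        totals_in_engine[region] = totals_in_engine.get(region, 0) + amount

# Optimized: the filter and aggregation run inside the database,
# so only one row per region is moved.
pushed = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE region <> 'west' GROUP BY region"
).fetchall()
totals_in_db = dict(pushed)

rows_moved_before = len(extracted)
rows_moved_after = len(pushed)
print(rows_moved_before, rows_moved_after, totals_in_db == totals_in_engine)
```

Both paths yield the same totals; the difference is how much data has to be copied out of the database first.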


Using Balanced Optimization

In DataStage Designer:
• design job (original DataStage job)
• compile & run
• optimize job with Balanced Optimization (rewritten, optimized job)
• manually review/edit optimized job
• compile & run
• verify job results
• choose different options and reoptimize


Supported Transformations

Columns: Push to source, Push to target

Transformation (1) (1)
Sorting (1)
Aggregation (1) (1, 2)
Join, Lookup ✓ ✓
Funnel ✓
Drop unnecessary processing (e.g., sorting) ✓ ✓
Use bulk staging operations (load) ✓
Push everything into the (target) database ✓

(1) as supported by database
(2) involving data already in the target


Balanced Optimization options

• Push processing to database sources: Push Transformations, Sorts, and Aggregation into database sources
• Push processing to database targets: Push Transformations, Joins, Lookups, Sorts, and Aggregation into database targets where possible
• Use Bulk Loading: Leverage high performance bulk loads into staging with post processing
• Staging database name: Name for an alternative database where bulk staging is to be used
• Push all processing into the database: If all sources and targets reside in the same database and the transformation logic is supported, push all processing into the target
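
As a rough sketch, the five options above can be written down as a small configuration object. Everything here is hypothetical: the class name, the field names, and the default values are assumptions chosen for illustration and do not reflect the product's actual property names or defaults. The `can_push_all` helper encodes only the condition stated for the last option.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class OptimizationOptions:
    """Hypothetical container mirroring the five options listed above."""
    push_to_sources: bool = True                   # default is an assumption
    push_to_targets: bool = True                   # default is an assumption
    use_bulk_loading: bool = False                 # default is an assumption
    staging_database_name: Optional[str] = None    # alternative database for bulk staging
    push_all_into_database: bool = False           # default is an assumption

def can_push_all(source_dbs: Sequence[str], target_dbs: Sequence[str],
                 logic_supported: bool) -> bool:
    """True only if all sources and targets share one database
    and the transformation logic is supported there."""
    databases = set(source_dbs) | set(target_dbs)
    return len(databases) == 1 and logic_supported

options = OptimizationOptions(use_bulk_loading=True, staging_database_name="stage_db")
options.push_all_into_database = can_push_all(["dwh"], ["dwh"], logic_supported=True)
print(options)
```

Keeping the "push all" decision in a separate function makes the precondition explicit: a second database on either side, or unsupported logic, turns it off.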


Teradata Optimization Specifics

• Balanced Optimization leverages new Teradata Connector
  • Support for V2R5, V2R6, V12
  • Teradata Parallel Transporter (TPT)
  • Supports
    • Target table insert, update, delete, update-then-insert, delete-then-insert
    • Create/replace/truncate/append of target table for insert
    • Immediate and bulk modes, with partitioned parallelism
• Push processing into SQL as nested derived table SELECT statements
  • Operator/function/sub-expression mapping
• Bulk INSERT to temporary staging tables
  • Parallel bulk LOAD operator
  • Staging tables managed automatically (create/load/process/drop/truncate)
  • V12 ERROR TABLE
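
The two techniques named above, nested derived table SELECT statements and staging tables that are created, loaded, processed and dropped, can be sketched as follows. This is an illustration under stated assumptions, not the connector's generated SQL: the `nest` helper, the table names, and the stage definitions are hypothetical, and SQLite is used as a stand-in, so the dialect differs from Teradata SQL.

```python
import sqlite3

def nest(stages, source_table):
    """Wrap each stage's SELECT around the previous one as a derived table."""
    sql = source_table
    for i, (select_list, where) in enumerate(stages):
        inner = sql if i == 0 else f"({sql}) AS t{i}"
        sql = f"SELECT {select_list} FROM {inner}"
        if where:
            sql += f" WHERE {where}"
    return sql

# Two hypothetical job stages: (select list, filter predicate).
stages = [
    ("id, amount * 2 AS doubled", "amount > 0"),
    ("id, doubled + 1 AS result", "doubled < 100"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the target database

# create + load: bulk insert the incoming rows into a temporary staging table
conn.execute("CREATE TEMP TABLE stage_src (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO stage_src VALUES (?, ?)", [(1, 10), (2, -3), (3, 60)])

# process: a single INSERT ... SELECT runs all stages inside the database
conn.execute("CREATE TABLE target (id INTEGER, result INTEGER)")
query = nest(stages, "stage_src")
conn.execute(f"INSERT INTO target {query}")

# drop: the staging table is removed once processing is done
conn.execute("DROP TABLE stage_src")

result = conn.execute("SELECT id, result FROM target").fetchall()
print(query)
print(result)
```

Each stage only sees the columns produced by the stage it wraps, which is what lets a chain of job stages collapse into one statement.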


Major European Bank ETL Design

[Diagram, top to bottom]
• Sources (four shown)
• DataStage: one-to-one load of source data into the Staging Area
• DataStage: load into the LDWH; LDWH instances labelled AT, …, IT, each shown with Appl 1 and Appl 2
• LDWH follows the Financial Services Physical Data Model: Party, Finance, Asset, Campaign, Agreement, Channel, Event/Claim, Product, Location, Internal Org.
• DataStage: load into the Datamart / Application Area
• DataStage: delivery to Targets (four shown)
• Database platform: Teradata RDBMS V2R6.2


Major European Bank ETL Infrastructure

• ETL Server
  • IBM Information Server 8.0.1 on AIX
  • IBM P6 2x8 CPUs configured as 4 LPARs
  • planned migration to DataStage Grid in 2008
• Teradata Server
  • 2 x 3 Nodes 5500C
  • 2 x 6.7 TB Permspace
  • 4 logical Environments
• Environments hosted on this infrastructure
  • Production, Preproduction, Test and Development


Example Job

67% improvement in job run-time

                   Before optimization    After optimization
Overall runtime    6:45 mins              2:25 mins
Sum TD CPU         956.76                 1,962.22
Sum TD IO          1,123,456.00           1,735,983.00
Sum TD Spool       8,534,112,347.00       10,832,294,222.00
Sum ETL Temp       ~ 3 GB                 ~ 1.5 GB

Complete PushAllToDB was not possible because of nested IF THEN ELSE. In the GA version this already works as a complete push to the database.
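
The relative changes behind the before/after figures can be worked out with a few lines of Python. The dictionary keys and helper names below are made up for this sketch; the values are the ones shown on the slide. Computed from the displayed minute:second runtimes, the reduction comes out at roughly 64%, slightly under the 67% headline, which may rest on unrounded timings that the slide does not show.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '6:45' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def percent_change(old, new):
    """Signed change relative to the old value, in percent."""
    return (new - old) / old * 100.0

# Figures as displayed on the slide; ETL temp values are approximate there.
before = {"runtime_s": to_seconds("6:45"), "td_cpu": 956.76,
          "td_io": 1_123_456.00, "td_spool": 8_534_112_347.00, "etl_temp_gb": 3.0}
after = {"runtime_s": to_seconds("2:25"), "td_cpu": 1_962.22,
         "td_io": 1_735_983.00, "td_spool": 10_832_294_222.00, "etl_temp_gb": 1.5}

changes = {key: round(percent_change(before[key], after[key]), 1) for key in before}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

On these numbers the job runs in about a third of its original time, while Teradata CPU roughly doubles and IO and spool grow: work has moved from the ETL server into the database.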


Runtime statistics

• Statistics for 15 optimized jobs in detail
• Runtime reduction:
  • Between 23% and 60% of previous runtime
• Teradata resource consumption (based on DBQL)
  • CPU: between 5% decrease and 400% increase
  • IO: between 25% decrease and 50% increase
  • Spoolspace: between 10% and 50% decrease


Summary
• Delivers optimal runtime performance through a scalable parallel architecture
• Harnesses available capacity and computing power in Teradata and DataStage
• Unparalleled productivity gains through simplification and consolidation of development resources
• Comprehensive lineage from source to target delivering trusted information

Thank you
