Informatica PowerCenter Training: Srinivas Panchadar
Introduction to Data Warehousing
Data Mart
DWH definition by Bill Inmon
Non-volatile: Data in the data warehouse is never over-written or deleted; once committed, the data is static, read-only, and retained for future reporting.
OLTP : Online Transaction Processing
OLAP : Online Analytical Processing
OLTP vs OLAP
Introduction to Dimensional Modelling
Dimensional modeling (DM) names a
set of techniques and concepts used in
data warehouse design. It is considered
to be different from entity-relationship
modeling (ER). Dimensional Modeling
does not necessarily involve a relational
database. The same modeling approach,
at the logical level, can be used for any
physical form, such as multidimensional
database or even flat files.
Dimensional modeling always uses the
concepts of facts (measures), and
dimensions (context).
Snowflake Schema
Fact constellation schema
Typical Architecture of a DWH
[Architecture diagram: Data Sources | ETL Software | Data Stores | Data Analysis Tools and Applications | Users. Transaction data (production, marketing, HR, finance, accounting), other internal data (ERP, SAP), web data (clickstream), and external data (demographic, Harte-Hanks) flow through a staging area and ETL software (extract, clean/scrub, transform, load — vendors such as Ascential, Informatica, Sagent, Microsoft, Firstlogic) into the operational data store, data warehouse, data marts, and metadata on platforms such as IBM (IMS, VSAM, DB2), Oracle, Sybase, Informix, Teradata, SAS, and Microsoft. Analysis tools (SQL, Cognos, SAS, Essbase, MicroStrategy, Siebel, Business Objects, web browsers) then serve analysts, managers, executives, operational personnel, and customers/suppliers with queries, reporting, DSS/EIS, and data mining.]
Introduction to ETL
ETL Process
Different ETL Tools
Informatica PowerCenter
DataStage from IBM
SAS System from SAS Institute
Data Integrator from Business Objects (BO)
Genio Suite from Hummingbird Communications
Oracle Express
Ab Initio
DecisionStream from Cognos
MS DTS from Microsoft
Introduction to Informatica
Components of Informatica PowerCenter
Repository Manager
Designer
Workflow Manager
Workflow Monitor
Informatica provides the following integrated
components:
Architecture
Process Flow
The Informatica Server moves data from source to target based on the workflow and metadata stored in the repository.
A workflow is a set of instructions describing how and when to run the tasks related to the ETL process.
The Informatica Server runs a workflow according to the conditional links connecting its tasks.
A session is a type of workflow task that describes how to move data between a source and a target using a mapping.
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.
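The workflow-session-mapping hierarchy above can be sketched with a few Python classes. This is an invented illustration only — the class names and interfaces are hypothetical, not Informatica APIs: a workflow runs an ordered set of tasks, a session task moves rows through one mapping, and a mapping applies a transformation rule.

```python
# Hypothetical sketch of the workflow -> session -> mapping hierarchy.
# None of these names are real Informatica APIs.

class Mapping:
    """Source and target definitions linked by a transformation rule."""
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform      # rule applied to each source row

class Session:
    """A workflow task that moves data through one mapping."""
    def __init__(self, mapping):
        self.mapping = mapping
    def run(self, source_rows):
        return [self.mapping.transform(row) for row in source_rows]

class Workflow:
    """An ordered set of tasks with instructions on when to run them."""
    def __init__(self, tasks):
        self.tasks = tasks
    def run(self, source_rows):
        for task in self.tasks:         # run tasks in link order
            source_rows = task.run(source_rows)
        return source_rows

m = Mapping("m_upper_names", lambda row: {**row, "NAME": row["NAME"].upper()})
wf = Workflow([Session(m)])
print(wf.run([{"NAME": "smith"}]))      # [{'NAME': 'SMITH'}]
```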
Sources
File: fixed-width and delimited flat files, COBOL files, and XML.
Targets
PowerMart and PowerCenter can load data into the
following targets:
Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2,
Microsoft SQL Server, and Teradata.
General Flow of Informatica
Step 1: Create the repository, create folders, and create users and assign permissions in the Repository Manager, so that you can work in the client tools.
Repository
Repository Contd..
When you use PowerCenter, you can develop global and local repositories to share metadata.
Repository Architecture
Repository Client
Repository Server
----------------------------
Repository Agent
Repository Database
Creating a Repository
To create a repository:
Working with Repository..
Working with Repository contd..
Informatica tools include two basic types of security:
Privileges: repository-wide security that controls which task or set of tasks a single user or group of users can access. Examples include Use Designer, Browse Repository, and Session Operator.
Folders
Folders provide a way to organize and store all metadata in the
repository, including mappings, schemas, and sessions. Folders
are designed to be flexible, to help you organize your data
warehouse logically. Each folder has a set of properties you can
configure to define how users access the folder. For example,
you can create a folder that allows all repository users to see
objects within the folder, but not to edit them. Or you can
create a folder that allows users to share objects within the
folder.
Shared Folders
When you create a folder, you can configure it as a shared
folder. Shared folders allow users to create shortcuts to objects
in the folder. If you have a reusable transformation that you
want to use in several mappings or across multiple folders, you
can place the object in a shared folder.
Creating Folders
Other Features of Repository Manager
Viewing , removing Locks
Adding Repository
Designer
Designer Tools
Source Analyzer
Warehouse/Target Designer
Transformation Developer
Mapplet Designer
Mapping Designer
Working with Designer
Connecting to the repository using a user ID and password.
Creating a mapping.
Tools provided by Designer
Source Analyzer: importing source definitions for flat file, XML, COBOL, and relational sources.
Import from Database
Use an ODBC connection to import source definitions from a database.
Import from File
Creating Targets
You can create target definitions in the Warehouse Designer for file
and relational sources. Create definitions in the following ways:
Import the definition for an existing target. Import the
target definition from a relational target.
Create a target definition based on a source definition.
Drag one of the following existing source definitions into the
Warehouse Designer to make a target definition:
o Relational source definition
o Flat file source definition
o COBOL source definition
Manually create a target definition. Create and design a
target definition in the Warehouse Designer.
Creating targets
Creation of simple mapping
Creation of simple mapping
Switch to the Mapping Designer.
Choose Mappings-Create.
In the Mapping Name dialog box, enter <Mapping Name> as the name
of the new mapping and click OK.
Mapping creation Contd..
Click the icon representing the EMPLOYEES source and drag it into the workspace.
Mapping creation Contd..
Mapping creation Contd..
To Connect the Source Qualifier to Target Definition:
Click once in the middle of the <Column Name> in the Source Qualifier. Hold down the mouse button, drag the cursor to the <Column Name> in the target, and release the mouse button.
An arrow (called a connector) now appears between the two columns.
Transformations
Transformations
A transformation is a repository object that generates, modifies, or passes data.
Transformations
Active transformations
Source Qualifier represents all data queried from the source
Aggregator performs aggregate calculations
Filter serves as a conditional filter
Router serves as a conditional filter with more than one condition
Joiner allows for heterogeneous joins
Update Strategy allows for logic to insert, update, delete, or reject data
Passive transformations
Expression performs row-level calculations
Lookup looks up values and passes them to other objects
Sequence Generator generates unique ID values
Stored Procedure calls a stored procedure and captures return values
Transformations Contd..
Create the transformation. Create it in the Mapping
Designer as part of a mapping, in the Mapplet Designer as
part of a Mapplet, or in the Transformation Developer as
a reusable transformation.
Expression Transformation
You can use the Expression transformation to calculate values in a single row before you write to the target.
Expression Transformation
Calculating Values
To use the Expression transformation to calculate values for a single row, you must include the following ports:
Input or input/output ports for each value used in the calculation. For example, when calculating the total price for an order (the unit price multiplied by the quantity ordered), you need two input or input/output ports: one provides the unit price and the other provides the quantity ordered.
Output port for the expression. You enter the expression as a configuration option for the output port. The return value for the output port needs to match the return value of the expression.
Variable port: a variable port acts like a local variable inside the Expression transformation and can be used in other calculations.
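The total-price example above can be sketched in plain Python. This is an illustration only, not Informatica code; the port names UNIT_PRICE, QUANTITY, and TOTAL_PRICE are invented for the sketch, and the variable port is modeled as a local variable.

```python
# Sketch of what an Expression transformation computes per row:
# input ports UNIT_PRICE and QUANTITY feed an output port TOTAL_PRICE.
# Port names are hypothetical; this is not an Informatica API.

def expression_transform(row):
    out = dict(row)                  # input/output ports pass through unchanged
    v_discount = 0.0                 # variable port: local to the expression
    out["TOTAL_PRICE"] = row["UNIT_PRICE"] * row["QUANTITY"] * (1 - v_discount)
    return out

row = {"UNIT_PRICE": 2.5, "QUANTITY": 4}
print(expression_transform(row)["TOTAL_PRICE"])  # 10.0
```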
Source Qualifier Transformation
Source Qualifier Transformation
When you add a relational or a flat file source definition to a mapping, you need to connect
it to a Source Qualifier transformation.
The Source Qualifier represents the records that the Informatica Server reads when it runs
a session. You can use the Source Qualifier to perform the following tasks:
Join data originating from the same source database. You can join two or more tables
with primary-foreign key relationships by linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source data. If you include a filter
condition, the Informatica Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join. If you include a user-defined
join, the Informatica Server replaces the join information specified by the metadata in the
SQL query.
Specify sorted ports. If you specify a number for sorted ports, the Informatica Server
adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source. If you choose Select Distinct, the
Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Informatica Server to
read source data. For example, you might use a custom query to perform aggregate
calculations or execute a stored procedure.
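The tasks above all modify the SELECT statement the server generates. As a rough sketch (not Informatica internals — the function and its parameters are invented), a source filter adds a WHERE clause, sorted ports add an ORDER BY on the first N ports, and Select Distinct switches to SELECT DISTINCT:

```python
# Hypothetical sketch of how Source Qualifier options shape the default query.

def build_query(table, ports, source_filter=None, sorted_ports=0, distinct=False):
    select = "SELECT DISTINCT" if distinct else "SELECT"
    sql = f"{select} {', '.join(ports)} FROM {table}"
    if source_filter:
        sql += f" WHERE {source_filter}"        # filter records at the source
    if sorted_ports:
        # ORDER BY uses the first N ports, from the top of the qualifier
        sql += f" ORDER BY {', '.join(ports[:sorted_ports])}"
    return sql

print(build_query("EMPLOYEES", ["EMP_ID", "NAME"],
                  source_filter="DEPT = 10", sorted_ports=1))
# SELECT EMP_ID, NAME FROM EMPLOYEES WHERE DEPT = 10 ORDER BY EMP_ID
```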
Configuring Source Qualifier Transformation
Configuring Source Qualifier
Option: Description
SQL Query: Defines a custom query that replaces the default query the Informatica Server uses to read data from sources represented in this Source Qualifier.
User-Defined Join: Specifies the condition used to join data from multiple sources represented in the same Source Qualifier transformation.
Source Filter: Specifies the filter condition the Informatica Server applies when querying records.
Number of Sorted Ports: Indicates the number of columns used when sorting records queried from relational sources. If you select this option, the Informatica Server adds an ORDER BY to the default query when it reads source records. The ORDER BY includes the number of ports specified, starting from the top of the Source Qualifier. When selected, the database sort order must match the session sort order.
Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation.
Select Distinct: Specifies if you want to select only unique records. The Informatica Server includes a SELECT DISTINCT statement if you choose this option.
Joiner Transformation
While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. The combination of sources can be varied.
If two relational sources contain keys, a Source Qualifier transformation can easily join the sources on those keys. Joiner transformations typically combine information from two different sources that do not have matching keys, such as flat file sources.
The Joiner transformation allows you to join sources that contain binary data.
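A heterogeneous join of the kind described above can be hand-rolled as a sketch. This is not Informatica internals: the function, its `keep_unmatched_detail` flag (which loosely mirrors one of the Joiner's outer join modes), and the field names are all invented for illustration — one source plays the cached "master" role and the other streams through as "detail".

```python
# Sketch of a Joiner-style master/detail join between two heterogeneous
# sources (e.g. a flat file and a relational table). Hypothetical names.

def joiner(master_rows, detail_rows, key, keep_unmatched_detail=False):
    index = {}
    for m in master_rows:                  # cache master rows by join key
        index.setdefault(m[key], []).append(m)
    joined = []
    for d in detail_rows:
        matches = index.get(d[key], [])
        for m in matches:
            joined.append({**m, **d})      # matched: merge master and detail
        if not matches and keep_unmatched_detail:
            joined.append(dict(d))         # keep unmatched detail rows (outer)
    return joined

master = [{"DEPT_ID": 10, "DEPT": "Sales"}]
detail = [{"DEPT_ID": 10, "EMP": "Smith"}, {"DEPT_ID": 20, "EMP": "Jones"}]
print(joiner(master, detail, "DEPT_ID"))
# [{'DEPT_ID': 10, 'DEPT': 'Sales', 'EMP': 'Smith'}]
```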
Creating a Joiner Transformation
To create a Joiner Transformation:
Creating a Joiner Transformation
Select the Condition tab and set the condition.
Configuring Joiner transformation
Case-Sensitive String Comparison: If selected, the Informatica Server uses case-sensitive string comparisons when performing joins on string columns.
Cache Directory: Specifies the directory used to cache master records and the index to these records. By default, the caches are created in a directory specified by the server variable $PMCacheDir. If you override the directory, be sure there is enough disk space on the file system. The directory can be a mapped or mounted drive.
Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Lookup Transformation
Lookup Transformation
Differences between Connected & Unconnected Lookup
Connected Lookup:
1) Receives input values directly from the pipeline.
2) You can use a dynamic or static cache.
3) The cache includes all lookup columns used in the mapping.
4) Supports user-defined default values.
Unconnected Lookup:
1) Receives input values from the result of a :LKP expression within another transformation.
2) You can use a static cache only.
3) The cache includes all lookup output ports.
4) Does not support user-defined default values.
Differences between Static & Dynamic Cache
Static cache: You cannot insert rows into or update the cache.
Dynamic cache: You can insert rows into the cache as you pass rows to the target.
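The static/dynamic distinction can be shown with a toy cache. This is an invented illustration, not Informatica internals: the `lookup` function and its return convention (cached row plus an inserted flag) are hypothetical. A static cache stays read-only for the whole session, while a dynamic cache grows as unmatched rows pass toward the target.

```python
# Toy sketch of lookup caching: static cache is read-only, dynamic cache
# inserts new rows as they pass to the target. Hypothetical names.

def lookup(row, cache, key, dynamic=False):
    if row[key] in cache:
        return cache[row[key]], False      # hit: return the cached row
    if dynamic:
        cache[row[key]] = row              # dynamic cache: insert the new row
        return row, True
    return None, False                     # static cache: miss, cache unchanged

cache = {1: {"CUST_ID": 1, "NAME": "Smith"}}
row, inserted = lookup({"CUST_ID": 2, "NAME": "Jones"}, cache, "CUST_ID", dynamic=True)
print(inserted, sorted(cache))             # True [1, 2]
```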
Update Strategy Transformation
When you design your data warehouse, you need to decide what type of
information to store in targets. As part of your target table design, you
need to determine whether to maintain all the historic data or just the
most recent changes.
For example, you might have a target table, T_CUSTOMERS, that contains customer
data. When a customer address changes, you may want to save the original
address in the table, instead of updating that portion of the customer record. In
this case, you would create a new record containing the updated address, and
preserve the original record with the old customer address. This illustrates how
you might store historical information in a target table. However, if you want the
T_CUSTOMERS table to be a snapshot of current customer data, you would update
the existing customer record and lose the original address.
The model you choose constitutes your update strategy: how to handle changes to existing records. In PowerMart and PowerCenter, you set your update strategy at two different levels:
Within a session. When you configure a session, you can instruct the
Informatica Server to either treat all records in the same way (for
example, treat all records as inserts), or use instructions coded into the
session mapping to flag records for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy
transformation to flag records for insert, delete, update, or reject.
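The mapping-level flagging can be sketched as a small function. The decision logic below is invented for illustration (a real Update Strategy expression is whatever condition you write), but the flag constants mimic PowerCenter's DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT values:

```python
# Hypothetical data-driven update strategy: flag each row for a database
# operation. The rule below (reject missing addresses, update known keys,
# insert new keys) is an invented example condition.

DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, existing_keys):
    if row.get("ADDRESS") is None:
        return DD_REJECT                 # bad data: never reaches the target
    if row["CUST_ID"] in existing_keys:
        return DD_UPDATE                 # key exists: modify existing record
    return DD_INSERT                     # new key: insert as a new record

existing = {101}
rows = [{"CUST_ID": 101, "ADDRESS": "1 Main St"},
        {"CUST_ID": 102, "ADDRESS": None}]
print([flag_row(r, existing) for r in rows])  # [1, 3]
```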
Setting up Update Strategy at Session Level
During session configuration, you can select a single database operation
for all records. For the Treat Rows As setting, you have the following
options:
Setting: Description
Delete: Treat all records as deletes. For each record, if the Informatica Server finds a corresponding record in the target table (based on the primary key value), the Informatica Server deletes it. Note that the primary key constraint must exist in the target definition in the repository.
Update: Treat all records as updates. For each record, the Informatica Server looks for a matching primary key value in the target table. If it exists, the Informatica Server updates the record. Again, the primary key constraint must exist in the target definition.
Data Driven: The Informatica Server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, delete, update, or reject. If the mapping for the session contains an Update Strategy transformation, this field is marked Data Driven by default. If you do not choose the Data Driven setting, the Informatica Server ignores all Update Strategy transformations in the mapping.
Setting: Use To
Insert: Populate the target tables for the first time, or maintain a historical data warehouse. In the latter case, you must set this strategy for the entire data warehouse, not just a select group of target tables.
Update: Update target tables. You might choose this setting whether your data warehouse contains historical data or a snapshot. Later, when you configure how to update individual target tables, you can determine whether to insert updated records as new records or use the updated information to modify existing records in the target.
Data Driven: Exert finer control over how you flag records for insert, delete, update, or reject. Choose this setting if records destined for the same table need to be flagged on occasion for one operation (for example, update) or for a different operation (for example, reject). In addition, this setting provides the only way you can flag records for reject.
Workflow Manager
Task Developer
Worklet Designer
Workflow Designer
Task Developer
The Task Developer lets you create tasks such as Session, Email, and Command tasks.
Worklet Designer
A worklet is a reusable set of tasks (such as sessions) that can be reused across workflows; you create it in the Worklet Designer.
Workflow Manager
Workflow Monitor
References
The Data Warehouse Toolkit, Ralph Kimball
Informatica Developer Network
tutorialspoint.com
learndatamodeling.com
Thank You