Data Modeling Data Mapping Training
Oct 2010
EDW Upstream Data Model: In the EDW, the upstream is a combination of layers (processes, databases, jobs, and de-coupling views) through which extracted source system files are loaded from the Extraction Layer into the Staging (or Loading) Layer and then transformed via the Transform Layer into the Core EDW Layer.
Prerequisites:
A User Requirement or Functional Specification document, to determine the data elements to be brought into the EDW.
An Early Data Inventory (EDI), to understand the source system that contains the required data elements. It has to explain:
the business attributes,
the primary keys of each table,
the relationships between the tables,
whether the source system has files that relate to other systems.
Data Profiling documents, to understand the data demographics and the integrity of the new data elements.
An understanding of the existing EDW model or the Teradata FSLDM.
Section 2a – Data Modeling & Data Mapping guidelines Upstream Contd..
Upstream Logical Data Modeling Process: Once the data elements to be brought into the EDW are finalized and the EDI for the corresponding source system is available, follow the process below to prepare the data model.
1. Identify the data entities: Based on the EDI and data profiling, identify the data entity corresponding to each data element that has to be brought into the EDW, e.g. Customer, Account, Branch, Employee, Transaction, Campaign, Product, Collateral, etc.
2. Group the attributes related to each entity that the EDW is interested in. Attributes describe data entities, e.g. Account (entity): status, account number, account open date, etc.
3. Refer to the FSLDM guidelines to map the identified entities to the OCBC FSLDM. As per the FSLDM:
Customer, Branch and Employee map to Party,
Transaction maps to Event,
Collateral maps to Party Asset.
4. Extend the OCBC FSLDM model if a required entity does not fit the existing model, subject to the approval of the OCBC Data Modeler.
Section 2a – Data Modeling & Data Mapping guidelines Upstream Contd..
Upstream Physical Data Model creation: Physical data models are derived from logical model entity definitions and
relationships and are constructed to ensure that data is stored and managed in physical structures that operate effectively
on the Teradata database platform. Below are the PDM creation guidelines:
Primary Index: Target tables should use the best candidate column(s) eligible for the Primary Index (Unique or Non-Unique), taking into account the access path, data distribution, join criteria, etc.
Note: the Primary Index and the Primary Key are different concepts.
Partitioned Primary Index: To enhance the performance of large tables, a PPI can be implemented, as sketched below. Refer to the Teradata technical documentation for more details on PPI guidelines.
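For illustration only, a minimal Teradata DDL sketch of these two guidelines; the database, table and column names (DP_TEDW.ACCNT_TXN, etc.) are assumed and are not actual OCBC EDW objects:

  -- Sketch only: object and column names are assumptions for illustration.
  CREATE MULTISET TABLE DP_TEDW.ACCNT_TXN
  (
      Accnt_Id   INTEGER        NOT NULL,   -- frequent join / access-path column
      Txn_Id     DECIMAL(18,0)  NOT NULL,
      Txn_Dt     DATE           NOT NULL,
      Txn_Amt    DECIMAL(18,2)
  )
  PRIMARY INDEX ( Accnt_Id )                -- NUPI chosen for distribution and joins
  PARTITION BY RANGE_N (
      Txn_Dt BETWEEN DATE '2005-01-01' AND DATE '2015-12-31'
             EACH INTERVAL '1' MONTH,
      NO RANGE, UNKNOWN );                  -- PPI: date-bounded scans read fewer partitions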
Default Values: A default value can be assigned to a table attribute, to be used when no value is passed.
ETL Control Framework Attributes: All EDW target tables (except b-key, b-map and reference tables) will have the following seven control framework attributes:
1. Start_Dt
2. End_Dt
3. Data_Source_Cd
4. Record_Deleted_Flag
5. BusinessDate
6. Ins_Txf_BatchID
7. Upd_Txf_BatchID
Data Type Assignment: Data types should be assigned to all columns as part of physicalisation.
Null / Not Null Handling: NULL or NOT NULL should be assigned to each attribute (see the sketch below).
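A minimal sketch of how these guidelines come together in a physicalised target table; the table name, business columns, data types and default values shown are assumptions, while the seven control framework attributes are as listed above:

  -- Sketch only: table name, business columns, data types and defaults are assumed.
  CREATE MULTISET TABLE DP_TEDW.PARTY_X
  (
      Party_Id            DECIMAL(18,0)  NOT NULL,
      Party_Nm            VARCHAR(100),
      -- ETL control framework attributes (all target tables except b-key, b-map, reference)
      Start_Dt            DATE           NOT NULL,
      End_Dt              DATE           DEFAULT DATE '9999-12-31' NOT NULL,
      Data_Source_Cd      VARCHAR(10)    NOT NULL,
      Record_Deleted_Flag CHAR(1)        DEFAULT 'N' NOT NULL,
      BusinessDate        DATE           NOT NULL,
      Ins_Txf_BatchID     INTEGER        NOT NULL,
      Upd_Txf_BatchID     INTEGER
  )
  PRIMARY INDEX ( Party_Id );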
Section 2a – Data Modeling & Data Mapping guidelines Upstream Contd..
T - EDW Platform Upstream Source-to-Target Mapping: The data architecture document is used as the input for the source-to-target mapping document. The Source-to-Target Mapping document captures the detailed business rules describing how the source business attributes are transformed into EDW data elements. It is used as the reference document for further ETL development.
Section 2b – Data Modeling & Data Mapping guidelines Downstream
EDW Downstream Data Model: In the EDW, the downstream is a combination of layers (processes, databases, tables, and views) through which user requirements are modeled and met, after applying the business and technical transformation rules to data read via the customized OCBC FSLDM.
Downstream Model Standards:
To be modelled to support the user requirements, and must be scalable.
To be modelled so that it is easy to maintain and is tuned to meet the SLAs with the best performance.
To be modelled keeping in mind enterprise-level reporting (including geographies).
To be modelled with the various security attributes at the most granular level.
Tables must not be skewed (a skew-check sketch follows this list).
Jobs to follow the Primary Key defined by the OCBC Data Modeler.
Target tables to use the best candidate eligible for the Primary Index (Unique or Non-Unique), taking into account the access path, distribution, join criteria, etc.
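As a sketch of how skew can be checked once a table is deployed, assuming the mart's table database is DP_TCMDM and access to the DBC.TableSize dictionary view:

  -- Sketch: per-table skew across AMPs; a high skew % indicates a poor Primary Index choice.
  SELECT  DataBaseName,
          TableName,
          SUM(CurrentPerm)                                            AS Total_Perm,
          100 * (1 - AVG(CurrentPerm) / NULLIFZERO(MAX(CurrentPerm))) AS Skew_Pct
  FROM    DBC.TableSize
  WHERE   DataBaseName = 'DP_TCMDM'      -- assumed data mart table database
  GROUP BY 1, 2
  ORDER BY Skew_Pct DESC;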
Prerequisite: A User Requirement or Functional Specification document, to determine the data elements to be reported to users, which:
should explain the business rules to derive the reporting data elements,
should specify the reporting level of each data element,
should have details of the reporting hierarchies.
Section 2b – Data Modeling & Data Mapping guidelines Downstream Contd..
T - EDW Platform Downstream Source-to-Target Mapping: The data architecture document is used as the input for the source-to-target mapping document. The Source-to-Target Mapping document captures the detailed business rules describing how the source business attributes are transformed into EDW data elements. It is used as the reference document for further ETL development.
Section 3a – Naming standards and conventions
Upstream Data Model: In case of FSLDM customizations, naming conventions can be proposed by the HCL Data Modeler to the OCBC Data Modeler. The approval will be provided by the OCBC Data Modeler.
Downstream Data Model: The naming conventions shall be followed by the HCL Data Modeler based on the rules below:
Parent Database: D<Env>_<Abbreviated Application Name>DM (all in capital letters), e.g. DP_CMDM. This will hold all filtering and configuration for the associated Data Mart.
Child Databases: The parent database (e.g. DP_CMDM) will have several child databases, one per layer.
No / Standard / Definition
1. D<Env>_F<Abbreviated Application Name>DM (all in capital letters), e.g. DP_FCMDM: Holds all filtering and configuration for the associated Data Mart.
2. D<Env>_B<Abbreviated Application Name>DM (all in capital letters), e.g. DP_BCMDM: The 'base view' layer from the EDW into the Data Mart. It contains (1:1) views on top of VEDW, which make use of the filters (see the view sketch after this table).
3. D<Env>_W<Abbreviated Application Name>DM (all in capital letters), e.g. DP_WCMDM: Holds all interim / temporary tables, macros and stored procedures required to provide the functionality in DP_V<App Name>.
4. D<Env>_T<Abbreviated Application Name>DM (all in capital letters), e.g. DP_TCMDM: Holds all the 'extra' tables (e.g. materialized views) that are required to be created in the Data Mart.
5. D<Env>_V<Abbreviated Application Name>DM (all in capital letters), e.g. DP_VCMDM: Holds all the 'final' views required by the downstream application / user.
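For illustration, assuming a Customer Marketing mart (CM) in production: the 'base view' layer re-publishes EDW objects 1:1 with the mart's filter applied, and the 'final view' layer exposes what the downstream user queries. The view, table and column names below (DP_VEDW.PARTY, CSTMR_PRFL, etc.) are assumed:

  -- Sketch only: view, table and column names are assumptions for illustration.
  -- Base-view layer: 1:1 view from the EDW into the Data Mart, applying the mart's filter.
  REPLACE VIEW DP_BCMDM.PARTY
  AS
  SELECT  p.*
  FROM    DP_VEDW.PARTY p
  WHERE   p.Record_Deleted_Flag = 'N';

  -- Final-view layer: the view the downstream application / user actually queries.
  REPLACE VIEW DP_VCMDM.CSTMR_PRFL
  AS
  SELECT  b.Party_Id,
          b.Party_Nm,
          b.Data_Source_Cd
  FROM    DP_BCMDM.PARTY b;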
Section 3a – Naming standards and conventions Contd…
No / Convention / Description
1. <Object Type> (all in capital letters), e.g. T: One or two characters indicating the object type, for objects in the working database only. T: Table, V: View, SP: Stored Procedure, XV: Transformation View, M: Macro, etc.
2. <Abbreviated Application Name> (all in capital letters): Short name for the application, e.g. CM (Customer Marketing), RPS (Retail Payment System).
3. <Abbreviated Product or Business or Table Name> (all in capital letters), e.g. ACCOUNT_TYPE becomes ACCNT_TYP:
Rule 1: Short name for the product, business or table name, e.g. FITAS, OD, BAL_TYPE.
Rule 2: Give preference to the kind of information it is going to keep; the terminology follows the Core EDW.
Rule 3: Take out the vowels (a, e, i, o, u).
Rule 4: The name length should not be greater than 30 characters.
Example: the measure Count of Transactions is not part of the Core EDW.
Column name: Count_Of_Transactions / Transaction_Count
After taking out the vowels (a, e, i, o, u): Cnt_Of_Trnsctn / Trnsctn_Cnt
Keep the column name in mixed case (first letter of each word in capital): Cnt_Of_Trnsctn / Trnsctn_Cnt
Control columns should be brought across as-is from the Core EDW; the above rules do not apply to them. The column above is not a control column; a control column such as Data_Source_Cd should be brought across as it is.
Section 3b – Basic S/W Setup and Login
Currently OCBC is using Erwin version 4.1. Every HCL Data Modeler needs to get the Erwin CD from the OCBC Data Modeler to install the software on their own machine.
For installation, one needs to raise a Temp Admin ticket and get it approved by the EIS Head. The Erwin data modeling software can then be installed.
Follow the on-screen instructions and, at the end, key in the Erwin key details mentioned on the CD.
For further clarifications, please contact the HCL Data Modeler.
Section 4a – Tips for Developers
Developers are requested not to create any tables in the DD environment during development. Instead, ask the HCL Data Modeler to create them through Erwin; he/she will generate the DDLs from Erwin and get them deployed in the DD environment through the DBA / OCBC Data Modeler.
Section 4b – Best Practices
Upstream Mapping Naming Structure: The document name must reflect the following:
Project Name or ITSR# or ITSC# such as ALMS or ITSR 7338 or ITSC 38755
Source Application Name such as SG_CIF
Version Number or CR Release such as ver 1.0 CR 14 or Jul10
Example for mapping doc would be:
ALMS SG_CIF Upstream Mapping Ver 1.0
Downstream Mapping Naming Structure: The document name must reflect the following:
Project Name or ITSR# or ITSC# such as ALMS or ITSR 7338 or ITSC 38755
Downstream Data Mart Name such as CSDM
Version Number or CR Release such as Ver 1.0 CR 14
Example for mapping doc would be:
ALMS CSDM Downstream Mapping Ver 1.0
The general data modeling steps are:
Identify entities
Identify attributes of entities
Identify relationships
Apply naming conventions
Assign keys and indexes
Normalize to reduce data redundancy
Denormalize to improve performance (see the sketch after this list)
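A minimal sketch of the last two steps, with assumed table and column names: the core EDW model stays normalized, while a downstream mart table may be denormalized (pre-joined) for reporting performance:

  -- Sketch only: table and column names are assumptions for illustration.
  -- Normalized core tables: account data and account-type reference data are kept separate.
  CREATE TABLE DP_TEDW.ACCNT
  (
      Accnt_Id      DECIMAL(18,0)  NOT NULL,
      Accnt_Typ_Cd  CHAR(3)        NOT NULL,
      Open_Dt       DATE
  ) PRIMARY INDEX ( Accnt_Id );

  CREATE TABLE DP_TEDW.ACCNT_TYP
  (
      Accnt_Typ_Cd  CHAR(3)       NOT NULL,
      Accnt_Typ_Dsc VARCHAR(60)
  ) UNIQUE PRIMARY INDEX ( Accnt_Typ_Cd );

  -- Denormalized mart table: the type description is pre-joined so reports avoid the join.
  CREATE TABLE DP_TCMDM.ACCNT_SMRY AS
  (
      SELECT  a.Accnt_Id,
              a.Open_Dt,
              a.Accnt_Typ_Cd,
              t.Accnt_Typ_Dsc
      FROM    DP_TEDW.ACCNT a
      JOIN    DP_TEDW.ACCNT_TYP t
        ON    a.Accnt_Typ_Cd = t.Accnt_Typ_Cd
  ) WITH DATA PRIMARY INDEX ( Accnt_Id );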
Section 5 – Check list / Self Review Process
Data Mapping:
Please start the self-review after completing the data mapping document. Each and every column in the document should provide useful and meaningful information. Provide the information in the transformation rules as expected by the OCBC reviewer.
Once the self-review is done, it is suggested to have an internal review by the HCL Data Mapper before proceeding to the OCBC BADM team review, to avoid rework / escalations.
Data Modeling:
Check closely that all names follow the correct naming conventions.
Create domains to provide the flexibility of changing a data type.
Create the indexes as per the OCBC standards.
Assign data types to all the columns.
Provide table and column definitions for all the tables.
Assign NULL or NOT NULL to all the columns.
Create the default values, if any.
Attach the abbreviation file with all the meaningful table and column abbreviations.
Generate the DDLs from Erwin, add the database name for the data mart, and send them to the OCBC Data Modeler for review. Once he/she approves, he/she will ask the DBA to deploy them in the respective environment.
Make the data model presentable to the client / business users by coloring the entities and the relationships / columns.
Section 6 – DOs and DON’Ts
Please do a self-review after completing the Data Model and Data Mapping. All standards should be met as per the client's expectations.
For Data Modeling, check all the data types, NULL / NOT NULL fields and indexes, generate the DDLs and pass them to the OCBC Data Modeler for approval; he/she will pass them to the DBA to create the table structures.
For Data Mapping, all the columns should be filled with proper formatting, coloring (if any), font, size, uppercase / lowercase, etc.
Every page of the mapping document should be updated with the mapping name.
The extraction criteria should be clearly mentioned.
THANK YOU