Data Warehouse Implementation
Spring 2016
Zafar Heydari, Farid
Outline
About the Lecturer
OLAP vs OLTP
Designing Dimensions
[Diagram: OLTP, SSIS, OLAP (ROLAP, MOLAP)]
OLTP: Online Transaction Processing
Designed to handle everyday business transactions (e.g. sales, customer care, manufacturing)
The points where data is captured and recorded in the company
Very efficient at managing specific information: ERP, CRM, HRM, SCM, email, and others
OLAP: Online Analytical Processing
OLAP cubes let you analyze all of the information available in the Data Warehouse
Each cube stores a specific set of information and contains its own “measures” and “dimensions”
Measures
Numeric values (amounts, quantities, percentages)
Dimensions
Attributes used to filter and sort information
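A cube query can be pictured as summing a measure over fact rows, grouped and filtered by dimension attributes. The minimal Python sketch below assumes hypothetical attribute names (year, product, region) and a single hypothetical measure (sales_amount); it is an illustration of the idea, not how an OLAP engine stores cubes.

```python
from collections import defaultdict

# Hypothetical fact rows: dimension attributes (year, product, region)
# plus one numeric measure (sales_amount).
facts = [
    {"year": 2015, "product": "Bike", "region": "North", "sales_amount": 120.0},
    {"year": 2015, "product": "Helmet", "region": "North", "sales_amount": 30.0},
    {"year": 2016, "product": "Bike", "region": "South", "sales_amount": 200.0},
    {"year": 2016, "product": "Helmet", "region": "North", "sales_amount": 45.0},
]


def aggregate(rows, by, measure, where=None):
    """Sum `measure` grouped by the dimension attributes in `by`.

    `where` optionally filters rows on attribute values, e.g. {"region": "North"}.
    """
    totals = defaultdict(float)
    for row in rows:
        if where and any(row[attr] != value for attr, value in where.items()):
            continue  # row is filtered out by a dimension attribute
        key = tuple(row[attr] for attr in by)
        totals[key] += row[measure]
    return dict(sorted(totals.items()))  # ordered by the grouping attributes


print(aggregate(facts, by=["year"], measure="sales_amount"))
# → {(2015,): 150.0, (2016,): 245.0}
print(aggregate(facts, by=["product"], measure="sales_amount", where={"region": "North"}))
# → {('Bike',): 120.0, ('Helmet',): 75.0}
```

Measures are what gets summed; dimension attributes only decide how rows are grouped and which rows are kept.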
Examples of Dimensions and Measures
Why Data Warehouses?
Problems when reporting directly from operational (OLTP) databases:
Highly normalized
Lots of joins
Archived data from 1/1/1
Low performance
Multiple databases
Different naming and definitions
What a data warehouse offers:
Simplified reporting
Archived data
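To illustrate the "multiple databases, different naming" problem, here is a minimal Python sketch assuming two hypothetical source systems (crm and erp) whose customer columns are named differently. Each source is renamed to one warehouse standard before the rows are merged into a single customer dimension. The "first seen wins" rule is a simplifying assumption; a real load would need explicit rules for conflicting values.

```python
# Hypothetical extracts from two OLTP systems that name the same fields differently.
crm_rows = [{"CustNo": 1, "CustName": "Acme"}, {"CustNo": 2, "CustName": "Globex"}]
erp_rows = [{"customer_id": 2, "name": "Globex"}, {"customer_id": 3, "name": "Initech"}]

# Per-source mapping from source column names to one warehouse naming standard.
COLUMN_MAPS = {
    "crm": {"CustNo": "customer_id", "CustName": "customer_name"},
    "erp": {"customer_id": "customer_id", "name": "customer_name"},
}


def conform(rows, source):
    """Rename each row's columns using the mapping for its source system."""
    mapping = COLUMN_MAPS[source]
    return [{mapping[col]: value for col, value in row.items()} for row in rows]


def integrate(*batches):
    """Merge conformed batches into one row per customer_id (first seen wins)."""
    merged = {}
    for batch in batches:
        for row in batch:
            merged.setdefault(row["customer_id"], row)
    return [merged[key] for key in sorted(merged)]


dim_customer = integrate(conform(crm_rows, "crm"), conform(erp_rows, "erp"))
print(dim_customer)
```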
[Diagram: a dimension and its attributes, e.g. DIM Sub-Category]
Role-Playing Dimensions
A single physical dimension can be referenced multiple times in a fact table.
For instance, a fact table can have several dates, each of which is represented by a foreign key to the date dimension. It is essential that each foreign key refers to a separate view of the date dimension so that the references are independent.
These separate dimension views (with unique attribute column names) are called roles.
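A minimal sketch of a role-playing date dimension, using Python's built-in sqlite3 module as a stand-in database. The table, view, and column names (DimDate, OrderDate, ShipDate, FactSales) are hypothetical. One physical DimDate table is exposed through two views whose columns are renamed per role, so one query can join the order date and the ship date independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension.
CREATE TABLE DimDate (
    DateKey   INTEGER PRIMARY KEY,  -- e.g. 20160128
    FullDate  TEXT,
    MonthName TEXT
);
-- Role views: same table, uniquely named columns per role.
CREATE VIEW OrderDate AS
    SELECT DateKey AS OrderDateKey, FullDate AS OrderFullDate, MonthName AS OrderMonth
    FROM DimDate;
CREATE VIEW ShipDate AS
    SELECT DateKey AS ShipDateKey, FullDate AS ShipFullDate, MonthName AS ShipMonth
    FROM DimDate;
-- Fact table with two foreign keys to the same dimension.
CREATE TABLE FactSales (
    OrderDateKey INTEGER REFERENCES DimDate(DateKey),
    ShipDateKey  INTEGER REFERENCES DimDate(DateKey),
    SalesAmount  REAL
);
INSERT INTO DimDate VALUES
    (20160128, '2016-01-28', 'January'),
    (20160203, '2016-02-03', 'February');
INSERT INTO FactSales VALUES
    (20160128, 20160203, 100.0),
    (20160128, 20160128, 50.0);
""")

# Each role is joined separately, so order month and ship month can differ.
rows = conn.execute("""
    SELECT o.OrderMonth, s.ShipMonth, SUM(f.SalesAmount)
    FROM FactSales f
    JOIN OrderDate o ON f.OrderDateKey = o.OrderDateKey
    JOIN ShipDate  s ON f.ShipDateKey  = s.ShipDateKey
    GROUP BY o.OrderMonth, s.ShipMonth
    ORDER BY s.ShipMonth
""").fetchall()
print(rows)
```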
Designing Dimensions (6)
Hierarchies:
Denormalized, flattened dimensions (star)
Hierarchies normalized into multiple tables (snowflake)
Star or snowflake? The choice depends on which hierarchy levels are used across all the fact tables
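The sketch below contrasts the two designs with sqlite3, under hypothetical table and column names. The star version keeps the whole product hierarchy in one flattened table; the snowflake version stores each level (product, subcategory, category) in its own table. Both queries return the same totals; the snowflake query needs one extra join per hierarchy level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one flattened product dimension holding the whole hierarchy.
CREATE TABLE DimProductStar (
    ProductKey INTEGER PRIMARY KEY,
    ProductName TEXT, SubCategory TEXT, Category TEXT
);
-- Snowflake: each hierarchy level in its own table.
CREATE TABLE DimCategory (CategoryKey INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE DimSubCategory (
    SubCategoryKey INTEGER PRIMARY KEY, SubCategory TEXT,
    CategoryKey INTEGER REFERENCES DimCategory(CategoryKey)
);
CREATE TABLE DimProductSnow (
    ProductKey INTEGER PRIMARY KEY, ProductName TEXT,
    SubCategoryKey INTEGER REFERENCES DimSubCategory(SubCategoryKey)
);
CREATE TABLE FactSales (ProductKey INTEGER, SalesAmount REAL);

INSERT INTO DimProductStar VALUES
    (1, 'Road-150', 'Road Bikes', 'Bikes'),
    (2, 'Sport Helmet', 'Helmets', 'Accessories');
INSERT INTO DimCategory VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO DimSubCategory VALUES (10, 'Road Bikes', 1), (20, 'Helmets', 2);
INSERT INTO DimProductSnow VALUES (1, 'Road-150', 10), (2, 'Sport Helmet', 20);
INSERT INTO FactSales VALUES (1, 3000.0), (2, 35.0), (1, 1500.0);
""")

# Star: a single join from the fact to the flattened dimension.
star = conn.execute("""
    SELECT d.Category, SUM(f.SalesAmount) FROM FactSales f
    JOIN DimProductStar d ON f.ProductKey = d.ProductKey
    GROUP BY d.Category ORDER BY d.Category
""").fetchall()

# Snowflake: walk the hierarchy with one join per level.
snowflake = conn.execute("""
    SELECT c.Category, SUM(f.SalesAmount) FROM FactSales f
    JOIN DimProductSnow p ON f.ProductKey = p.ProductKey
    JOIN DimSubCategory s ON p.SubCategoryKey = s.SubCategoryKey
    JOIN DimCategory c    ON s.CategoryKey = c.CategoryKey
    GROUP BY c.Category ORDER BY c.Category
""").fetchall()
print(star)
print(snowflake)
```

A separate DimSubCategory table is also what another fact table could reference directly if it is stored at the subcategory grain, which is one reason the choice depends on all fact tables.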
Star Schema Example
Snowflake Schema Example
Hybrid Schema Example
Designing Fact Tables (1)
Measure Additivity:
Additive: can be summed across all dimensions
Non-additive: cannot be summed across any dimension (e.g. Unit Cost)
Semi-additive: can be summed across some dimensions, all except time (e.g. Units Balance)
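The additivity rules can be sketched in Python with hypothetical monthly rows: units_sold is additive, units_balance (an end-of-month stock level) is semi-additive, and unit_cost is non-additive. Taking the last month for the closing balance and a sales-weighted average for unit cost are assumptions for illustration; averaging the balance over time would be another valid semi-additive rule.

```python
# Hypothetical rows, one per (month, product).
rows = [
    {"month": 1, "product": "A", "units_sold": 10, "units_balance": 100, "unit_cost": 2.0},
    {"month": 1, "product": "B", "units_sold": 5, "units_balance": 40, "unit_cost": 4.0},
    {"month": 2, "product": "A", "units_sold": 20, "units_balance": 80, "unit_cost": 3.0},
    {"month": 2, "product": "B", "units_sold": 5, "units_balance": 35, "unit_cost": 4.0},
]


def total_units_sold(rows):
    # Additive: summing across months and products is meaningful.
    return sum(r["units_sold"] for r in rows)


def closing_balance(rows):
    # Semi-additive: sum across products, but across time take the
    # latest month instead of adding balances from different months.
    last_month = max(r["month"] for r in rows)
    return sum(r["units_balance"] for r in rows if r["month"] == last_month)


def average_unit_cost(rows):
    # Non-additive: adding unit costs is meaningless, so recompute a
    # weighted average from additive parts (cost * units sold).
    total_cost = sum(r["unit_cost"] * r["units_sold"] for r in rows)
    return total_cost / total_units_sold(rows)


print(total_units_sold(rows), closing_balance(rows), average_unit_cost(rows))
# → 40 115 3.0
```

Naively summing units_balance over both months would give 255, which double-counts stock that is still on hand.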
Designing Fact Tables (3)
Factless Fact Tables:
Represent many-to-many relationships in the data warehouse.
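A minimal sqlite3 sketch of a factless fact table, assuming a hypothetical student/course enrollment model. The fact table holds only foreign keys, so each row records one student-course pairing, and questions are answered by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStudent (StudentKey INTEGER PRIMARY KEY, StudentName TEXT);
CREATE TABLE DimCourse  (CourseKey  INTEGER PRIMARY KEY, CourseName  TEXT);
-- Factless fact table: only foreign keys, no numeric measures.
-- Each row records that a student is enrolled in a course.
CREATE TABLE FactEnrollment (
    StudentKey INTEGER REFERENCES DimStudent(StudentKey),
    CourseKey  INTEGER REFERENCES DimCourse(CourseKey),
    PRIMARY KEY (StudentKey, CourseKey)
);
INSERT INTO DimStudent VALUES (1, 'Sara'), (2, 'Ali');
INSERT INTO DimCourse  VALUES (1, 'Data Warehousing'), (2, 'Databases');
INSERT INTO FactEnrollment VALUES (1, 1), (1, 2), (2, 1);
""")

# The "measure" is simply the number of rows per dimension member.
enrolled = conn.execute("""
    SELECT c.CourseName, COUNT(*) FROM FactEnrollment e
    JOIN DimCourse c ON e.CourseKey = c.CourseKey
    GROUP BY c.CourseName ORDER BY c.CourseName
""").fetchall()
print(enrolled)
```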
Designing Fact Tables (4)
Conformed Facts
If the same measurement appears in separate fact tables:
If the separate fact definitions are consistent, the conformed facts should be identically named
If the definitions are incompatible, the facts should be named differently
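Because conformed facts share a name and a definition, results from separate fact tables can be placed side by side (often called drilling across). A minimal Python sketch, assuming two hypothetical fact tables (actual sales and forecast) that both carry SalesAmount at month grain: each table is aggregated on its own, and the totals are then aligned on month.

```python
from collections import defaultdict

# Hypothetical (month, SalesAmount) rows from two fact tables that share
# the conformed fact "SalesAmount" with the same definition.
fact_actual = [("2016-01", 100.0), ("2016-01", 50.0), ("2016-02", 80.0)]
fact_forecast = [("2016-01", 140.0), ("2016-02", 100.0)]


def total_by_month(rows):
    totals = defaultdict(float)
    for month, sales_amount in rows:
        totals[month] += sales_amount
    return totals


def drill_across(actual, forecast):
    # Aggregate each fact table separately, then align on the shared month.
    # Comparing the totals is valid only because SalesAmount means the
    # same thing in both tables.
    a, f = total_by_month(actual), total_by_month(forecast)
    return {
        month: (a.get(month, 0.0), f.get(month, 0.0), a.get(month, 0.0) - f.get(month, 0.0))
        for month in sorted(set(a) | set(f))
    }


print(drill_across(fact_actual, fact_forecast))
```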
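Lineage Key and Partitioning
Partitioning large tables:
Partition function (P-Function): maps rows to partitions using boundary values on a column
Partition scheme (P-Schema): maps those partitions to filegroups
Simplified maintenance
Improved performance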
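In SQL Server, a partition function assigns each row a partition number from a sorted list of boundary values, and a partition scheme maps those partitions to filegroups. The Python sketch below only emulates that mapping; it is not SQL Server code, and the boundaries and filegroup names are hypothetical. With RANGE RIGHT a boundary value falls into the partition on its right; with RANGE LEFT, into the one on its left.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition number for `value`.

    `boundaries` must be sorted; n boundary values define n + 1 partitions.
    RANGE RIGHT puts a boundary value in the partition to its right,
    RANGE LEFT puts it in the partition to its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


# Hypothetical "partition function": yearly boundaries on an order date.
BOUNDARIES = [date(2014, 1, 1), date(2015, 1, 1), date(2016, 1, 1)]

# Hypothetical "partition scheme": one filegroup per partition (n + 1 entries).
# The names assume RANGE RIGHT semantics.
FILEGROUPS = ["FG_Before2014", "FG_2014", "FG_2015", "FG_2016_On"]


def filegroup_for(value, range_right=True):
    return FILEGROUPS[partition_number(value, BOUNDARIES, range_right) - 1]


print(partition_number(date(2014, 1, 1), BOUNDARIES))                     # → 2
print(partition_number(date(2014, 1, 1), BOUNDARIES, range_right=False))  # → 1
print(filegroup_for(date(2016, 3, 15)))                                   # → FG_2016_On
```

Keeping each year in its own partition is what typically simplifies maintenance (an old year can be archived or removed as a unit), and queries filtered on the partitioning column can skip partitions that cannot match.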
Lineage:
Lineage key stored directly in fact tables
Centralized lineage dimension
Metadata about columns
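A minimal sqlite3 sketch of a centralized lineage dimension, assuming hypothetical table and column names (DimLineage, LineageKey, SourceSystem, PackageName, LoadStartTime). Every fact row stores the LineageKey of the load that inserted it, which makes it possible to trace rows back to a load and to remove a faulty batch without touching the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical centralized lineage dimension: one row per ETL load.
CREATE TABLE DimLineage (
    LineageKey    INTEGER PRIMARY KEY,
    SourceSystem  TEXT,
    PackageName   TEXT,
    LoadStartTime TEXT
);
-- Each fact row carries the LineageKey of the load that inserted it.
CREATE TABLE FactSales (
    OrderID     INTEGER,
    SalesAmount REAL,
    LineageKey  INTEGER REFERENCES DimLineage(LineageKey)
);
INSERT INTO DimLineage VALUES
    (1, 'ERP', 'LoadFactSales', '2016-03-01 02:00'),
    (2, 'ERP', 'LoadFactSales', '2016-03-02 02:00');
INSERT INTO FactSales VALUES (101, 20.0, 1), (102, 35.0, 1), (103, 15.0, 2);
""")

# Trace how many fact rows each load produced.
rows_per_load = conn.execute("""
    SELECT l.LineageKey, l.LoadStartTime, COUNT(*) FROM FactSales f
    JOIN DimLineage l ON f.LineageKey = l.LineageKey
    GROUP BY l.LineageKey, l.LoadStartTime ORDER BY l.LineageKey
""").fetchall()
print(rows_per_load)

# Undo a bad load by deleting only the rows it inserted.
conn.execute("DELETE FROM FactSales WHERE LineageKey = ?", (2,))
remaining = conn.execute("SELECT COUNT(*) FROM FactSales").fetchone()[0]
print(remaining)
```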
Kimball Dimensional Modeling Techniques