
4.0 Dimensional Modelling - CAAS 2


4. Introduction to Dimensional Modeling

CARLO ANGELO A. SONDAY


Assistant Professor
Department of Industrial Engineering & Operations Research
University of the Philippines Diliman
[email protected]
Outline for This Module
1. Entity Relationship Modelling
• Case Study on ER Diagrams
2. Relational Modelling
• Case Study on Relational Modelling
3. Relational Databases and SQL
• Case Study on SQL
4. Dimensional Modelling
• Case Study on Dimensional Modelling
5. ETL
• Case Study on Source to Target Mapping
6. Transformations
• Case Study on ETL

2 E.R. L. Jalao, UP NEC, [email protected]


Outline for This Session
• Introduction to Data Warehousing
• What is Dimensional Modeling?
• Why not Relational Modeling?
• Examples of Dimensional Modeling
• Fact and Dimension Tables
• Designing the Dimension Model
• Case Study



Introduction to Data Warehousing
Definition: Data Warehousing

• A data warehouse (DW) is a
• subject-oriented
• integrated
• time-variant
• non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996



Data Warehousing and Analytics

BA Framework
Introduction to Data Warehousing

Definition: Subject-Oriented DW
• Data is categorized and stored by business subject
rather than by application
[Figure: Applications vs Subjects. Operational applications (Equity Plans, Shares, Insurance, Savings, Loans) on one side; the data warehouse subject "Customer financial information" on the other.]
Introduction to Data Warehousing

Definition: Integrated DW
• Data on a given subject is defined and stored once.

[Figure: Integration. The Savings, Current accounts, and Loans applications feed a single Customer subject in the data warehouse.]
Introduction to Data Warehousing

Definition: Time-Variant DW
• Data is stored as a series of snapshots, each
representing a period of time

[Figure: Time Variant Example]
Introduction to Data Warehousing

Definition: Non-Volatile DW
• Typically data in the data warehouse is not updated or
deleted.

[Figure: Non Volatile Example. Operational databases: Insert, Read, Update, Delete. DW: Load, Read.]
Introduction to Data Warehousing
• Business Analytics v. Data Warehouse
• Data Warehouse is the Information Technology (IT) term
• Users, especially senior management and users new to the
concept of a data warehouse, identify more readily with
Business Analytics
• We will use the term Data Warehouse most often in this
module because much of our focus will be on how to build a
DW system
• Always keep in mind that the end purpose is Analytics





What is Dimensional Modeling?
• Dimensional modeling is a logical design technique for
structuring data such that
• It is intuitive for business users
• It delivers fast query performance
• Widely accepted as the preferred approach for DW
presentation.
• Simplicity is fundamental to usefulness.
• Allows any analytics software to easily navigate
databases.



What is Dimensional Modeling?

The Kimball Lifecycle


What is Dimensional Modeling?
Definition: Dimensional Modeling

• Divides the world into measurements and context.


• Measurements are numeric values called facts.
• Context intuitively divided into clumps called
dimensions.
• Dimensions describe the “who, what, where, when,
why, and how” of the facts.



What is Dimensional Modeling?
Definition: Dimensional Model

• A dimensional model consists of a fact table containing
measurements surrounded by a halo of dimension
tables containing textual context.
• Known as a star join.
• Known as a star schema when stored in a relational
database (RDBMS).



What is Dimensional Modeling?

Typical Dimensional Model


Standard SQL Query Template
SELECT p.brand, SUM(f.pesos_sold), SUM(f.units_sold)
FROM sales_fact f, product_dim p, date_dim d
WHERE f.productkey = p.productkey
  AND f.datekey = d.datekey
  AND d.quarter = '1 Q 2015'
GROUP BY p.brand
ORDER BY p.brand
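The template can be tried end to end against an in-memory database. The sketch below uses Python's built-in sqlite3 module. The three table names and the queried columns come from the slide, but the column types and the handful of rows are made-up sample data, and the joins are written with explicit JOIN ... ON syntax, which is equivalent to the comma-style joins in the template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (productkey INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE date_dim    (datekey INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE sales_fact  (productkey INTEGER REFERENCES product_dim(productkey),
                          datekey INTEGER REFERENCES date_dim(datekey),
                          pesos_sold REAL, units_sold INTEGER);
-- Hypothetical sample rows, for illustration only.
INSERT INTO product_dim VALUES (1, 'Axon'), (2, 'Framis');
INSERT INTO date_dim VALUES (10, '1 Q 2015'), (20, '2 Q 2015');
INSERT INTO sales_fact VALUES (1, 10, 500.0, 100), (1, 10, 280.0, 163),
                              (2, 10, 1044.0, 509), (2, 20, 999.0, 1);
""")

# Constrain on a dimension attribute, group by a dimension attribute,
# and sum the facts.
rows = conn.execute("""
SELECT p.brand, SUM(f.pesos_sold), SUM(f.units_sold)
FROM sales_fact f
JOIN product_dim p ON f.productkey = p.productkey
JOIN date_dim d    ON f.datekey = d.datekey
WHERE d.quarter = '1 Q 2015'
GROUP BY p.brand
ORDER BY p.brand
""").fetchall()
```

Every query against a star schema follows the same shape, which is why reporting tools can generate it automatically.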



Typical Dimensional Answer Set

Brand     Pesos Sales   Unit Sales
Axon              780          263
Framis           1044          509
Widget            213          444
Zapper             95           39

Brand is a dimension attribute; Pesos Sales and Unit Sales are fact table metrics.



Creating a Report by Drag and Drop



Dimension Attributes
Yield Interesting Results
• Dimension attributes are the source of most interesting
constraints
• Examples
• Slice sales by product category, by region, by barangay
• Analyze sales effectiveness on radio promotions via the
AdType attribute in Promotions dimension





Two Paradigms
• Relational Modelling
• Dimensional Modelling



Review: Normalized Models
• Widely used method in most databases nowadays
• Designed to eliminate redundancies. Other than keys, each
attribute may appear in only one table.
• Design objective: a Third Normal Form (3NF) model.
• Modeling business processes results in numerous data
entities/tables and a spaghetti-like interweaving of relationships
among them.
• Some ERP systems have tens of thousands of tables.
• Even a small model can be challenging.
• Normalized models essential to good operational systems
• Excellent for capturing and understanding the business (rules)
• One PO, multiple Line Items
• Great for speed when processing individual transactions



Northwind Relational Model



Normalized Models NOT
Good for DW Systems
• Not usable by end-users – too complicated and
confusing
• Not usable for DW queries – performance too slow
(many joins)



Observations on Relational Models

• Normalized models look very different from
dimensional models
• Normalized models confuse business users
• Business users see their business in dimensional models
• Dimensional models may contain more content than
normalized models
• History
• Enhanced with content from external sources



Two Key Benefits of Dimensional
Modeling à la Kimball
• Understandability
• Model must be easily understood by business users
• Yet represent complexities of the business
• Performance
• Fast response to queries that summarize millions of rows is
essential
• Limiting models to single level joins rather than multi-level joins
• Denormalization has a significant impact on performance



Benefits of Dimensional Models
• Predictable, Standard Framework
• Gracefully Extensible to Accommodate Change
• Star Schema is Symmetrical (Order is irrelevant)
• Standard Approaches for Common Modeling Situations
• Aggregate Management



Benefits of Dimensional Models
• Predictable, Standard Framework
• Users recognize that this is “their business”
• Report writers, query tools, and user interfaces can be built
into BI tools
• Makes user interfaces more understandable
• Makes processing more efficient



Benefits of Dimensional Models
• Gracefully Extensible to Accommodate Change
• Existing tables can be changed by adding new data rows
• Data should not have to be reloaded
• No query tool or reporting tool has to be reprogrammed
• Old BI applications continue to run without yielding different
results



Benefits of Dimensional Models
• Star Join Schema is Symmetrical
• Every dimension is equivalent
• All dimensions are symmetrically equal entry points into the fact
table
• No concern about order in selecting tables
• Logical design can be done nearly independent of expected
query patterns
• Future queries not thought of can be accommodated easily
• User interfaces, query strategies, and SQL generated are all
symmetrical



Benefits of Dimensional Models
• Standard Approaches for Common Modeling Situations
• Role-playing dimensions
• Sales Date versus Received Date
• Slowly changing dimensions
• Heterogeneous products
• Need to track lines of business together
• But each LOB product set is highly idiosyncratic
• And more…



Benefits of Dimensional Models
• Aggregate Management
• Aggregate tables are summary tables
• Example: monthly sales fact table with month dimension
• A sound aggregate strategy is essential to good performance
and economic processing
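A minimal sketch of the monthly aggregate mentioned above, again using Python's sqlite3 module. The table and column names (daily_sales_fact, monthly_sales_fact, sales_amount) are hypothetical, and an ISO date string stands in for a surrogate date key purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales_fact (date_key TEXT, product_key INTEGER,
                               sales_amount REAL);
-- Hypothetical detail rows at the daily grain.
INSERT INTO daily_sales_fact VALUES
  ('2015-01-05', 1, 100.0), ('2015-01-20', 1, 50.0),
  ('2015-02-03', 1, 70.0),  ('2015-01-09', 2, 30.0);

-- Aggregate table: one row per product per month.
CREATE TABLE monthly_sales_fact AS
SELECT substr(date_key, 1, 7) AS month_key,
       product_key,
       SUM(sales_amount) AS sales_amount
FROM daily_sales_fact
GROUP BY substr(date_key, 1, 7), product_key;
""")

monthly = conn.execute(
    "SELECT month_key, product_key, sales_amount FROM monthly_sales_fact "
    "ORDER BY month_key, product_key").fetchall()
```

Monthly reports can then read the small summary table instead of rescanning every daily row.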





Star Schema Example



With Dimension Families



Sample Data



ERLJalao Copyright for UP Diliman
38
[email protected]


Fact Table Facts
• A fact is a performance measure
• Sales of Product X
• Fact value not known in advance; only when an event
measurement occurs
• Actual Sales
• Facts are numeric
• In PhP
• The most useful facts are numeric and additive
• At least interval type of attributes



Fact Table Traits
• Are usually the largest tables
• Are usually appended to
• Can grow quickly
• A single fact table can contain either detail or
summarized data
• Their measures are typically, though not necessarily,
additive
• Are primarily joined to dimension tables through foreign
keys



Table Granularity
• A table’s grain is the business definition of the
measurement event that produces the table
• Example: Each time a customer submits an order online a
customer order event ultimately becomes a row in the
customer order fact table.
• Declaring the grain means filling in the blank in this
statement: “A fact row is created when _______
occurs.”



Determining the Grain of a Table
• In business terms
• What is the meaning of an individual row in the table
• In data modeling terms
• What is the unique logical identifier
• What are the identifying dimension keys
• In ETL terms
• What is the rule for populating the table



Grain of a Fact Table Example
• Granularity statement
– “One row for each product sold by store by day”
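One way to make a declared grain concrete is to express it as the fact table's composite key, so the database rejects any row that would break it. The sketch below assumes the grain "one row per product per store per day" and uses hypothetical table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales_fact (
    date_key    INTEGER,
    store_key   INTEGER,
    product_key INTEGER,
    units_sold  INTEGER,
    PRIMARY KEY (date_key, store_key, product_key)  -- the declared grain
);
INSERT INTO daily_sales_fact VALUES (20150105, 1, 7, 12);
""")

try:
    # A second row for the same product, store, and day violates the grain.
    conn.execute("INSERT INTO daily_sales_fact VALUES (20150105, 1, 7, 3)")
    grain_enforced = False
except sqlite3.IntegrityError:
    grain_enforced = True
```

In practice the ETL process would sum the two transactions into one row before loading, rather than relying on the constraint to catch them.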



Sample Fact Table Rows



Dimension Tables
• A dimensional model divides the world into
measurements and context.
• Context intuitively divided into clumps called
dimensions.
• Dimensions describe the “who, what, where, when,
why, and how” of the facts.
• They can also be identified as the “by” words in a
business question that asks for a report.
• Example: “I’d like a report that lists sales by store by product
by quarter.”



Dimension Tables
• Contain the parameters by which the fact table
measures are analyzed
• amount sold is analyzed by day, month, quarter, or year
• amount sold on sunny days vs. rainy days
• inventory quantity analyzed by warehouse by product
• profit analyzed by product, category, department, store,
district, or region



Dimension Table Traits
• Provide the context to the fact table measures they
describe
• Contain descriptors of the business (nouns)
• Utilize business terminology
• Many large columns
• Contain textual and discrete data
• Are usually smaller than fact tables



Dimension Table Traits
• Have a single column surrogate primary key (called the
warehouse dimension key)
• Are joined to a fact table through a foreign key
reference to their primary key
• Can contain one or more hierarchies
• The hierarchies are de-normalized into the dimension
tables



Generic Dimension and Fact Tables



Analytical and Detail Attributes
• Analytical Attributes are dimension attributes where
summarization/aggregation is feasible
• Sex at Birth: Male/Female
• Plan Type: Prepaid/Postpaid

• Detail Attributes are dimension attributes where
summarization/aggregation is typically not feasible and
not worthwhile
• Street Address: 45 Magsaysay Avenue
• Contact Number: +63 2 414 6510



The Anatomy of a Dimension
• Composed of:
• Primary key
• Natural key
• Detail attributes
• Analytical attributes
• Hierarchies



Rental Product Dimension Anatomy
• Primary key
– Rental Product Key
• Natural key
– Rental Product ID
• Detail attributes
– Rental Product ID
– Rental Product Title
• Analytical attributes
– Rental Product Age Classification
– Rental Product Box Office Rating
• Hierarchies
– Product < Category < Type
– Product < Genre


Rental Product Dimension
[Figure: Rental Product Dimension. Hierarchy levels: Rental Product, Category, Type, and Genre. Analytical attributes: Age Classification, Box Office Ratings. Detail attribute: Title.]
Rental Product Dimension
[Figure: Rental Product Dimension, showing Type, Genre, Category, Rental Product, Age Classification, Title, and Box Office Ratings.]


What is a Surrogate Key?
• A surrogate key is a system assigned primary key.
• When the first row is added to a dimension, the system automatically
assigns a key of 1 to the row.
• As each additional row is added, the system automatically increments
the key by 1.
• It’s meaningless, but essential as a foreign key in fact tables
• Important: Retain source system primary key as unique identifier to use
as lookup argument during ETL process and for report headers
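The behaviour described above can be sketched with SQLite's auto-incrementing integer key. The table and column names (product_dim, product_key, product_id) and the sample IDs are assumptions for illustration. The point is that the warehouse assigns product_key itself, while product_id keeps the source system's identifier for ETL lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    product_id   TEXT NOT NULL,                      -- source system key
    product_name TEXT
)""")
# Hypothetical source rows; the warehouse assigns keys 1, 2, ...
conn.executemany(
    "INSERT INTO product_dim (product_id, product_name) VALUES (?, ?)",
    [("SKU-A17", "Axon"), ("SKU-F02", "Framis")])

def lookup_product_key(product_id):
    """ETL lookup: translate a source system ID into the warehouse key."""
    row = conn.execute(
        "SELECT product_key FROM product_dim WHERE product_id = ?",
        (product_id,)).fetchone()
    return row[0] if row else None

keys = conn.execute(
    "SELECT product_key, product_id FROM product_dim "
    "ORDER BY product_key").fetchall()
```

During a fact load, each incoming row's source ID would be passed through a lookup like lookup_product_key so that only the surrogate key is stored in the fact table.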



Warehouse Dimension Keys
• Single column surrogate keys
• Provide key control within the data warehouse
• Substantially improve performance
• Enable one method of tracking attribute history
• Facilitate exception references from a fact table
• Implemented in every dimension, even date and time
dimensions



Example
• Legacy Data

• Within Data Warehouse, Data Marts, Analytical
Application Stores

• Invalid Rows



Exception Condition Dimension
Table Rows
• Indicate that the row in the fact table referenced an
exception condition
• 0 – the fact table row had an invalid legacy id for this
dimension (Invalid)
• -1 – The fact table row should reference a value for this
dimension, but the value is unknown (Missing
Mandatory)
• -2 – The fact table row is not applicable for this
dimension (Missing Optional)



Examples
• Invalid reference from fact table
• The sale of a product whose product ID is not in the
dimension table
• Unknown reference from fact table
• The sale of a product whose product ID is missing
• The fact table row is not applicable to this dimension
• The sale of a product that is not on promotion
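The three exception keys can be assigned by a small lookup routine in the ETL process. The sketch below is one possible reading of the rules above: the key values 0, -1, and -2 come from the slides, while the function name, the mandatory flag, and the sample IDs are hypothetical.

```python
INVALID, MISSING_MANDATORY, MISSING_OPTIONAL = 0, -1, -2

# Hypothetical product dimension lookup: source product ID -> surrogate key.
product_keys = {"SKU-A17": 1, "SKU-F02": 2}

def resolve_key(source_id, lookup, mandatory=True):
    """Return the dimension key a fact row should carry for source_id."""
    if source_id is None:
        # No ID arrived with the fact row: unknown if the dimension is
        # required, not applicable if it is optional (e.g. promotion).
        return MISSING_MANDATORY if mandatory else MISSING_OPTIONAL
    # An ID arrived; if it is not in the dimension table it is invalid.
    return lookup.get(source_id, INVALID)
```

For example, a sale with no promotion would call resolve_key(None, promotion_keys, mandatory=False) and store -2, so the fact row still joins to a "not applicable" row in the dimension instead of carrying a NULL.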



Default Dimension Rows



Benefits of Surrogate Keys
• Provide Key Control
• Maintain dimension key control from within the Data
Warehouse environment
• Isolation from the operational system
• Strategic vs. Operational perspective
• Substantially Improve Performance using a single
column primary key
• These keys are the foreign key references which are
carried in the fact tables
• Substantially reducing fact table sizes



Track Attribute History
• Enable one method of tracking dimension attribute
changes
• Type 2 – Slowly Changing Dimension
• Not to be used for all dimension tables
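A simplified sketch of how surrogate keys support a Type 2 slowly changing dimension: when a tracked attribute changes, the current row is expired and a new row is added under a new surrogate key, while the natural key repeats. The customer_dim table, its columns, and the is_current flag are assumptions for illustration; a fuller design would usually also carry effective and expiry dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id  TEXT,     -- natural key, repeats across versions
    city         TEXT,     -- attribute whose history is tracked
    is_current   INTEGER   -- 1 for the active version, 0 for history
);
INSERT INTO customer_dim (customer_id, city, is_current)
VALUES ('C100', 'Quezon City', 1);
""")

def apply_type2_change(customer_id, new_city):
    """Expire the current row and add a new version under a new key."""
    conn.execute("UPDATE customer_dim SET is_current = 0 "
                 "WHERE customer_id = ? AND is_current = 1", (customer_id,))
    conn.execute("INSERT INTO customer_dim (customer_id, city, is_current) "
                 "VALUES (?, ?, 1)", (customer_id, new_city))

apply_type2_change("C100", "Makati")
versions = conn.execute(
    "SELECT customer_key, city, is_current FROM customer_dim "
    "ORDER BY customer_key").fetchall()
```

Fact rows loaded before the change keep pointing at key 1 and later rows point at key 2, which preserves history without touching the fact table.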



Sample Dimension Table



Sample Dimension Table



Sample Queries
• Which stores sold the most of product category ‘ABC’
last week?
SELECT store, SUM(sales_dollars)
FROM sales_fact sf, sales_date sd, product p
WHERE last_week_ind = 'Y' AND
product_category = 'ABC' AND
<JOIN Statements>
GROUP BY store HAVING RANK(SUM(sales_dollars)) < 6

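The sample query above is a template: the join conditions are left as a placeholder and the ranking clause is not portable across databases. A runnable approximation in SQLite is sketched below. It swaps the rank filter for ORDER BY with LIMIT 5, which returns the five highest totals. The key columns, the placement of store directly on the fact table, and all sample rows are assumptions made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_date (date_key INTEGER PRIMARY KEY, last_week_ind TEXT);
CREATE TABLE product    (product_key INTEGER PRIMARY KEY,
                         product_category TEXT);
CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER,
                         store TEXT, sales_dollars REAL);
-- Hypothetical sample rows.
INSERT INTO sales_date VALUES (1, 'Y'), (2, 'N');
INSERT INTO product VALUES (1, 'ABC'), (2, 'XYZ');
INSERT INTO sales_fact VALUES
  (1, 1, 'Store A', 100.0), (1, 1, 'Store B', 250.0),
  (1, 2, 'Store A', 900.0), (2, 1, 'Store B', 400.0);
""")

top_stores = conn.execute("""
SELECT sf.store, SUM(sf.sales_dollars) AS total_sales
FROM sales_fact sf
JOIN sales_date sd ON sf.date_key = sd.date_key
JOIN product p     ON sf.product_key = p.product_key
WHERE sd.last_week_ind = 'Y' AND p.product_category = 'ABC'
GROUP BY sf.store
ORDER BY total_sales DESC
LIMIT 5
""").fetchall()
```

Both filters are on dimension attributes (last_week_ind and product_category), so the fact table is only touched through its keys.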


Sample Report
• Business Analysis
• How did profit last month equate to store size?
• Report





Designing the Dimensional Model
Steps
• Establishing Naming Conventions
• Do the Four-Step Dimensional Modeling Process
• Document the High Level Data Model Diagram
• Define the Data Sources
• Document the Detailed Table Designs
• Develop Detailed Bus Matrix
• Identify, Track, and Resolve Issues



Establishing Naming Conventions
• Use descriptive and consistent data names. Reasons:
• Names become column headers in reports. Column names must
be non-redundant. Example: not just City, but Customer City or
Supplier City
• Use standard naming convention
• PrimeWord_ZeroOrMoreQualifiers_ClassWord
• Dimension names – product_key, product_category_code,
product_category_name
• Fact names – item_amount, order_amount
• Know the naming rules of your RDBMS
• ProductKey, ProductCategoryCode, …



Four Step Table Design Process
1. Choose the Business Process
2. Declare the Grain
3. Identify the Dimensions
4. Identify the Facts



Document the High Level Data
Model Diagram
• High Level Data Model Diagram
• Used to communicate and validate with business users and
senior management
• Always follow the same convention in arranging dimensions
around the fact table, e.g., start with the date at the top
• Use the same arrangement with aggregates or omit or gray
out unused dimensions and substitute the names of shrunken
dimensions for others
• See exhibit 5



Define the Data Sources
• This is sometimes known as the Application
Architecture
• Often much more extensive descriptions are very
helpful if you have many sources
• See exhibit 6



Document the Detailed Table
Designs
• Document the detailed dimension worksheet
• Known as a Source-to-Target Map
• See Exhibit 7
• Note that spreadsheets are used extensively in
metadata documentation



Develop Detailed Bus Matrix
• The bus matrix makes several things explicit and obvious
• Business processes have several fact tables
• Explicit granularity for fact tables
• Named facts for fact tables
• Reusable conformed dimensions
• See exhibit 8



Identify, Track, and Resolve
Issues
• Issues continually arise as the team works among its
members and with business participants
• Important to identify, track, and resolve these issues
• See issues log
• Assign someone to capture and track issues that arise
at meetings or in discussions



Dimensional Normal Form
Process
• A creative and practical approach originated by Mike
Schmitz to design Dimension Table Families
• Fact tables are highly normalized for maintainability and
flexibility
• Dimensions have their hierarchies de-normalized into them for
usability and performance
• The schema is limited to two levels
• A single first level or central highly normalized table called a fact
table
• Multiple second level tables called dimension tables linked to the
first level table in primarily one to many relationships



Product Dimension Family
Normalized Model
• Dimensional Normal Form (DNF) Step 1



Product Dimension Family
Denormalized Model
• Dimensional Normal Form (DNF) Step 2



Product Dimension Family Usage
Product Dim Daily Summary Fact
• Declare Facts Step 3



Product Dimension Family Usage
Product Dim Daily Summary Fact
• Dimensional Normal Form (DNF) Step 4



What is Snowflaking?
• Snowflaking means using normalized tables in the
dimensional model.
• Dimension hierarchies are broken into normalized
tables connected by foreign key – primary key
relationships



Why is this Bad?
• Joins
• Joins
• Joins
• Joins
• Every join costs something and one extra join
may cause the database optimizer to choose a
bad algorithm
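The cost of snowflaking can be seen by comparing a hierarchy stored in three normalized tables with the same hierarchy flattened into one dimension table. In the sketch below all table names, column names, and rows are hypothetical; the hierarchy follows the Product < Category < Type example used earlier in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked: the Product < Category < Type hierarchy in three tables.
CREATE TABLE type_tbl     (type_key INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE category_tbl (category_key INTEGER PRIMARY KEY,
                           category_name TEXT, type_key INTEGER);
CREATE TABLE product_tbl  (product_key INTEGER PRIMARY KEY,
                           product_name TEXT, category_key INTEGER);
INSERT INTO type_tbl VALUES (1, 'Movie');
INSERT INTO category_tbl VALUES (1, 'New Release', 1);
INSERT INTO product_tbl VALUES (1, 'Title X', 1), (2, 'Title Y', 1);

-- Denormalized: the two joins are paid once, at load time.
CREATE TABLE product_dim AS
SELECT p.product_key, p.product_name, c.category_name, t.type_name
FROM product_tbl p
JOIN category_tbl c ON p.category_key = c.category_key
JOIN type_tbl t     ON c.type_key = t.type_key;
""")

flat = conn.execute(
    "SELECT product_name, category_name, type_name FROM product_dim "
    "ORDER BY product_key").fetchall()
```

With product_dim in place, a query that groups sales by type needs one join from the fact table instead of three.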



Dimension Table Solution



DNF Complete Solution



Dimensional Model Exercise:
Hotel Business Overview
• Maitutulog Mo Kaya Hotel (MMK Hotel)
• Composed of 500 hotels
• Three property types (luxury, economy, budget)
• Different room types
• Want to maximize utilization
• Want to maximize profit



Daily Room Type Profitability
Analysis
• What room types have the highest profitability and
which have the lowest profitability across the chain, by
property type?
• Which hotels have room type profitability different from
the norm?
• How does weekend profitability compare with
weekday?
• How does weekday profitability differ by day?
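The weekend versus weekday questions suggest that the date dimension should carry a day name and a weekend indicator as analytical attributes. The sketch below generates such rows in plain Python. The attribute names and the choice of Saturday and Sunday as the weekend are assumptions, not part of the case study.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_date_dim(start, days):
    """Return one date dimension row (a dict) per calendar day."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_key": offset + 1,           # surrogate key
            "full_date": d.isoformat(),
            "day_name": DAY_NAMES[d.weekday()],
            "is_weekend": d.weekday() >= 5,   # Saturday=5, Sunday=6
        })
    return rows

date_dim = build_date_dim(date(2015, 1, 1), 7)
```

With these attributes in place, profitability can be grouped by is_weekend or by day_name without any date arithmetic in the query.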



Hotel Property Management
System
Source Data 3NF



Challenge
• Build the LDMs for Daily Room Billing and
Daily Utilization and Profitability



Solution: Step 1



Solution: Step 2



Solution: Step 2



Solution: Step 3 Declare Facts



Solution: Step 4





References
• Kimball, Ralph, Margy Ross, Warren Thornthwaite, Joy
Mundy, and Bob Becker. The Data Warehouse Lifecycle
Toolkit, Second Edition. Wiley, 2008. ISBN 978-0-470-14977-5.
• Schmitz, Michael D. UCI Irvine Data Warehousing
Notes (2014), High Performance Data Warehousing.
• Simon, Alan. CIS 391 PPT Slides.
• Jeltema, Bernie. UCI Irvine Data Warehousing Notes
(2014), Strategic Frameworks, Inc.
