Data Warehouses: FPT University Hanoi 2010
Lecture 7: Dimensional Modeling
UPDATES TO THE DIMENSION TABLES
The fact table Auto Sales contains the measurements or metrics such as
Actual Sale Price, Options Price, and so on.
Every day as more and more sales take place, more and more rows get
added to the fact table. The fact table continues to grow in the number of
rows over time. Very rarely are the rows in a fact table updated with
changes. Even when there are adjustments to the prior numbers, these are
also processed as additional adjustment rows and added to the fact table.
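The append-only behavior described above can be sketched as follows. This is a minimal illustration, assuming a list of dictionaries stands in for the Auto Sales fact table; the function names, the `sale_id` and `row_type` fields, and the amounts are hypothetical and not taken from the lecture.

```python
# Hypothetical in-memory stand-in for the Auto Sales fact table.
fact_auto_sales = []


def add_fact_row(sale_id, row_type, actual_sale_price):
    """Append one row; existing rows are never updated."""
    fact_auto_sales.append(
        {
            "sale_id": sale_id,
            "row_type": row_type,  # "sale" or "adjustment" (assumed labels)
            "actual_sale_price": actual_sale_price,
        }
    )


def total_sale_price(sale_id):
    """Net amount for a sale: the original row plus any adjustment rows."""
    return sum(
        r["actual_sale_price"] for r in fact_auto_sales if r["sale_id"] == sale_id
    )


# A sale is recorded, then corrected later by a separate adjustment row.
add_fact_row(1, "sale", 25000)
add_fact_row(1, "adjustment", -500)
print(total_sale_price(1))
```

The original sale row stays untouched; the correction shows up only when the rows are summed.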
Compared to the fact table, the dimension tables are more stable and less
volatile. However, whereas a fact table changes only through an increase
in the number of rows, a dimension table changes both through added rows
and through changes to the attribute values themselves.
Look at the product dimension table. Every year, rows are added as new
models become available. But what about the attributes within the product
dimension table? If a particular product is moved to a different product
category, then the corresponding values must be changed in the product
dimension table.
Slowly Changing Dimensions
From the consideration of the changes to the dimension tables, we can
derive the following principles:
Most dimensions are generally constant over time
Many dimensions, though not constant over time, change slowly
The product key of the source record does not change
The description and other attributes change slowly over time
In the source OLTP systems, the new values overwrite the old ones
Overwriting of dimension table attributes is not always the appropriate option in a data warehouse
The ways changes are made to the dimension tables depend on the types of changes and what information must be preserved in the data warehouse
Type 1 Changes: Correction of Errors
Nature of Type 1 Changes: These changes usually relate to the
correction of errors in the source systems.
For example, suppose a spelling error in the customer name is
corrected to read as Michael Romano from the erroneous entry of
Michel Romano. Also, suppose the customer name for another
customer is changed from Kristin Daniels to Kristin Samuelson, and
the marital status changed from single to married.
There is no need to preserve the old values.
In the case of Michael Romano, the old name is erroneous and needs to
be discarded. When the users need to find all the orders from Michael
Romano, the users will use the correct name.
The same principles apply to the change in customer name for Kristin
Samuelson.
But the change in the marital status is slightly different.
This change can be handled in the same way as the change in
customer name only if it is the correction of an error.
Otherwise, overwriting the old value will cause problems when the
users want to analyze orders by marital status.
Nature of Type 1 Changes: Here are the general principles for Type 1 changes:
Usually, the changes relate to correction of errors in source systems
Sometimes the change in the source system has no significance
The old value in the source system needs to be discarded
The change in the source system need not be preserved in the data warehouse
Applying Type 1 Changes to the Data Warehouse. Please look at Figure 11-2 showing
the application of Type 1 changes to the customer dimension table. The method for
applying Type 1 changes is:
Overwrite the attribute value in the dimension table row with the new value
The old value of the attribute is not preserved
No other changes are made in the dimension table row
The key of this dimension table or any other key values are not affected
This type is easiest to implement
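The Type 1 method above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the `CustomerRow` class, the `apply_type1` function, and the key values 101 and `"C-1001"` are hypothetical; only the customer name comes from the example in the text.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    customer_key: int    # warehouse key (hypothetical value below)
    customer_id: str     # key from the source system (hypothetical)
    name: str
    marital_status: str


def apply_type1(row, attribute, new_value):
    """Overwrite one attribute in place; the old value is not kept."""
    setattr(row, attribute, new_value)
    return row


row = CustomerRow(
    customer_key=101,
    customer_id="C-1001",
    name="Michel Romano",       # erroneous spelling from the source system
    marital_status="single",
)

# Correct the spelling error: same row, same keys, new attribute value.
apply_type1(row, "name", "Michael Romano")
print(row)
```

After the call the row carries only the corrected name, and neither key has changed.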
Type 2 Changes: Preservation of History
Nature of Type 2 Changes. Go back to the change in the
marital status for Kristin Samuelson. Assume that in your data
warehouse one of the essential requirements is to track
orders by marital status in addition to tracking by other
attributes. If the change to marital status happened on
October 1, 2000, all orders from Kristin Samuelson before that
date must be included under marital status: single, and all
orders on or after October 1, 2000 should be included under
marital status: married.
What exactly is needed in this case? In the data warehouse,
you must have a way of separating the orders for the
customer so that the orders before and after that date can be
added up separately.
Here are the general principles for this type of change:
They usually relate to true changes in source systems
There is a need to preserve history in the data warehouse
This type of change partitions the history in the data warehouse
Every change for the same attribute must be preserved
Applying Type 2 Changes to the Data Warehouse. Please look at Figure 11-3
showing the application of Type 2 changes to the customer dimension table.
The method for applying Type 2 changes is:
Add a new dimension table row with the new value of the changed attribute
An effective date field may be included in the dimension table
There are no changes to the original row in the dimension table
The key of the original row is not affected
The new row is inserted with a new surrogate key
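The Type 2 method above can be sketched as follows. This is a minimal illustration under stated assumptions: the `CustomerRow` class, the `apply_type2` and `key_for_order` functions, the key values, and the initial effective date of January 1, 1999 are all hypothetical; only the customer name, the marital status values, and the October 1, 2000 change date come from the text.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class CustomerRow:
    customer_key: int      # surrogate key, unique per row version
    customer_id: str       # source-system key, shared by all versions
    name: str
    marital_status: str
    effective_date: date


def apply_type2(table, customer_id, attribute, new_value, effective_date):
    """Add a new row version; the original row is left untouched."""
    versions = [r for r in table if r.customer_id == customer_id]
    current = max(versions, key=lambda r: r.effective_date)
    new_key = max(r.customer_key for r in table) + 1  # simple key generator
    new_row = replace(
        current,
        customer_key=new_key,
        effective_date=effective_date,
        **{attribute: new_value},
    )
    table.append(new_row)
    return new_row


def key_for_order(table, customer_id, order_date):
    """Surrogate key of the row version in effect on the order date."""
    candidates = [
        r for r in table
        if r.customer_id == customer_id and r.effective_date <= order_date
    ]
    return max(candidates, key=lambda r: r.effective_date).customer_key


table = [CustomerRow(1, "C-2002", "Kristin Samuelson", "single", date(1999, 1, 1))]
apply_type2(table, "C-2002", "marital_status", "married", date(2000, 10, 1))

print(key_for_order(table, "C-2002", date(2000, 9, 30)))
print(key_for_order(table, "C-2002", date(2000, 10, 1)))
```

Because each order is loaded with the surrogate key in effect on its date, orders before the change attach to the "single" row and later orders to the "married" row, which is what partitions the history.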
Type 3 Changes: Tentative Soft Revisions
Nature of Type 3 Changes. Almost all the usual changes to
dimension values are either Type 1 or Type 2 changes.
Of these two, Type 1 changes are more common.
When you apply a Type 2 change on a certain date, that date is a
cut-off point. In the above case of change to marital status on
October 1, 2000, that date is the cut-off date.
Any orders from the customer prior to that date fall into the older
orders group; orders on or after that date fall into the newer orders
group.
An order for this customer has to fall in one or the other group; it
cannot be counted in both groups for any period of time.
What if you have the need to count the orders on or after the cut-off
date in both groups during a certain period after the cut-off date?
You cannot handle this change as a Type 2 change. Sometimes,
though rarely, there is a need to track both the old and new values of
changed attributes for a certain period, in both forward and backward
directions. These types of changes are Type 3 changes.
Here are the general principles for Type 3 changes:
They usually relate to “soft” or tentative changes in the source systems
There is a need to keep track of history with old and new values of the changed attribute
They are used to compare performances across the transition
They provide the ability to track forward and backward
Applying Type 3 Changes to the Data Warehouse. Please look at Figure 11-4 showing
the application of Type 3 changes to the customer dimension table. The methods for
applying Type 3 changes are:
Add an “old” field in the dimension table for the affected attribute
Push down the existing value of the attribute from the “current” field to the “old” field
Keep the new value of the attribute in the “current” field
Also, you may add a “current” effective date field for the attribute
The key of the row is not affected
No new dimension row is needed
The existing queries will seamlessly switch to the “current” value
Any queries that need to use the “old” value must be revised accordingly
The technique works best for one “soft” change at a time
If there is a succession of changes, more sophisticated techniques must be devised
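The Type 3 method above can be sketched as follows. This is a minimal illustration, assuming a dictionary stands in for one dimension row; the `apply_type3` function, the `segment` attribute, its values, the dates, and the `old_` and `_effective_date` field naming are all hypothetical and not taken from the lecture.

```python
from datetime import date


def apply_type3(row, attribute, new_value, effective_date):
    """Push the current value down to an "old" field, then store the new one."""
    row[f"old_{attribute}"] = row[attribute]       # "old" field
    row[attribute] = new_value                     # "current" field
    row[f"{attribute}_effective_date"] = effective_date
    return row


# Hypothetical customer row with a tentative change to a "segment" attribute.
row = {"customer_key": 301, "name": "Example Customer", "segment": "Retail"}
apply_type3(row, "segment", "Wholesale", date(2000, 10, 1))
print(row)

# A second change overwrites the "old" field, so the first value is lost.
apply_type3(row, "segment", "Corporate", date(2001, 3, 1))
print(row)
```

The second call shows why the technique suits one soft change at a time: with a single "old" field, only the most recent prior value survives.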
Large Dimensions
Customer
  Huge, in the range of 20 million rows
  Easily up to 150 dimension attributes
  Can have multiple hierarchies
Product
  Sometimes as many as 100,000 product variations
  Can have more than 100 dimension attributes
  Can have multiple hierarchies
Multiple Hierarchies
Rapidly Changing Dimensions