Data Warehouses: FPT University Hanoi 2010

The document discusses different types of changes that can occur in dimension tables of a data warehouse: Type 1 changes are corrections of errors that simply overwrite the old value. Type 2 changes preserve history by adding a new row with the changed attribute value and effective date. Type 3 changes tentatively track both old and new values for a period to compare performance across transitions. Dimension tables can grow large over time through new rows and attribute changes, requiring techniques to handle their slow but ongoing changes.

Uploaded by

ngọc bình

Data Warehouses

FPT University
Hanoi 2010
Lecture 7: Dimensional Modeling
UPDATES TO THE DIMENSION TABLES
• The fact table Auto Sales contains the measurements or metrics, such as Actual Sale Price and Options Price.
• Every day, as more sales take place, more rows are added to the fact table, so it keeps growing over time. Rows in a fact table are very rarely updated. Even adjustments to prior numbers are processed as additional adjustment rows added to the fact table.
• Compared to the fact table, the dimension tables are more stable and less volatile. However, whereas the fact table changes only through an increase in the number of rows, a dimension table changes both through new rows and through changes to the attributes themselves.
• Look at the product dimension table. Every year, rows are added as new models become available. But what about the attributes within the product dimension table? If a particular product is moved to a different product category, the corresponding values must be changed in the product dimension table.
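The contrast between the two kinds of table can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the column names, dates, prices, and category values are all hypothetical.

```python
# Minimal sketch; every name and value below is hypothetical.

# Fact table: one row per sale (or adjustment), never updated in place.
auto_sales_fact = [
    {"product_key": 1, "sale_date": "2000-09-30", "actual_sale_price": 20000.0},
]

# Dimension table: one row per product, keyed by product_key.
product_dim = {
    1: {"model_name": "Model A", "product_category": "Compact"},
}


def add_adjustment(fact_rows, product_key, sale_date, amount):
    """Record a correction to prior numbers as an additional fact row."""
    fact_rows.append(
        {"product_key": product_key, "sale_date": sale_date, "actual_sale_price": amount}
    )


# A 500.00 reduction to the earlier sale becomes a new row, not an update.
add_adjustment(auto_sales_fact, 1, "2000-10-02", -500.0)
net_sales = sum(row["actual_sale_price"] for row in auto_sales_fact)

# The dimension changes differently: an attribute of an existing row changes,
# while the number of rows stays the same.
product_dim[1]["product_category"] = "Mid-size"
```

The fact table grew by one row while the original row was left untouched. The dimension table kept its single row but now carries a different category value.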
Slowly Changing Dimensions
• From the consideration of the changes to the dimension tables, we can derive the following principles:
  • Most dimensions are generally constant over time
  • Many dimensions, though not constant over time, change slowly
  • The product key of the source record does not change
  • The description and other attributes change slowly over time
  • In the source OLTP systems, the new values overwrite the old ones
  • Overwriting of dimension table attributes is not always the appropriate option in a data warehouse
  • The ways changes are made to the dimension tables depend on the types of changes and what information must be preserved in the data warehouse
Type 1 Changes: Correction of Errors
• Nature of Type 1 changes: These changes usually relate to the correction of errors in the source systems.
• For example, suppose a spelling error in a customer name is corrected from the erroneous entry Michel Romano to Michael Romano. Also suppose the name of another customer is changed from Kristin Daniels to Kristin Samuelson, and her marital status is changed from single to married.
• There is no need to preserve the old values.
  • In the case of Michael Romano, the old name is erroneous and needs to be discarded. When users need to find all the orders from Michael Romano, they will use the correct name.
  • The same principle applies to the change in customer name for Kristin Samuelson.
• But the change in marital status is slightly different.
  • This change can be handled in the same way as the change in customer name only if it is a correction of an error.
  • Otherwise, overwriting the value will cause problems when users want to analyze orders by marital status.
• General principles for Type 1 changes:
  • Usually, the changes relate to the correction of errors in source systems
  • Sometimes the change in the source system has no significance
  • The old value in the source system needs to be discarded
  • The change in the source system need not be preserved in the data warehouse
• Applying Type 1 changes to the data warehouse: Figure 11-2 shows the application of Type 1 changes to the customer dimension table. The method for applying Type 1 changes is:
  • Overwrite the attribute value in the dimension table row with the new value
  • The old value of the attribute is not preserved
  • No other changes are made in the dimension table row
  • The key of this dimension table and any other key values are not affected
  • This type is the easiest to implement
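The overwrite described above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table layout, the `customer_id` source key, the value `C100`, and the helper `apply_type1` are hypothetical, and only the name correction comes from the lecture's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_key  INTEGER PRIMARY KEY,  -- warehouse key (assumed layout)
        customer_id   TEXT,                 -- source-system key (hypothetical)
        customer_name TEXT
    )
    """
)
conn.execute(
    "INSERT INTO customer_dim VALUES (?, ?, ?)",
    (1, "C100", "Michel Romano"),  # the erroneous spelling
)


def apply_type1(conn, customer_id, new_name):
    """Type 1: overwrite the attribute in place; the old value is not kept."""
    conn.execute(
        "UPDATE customer_dim SET customer_name = ? WHERE customer_id = ?",
        (new_name, customer_id),
    )


apply_type1(conn, "C100", "Michael Romano")
rows = conn.execute(
    "SELECT customer_key, customer_name FROM customer_dim"
).fetchall()
```

After the update the table still has one row with the same key, and the misspelled name can no longer be recovered from the warehouse.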
Type 2 Changes: Preservation of History
• Nature of Type 2 changes: Go back to the change in marital status for Kristin Samuelson. Assume that one of the essential requirements in your data warehouse is to track orders by marital status, in addition to tracking by other attributes. If the change to marital status happened on October 1, 2000, all orders from Kristin Samuelson before that date must be included under marital status: single, and all orders on or after October 1, 2000 must be included under marital status: married.
• What exactly is needed in this case? In the data warehouse, you must have a way of separating the orders for the customer so that the orders before and after that date can be added up separately.
• General principles for Type 2 changes:
  • They usually relate to true changes in source systems
  • There is a need to preserve history in the data warehouse
  • This type of change partitions the history in the data warehouse
  • Every change for the same attribute must be preserved
• Applying Type 2 changes to the data warehouse: Figure 11-3 shows the application of Type 2 changes to the customer dimension table. The method for applying Type 2 changes is:
  • Add a new dimension table row with the new value of the changed attribute
  • An effective date field may be included in the dimension table
  • There are no changes to the original row in the dimension table
  • The key of the original row is not affected
  • The new row is inserted with a new surrogate key
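The Type 2 method can be sketched as follows, again with Python's built-in `sqlite3` module. The schema, the `customer_id` value, the original row's effective date, the order dates and amounts, and the helpers `add_type2_row` and `key_for` are all hypothetical. Only the customer name, the marital status values, and the October 1, 2000 cut-off come from the lecture's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_dim (
        customer_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id    TEXT,   -- source-system key (hypothetical)
        customer_name  TEXT,
        marital_status TEXT,
        effective_date TEXT    -- ISO date from which this row applies
    );
    CREATE TABLE order_fact (
        customer_key INTEGER REFERENCES customer_dim(customer_key),
        order_date   TEXT,
        order_amount REAL
    );
    """
)


def add_type2_row(conn, customer_id, name, marital_status, effective_date):
    """Type 2: insert a new row with a new surrogate key; old rows stay as-is."""
    cur = conn.execute(
        "INSERT INTO customer_dim"
        " (customer_id, customer_name, marital_status, effective_date)"
        " VALUES (?, ?, ?, ?)",
        (customer_id, name, marital_status, effective_date),
    )
    return cur.lastrowid


def key_for(conn, customer_id, order_date):
    """Pick the dimension row in effect on the order date."""
    row = conn.execute(
        "SELECT customer_key FROM customer_dim"
        " WHERE customer_id = ? AND effective_date <= ?"
        " ORDER BY effective_date DESC LIMIT 1",
        (customer_id, order_date),
    ).fetchone()
    return row[0]


old_key = add_type2_row(conn, "C200", "Kristin Samuelson", "Single", "1999-01-01")
new_key = add_type2_row(conn, "C200", "Kristin Samuelson", "Married", "2000-10-01")

# Each order is loaded with the surrogate key valid on its own date.
for order_date, amount in [
    ("2000-09-15", 100.0),
    ("2000-10-01", 250.0),
    ("2000-11-20", 50.0),
]:
    conn.execute(
        "INSERT INTO order_fact VALUES (?, ?, ?)",
        (key_for(conn, "C200", order_date), order_date, amount),
    )

totals = dict(
    conn.execute(
        "SELECT d.marital_status, SUM(f.order_amount)"
        " FROM order_fact f JOIN customer_dim d USING (customer_key)"
        " GROUP BY d.marital_status"
    ).fetchall()
)
```

Because each fact row points at the surrogate key that was current when the order was placed, grouping by marital status splits the customer's orders at the cut-off date without any date logic in the query itself.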
Type 3 Changes: Tentative Soft Revisions
• Nature of Type 3 changes: Almost all the usual changes to dimension values are either Type 1 or Type 2 changes.
  • Of these two, Type 1 changes are more common.
• When you apply a Type 2 change on a certain date, that date is a cut-off point. In the above case of the change to marital status on October 1, 2000, that date is the cut-off date.
  • Any orders from the customer prior to that date fall into the older orders group; orders on or after that date fall into the newer orders group.
  • An order for this customer has to fall into one group or the other; it cannot be counted in both groups for any period of time.
• What if you need to count the orders on or after the cut-off date in both groups during a certain period after the cut-off date?
  • You cannot handle this change as a Type 2 change. Sometimes, though rarely, there is a need to track both the old and new values of changed attributes for a certain period, in both forward and backward directions. These types of changes are Type 3 changes.
• General principles for Type 3 changes:
  • They usually relate to "soft" or tentative changes in the source systems
  • There is a need to keep track of history with old and new values of the changed attribute
  • They are used to compare performances across the transition
  • They provide the ability to track forward and backward
• Applying Type 3 changes to the data warehouse: Figure 11-4 shows the application of Type 3 changes to the customer dimension table. The method for applying Type 3 changes is:
  • Add an "old" field in the dimension table for the affected attribute
  • Push down the existing value of the attribute from the "current" field to the "old" field
  • Keep the new value of the attribute in the "current" field
  • Also, you may add a "current" effective date field for the attribute
  • The key of the row is not affected
  • No new dimension row is needed
  • The existing queries will seamlessly switch to the "current" value
  • Any queries that need to use the "old" value must be revised accordingly
  • The technique works best for one "soft" change at a time
  • If there is a succession of changes, more sophisticated techniques must be devised
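The push-down of the "current" value into an "old" field can be sketched in plain Python. The attribute `territory`, its values, the dates, and the helper `apply_type3` are hypothetical, chosen only to illustrate a soft change; the lecture text does not name the attribute used in its figure.

```python
def apply_type3(row, attribute, new_value, effective_date):
    """Type 3: keep the previous value in an 'old' field on the same row."""
    updated = dict(row)                                  # same row, same key
    updated[f"old_{attribute}"] = row[attribute]         # push down current value
    updated[attribute] = new_value                       # new value is "current"
    updated[f"{attribute}_effective_date"] = effective_date
    return updated


# Hypothetical dimension row before the soft change.
customer = {"customer_key": 7, "territory": "East"}

revised = apply_type3(customer, "territory", "West", "2000-10-01")

# Existing queries that read "territory" now see the current value, while
# revised queries can read "old_territory" to group by the earlier one.
current_value = revised["territory"]
previous_value = revised["old_territory"]

# A second change overwrites the single "old" field, so the first value is
# lost. This is why the technique suits one soft change at a time.
revised_again = apply_type3(revised, "territory", "North", "2001-04-01")
```

With only one "old" field per attribute, `revised_again` no longer contains the original value, which matches the caveat that a succession of changes needs a more sophisticated design.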
Large Dimensions
• Customer
  • Huge: in the range of 20 million rows
  • Easily up to 150 dimension attributes
  • Can have multiple hierarchies
• Product
  • Sometimes as many as 100,000 product variations
  • Can have more than 100 dimension attributes
  • Can have multiple hierarchies
Multiple Hierarchies
Rapidly Changing Dimensions