Tutorial # 1
• London Metropolitan University is composed of various departments with each
department offering a variety of courses. Each course consists of a set of modules,
which are taken in either of the semesters. The university library provides services to
students that enable books, journals, videos, etc. to be borrowed upon validation of
their registration details (i.e. Student ID). This can either be done online through the
university's website or directly from the library counter. The library ensures a vast and
comprehensive array of study materials (books, journals, videos, etc.) is provided for
each course module. Students are then allowed to keep the book for a specified period
after which a fine is incurred if the deadline is exceeded. The borrowed materials can
however be renewed before the expiry date.
• However, library management needs a data warehouse application that can help
support various key decisions that will, in turn, improve services offered to students.
a) In line with a data warehouse being subject-oriented, using the
above scenario, identify what the "key focus" of this data warehouse
application should be.
b) Identify all the dimensions for the data warehouse application.
Considering your answer from question (a), identify at least 2 facts (or
measures) that will be contained in the fact table. (NOTE: Any
assumptions should be clearly stated.)
c) Draw a simple star schema for the above data warehouse showing
only the primary key-foreign key relationships.
Dimensional Data Modelling
1. Date and Time dimensions
2. Degenerate Dimensions
3. Slowly Changing Dimensions (SCD)
4. Aggregate Fact tables
5. Three main types of fact tables.
6. Developing dimensional data models using an iterative
process.
Date Dimensions
The date dimension is very important for every fact table as
facts are a sequence of observations.
The date dimension answers the first question asked when
identifying the dimensions of a fact table: When does it occur?
It allows us to meet many user reporting and analysis
requirements, such as:
• Calendar periods (day, week, month, quarter, and year)
• Financial periods (financial month, financial year)
• Relative periods for comparison (last month, last year)
• Periods of special status (weekdays, weekends, holidays)
A typical date dimension
Date Dim
DateKey (PK)
Date
WeekNumber
MonthName
MonthNumber
Quarter
Year
FinancialMonth
FinancialQuarter
FinancialYear
IsWeekDay
IsWeekEnd
IsHoliday
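The attributes listed for the date dimension can all be derived from a calendar date. The following is a minimal Python sketch of that derivation, not a definitive implementation. The function name `build_date_row`, the `yyyymmdd` style of `DateKey`, the April start of the financial year, and the convention of labelling a financial year by the calendar year in which it starts are all assumptions made for illustration.

```python
from datetime import date

def build_date_row(d, holidays=frozenset(), fin_year_start_month=4):
    """Return one date-dimension row (a dict) for calendar date d."""
    # Months elapsed since the start of the financial year (0-11).
    fin_offset = (d.month - fin_year_start_month) % 12
    is_weekday = d.weekday() < 5  # Monday=0 ... Friday=4
    return {
        "DateKey": d.year * 10000 + d.month * 100 + d.day,  # assumed yyyymmdd key
        "Date": d.isoformat(),
        "WeekNumber": d.isocalendar()[1],  # ISO week number
        "MonthName": d.strftime("%B"),
        "MonthNumber": d.month,
        "Quarter": (d.month - 1) // 3 + 1,
        "Year": d.year,
        "FinancialMonth": fin_offset + 1,
        "FinancialQuarter": fin_offset // 3 + 1,
        # Assumed convention: label the financial year by the year it starts in.
        "FinancialYear": d.year if d.month >= fin_year_start_month else d.year - 1,
        "IsWeekDay": is_weekday,
        "IsWeekEnd": not is_weekday,
        "IsHoliday": d in holidays,
    }

row = build_date_row(date(2016, 3, 15))
```

Calling the function once per day over the required range would populate the whole dimension. The holiday set has to come from the business calendar.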
Time Dimensions
• The date dimension is at a daily grain or detail level.
• Sometimes we are not only interested in the day that a
fact occurred but also the time at which it occurred.
• One option is to include a DateTime attribute in the fact
table, but this option does not allow analysis by time
periods such as morning, afternoon, and evening.
• The other option is to introduce a new dimension called
Time that can store individual hours or periods of time.
Order Fact: DateKey, ProductKey, CustomerKey, TimeKey, …
Calendar Date: DateKey, Date, WeekNumber, …
Time Dim: TimeKey, IsMorning, IsAfternoon, IsEvening, …
Product: ProductKey, SKU, Name, …
Customer: CustomerKey, Code, FirstName, …
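A Time dimension at hourly grain can be generated in the same way as a date dimension. The sketch below is illustrative only. The function name `build_time_row`, the use of the hour as `TimeKey`, and the period boundaries (morning 06:00 to 11:59, afternoon 12:00 to 17:59, evening 18:00 to 23:59) are assumptions; the business would define its own periods.

```python
def build_time_row(hour):
    """Return one time-dimension row for an hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return {
        "TimeKey": hour,  # assumed: the hour itself serves as the key
        "IsMorning": 6 <= hour < 12,
        "IsAfternoon": 12 <= hour < 18,
        "IsEvening": 18 <= hour < 24,
    }

# One row per hour of the day.
time_dim = [build_time_row(h) for h in range(24)]
```

With these boundaries the hours 0 to 5 carry no flag. An extra attribute such as a night flag could be added if the analysis needs it.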
Degenerate Dimensions
What are degenerate dimensions?
They are dimensions that have no dimension table of their own;
instead, their attributes are stored in the fact table.
Usually one attribute for each degenerate dimension is
stored in the fact table. This attribute is called the degenerate key.
A degenerate dimension has no attributes other than the
degenerate key, which is why a separate dimension table is
not needed.
An example of a degenerate dimension is Order Header
with its order number attribute. As the Order Header
dimension has only one attribute, there is no need to
create a dimension table.
Order Header: OrderNumber
Order Fact: DateKey, ProductKey, CustomerKey, OrderDate, OrderLineNumber, OrderNumber, Quantity Ordered, TotalCost, TotalRevenue
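Because the degenerate key lives in the fact table, order lines can be grouped by order without joining to any dimension table. The Python sketch below illustrates this with a few hypothetical fact rows; the order numbers, values, and the function name `revenue_per_order` are invented for the example.

```python
from collections import defaultdict

# Hypothetical Order Fact rows; OrderNumber is the degenerate key.
order_fact = [
    {"OrderNumber": "SO-1001", "OrderLineNumber": 1, "ProductKey": 10, "TotalRevenue": 40.0},
    {"OrderNumber": "SO-1001", "OrderLineNumber": 2, "ProductKey": 11, "TotalRevenue": 15.5},
    {"OrderNumber": "SO-1002", "OrderLineNumber": 1, "ProductKey": 10, "TotalRevenue": 20.0},
]

def revenue_per_order(rows):
    """Sum TotalRevenue per order using only the degenerate key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["OrderNumber"]] += row["TotalRevenue"]
    return dict(totals)

order_totals = revenue_per_order(order_fact)
```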
Slowly Changing Dimensions
One of the main benefits of data warehousing is tracking
history or changes.
Examples:
• When a product changes its price.
• When a customer changes address or marital
status.
• When a store changes its manager.
Examples of user reporting and analysis requirements that
require tracking history or changes:
• Marketing people might want to compare the
impact of promotions for different product
prices.
• Salespeople might want to generate reports of
total sales by marital status.
• Reporting the performance of managers by store.
There are three main techniques for handling slowly
changing dimensions:
I. Type 1: Overwrite the dimension attribute.
II. Type 2: Add a new dimension row.
III. Type 3: Add a new dimension attribute / column.
Type 1: Overwrite the dimension attribute:
The dimension attribute reflects the current state.
Any historical values are lost.
Applicable when the old value has no business
significance or when there is no need to track changes.
For example, the first name or phone number of a customer.
The Type 1 technique is easy to implement.
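A Type 1 change is a plain in-place update. The following Python sketch shows the idea on an in-memory list of dimension rows. The function name `apply_type1`, the `Phone` attribute, and the phone numbers are hypothetical.

```python
def apply_type1(rows, code, attribute, new_value):
    """Overwrite attribute on every row whose natural key (Code) matches.

    Returns the number of rows changed. The old value is not kept anywhere.
    """
    changed = 0
    for row in rows:
        if row["Code"] == code and row[attribute] != new_value:
            row[attribute] = new_value
            changed += 1
    return changed

# Hypothetical customer row and change.
customers = [{"CustomerKey": 1, "Code": "CBTR", "FirstName": "Sara", "Phone": "555-0100"}]
changed = apply_type1(customers, "CBTR", "Phone", "555-0199")
```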
Type 2: Add a new dimension row:
A new row with a new surrogate primary key is inserted
into the dimension table.
Both the previous and new rows include the natural key to
show that both records have the same origin.
New attributes are added to indicate when the change
happened and which row is the current one.
The Type 2 technique is the most powerful technique for
accurately tracking changes in attribute values.
Customer (the tracked attribute is MaritalStatus)
CustomerKey
Code
FirstName
LastName
BirthDate
MaritalStatus
IncomeGroup
EffectiveFrom
EffectiveTo
CustomerKey  Code  FirstName  LastName  BirthDate  MaritalStatus  IncomeGroup  EffectiveFrom  EffectiveTo
1            CBTR  Sara       John      01/2/88    Single         20-30K       2/3/2005       4/5/2010
2            CBTR  Sara       John      01/2/88    Married        20-30K       4/5/2010       31/12/9999
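The Type 2 steps (close the current row, then insert a new row with a new surrogate key) can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `apply_type2` is hypothetical, dates are read as day/month/year, the open-ended `EffectiveTo` is taken to be 31/12/9999, and the next surrogate key is simply the current maximum plus one.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for the current row

def apply_type2(rows, code, attribute, new_value, change_date):
    """Close the current row for the natural key and append a new version."""
    current = next(r for r in rows
                   if r["Code"] == code and r["EffectiveTo"] == OPEN_END)
    if current[attribute] == new_value:
        return current  # nothing changed, keep the current row
    new_row = dict(current)                 # copy all attributes
    current["EffectiveTo"] = change_date    # close the old version
    new_row[attribute] = new_value
    new_row["CustomerKey"] = max(r["CustomerKey"] for r in rows) + 1
    new_row["EffectiveFrom"] = change_date
    rows.append(new_row)
    return new_row

customer_dim = [{
    "CustomerKey": 1, "Code": "CBTR", "FirstName": "Sara", "LastName": "John",
    "MaritalStatus": "Single", "IncomeGroup": "20-30K",
    "EffectiveFrom": date(2005, 3, 2), "EffectiveTo": OPEN_END,
}]
apply_type2(customer_dim, "CBTR", "MaritalStatus", "Married", date(2010, 5, 4))
```

Fact rows loaded after the change would reference the new surrogate key, while older fact rows keep pointing at the earlier version.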
Type 3: Add a new dimension column:
A new column is added to the dimension table.
Used when an attribute changes but both the old and the new
attribute values still need to be available in the same row.
No new dimension rows and no new surrogate keys are
created.
Store Dim (the tracked attribute is Manager)
StoreKey
Code
Name
Description
PreviousManager
CurrentManager
…..
Before the change:
StoreKey  Code  Name        Description       PreviousManager  CurrentManager
1         ASTT  Main store  The main store …  David            David
After the change:
StoreKey  Code  Name        Description       PreviousManager  CurrentManager
1         ASTT  Main store  The main store …  David            Andrew
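A Type 3 change shifts the current value into the "previous" column and then overwrites the current column, all within the same row. A minimal Python sketch, with the hypothetical function name `apply_type3` and the store row from the example:

```python
def apply_type3(store_row, new_manager):
    """Keep one level of history in the same row: current becomes previous."""
    store_row["PreviousManager"] = store_row["CurrentManager"]
    store_row["CurrentManager"] = new_manager
    return store_row

store = {"StoreKey": 1, "Code": "ASTT", "Name": "Main store",
         "PreviousManager": "David", "CurrentManager": "David"}
apply_type3(store, "Andrew")
```

Only the most recent previous value survives. A further change of manager would overwrite `PreviousManager` again.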
Aggregate fact tables
Fact tables contain a large number of rows and grow rapidly over time.
For example:
daily sales transactions for thousands of products will
grow by the number of product sales made each day.
Most of the time, managers are not interested in detailed
reports of daily sales per product.
Monthly sales per product type or yearly sales per product
category are more relevant for managers.
An aggregate fact table stores facts summarised to such a
coarser grain.
Aggregate Order Fact: MonthKey, ProductTypeKey, Total quantity, Total revenue
Calendar Month: MonthKey, Name, Number, Year
Product Type: ProductTypeKey, Name, Description
Order Fact: DateKey, ProductKey, CustomerKey, OrderNumber, OrderLineNumber, Total quantity, Total revenue
Calendar Date: DateKey, Date, WeekNumber, MonthNumber, MonthName, Year
Product: ProductKey, SKU, Name, Brand, Category, Price, Cost
Customer: CustomerKey, Code, FirstName, LastName, BirthDate
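An aggregate fact table can be derived from the detailed fact table by mapping each row to the coarser keys and summing the measures. The Python sketch below illustrates this roll-up from day and product to month and product type. The function name `build_aggregate`, the two lookup dictionaries, and all sample keys and values are assumptions made for the example.

```python
from collections import defaultdict

def build_aggregate(order_fact, date_to_month, product_to_type):
    """Roll detailed order facts up to (MonthKey, ProductTypeKey) grain."""
    totals = defaultdict(lambda: {"TotalQuantity": 0, "TotalRevenue": 0.0})
    for row in order_fact:
        key = (date_to_month[row["DateKey"]], product_to_type[row["ProductKey"]])
        totals[key]["TotalQuantity"] += row["TotalQuantity"]
        totals[key]["TotalRevenue"] += row["TotalRevenue"]
    return [
        {"MonthKey": month, "ProductTypeKey": ptype, **measures}
        for (month, ptype), measures in sorted(totals.items())
    ]

# Hypothetical detailed facts and key mappings.
detail_rows = [
    {"DateKey": 20160105, "ProductKey": 1, "TotalQuantity": 2, "TotalRevenue": 10.0},
    {"DateKey": 20160120, "ProductKey": 2, "TotalQuantity": 1, "TotalRevenue": 20.0},
    {"DateKey": 20160203, "ProductKey": 1, "TotalQuantity": 3, "TotalRevenue": 15.0},
]
date_to_month = {20160105: 201601, 20160120: 201601, 20160203: 201602}
product_to_type = {1: 100, 2: 100}

aggregate_rows = build_aggregate(detail_rows, date_to_month, product_to_type)
```

Three detailed rows collapse into two aggregate rows here. In a real warehouse the reduction is far larger, which is what makes monthly reports faster to produce.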
Types of Fact tables
There are three main types of fact tables:
I. Transactional fact tables: new rows are inserted
when an activity or event occurs.
II. Periodic snapshot fact tables: new rows are inserted at
predetermined intervals such as daily, weekly, or
monthly.
III. Accumulating snapshot fact tables: used to represent
processes.
Transactional fact table
Order Fact: DateKey, ProductKey, CustomerKey, OrderNumber, OrderLineNumber, Total quantity, Total revenue
Periodic snapshot fact table
StockLevel: DateKey, ProductKey, EmployeeKey, Quantity
Accumulating snapshot fact table
Order: OrderID, OrderDateKey, ProcessedDateKey, DispatchedDateKey, IsProcessed, IsDispatched, ProcessingTimeLag
Account transaction: AccountKey, DateKey, BranchKey, Type, Amount
Account Balance: AccountKey, MonthKey, AccountBalance, NumberOfTransactions
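Unlike the other two types, a row in an accumulating snapshot fact table is updated as the process it represents moves forward. The Python sketch below illustrates this for the Order example. It is a simplification under stated assumptions: plain dates stand in for the date keys, `ProcessingTimeLag` is taken to be the number of days between order and processing, and the function names are hypothetical.

```python
from datetime import date

def new_order(order_id, order_date):
    """Insert the row once, when the process starts."""
    return {"OrderID": order_id, "OrderDate": order_date,
            "ProcessedDate": None, "DispatchedDate": None,
            "IsProcessed": False, "IsDispatched": False,
            "ProcessingTimeLag": None}

def mark_processed(row, processed_date):
    """Update the same row when the order reaches the processed stage."""
    row["ProcessedDate"] = processed_date
    row["IsProcessed"] = True
    # Assumed definition: days between ordering and processing.
    row["ProcessingTimeLag"] = (processed_date - row["OrderDate"]).days

def mark_dispatched(row, dispatched_date):
    """Update the same row again when the order is dispatched."""
    row["DispatchedDate"] = dispatched_date
    row["IsDispatched"] = True

order = new_order(1, date(2016, 3, 1))
mark_processed(order, date(2016, 3, 4))
mark_dispatched(order, date(2016, 3, 5))
```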
Developing Dimensional Data Models
The development of a dimensional data model is an
iterative process that includes three main phases:
I. Create a high level dimensional data model.
II. Identify attributes of dimensions and measures
of fact tables.
III. Build a detailed dimensional data model.
Create a high level dimensional data model
Identify a central fact table.
Define the grain or detail level.
Identify dimensions using the following questions: when
does it occur, what is involved, who is involved, where does
it occur, how does it occur, and why does it occur?
Create the high level dimensional model diagram.
An example of a high level dimensional data model diagram
representing the product selling business process:
Central fact table: Fact Sale Transaction
Dimensions: Date, Product, Transaction Type, Payment Type, Customer, Register, Promotion, Outlet
Identify attributes of dimensions and measures of the
central fact table
List attributes of each dimension based on the business
analysis and reporting requirements.
Identify measures based on the business analysis and
reporting requirements.
Build a detailed dimensional data model
Enrich the high level model with missing information.
Resolve design issues.
Test if the dimensional data model is complete against the
business requirements.
Identify, understand, and profile data sources.
An example of a detailed design for one dimension table.
Tutorial # 2
1. The following tables form part of a database held in a relational DBMS.
Hotel (hotelNo, hotelName, City)
Room (roomNo, hotelNo, type, price)
Booking (hotelNo, guestNo, datefrom, dateTo, roomNo)
Guest (guestNo, guestName, guestAddress, guestcardNo,
expiryDate)
Solution of No. 2
1. Understanding of Data Sources using Profiling techniques.
2. Building Detailed Designs of Dimensions and Fact tables
3. Implementing a data warehouse.
Understanding of Data Sources
The goal of data understanding is to know the structure,
relationships, content and rules of the potential data sources that
will feed data to a data warehouse. Assessing accessibility and
data quality issues are also part of data understanding.
There are different techniques to understand data sources:
• Consulting Database administrators
• Examining documents or specifications such as ERD
models.
• Data profiling
Data profiling is the application of SQL commands or profiling
software tools to collect information and statistics about data
sources.
It should include :
• Data type, minimum and maximum field lengths.
• Mean, mode, minimum and maximum values of numeric
data types.
• Number of all records and of unique records.
• Number of NULL records.
• Any patterns identified.
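The statistics listed above are normally gathered with SQL commands or a profiling tool. As a language-neutral illustration, the Python sketch below computes the same figures for one column held as a list, with `None` standing in for NULL. The function name `profile_column` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

def profile_column(values):
    """Collect basic profiling statistics for one column of source data."""
    non_null = [v for v in values if v is not None]
    profile = {
        "records": len(values),
        "unique": len(set(non_null)),
        "nulls": len(values) - len(non_null),
    }
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        # Numeric column: value statistics.
        profile["min"] = min(non_null)
        profile["max"] = max(non_null)
        profile["mean"] = mean(non_null)
        profile["mode"] = Counter(non_null).most_common(1)[0][0]
    elif non_null:
        # Non-numeric column: field length statistics.
        lengths = [len(str(v)) for v in non_null]
        profile["min_length"] = min(lengths)
        profile["max_length"] = max(lengths)
    return profile

price_profile = profile_column([10, 20, 20, None, 30])
city_profile = profile_column(["London", "Leeds", None, "London"])
```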
Demo using SQL Server
Management Studio
Build Detailed Designs of Dimension and Fact Tables
An example of a detailed design for a dimension
table.
What information do we need to build detailed designs of
dimension or fact tables?
• Design of a data warehouse (Dimensional data models).
• Data understanding of data sources.
Steps:
For each dimension and fact table
- Define columns and identify primary keys.
- Identify data types of each column.
- Build a source to target mapping for each column.
Implementing a Data Warehouse
Once the design stage is performed, the next step is to
implement a data warehouse.
1. Create a database to store dimension and fact tables.
2. Create dimension tables.
3. Create fact tables.
4. Build primary and foreign keys constraints.
5. Create a separate database to hold the staging, security, and
auditing tables.
Wholesale furniture company
Design the data warehouse for a wholesale furniture company.
The data warehouse must allow the company’s situation to be
analysed at least with respect to Furniture, Customers and
Time.
Moreover, the company needs to analyse:
1. The furniture with respect to its type (chair, table, wardrobe,
cabinet...), category (kitchen, living room, bedroom, bathroom,
office...) and material (wood, marble).
2. The customers with respect to their spatial location, by
considering at least cities, regions and states.
The company is interested in learning at least the quantity,
income and discount of its sales.
Identify Central Fact Table
Identify Dimension Tables
Build a high level dimensional
data model
Central fact table: Sales
Dimensions: Furniture, Customer, Date
Identify attributes of Dimensions and Measures of the Central
Fact Table:
Date Dim: DateKey, Date, Name, WeekNumber, MonthNumber, MonthName, Quarter, Year
Customer Dim: CustomerKey, Name, Gender, BirthDate, City, Region, State
Furniture Dim: FurnitureKey, Type, Category, Material
Sales Fact: DateKey, FurnitureKey, CustomerKey, Quantity, Income, Discount
Build Detailed Designs of Dimension
and Fact Tables:
Column Name   Column Def.    Data Type    Key  Null
DateKey       Surrogate Key  int          PK   NOT NULL
Date          SQL date       date         -    NOT NULL
Name          -              varchar(30)  -    NOT NULL
WeekNumber    -              int          -    NOT NULL
MonthName     -              varchar(30)  -    NOT NULL
MonthNumber   -              int          -    NOT NULL
Quarter       -              int          -    NOT NULL
Year          -              int          -    NOT NULL
A simplified design of the Date dimension.
Column Name   Column Def.    Data Type    Key  Null
CustomerKey   Surrogate Key  int          PK   NOT NULL
Name          -              varchar(50)  -    NOT NULL
Gender        -              char(1)      -    NOT NULL
BirthDate     -              date         -    NOT NULL
City          -              varchar(50)  -    NOT NULL
Region        -              varchar(50)  -    NOT NULL
State         -              varchar(50)  -    NOT NULL
A simplified design of the Customer dimension without source to target mapping.
Column Name   Column Def.    Data Type    Key  Null
FurnitureKey  Surrogate Key  int          PK   NOT NULL
Type          -              varchar(25)  -    NOT NULL
Category      -              varchar(25)  -    NOT NULL
Material      -              varchar(25)  -    NOT NULL
A simplified design of the Furniture dimension without source to target mapping.
Column Name   Column Def.    Data Type    Key     Null
DateKey       Surrogate Key  int          PK, FK  NOT NULL
CustomerKey   Surrogate Key  int          PK, FK  NOT NULL
FurnitureKey  Surrogate Key  int          PK, FK  NOT NULL
Quantity      -              int          -       NOT NULL
Income        -              double       -       NOT NULL
Discount      -              double       -       NOT NULL
A simplified design of the Sale fact table without source to target mapping.
Create a relational database to store dimension and fact tables:
CREATE DATABASE MainDWDatabase2016
ON (NAME = 'MainDWDatabase2016_Data',
    FILENAME = 'C:\DataWarehouse\MainDWDatabase2016_Data.mdf',
    SIZE = 1000, FILEGROWTH = 50)
LOG ON (NAME = 'MainDWDatabase2016_Log',
    FILENAME = 'C:\DataWarehouse\MainDWDatabase2016_Log.ldf',
    SIZE = 20, FILEGROWTH = 96);
GO
The above SQL command creates an empty database named
‘MainDWDatabase2016’ and allocates two files: one for storing
data and another for storing the transaction log.
Create dimension tables:
CREATE TABLE [Link] (
    DateKey     int IDENTITY(1,1) NOT NULL,
    Date        date NOT NULL,
    Name        varchar(30) NOT NULL,
    WeekNumber  int NOT NULL,
    MonthName   nvarchar(30) NOT NULL,
    MonthNumber int NOT NULL,
    Quarter     int NOT NULL,
    Year        int NOT NULL
) ON [PRIMARY];
GO
SQL command to create a date dimension table.
CREATE TABLE [Link] (
    CustomerKey int IDENTITY(1,1) NOT NULL,
    Name        varchar(50) NOT NULL,
    Gender      char(1) NOT NULL,
    BirthDate   date NOT NULL,
    City        varchar(50) NOT NULL,
    Region      varchar(50) NOT NULL,
    State       varchar(50) NOT NULL
) ON [PRIMARY];
GO
SQL command to create a customer dimension table.
CREATE TABLE [Link] (
    FurnitureKey int IDENTITY(1,1) NOT NULL,
    Type         varchar(25) NOT NULL,
    Category     varchar(25) NOT NULL,
    Material     varchar(25) NOT NULL
) ON [PRIMARY];
GO
SQL command to create a furniture dimension table.
CREATE TABLE [Link] (
    DateKey      int NOT NULL,
    CustomerKey  int NOT NULL,
    FurnitureKey int NOT NULL,
    Quantity     int NOT NULL,
    Income       float NOT NULL,
    Discount     float NOT NULL
) ON [PRIMARY];
GO
SQL command to create a sale fact table.
Build Primary Keys:
ALTER TABLE [Link] ADD
    CONSTRAINT PK_DateDim PRIMARY KEY (DateKey) ON [PRIMARY];
ALTER TABLE [Link] ADD
    CONSTRAINT PK_CustomerDim PRIMARY KEY (CustomerKey) ON [PRIMARY];
ALTER TABLE [Link] ADD
    CONSTRAINT PK_FurnitureDim PRIMARY KEY (FurnitureKey) ON [PRIMARY];
ALTER TABLE [Link] ADD
    CONSTRAINT PK_SaleFact PRIMARY KEY (DateKey, FurnitureKey, CustomerKey) ON [PRIMARY];
Build Foreign Keys:
ALTER TABLE [Link] ADD
    CONSTRAINT FK_DateDim FOREIGN KEY (DateKey)
        REFERENCES [Link] (DateKey),
    CONSTRAINT Fk_CustomerKey FOREIGN KEY (CustomerKey)
        REFERENCES [Link] (CustomerKey),
    CONSTRAINT Fk_FurnitureKey FOREIGN KEY (FurnitureKey)
        REFERENCES [Link] (FurnitureKey);
GO
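To try the star schema without a SQL Server instance, a reduced version can be built in an in-memory SQLite database from Python. This is only a sketch under stated assumptions: the table names DateDim, CustomerDim, FurnitureDim and SaleFact are inferred from the constraint names and may not match the original script, SQLite types replace the SQL Server ones, some columns are left out, and every data row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE DateDim (
    DateKey INTEGER PRIMARY KEY, Date TEXT NOT NULL,
    MonthName TEXT NOT NULL, Year INTEGER NOT NULL);
CREATE TABLE CustomerDim (
    CustomerKey INTEGER PRIMARY KEY, Name TEXT NOT NULL,
    City TEXT NOT NULL, Region TEXT NOT NULL, State TEXT NOT NULL);
CREATE TABLE FurnitureDim (
    FurnitureKey INTEGER PRIMARY KEY, Type TEXT NOT NULL,
    Category TEXT NOT NULL, Material TEXT NOT NULL);
CREATE TABLE SaleFact (
    DateKey INTEGER NOT NULL REFERENCES DateDim (DateKey),
    CustomerKey INTEGER NOT NULL REFERENCES CustomerDim (CustomerKey),
    FurnitureKey INTEGER NOT NULL REFERENCES FurnitureDim (FurnitureKey),
    Quantity INTEGER NOT NULL, Income REAL NOT NULL, Discount REAL NOT NULL,
    PRIMARY KEY (DateKey, FurnitureKey, CustomerKey));
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO DateDim VALUES (?, ?, ?, ?)",
                 [(1, "2016-01-15", "January", 2016), (2, "2016-02-03", "February", 2016)])
conn.executemany("INSERT INTO CustomerDim VALUES (?, ?, ?, ?, ?)",
                 [(1, "Customer A", "City A", "Region A", "State A"),
                  (2, "Customer B", "City B", "Region B", "State B")])
conn.executemany("INSERT INTO FurnitureDim VALUES (?, ?, ?, ?)",
                 [(1, "chair", "kitchen", "wood"), (2, "table", "office", "marble")])
conn.executemany("INSERT INTO SaleFact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 4, 200.0, 10.0), (2, 1, 1, 2, 100.0, 0.0), (2, 2, 2, 1, 500.0, 50.0)])

# Star join: total quantity and income by furniture category and customer state.
sales_by_category_state = conn.execute("""
    SELECT f.Category, c.State, SUM(s.Quantity), SUM(s.Income)
    FROM SaleFact AS s
    JOIN FurnitureDim AS f ON f.FurnitureKey = s.FurnitureKey
    JOIN CustomerDim AS c ON c.CustomerKey = s.CustomerKey
    GROUP BY f.Category, c.State
    ORDER BY f.Category, c.State
""").fetchall()
```

The query joins the fact table to two of its dimensions and groups by their attributes, which is the typical shape of an analysis over a star schema.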