An Overview of Data Warehousing and OLAP Technology
Surajit Chaudhuri
Microsoft Research, Redmond
Umeshwar Dayal
HP Labs, Palo Alto
Presented by: Krishma Dutta
Outline
Introduction
Need for Data Warehousing and OLAP
Architecture of Data Warehousing
Front-End and Back-End Tools
Database Design Methodology
Conclusion
Data Warehousing: An Introduction
Defined in many different ways:
In simplest terms, a data warehouse can be defined as a collection of data marts.
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
Data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker to make better decisions.
Need for Data Warehousing and OLAP
Data Warehousing
Decision support requires historical data, which operational databases do not typically maintain
Decision support requires consolidation (aggregation, summarization) of data
OLAP
Operational databases give unacceptable performance when executing complex OLAP queries
The multidimensional data model is not supported by a traditional DBMS
Tiered Architecture
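[Figure: three-tier architecture. Data sources (external sources and operational databases) are extracted, transformed, loaded and refreshed into the data storage tier (Tier 1: data warehouse server, holding the data warehouse and data marts). This tier serves the OLAP engine (Tier 2: OLAP server), which in turn serves the front-end tools (Tier 3: clients) used for analysis, query/reports and data mining.]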
Front-End and Back-End Tools
Front-End Tools
Roll-up (drill-up)
Drill-down (roll-down)
Slice and dice
[Figure: OLAP example with the labels Toronto, Montreal and Household, Automobile, Kitchen, Office]
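The front-end operations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the quarter values and the unit counts are invented, and treating Toronto/Montreal as cities and Household/Automobile/Kitchen/Office as item types is a guess based on the slide labels. The helper names (aggregate, slice_, dice) are hypothetical, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales records: (city, item_type, quarter, units_sold).
SALES = [
    ("Toronto", "Household", "Q1", 10),
    ("Toronto", "Household", "Q2", 15),
    ("Toronto", "Office", "Q1", 7),
    ("Montreal", "Household", "Q1", 4),
    ("Montreal", "Kitchen", "Q2", 9),
    ("Montreal", "Automobile", "Q2", 3),
]
DIMS = ("city", "item_type", "quarter")


def aggregate(rows, group_by):
    """Sum units_sold grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall inside the given value sets."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


# Detailed (drilled-down) view: totals per (city, item_type).
by_city_item = aggregate(SALES, ["city", "item_type"])
# Roll-up: drop item_type, leaving totals per city.
by_city = aggregate(SALES, ["city"])
# Slice on quarter = Q1, then total per city.
q1_by_city = aggregate(slice_(SALES, "quarter", "Q1"), ["city"])
# Dice: restrict two dimensions at once to get a sub-cube.
sub_cube = dice(SALES, city={"Toronto"}, item_type={"Household", "Office"})
```

Going from by_city to by_city_item is a drill-down, and the reverse direction is a roll-up; both are the same grouping routine applied at different levels of detail.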
Back-End Tools
Data cleaning
Load
Refresh
[Figure: example records labelled Bob, Jamie, Brit, Todd]
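As a rough illustration of the back-end steps named above, the following sketch cleans a batch of records, loads them, and then refreshes the warehouse with a later batch. Everything here is an assumption made for the example: the single "name" field, the cleaning rules (trim, collapse whitespace, title-case, drop duplicates and blanks) and the function names are hypothetical, and real back-end tools do far more.

```python
def clean_name(raw):
    """Trim, collapse whitespace, and title-case a name field."""
    return " ".join(raw.split()).title()


def clean_records(records):
    """Normalise names and drop blanks and duplicates (first one wins)."""
    seen = set()
    cleaned = []
    for rec in records:
        name = clean_name(rec["name"])
        if not name or name in seen:
            continue
        seen.add(name)
        cleaned.append({**rec, "name": name})
    return cleaned


def refresh(warehouse, new_records):
    """Refresh: append only the cleaned records not already loaded."""
    loaded = {rec["name"] for rec in warehouse}
    fresh = [r for r in clean_records(new_records) if r["name"] not in loaded]
    warehouse.extend(fresh)
    return len(fresh)


# Hypothetical source data with inconsistent formatting.
raw = [{"name": "  bob "}, {"name": "BOB"}, {"name": "jamie"}, {"name": ""}]
warehouse = clean_records(raw)  # initial load of the cleaned batch
added = refresh(warehouse, [{"name": "Jamie"}, {"name": "todd"}])
```

The refresh step here is incremental: it propagates only records that are new since the last load rather than reloading everything.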
Conceptual Model
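[Figure: data cube with the dimensions Date, product (TV, PC, PVR) and Country (U.S.A., Canada, Mexico). Each dimension also has a "sum" (ALL) member; for example, one cell holds the total annual sales of TV in the U.S.A.]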
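The "sum" and ALL members of the cube can be made concrete with a small sketch. It assumes three dimensions (date, product, country) and invented sales figures; the name build_cube and the use of the string "ALL" as the aggregate member are choices made for this example only.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"

# Hypothetical fact rows: (date, product, country, sales).
FACTS = [
    ("Q1", "TV", "U.S.A.", 100),
    ("Q2", "TV", "U.S.A.", 150),
    ("Q1", "PC", "Canada", 80),
    ("Q2", "PVR", "Mexico", 40),
    ("Q1", "TV", "Canada", 60),
]


def build_cube(facts):
    """Return {(date, product, country): total}, including ALL members."""
    cube = defaultdict(int)
    for *dims, sales in facts:
        # Each fact contributes to 2**3 cells: every combination of
        # keeping a dimension value or replacing it with ALL.
        for cell in product(*[(d, ALL) for d in dims]):
            cube[cell] += sales
    return dict(cube)


cube = build_cube(FACTS)
# Total annual sales of TV in the U.S.A.: date is aggregated away.
tv_usa_annual = cube[(ALL, "TV", "U.S.A.")]
# The apex of the cube: all three dimensions aggregated.
grand_total = cube[(ALL, ALL, ALL)]
```

Materialising every aggregate like this is only practical for small cubes; it is shown here to make the meaning of each cell explicit.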
Database Design Methodology
Database Designs
Star Schema: a fact table in the middle connected to a set of dimension tables
Snowflake Schema: a refinement of the star schema in which the dimension hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Star Schema
Sales Fact Table: Time_key, Item_key, Branch_key, Location_key
Measures: Units_sold, Dollars_sold, Avg_sales
Time: T_key, T_day, T_day_week, T_month, T_quarter, T_year
Item: I_key, I_name, I_brand, I_type, I_supplier_type
Branch: B_key, B_name, B_type
Location: location_key, street, city, province, country
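A minimal sketch of this star schema, using Python's built-in sqlite3 module, might look as follows. The column names follow the slide; the table name time_dim, the column types, and all sample rows are assumptions made for illustration, and Avg_sales is computed at query time here rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    T_key INTEGER PRIMARY KEY, T_day INTEGER, T_day_week TEXT,
    T_month INTEGER, T_quarter TEXT, T_year INTEGER);
CREATE TABLE item (
    I_key INTEGER PRIMARY KEY, I_name TEXT, I_brand TEXT,
    I_type TEXT, I_supplier_type TEXT);
CREATE TABLE branch (B_key INTEGER PRIMARY KEY, B_name TEXT, B_type TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    Time_key INTEGER REFERENCES time_dim(T_key),
    Item_key INTEGER REFERENCES item(I_key),
    Branch_key INTEGER REFERENCES branch(B_key),
    Location_key INTEGER REFERENCES location(location_key),
    Units_sold INTEGER, Dollars_sold REAL);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 15, "Mon", 1, "Q1", 2024), (2, 20, "Sat", 4, "Q2", 2024)])
conn.execute("INSERT INTO item VALUES (1, 'TV', 'BrandA', 'Electronics', 'Wholesale')")
conn.execute("INSERT INTO branch VALUES (1, 'B1', 'Retail')")
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)",
                 [(1, "1 Main St", "Toronto", "Ontario", "Canada"),
                  (2, "2 Rue A", "Montreal", "Quebec", "Canada")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 2, 1000.0), (1, 1, 1, 1, 4, 2000.0),
                  (2, 1, 1, 1, 1, 500.0), (1, 1, 1, 2, 3, 1500.0)])

# A typical star join: the fact table joined directly to two dimensions.
rows = conn.execute("""
    SELECT l.city, t.T_quarter,
           SUM(f.Units_sold), SUM(f.Dollars_sold), AVG(f.Dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.Time_key = t.T_key
    JOIN location AS l ON f.Location_key = l.location_key
    GROUP BY l.city, t.T_quarter
    ORDER BY l.city, t.T_quarter
""").fetchall()
```

Every dimension is reached from the fact table with a single join, which is the property that gives the schema its star shape.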
Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Time: T_key, T_day, T_day_week, T_month, T_quarter, T_year
Item: I_key, I_name, I_brand, I_type, I_supplier_type
Supplier: S_key, S_type
Branch: B_key, B_name, B_type
Location: location_key, street, city
City: C_key, C_city, C_province, C_country
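To show what the normalization changes in practice, here is a reduced sketch of the location branch of this snowflake schema, again with Python's sqlite3 module. Only the fact, Location and City tables are modelled; treating the city column of Location as a foreign key to City, the column types, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City details are stored once, in their own normalized table.
CREATE TABLE city (
    C_key INTEGER PRIMARY KEY, C_city TEXT, C_province TEXT, C_country TEXT);
-- Location keeps only a reference to City (assumed foreign key).
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT,
    city INTEGER REFERENCES city(C_key));
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)",
                 [(1, "Toronto", "Ontario", "Canada"),
                  (2, "Montreal", "Quebec", "Canada")])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                 [(1, "1 Main St", 1), (2, "2 Main St", 1), (3, "3 Rue A", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 2, 200.0), (2, 1, 100.0), (3, 4, 400.0)])

# Grouping by city now needs two joins: fact -> location -> city.
rows = conn.execute("""
    SELECT c.C_city, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON f.location_key = l.location_key
    JOIN city AS c ON l.city = c.C_key
    GROUP BY c.C_city
    ORDER BY c.C_city
""").fetchall()
```

The trade-off is visible in the query: province and country are no longer repeated for every street address, but reaching them costs one more join than in the star schema.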
Summary
Data warehouse
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process
Architecture of a data warehouse
Consists of warehouse servers, front-end tools and back-end tools
OLAP operations: drilling, rolling, slicing, dicing and pivoting
Multidimensional model of a data warehouse
Data cube
Star schema
Snowflake schema
Thank You