Chapter 2: Data Warehousing and OLAP Technologies
9/25/2020
Data Mining
Introduction to Data Warehouse
What is a data warehouse?
Data warehouse vs. operational DBMS
OLTP vs. OLAP
Design of a data warehouse
Conceptual modeling of data warehouses
Data warehouse design process
Data warehouse models
OLAP functionalities on data warehouses
Data Warehousing
Non-volatile: Once data enter the data warehouse, they are never updated in place or removed, because the data in the warehouse represent the company's entire history.
Because new data are added all the time, the warehouse keeps growing.
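The non-volatile property can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `Snapshot`, `AppendOnlyWarehouse`, and the sample values are hypothetical, and a real warehouse enforces this through its load process rather than through a Python class. Each load only appends rows, so earlier values stay available and the store grows over time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One loaded row (hypothetical schema); frozen, so it cannot be edited."""
    load_date: date
    customer_id: int
    balance: float


class AppendOnlyWarehouse:
    """Hypothetical store that only appends, illustrating non-volatility."""

    def __init__(self) -> None:
        self._rows: list = []

    def load(self, rows) -> None:
        # Each load appends new rows; existing rows are never overwritten.
        self._rows.extend(rows)

    def delete(self, customer_id: int) -> None:
        # Removing history is not allowed in this sketch.
        raise PermissionError("rows in the warehouse are never removed")

    def history(self, customer_id: int) -> list:
        # All past values for a customer remain available for analysis.
        return [r for r in self._rows if r.customer_id == customer_id]

    def __len__(self) -> int:
        return len(self._rows)


wh = AppendOnlyWarehouse()
wh.load([Snapshot(date(2020, 9, 1), 42, 100.0)])
wh.load([Snapshot(date(2020, 9, 2), 42, 250.0)])  # the source value changed
print(len(wh))                              # 2
print([r.balance for r in wh.history(42)])  # [100.0, 250.0]
```

An operational system would overwrite the balance; here both values survive, which is what lets the warehouse answer questions about change over time.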
Data Warehouse
▪ Data warehouse
▫ A data warehouse is a relational database system responsible for collecting and storing data to support management decision making and problem solving.
▫ It enables managers and other business professionals to carry out data mining, online analytical processing, market research, and decision support.
▫ It represents the current stage in the evolution of decision support systems (DSSs).
▪ Data mart
▫ A subset of a data warehouse for small and medium-size businesses or departments within larger companies.
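As a rough illustration, a data mart can be thought of as the slice of the warehouse that one department needs. The `Fact` schema, the department names, and `build_data_mart` are hypothetical; in practice a data mart is typically stored and maintained as its own smaller database rather than computed as an in-memory filter.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One warehouse row (hypothetical schema)."""
    department: str
    region: str
    item: str
    amount: float


# Hypothetical enterprise-wide warehouse covering several departments.
WAREHOUSE = [
    Fact("sales", "east", "laptop", 1200.0),
    Fact("sales", "west", "phone", 800.0),
    Fact("finance", "east", "audit", 300.0),
    Fact("marketing", "west", "campaign", 450.0),
]


def build_data_mart(rows, department):
    """Return only the rows one department needs: a subset of the warehouse."""
    return [row for row in rows if row.department == department]


sales_mart = build_data_mart(WAREHOUSE, "sales")
print(len(sales_mart))                        # 2
print(sum(row.amount for row in sales_mart))  # 2000.0
```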
Data Warehouse Stores Heterogeneous Data
Data Warehouse as Part of Data Mining
Data Mining Works with the Data Warehouse
▪ The data warehouse provides the enterprise with a memory.
Operational Database vs. Data Warehouse
▪ The operational database is the source of information for the data warehouse.
▪ It holds the detailed information used to run the day-to-day operations of the business.
▪ Its data change frequently as updates are made, and they reflect the current values produced by the most recent transactions.
▪ Operational database management systems, also called OLTP (Online Transaction Processing) databases, are used to manage dynamic data in real time.
▪ Data warehouse systems serve users or knowledge workers in data analysis and decision making. Such systems can organize and present information in specific formats to accommodate the diverse needs of various users.
▪ These systems are called Online Analytical Processing (OLAP) systems.
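The difference between the two workloads can be sketched with Python's built-in `sqlite3` module. This is a simplification: the `orders` table and its rows are hypothetical, and a single in-memory database is used only to contrast the two query shapes, whereas in practice OLTP and OLAP run on separate systems. The OLTP step is a short transaction that updates one row by primary key; the OLAP step is a read-only query that scans and summarizes many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, "
    "month TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (region, month, amount, status) VALUES (?, ?, ?, ?)",
    [
        ("east", "2020-08", 120.0, "shipped"),
        ("east", "2020-09", 80.0, "open"),
        ("west", "2020-08", 200.0, "shipped"),
        ("west", "2020-09", 50.0, "open"),
    ],
)
conn.commit()

# OLTP-style work: a short transaction that updates one row by primary key.
with conn:  # commits on success, rolls back on an exception
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (2,))

# OLAP-style work: a read-only query that scans and aggregates many rows.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('east', 200.0), ('west', 250.0)]
```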
Operational Database vs. Data Warehouse
▪ The data warehouse and operational environments are kept separate.
▪ The data warehouse receives its data from operational databases.
▪ The data warehouse environment is characterized by read-only transactions on very large data sets.
▪ The operational environment is characterized by numerous update transactions, each affecting only a few data entities at a time.
▪ The data warehouse contains historical data over a long time horizon.
▪ Ultimately, information is created from the data warehouse, and this information becomes the basis for rational decision making.
▪ The data in the data warehouse are analyzed to discover previously unknown data characteristics, relationships, dependencies, or trends.
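The flow from an operational database into a separate warehouse can be sketched as a periodic extract, transform, and load (ETL) job. This is a minimal sketch assuming two hypothetical in-memory SQLite databases, a `sales` table on the operational side, and a `sales_history` table in the warehouse; `nightly_load` is an illustrative name, not a standard API. The operational table keeps changing through updates, while each load appends a dated summary, so the warehouse accumulates history.

```python
import sqlite3

# Two separate databases stand in for the separated environments.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER, price REAL)"
)
operational.executemany(
    "INSERT INTO sales (product, qty, price) VALUES (?, ?, ?)",
    [("pen", 10, 1.5), ("pen", 4, 1.5), ("book", 2, 12.0)],
)
warehouse.execute(
    "CREATE TABLE sales_history (load_date TEXT, product TEXT, revenue REAL)"
)


def nightly_load(load_date: str) -> None:
    """Extract from the operational DB, summarize, and append to the warehouse."""
    # Extract + transform: aggregate detailed rows into revenue per product.
    rows = operational.execute(
        "SELECT product, SUM(qty * price) FROM sales GROUP BY product ORDER BY product"
    ).fetchall()
    # Load: append with the load date so history accumulates over time.
    with warehouse:
        warehouse.executemany(
            "INSERT INTO sales_history VALUES (?, ?, ?)",
            [(load_date, product, revenue) for product, revenue in rows],
        )


nightly_load("2020-09-24")
operational.execute("UPDATE sales SET qty = 20 WHERE id = 1")  # source keeps changing
nightly_load("2020-09-25")

history = warehouse.execute(
    "SELECT load_date, product, revenue FROM sales_history ORDER BY load_date, product"
).fetchall()
print(history)
```

The operational table now shows only the latest quantity, while `sales_history` still holds both days' figures for trend analysis.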
Data Processing Technologies
▪ OLAP – Online Analytical Processing
► Refers to an advanced data analysis environment that supports
decision making.
► Provides access to multidimensional databases, with display techniques that are useful to managers.
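The multidimensional view behind OLAP can be sketched in plain Python as a tiny data cube with region, product, and quarter dimensions. This is an illustrative toy under stated assumptions: the fact rows and the helper names `roll_up` and `slice_cube` are hypothetical, and real OLAP servers precompute and index such aggregates. A roll-up aggregates away dimensions; a slice fixes one dimension to a single value.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
FACTS = [
    ("east", "laptop", "Q1", 100),
    ("east", "phone", "Q1", 40),
    ("east", "laptop", "Q2", 120),
    ("west", "laptop", "Q1", 90),
    ("west", "phone", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(sorted(totals.items()))


def slice_cube(facts, dimension, value):
    """Fix one dimension to a single value (an OLAP slice)."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


print(roll_up(FACTS, ["region"]))
# {('east',): 260, ('west',): 150}
print(roll_up(slice_cube(FACTS, "quarter", "Q1"), ["product"]))
# {('laptop',): 190, ('phone',): 40}
```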
Data Processing Technologies
▪ OLTP (on-line transaction processing)
► Major task of traditional relational DBMS
► Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
▪ OLAP (on-line analytical processing)
► Major task of data warehouse system
► Data analysis and decision making
Data Processing Technologies
▪ Distinct features (OLTP vs. OLAP):
► User and system orientation: customer vs. market
► Data contents: current, detailed vs. historical, consolidated
► Database design: ER + application vs. star + subject
► View: current, local vs. evolutionary, integrated
► Access patterns: update vs. read-only but complex queries
Data Processing Technologies
Feature             OLTP                                    OLAP
Users               clerk, IT professional                  knowledge worker
Function            day-to-day operations                   decision support
DB design           application-oriented                    subject-oriented
Data                current, up-to-date, detailed,          historical, summarized,
                    flat relational, isolated               multidimensional, integrated, consolidated
Usage               repetitive                              ad hoc
Access              read/write, index/hash on primary key   lots of scans
Unit of work        short, simple transaction               complex query
# records accessed  tens                                    millions
# users             thousands                               hundreds
DB size             100 MB to GB                            100 GB to TB
Metric              transaction throughput                  query throughput, response time
Common Challenges in Data Warehousing
▪ Data integration
▪ Ensuring data quality
▪ Handling large data volumes
▪ Scalability
▪ Maintaining performance
▪ Managing cost
▪ User adoption
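Data integration and data quality checks often go together: records from heterogeneous sources are mapped onto one schema, and rows that fail basic checks are set aside. This sketch assumes two hypothetical sources (a CRM list and a billing feed) with different field names, country-code casing, and units; the field names and the `integrate` function are illustrative only, not a specific tool's API.

```python
# Two hypothetical sources describe the same customers with different
# field names, code conventions, and units.
CRM_ROWS = [
    {"cust_id": "C1", "name": "Alice", "country": "et"},
    {"cust_id": "C2", "name": "Bob", "country": "ET"},
]
BILLING_ROWS = [
    {"customer": "C1", "amount_cents": 12500},
    {"customer": "C2", "amount_cents": None},  # missing value
    {"customer": "C9", "amount_cents": 4000},  # no matching customer
]


def integrate(crm_rows, billing_rows):
    """Map both sources onto one schema; split clean rows from rejects."""
    customers = {
        r["cust_id"]: {"name": r["name"], "country": r["country"].upper()}
        for r in crm_rows
    }
    clean, rejects = [], []
    for b in billing_rows:
        cust = customers.get(b["customer"])
        # Quality checks: the customer must exist and the amount must be present.
        if cust is None or b["amount_cents"] is None:
            rejects.append(b)
            continue
        clean.append({
            "customer_id": b["customer"],
            "name": cust["name"],
            "country": cust["country"],         # normalized to upper case
            "amount": b["amount_cents"] / 100,  # cents -> whole currency units
        })
    return clean, rejects


clean, rejects = integrate(CRM_ROWS, BILLING_ROWS)
print(clean)
# [{'customer_id': 'C1', 'name': 'Alice', 'country': 'ET', 'amount': 125.0}]
print(len(rejects))  # 2
```

Keeping rejected rows, rather than silently dropping them, lets someone investigate and correct the source data later.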
Assignment 1 (Individual)
1. Why is a data warehouse selected for data analysis and decision making?
2. How is security managed in a data warehouse?
Questions?