Lecture-4
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)
                   Data Warehouse                      OLTP
Scope              * Application-neutral               * Application-specific
                   * Single source of “truth”          * Multiple databases with repetition
                   * Evolves over time                 * Off-the-shelf application
                   * How to improve business           * Runs the business
Data Perspective   * Historical, detailed data         * Operational data
                   * Some summary                      * No summary
                   * Lightly denormalized              * Fully normalized
Queries            * Hardly uses PK (primary key)      * Based on PK
                   * Results returned in thousands     * Results returned in hundreds
Time factor        * Minutes to hours                  * Sub-seconds to seconds
                   * Typical availability 6x12         * Typical availability 24x7
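The "Queries" row of the comparison can be illustrated with a minimal sketch, assuming a hypothetical sales table held in an in-memory SQLite database. The table name, columns, and figures are invented for illustration only. The OLTP-style query fetches one row by primary key, while the warehouse-style query ignores the key and summarizes many rows.

```python
import sqlite3

# Hypothetical sales table; names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2022, 100.0),
        (2, "North", 2023, 150.0),
        (3, "South", 2022, 200.0),
        (4, "South", 2023, 250.0),
    ],
)


def oltp_lookup(sale_id):
    """OLTP-style access: one row, located through the primary key."""
    return conn.execute(
        "SELECT amount FROM sales WHERE sale_id = ?", (sale_id,)
    ).fetchone()


def warehouse_summary():
    """Warehouse-style access: scan and summarize, no primary key involved."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
```

On a real warehouse the second kind of query touches millions of rows, which is why its response time is measured in minutes rather than sub-seconds.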
Comparison of Response Times
On-line analytical processing (OLAP) queries must be
executed in a small number of seconds.
This often requires denormalization and/or sampling.
Complex query scripts and large list selections can
generally be executed in a small number of minutes.
Sophisticated clustering algorithms (e.g., data mining)
can generally be executed in a small number of hours
(even for hundreds of thousands of customers).
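Sampling, mentioned above as one way to keep OLAP response times down, can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name, the sample fraction, and the fixed seed are assumptions made for the example. It estimates a total by scaling up the mean of a simple random sample instead of scanning every row.

```python
import random


def estimate_total(values, sample_fraction, seed=0):
    """Estimate sum(values) from a simple random sample.

    sample_fraction and seed are illustrative parameters; a real system
    would choose them from accuracy and latency requirements.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, n)
    # Scale the sample mean up to the size of the full data set.
    return sum(sample) / n * len(values)
```

The estimate trades accuracy for speed: a 10% sample reads a tenth of the data, and the answer is approximate unless the values are uniform.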
Putting the pieces together
[Figure: four-tier data warehouse architecture]
Tier 0, Data sources: semistructured sources, www data, archived data, operational databases.
Tier 1, Data warehouse server: Extract, Transform, Load (ETL); warehouse; metadata; data marts.
Tier 2, OLAP servers: MOLAP, ROLAP.
Tier 3, Clients (tools): query/reporting, data analysis, data mining.
Users shown in the figure: IT users, business users.
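The Extract, Transform, Load (ETL) step that moves data from the sources into the warehouse can be sketched as below. This is a minimal illustration under stated assumptions: the source records, the cleaning rules, and the fact_sales table are all hypothetical, and SQLite stands in for the warehouse server.

```python
import sqlite3


def extract():
    """Extract: hypothetical raw records from an operational source."""
    return [
        {"id": 1, "customer": " alice ", "amount": "120.50"},
        {"id": 2, "customer": "BOB", "amount": "80"},
        {"id": 3, "customer": "carol", "amount": None},  # incomplete record
    ]


def transform(records):
    """Transform: drop incomplete rows, standardize names, convert types."""
    clean = []
    for r in records:
        if r["amount"] is None:
            continue
        clean.append((r["id"], r["customer"].strip().title(), float(r["amount"])))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn
```

Calling run_etl() returns a connection whose fact_sales table holds only the two complete, cleaned records.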
Types & Typical Applications of DWH
Types of data warehouse
Financial
Telecommunication
Insurance
Human Resource
Global
Exploratory
Types of data warehouse
Financial
First data warehouse that an organization
builds. This is appealing because:
Nerve center, easy to get attention.
In most organizations, smallest data set.
Touches all aspects of an organization, with a
common denominator, i.e., money.
Inherent structure of data directly influenced by the
day-to-day activities of financial processing.
Word of caution: Net balances.
Types of data warehouse
Telecommunication
Dominated by sheer volume of data.
Many ways to accommodate call level detail:
Only a few months of call level detail,
Storing lots of call level detail scattered over different
storage media,
Storing only selective call level detail, etc.
Unfortunately, for many kinds of processing, working at
an aggregate level is simply not possible.
Types of data warehouse
Insurance
Insurance data warehouses are similar to other
data warehouses BUT with a few exceptions.
Stored data that is very, very old, used for actuarial
processing.
A typical business may have changed dramatically over the
last 40-50 years, but insurance has not.
In retailing or telecom there are a few important
dates, but in the insurance environment there are
many dates of many kinds.
Long operational business cycles, in years.
Processing time in months. Thus the operating
speed is different.
Transactions are not gathered and processed, but
are kept in a kind of “frozen” state.
This calls for a distinctive approach to design &
implementation.
Typical Applications
The impact on an organization’s core business is to streamline
operations and maximize profitability.
Fraud detection.
Profitability analysis.
Direct mail/database marketing.
Credit risk prediction.
Customer retention modeling.
Yield management.
Inventory management.
ROI on any one of these applications can justify HW/SW
& consultancy costs in most organizations.
Typical Applications
Fraud detection
By observing data usage patterns.
People have typical purchase patterns.
Deviation from patterns.
Certain cities notorious for fraud.
Certain items are typically bought with stolen cards.
Similar behavior for stolen phone cards.
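The idea of flagging a deviation from a customer's typical purchase pattern can be sketched as below. This is one simple possibility, not the method the lecture prescribes: it uses a z-score on purchase amounts, and the function name and the threshold of 3 standard deviations are assumptions made for the example.

```python
from statistics import mean, stdev


def is_suspicious(history, new_amount, threshold=3.0):
    """Flag a purchase that deviates strongly from past purchase amounts.

    history needs at least two amounts; threshold is an assumed cut-off
    expressed in standard deviations.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past purchases identical: any different amount is a deviation.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold
```

A real system would combine several signals named above, such as the city and the item bought, rather than the amount alone.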
Typical Applications
Profitability Analysis
Banks know if they are profitable or not.
But they don’t know which customers are profitable.
Typically more than 50% are NOT profitable.
The bank does not know which ones.
Balance alone is not enough; transactional behavior is the
key.
Restructure products and pricing strategies.
Life-time profitability models (next 3-5 years).
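Why balance alone is not enough can be shown with a small worked sketch. Every number here is a hypothetical assumption: the bank is taken to earn a 2% margin on the balance and to pay a fixed handling cost per transaction. A customer with a large balance but heavy transaction activity can then be unprofitable.

```python
def customer_profit(balance, n_transactions, margin_rate=0.02, cost_per_txn=0.5):
    """Profit from one customer: margin earned on balance minus handling cost.

    margin_rate and cost_per_txn are illustrative assumptions, not real figures.
    """
    return balance * margin_rate - n_transactions * cost_per_txn


def unprofitable_customers(customers):
    """Return names of customers whose estimated profit is negative.

    customers maps a name to a (balance, n_transactions) pair.
    """
    return sorted(
        name
        for name, (balance, n_txn) in customers.items()
        if customer_profit(balance, n_txn) < 0
    )
```

With these assumed rates, a balance of 5000 and 400 transactions gives 100 in margin against 200 in cost, so the larger account is the loss-making one.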
Typical Applications
Direct mail marketing
Targeted marketing.
Offer a high-bandwidth package only to selected users, NOT to all.
Identify them from call detail records of web surfing.
Saves marketing expense, even if only pennies per customer.
Knowing your customers better.
Typical Applications
Credit risk prediction
Who should get a loan?
Customer segregation, i.e., stable vs. rolling.
Quantitative decision making, NOT subjective.
Different interest rates for different customers.
Do not subsidize bad customers at the expense of
good ones.
Typical Applications
Yield Management
Works for fixed inventory businesses.
The value of an unsold item suddenly drops to zero (e.g., an empty seat once the flight departs).
Item prices vary for varying customers.
Examples: airlines, hotels, etc.
Price of (say) an air ticket depends on:
How far in advance was the ticket bought?
How many seats were still vacant?
How profitable is the customer?
Is the ticket one-way or return?
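The four pricing factors listed above can be combined in a toy pricing function. This is only a sketch: every multiplier, the 7-day cut-off, and the parameter names are hypothetical assumptions chosen to show the shape of the calculation, not figures used by any airline.

```python
def ticket_price(base_fare, days_in_advance, vacant_seats, total_seats,
                 profitable_customer=False, return_trip=False):
    """Toy yield-management price; all multipliers are illustrative assumptions."""
    price = base_fare
    # Late bookings pay more (assumed 7-day cut-off).
    if days_in_advance < 7:
        price *= 1.5
    # Fuller flights cost more: the factor runs from 0.8 (empty) to 1.2 (full).
    load_factor = 1 - vacant_seats / total_seats
    price *= 0.8 + 0.4 * load_factor
    # Profitable customers get an assumed 10% discount.
    if profitable_customer:
        price *= 0.9
    # A return ticket covers two legs at an assumed combined rate.
    if return_trip:
        price *= 1.8
    return round(price, 2)
```

For example, with a base fare of 100, an early booking on an empty flight prices at 80.0, while a last-minute booking on a full flight prices at 180.0.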
Recent Application
Agriculture Systems
Agri and related data collected for decades.
Meteorological data consists of 50+ attributes.
Decision making based on expert judgment.
Lack of integration results in underutilization.
What is required, in which amount and when?