March 2024
Introduction to Data Warehousing
S. Hassan Adelyar, Ph.D.
Instructor of Computer Science Faculty
Data Warehousing
Kabul University
Data warehouse
A subject-oriented, integrated, time-variant,
& non-volatile collection of data in support of
management’s decision-making process.
An enterprise system used for the analysis &
reporting of structured & semi-structured
data.
Receives data periodically & on a regular
basis from multiple sources such as:
Point-of-sale transactions
Marketing automation
Relational databases
Customer relationship management
Operational sources
External data sources
Websites
Stores both current & historical data in one
place & is designed to give a long-range view
of data over time; it supports business
intelligence (BI) activities, specifically
analysis.
This data is then made available for decision-
makers to access & analyze.
A data warehouse is not a single software or
hardware product you purchase to provide
strategic information.
It is a computing environment where users can
find strategic information, & users are put
directly in touch with the data they need to
make better decisions.
It is a user-centric environment.
It answers questions users have about the business,
the performance of the various operations, the
business trends, & what can be done to
improve the business.
A Blend of Many Technologies
The environment for data warehouses &
marts includes the following:
Data integration technology & processes
that are needed to prepare the data for use;
Different tools & applications for a variety
of users;
The basic concept of data warehousing is:
Take all the data from the operational
systems.
Integrate all the data from the various
sources.
Remove inconsistencies & transform the
data.
Store the data in formats suitable for easy
access for decision making.
Figure 1-9 shows how a data warehouse is a
blend of the many technologies.
Figure 1-9 The data warehouse: a blend of technologies
Data warehouse architecture
Every data warehouse has three fundamental
components:
Load Manager
Warehouse Manager
Data Access Manager
Load manager
Responsible for collecting data from
operational systems.
Converts the data into a usable form for
later use by users.
Includes all the programs & application
interfaces required for extracting
data from the operational systems.
It should perform the following tasks:
Data Identification
Data Validation for its accuracy
Data Extraction from the original source
Data Cleansing
Data formatting
Consolidates data from multiple sources to
one place
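The load manager tasks listed above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the two source lists, the field names (customer, amount), and the helper names validate, cleanse, and load are hypothetical & do not come from any particular product.

```python
# Hypothetical rows already extracted from two operational sources.
pos_rows = [
    {"customer": " Ali ", "amount": "120.50"},
    {"customer": "Sara", "amount": "-5"},      # invalid: negative amount
]
web_rows = [
    {"customer": "sara", "amount": "80"},
    {"customer": "", "amount": "10"},          # invalid: missing customer
]


def validate(row):
    """Data validation: keep rows with a customer name and a non-negative amount."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return False
    return bool(row["customer"].strip()) and amount >= 0


def cleanse(row, source):
    """Data cleansing and formatting: trim and normalize names, convert types."""
    return {
        "customer": row["customer"].strip().title(),
        "amount": float(row["amount"]),
        "source": source,
    }


def load(sources):
    """Consolidate validated, cleansed rows from all sources into one list."""
    warehouse = []
    for source, rows in sources.items():
        for row in rows:
            if validate(row):
                warehouse.append(cleanse(row, source))
    return warehouse


warehouse_rows = load({"pos": pos_rows, "web": web_rows})
for row in warehouse_rows:
    print(row)
```

Rejected rows are simply skipped here; a real load manager would more likely log or quarantine them for review.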
Warehouse manager
The main part of a data warehousing
system.
Holds a massive amount of information
from many sources.
Organizes data so that it is easy
for anyone to analyze or find the required
information.
Architecture of a data warehouse
Database vs. Data Warehouse
Database:
The main difference is that in a database,
data is collected for multiple transactional
purposes.
Databases provide real-time data.
Data Warehouse:
In a data warehouse, data is collected on an
extensive scale to perform analytics.
Data warehouses store data to be accessed
for big analytical queries.
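The difference in query style can be illustrated with a small sketch. It uses Python's built-in sqlite3 module & a hypothetical sales table; in practice the two kinds of query usually run on separate systems, & a single in-memory database is used here only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    " order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2022, 100.0),
        (2, "North", 2023, 150.0),
        (3, "South", 2022, 200.0),
        (4, "South", 2023, 50.0),
    ],
)

# Transactional style: fetch one current record by its key.
one_order = conn.execute(
    "SELECT amount FROM sales WHERE order_id = ?", (3,)
).fetchone()

# Analytical style: scan many rows and summarize them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(totals)
```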
Data warehouse usages
The most common uses of a data warehouse are:
Making real-time decisions:
Analyze data in real time to proactively
address challenges, identify
opportunities, gain efficiency, reduce
costs, & proactively respond to business
events.
Consolidating siloed data:
Quickly pull data from multiple
structured sources across your
organization, such as point-of-sale
systems, websites, & email lists, & bring
it together into one location so that you
can perform analysis & get insights.
Enabling business reporting & ad hoc
analysis:
Keep historical data on a separate server
from operational data so that end users
can access it & run their own queries &
reports without impacting the
performance of operational systems or
waiting to get help from IT.
Implementing machine learning & AI:
Collect historical & real-time data to
develop algorithms that can provide
predictive insights, such as anticipating
traffic points or suggesting relevant
products to a customer browsing a
website.
If your organization has or does any of the
following, you’re probably a good candidate
for a data warehouse:
Multiple sources of disparate data
Big-data analysis & visualization
Machine learning models & other AI-
driven processes
Custom report generation & ad hoc
analysis
Types of Data Warehouse
Enterprise Data Warehouse (EDW)
This type of warehouse serves as a key or
central database that facilitates decision-
support services throughout the enterprise.
The advantage to this type of warehouse is
that it provides access to cross-
organizational information, offers a unified
approach to data representation, & allows
running complex queries.
Operational Data Store (ODS)
This type of data warehouse refreshes in
real time. It is often preferred for routine
activities like storing employee records. It is
required when data warehouse systems do
not support the reporting needs of the business.
Data Mart
A data mart is a subset of a data warehouse
built to serve a particular department,
region, or business unit.
Every department of a business has a central
repository or data mart to store data.
The data from the data mart is stored in the
ODS periodically.
The ODS then sends the data to the EDW,
where it is stored & used.
Evolution of Business Intelligence (BI)
Business intelligence for an organization
requires two environments:
Transformation of data to information;
Derivation of knowledge from information.
Business intelligence (BI), therefore, is a broad
group of applications & technologies.
First, the term refers to the systems &
technologies for gathering, cleaning,
consolidating, & storing corporate data.
Next, business intelligence (BI) relates to the
tools, techniques, & applications for analyzing
the stored data.
BI is an umbrella term covering concepts &
methods that improve business decision making
through fact-based support systems.
BI: Two Environments
When you consider all that BI encompasses,
you may view BI for an enterprise as composed
of two environments:
Data to Information
In this environment data from multiple
operational systems are extracted,
integrated, cleansed, transformed &
stored as information in specially
designed repositories.
Information to Knowledge
In this environment analytical tools are
made available to users to access &
analyze the information content in the
specially designed repositories & turn
information into knowledge.
Figure 1-10 shows the two complementary
environments: the data warehousing
environment, which transforms data into
information, & the analytical environment,
which produces knowledge from information.
Figure 1-10 BI: data warehousing & analytical environments
Common functions of business intelligence
technologies include:
Reporting
Online analytical processing
Data mining
Process mining
Complex event processing
Business performance management
Text mining
Predictive analytics
Prescriptive analytics
Traditional vs. cloud-based data warehouse
Traditional data warehouses:
Hosted on-premises, with data flowing in
from relational databases, transactional
systems, business applications, & other
source systems.
Typically designed to capture a subset of
data in batches & store it, making them
unsuitable for unstructured queries or real-
time analysis.
Companies also must purchase their own
hardware & software with an on-premises
data warehouse, making it expensive to
scale & maintain.
Storage is typically limited compared to
compute, so data is transformed quickly &
then discarded to keep storage space free.
Cloud-based data warehouse:
Today, data analytics has moved to the
center of all core business activities,
including revenue generation, cost
containment, improving operations, &
enhancing customer experiences.
As data evolves & diversifies, organizations
need more robust data warehouse solutions
& advanced analytic tools for storing,
managing, & analyzing large quantities of
data across their organizations.
These systems must be scalable, reliable,
secure enough for regulated industries, &
flexible enough to support a wide variety of
data types & big data use cases.
Architecture of a Data Warehouse
The data stored in the warehouse is uploaded
from the operational systems.
There are two main approaches used to build a
data warehouse system:
Extract, transform, load (ETL)
Extract, load, transform (ELT)
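The two approaches differ mainly in where the transform step runs. The sketch below is a minimal illustration, assuming a tiny hypothetical data set & using Python's built-in sqlite3 module as a stand-in for the warehouse; the function names etl & elt & the table names are made up for the example.

```python
import sqlite3

raw = [("ali", "120.5"), ("SARA", "80")]  # hypothetical extracted rows


def etl(rows):
    """ETL: transform in the pipeline, then load the finished rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    transformed = [(name.title(), float(amount)) for name, amount in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    return conn.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


def elt(rows):
    """ELT: load the raw rows first, then transform inside the warehouse with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(SUBSTR(customer, 1, 1)) || LOWER(SUBSTR(customer, 2)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM raw_sales"
    )
    return conn.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


print(etl(raw))
print(elt(raw))
```

Both functions end with the same cleaned table. With ELT the raw rows also remain available in raw_sales, so they can be transformed again later in a different way.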
Key Characteristics of Data Warehouse
Subject-Oriented
A data warehouse is subject-oriented since
it provides topic-wise information rather
than the overall processes of a business.
Such subjects may be sales, promotion,
inventory, etc.
For example, if you want to analyze your
company’s sales data, you need to build a
data warehouse that concentrates on sales.
Such a warehouse would provide valuable
information like ‘who was your best
customer last year?’ or ‘who is likely to be
your best customer in the coming year?’
Integrated
A data warehouse is developed by
integrating data from varied sources into a
consistent format.
The data must be stored in the warehouse in
a consistent & universally acceptable
manner in terms of naming, format, &
coding.
This facilitates effective data analysis.
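As a rough illustration of consistent naming & coding, the sketch below maps records from two hypothetical sources onto one set of field names & one gender code. The source records, the field maps, & the GENDER_CODES table are all assumptions made for the example.

```python
# Hypothetical per-source encodings of the same attribute.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def integrate(record, field_map):
    """Rename source fields to warehouse names and normalize the gender coding."""
    unified = {field_map[key]: value for key, value in record.items()}
    unified["gender"] = GENDER_CODES[str(unified["gender"]).strip().lower()]
    return unified


crm_record = {"cust_name": "Ali", "sex": "male"}
pos_record = {"customer": "Sara", "gender_flag": 0}

rows = [
    integrate(crm_record, {"cust_name": "customer_name", "sex": "gender"}),
    integrate(pos_record, {"customer": "customer_name", "gender_flag": "gender"}),
]
print(rows)
```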
Non-Volatile
Data once entered into a data warehouse
must remain unchanged.
All data is read-only.
Previous data is not erased when current
data is entered.
This helps you to analyze what has
happened & when.
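One way to picture non-volatility is an append-only store: new rows are added, but existing rows are never updated or deleted. The class below is a hypothetical sketch of that idea, not how any specific warehouse product enforces it.

```python
class AppendOnlyStore:
    """Hypothetical store that accepts new rows but never updates or deletes."""

    def __init__(self):
        self._rows = []

    def append(self, row):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._rows.append(dict(row))

    def rows(self):
        # Hand out copies so readers cannot modify stored history.
        return [dict(row) for row in self._rows]


store = AppendOnlyStore()
store.append({"product": "pen", "price": 1.0, "as_of": "2024-01-01"})
store.append({"product": "pen", "price": 1.2, "as_of": "2024-02-01"})  # old row is kept

history = store.rows()
history[0]["price"] = 99  # changes only the reader's copy
print(store.rows())
```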
Time-Variant
The data stored in a data warehouse is
documented with an element of time, either
explicitly or implicitly.
Time variance shows up, for example, in the
primary key, which must include an element
of time such as the day, week, or month.
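A key that includes a time element might look like the sketch below, which uses Python's built-in sqlite3 module. The product_price table & its columns are hypothetical; the point is only that the date is part of the primary key, so the same product can have one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_price ("
    " product TEXT, snapshot_date TEXT, price REAL,"
    " PRIMARY KEY (product, snapshot_date))"
)
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?, ?)",
    [("pen", "2024-01-01", 1.0), ("pen", "2024-02-01", 1.2)],
)

# The same product appears once per date, so its history can be queried.
price_history = conn.execute(
    "SELECT snapshot_date, price FROM product_price "
    "WHERE product = ? ORDER BY snapshot_date",
    ("pen",),
).fetchall()

# A second row for the same product and date violates the key.
try:
    conn.execute("INSERT INTO product_price VALUES ('pen', '2024-02-01', 9.9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(price_history, duplicate_rejected)
```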
Data Warehousing Tools
Data warehouse tools are software
components used to perform several
operations on an extensive data set.
These tools help to collect, read, write &
transfer data from various sources.
Data warehouses are designed to
support operations like data sorting, filtering,
merging, etc.
Data warehouse applications can be
categorized as:
Query & reporting tools
Application Development tools
Data mining tools
OLAP tools
Some popular data warehouse tools are
Xplenty, Amazon Redshift, Teradata,
Oracle 12c, Informatica, IBM InfoSphere,
Cloudera, & Panoply.
End of Chapter 3
Question / Discussion?