Unit5 DM&DW
Data Warehouse
The major distinguishing features of OLTP and OLAP are summarized as follows:
• Data modeling refers to the process of handling and designing the data model
within a data warehouse platform.
• It consists of designing an appropriate database schema so that the data can
be stored and made useful to users.
• Data warehouse modeling includes models, different schemas, measures, and
concept hierarchies to design the structure of the data warehouse.
The most popular data model for a data warehouse is a multidimensional model.
Such a model can exist in the form of a star schema, a snowflake schema, or a fact
constellation schema.
• Star schema: The most common modeling paradigm is the star schema, in
which the data warehouse contains:
(1) a large central table (fact table) containing the bulk of the data, with no
redundancy, and
(2) a set of smaller attendant tables (dimension tables), one for each
dimension.
The schema graph resembles a starburst, with the dimension tables
displayed in a radial pattern around the central fact table.
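As a minimal sketch (using Python's built-in sqlite3 module; the table names time_dim, item_dim, location_dim, and sales_fact and all values are hypothetical), a star schema with one central fact table and three dimension tables can be created and queried with a typical "star join":

```python
import sqlite3

# Hypothetical star schema: one central fact table (sales_fact) surrounded by
# three dimension tables (time_dim, item_dim, location_dim).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, item_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: a foreign key to every dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")

conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "Q1", 2024), (2, "Q2", 2024)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?)",
                 [(1, "TV", "electronics"), (2, "Phone", "electronics")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "Vancouver", "Canada"), (2, "Chicago", "USA")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 5000.0),
                  (1, 2, 2, 20, 8000.0),
                  (2, 1, 2, 5, 2500.0)])

# Star join: the fact table is joined to the dimensions it needs,
# then the measure is aggregated by dimension attributes.
rows = conn.execute("""
SELECT t.quarter, l.country, SUM(f.dollars_sold)
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
""").fetchall()

for quarter, country, total in rows:
    print(quarter, country, total)
```

The bulky, frequently growing data lives only in the fact table, while descriptive attributes such as city or quarter are looked up through the small dimension tables.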
OLAP :
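OLAP stands for Online Analytical Processing, which is a technology that enables
multidimensional analysis of business data. Its characteristics are commonly
summarized by the FASMI test (Fast Analysis of Shared Multidimensional Information):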
Fast :
The system is targeted to deliver most responses to the user within about five
seconds, with the simplest analyses taking no more than one second and very few
taking more than 20 seconds.
Analysis :
The system can cope with any business logic and statistical analysis that is
relevant for the application and the user, and keeps it easy enough for the target
user. Although some pre-programming may be required, it is not acceptable for all
application definitions to require programming: the system must allow the user to
define new ad hoc calculations as part of the analysis and to report on the data in
any desired way.
Shared :
The system implements all the security requirements for confidentiality (possibly
down to cell level) and, if multiple write access is required, concurrent update
locking at a suitable level. Not all applications require users to write data back,
but for the increasing number that do, the system must be able to handle multiple
updates in an appropriate, secure manner.
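Multidimensional :
The system must provide a multidimensional conceptual view of the data, including
full support for hierarchies.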
Information :
The system should be able to hold all the data needed by the applications. Data
sparsity should be handled in an efficient manner.
• OLAP operations allow users to manipulate and analyze the data cube to derive
meaningful insights.
❖ Roll-up:
The roll-up operation (also called the drill-up operation by some vendors)
performs aggregation on a data cube, either by climbing up a concept hierarchy for a
dimension or by dimension reduction.
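Both forms of roll-up can be sketched in plain Python, assuming a tiny hypothetical cube keyed by (city, month) and an assumed month < quarter concept hierarchy; the names sales, MONTH_TO_QUARTER, roll_up_hierarchy, and roll_up_reduce are illustrative only:

```python
from collections import defaultdict

# Hypothetical base data: (city, month) -> sales.
sales = {
    ("Vancouver", "Jan"): 100, ("Vancouver", "Feb"): 150, ("Vancouver", "Apr"): 200,
    ("Toronto", "Jan"): 80, ("Toronto", "Mar"): 120, ("Toronto", "May"): 60,
}

# Assumed concept hierarchy for time: month < quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def roll_up_hierarchy(cube, mapping):
    """Climb the time hierarchy: aggregate month-level cells into quarters."""
    result = defaultdict(int)
    for (city, month), value in cube.items():
        result[(city, mapping[month])] += value
    return dict(result)

def roll_up_reduce(cube):
    """Dimension reduction: drop the city dimension and sum over it."""
    result = defaultdict(int)
    for (_city, month), value in cube.items():
        result[month] += value
    return dict(result)

by_quarter = roll_up_hierarchy(sales, MONTH_TO_QUARTER)
by_month = roll_up_reduce(sales)
print(by_quarter)
print(by_month)
```

In both cases the result has fewer, coarser cells than the input; only the way detail is discarded differs.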
❖ Drill-down:
Drill-down is the reverse of roll-up. It navigates from less detailed data to
more detailed data.
• For example, drill-down occurs by descending the time hierarchy from the level of
quarter to the more detailed level of month. The resulting data cube details the total
sales per month rather than summarizing them by quarter.
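A minimal sketch of the quarter-to-month drill-down described above, assuming the month-level detail is still stored (drill-down can only reveal detail that exists in the underlying data); the data and the names monthly_sales, QUARTER_MONTHS, quarterly_view, and drill_down are hypothetical:

```python
# Hypothetical detailed data kept at month level: month -> total sales.
monthly_sales = {"Jan": 100, "Feb": 150, "Mar": 120, "Apr": 200, "May": 60, "Jun": 90}

# Assumed time hierarchy: month < quarter.
QUARTER_MONTHS = {"Q1": ["Jan", "Feb", "Mar"], "Q2": ["Apr", "May", "Jun"]}

def quarterly_view(monthly):
    """Summarized view: total sales per quarter."""
    return {q: sum(monthly[m] for m in months) for q, months in QUARTER_MONTHS.items()}

def drill_down(monthly, quarter):
    """Descend from one quarter to its months (less detail -> more detail)."""
    return {m: monthly[m] for m in QUARTER_MONTHS[quarter]}

print(quarterly_view(monthly_sales))
print(drill_down(monthly_sales, "Q1"))
```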
❖ Pivot (rotate):
Pivot (also called rotate) is a visualization operation that rotates the data axes
in view in order to provide an alternative presentation of the data.
• It may involve swapping the rows and columns, or moving one of the row
dimensions into the column dimensions.
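The row/column swap can be sketched on a hypothetical 2-D view of the cube stored as a nested dict (items as rows, locations as columns); the pivot function name is illustrative. No values change, only the orientation of the axes:

```python
# Hypothetical 2-D view of the cube: rows = item, columns = location.
view = {
    "TV":    {"Vancouver": 605, "Toronto": 825},
    "Phone": {"Vancouver": 14,  "Toronto": 400},
}

def pivot(table):
    """Rotate the axes: column keys become row keys and vice versa."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

rotated = pivot(view)
print(rotated)
```

Applying the rotation twice returns the original view, which reflects that pivot is a presentation change rather than an aggregation.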
Every project that builds a multidimensional data model should follow these
stages:
Stage 1 : Assembling data from the client : In the first stage, the correct data
is collected from the client.
Stage 2 : Grouping different segments of the system : In the second stage, all
the data is recognized and classified into the sections it belongs to, so that the
model can be built step by step without problems.
Stage 3 : Identifying the different dimensions : In this stage, the main factors are
recognized according to the user’s point of view. These factors are also known as
“Dimensions”.
Stage 5 : Finding the facts for the factors listed previously and their
qualities : In the fifth stage, the facts (the measurable values) are separated and
distinguished from the factors (dimensions) collected earlier.
Stage 6 : Building the schema to place the data, based on the information
collected in the steps above : In the sixth stage, a schema is built on the basis of
the data collected previously.
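As a rough illustration of Stage 6, the dimensions (Stage 3) and facts (Stage 5) identified earlier can be turned mechanically into star-schema tables. This is a minimal sketch: the dimension names, attributes, measures, and the build_star_schema helper are hypothetical, and SQLite (Python's sqlite3 module) is used only to check that the generated DDL is valid:

```python
import sqlite3

# Hypothetical outputs of the earlier stages:
#   Stage 3 -> dimensions (each with its attributes),
#   Stage 5 -> facts (measures) to be recorded.
dimensions = {
    "time": ["day", "month", "quarter", "year"],
    "item": ["item_name", "brand", "type"],
    "branch": ["branch_name", "branch_type"],
}
measures = ["dollars_sold", "units_sold"]

def build_star_schema(dims, facts, fact_name="sales"):
    """Stage 6: turn identified dimensions and facts into star-schema DDL."""
    statements = []
    for dim, attrs in dims.items():
        cols = ", ".join(f"{a} TEXT" for a in attrs)
        statements.append(
            f"CREATE TABLE {dim}_dim ({dim}_key INTEGER PRIMARY KEY, {cols});")
    # The fact table holds one foreign key per dimension plus the measures.
    fk_cols = ", ".join(
        f"{dim}_key INTEGER REFERENCES {dim}_dim({dim}_key)" for dim in dims)
    measure_cols = ", ".join(f"{m} REAL" for m in facts)
    statements.append(f"CREATE TABLE {fact_name}_fact ({fk_cols}, {measure_cols});")
    return statements

ddl = build_star_schema(dimensions, measures)
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(ddl))
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)
```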
Data cubes in data mining can be classified into two main categories:
1. Multidimensional data cube – This type of data cube in data mining is based
on the concept of dimensions and measures.
2. Relational data cube – This type of data cube in data mining is based on the
relational database model and represents data in tables with rows and
columns.
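The difference between the two categories can be sketched by storing the same tiny, hypothetical cube both ways: as an array addressed by dimension positions (loosely how multidimensional storage works) and as rows with one column per dimension (the relational form). All names and values here are illustrative assumptions:

```python
# The same tiny cube stored two ways (hypothetical data).
times = ["Q1", "Q2"]
items = ["TV", "Phone"]

# 1. Multidimensional (array-style): md_cube[time_index][item_index] holds the measure.
md_cube = [
    [605, 14],   # Q1: TV, Phone
    [680, 31],   # Q2: TV, Phone
]

# 2. Relational: one row per cell, dimension values stored as columns.
rel_cube = [
    ("Q1", "TV", 605), ("Q1", "Phone", 14),
    ("Q2", "TV", 680), ("Q2", "Phone", 31),
]

def md_lookup(time, item):
    # Direct positional access into the array.
    return md_cube[times.index(time)][items.index(item)]

def rel_lookup(time, item):
    # Scan rows and filter on the dimension columns (like a SQL WHERE clause).
    return next(v for t, i, v in rel_cube if t == time and i == item)

# Both representations answer the same question identically.
for t in times:
    for i in items:
        assert md_lookup(t, i) == rel_lookup(t, i)
print(md_lookup("Q2", "TV"))
```

The array form gives direct cell access but reserves space for every combination of dimension values, whereas the relational form stores only the rows that exist and answers lookups by filtering.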
1. Requirement Analysis
Understand the business requirements to determine what dimensions and
measures are needed. This includes identifying the key metrics and
dimensions of analysis, such as time, geography, and product categories.
2. Schema Design :
Design the database schema to support the data cube. This typically
involves creating a star schema or a snowflake schema.
• Star Schema: Central fact table connected to multiple dimension tables.
• Snowflake Schema: A normalized form of the star schema where dimension
tables are further broken down into related tables.
3. ETL Process
Extract, Transform, and Load (ETL) data into the data warehouse.
Once the cube is created and processed, you can query it using MDX
(Multidimensional Expressions) or other supported query languages.
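The ETL step can be sketched with Python's standard csv, io, and sqlite3 modules. This is a minimal sketch under stated assumptions: the CSV layout, the cleaning rules (trim and title-case city names, derive a quarter from the date, skip rows with a missing amount), and the sales_fact table are all hypothetical:

```python
import csv
import io
import sqlite3

# Extract: a hypothetical CSV export from an operational (OLTP) source.
RAW = """date,store,product,amount
2024-01-15,Vancouver,TV,605.00
2024-02-03, vancouver ,Phone,14
2024-04-20,Toronto,TV,
2024-05-11,Toronto,TV,825.5
"""

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean and reshape rows: normalize names, derive quarter, drop bad rows."""
    clean = []
    for r in rows:
        if not r["amount"].strip():
            continue  # skip rows with a missing measure
        month = int(r["date"][5:7])
        clean.append((
            f"Q{(month - 1) // 3 + 1}",   # derived time attribute
            r["store"].strip().title(),   # normalize city spelling
            r["product"].strip(),
            float(r["amount"]),
        ))
    return clean

def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(quarter TEXT, city TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
totals = conn.execute("SELECT city, SUM(amount) FROM sales_fact "
                      "GROUP BY city ORDER BY city").fetchall()
print(totals)
```

Without the transform step, " vancouver " and "Vancouver" would be counted as two different cities, which is the kind of inconsistency ETL is meant to remove before the data reaches the cube.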
OLAP implementation
When implementing an OLAP system, there are a few key considerations to keep
in mind:
1. Data Model Design: Carefully design the data model to align with the analytical
requirements of the organization. This includes defining dimensions, hierarchies, and
measures.
2. Data Integration: Ensure seamless integration of data from various sources into
the OLAP database. This may involve data extraction, transformation, and loading
(ETL) processes.
To choose and implement the most suitable solution, your team needs to define the
business objectives first. This is one of the most important steps, as only a clear
understanding of what you need and how to get it can lead to success. The next stage
is to identify the strategy.
5. Data preparation
First of all, it is important to learn as much as possible about the system. That
is why, before preparing your data to be transferred into the new system, you should
check the OLAP data characteristics first.
Make sure that the conditions suit you, and then start your data preparation.
▪ ROLAP
▪ MOLAP
▪ HOLAP
7. Review: Summarize all the requirements, data, and steps before implementation.
Step 2 : Select the data required for moving into the OLAP system.
OLAP Software
OLAP (Online Analytical Processing) software is a critical component in the
field of data warehousing, enabling complex analytical and ad-hoc queries with a
rapid execution time.
Key Concepts
Cubes: OLAP organizes data into cubes instead of traditional tables. A cube is
a multi-dimensional array of data, and each dimension represents a different
attribute (e.g., time, geography, product).
▪ Fast Query Performance: OLAP systems are optimized for quick query
execution to support real-time analysis.
▪ Complex Calculations: Supports complex calculations and aggregations, such
as SUM, AVG, COUNT, etc.
▪ Data Drilling: Allows users to drill down into details or roll up to higher-level
summaries.
▪ Slicing and Dicing: Enables data to be viewed from different perspectives by
slicing along dimensions or dicing subsets of the cube.
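Slicing and dicing can be sketched on a small, hypothetical 3-D cube stored as a dict keyed by (time, location, item). A slice fixes one dimension to a single value; a dice keeps a sub-cube by selecting value sets on two or more dimensions. The names cube, DIMS, slice_cube, and dice_cube are illustrative only:

```python
# Hypothetical 3-D cube stored as {(time, location, item): sales}.
cube = {
    ("Q1", "Vancouver", "TV"): 605, ("Q1", "Vancouver", "Phone"): 14,
    ("Q1", "Toronto", "TV"): 825, ("Q1", "Toronto", "Phone"): 400,
    ("Q2", "Vancouver", "TV"): 680, ("Q2", "Vancouver", "Phone"): 31,
    ("Q2", "Toronto", "TV"): 952, ("Q2", "Toronto", "Phone"): 512,
}
DIMS = ("time", "location", "item")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to one value; the result has one fewer dimension."""
    axis = DIMS.index(dim)
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}

def dice_cube(cube, **selections):
    """Dice: keep cells whose values fall in the chosen sets for the given dimensions."""
    return {key: v for key, v in cube.items()
            if all(key[DIMS.index(d)] in allowed for d, allowed in selections.items())}

q1 = slice_cube(cube, "time", "Q1")
sub = dice_cube(cube, time={"Q1", "Q2"}, location={"Toronto"}, item={"TV"})
print(q1)
print(sub)
```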
Benefits