
OLAP Operations in The Multidimensional Data Model

The document discusses OLAP operations within the multidimensional data model, emphasizing how data is organized into multiple dimensions for analysis and querying. It explains various OLAP operations such as roll-up, drill-down, slice, dice, and pivot, which facilitate interactive data analysis. Additionally, it contrasts OLAP systems with OLTP systems, highlighting their different purposes, users, and data management approaches.

Uploaded by

delir51132
Copyright © All Rights Reserved

OLAP Operations in the Multidimensional Data Model:

In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies. This organization provides users with the flexibility to view data from different perspectives. A number of OLAP data cube operations exist to materialize these different views, allowing interactive querying and analysis of the data at hand. Hence, OLAP provides a user-friendly environment for interactive data analysis.

Roll-up: The roll-up operation (also called the drill-up operation by some vendors) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Figure 2.10 shows the result of a roll-up operation performed on the central cube by climbing up the concept hierarchy for location. This hierarchy was defined as the total order "street < city < province or state < country." The roll-up operation shown aggregates the data by ascending the location hierarchy from the level of city to the level of country. In other words, rather than grouping the data by city, the resulting cube groups the data by country. When roll-up is performed by dimension reduction, one or more dimensions are removed from the given cube. For example, consider a sales data cube containing only the two dimensions location and time. Roll-up may be performed by removing, say, the time dimension, resulting in an aggregation of the total sales by location, rather than by location and by time.

Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions. Figure 2.10 shows the result of a drill-down operation performed on the central cube by stepping down a concept hierarchy for time defined as "day < month < quarter < year." Drill-down occurs by descending the time hierarchy from the level of quarter to the more detailed level of month. The resulting data cube details the total sales per month rather than summarizing them by quarter. Because a drill-down adds more detail to the given data, it can also be performed by adding new dimensions to a cube. For example, a drill-down on the central cube of Figure 2.10 can occur by introducing an additional dimension, such as customer group.

Slice and dice: The slice operation performs a selection on one dimension of the given cube, resulting in a subcube. Figure 2.10 shows a slice operation where the sales data are selected from the central cube for the dimension time using the criterion time = "Q1". The dice operation defines a subcube by performing a selection on two or more dimensions. Figure 2.10 shows a dice operation on the central cube based on the following selection criteria that involve three dimensions: (location = "Toronto" or "Vancouver") and (time = "Q1" or "Q2") and (item = "home entertainment" or "computer").

Pivot (rotate): Pivot (also called rotate) is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data. Figure 2.10 shows a pivot operation where the item and location axes in a 2-D slice are rotated. Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a series of 2-D planes.

Other OLAP operations: Some OLAP systems offer additional drilling operations. For example, drill-across executes queries involving (i.e., across) more than one fact table. The drill-through operation uses relational SQL facilities to drill through the bottom level of a data cube down to its back-end relational tables. Other OLAP operations may include ranking the top N or bottom N items in lists, as well as computing moving averages, growth rates, interests, internal rates of return, depreciation, currency conversions, and statistical functions. OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios, variance, and so on, and for computing measures across multiple dimensions. It can generate summarizations, aggregations, and hierarchies at each granularity level and at every dimension intersection. OLAP also supports functional models for forecasting, trend analysis, and statistical analysis. In this context, an OLAP engine is a powerful data analysis tool.

2.3 Data Warehouse and OLAP Technology

The construction of a data warehouse requires data cleaning, data integration, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows "knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term "data warehousing" to refer only to the process of data warehouse construction, while the term "warehouse DBMS" is used to refer to the management and utilization of data warehouses.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.

Data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become popular in industry.

Differences between Operational Database Systems and Data Warehouses

Because most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems. The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems. The major distinguishing features between OLTP and OLAP are summarized as follows:

• Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
• Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.
• Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model (to be discussed in Section 2.2.2) and a subject-oriented database design.
• View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because …
• Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations.

Table 2.1 Comparison between OLTP and OLAP systems.

Measures: Their Categorization and Computation

"How are measures computed?" To answer this question, we first study how measures can be categorized. Note that a multidimensional point in the data cube space can be defined by a set of dimension-value pairs, for example, (time = "Q1", location = "Vancouver", item = "computer"). A data cube measure is a numerical function that can be evaluated at each point in the data cube space. A measure value is computed for a given point by aggregating the data corresponding to the respective dimension-value pairs defining the given point. We will look at concrete examples of this shortly. Measures can be organized into three categories (i.e., distributive, algebraic, holistic), based on the kind of aggregate functions used.

Distributive: An aggregate function is distributive if it can be computed in a distributed manner as follows. Suppose the data are partitioned into n sets. We apply the function to each partition, resulting in n aggregate values. If the result derived by applying the function to the n aggregate values is the same as that derived by applying the function to the entire data set (without partitioning), the function can be computed in a distributed manner. For example, count() can be computed for a data cube by first partitioning the cube into a set of subcubes, computing count() for each subcube, and then summing up the counts obtained for each subcube. Hence, count() is a distributive aggregate function.

Algebraic: An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded positive integer), each of which is obtained by applying a distributive aggregate function. For example, avg() (average) can be computed by sum()/count(), where both sum() and count() are distributive aggregate functions.

Holistic: An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. That is, there does not exist an algebraic function with M arguments (where M is a constant) that characterizes the computation. Common examples of holistic functions include median(), mode(), and rank(). A measure is holistic if it is obtained by applying a holistic aggregate function.

Concept Hierarchies

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Consider a concept hierarchy for the dimension location. City values for location include Vancouver, Toronto, New York, and Chicago. Each city, however, can be mapped to the province or state to which it belongs. For example, Vancouver can be mapped to British Columbia, and Chicago to Illinois. The provinces and states can in turn be mapped to the country to which they belong, such as Canada or the USA. These mappings form a concept hierarchy for the dimension location, mapping a set of low-level concepts (i.e., cities) to higher-level, more general concepts (i.e., countries). The concept hierarchy described above is illustrated in Figure 2.7.

Many concept hierarchies are implicit within the database schema. For example, suppose that the dimension location is described by the attributes number, street, city, province or state, zipcode, and country. These attributes are related by a total order, forming a concept hierarchy such as "street < city < province or state < country". This hierarchy is shown in Figure 2.8(a). Alternatively, the attributes of a dimension may be organized in a partial order, forming a lattice. An example of a partial order for the time dimension based on the attributes day, week, month, quarter, and year is "day < {month < quarter; week} < year". This lattice structure is shown in Figure 2.8(b). A concept hierarchy …

Figure 2.7 A concept hierarchy for the dimension location. Due to space limitations, not all of the nodes of the hierarchy are shown (as indicated by the use of "ellipsis" between nodes).

Figure 2.8 Hierarchical and lattice structures of attributes in warehouse dimensions: (a) a hierarchy for location; (b) a lattice for time.

Cubes & Tables: "What is a data cube?" A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts. In general terms, dimensions are the perspectives or entities with respect to which an organization wants to keep records. For example, AllElectronics may create a sales data warehouse in order to keep records of the store's sales with respect to the dimensions time, item, branch, and location. These dimensions allow the store to keep track of things like monthly sales of items and the branches and locations at which the items were sold. Each dimension may have a table associated with it, called a dimension table, which further describes the dimension. For example, a dimension table for item may contain the attributes item name, brand, and type. Dimension tables can be specified by users or experts, or automatically generated and adjusted based on data distributions.

A 2-D view of sales data for AllElectronics according to the dimensions time and item, where the sales are from branches located in the city of Vancouver. The measure displayed is dollars sold (in thousands).

A multidimensional data model is typically organized around a central theme, like sales, for instance. This theme is represented by a fact table. Facts are numerical measures. Think of them as the quantities by which we want to analyze relationships between dimensions. Examples of facts for a sales data warehouse include dollars sold (sales amount in dollars), units sold (number of units sold), and amount budgeted. The fact table contains the names of the facts, or measures, as well as keys to each of the related dimension tables.

Although we usually think of cubes as 3-D geometric structures, in data warehousing the data cube is n-dimensional. To gain a better understanding of data cubes and the multidimensional data model, note that in the data warehousing literature a cube of this kind is often referred to as a cuboid. Given a set of dimensions, we can generate a cuboid for each of the possible subsets of the given dimensions. The result would form a lattice of cuboids, each showing the data at a different level of summarization, or group by. The lattice of cuboids is then referred to as a data cube. The corresponding figure shows a lattice of cuboids forming a data cube for the dimensions time, item, location, and supplier. The cuboid that holds the lowest level of summarization is called the base cuboid. For example, the 4-D cuboid in Figure 2.2 is the base cuboid for the given time, item, location, and supplier dimensions. Figure 2.1 is a 3-D (nonbase) cuboid for time, item, and location, summarized for all suppliers. The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. In our example, this is the total sales, or dollars sold, summarized over all four dimensions. The apex cuboid is typically denoted by all.

Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases

The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities and the relationships between them. Such a data model is appropriate for on-line transaction processing. A data warehouse, however, requires a concise, subject-oriented schema that facilitates on-line data analysis. The most popular data model for a data warehouse is a multidimensional model. Such a model can exist in the form of a star schema, a snowflake schema, or a fact constellation schema. Let's look at each of these schema types.

Star schema: The most common modeling paradigm is the star schema, in which the data warehouse contains (1) a large central table (fact table) containing the bulk of the data, with no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.

Fact constellation: Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.
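The star schema described above (one central fact table holding measures and foreign keys, surrounded by one table per dimension) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the table and column names (time_dim, item_dim, location_dim, sales_fact) and every row value are hypothetical stand-ins for the AllElectronics example, not a schema taken from the source. The query shows a roll-up to country by quarter expressed as joins plus GROUP BY.

```python
import sqlite3

# Hypothetical star schema: three dimension tables around one fact table.
SCHEMA = """
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT,
                           province_or_state TEXT, country TEXT);
-- Central fact table: a foreign key to each dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
"""


def build_star():
    """Create a tiny in-memory star schema filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, "Q1"), (2, "Q2")])
    conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                     [(1, "laptop", "BrandA", "computer"),
                      (2, "television", "BrandB", "home entertainment")])
    conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?)",
                     [(1, "Vancouver", "British Columbia", "Canada"),
                      (2, "Toronto", "Ontario", "Canada"),
                      (3, "Chicago", "Illinois", "USA")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 825.0, 5), (1, 2, 2, 395.0, 3),
                      (2, 1, 3, 400.0, 2), (2, 1, 1, 952.0, 6)])
    conn.commit()
    return conn


# Roll-up to country by quarter: join the fact table to two of its dimension tables.
ROLL_UP_SQL = """
SELECT l.country, t.quarter, SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN location_dim AS l ON f.location_key = l.location_key
JOIN time_dim     AS t ON f.time_key = t.time_key
GROUP BY l.country, t.quarter
ORDER BY l.country, t.quarter
"""

if __name__ == "__main__":
    conn = build_star()
    for row in conn.execute(ROLL_UP_SQL):
        print(row)
```

A fact constellation would add a second fact table (for example, a hypothetical shipping_fact) whose foreign keys reference the same time_dim and item_dim tables, so both "stars" share those dimensions.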
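The lattice of cuboids described in the Cubes & Tables section (one group-by per subset of the dimensions, from the base cuboid down to the apex cuboid "all") can be enumerated with itertools.combinations. This is a sketch only: the base-cuboid cells, the supplier codes SUP1/SUP2, and the helper names cuboid_lattice and compute_cuboid are hypothetical, and a real OLAP engine would materialize cuboids far more efficiently than this one-pass dictionary aggregation.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dims):
    """All group-bys over `dims`, keyed by cuboid dimensionality (0 = apex, len(dims) = base)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


def compute_cuboid(base, dims, group_by):
    """Aggregate base-cuboid cells (keys ordered as `dims`) onto the dimensions in `group_by`."""
    positions = [dims.index(d) for d in group_by]
    result = defaultdict(int)
    for key, value in base.items():
        result[tuple(key[p] for p in positions)] += value
    return dict(result)


# Toy base cuboid: (time, item, location, supplier) -> dollars sold (made-up values).
BASE = {
    ("Q1", "computer", "Vancouver", "SUP1"): 300,
    ("Q1", "computer", "Toronto", "SUP2"): 200,
    ("Q2", "phone", "Vancouver", "SUP1"): 100,
}

if __name__ == "__main__":
    for k, cuboids in cuboid_lattice(DIMENSIONS).items():
        names = ["all" if not c else ", ".join(c) for c in cuboids]
        print(f"{k}-D cuboids ({len(cuboids)}): {'; '.join(names)}")
    print(compute_cuboid(BASE, DIMENSIONS, ("item",)))  # {('computer',): 500, ('phone',): 100}
    print(compute_cuboid(BASE, DIMENSIONS, ()))         # apex cuboid: {(): 600}
```

With four dimensions the lattice has 2^4 = 16 cuboids: one 0-D apex, four 1-D, six 2-D, four 3-D, and one 4-D base cuboid.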
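The two forms of roll-up described under OLAP Operations (climbing a concept hierarchy, and dimension reduction) can be sketched in plain Python. This is a minimal illustration, not an OLAP engine. It assumes the cube is a dict keyed by (quarter, item, city) tuples. The city-to-country mapping is a hypothetical fragment of the location hierarchy, the dollar figures are made up, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical fragment of the location hierarchy (city -> country).
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

# Toy cube at the city level: (quarter, item, city) -> dollars sold (made-up values).
SALES = {
    ("Q1", "computer", "Vancouver"): 825,
    ("Q1", "computer", "Toronto"): 605,
    ("Q1", "computer", "Chicago"): 400,
    ("Q2", "computer", "Vancouver"): 952,
    ("Q2", "phone", "New York"): 512,
}


def roll_up_hierarchy(cube, city_to_country):
    """Climb the location hierarchy from city to country and re-aggregate."""
    result = defaultdict(int)
    for (quarter, item, city), dollars in cube.items():
        result[(quarter, item, city_to_country[city])] += dollars
    return dict(result)


def roll_up_reduce(cube, keep):
    """Dimension reduction: keep only the key positions listed in `keep`, summing out the rest."""
    result = defaultdict(int)
    for key, dollars in cube.items():
        result[tuple(key[i] for i in keep)] += dollars
    return dict(result)


if __name__ == "__main__":
    print(roll_up_hierarchy(SALES, CITY_TO_COUNTRY))
    # Remove the time and item dimensions, leaving total sales by city.
    print(roll_up_reduce(SALES, keep=(2,)))
```

Both functions preserve the grand total: roll-up only changes the level at which the same sales are grouped.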
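Drill-down, described under OLAP Operations as stepping down the time hierarchy from quarter to month, only works when the finer-grained data are stored or can be recomputed. The sketch below assumes month-level sales are kept. It uses a hypothetical "month < quarter" mapping and made-up values to show the quarterly summary and the drill-down from one quarter cell to its months.

```python
from collections import defaultdict

# Hypothetical slice of the time hierarchy "month < quarter".
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

# Toy month-level sales for one item and city (made-up values).
MONTHLY_SALES = {"Jan": 150, "Feb": 100, "Mar": 150, "Apr": 200, "May": 120, "Jun": 80}


def quarterly_view(monthly):
    """The summarized view: total sales per quarter."""
    totals = defaultdict(int)
    for month, dollars in monthly.items():
        totals[MONTH_TO_QUARTER[month]] += dollars
    return dict(totals)


def drill_down(monthly, quarter):
    """Step down from one quarter to the months that make it up."""
    return {month: dollars for month, dollars in monthly.items()
            if MONTH_TO_QUARTER[month] == quarter}


if __name__ == "__main__":
    print(quarterly_view(MONTHLY_SALES))    # {'Q1': 400, 'Q2': 400}
    print(drill_down(MONTHLY_SALES, "Q1"))  # {'Jan': 150, 'Feb': 100, 'Mar': 150}
```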
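The slice and dice operations described under OLAP Operations are selections on a cube. The sketch below reuses the selection criteria from the text (time = "Q1" for the slice, and the three-dimension condition for the dice) on a toy cube stored as a dict keyed by (location, time, item). The cell values and the names slice_cube and dice_cube are hypothetical. Here the slice also drops the fixed time dimension from the keys, which is one reasonable reading of "resulting in a subcube".

```python
# Toy 3-D cube: (location, time, item) -> dollars sold (made-up values).
DIMS = ("location", "time", "item")
CUBE = {
    ("Toronto", "Q1", "computer"): 395,
    ("Toronto", "Q2", "home entertainment"): 210,
    ("Vancouver", "Q1", "computer"): 825,
    ("Vancouver", "Q3", "phone"): 40,
    ("Chicago", "Q1", "computer"): 400,
}


def slice_cube(cube, dim, value):
    """Slice: select on one dimension and drop it, leaving a lower-dimensional subcube."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}


def dice_cube(cube, criteria):
    """Dice: keep cells whose values on two or more dimensions fall in the allowed sets."""
    pos = {d: DIMS.index(d) for d in criteria}
    return {key: v for key, v in cube.items()
            if all(key[pos[d]] in allowed for d, allowed in criteria.items())}


if __name__ == "__main__":
    print(slice_cube(CUBE, "time", "Q1"))
    print(dice_cube(CUBE, {
        "location": {"Toronto", "Vancouver"},
        "time": {"Q1", "Q2"},
        "item": {"home entertainment", "computer"},
    }))
```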
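Pivot, described under OLAP Operations as rotating the item and location axes of a 2-D slice, changes only the presentation and not the data. The sketch below assumes the slice is stored as a nested dict {row: {column: value}}. The values and the helper names pivot and show are hypothetical. Rotating twice returns the original slice.

```python
def pivot(table):
    """Rotate a 2-D slice stored as {row: {column: value}} so rows become columns."""
    rotated = {}
    for row, columns in table.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated


def show(table):
    """Print the slice as a small grid (rows sorted, columns sorted)."""
    columns = sorted({c for cols in table.values() for c in cols})
    print("".ljust(12) + "".join(c.ljust(12) for c in columns))
    for row in sorted(table):
        print(row.ljust(12) + "".join(str(table[row].get(c, "")).ljust(12) for c in columns))


# Toy 2-D slice for time = "Q1": item rows by location columns (made-up values).
Q1_SLICE = {
    "computer": {"Vancouver": 825, "Toronto": 395},
    "phone": {"Vancouver": 14, "Toronto": 33},
}

if __name__ == "__main__":
    show(Q1_SLICE)
    show(pivot(Q1_SLICE))  # now location rows by item columns
```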
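Two of the "other OLAP operations" listed above, ranking the top N or bottom N items and computing moving averages, reduce to short computations over cube cells or time series. This sketch uses heapq from the standard library. The item totals, the monthly series, and the trailing-window definition of the moving average are assumptions chosen for illustration, since the source does not specify a window convention.

```python
import heapq


def top_n(cells, n):
    """Rank the n cells with the largest measure values."""
    return heapq.nlargest(n, cells.items(), key=lambda kv: kv[1])


def bottom_n(cells, n):
    """Rank the n cells with the smallest measure values."""
    return heapq.nsmallest(n, cells.items(), key=lambda kv: kv[1])


def moving_average(series, window):
    """Trailing moving average: one value per full window of consecutive periods."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


# Made-up dollars sold per item, and monthly totals for one item.
ITEM_SALES = {"computer": 1825, "phone": 512, "TV": 395, "printer": 60}
MONTHLY = [10, 20, 30, 40]

if __name__ == "__main__":
    print(top_n(ITEM_SALES, 2))        # [('computer', 1825), ('phone', 512)]
    print(bottom_n(ITEM_SALES, 1))     # [('printer', 60)]
    print(moving_average(MONTHLY, 2))  # [15.0, 25.0, 35.0]
```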
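The Access patterns bullet contrasts OLTP's short, atomic transactions (which need recovery mechanisms) with OLAP's mostly read-only access. The sketch below illustrates only the atomicity and rollback side, using sqlite3's connection context manager. That manager commits when the block exits normally and rolls back when an exception escapes. The account table and the transfer function are hypothetical. Concurrency control between multiple users is not demonstrated here.

```python
import sqlite3


def make_db():
    """In-memory OLTP-style table with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """A short, atomic transaction: both updates commit together or neither does."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    """Current balances as {id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    db = make_db()
    transfer(db, 1, 2, 30)
    print(balances(db))  # {1: 70, 2: 80}
    try:
        transfer(db, 1, 2, 500)
    except ValueError:
        pass
    print(balances(db))  # still {1: 70, 2: 80}: the failed transfer was rolled back
```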
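The distributive, algebraic, and holistic categories from the Measures section can be checked on a small partitioned data set. In this sketch, simple list partitions stand in for subcubes, and the values and helper names are hypothetical. count() combines per-partition counts by summing them. avg() needs M = 2 distributive sub-aggregates, (sum, count), per partition. The median of per-partition medians differs from the true median here, which is one concrete counterexample (not a proof) showing why a bounded summary per partition does not suffice for a holistic function.

```python
import statistics


def partitions(values, n):
    """Split values into n partitions, standing in for the subcubes of a data cube."""
    return [values[i::n] for i in range(n)]


def distributive_count(parts):
    """count(): count each partition, then combine the n partial counts with sum()."""
    return sum(len(p) for p in parts)


def algebraic_avg(parts):
    """avg(): keep M = 2 distributive sub-aggregates (sum, count) per partition."""
    subaggregates = [(sum(p), len(p)) for p in parts]
    total = sum(s for s, _ in subaggregates)
    count = sum(c for _, c in subaggregates)
    return total / count


# Made-up measure values, split into 3 partitions.
DATA = [3, 8, 1, 9, 4, 7, 2, 6]
PARTS = partitions(DATA, 3)

if __name__ == "__main__":
    print(distributive_count(PARTS), len(DATA))         # 8 8
    print(algebraic_avg(PARTS), statistics.fmean(DATA))  # 5.0 5.0
    # Holistic: combining per-partition medians does not recover the overall median.
    median_of_medians = statistics.median([statistics.median(p) for p in PARTS])
    print(statistics.median(DATA), median_of_medians)   # 5.0 4.0
```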
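The Concept Hierarchies section distinguishes a total order such as "street < city < province or state < country" from a partial order (lattice) such as "day < {month < quarter; week} < year". One simple way to model both is a map from each level to its direct parents, with a graph search for the levels a value can be generalized to. The level names and the helpers generalizations and can_roll_up are assumptions made for this sketch. The lattice shows, for example, that week-level data cannot be rolled up to months.

```python
# Partial order "day < {month < quarter; week} < year", stored as direct parents.
TIME_LATTICE = {
    "day": {"month", "week"},
    "month": {"quarter"},
    "quarter": {"year"},
    "week": {"year"},
    "year": set(),
}

# Total order "street < city < province_or_state < country": a chain is a special case.
LOCATION_CHAIN = {
    "street": {"city"},
    "city": {"province_or_state"},
    "province_or_state": {"country"},
    "country": set(),
}


def generalizations(level, hierarchy):
    """Return every level reachable by climbing upward from `level`."""
    seen, stack = set(), [level]
    while stack:
        for parent in hierarchy[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def can_roll_up(finer, coarser, hierarchy):
    """True if data at level `finer` can be aggregated to level `coarser`."""
    return coarser in generalizations(finer, hierarchy)


if __name__ == "__main__":
    print(sorted(generalizations("day", TIME_LATTICE)))      # ['month', 'quarter', 'week', 'year']
    print(can_roll_up("week", "month", TIME_LATTICE))        # False: weeks straddle months
    print(can_roll_up("street", "country", LOCATION_CHAIN))  # True
```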
