
OLAP query processing for XML data in RDBMS

Conference Paper · May 2007

DOI: 10.1109/SWOD.2007.353190 · Source: IEEE Xplore


OLAP Query Processing for XML Data in RDBMS

Chantola KIT (1), Toshiyuki AMAGASA (1,2), and Hiroyuki KITAGAWA (1,2)

(1) Department of Computer Science, Graduate School of Systems and Information Engineering
(2) Center for Computational Sciences
University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
[email protected], {amagasa, kitagawa}@cs.tsukuba.ac.jp

Abstract

Extensible Markup Language (XML) has become an important format for data exchange and representation on the web. In addition to conventional query processing, more complex analysis of XML data is expected to become important for discovering valuable information. In this research, we investigate XML-OLAP, by which we can perform multidimensional analysis on XML data while taking XML’s features into account. Users specify an XML data-cube with XPath, and perform analytical processing with XQuery with OLAP extensions. The system is implemented on top of relational databases, and the requests for data-cube specification and analysis are translated into SQL so that they can be processed by the underlying system. We show the feasibility of the proposed scheme by experimental evaluations.

1. Introduction

Since its emergence in 1998, Extensible Markup Language (XML) [7] has become a de facto standard for data exchange and representation on the web. XML is now used in a wide spectrum of application domains, such as web documents, business documents, and log data. For this reason, in addition to conventional simple query retrieval, more complex ways of analyzing XML data are becoming more and more important for extracting useful information from massive XML data.

In our previous research [2], we proposed a model for OLAP analysis on XML data using relational databases. Specifically, to allow users to specify facts and dimensions over XML data, we employ slightly extended XPath expressions. The system extracts the corresponding XML fragments from the underlying XML database based on the fact and dimension specifications, and constructs a multidimensional XML data-cube. Users then analyze the data-cube by issuing multidimensional queries. One notable feature of the work is that we take account of the structure-based concept hierarchy, as well as the value-based concept hierarchy, which is an important characteristic of XML data.

In this paper, we discuss an approach to an XML-OLAP system using relational database systems, based on our previous work. Our contributions in this paper are as follows:

• We discuss the roll-up operation for XML data-cubes. Roll-up is an extension in SQL2003 for supporting OLAP operations on relational data-cubes. We employ the syntax and adapt it for XML data-cubes. We then discuss its implementation using the functionality of relational database systems.

• We evaluate the performance of the proposed scheme by a series of experiments. The experimental results show that the proposed scheme can deal with 100MB XML data with reasonable processing time.

The rest of this paper is organized as follows. In Section 2, we introduce preliminaries on OLAP and XML. Then, in Section 3, we discuss related work. In Section 4, we give an overview of our proposed system and the definitions of facts and dimensions of XML data, XML hierarchies, and data-cubes on XML data. We also discuss OLAP extensions to XQuery in that section. In Section 5, we describe implementation issues: relational XML storage, data-cube construction, query processing with both structure- and value-based grouping, and the “ROLLUP” operation. In Section 6, we give an experimental evaluation. Finally, in Section 7, we conclude this paper.

2. Preliminaries

In this section, we briefly overview OLAP, XML, and its query languages, XPath and XQuery.

1-4244-0904-7/07/$20.00 ©2007 IEEE

2.1. Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP) is a category of software technology that enables analysts, managers, and executives to obtain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. The information has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by users.

When considering OLAP, the star schema, the cube, and aggregation operations are the most important concepts. To represent the multidimensional data model, a star schema, which consists of a single fact table and some dimension tables, is used. Each dimension table contains columns corresponding to attributes of the dimension.

An OLAP system models the input data as a logical multidimensional cube with multiple dimensions, which provide the context for analyzing measures of interest. To analyze the data with the cube structure, various aggregation operations, namely drilling, pivoting (or rotating), and slicing-and-dicing, are used to change the number of dimensions and the resolutions of the dimensions of interest.

2.2. XML, XPath, and XQuery

XML has become the language of choice for data representation across a wide range of applications. It has been designed to represent both structured and semi-structured data. XML data is basically modeled as a labeled tree: elements and attributes are mapped into nodes, and directed nesting relationships are mapped into edges in the tree.

XML data can be queried by XML query languages such as XPath and XQuery. XPath [5] is a language for addressing portions of XML data. We can specify an XML subtree in terms of a navigational path over the XML tree, with conditions on the elements’ labels, values, and relationships among nodes along the path.

XQuery [6] is a query language designed to query collections of XML data. XQuery uses XPath as a sub-language to address specific parts of an XML document. It employs SQL-like FLWOR (FOR, LET, WHERE, ORDER BY, RETURN) expressions for performing joins.

3. Related Works

Bordawakar et al. [1] investigated various issues related to XML data analysis, and proposed a logical model for XML analysis based on the abstract tree-structured XML representation. In particular, they proposed a categorization of XML data analysis systems: 1) XML is used simply as an external representation for OLAP results, 2) relational data is extracted from XML data and then processed with existing OLAP systems, and 3) XML is used for both data representation and analysis. In order to support complex analytical operations, they also proposed new syntactical extensions to XQuery, such as “GROUP BY”, “ROLLUP”, “TOPOLOGICAL ROLLUP”, “CUBE”, and “TOPOLOGICAL CUBE”. In our research, we employ the syntax of “GROUP BY ROLLUP” and “GROUP BY TOPOLOGICAL ROLLUP” to allow users to specify OLAP operations in XQuery.

Jensen et al. [4] proposed a scheme for specifying OLAP cubes on XML data. They integrated XML and relational data at the conceptual level based on UML, which is easy for system designers and users to understand. In their scheme, a UML model is built from XML data and relational data, and the corresponding UML snowflake diagram is then created from the UML model. In particular, they considered how to handle dimensions with hierarchies and how to ensure correct aggregation.

4. An Overview of the Proposed XML-OLAP System

4.1. System Overview

[Figure 1. System overview. Left: the user's fact and dimension paths (XPath) select fragments of sales.xml and bookinfo.xml, which form the tuples of the XML data-cube (xmlcube); the cube is then analyzed with XQuery with OLAP extension. Right: the system implementation based on the path approach, in which the XML data is stored in a path table and a node table, XPath is translated into SQL to build the data-cube table, and XQuery with OLAP extension is translated into SQL to produce grouped totals.]

The left side of Figure 1 shows an overview of our proposed scheme. Based on the content of the XML data, a user first gives a fact path and some dimension paths as XPath expressions to denote his/her interest. Referring to the given fact and dimension paths, the system produces an XML cube. After getting the cube, the user can analyze the XML data-cube using XQuery with OLAP extensions.

The following discusses how an XML cube can be constructed in our system.
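As a rough illustration of the workflow in Section 4.1, the following Python sketch evaluates a fact path and two dimension paths over miniature documents and pairs the results into cube tuples. The documents, the function name `build_cube`, and the use of `xml.etree.ElementTree` (which supports only a subset of XPath) are assumptions made for this example; the paper's system evaluates the paths inside a relational database instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature documents; element names follow the sales/bookinfo example.
SALES_XML = """
<sales><area>
  <kanto><tsukuba>
    <b><t>A</t><q>10</q></b>
    <b><t>D</t><q>20</q></b>
  </tsukuba></kanto>
  <kansai><kyoto>
    <b><t>F</t><q>30</q></b>
  </kyoto></kansai>
</area></sales>
"""
BOOKINFO_XML = """
<bookinfo>
  <c name="math"><c name="linear algebra"><b><t>A</t><p>1000</p></b></c></c>
  <c name="cs">
    <c name="db"><b><t>D</t><p>2000</p></b></c>
    <c name="web"><b><t>F</t><p>3400</p></b></c>
  </c>
</bookinfo>
"""

def build_cube(sales_root, bookinfo_root):
    """Return one (title, quantity, price) tuple per fact node."""
    cube = []
    for fact in sales_root.iter("b"):            # fact path: //b
        title = fact.findtext("t")               # relative dimension path: t
        quantity = int(fact.findtext("q"))
        # Absolute dimension path with a referential constraint on the title:
        # the fact's title is substituted into the predicate before evaluation.
        price_nodes = bookinfo_root.findall(f".//b[t='{title}']/p")
        price = int(price_nodes[0].text) if price_nodes else None
        cube.append((title, quantity, price))
    return cube

cube = build_cube(ET.fromstring(SALES_XML), ET.fromstring(BOOKINFO_XML))
for row in cube:
    print(row)
```

Each printed tuple combines one sales fact with the price found in the other document through the shared title, which is the role the referential constraint plays in an absolute dimension path.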
4.2. Formal Definitions

To construct an XML data-cube, we first need to specify the fact and the dimensions. Let us look at their definitions.

Facts about an XML Data  A fact table in a traditional OLAP system stores the data items being analyzed. We define the facts in XML data after the traditional OLAP way. In order to identify the facts, we use XPath as the query language. For example, when a user wants to get information on book sales from the sales XML data shown in the upper left of Figure 2, the related data items can be obtained by the fact path $p_f$ = doc("sales.xml")//b.

Definition 1 (Fact path) A fact path ($p_f$) is an absolute XPath expression that identifies data items of interest.

Dimensions  Having fixed the fact data, we additionally need some dimensions whose values are used to group the facts together for the subsequent aggregation operations. In traditional OLAP systems, dimensions are given as independent tables associated with the fact table. In this work we define a dimension as an XPath query, but we need to take care of the relationship between the fact data and the dimensions. To ensure this relationship, a dimension path takes one of two forms: a relative path from the fact path, or an absolute path with referential constraints.

Definition 2 (Dimension path) A dimension path is an XPath expression ($p_d$) in either of the two forms:

1. $p_d$ is a relative path expression originating from the fact path $p_f$, or

2. $p_d$ is an absolute path expression that contains at least one condition with the fact path $p_f$.

Figure 2 shows an example of fact and dimension paths. The circles on the top left document represent the facts corresponding to $p_f$. When we want to use the book title as a dimension for the subsequent analysis, a dimension path can be given as $p_{d_1}$ = t, which is a relative path from $p_f$. If we are interested in grouping the books according to price ranges represented in another XML data (the upper right document of Figure 2), we need to specify an absolute path expression with referential constraints, like $p_{d_2}$ = doc("bookinfo.xml")//b[t = $p_f$/t]/p. As can be seen from the example, for a given book, we can obtain the corresponding price in another XML data by using the title as the clue.

Concept Hierarchy  The concept hierarchy is a notable feature of traditional OLAP systems by which we can carry out flexible grouping operations over the data items stored in the fact table. As with traditional OLAP systems, we assume that value-based concept hierarchies are given beforehand. We do not go into the detail of how to represent such a hierarchy, due to the page limitation. When dealing with XML data in the same context, we need special consideration for its semistructured nature. Specifically, we have to take into account the structure-based concept hierarchy, which is naturally represented as the hierarchical structure of XML data.

Taking Figure 2 as an example, all books (b) are categorized by the XML hierarchies according to the area or book category. The structure-based concept hierarchy allows us to aggregate facts using such XML data structure. We will discuss the details later.

Data Cube on XML Data  We are now ready to define a data cube on XML data using the concepts of the fact and dimension paths. Before going into the definition, we introduce some notations as helpers. For a given XPath expression $p$, $[[p]]$ denotes an evaluation of $p$, and the result would be XML nodes, string-values, or a boolean. Let $[[p]]_c$ denote an evaluation of $p$ relative to a context $c$, as in $[[p_{d_i}]]_f$ below, where $p$ represents an XPath expression.

Definition 3 (XML data-cube) An XML-cube is defined as $(p_f, D)$ where $p_f$ is a fact path and $D = \{p_{d_1}, p_{d_2}, \ldots, p_{d_n}\}$ is a set of dimension paths. A fact in the cube is an $(n+1)$-tuple $(f, d_1, \ldots, d_n)$ where $f \in [[p_f]]$ and each $d_i$ is obtained by evaluating $p_{d_i}$: $[[p_{d_i}]]_f$ if $p_{d_i}$ is in a relative form, or $[[p'_{d_i}]]$ where $p'_{d_i}$ is obtained by replacing each occurrence of $p_f/p_r$ in $p_{d_i}$ with $[[p_r]]_f$. $n$ is the rank of the XML-cube.

Let us consider an XML data-cube as an example (Figure 2). It is defined as $(p_f, \{p_d\})$, where $p_f$ = doc("sales.xml")//b and $p_d$ = doc("bookinfo.xml")//b[t = $p_f$/t]/p. A tuple can be extracted as follows. First, the fact data can be extracted by evaluating the fact path: $[[p_f]] = \{b_1, b_2, \ldots, b_6\}$. For each fact $b_i$, we can identify the corresponding dimension data in another XML data as specified by $p_d$. When evaluating $p_d$, we need to rewrite the path according to the fact data. For example, for the fact $b_1$, $p_f$/t, which is a part of $p_d$, is rewritten as $[[p_f/t]]_{b_1}$ = {"A"}, so that $p_d$ becomes doc("bookinfo.xml")//b[t = "A"]/p. In this way, we can extract all tuples of the data cube, which form a set of 2-tuples: $\{(b_1, p_1), (b_2, p_4), (b_3, p_3), (b_4, p_3), (b_5, p_2), (b_6, p_5)\}$.

In contrast to existing OLAP, an XML-cube may contain much more information than its dimensionality (what we call “rank”). That is, each XML fragment may
contain more information than a numerical value, such as elements, texts, attributes, and hierarchical information. In order to form a cube-like structure, we need to specify some of them as dimensions of the cube structure.

For instance, each tuple of the rank-1 XML data-cube in Figure 2 (lower side) contains two XML fragments: books coming from “sales.xml” and prices from “bookinfo.xml”. According to the fragments, this XML data-cube potentially has five attribute values: title, quantity, area, price, and category. If we are interested in the information related to the book sales area and price, we can create a cube by specifying the area and price as the dimensions.

[Figure 2. Facts, dimensions, and Sales XML data-cube. Top left: sales.xml (fact), a tree sales/area with regions kanto (tsukuba) and kansai (osaka, kyoto), whose leaves are the books b1 to b6, each with a title t and a quantity q: (A, 10), (D, 20), (C, 10), (C, 60), (B, 40), (F, 30). Top right: bookinfo.xml (dimension), a tree of categories c named “math”, “cs”, “linear algebra”, “db”, and “web”, whose books b each have a title t and a price p: (A, 1000), (B, 3000), (C, 8000), (D, 2000), (E, 6000), (F, 3400). Bottom: the xmlcube, whose tuples pair a sales fragment with a bookinfo fragment, e.g., (b1, p1) with A, 10, 1000; (b2, p4) with D, 20, 2000; (b6, p5) with F, 30, 3400.]

4.3. OLAP Extensions to XQuery

Once the data cube is constructed, we perform multidimensional analysis using the dimensions and related information such as XML hierarchies. In our system, we use XQuery as the user query language. However, the current version of XQuery does not support aggregation functions. So we employ the syntax of the OLAP extension for XQuery [1], “GROUP BY ROLLUP” and “GROUP BY TOPOLOGICAL ROLLUP”. As with the roll-up operation in ordinary OLAP systems, “ROLLUP” enables a “SELECT” statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. “ROLLUP” is an extension to the “GROUP BY” clause, so its syntax is easy to use. The latter, “TOPOLOGICAL ROLLUP”, is similar to “ROLLUP” but computes structure-based grouping over XML data.

5. Implementation Using Relational Database Systems

This section discusses an implementation of the proposed model and grouping operations (Figure 1, right). We try to make the best use of relational databases as the underlying data storage. The reasons are: 1) there are many commercial and open source products, 2) an enormous amount of information resources is stored in relational systems, and 3) we can leverage established relational XML storage techniques. In addition, we can utilize the grouping functionalities which are supported in most relational database systems to implement value- and structure-based grouping of XML data.

5.1. Relational XML Storage

We employ the path approach [8] for mapping XML data to relational tables, because we can manage any well-formed XML documents with a fixed relational schema and realize a practical subset of XPath solely by the use of SQL functionalities. Due to the page limitation, we show just a brief overview. In the path approach, an XML node is basically mapped to relational tuples of two tables: the path table, containing all absolute path expressions of all XML nodes, and the node table, containing all XML node information. Table 1 (left) shows the path table extracted from “sales.xml” and “bookinfo.xml”. The node table (Table 1, right) has the columns document id (did), path id (pid), node id (nid), node number (nnum), tag name (tname), and value.

Table 1. Path table and node table.

Path table:
pid  pexp                            poccur
1    /bookinfo                       1
2    /bookinfo/c                     2
3    /bookinfo/c/@name               4
9    /sales                          1
10   /sales/area                     1
11   /sales/area/kanto               1
12   /sales/area/kanto/tsukuba       1
13   /sales/area/kanto/tsukuba/b     3
14   /sales/area/kanto/tsukuba/b/t   6
15   /sales/area/kanto/tsukuba/b/q   6
…    …                               …

Node table:
did  pid  nid  nnum         tname     value
0    1    0    1            bookinfo  null
0    2    1    1.1          c         null
0    3    2    1.1.1        @name     null
0    3    3    1.1.1.1      CDATA     math
0    7    8    1.1.2.2.1    t         null
0    7    9    1.1.2.2.1.1  #TEXT     A
…    …    …    …            …         …
1    9    46   1            sales     null
1    10   47   1.1          area      null
1    11   48   1.1.1        kanto     null
…    …    …    …            …         …

5.2. Extracting Fact and Dimensions

The first step is to extract the fact and dimensions. As discussed in Section 4.2, a fact and its dimensions are XML sub-trees specified by XPath queries. Hence, we can represent the fact (or a dimension) as a part of the node table. This can be achieved by evaluating the fact (dimension) path and storing the result as a new table. Those tables can be defined as either views or materialized views.
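The path-table and node-table mapping of Section 5.1 can be sketched in Python with the standard `sqlite3` module. This is a simplified illustration under stated assumptions, not the XRel schema itself: the function name `shred` is hypothetical, element text is stored directly on the element's row instead of in separate text nodes, and attributes, `nnum`, and `poccur` are omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, did, xml_text):
    """Store one document in path(pid, pexp) and node(did, pid, nid, tname, value)."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS path "
                "(pid INTEGER PRIMARY KEY, pexp TEXT UNIQUE)")
    cur.execute("CREATE TABLE IF NOT EXISTS node "
                "(did INTEGER, pid INTEGER, nid INTEGER, tname TEXT, value TEXT)")
    next_nid = cur.execute("SELECT COUNT(*) FROM node").fetchone()[0]

    def visit(elem, parent_path):
        nonlocal next_nid
        pexp = parent_path + "/" + elem.tag
        # Each distinct absolute path is stored once and shared by all its nodes.
        cur.execute("INSERT OR IGNORE INTO path (pexp) VALUES (?)", (pexp,))
        pid = cur.execute("SELECT pid FROM path WHERE pexp = ?", (pexp,)).fetchone()[0]
        text = (elem.text or "").strip() or None
        cur.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)",
                    (did, pid, next_nid, elem.tag, text))
        next_nid += 1
        for child in elem:
            visit(child, pexp)

    visit(ET.fromstring(xml_text), "")
    conn.commit()

conn = sqlite3.connect(":memory:")
shred(conn, 1, "<sales><area><kanto><tsukuba>"
               "<b><t>A</t><q>10</q></b>"
               "</tsukuba></kanto></area></sales>")
paths = [row[0] for row in conn.execute("SELECT pexp FROM path ORDER BY pid")]
print(paths)
```

Because the schema is the same for every document, any well-formed input can be loaded without schema changes, which is the property the path approach is chosen for.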

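The extraction step of Section 5.2 might look like the following sketch, which defines the fact and one dimension as SQL views over simplified path and node tables. The table contents, the view names `fact` and `dim_title`, and the use of `LIKE '%/b'` to approximate the `//b` step are assumptions for illustration; linking each dimension row back to its fact would additionally need the ancestor information (`nnum`) that is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE path (pid INTEGER PRIMARY KEY, pexp TEXT);
CREATE TABLE node (did INTEGER, pid INTEGER, nid INTEGER, tname TEXT, value TEXT);

-- Hypothetical rows for a small sales document (did = 1) with two books.
INSERT INTO path VALUES
  (1, '/sales'), (2, '/sales/area'), (3, '/sales/area/kanto'),
  (4, '/sales/area/kanto/tsukuba'), (5, '/sales/area/kanto/tsukuba/b'),
  (6, '/sales/area/kanto/tsukuba/b/t'), (7, '/sales/area/kanto/tsukuba/b/q');
INSERT INTO node VALUES
  (1, 1, 0, 'sales', NULL), (1, 2, 1, 'area', NULL), (1, 3, 2, 'kanto', NULL),
  (1, 4, 3, 'tsukuba', NULL),
  (1, 5, 4, 'b', NULL), (1, 6, 5, 't', 'A'), (1, 7, 6, 'q', '10'),
  (1, 5, 7, 'b', NULL), (1, 6, 8, 't', 'D'), (1, 7, 9, 'q', '20');

-- Fact path //b over document 1: nodes whose absolute path ends in /b.
CREATE VIEW fact AS
  SELECT n.did, n.nid, p.pexp
  FROM node n JOIN path p ON n.pid = p.pid
  WHERE n.did = 1 AND p.pexp LIKE '%/b';

-- Relative dimension path t: nodes whose absolute path ends in /b/t.
CREATE VIEW dim_title AS
  SELECT n.did, n.nid, n.value
  FROM node n JOIN path p ON n.pid = p.pid
  WHERE n.did = 1 AND p.pexp LIKE '%/b/t';
""")

facts = conn.execute("SELECT nid, pexp FROM fact ORDER BY nid").fetchall()
titles = [row[0] for row in conn.execute("SELECT value FROM dim_title ORDER BY nid")]
print(facts)
print(titles)
```

Defining the results as views keeps them in sync with the base tables; materializing them instead trades freshness for faster repeated access.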
5.3. Data Cube Construction

Next, we create an XML data-cube. For this purpose, we need to establish the relationships between the fact and the dimension as described in Section 4.2. We join the base relations by giving the referential constraints as the join key. The XML data-cube table contains all attributes from the fact and dimension, and each record consists of data from the fact and dimension which have the same book title.

5.4. Query Processing

As discussed in Section 4.3, we use XQuery with OLAP extensions as the user query language. In order to process a query, we need to translate it into SQL, because we make use of relational database systems as the query processing engine. There have been several works on XQuery to SQL query translation [3], and we can borrow those ideas. So, in this paper, we focus on how to implement OLAP operations using SQL. Specifically, we discuss how to realize structure-based grouping and roll-up operations.

Structure-based Grouping  Our basic strategy is to utilize path expressions as the clue to perform grouping. Specifically, for a given data item, we compute a prefix of its path, and then perform grouping on the prefixes. The level of grouping can be controlled by the length of the path prefixes.

Now, we discuss how to perform grouping operations using path expressions. Let us introduce some notations. Given an XML node $n$, let $pexp(n)$ denote $n$’s absolute path expression, and let $prefix(exp, i)$ denote path expression $exp$’s $i$-th prefix, e.g., $prefix$(“/a/b/c”, 1) = “/a” and $prefix$(“/a/b/c”, 2) = “/a/b”. Then, the grouping can be performed in the following way:

1. Let the depth, which is the distance from the root, of the dimension be $d$, e.g., the depth of the path “/sales/area/kanto/tsukuba/b” is $d = 5$.

2. Find the common prefix of all path expressions and let its depth be $i$, e.g., the three path expressions “/sales/area/kanto/tsukuba/b”, “/sales/area/kansai/osaka/b”, and “/sales/area/kansai/kyoto/b” have the same prefix path “/sales/area”. As a consequence, we get $i = 2$.

3. The level-$j$ ($i \le j \le d$) grouping can be computed by calculating $prefix(pexp(n), j)$ for each dimension value $n$. For example, referring to the previous three path expressions, let $j$ be 3. Then $prefix(pexp(n), 3)$ computes two level-3 groupings of the paths which have the same depth-3 prefixes, “/sales/area/kanto” and “/sales/area/kansai”.

In fact, the proposed grouping operation can be implemented in many ways, but an important remark is that it can be realized solely by the functionality of SQL. One possible way is to leverage the string matching functionality provided by the database system. More precisely, we can make use of regular expressions to extract substrings, and use them with the “GROUP BY” clause. Assume that we would like to use the first two tags to group the facts, e.g., use “/sales/area” out of “/sales/area/kanto/tsukuba/b”; we can achieve this by:

    SELECT ...
    FROM ...
    WHERE ...
    GROUP BY regexp_replace(dim.pexp,
             '^(/[^/]+/[^/]+)/.+', '\\1')

Another possibility is to introduce dedicated indexes based on Dewey encodings or prime numbers. They might be good for speeding up the grouping operations compared to the above approach. The comparison might be an interesting research topic.

5.5. ROLLUP Operations

One possible way to implement roll-up operations (“GROUP BY ROLLUP” and “TOPOLOGICAL ROLLUP”) in the extended XQuery is to directly translate them into their counterparts in SQL2003, in which OLAP operations are supported. However, SQL2003 is not supported in many database systems. For this reason, we realize the roll-up operations using the functionality of SQL-92, which is supported in most systems. Here we show how roll-up operations are applied to the XML data-cube. As mentioned in Section 4.3, “ROLLUP” and “TOPOLOGICAL ROLLUP” create subtotals that roll up from the most detailed level to a grand total. To implement the operations, we use “UNION ALL”, which enables us to compute the set union over different grouping levels.

6. Experimental Evaluation

6.1. Experimental Setup

All experiments were performed on a Sun Microsystems Sun Fire X4200 server whose CPU is a 2-way Dual Core AMD Opteron(tm) processor (2.4GHz). This machine has 16GB of memory and runs Sun OS 5.10. We used Java version 1.5.0_09 to parse XML data into relational tables, and PostgreSQL 8.1.4 to perform query processing.

For the experimental data, we used XMark data, which is a comprehensive distributed system benchmarking and optimization suite. We tested the following sizes of XML data: 10MB, 100MB, 200MB, 300MB, 400MB, and 500MB.
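The three-step structure-based grouping of Section 5.4 can be sketched in plain Python as follows. The helper names (`prefix`, `depth`, `common_prefix_depth`, `group_by_level`) are chosen for this example only, and the last lines apply the same regular expression as the `regexp_replace` call in Section 5.4 through Python's `re` module rather than through the database.

```python
import re
from collections import defaultdict

def prefix(pexp, i):
    """Return the i-th prefix of an absolute path: prefix('/a/b/c', 2) == '/a/b'."""
    steps = pexp.strip("/").split("/")
    return "/" + "/".join(steps[:i])

def depth(pexp):
    """Number of steps from the root to the node addressed by pexp."""
    return len(pexp.strip("/").split("/"))

def common_prefix_depth(paths):
    """Depth i of the longest prefix shared by all paths (step 2)."""
    i = 0
    shortest = min(depth(p) for p in paths)
    while i < shortest and len({prefix(p, i + 1) for p in paths}) == 1:
        i += 1
    return i

def group_by_level(paths, j):
    """Group paths by their level-j prefix (step 3)."""
    groups = defaultdict(list)
    for p in paths:
        groups[prefix(p, j)].append(p)
    return dict(groups)

paths = [
    "/sales/area/kanto/tsukuba/b",
    "/sales/area/kansai/osaka/b",
    "/sales/area/kansai/kyoto/b",
]
d = depth(paths[0])               # step 1
i = common_prefix_depth(paths)    # step 2
level3 = group_by_level(paths, 3) # step 3 with j = 3
print(d, i, sorted(level3))

# The same two-tag prefix obtained with the regular expression from the SQL query.
two_tags = re.sub(r"^(/[^/]+/[^/]+)/.+", r"\1", paths[0])
print(two_tags)
```

Varying `j` between `i` and `d` moves between coarse and fine groups, which is how the length of the prefix controls the grouping level.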

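The “UNION ALL” strategy of Section 5.5 can be illustrated with the sketch below, which computes subtotals at three path-prefix levels and combines them into one result. It runs on SQLite through Python's `sqlite3` module rather than on PostgreSQL; the `xmlcube` rows are hypothetical, and `prefix` is a user-defined SQL function registered for this example in place of `regexp_replace`, which SQLite does not provide.

```python
import sqlite3

def prefix(pexp, i):
    """i-th prefix of an absolute path, exposed to SQL below."""
    return "/" + "/".join(pexp.strip("/").split("/")[:i])

conn = sqlite3.connect(":memory:")
conn.create_function("prefix", 2, prefix)
conn.executescript("""
CREATE TABLE xmlcube (pexp TEXT, title TEXT, qty INTEGER);
-- Hypothetical cube rows: one row per fact with its path, title, and quantity.
INSERT INTO xmlcube VALUES
  ('/sales/area/kanto/tsukuba/b', 'A', 10),
  ('/sales/area/kanto/tsukuba/b', 'D', 20),
  ('/sales/area/kansai/osaka/b',  'C', 10),
  ('/sales/area/kansai/kyoto/b',  'F', 30);
""")

# One SELECT per grouping level, from the most detailed (city) to the grand total.
ROLLUP_SQL = """
SELECT prefix(pexp, 4) AS grp, SUM(qty) AS total FROM xmlcube GROUP BY prefix(pexp, 4)
UNION ALL
SELECT prefix(pexp, 3), SUM(qty) FROM xmlcube GROUP BY prefix(pexp, 3)
UNION ALL
SELECT prefix(pexp, 2), SUM(qty) FROM xmlcube GROUP BY prefix(pexp, 2)
"""
totals = dict(conn.execute(ROLLUP_SQL).fetchall())
for grp in sorted(totals):
    print(grp, totals[grp])
```

Each branch of the union is an ordinary SQL-92 grouping query, so the approach does not depend on native `ROLLUP` support in the database.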
6.2. Benchmark Queries

For the benchmark query, we give a fact path, $p_f$ = doc("xmark.xml")//item, and two dimension paths, $p_{d_1}$ = quantity and $p_{d_2}$ = payment.

We ran three queries to show the performance of the roll-up functions, each of which calculates the total quantity of items: grouped by value (payment), grouped by structure (region), and grouped by the combination of both (regionpay).

6.3. Experimental Results

Figure 3 shows the elapsed times for data-cube construction and query processing. First, the tables “item”, “quantity”, and “payment” are created. After extracting those tables, the data-cube “payqty” is constructed. For the query processing, we performed three roll-up operations: value-based (payment), structure-based (region), and their combination (regionpay). The results show that data-cube construction is quite time consuming even for 100MB data. However, the important remark here is that once the data-cube is constructed, analytical queries can be processed in reasonable time. In real systems, in many cases, data-cube construction is performed once at midnight, and analytical processing is applied repeatedly during business hours. From this observation, we think that the performance of the proposed scheme is acceptable.

[Figure 3. Query processing time. Horizontal axis: file size (10MB, 100MB, 200MB, 300MB, 400MB, 500MB). Vertical axis: time (ms), on a logarithmic scale from 10 to 100,000,000. Series: item, quantity, payment, payqty, rollup(payment), rollup(region), rollup(regionpay).]

7. Conclusions

In this paper, we proposed a system for XML-OLAP which is constructed on top of relational databases. Our system supports both value- and structure-based hierarchies, which enable users to analyze XML data taking account of the features of XML data. We first introduced the concepts of fact path, dimension path, value- and structure-based concept hierarchy, and XML data-cube. We then discussed the OLAP extension to XQuery. For the implementation, we use the path approach for mapping XML data to relations, and we utilize “UNION ALL” to perform the “GROUP BY ROLLUP” operation for both structure- and value-based groupings. Our experiments with large collections of XML data show that the “GROUP BY ROLLUP” queries complete in less than 10 sec. for 500MB XML data. The results show the effectiveness of our proposed technique.

As future research, we will try to improve the performance of data-cube construction. We also plan to investigate how to incorporate textual features, such as word vectors of XML data, into the analytical processing.

8. Acknowledgments

This research is partly supported by the Grant-in-Aid for Scientific Research (17700110) from Japan Society for the Promotion of Science (JSPS), Japan, and the Grant-in-Aid for Scientific Research on Priority Areas (18049005) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

References

[1] R. Bordawakar and C. A. Lang. Analytical Processing of XML Documents: Opportunities and Challenges. In SIGMOD Record, volume 34(2), pages 27–32, 2005.
[2] K. Chantola, T. Amagasa, and H. Kitagawa. Towards Analytical Processing of XML Data. In IPSJ Technical Report, volume 78 (2006-DBS-140 (I)), pages 201–208, 2006.
[3] D. DeHaan, D. Toman, and M. P. Consens. A Comprehensive XQuery to SQL Translation Using Dynamic Interval Encoding. In Proc. ACM SIGMOD 2002, pages 623–634, 2003.
[4] M. R. Jensen, T. H. Moller, and T. B. Pedersen. Specifying OLAP Cubes on XML Data. In Proc. SSDBM, pages 101–112, 2001.
[5] W3C. XML Path Language (XPath) Version 1.0. http://www.w3.org/TR/1999/REC-xpath-19991116, November 1999. Recommendation.
[6] W3C. XQuery: A query language for XML. http://www.w3.org/TR/xquery, 2001. Working draft.
[7] W3C. Extensible Markup Language (XML) 1.0 (Third Edition). http://www.w3.org/TR/REC-xml, February 2004. Recommendation.
[8] M. Yoshikawa, T. Amagasa, and T. Shimura. XRel: A Path-based Approach to Storage and Retrieval of XML Documents Using Relational Databases. In ACM Transactions on Internet Technology (TOIT), volume 1(1), pages 110–141, 2001.

72 pages
XML-to-SQL Query Translation Literature: The State of The Art and Open Problems
100% (3)
XML-to-SQL Query Translation Literature: The State of The Art and Open Problems
17 pages
Relational Databases For Querying XML Documents Limitations and Opportunities
No ratings yet
Relational Databases For Querying XML Documents Limitations and Opportunities
13 pages
DWM UNIT 1 (2)
No ratings yet
DWM UNIT 1 (2)
67 pages
iiwas02dbb
No ratings yet
iiwas02dbb
5 pages
Abstractt
No ratings yet
Abstractt
5 pages
CS 8031 Data Mining and Data Warehousing Tutorial
No ratings yet
CS 8031 Data Mining and Data Warehousing Tutorial
9 pages
Mastering Algorithms and Data Structures
From Everand
Mastering Algorithms and Data Structures
Manish Soni
No ratings yet
Mining Tree-Based Association Rules For XML Query Answering
No ratings yet
Mining Tree-Based Association Rules For XML Query Answering
4 pages
Python Data Structures Explained: A Practical Guide with Examples
From Everand
Python Data Structures Explained: A Practical Guide with Examples
William E. Clark
No ratings yet
Beginning XML
From Everand
Beginning XML
Joe Fawcett
3/5 (1)
Advances in Data Warehousing and OLAP in The Big Data Era
No ratings yet
Advances in Data Warehousing and OLAP in The Big Data Era
2 pages
Haacloud Report
No ratings yet
Haacloud Report
14 pages
Full download Advanced Applications and Structures in Xml Processing Label Streams Semantics Utilization and Data Query Technologies Premier Reference Source 1st Edition Changqing Li pdf docx
No ratings yet
Full download Advanced Applications and Structures in Xml Processing Label Streams Semantics Utilization and Data Query Technologies Premier Reference Source 1st Edition Changqing Li pdf docx
91 pages
Foundations of Databases PDF
No ratings yet
Foundations of Databases PDF
16 pages
Foundations of Databases: January 1995
No ratings yet
Foundations of Databases: January 1995
16 pages
XML and Web Database
No ratings yet
XML and Web Database
10 pages
XML Data Format
From Everand
XML Data Format
Lucas Lee
No ratings yet
DWDM-QB
No ratings yet
DWDM-QB
12 pages
Concise Oracle Database For People Who Has No Time
From Everand
Concise Oracle Database For People Who Has No Time
Billy Aung Myint
No ratings yet
Mapping XML To Key-Value Database: Abstract-XML Is A Popular Data Format Used in Many
No ratings yet
Mapping XML To Key-Value Database: Abstract-XML Is A Popular Data Format Used in Many
7 pages
Mastering Data Structures and Algorithms in Python & Java
From Everand
Mastering Data Structures and Algorithms in Python & Java
Sachin Naha
No ratings yet
Warehousing Web Data: Keywords
No ratings yet
Warehousing Web Data: Keywords
5 pages
Data Mining and Knowledge Discovery For Big Data - Methodologies, Challenge and Opportunities (Chu 2013-10-09)
No ratings yet
Data Mining and Knowledge Discovery For Big Data - Methodologies, Challenge and Opportunities (Chu 2013-10-09)
310 pages
Mastering Python: A Comprehensive Guide to Programming
From Everand
Mastering Python: A Comprehensive Guide to Programming
Christine Lambertson
No ratings yet
Online Analytical Processing
No ratings yet
Online Analytical Processing
24 pages
An Approach To Analysis and Classification of Data From Big Data by Using Apriori Algorithm
No ratings yet
An Approach To Analysis and Classification of Data From Big Data by Using Apriori Algorithm
4 pages
Applied SOA Patterns on the Oracle Platform
From Everand
Applied SOA Patterns on the Oracle Platform
Sergey Popov
No ratings yet
(Tournier-07) ER OLAP Conceptual Model
No ratings yet
(Tournier-07) ER OLAP Conceptual Model
16 pages
Data Warehouse Design From XML Sources: Matteo Golfarelli Stefano Rizzi Boris Vrdoljak
No ratings yet
Data Warehouse Design From XML Sources: Matteo Golfarelli Stefano Rizzi Boris Vrdoljak
8 pages
Storing and Querying XML Data Using Rdbms Yesi Novaria Kunang
No ratings yet
Storing and Querying XML Data Using Rdbms Yesi Novaria Kunang
8 pages
Efficiency of Flat File Database Approach in Data PDF
No ratings yet
Efficiency of Flat File Database Approach in Data PDF
14 pages
Chap 2
No ratings yet
Chap 2
21 pages
Funnel Sagar
No ratings yet
Funnel Sagar
2,644 pages
Learn Excel Power Pivot
100% (7)
Learn Excel Power Pivot
204 pages
Configure Syslog
No ratings yet
Configure Syslog
16 pages
Revised OOSAD Module2020
No ratings yet
Revised OOSAD Module2020
72 pages
Basic Troubleshooting Tips For Veritas 4
No ratings yet
Basic Troubleshooting Tips For Veritas 4
5 pages
Case Fact Log
No ratings yet
Case Fact Log
57 pages
SQL Answer
No ratings yet
SQL Answer
6 pages
Integreted Case Study
No ratings yet
Integreted Case Study
7 pages
ICAM Reqirment
No ratings yet
ICAM Reqirment
3 pages
Lab 5
No ratings yet
Lab 5
11 pages
Rally Readthedocs Io en Latest
No ratings yet
Rally Readthedocs Io en Latest
453 pages
SAS Concepts
100% (1)
SAS Concepts
651 pages
Overview of New Features For Opentext Content Server 16 and Modules
No ratings yet
Overview of New Features For Opentext Content Server 16 and Modules
68 pages
DigiCert CPS V.5.3 1 2
No ratings yet
DigiCert CPS V.5.3 1 2
85 pages
CH-1 (Oosd) - An Overview of Object Oriented Systems Development
No ratings yet
CH-1 (Oosd) - An Overview of Object Oriented Systems Development
17 pages
Compiler Construction: University of Central Punjab
No ratings yet
Compiler Construction: University of Central Punjab
3 pages
Class VP Ver61
No ratings yet
Class VP Ver61
12 pages
Total 3.2 Years of Experience in Implementation of Java, J2EE Applications. Hibernate ORM Framework
No ratings yet
Total 3.2 Years of Experience in Implementation of Java, J2EE Applications. Hibernate ORM Framework
4 pages
Polyspace Code Verification: Call Hierarchy Report For Project: Polyspace
No ratings yet
Polyspace Code Verification: Call Hierarchy Report For Project: Polyspace
6 pages
2023-2024 DCIG TOP 5 High-End Storage Arrays
No ratings yet
2023-2024 DCIG TOP 5 High-End Storage Arrays
14 pages
Asset-V1 VIT+MSC1004+2020+type@asset+block@W5 Notes
No ratings yet
Asset-V1 VIT+MSC1004+2020+type@asset+block@W5 Notes
55 pages
Simotion New Features V51 en
No ratings yet
Simotion New Features V51 en
20 pages
Tech Memory
No ratings yet
Tech Memory
9 pages
University of Botswana Department of Computer Science
No ratings yet
University of Botswana Department of Computer Science
53 pages
LSMW For Uploading BOM: Purpose
No ratings yet
LSMW For Uploading BOM: Purpose
22 pages
Hud Sight
No ratings yet
Hud Sight
7 pages
TM09 Monitoring and Supporting Data Conversion
No ratings yet
TM09 Monitoring and Supporting Data Conversion
32 pages
Customer Management System For Bahir Dar City Water Supply Service (Shumaboo Branch)
No ratings yet
Customer Management System For Bahir Dar City Water Supply Service (Shumaboo Branch)
76 pages



1-4244-0904-7/07/$20.00 ©2007 IEEE

