

OLAP query processing for XML data in RDBMS

Conference Paper · May 2007


DOI: 10.1109/SWOD.2007.353190 · Source: IEEE Xplore




Chantola KIT¹, Toshiyuki AMAGASA¹,², and Hiroyuki KITAGAWA¹,²

¹ Department of Computer Science, Graduate School of Systems and Information Engineering
² Center for Computational Sciences
University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
{amagasa, kitagawa}@cs.tsukuba.ac.jp

Abstract

Extensible Markup Language (XML) has become an important format for data exchange and representation on the web. In addition to conventional query processing, more complex analysis of XML data is becoming important for discovering valuable information. In this research, we investigate XML-OLAP, with which we can perform multidimensional analysis on XML data while taking XML's features into account. Users specify an XML data-cube with XPath and perform analytical processing with XQuery extended for OLAP. The system is implemented on top of relational databases: requests for data-cube specification and analysis are translated into SQL so that they can be processed by the underlying system. We show the feasibility of the proposed scheme through experimental evaluation.

1. Introduction

Since its emergence in 1998, Extensible Markup Language (XML) [7] has become a de facto standard for data exchange and representation on the web. XML is now used in a wide spectrum of application domains, such as web documents, business documents, and log data. For this reason, in addition to conventional simple query retrieval, more complex analysis of XML data is becoming increasingly important for extracting useful information from massive XML data.

In our previous research [2], we proposed a model for OLAP analysis on XML data using relational databases. Specifically, to allow users to specify facts and dimensions over XML data, we employ slightly extended XPath expressions. The system extracts the corresponding XML fragments from the underlying XML database based on the fact and dimension specifications, and constructs a multidimensional XML data-cube. Users then analyze the data-cube by issuing multidimensional queries. One notable feature of the work is that we take into account structure-based concept hierarchies, an important characteristic of XML data, as well as value-based concept hierarchies.

In this paper, we discuss an approach to an XML-OLAP system built on relational database systems, based on our previous work. Our contributions in this paper are as follows:

• We discuss the roll-up operation for the XML data-cube. Roll-up is an extension in SQL:2003 that supports OLAP operations on relational data-cubes; we employ its syntax and adapt it to the XML data-cube. We then discuss its implementation using the functionality of relational database systems.

• We evaluate the performance of the proposed scheme through a series of experiments. The experimental results show that the proposed scheme can handle 100MB of XML data with reasonable processing time.

The rest of this paper is organized as follows. Section 2 introduces preliminaries on OLAP and XML. Section 3 discusses related work. Section 4 gives an overview of the proposed system and defines facts and dimensions of XML data, XML hierarchies, and the data-cube on XML data; it also discusses OLAP extensions to XQuery. Section 5 describes implementation issues: relational XML storage, data-cube construction, and query processing with both structure- and value-based grouping and the "ROLLUP" operation. Section 6 presents the experimental evaluation. Finally, Section 7 concludes the paper.

2. Preliminaries

In this section, we briefly overview OLAP, XML, and its query languages XPath and XQuery.

1-4244-0904-7/07/$20.00 ©2007 IEEE

2.1. Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP) is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. This information has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by users.

When considering OLAP, the star schema, the cube, and aggregation operations are the most important concepts. The multidimensional data model is represented by a star schema, which consists of a single fact table and several dimension tables. Each dimension table contains columns corresponding to the attributes of the dimension.

[Figure 1 image not recoverable from the text extraction. Left panel ("Approach"): XML documents (sales, bookinfo), fact and dimension XPaths, the resulting XML data-cube, and analysis with XQuery w/OLAP extension. Right panel ("System Implementation"): path table, node table, SQL, the xmlcube table, query translation from XQuery w/OLAP extension to SQL, and grouped result tables (group1, group2, group3, total).]


Figure 1. System overview.

An OLAP system models the input data as a logical multidimensional cube whose dimensions provide the context for analyzing measures of interest. To analyze data with the cube structure, various aggregation operations, namely drilling, pivoting (or rotating), and slicing-and-dicing, are used to change the number of dimensions and the resolution of the dimensions of interest.

2.2. XML, XPath, and XQuery

XML has become the language of choice for data representation across a wide range of applications. It is designed to represent both structured and semi-structured data. An XML document is basically modeled as a labeled tree: elements and attributes are mapped to nodes, and nesting relationships are mapped to directed edges in the tree.

XML data can be queried with XML query languages such as XPath and XQuery. XPath [5] is a language for addressing portions of an XML document. An XML subtree can be specified as a navigational path over the XML tree, with conditions on the elements' labels, values, and the relationships among nodes along the path.

XQuery [6] is a query language designed to query collections of XML data. XQuery uses XPath as a sub-language to address specific parts of an XML document, and employs SQL-like FLWOR (FOR, LET, WHERE, ORDER BY, RETURN) expressions for performing joins.

3. Related Works

Bordawekar et al. [1] investigated various issues related to XML data analysis and proposed a logical model for XML analysis based on an abstract tree-structured representation of XML. In particular, they proposed a categorization of XML data analysis systems: 1) XML is used simply as an external representation of OLAP results; 2) relational data is extracted from XML data and then processed with existing OLAP systems; 3) XML is used for both data representation and analysis. To support complex analytical operations, they also proposed new syntactic extensions to XQuery, such as "GROUP BY", "ROLLUP", "TOPOLOGICAL ROLLUP", "CUBE", and "TOPOLOGICAL CUBE". In our research, we employ the syntax of "GROUP BY ROLLUP" and "GROUP BY TOPOLOGICAL ROLLUP" to allow users to specify OLAP operations in XQuery.

Jensen et al. [4] proposed a scheme for specifying OLAP cubes on XML data. They integrated XML and relational data at the conceptual level based on UML, which is easy for system designers and users to understand. In their scheme, a UML model is built from XML data and relational data, and the corresponding UML snowflake diagram is then created from the UML model. In particular, they considered how to handle dimensions with hierarchies and how to ensure correct aggregation.

4. An Overview of the Proposed XML-OLAP System

4.1. System Overview

The left side of Figure 1 shows an overview of our proposed scheme. Based on the content of the XML data, a user first gives a fact path and some dimension paths as XPath expressions to denote his/her interest. Referring to the given fact and dimension paths, the system produces an XML data-cube. The user can then analyze the XML data-cube using XQuery with OLAP extensions.

The following discusses how an XML data-cube is constructed in our system.
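As a rough illustration of this workflow, the sketch below evaluates a fact path and two dimension paths over toy versions of the Figure 2 documents, using Python's xml.etree.ElementTree (which supports only a limited XPath subset, not XPath 1.0 in full). The absolute dimension path with a referential constraint is handled by substituting each fact's title into the predicate before evaluation. The documents are assumptions: titles, quantities, and prices follow Figure 2, but the placement of books under cities and categories is invented for illustration, and build_cube is a hypothetical helper, not the system's implementation.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for sales.xml and bookinfo.xml (Figure 2). Titles, quantities,
# and prices follow the figure; the placement of books under cities and
# categories is an assumption made for illustration.
SALES_XML = """<sales><area>
  <kanto><tsukuba>
    <b><t>A</t><q>10</q></b><b><t>D</t><q>20</q></b><b><t>C</t><q>10</q></b>
  </tsukuba></kanto>
  <kansai>
    <osaka><b><t>C</t><q>60</q></b><b><t>B</t><q>40</q></b></osaka>
    <kyoto><b><t>F</t><q>30</q></b></kyoto>
  </kansai>
</area></sales>"""

BOOKINFO_XML = """<bookinfo>
  <c name="math"><c name="linear algebra">
    <b><t>A</t><p>1000</p></b><b><t>B</t><p>3000</p></b>
  </c></c>
  <c name="cs">
    <c name="db"><b><t>C</t><p>8000</p></b><b><t>D</t><p>2000</p></b></c>
    <c name="web"><b><t>E</t><p>6000</p></b><b><t>F</t><p>3400</p></b></c>
  </c>
</bookinfo>"""


def build_cube(sales, bookinfo):
    """Return one (title, quantity, price) row per fact, in document order."""
    rows = []
    for fact in sales.iter("b"):            # fact path: doc("sales.xml")//b
        title = fact.findtext("t")          # relative dimension path: t
        qty = int(fact.findtext("q"))
        # Absolute dimension path doc("bookinfo.xml")//b[t = $pf/t]/p, with
        # the reference to the fact replaced by this fact's title.
        prices = bookinfo.findall(f".//b[t='{title}']/p")
        rows.append((title, qty, int(prices[0].text) if prices else None))
    return rows


cube = build_cube(ET.fromstring(SALES_XML), ET.fromstring(BOOKINFO_XML))
for row in cube:
    print(row)
```

The six rows pair each book with its price in the same order as the 2-tuples listed for Figure 2 below; string substitution into the predicate is only safe here because the titles contain no quote characters.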

4.2. Formal Definitions

To construct an XML data-cube, we first need to specify the fact and the dimensions. Let us look at their definitions.

Facts about XML Data. A fact table in a traditional OLAP system stores the data items being analyzed. We define the facts in XML data following the traditional OLAP approach, using XPath to identify them. For example, when a user wants information on book sales from the sales XML data in the upper left of Figure 2, the relevant data items can be obtained by the fact path $p_f$ = doc("sales.xml")//b.

Definition 1 (Fact path). A fact path ($p_f$) is an absolute XPath expression that identifies the data items of interest.

Dimensions. Having fixed the fact data, we may additionally need dimensions whose values are used to group the facts for subsequent aggregation operations. In traditional OLAP systems, dimensions are given as independent tables associated with the fact table. In this work, we define a dimension as an XPath query, but we must take care of the relationship between the fact data and the dimensions. To ensure this relationship, a dimension path takes one of two forms: a path relative to the fact path, or an absolute path with referential constraints.

Definition 2 (Dimension path). A dimension path is an XPath expression ($p_d$) in either of the following two forms:

1. $p_d$ is a relative path expression originating from the fact path $p_f$, or
2. $p_d$ is an absolute path expression that contains at least one condition involving the fact path $p_f$.

Figure 2 shows an example of fact and dimension paths. The circles on the top-left document represent the facts corresponding to $p_f$. To use the book title as a dimension for subsequent analysis, a dimension path can be given as $p_{d_1}$ = t, which is a path relative to $p_f$. If we are interested in grouping the books according to price ranges represented in another XML document (upper right of Figure 2), we need to specify an absolute path expression with a referential constraint, such as $p_{d_2}$ = doc("bookinfo.xml")//b[t = $p_f$/t]/p. As the example shows, for a given book we can obtain the corresponding price from another XML document by using the title as the key.

Concept Hierarchy. The concept hierarchy is a notable feature of traditional OLAP systems that allows flexible grouping operations over the data items stored in the fact table. As with traditional OLAP systems, we assume that value-based concept hierarchies are given beforehand; due to page limitations, we do not go into the details of how such a hierarchy is represented. When dealing with XML data in the same context, its semistructured nature requires special consideration. Specifically, we have to take into account the structure-based concept hierarchy, which is naturally represented by the hierarchical structure of the XML data.

Taking Figure 2 as an example, all books (b) are categorized by the XML hierarchies according to area or book category. The structure-based concept hierarchy allows us to aggregate facts using this XML structure. We discuss the details later.

Data Cube on XML Data. We are now ready to define the data cube on XML data using the concepts of fact and dimension paths. Before giving the definition, we introduce a helper notation. For a given XPath expression $p$, $[\![p]\!]$ denotes the evaluation of $p$, whose result is a set of XML nodes, a string value, or a boolean; $[\![p]\!]_f$ denotes the evaluation of $p$ with node $f$ as the context node.

Definition 3 (XML data-cube). An XML data-cube is defined as $(p_f, D)$, where $p_f$ is a fact path and $D = \{p_{d_1}, p_{d_2}, \ldots, p_{d_n}\}$ is a set of dimension paths. A fact in the cube is an $(n+1)$-tuple $(f, d_1, \ldots, d_n)$, where $f \in [\![p_f]\!]$ and each $d_i$ is obtained by evaluating $p_{d_i}$: $d_i = [\![p_{d_i}]\!]_f$ if $p_{d_i}$ is in relative form, or $d_i = [\![p'_{d_i}]\!]$ otherwise, where $p'_{d_i}$ is obtained by replacing each occurrence of $p_f/p_r$ in $p_{d_i}$ with $[\![p_r]\!]_f$. $n$ is the rank of the XML data-cube.

Let us consider the XML data-cube of Figure 2 as an example. It is defined as $(p_f, \{p_d\})$, where $p_f$ = doc("sales.xml")//b and $p_d$ = doc("bookinfo.xml")//b[t = $p_f$/t]/p. Tuples are extracted as follows. First, the fact data are extracted by evaluating the fact path: $[\![p_f]\!] = \{b_1, b_2, \ldots, b_6\}$. For each fact $b_i$, we can identify the corresponding dimension data in the other XML document as specified by $p_d$. When evaluating $p_d$, we need to rewrite the path according to the fact. For example, for the fact $b_1$, the subexpression $p_f$/t of $p_d$ (so $p_r$ = t) is replaced by $[\![\mathrm{t}]\!]_{b_1}$ = {"A"}, and $p_d$ becomes doc("bookinfo.xml")//b[t = "A"]/p. In this way, we can extract all tuples of the data cube, which form the following set of 2-tuples:

$\{(b_1, p_1), (b_2, p_4), (b_3, p_3), (b_4, p_3), (b_5, p_2), (b_6, p_5)\}$

In contrast to conventional OLAP, an XML data-cube may contain much more information than its dimensionality (what we call its "rank"). That is, each XML fragment may
contain more information than a numerical value, such as elements, text, attributes, and hierarchical information. To form a cube-like structure, we need to specify some of them as dimensions of the cube.

For instance, each tuple of the rank-1 XML data-cube in Figure 2 (lower part) contains two XML fragments: a book from "sales.xml" and its price from "bookinfo.xml". Based on these fragments, this XML data-cube potentially has five attribute values: title, quantity, area, price, and category. If we are interested in information related to the sales area and price, we can create a cube by specifying area and price as the dimensions.

[Figure 2 image not recoverable from the text extraction. Top left, fact document sales.xml: sales, area, kanto/tsukuba, kansai/osaka, kansai/kyoto, and books b1 to b6, each with title t and quantity q (A 10, D 20, C 10, C 60, B 40, F 30). Top right, dimension document bookinfo.xml: categories c named "math", "cs", "linear algebra", "db", and "web", and books with title t and price p (A 1000, B 3000, C 8000, D 2000, E 6000, F 3400). Bottom, xmlcube: tuples pairing a sales book with its bookinfo price, e.g., (b1, p1) = A 10 1000, (b2, p4) = D 20 2000, (b6, p5) = F 30 3400.]

Figure 2. Facts, dimensions, and Sales XML data-cube.

4.3. OLAP Extensions to XQuery

Once the data cube is constructed, we perform multidimensional analysis using the dimensions and related information such as XML hierarchies. In our system, we use XQuery as the user query language. However, the current version of XQuery does not support grouping for aggregation. We therefore employ the syntax of the OLAP extension for XQuery [1], "GROUP BY ROLLUP" and "GROUP BY TOPOLOGICAL ROLLUP". As with the roll-up operation in ordinary OLAP systems, "ROLLUP" enables a "SELECT" statement to calculate multiple levels of subtotals across a specified group of dimensions, as well as a grand total. Because "ROLLUP" is an extension of the "GROUP BY" clause, its syntax is easy to use. The latter, "TOPOLOGICAL ROLLUP", is similar to "ROLLUP" but computes structure-based grouping over XML data.

5. Implementation Using Relational Database Systems

This section discusses an implementation of the proposed model and grouping operations (Figure 1, right). We try to make the best use of relational databases as the underlying data storage, for three reasons: 1) many commercial and open-source products are available, 2) an enormous amount of information is stored in relational systems, and 3) we can leverage established relational XML storage techniques. In addition, we can use the grouping functionality supported by most relational database systems to implement value- and structure-based grouping of XML data.

5.1. Relational XML Storage

We employ the path approach [8] for mapping XML data to relational tables, because it can manage any well-formed XML document with a fixed relational schema and can realize a practical subset of XPath solely with SQL functionality. Due to page limitations, we give only a brief overview. In the path approach, an XML node is basically mapped to tuples in two tables: a path table containing the absolute path expressions of all XML nodes, and a node table containing information on all XML nodes. Table 1 (left) shows the path table extracted from "sales.xml" and "bookinfo.xml". The node table (Table 1, right) has the columns document id (did), path id (pid), node id (nid), node number (nnum), tag name (tname), and value.

Table 1. Path table and node table.

Path table
pid   pexp                             poccur
1     /bookinfo                        1
2     /bookinfo/c                      2
3     /bookinfo/c/@name                4
9     /sales                           1
10    /sales/area                      1
11    /sales/area/kanto                1
12    /sales/area/kanto/tsukuba        1
13    /sales/area/kanto/tsukuba/b      3
14    /sales/area/kanto/tsukuba/b/t    6
15    /sales/area/kanto/tsukuba/b/q    6
…     …                                …

Node table
did   pid   nid   nnum           tname      value
0     1     0     1              bookinfo   null
0     2     1     1.1            c          null
0     3     2     1.1.1          @name      null
0     3     3     1.1.1.1        CDATA      math
0     7     8     1.1.2.2.1      t          null
0     7     9     1.1.2.2.1.1    #TEXT      A
…     …     …     …              …          …
1     9     46    1              sales      null
1     10    47    1.1            area       null
1     11    48    1.1.1          kanto      null
…     …     …     …              …          …

5.2. Extracting Fact and Dimensions

The first step is to extract the fact and dimensions. As discussed in Section 4.2, a fact and its dimensions are XML subtrees specified by XPath queries. Hence, we can represent the fact (or a dimension) as part of the node table. This is achieved by evaluating the fact (dimension) path and storing the result as a new table. Those tables can be defined either as views or as materialized views.

5.3. Data Cube Construction

Next, we create an XML data-cube. For this purpose, we need to establish the relationships between the fact and the dimensions as described in Section 4.2. We join the base relations using the referential constraints as the join key. The resulting XML data-cube table contains all attributes from the fact and the dimension, and each record consists of fact and dimension data that share the same book title.

5.4. Query Processing

As discussed in Section 4.3, we use XQuery with OLAP extensions as the user query language. To process a query, we need to translate it into SQL, because we use a relational database system as the query processing engine. There have been several works on XQuery-to-SQL translation [3], and we can borrow those ideas. In this paper, therefore, we focus on how to implement OLAP operations using SQL; specifically, we discuss how to realize structure-based grouping and roll-up operations.

Structure-based Grouping. Our basic strategy is to use path expressions as the clue for grouping. Specifically, for each data item we compute a prefix of its path expression and then group on the prefixes. The level of grouping is controlled by the length of the prefixes.

We now discuss how to perform grouping operations using path expressions. Let us introduce some notation. Given an XML node $n$, let $pexp(n)$ denote $n$'s absolute path expression, and let $prefix(exp, i)$ denote the $i$-th prefix of the path expression $exp$, e.g., $prefix(\text{"/a/b/c"}, 1)$ = "/a" and $prefix(\text{"/a/b/c"}, 2)$ = "/a/b". Grouping can then be performed as follows:

1. Let $d$ be the depth of the dimension, i.e., its distance from the root; e.g., the depth of the path "/sales/area/kanto/tsukuba/b" is $d = 5$.
2. Find the common prefix of all path expressions and let its depth be $i$; e.g., the three path expressions "/sales/area/kanto/tsukuba/b", "/sales/area/kansai/osaka/b", and "/sales/area/kansai/kyoto/b" share the prefix "/sales/area", so $i = 2$.
3. The level-$j$ grouping ($i \le j \le d$) is computed by calculating $prefix(pexp(n), j)$ for each dimension value $n$. For example, for the three path expressions above with $j = 3$, $prefix(pexp(n), 3)$ yields two level-3 groups: the paths with the 3-depth prefix "/sales/area/kanto" and those with "/sales/area/kansai".

The proposed grouping operation can be implemented in many ways, but the important point is that it can be realized solely with SQL functionality. One possible way is to leverage the string-matching functionality provided by the database system. More precisely, we can use regular expressions to extract substrings and use them in the "GROUP BY" clause. Suppose we want to group the facts by their first two tags, e.g., use "/sales/area" out of "/sales/area/kanto/tsukuba/b". We can achieve this by:

    SELECT ...
    FROM ...
    WHERE ...
    -- PostgreSQL 8.1 treats backslashes in string literals as escapes, so '\\1'
    -- yields the backreference \1 (with standard_conforming_strings on, write '\1').
    GROUP BY regexp_replace(dim.pexp, '^(/[^/]+/[^/]+)/.+', '\\1')

Another possibility is to introduce dedicated indexes based on Dewey encodings or prime numbers, which might speed up the grouping operations compared with the above approach. Such a comparison would be an interesting research topic.

5.5. ROLLUP Operations

One possible way to implement the roll-up operations ("GROUP BY ROLLUP" and "TOPOLOGICAL ROLLUP") in the extended XQuery is to translate them directly into their counterparts in SQL:2003, which supports OLAP operations. However, SQL:2003 is not supported by many database systems. For this reason, we realize the roll-up operations using the functionality of SQL-92, which is supported by most systems. Here we show how roll-up operations are applied to the XML data-cube. As mentioned in Section 4.3, "ROLLUP" and "TOPOLOGICAL ROLLUP" create subtotals that roll up from the most detailed level to a grand total. We implement them with "UNION ALL", which combines the results computed at the different grouping levels into one result.

6. Experimental Evaluation

6.1. Experimental Setup

All experiments were performed on a Sun Microsystems Sun Fire X4200 server with two dual-core AMD Opteron processors (2.4GHz) and 16GB of memory, running SunOS 5.10. We used Java version 1.5.0_09 to parse XML data into relational tables, and PostgreSQL 8.1.4 to perform query processing.

For the experimental data, we used XMark, a benchmark for XML data management, and tested the following XML data sizes: 10MB, 100MB, 200MB, 300MB, 400MB, and 500MB.
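The structure-based grouping steps of Section 5.4 and the SQL-92 "UNION ALL" formulation of ROLLUP in Section 5.5 can be combined in one small sketch before turning to the measured queries. It uses SQLite from Python's standard library rather than PostgreSQL; because SQLite has no regexp_replace, the prefix function is registered from Python with Connection.create_function. The table name cube, its columns, and the city each quantity belongs to are hypothetical; the quantities themselves are taken from Figure 2.

```python
import sqlite3


def prefix(pexp, i):
    """i-th prefix of a path expression: prefix("/a/b/c", 2) == "/a/b"."""
    return "/" + "/".join(pexp.split("/")[1:i + 1])


def depth(pexp):
    """Number of steps from the root, e.g. 5 for /sales/area/kanto/tsukuba/b."""
    return len(pexp.strip("/").split("/"))


def common_depth(paths):
    """Depth of the longest prefix shared by all path expressions."""
    parts = [p.strip("/").split("/") for p in paths]
    i = 0
    while all(len(ps) > i and ps[i] == parts[0][i] for ps in parts):
        i += 1
    return i


conn = sqlite3.connect(":memory:")
conn.create_function("prefix", 2, prefix)            # stands in for regexp_replace
conn.execute("CREATE TABLE cube(pexp TEXT, qty INTEGER)")   # hypothetical cube table
conn.executemany("INSERT INTO cube VALUES (?, ?)", [
    ("/sales/area/kanto/tsukuba/b", 10), ("/sales/area/kanto/tsukuba/b", 20),
    ("/sales/area/kanto/tsukuba/b", 10), ("/sales/area/kansai/osaka/b", 60),
    ("/sales/area/kansai/osaka/b", 40), ("/sales/area/kansai/kyoto/b", 30),
])

paths = [p for (p,) in conn.execute("SELECT DISTINCT pexp FROM cube")]
i, d = common_depth(paths), depth(paths[0])           # steps 1-2: i = 2, d = 5

# Roll-up over level 3 (region) and level 4 (city) using only SQL-92:
# finest groups, then level-3 subtotals, then the grand total.
ROLLUP_SQL = """
SELECT prefix(pexp, 3) AS g1, prefix(pexp, 4) AS g2, SUM(qty) AS total
  FROM cube GROUP BY prefix(pexp, 3), prefix(pexp, 4)
UNION ALL
SELECT prefix(pexp, 3), NULL, SUM(qty) FROM cube GROUP BY prefix(pexp, 3)
UNION ALL
SELECT NULL, NULL, SUM(qty) FROM cube
"""
rows = conn.execute(ROLLUP_SQL).fetchall()
for row in rows:
    print(row)
```

NULL marks the rolled-up level, as in SQL's ROLLUP output. Adding a value-based column (such as a payment attribute) to each GROUP BY list would give a combined structure- and value-based roll-up in the same UNION ALL form.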

[Figure 3 plot not recoverable from the text extraction: elapsed time (ms, log scale from 10 to 100,000,000) versus XML file size (10MB to 500MB) for the series item, quantity, payment, payqty, rollup(payment), rollup(region), and rollup(regionpay).]

Figure 3. Query processing time.

6.2. Benchmark Queries

For the benchmark queries, we give the fact path $p_f$ = doc("xmark.xml")//item and two dimension paths, $p_{d_1}$ = quantity and $p_{d_2}$ = payment. We ran three queries to show the performance of the roll-up functions; each computes the total quantity of items grouped by a value-based dimension (payment), a structure-based dimension (region), or their combination (regionpay).

6.3. Experimental Results

Figure 3 shows the elapsed times for data-cube construction and query processing. First, the tables "item", "quantity", and "payment" are created. After these tables are extracted, the data-cube "payqty" is constructed. For query processing, we performed three roll-up operations: value-based (payment), structure-based (region), and their combination (regionpay). The results show that data-cube construction is quite time-consuming even for 100MB of data. However, the important point is that once the data-cube is constructed, analytical queries can be processed in reasonable time. In many real systems, data-cube construction is performed once at midnight, and analytical processing is applied repeatedly during business hours. From this observation, we consider the performance of the proposed scheme acceptable.

7. Conclusions

In this paper, we proposed a system for XML-OLAP built on top of relational databases. Our system supports both value- and structure-based hierarchies, which enable users to analyze XML data while taking account of its features. We first introduced the concepts of fact path, dimension path, value- and structure-based concept hierarchy, and XML data-cube. We then discussed OLAP extensions to XQuery. For the implementation, we use the path approach to map XML data to relations, and we use "UNION ALL" to perform the "GROUP BY ROLLUP" operation for both structure- and value-based grouping. Our experiments with large collections of XML data show that the "GROUP BY ROLLUP" queries complete in less than 10 seconds on 500MB of XML data. The results show the effectiveness of the proposed technique.

In future research, we will try to improve the performance of data-cube construction. We also plan to investigate how to incorporate textual features of XML data, such as word vectors, into analytical processing.

8. Acknowledgments

This research is partly supported by the Grant-in-Aid for Scientific Research (17700110) from the Japan Society for the Promotion of Science (JSPS), Japan, and the Grant-in-Aid for Scientific Research on Priority Areas (18049005) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

References

[1] R. Bordawekar and C. A. Lang. Analytical Processing of XML Documents: Opportunities and Challenges. SIGMOD Record, 34(2), pages 27–32, 2005.
[2] K. Chantola, T. Amagasa, and H. Kitagawa. Towards Analytical Processing of XML Data. IPSJ Technical Report, 78 (2006-DBS-140 (I)), pages 201–208, 2006.
[3] D. DeHaan, D. Toman, and M. P. Consens. A Comprehensive XQuery to SQL Translation Using Dynamic Interval Encoding. In Proc. ACM SIGMOD 2003, pages 623–634, 2003.
[4] M. R. Jensen, T. H. Moller, and T. B. Pedersen. Specifying OLAP Cubes on XML Data. In Proc. SSDBM, pages 101–112, 2001.
[5] W3C. XML Path Language (XPath) Version 1.0. http://www.w3.org/TR/1999/REC-xpath-19991116, November 1999. Recommendation.
[6] W3C. XQuery: A Query Language for XML. http://www.w3.org/TR/xquery, 2001. Working draft.
[7] W3C. Extensible Markup Language (XML) 1.0 (Third Edition). http://www.w3.org/TR/REC-xml, February 2004. Recommendation.
[8] M. Yoshikawa, T. Amagasa, and T. Shimura. XRel: A Path-based Approach to Storage and Retrieval of XML Documents Using Relational Databases. ACM Transactions on Internet Technology (TOIT), 1(1), pages 110–141, 2001.
