Zongmin Ma and Li Yan (Eds.)
Soft Computing in XML Data Management
Studies in Fuzziness and Soft Computing, Volume 255
Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
Vol. 238. Atanu Sengupta, Tapan Kumar Pal: Fuzzy Preference Ordering of Interval Numbers in Decision Problems, 2009. ISBN 978-3-540-89914-3
Vol. 239. Baoding Liu: Theory and Practice of Uncertain Programming, 2009. ISBN 978-3-540-89483-4
Vol. 240. Asli Celikyilmaz, I. Burhan Türksen: Modeling Uncertainty with Fuzzy Logic, 2009. ISBN 978-3-540-89923-5
Vol. 241. Jacek Kluska: Analytical Methods in Fuzzy Modeling and Control, 2009. ISBN 978-3-540-89926-6
Vol. 242. Yaochu Jin, Lipo Wang: Fuzzy Systems in Bioinformatics and Computational Biology, 2009. ISBN 978-3-540-89967-9
Vol. 243. Rudolf Seising (Ed.): Views on Fuzzy Sets and Systems from Different Perspectives, 2009. ISBN 978-3-540-93801-9
Vol. 244. Xiaodong Liu and Witold Pedrycz: Axiomatic Fuzzy Set Theory and Its Applications, 2009. ISBN 978-3-642-00401-8
Vol. 245. Xuzhu Wang, Da Ruan, Etienne E. Kerre: Mathematics of Fuzziness – Basic Issues, 2009. ISBN 978-3-540-78310-7
Vol. 246. Piedad Brox, Iluminada Castillo, Santiago Sánchez Solano: Fuzzy Logic-Based Algorithms for Video De-Interlacing, 2010. ISBN 978-3-642-10694-1
Vol. 247. Michael Glykas: Fuzzy Cognitive Maps, 2010. ISBN 978-3-642-03219-6
Vol. 248. Bing-Yuan Cao: Optimal Models and Methods with Fuzzy Quantities, 2010. ISBN 978-3-642-10710-8
Vol. 249. Bernadette Bouchon-Meunier, Luis Magdalena, Manuel Ojeda-Aciego, José-Luis Verdegay, Ronald R. Yager (Eds.): Foundations of Reasoning under Uncertainty, 2010. ISBN 978-3-642-10726-9
Vol. 250. Xiaoxia Huang: Portfolio Analysis, 2010. ISBN 978-3-642-11213-3
Vol. 251. George A. Anastassiou: Fuzzy Mathematics: Approximation Theory, 2010. ISBN 978-3-642-11219-5
Vol. 252. Cengiz Kahraman, Mesut Yavuz (Eds.): Production Engineering and Management under Fuzziness, 2010. ISBN 978-3-642-12051-0
Vol. 253. Badredine Arfi: Linguistic Fuzzy Logic Methods in Social Sciences, 2010. ISBN 978-3-642-13342-8
Vol. 254. Weldon A. Lodwick, Janusz Kacprzyk (Eds.): Fuzzy Optimization, 2010. ISBN 978-3-642-13934-5
Vol. 255. Zongmin Ma, Li Yan (Eds.): Soft Computing in XML Data Management, 2010. ISBN 978-3-642-14009-9
Editors
Zongmin Ma
College of Information Science and Engineering
Northeastern University
3-11 Wenhua Road
Shenyang, Liaoning 110819
China
Li Yan
School of Software
Northeastern University
3-11 Wenhua Road
Shenyang, Liaoning 110819
China
DOI 10.1007/978-3-642-14010-5
© 2010 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilm or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper
springer.com
Preface
As the de facto standard for data representation and exchange over the Web,
XML (Extensible Markup Language) makes it easy to develop applications that
exchange data over the Web. This gives rise to a set of data management
requirements involving XML. XML and related standards have been extensively
applied in many business, service, and multimedia applications. As a result, a
large volume of data is managed today directly in XML format.
With the wide and in-depth use of XML in diverse application domains, some
particularities of data management in concrete applications emerge, which
challenge current XML technology. This is very similar to the situation in which
specialized database models and database systems were developed so that databases
could manage diverse data well. In data- and knowledge-intensive application
systems, one of the challenges can be generalized as the need to handle imprecise
and uncertain information in XML data management by applying fuzzy logic,
probability, and, more generally, soft computing. Currently, two kinds of situations
can be roughly identified in soft computing for XML data management: applying soft
computing for the intelligent processing of classical XML data, and applying soft
computing for the representation and processing of imprecise and uncertain XML
data. For the former, soft computing can be used for flexible querying of XML
documents as well as XML data mining, XML duplicate detection, and so on. For the
latter, it is crucial for Web-based intelligent information systems to explicitly
represent and process imprecise and uncertain XML data with soft computing. This is
because XML has been extensively applied in many application domains that may
involve a great deal of imprecision and vagueness. Imprecise and uncertain data can
be found, for example, in the integration of data sources and in data generated by
nontraditional means (e.g., automatic information extraction and data acquisition
by sensors and RFID). XML is also an important component of the Semantic Web
framework, and the Semantic Web provides Web data with well-defined meaning,
enabling computers and people to work better in cooperation.
Soft computing is a crucial means of implementing machine intelligence, and it
cannot be ignored when bridging the gap between human-understandable soft logic and
machine-readable hard logic. It is reasonable to expect that soft computing can play
an important and positive role in XML data management. Currently, the research and
development of soft computing in XML data management are attracting increasing
attention.
This book covers in great depth the fast-growing topic of techniques, tools, and
applications of soft computing in XML data management. It shows how XML data
management (e.g., modeling, querying, integration) can be approached with a soft
computing focus. This book aims to provide a single account of current studies in
soft computing approaches to XML data management. Its objective is to provide
state-of-the-art information to researchers, practitioners, and graduate students
of Web intelligence, while at the same time serving the information technology
professional faced with non-traditional applications that make the application of
conventional approaches difficult or impossible.
This book, which consists of twelve chapters, is organized into three major
sections. The first section, containing the first four chapters, discusses the issues
of uncertainty in XML. The next four chapters, covering the flexibility in XML data
management supported by soft computing, comprise the second section. The third
section, made up of the final four chapters, focuses on the developments and
applications of soft computing in XML data management.
Chapter 1 proposes a general XML Schema definition for representing and
managing fuzzy information in XML documents. Different aspects of fuzzy
information are represented by starting from proposals coming from the classical
database context. Their datatype classifications are extended and integrated in
order to propose a complete and general approach for representing fuzzy
information in XML documents by using XML Schema. In particular, a fuzzy
XML Schema Definition is described taking into account fuzzy datatypes and
elements needed to fully represent fuzzy information.
Chapter 2 addresses the need for modeling complex objects with imprecision and
uncertainty in the fuzzy XML model and the fuzzy nested relational database model.
After presenting the fuzzy DTD model and the fuzzy nested relational database
model based on possibility distributions, a formal approach is developed in order
to map a fuzzy DTD model to a fuzzy nested relational database schema.
Chapter 3 describes a fuzzy XML schema to represent an implementation of a
fuzzy relational database that allows for similarity relations and fuzzy sets. A flat
translation algorithm is provided to translate from the fuzzy database
implementation to a fuzzy XML document that conforms to the suggested fuzzy
XML schema. The proposed algorithm is implemented within VIREX. An example is
presented to illustrate the power of VIREX in converting fuzzy relational data into
fuzzy XML.
Chapter 4 aims at automatically integrating data sources, using very simple
knowledge rules to rule out most of the nonsense possibilities, combined with
storing the remaining possibilities as uncertainty in the database and resolving
these during querying by means of user feedback. For this purpose, the chapter
introduces this “good is good-enough” integration approach and explains the
uncertainty model that is used to capture the remaining integration possibilities. It
is shown that using this strategy, the time necessary to integrate documents
drastically decreases, while the accuracy of the integrated document increases
over time.
Acknowledgements
We wish to thank all of the authors for their insights and excellent contributions to
this book and would like to acknowledge the help of all involved in the collation
and review process of the book. Thanks go to all those who provided constructive
and comprehensive reviews. Thanks go to Janusz Kacprzyk, the series editor of
Studies in Fuzziness and Soft Computing, and Thomas Ditzinger, the senior editor
of Applied Sciences and Engineering of Springer-Verlag, for their support in the
preparation of this volume. The idea of editing this volume stems from our initial
research work, which was supported by the National Natural Science Foundation of
China (60873010), the Fundamental Research Funds for the Central Universities
(N090504005 & N090604012), and the Program for New Century Excellent Talents in
University (NCET-05-0288).
Abstract. Topics related to fuzzy data have been investigated in the classical
database research field, and in recent years they have become of interest in the
XML data context as well. In this work, we consider issues related to the
representation and management of fuzzy data by using XML documents. We propose to
represent different aspects of fuzzy information by starting from proposals coming
from the classical database context. We extend and integrate their datatype
classifications in order to propose a complete and general approach for representing
fuzzy information in XML documents by using XML Schema. In particular, we describe a
fuzzy XML Schema Definition taking into account fuzzy datatypes and elements needed
to fully represent fuzzy information.
1 Introduction
Issues related to the representation, processing, and management of information in
a flexible way appear in several research areas (e.g., artificial intelligence, databases
and information systems, data mining, and knowledge representation). Requirements
related to fuzziness come from the observation that human reasoning is not as exact
and precise as computer processing usually is. Humans do not follow precise, fixed
rules. Moreover, in some applications data come with errors or are inherently
imprecise because their values are subjective (e.g., values representing customer
satisfaction degrees). It has thus been natural for researchers to try to
incorporate flexible features in software. Hence, several proposals deal with
Barbara Oliboni
Department of Computer Science, University of Verona, Italy
Gabriele Pozzani
Department of Computer Science, University of Verona, Italy
Z. Ma & L. Yan (Eds.): Soft Computing in XML Data Management, STUDFUZZ 255, pp. 3–34.
springerlink.com © Springer-Verlag Berlin Heidelberg 2010
4 B. Oliboni and G. Pozzani
Our proposal for an XML Schema able to represent fuzzy data can be used by
any organization or system managing uncertain data. Such users may need to
exchange fuzzy information among different subsystems, locally or over the
network, and fuzzy XML documents may be a good solution. Moreover, these systems
can use fuzzy XML documents as a storage method for collected fuzzy data. Since no
current DBMS implements fuzzy capabilities, and developing a fuzzy extension for an
existing DBMS may require too much effort, fuzzy XML documents offer a simple way to
store and manage fuzzy information, as already happens for classical data. Our
proposal can help organize these data by providing a common and complete reference
Schema for representing fuzzy data.
This work is structured as follows: in Section 2 we present some background
notions useful to better understand the context of this proposal. In Section 3 we
present our proposal of an XML Schema definition introducing new fuzzy datatypes and
elements needed to represent fuzzy information in an XML document. In Section 4 we
give an example of an XML document satisfying the proposed Schema, by considering
information managed by a weather station. In Section 5 we further extend the
proposed Schema, allowing the representation of some information useful during the
fuzzy processing of an XML document. Some examples of this fuzzy processing
information are illustrated in Section 6. In Section 7 we discuss how a classical
XML document can be changed in order to comply with our fuzzy XML Schema
proposal and be able to represent fuzzy data. In Section 8 we give a brief description
of other approaches presented in the literature about representation and querying of
fuzzy XML documents. Finally, in Section 9 we sketch some conclusions and future
research directions.
2 Background
In this section we briefly report some background notions on fuzziness, on relational
databases dealing with fuzzy data, and on XML.
Several proposals deal with the representation of uncertain data in databases. The
relational approach [6, 7, 8] introduced the NULL value in order to represent
unknown attribute values (i.e., no value is applicable or all values in the domain
are possible). The NULL value introduces a three-valued logic. Later on, for example
in the Umano-Fukami model [27, 28], the NULL value was further differentiated by
introducing the fuzzy values UNKNOWN, UNDEFINED, and NULL. UNKNOWN means that any
value in the domain is possible, UNDEFINED means that none of the values in the
domain is possible, and NULL (which is different from the null pointer) means that
we know nothing; in other words, the value may be either undefined or unknown.
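The three-valued logic introduced by NULL can be sketched with the strong Kleene connectives, which is also how SQL evaluates predicates involving NULL. The Python fragment below is only an illustration: None stands for the unknown truth value, and the function names are ours, not part of any of the cited models.

```python
from typing import Optional

# In this sketch None plays the role of the third truth value (unknown).
Truth = Optional[bool]


def and3(a: Truth, b: Truth) -> Truth:
    """Strong Kleene AND: False dominates, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Truth, b: Truth) -> Truth:
    """Strong Kleene OR: True dominates, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Truth) -> Truth:
    """NOT keeps unknown as unknown."""
    return None if a is None else not a


def equals(x, y) -> Truth:
    """Comparing anything with NULL (None) yields unknown, not True or False."""
    if x is None or y is None:
        return None
    return x == y


# A condition "age = 30 OR city = 'Verona'" on a tuple whose age is NULL:
print(or3(equals(None, 30), equals("Verona", "Verona")))   # True
print(and3(equals(None, 30), equals("Verona", "Verona")))  # None
```

The disjunction is still true because one operand is definitely true, while the conjunction stays unknown.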
However, more systematic approaches to fuzzy databases started from the notion
of fuzzy set and other related notions.
The definition of fuzzy set, introduced by Zadeh in [36], is based on the classical
notion of set and extends it to introduce flexibility. In the classical definition, a set S
DTD is included in the XML 1.0 standard, and thus it is widely used and
supported in applications. However, DTD has some limitations: it does not support
newer XML features (e.g., namespaces), it lacks expressivity, and it uses a
non-XML syntax to describe the grammar.
All these limitations are overcome by XML Schema [32] (also called XML
Schema Definition, XSD). XML Schema can be used to express a set of rules to
which an XML document must conform in order to be considered “valid” (with
respect to that schema). XML Schema provides an object-oriented approach to the
definition of XML elements and datatypes. Moreover, it is compatible with other
XML technologies like Web services, XQuery (for XML documents querying) [31]
and XSLT (for XML documents presentation) [33].
Our proposal, which deals with the representation of fuzzy data in XML documents,
is based on the extended version of the GEFRED model proposed by Galindo
et al. [13] and uses XML Schema.
Fig. 1 Reference relations between proposed XML schemata (schema files: FleXchema.xsd, FuzzyOrdType.xsd, FuzzyNonOrdSimType.xsd, FuzzyNonOrdType.xsd, degrees.xsd, FMB.xsd, processing.xsd, and base.xsd)
<xs:complexType name="ClassicType">
<xs:sequence>
<xs:any namespace="http://www.w3.org/2001/XMLSchema"
minOccurs="1" maxOccurs="1" />
</xs:sequence>
<xs:attribute name="info" type="xs:IDREF" use="required" />
<xs:attribute name="type" type="xs:string" fixed="T1"
use="required" />
</xs:complexType>
An XML Schema for Managing Fuzzy Documents 9
<xs:complexType name="FuzzyOrdType">
<xs:sequence>
<xs:any namespace="http://stars.sci.univr.it/FuzzyOrdType"
minOccurs="1" maxOccurs="1" />
</xs:sequence>
<xs:attribute name="info" type="xs:IDREF" use="required" />
<xs:attribute name="type" type="xs:string" fixed="T2"
use="required"/>
</xs:complexType>
<xs:complexType name="FuzzyNonOrdSimType">
<xs:sequence>
<xs:any minOccurs="1" maxOccurs="1"
namespace="http://stars.sci.univr.it/FuzzyNonOrdSimType"/>
</xs:sequence>
<xs:attribute name="info" type="xs:IDREF" use="required" />
<xs:attribute name="type" type="xs:string" fixed="T3"
use="required"/>
</xs:complexType>
4. imprecise data over a discrete nonordered domain and not related by a similarity
relation, represented by datatype FuzzyNonOrdType (see Section 3.5).
<xs:complexType name="FuzzyNonOrdType">
<xs:sequence>
<xs:any
namespace="http://stars.sci.univr.it/FuzzyNonOrdType"
minOccurs="1" maxOccurs="1" />
</xs:sequence>
<xs:attribute name="info" type="xs:IDREF" use="required" />
<xs:attribute name="type" type="xs:string" fixed="T4"
use="required"/>
</xs:complexType>
This fixed attribute allows us to distinguish between the different fuzzy classes of
datatypes. Some fuzzy datatypes (e.g., possdistr, null, unknown) are defined
in several classes and we may need a way to distinguish them in order to process
them in different ways.
Finally, each datatype contains a subelement representing the actual fuzzy data.
These subelements are defined by using the XML Schema any element, and each one
allows an element from a specific, separate namespace to be inserted. Each
namespace is defined in its own external XML Schema. In particular, the any
subelement in ClassicType refers to the basic XML Schema namespace provided by the
W3C [29]. In this way, it is possible to specify any value of the classical crisp
datatypes (e.g., strings, integers, timestamps). The subelements in the other three
datatypes refer to namespaces defined in separate XML schemata that we propose and
explain in the following sections.
To better understand how these definitions may be used, let us consider the
following example. It represents a classical crisp datum containing the name of a
customer, where type="T1" means that the name is crisp, and info="ABC" means
that the related meta-information is contained in the FMB element with ID ABC.
<name type="T1" info="ABC">
John
</name>
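As an illustration of how an application might use the fixed type attribute and the info IDREF, here is a small Python sketch based on the standard xml.etree.ElementTree module. The dictionary standing in for the FMB and the function name describe are hypothetical; the sketch only dispatches on T1 to T4 and resolves the meta-information reference.

```python
import xml.etree.ElementTree as ET

# Fixed values of the "type" attribute and the datatype each one marks.
FUZZY_CLASSES = {
    "T1": "ClassicType",
    "T2": "FuzzyOrdType",
    "T3": "FuzzyNonOrdSimType",
    "T4": "FuzzyNonOrdType",
}


def describe(elem, fmb):
    """Return (datatype, meta-information, text) for a fuzzy element.

    `fmb` is a hypothetical dict from FMB IDs to meta-information; a real
    processor would build it from the FMB part of the same document.
    """
    kind = elem.get("type")
    if kind not in FUZZY_CLASSES:
        raise ValueError(f"unknown fuzzy class: {kind!r}")
    info = elem.get("info")
    # A validator only checks that an IDREF matches some ID in the document,
    # so whether it actually points into the FMB is checked here.
    if info not in fmb:
        raise KeyError(f"info does not reference an FMB entry: {info!r}")
    return FUZZY_CLASSES[kind], fmb[info], (elem.text or "").strip()


name = ET.fromstring('<name type="T1" info="ABC">\n  John\n</name>')
print(describe(name, {"ABC": "meta-information for customer names"}))
# ('ClassicType', 'meta-information for customer names', 'John')
```

Keeping the T1 to T4 mapping in one table makes the class of each element explicit, which matters because some fuzzy values (e.g., unknown) appear in several classes.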
Up to now, we have defined datatypes able to represent the structure of fuzzy
information. The main Schema also introduces elements defining the structure of
specific new parts of a fuzzy XML document, namely the FMB and the processing
information. The FMB is a sequence of (in some cases optional) elements, each
describing a different kind of meta-information (see Section 3.7). Meta-information
includes label definitions, default margins for approximate values, and similarity
relations.
<xs:element name="FMB">
<xs:complexType>
<xs:sequence minOccurs="0" maxOccurs="1">
<xs:element ref="xsfmb:fcl" minOccurs="1" maxOccurs="1"/>
<xs:element ref="xsfmb:labelDefs" minOccurs="0"
maxOccurs="1"/>
<xs:element ref="xsfmb:fam" minOccurs="0" maxOccurs="1"/>
<xs:element ref="xsfmb:simRelDefs" minOccurs="0"
maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The datatype ftype is the set of integer values in the range [1, 7]. It is used in the
FMB definition in order to keep information about the fuzzy type of a fuzzy object
(see Section 3.7).
<xs:simpleType name="ftype">
<xs:restriction base="xs:positiveInteger">
<xs:minInclusive value="1"/>
<xs:maxInclusive value="7"/>
</xs:restriction>
</xs:simpleType>
Finally, datatype any defines a shorthand for the any element defined by the
W3C, referring to any element and type already defined in the W3C namespace.
<xs:complexType name="any">
<xs:sequence>
<xs:any namespace="http://www.w3.org/2001/XMLSchema"
minOccurs="1" maxOccurs="1" />
</xs:sequence>
</xs:complexType>
For the same reason, in FuzzyOrdType we also allow any crisp datum (on an
ordered domain) to be introduced.
<xs:element name="crisp" type="xsb:any" />
The namespace with prefix xsb refers to the XML Schema base.xsd reported
in the previous section.
We define that fuzzy data over an ordered domain can include:
• Linguistic labels. A label is used through an IDREF to its definition. This
definition, consisting of a name and possibly a trapezoidal form, is reported in the
FMB part of the XML document (see Section 3.7). Using IDREFs and storing label
definitions in the FMB reduces data redundancy in XML documents but, on the other
hand, requires more complex processing when querying XML data.
<xs:element name="label" type="xsb:labelRefType"/>
Fig. 2 Continuous possibility distributions on an ordered domain: (a) Trapezoidal distribution (α, β, γ, δ), (b) Interval (lb, ub), (c) Triangular distribution (d, margin)
the lines connecting the two values. We will see that labels also have a trapezoidal
definition; however, trapezoidal values allow us to define a trapezoidal distribution
without having a label for it. Note that trapezoidal distributions generalize both
interval values and triangular distributions.
<xs:element name="trapezoidal">
<xs:complexType>
<xs:sequence>
<xs:element name="alpha" type="xs:decimal"/>
<xs:element name="beta" type="xs:decimal"/>
<xs:element name="gamma" type="xs:decimal"/>
<xs:element name="delta" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
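The claim that trapezoidal distributions generalize intervals and triangular distributions can be made concrete with a small Python sketch. It assumes the usual piecewise-linear reading of Fig. 2 (degree 1 between β and γ, linear slopes, 0 outside [α, δ]) and that an approximate value d with a margin corresponds to the triangle (d - margin, d, d + margin); the function names are illustrative.

```python
def trapezoid(x: float, alpha: float, beta: float, gamma: float, delta: float) -> float:
    """Possibility degree of x for the trapezoid (alpha, beta, gamma, delta).

    Degree 1 on [beta, gamma], linear on the slopes, 0 outside [alpha, delta].
    """
    if not alpha <= beta <= gamma <= delta:
        raise ValueError("expected alpha <= beta <= gamma <= delta")
    if beta <= x <= gamma:
        return 1.0
    if alpha < x < beta:
        return (x - alpha) / (beta - alpha)
    if gamma < x < delta:
        return (delta - x) / (delta - gamma)
    return 0.0


def interval(x: float, lb: float, ub: float) -> float:
    """An interval [lb, ub] is a trapezoid with vertical sides."""
    return trapezoid(x, lb, lb, ub, ub)


def approx(x: float, d: float, margin: float) -> float:
    """'Approximately d' is a trapezoid whose top collapses to the point d."""
    return trapezoid(x, d - margin, d, d, d + margin)


print(trapezoid(25.5, 24, 25, 26, 27))  # 1.0
print(trapezoid(24.5, 24, 25, 26, 27))  # 0.5
print(approx(9, 10, 2))                 # 0.5
```

Only one evaluation routine is needed: intervals and triangular values are expressed by choosing the four trapezoid points appropriately.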
<xs:element name="approxvalue">
<xs:complexType>
<xs:sequence>
<xs:element name="d" type="xs:decimal" />
<xs:element name="margin" type="xs:decimal"
minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="possdistr">
<xs:complexType>
<xs:sequence maxOccurs="unbounded" minOccurs="1">
<xs:element name="p" type="xsb:probType"/>
<xs:element name="d" type="xsb:labelRefType"/>
</xs:sequence>
<xs:attribute name="simRel" type="xs:IDREF" use="required" />
</xs:complexType>
</xs:element>
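The possdistr element is a sequence of (p, d) pairs. A possible way to read it into a label-to-possibility map is sketched below in Python. It assumes that a label reference carries a label_id attribute (as in the example document of Section 4), that probType values lie in [0, 1], and that the t3 prefix is bound to the FuzzyNonOrdSimType namespace named in the Schema; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_possdistr(elem):
    """Map each label_id in a possdistr element to its possibility degree p."""
    children = list(elem)
    if len(children) % 2 != 0:
        raise ValueError("possdistr must contain (p, d) pairs")
    dist = {}
    for p, d in zip(children[0::2], children[1::2]):
        if local(p.tag) != "p" or local(d.tag) != "d":
            raise ValueError("expected a p element followed by a d element")
        degree = float(p.text)
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"degree outside [0, 1]: {degree}")
        dist[d.get("label_id")] = degree
    return dist


xml = """<t3:possdistr xmlns:t3="http://stars.sci.univr.it/FuzzyNonOrdSimType"
                       simRel="SR1">
  <t3:p>1</t3:p><t3:d label_id="S"/>
</t3:possdistr>"""
elem = ET.fromstring(xml)
print(elem.get("simRel"), read_possdistr(elem))  # SR1 {'S': 1.0}
```

Pairing children positionally mirrors the repeated xs:sequence of p and d in the Schema.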
meaning is not fixed in advance, but can be specified by the user in the string
element meaning. Like the other kinds of degrees, non-associated degrees also
include a possibility value F and an IDREF attribute needed to retrieve the
meta-information about degrees in the FMB. The choice to include the meaning inside
degrees, instead of inside their meta-information, allows the user to retrieve the
meaning of degrees more easily, reducing the data processing complexity.
<xs:complexType name="fuzzyNonAssDegree">
<xs:sequence>
<xs:element name="F" type="xsb:probType"/>
<xs:element name="meaning" type="xs:string"/>
</xs:sequence>
<xs:attribute name="info" type="xs:IDREF" use="required" />
</xs:complexType>
<xs:element name="fcl">
<xs:complexType>
<xs:sequence minOccurs="1" maxOccurs="unbounded">
<xs:element ref="fc"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="fc">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="ftype" type="xsb:ftype"/>
<xs:element name="len" type="xs:positiveInteger"
minOccurs="0"/>
<xs:element name="com" type="xs:string" minOccurs="0" />
<xs:element name="um" type="xs:string" minOccurs="0" />
<xs:element name="sym" type="xs:boolean" minOccurs="0" />
</xs:sequence>
<xs:attribute name="id" type="xs:ID"/>
</xs:complexType>
</xs:element>
Since these are the main elements, they have an ID that identifies the fuzzy object.
As explained in the previous sections, any fuzzy element has an IDREF to the
ID associated with its auxiliary information. These IDs are also used in other
auxiliary elements to give further type-specific information. For example, the user
may specify the default margin for approximate values. The margins are stored in
fam (fuzzy approximate much) elements together with the value much, which defines
the minimum distance needed to consider two values to be very different.
<xs:element name="fam">
<xs:complexType>
<xs:sequence>
<xs:element name="margin" type="xs:nonNegativeInteger"/>
<xs:element name="much" type="xs:positiveInteger"/>
</xs:sequence>
<xs:attribute name="id" type="xs:IDREF"/>
</xs:complexType>
</xs:element>
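The role of the two fam values can be sketched as follows. This Python fragment assumes, as the text suggests but does not spell out, that an approxvalue without its own margin falls back to the default margin of its fuzzy object, and that two values count as very different when their distance is at least much. Element names are used without namespace prefixes for brevity, and the helper names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def read_fam(xml: str) -> Tuple[int, int]:
    """Read (margin, much) from a fam element (namespace prefixes omitted here)."""
    fam = ET.fromstring(xml)
    return int(fam.findtext("margin")), int(fam.findtext("much"))


def effective_margin(own_margin: Optional[float], default_margin: float) -> float:
    """Use the approxvalue's own margin if present, else the default from fam."""
    return default_margin if own_margin is None else own_margin


def very_different(a: float, b: float, much: float) -> bool:
    """Two values are treated as very different when at least `much` apart."""
    return abs(a - b) >= much


# Hypothetical fam entry for the fuzzy object with ID Te1.
margin, much = read_fam('<fam id="Te1"><margin>2</margin><much>5</much></fam>')
print(margin, much)                                                # 2 5
print(effective_margin(None, margin))                              # 2
print(very_different(20, 26, much), very_different(20, 23, much))  # True False
```

Storing these defaults once per fuzzy object in the FMB avoids repeating them in every approximate value.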
The FMB also contains the definitions of the similarity relations used in the XML
document. Definitions of all similarity relations are wrapped in the simRelDefs
element. Inside it, each similarity relation is contained in one simRel element
having a name and an id attribute that uniquely identifies the relation inside the
document. A similarity relation is defined by a set of triples (sim), each
composed of two IDREFs (fid1 and fid2) referring to the two related labels and
a value (degree), in the range [0, 1], that specifies the similarity degree between
them. Obviously, labels may appear in several similarity relations, and two labels
may be related with different degrees in different similarity relations.
<xs:element name="simRelDefs">
<xs:complexType>
<xs:sequence minOccurs="1" maxOccurs="unbounded">
<xs:element ref="simRel" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="simRel">
<xs:complexType>
<xs:sequence minOccurs="1" maxOccurs="unbounded">
<xs:element ref="sim" />
</xs:sequence>
<xs:attribute name="id" type="xs:ID" />
<xs:attribute name="name" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:element name="sim">
<xs:complexType>
<xs:sequence>
<xs:element name="fid1" type="xs:IDREF" />
<xs:element name="fid2" type="xs:IDREF" />
<xs:element name="degree" type="xsb:probType" />
</xs:sequence>
</xs:complexType>
</xs:element>
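A system consuming these definitions needs to look up the degree between two labels. The Python sketch below assumes, as is usual for similarity relations, that the relation is reflexive and symmetric, so one triple per unordered pair suffices, and that pairs not listed have degree 0. The function names and the sample degree are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_sim_rel(xml):
    """Read a simRel element into (id, {frozenset({fid1, fid2}): degree})."""
    rel = ET.fromstring(xml)
    table = {}
    for sim in rel.iter("sim"):
        pair = frozenset((sim.findtext("fid1"), sim.findtext("fid2")))
        degree = float(sim.findtext("degree"))
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"degree outside [0, 1]: {degree}")
        table[pair] = degree
    return rel.get("id"), table


def similarity(table, a, b):
    """Reflexive, symmetric lookup; pairs not listed are taken to have degree 0."""
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


# The degree 0.6 between S and C is invented for the example.
rel_id, sr1 = load_sim_rel(
    '<simRel id="SR1" name="weather">'
    "<sim><fid1>S</fid1><fid2>C</fid2><degree>0.6</degree></sim>"
    "</simRel>"
)
print(rel_id, similarity(sr1, "C", "S"), similarity(sr1, "S", "S"))  # SR1 0.6 1.0
```

Using an unordered pair as the key means the order of fid1 and fid2 in the document does not matter.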
Note that the trapezoidal distribution is required only for labels defined over
ordered domains. However, this constraint (like other constraints of this kind) must
be checked by the system, since it cannot be expressed directly in XML Schema.
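Since this check falls outside XML Schema, it has to be done in application code. A possible Python sketch follows; it flags labels referenced from ordered-domain (T2) elements whose FMB definition lacks the four trapezoid points. The label_id attribute on label references, the flat document layout in the usage example, and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TRAPEZOID = ("alpha", "beta", "gamma", "delta")


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def labels_missing_trapezoid(root):
    """List label IDs used inside T2 elements whose definition lacks a trapezoid.

    Assumes labels are referenced through a label_id attribute and defined by
    labelinfo elements carrying the same attribute, as in the example document.
    """
    has_trapezoid = {}
    for info in root.iter():
        if local(info.tag) == "labelinfo":
            parts = {local(child.tag) for child in info}
            has_trapezoid[info.get("label_id")] = all(p in parts for p in TRAPEZOID)
    missing = []
    for elem in root.iter():
        if elem.get("type") != "T2":  # only ordered-domain values need trapezoids
            continue
        for ref in elem.iter():
            if local(ref.tag) == "label" and not has_trapezoid.get(ref.get("label_id"), False):
                missing.append(ref.get("label_id"))
    return missing


# Hypothetical document: "warm" is fully defined, "S" has only a name.
doc = ET.fromstring(
    "<doc>"
    '<temp type="T2" info="Te1"><label label_id="warm"/></temp>'
    '<temp type="T2" info="Te1"><label label_id="S"/></temp>'
    "<labelDefs>"
    '<labelinfo label_id="S"><name>sunny</name></labelinfo>'
    '<labelinfo label_id="warm"><name>warm</name><alpha>20</alpha>'
    "<beta>22</beta><gamma>26</gamma><delta>28</delta></labelinfo>"
    "</labelDefs></doc>"
)
print(labels_missing_trapezoid(doc))  # ['S']
```

A label referenced but never defined is also reported, since it cannot have a trapezoid either.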
4 Example
In this section we give a simple example of an XML document satisfying the
proposed XML Schema, by considering information managed by a weather station. The
document represents tomorrow's forecast, in particular the temperature and the
weather at different times of the day.
Each forecast is contained in a record element. The time a record refers to is
classical information, but it is represented by a fuzzy element, marking it to be
processed by fuzzy querying. The temperature is a numerical datum represented
with a FuzzyOrdType element (because it is based on an ordered domain), while
possible weather conditions are represented by a FuzzyNonOrdSimType element because
they are based on a nonordered domain. We associate a degree (accuracy) with the
temperature to represent the accuracy of the forecast temperature. Moreover, at
each time several forecasts are calculated by using different meteorological models
(e.g., LAM and GCM [22]). Thus, in each record a degree (precision) represents
the precision of the forecast calculated by the model at the considered time.
In this work, we focus only on the description of new elements enabling the
representation of fuzzy information in XML documents. However, each document
also has other classical elements and must have its own schema. The XML Schema
for the considered example has to define the elements tomorrowForecast (containing
all records), record, and so on, possibly referring to the proposed fuzzy
elements. The following listing reports the definition of the record element in
the Schema associated with the weather station document. Fuzzy objects have types
referring to the proposed ones.
<xs:element name="record">
<xs:complexType>
<xs:sequence>
<xs:element name="model" type="xs:string" />
<xs:element name="time" type="fuzzy:ClassicType" />
<xs:element name="temp" type="fuzzy:FuzzyOrdType" />
<xs:element name="accuracy" type="dgr:fuzzyAttrDegree" />
<xs:element name="weather" type="fuzzy:FuzzyNonOrdSimType"/>
<xs:element name="precision" type="dgr:fuzzyNonAssDegree" />
</xs:sequence>
</xs:complexType>
</xs:element>
The following document portion reports a record about the 5 o’clock forecast
calculated by the LAM model. The temperature is unknown, i.e., every value is
possible (hence, its accuracy is one), while the weather is undefined. The precision
element has value zero, due to the lack of information about temperature and weather.
Note that, since this degree is not associated with any attribute or instance (i.e., it
has type fuzzyNonAssDegree), it also contains its own meaning.
<record>
<model>LAM</model>
<time type="T1" info="T0">
<hm>05:00:00</hm>
</time>
<temp type="T2" info="Te1">
<t2:unknown />
</temp>
<accuracy refTo="Te1" info="D1"> 1 </accuracy>
<weather type="T3" info="W1">
<t3:undefined />
</weather>
<precision info="P1">
<dgr:F> 0 </dgr:F>
<dgr:meaning>model forecast precision</dgr:meaning>
</precision>
</record>
At the same time, the GCM model may report the temperature as a trapezoidal
distribution [24, 25, 26, 27] with an accuracy of 0.9, while the possible weather is
represented by a possibility distribution based on a similarity relation SR1. In the
example, tomorrow the weather will be sunny (referred to by the label “S”) with a
percentage of 80%, while it will be cloudy (referred to by the label “C”) with a
percentage of 30%. Recall that label and similarity relation definitions are
contained in the FMB.
<record>
<model>GCM</model>
<time type="T1" info="T0">
<hm>05:00:00</hm>
</time>
<temp type="T2" info="Te1">
<t2:trapezoidal>
<t2:alpha>24</t2:alpha>
<t2:beta>25</t2:beta>
<t2:gamma>26</t2:gamma>
<t2:delta>27</t2:delta>
</t2:trapezoidal>
</temp>
<accuracy refTo="Te1" info="D1"> 0.9 </accuracy>
<weather type="T3" info="W1">
<t3:possdistr simRel="SR1">
<t3:p>1</t3:p>
<t3:d label_id="S"></t3:d>
</t3:possdistr>
</weather>
<precision info="P1">
<dgr:F> 0.86 </dgr:F>
<dgr:meaning>model forecast precision</dgr:meaning>
</precision>
</record>
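To make the trapezoidal reading of the GCM temperature concrete, the sketch below computes membership degrees. It assumes the usual piecewise-linear trapezoid semantics: degree 0 outside [α, δ], degree 1 on [β, γ], and linear in between. The function name is ours and is not part of the proposed Schema.

```python
def trapezoid_membership(x: float, alpha: float, beta: float,
                         gamma: float, delta: float) -> float:
    """Membership degree of x in the trapezoid [alpha, beta, gamma, delta]."""
    if x < alpha or x > delta:
        return 0.0                              # outside the support
    if beta <= x <= gamma:
        return 1.0                              # inside the core
    if x < beta:
        return (x - alpha) / (beta - alpha)     # rising edge
    return (delta - x) / (delta - gamma)        # falling edge


# Trapezoid reported by the GCM record above.
GCM_TEMP = (24, 25, 26, 27)

for t in (24.5, 25.5, 26.75, 28):
    print(t, trapezoid_membership(t, *GCM_TEMP))
```

Under this reading, 25.5 °C is fully compatible with the GCM forecast, while 24.5 °C has degree 0.5. The vertical-edge cases (α = β or γ = δ) never divide by zero, because those inputs are caught by the earlier branches.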
The FMB portion of the XML document reports auxiliary information about
fuzzy elements. As stated in Section 3.7, the fc element contains the main basic
information about them. For example, the fc element for the temperature may be the
following:
<fmb:fc id="Te1">
<fmb:name>temp</fmb:name>
<fmb:ftype>2</fmb:ftype>
<fmb:com>the expected temperature</fmb:com>
<fmb:um>Celsius degrees</fmb:um>
</fmb:fc>
where Te1 is the unique ID identifying the temp fuzzy object. Hence, it is used
inside the document to link data with auxiliary information and vice versa.
In the FMB, we may also retrieve the definitions of the labels with IDs S (representing
sunny weather), C (representing cloudy weather), and k4 (representing a possible
value for the temperature).
<fmb:labelDefs>
<fmb:labelinfo label_id="S">
<fmb:name>sunny</fmb:name>
</fmb:labelinfo>
<fmb:labelinfo label_id="C">
<fmb:name>cloudy</fmb:name>
</fmb:labelinfo>
<fmb:labelinfo label_id="k4">
<fmb:name>temperature4</fmb:name>
<fmb:alpha>27.5</fmb:alpha>
<fmb:beta>29</fmb:beta>
<fmb:gamma>30</fmb:gamma>
<fmb:delta>30.5</fmb:delta>
</fmb:labelinfo>
</fmb:labelDefs>
Labels representing sunny and cloudy weather are defined over a nonordered
domain; thus, they are pure linguistic labels and do not have a trapezoidal
definition. The label used to represent a temperature is also defined by a trapezoidal
distribution. Labels S and C are also related by a similarity relation defined inside
a simRel element. This similarity relation is identified by the ID SR1 and also
has a name. Inside each sim element we may retrieve a pair of objects and their
similarity degree. In the reported example, sunny and cloudy are similar with a degree
of 0.3.
<fmb:simRelDefs>
<fmb:simRel id="SR1" name="SimilarityRelation1">
<fmb:sim>
<fmb:fid1>S</fmb:fid1>
<fmb:fid2>C</fmb:fid2>
<fmb:degree>0.3</fmb:degree>
</fmb:sim>
</fmb:simRel>
</fmb:simRelDefs>
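A consumer of the document might load simRelDefs into a lookup table along these lines. This is a sketch under assumptions: the fragment does not show the fmb namespace declaration, so the URI below is hypothetical. Treating unlisted label pairs as degree 0 is also our choice. Reflexivity (degree 1 for a label with itself) and symmetry follow from the usual definition of a similarity relation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet

# Hypothetical namespace URI: the fragment above uses the prefix fmb
# without showing its declaration.
FMB_NS = "http://example.org/fmb"
NS = {"fmb": FMB_NS}

SIM_REL_XML = f"""<fmb:simRelDefs xmlns:fmb="{FMB_NS}">
  <fmb:simRel id="SR1" name="SimilarityRelation1">
    <fmb:sim>
      <fmb:fid1>S</fmb:fid1>
      <fmb:fid2>C</fmb:fid2>
      <fmb:degree>0.3</fmb:degree>
    </fmb:sim>
  </fmb:simRel>
</fmb:simRelDefs>"""

Relation = Dict[FrozenSet[str], float]


def load_similarity_relations(xml_text: str) -> Dict[str, Relation]:
    """Map each simRel id to {frozenset({label1, label2}): degree}."""
    root = ET.fromstring(xml_text)
    relations: Dict[str, Relation] = {}
    for rel in root.findall("fmb:simRel", NS):
        pairs: Relation = {}
        for sim in rel.findall("fmb:sim", NS):
            a = sim.findtext("fmb:fid1", namespaces=NS).strip()
            b = sim.findtext("fmb:fid2", namespaces=NS).strip()
            # frozenset keys make the lookup symmetric: (S, C) == (C, S)
            pairs[frozenset((a, b))] = float(sim.findtext("fmb:degree", namespaces=NS))
        relations[rel.get("id")] = pairs
    return relations


def similarity(relation: Relation, a: str, b: str) -> float:
    if a == b:
        return 1.0                                   # reflexive
    return relation.get(frozenset((a, b)), 0.0)      # unlisted pairs: assumed 0


sr1 = load_similarity_relations(SIM_REL_XML)["SR1"]
print(similarity(sr1, "C", "S"))
```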
Finally, the FMB contains information about the default margin for approximate
values representing temperatures. Moreover, it defines the threshold distance at which two
temperatures are considered very different. In the example, these two parameters have
values 1 and 5, respectively.
<fmb:fam id="Te1">
<fmb:margin>1</fmb:margin>
<fmb:much>5</fmb:much>
</fmb:fam>
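The sketch below shows one plausible use of the two fam parameters. It makes two assumptions that the text does not spell out. First, “approximately v” is read as the triangle [v − margin, v, v, v + margin]. Second, the “very different” degree rises linearly from 0 to 1 as the distance grows from 0 to much. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FamParams:
    margin: float  # default margin for approximate values (fmb:margin)
    much: float    # distance at which two values count as very different (fmb:much)


TEMP_FAM = FamParams(margin=1.0, much=5.0)  # values of the fam element for Te1


def approximate(value: float, fam: FamParams) -> Tuple[float, float, float, float]:
    """Assumed triangular trapezoid for 'approximately value'."""
    return (value - fam.margin, value, value, value + fam.margin)


def very_different_degree(a: float, b: float, fam: FamParams) -> float:
    """0 for equal values, 1 once |a - b| >= much; the linear ramp is an assumption."""
    return min(1.0, abs(a - b) / fam.much)


print(approximate(25, TEMP_FAM))                 # triangle around 25
print(very_different_degree(20, 22.5, TEMP_FAM))
```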
has an id attribute that identifies it, a name that represents the qualifier in queries,
and a value in the range [0, 1].
<xs:element name="qualifiers">
<xs:complexType>
<xs:sequence minOccurs="1" maxOccurs="unbounded">
<xs:element ref="qualDef" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="qualDef">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="qualifier" type="xsb:probType"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID"/>
</xs:complexType>
</xs:element>
Fuzzy quantifiers [17, 18, 34, 39] are linguistic labels that allow us to represent
uncertain quantities. They may be used in queries to provide the approximate
number of elements fulfilling a given condition. Quantifiers may be absolute or
relative. Absolute quantifiers express quantities referring to the number of objects
in a set (e.g., “approximately between 25 and 35”, “close to 0”); hence, absolute
quantifiers range over ℝ. Relative quantifiers represent the proportion between the
number of objects in a set that comply with the stated condition and the total number
of objects in that set. In other words, relative quantifiers measure the degree to which
a certain condition is fulfilled (e.g., “the majority”, “about half of”). For this reason,
relative quantifiers take values in the range [0, 1].
Absolute and relative quantifiers may be represented in the same form by using a
trapezoidal representation [α, β, γ, δ] and keeping information about their type.
Another classification divides quantifiers into those based on product and
those based on sum. Moreover, they may have zero, one, or two arguments. A general
definition of fuzzy quantifiers with respect to their arguments and operations is
the following:
• quantifiers without arguments are defined simply by their trapezoidal distribution
[α, β, γ, δ];
• quantifiers with one argument x:
– based on product: [x · α, x · β, x · γ, x · δ];
– based on sum: [x + α, x + β, x + γ, x + δ];
• quantifiers with two arguments x and y:
– based on product: [x · α, x · β, y · γ, y · δ];
– based on sum: [x + α, x + β, y + γ, y + δ].
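The five cases above can be captured by a single function that binds a quantifier's arguments. This is a minimal sketch. The function name and the example base trapezoids (a sum-based “around x” and a product-based “between x and y”) are hypothetical. With one argument, we set y = x, and both two-argument formulas then reduce to the one-argument ones listed above.

```python
from typing import Optional, Tuple

Trapezoid = Tuple[float, float, float, float]


def instantiate_quantifier(base: Trapezoid, sp: str,
                           x: Optional[float] = None,
                           y: Optional[float] = None) -> Trapezoid:
    """Bind the arguments of a quantifier whose stored trapezoid is `base`.

    sp is "S" (sum), "P" (product) or "-" (no arguments).
    """
    alpha, beta, gamma, delta = base
    if x is None:
        if y is not None:
            raise ValueError("y given without x")
        return base                      # quantifier without arguments
    if y is None:
        y = x                            # one argument: x applies to all four points
    if sp == "P":
        return (x * alpha, x * beta, y * gamma, y * delta)
    if sp == "S":
        return (x + alpha, x + beta, y + gamma, y + delta)
    raise ValueError(f"a quantifier with arguments needs SP 'S' or 'P', got {sp!r}")


# Hypothetical "around x", sum-based, bound to x = 10.
print(instantiate_quantifier((-2, -1, 1, 2), "S", x=10))
# Hypothetical "between x and y", product-based, bound to x = 20, y = 40.
print(instantiate_quantifier((0.75, 1, 1, 1.25), "P", x=20, y=40))
```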
Note that, in some cases, a relative quantifier may fall partly outside the range [0, 1].
This problem can be addressed by considering only the intersection of the trapezoidal
distribution associated with the quantifier with the interval [0, 1].
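Intersecting with [0, 1] does not generally yield another trapezoid, because the membership can drop abruptly to 0 at 1. For this reason, a simple way to apply the fix is to wrap the membership function, as sketched below. The example quantifier (0.7, 0.8, 1.0, 1.1) is hypothetical, for instance the result of binding a sum-based relative quantifier near 1.

```python
def trapezoid_membership(x: float, alpha: float, beta: float,
                         gamma: float, delta: float) -> float:
    """Piecewise-linear trapezoid membership."""
    if x < alpha or x > delta:
        return 0.0
    if beta <= x <= gamma:
        return 1.0
    if x < beta:
        return (x - alpha) / (beta - alpha)
    return (delta - x) / (delta - gamma)


def relative_quantifier_membership(p: float, quantifier) -> float:
    """Membership of proportion p in (quantifier ∩ [0, 1])."""
    if not 0.0 <= p <= 1.0:
        return 0.0                     # proportions outside [0, 1] are impossible
    return trapezoid_membership(p, *quantifier)


Q = (0.7, 0.8, 1.0, 1.1)               # hypothetical relative quantifier exceeding 1
print(relative_quantifier_membership(0.9, Q), relative_quantifier_membership(1.05, Q))
```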
In our Schema proposal, all this information about a quantifier definition is
contained in a quantDef element. Each quantifier is internally identified by a
unique id, while it is referred to by its name. Moreover, a quantifier definition
has the following subelements:
• args ∈ {0, 1, 2} specifies the number of arguments;
• AR specifies whether the quantifier is absolute (A) or relative (R);
• SP specifies whether the quantifier is based on sum (S) or product (P). When the
quantifier has no arguments, a ‘-’ is provided.
Finally, all kinds of quantifiers have a trapezoidal definition provided by the four
elements alpha, beta, gamma, and delta.
<xs:element name="quantDef">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="args">
<xs:simpleType>
<xs:restriction base="xs:nonNegativeInteger">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="2"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="AR">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="A"/>
<xs:enumeration value="R"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="SP">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="S"/>
<xs:enumeration value="P"/>
<xs:enumeration value="-"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="alpha" type="xs:decimal"/>
<xs:element name="beta" type="xs:decimal"/>
<xs:element name="gamma" type="xs:decimal"/>
<xs:element name="delta" type="xs:decimal"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID"/>
</xs:complexType>
</xs:element>
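A processor that reads quantDef elements without a full XSD validator could enforce the same constraints by hand, as in the sketch below. The proc namespace URI and the sample quantifier “most” are hypothetical. The text requires ‘-’ when there are no arguments. Rejecting ‘-’ when arguments are present is our own assumption, and the XSD above does not check this cross-field rule.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple

PROC_NS = "http://example.org/proc"   # hypothetical namespace URI for prefix proc
NS = {"proc": PROC_NS}

# Hypothetical instance: a relative quantifier "most" without arguments.
QUANT_XML = f"""<proc:quantDef xmlns:proc="{PROC_NS}" id="Q1">
  <proc:name>most</proc:name>
  <proc:args>0</proc:args>
  <proc:AR>R</proc:AR>
  <proc:SP>-</proc:SP>
  <proc:alpha>0.5</proc:alpha>
  <proc:beta>0.75</proc:beta>
  <proc:gamma>1</proc:gamma>
  <proc:delta>1</proc:delta>
</proc:quantDef>"""


@dataclass(frozen=True)
class QuantDef:
    qid: str
    name: str
    args: int
    ar: str
    sp: str
    trapezoid: Tuple[float, float, float, float]


def parse_quant_def(elem: ET.Element) -> QuantDef:
    def text(tag: str) -> str:
        value = elem.findtext(f"proc:{tag}", namespaces=NS)
        if value is None:
            raise ValueError(f"missing <{tag}>")
        return value.strip()

    args = int(text("args"))
    ar, sp = text("AR"), text("SP")
    # Constraints mirrored from the XSD restrictions and enumerations.
    if args not in (0, 1, 2):
        raise ValueError(f"args must be 0, 1 or 2, got {args}")
    if ar not in ("A", "R"):
        raise ValueError(f"AR must be A or R, got {ar!r}")
    if sp not in ("S", "P", "-"):
        raise ValueError(f"SP must be S, P or -, got {sp!r}")
    # Cross-field rule (assumption in one direction, see lead-in).
    if (args == 0) != (sp == "-"):
        raise ValueError("SP must be '-' exactly when args is 0")
    trapezoid = tuple(float(text(t)) for t in ("alpha", "beta", "gamma", "delta"))
    return QuantDef(elem.get("id"), text("name"), args, ar, sp, trapezoid)


print(parse_quant_def(ET.fromstring(QUANT_XML)))
```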
26 B. Oliboni and G. Pozzani
Although quantifiers and qualifiers are information used while processing XML
documents rather than actual data, it may be useful to represent them inside
documents. In fact, processing is a very important issue for fuzzy databases and
information. Consider cases in which XML documents are exchanged among several
users. In these cases, it may also be useful to exchange processing information,
in order to share not only data but also semantics and processing operators. In this
way, different users can query a document and obtain the same results. However, a
user remains free to use their own qualifier and quantifier definitions instead of the
document ones.
For instance, we may define a qualifier High, with value 0.8, that may be
used as a threshold in queries about temperature. It may be used to constrain query
results to comply with the query condition with a fulfillment degree greater than
80%.
<proc:qualifiers>
<proc:qualDef id="H12">
<proc:name>High</proc:name>
<proc:qualifier>0.8</proc:qualifier>
</proc:qualDef>
</proc:qualifiers>
Note that, in fuzzy queries, quantifiers and qualifiers may be used together to
constrain results. Considering, for example, the queries about temperature cited
above, we may retrieve records whose temperature is Hot with a High fulfillment
degree (i.e., the temperature overlaps the trapezoidal distribution defining the
quantifier Hot by at least 80%).
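The sketch below shows how the qualifier High (0.8) could filter records. The text does not fix how “overlaps by at least 80%” is measured, so we assume the standard possibility measure sup over x of min(μ_temp(x), μ_Hot(x)), computed in closed form for two trapezoids. The trapezoid used for Hot and the two extra records are hypothetical. The comparison uses ≥, following “at least 80%”.

```python
from typing import Dict, List, Tuple

Trapezoid = Tuple[float, float, float, float]

HIGH = 0.8                            # value of the qualifier High (qualDef H12)
HOT: Trapezoid = (25.0, 28.0, 40.0, 45.0)  # hypothetical definition of Hot


def possibility(a: Trapezoid, b: Trapezoid) -> float:
    """sup_x min(mu_a(x), mu_b(x)) for two piecewise-linear trapezoids."""
    if a[2] < b[1]:
        left, right = a, b            # a's core lies left of b's core
    elif b[2] < a[1]:
        left, right = b, a
    else:
        return 1.0                    # cores overlap
    if left[3] <= right[0]:
        return 0.0                    # supports do not overlap
    # Height where left's falling edge crosses right's rising edge.
    return (left[3] - right[0]) / ((left[3] - left[2]) + (right[1] - right[0]))


def records_satisfying(records: Dict[str, Trapezoid], label: Trapezoid,
                       threshold: float) -> List[str]:
    return [rid for rid, temp in records.items()
            if possibility(temp, label) >= threshold]


records = {
    "GCM": (24, 25, 26, 27),          # temperature from the GCM record
    "approx27": (26, 27, 27, 28),     # hypothetical
    "warm": (27, 28, 29, 30),         # hypothetical
}
print({rid: possibility(t, HOT) for rid, t in records.items()})
print(records_satisfying(records, HOT, HIGH))
```

With this measure, the GCM temperature is Hot only to degree 0.5, so it is excluded by High.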
Then, the designer must decide which data must be represented with a fuzzy data
type and over which kind of domain, ordered or nonordered, the data of interest
are defined. Once the domains have been decided, each original element must be redefined
by changing its type to one of the proposed fuzzy types. Data over an ordered domain
must be declared with type FuzzyOrdType; data over a nonordered domain with
an associated similarity relation must be declared with type FuzzyNonOrdSimType;
and, finally, data over a nonordered domain without an associated similarity
relation must be declared with type FuzzyNonOrdType.
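The decision rule above is small enough to write down directly. The helper below is a hypothetical design aid, not part of the proposal. It returns the fuzzy type name for an element and emits the corresponding declaration in the style of the age example. ClassicType, which is used for the time element in the record schema, covers data that stay crisp.

```python
def fuzzy_type_for(fuzzy: bool, ordered: bool = False,
                   has_similarity: bool = False) -> str:
    """Pick the datatype for an element according to the rules in the text."""
    if not fuzzy:
        return "ClassicType"            # data that stay crisp
    if ordered:
        return "FuzzyOrdType"           # ordered domain
    # nonordered domain: depends on whether a similarity relation exists
    return "FuzzyNonOrdSimType" if has_similarity else "FuzzyNonOrdType"


def element_decl(name: str, **kwargs) -> str:
    """Hypothetical helper emitting an xs:element declaration."""
    return f'<xs:element name="{name}" type="fuzzy:{fuzzy_type_for(**kwargs)}" />'


print(element_decl("age", fuzzy=True, ordered=True))
print(element_decl("weather", fuzzy=True, has_similarity=True))
```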
For instance, let us consider an XML element age representing the age of a
person.
The original definition of this element may be something like:
<xs:element name="age" type="xs:integer" />
A possible fuzzy definition, instead, is:
<xs:element name="age" type="fuzzy:FuzzyOrdType" />
After this change the age can be represented by using any kind of element de-
fined for datatype FuzzyOrdType, e.g., interval, trapezoidal distribution, approxi-
mate value, and so on (see Section 3.3).
Similar changes must also be made to all the other elements that the designer
wants to be able to hold fuzzy information. Changes to different elements differ
only in the datatype the designer needs to use to represent them: ClassicType,
FuzzyOrdType, FuzzyNonOrdSimType, or FuzzyNonOrdType.
The second step of the translation of a classical XML document into its fuzzy
version consists of modifying the document itself. Of course, each occurrence of an
element whose definition has been changed must be rewritten according to its
new definition.
Continuing the example here introduced, the usage of the age element changes
from:
<age>32</age>
to something like:
<age type="T2" info="a1">
<t2:interval>
<t2:lb>31</t2:lb>
<t2:ub>34</t2:ub>
</t2:interval>
</age>
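This second step could be partially automated with the standard library’s ElementTree, as sketched below. The t2 namespace URI is hypothetical, because its declaration is not shown. The interval bounds (31 and 34) must still be chosen by the designer. Checking that the interval contains the original value is our own sanity rule, not a requirement of the Schema.

```python
import xml.etree.ElementTree as ET

T2_NS = "http://example.org/t2"       # hypothetical namespace URI for prefix t2
ET.register_namespace("t2", T2_NS)


def fuzzify_as_interval(crisp: ET.Element, lb: int, ub: int, info: str) -> ET.Element:
    """Rewrite <tag>v</tag> into <tag type="T2" info=...><t2:interval>...</tag>."""
    value = int(crisp.text)
    if not lb <= value <= ub:          # our own sanity check (assumption)
        raise ValueError(f"{value} is outside [{lb}, {ub}]")
    fuzzy = ET.Element(crisp.tag, {"type": "T2", "info": info})
    interval = ET.SubElement(fuzzy, f"{{{T2_NS}}}interval")
    ET.SubElement(interval, f"{{{T2_NS}}}lb").text = str(lb)
    ET.SubElement(interval, f"{{{T2_NS}}}ub").text = str(ub)
    return fuzzy


crisp = ET.fromstring("<age>32</age>")
print(ET.tostring(fuzzify_as_interval(crisp, 31, 34, "a1"), encoding="unicode"))
```

The serializer adds the xmlns:t2 declaration on the age element itself. In a full document, it would normally live on the root element.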
Note that the transition, from a classical XML document to a fuzzy one based
on our Schema, allows one not only to change the definition of the elements to a
fuzzy compliant version but also to enrich the XML document by using degrees,
quantifiers, and qualifiers.
the definition of values and elements introducing special elements representing pos-
sibility distributions and similarity relations. Possibility distributions may be in-
troduced through the two elements <fuzzyValue> and <fuzzyDegree>. The
first one allows the specification of the possibility degree associated to a classical
value, while the second one allows the specification of the possibility with which a
sub-element belongs to its parent element.
The Schema proposed by Gaurav et al. allows similarity relations to be introduced
through the new element <SimilarityRelation>, which defines the pairs composing
the similarity relation. The <SimilarityRelationRef> attribute may be used
to refer to an already defined similarity relation.
Unlike our proposal, they do not allow the use of linguistic labels and
generic degrees; thus, the example described in Section 4 cannot be fully implemented
by using the approach proposed in [14]. Since linguistic labels cannot be defined,
Gaurav et al. cannot give trapezoidal distributions (note that trapezoidal
distributions can also represent triangular distributions and intervals) a unique
name that can be referred to at several points of a document. Thus, when a trapezoidal
distribution is used several times inside a document, the proposal of Gaurav et al.
must specify the distribution itself each time. Conversely, our solution allows one to
associate a name (i.e., a linguistic label) with a trapezoidal distribution, in order to
refer to it by that name instead of specifying the distribution values. This
approach allows us to reuse distribution definitions, reducing document size.
Gaurav et al. also do not allow fuzzy degrees to be represented. Thus, they cannot
associate fuzzy information with classical data. For instance, they cannot represent fuzzy
information such as the accuracy of a forecast temperature or the precision of a
whole forecast, as reported in the example in Section 4.
We note that all fuzzy constructs proposed by Gaurav et al. have a corresponding
representation in our Schema. A similarity relation, defined by Gaurav et al.
through the <SimilarityRelation> element, is defined in our proposal by the
FMB simRel element, and it is referred to by specifying its IDREF inside the
possdistr element of datatype FuzzyNonOrdSimType.
The elements <fuzzyValue> and <fuzzyDegree> defined by Gaurav et al.
represent possibility distributions and tuple degrees, respectively. In our proposal,
possibility distributions can be represented by a possdistr element, as specified in
the FuzzyOrdType, FuzzyNonOrdSimType, and FuzzyNonOrdType datatypes. Tuple
degrees are represented through degrees associated with a whole tuple, by using
FuzzyInstDegree.
In [20, 19], Ma et al. defined a model for representing fuzzy information by modifying
the DTD associated with an XML document. In particular, they modified the DTD by
wrapping the original element definitions inside the new element <Val poss="">,
which associates the current element with its possibility degree. The new element
<Dist>, composed of one or more <Val> elements, allows one to define a possibility
distribution in an XML document. Moreover, Ma et al. defined two types
of distribution: disjunctive and conjunctive. The former represents a set of possible
values of which only one is actually true at any moment; the latter represents
a set of fuzzy values, each of which is true to a different degree at any moment.
Campi et al. take into account two kinds of fuzziness: fuzziness on structure and
fuzziness on values. With respect to the first one, users can submit queries without
specifying precisely the structure of the XML document and of the required elements;
with respect to values, queries look not only for exact value matches but also for
similar values. These features are introduced by defining new fuzzy
path predicates (e.g., NEAR, ABOUT, and BESIDES). Fuzzy predicates allow one
to search for elements, attributes, and values similar to those actually required. For
example, the expression /proceedings/article[@year NEAR 2009] retrieves
article elements, children of a proceedings element, whose year attribute
has a value close to 2009. On the other hand, the user may retrieve article
elements that are close descendants of proceedings by using the expression
/proceedings{/NEAR}/article.
Fuzzy predicates can be partially satisfied by XML elements, to different
degrees. Hence, unlike classical XPath queries, fuzzy queries return a ranked
set of nodes. The rank associated with each element represents the similarity of the
returned element to the ones required by the query.
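To illustrate what a ranked answer to /proceedings/article[@year NEAR 2009] might look like, the sketch below scores each article with a hypothetical closeness function. The function is 1 at the target and decreases linearly to 0 at a distance of 3 years. Campi et al. define their own NEAR semantics, which this sketch does not reproduce, and the sample document is invented.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DOC = """<proceedings>
  <article year="2009" title="A"/>
  <article year="2007" title="B"/>
  <article year="2010" title="C"/>
  <article year="2001" title="D"/>
</proceedings>"""


def near_degree(value: float, target: float, width: float = 3.0) -> float:
    """Hypothetical NEAR: 1 at the target, 0 at distance >= width."""
    return max(0.0, 1.0 - abs(value - target) / width)


def rank_near(root: ET.Element, attr: str, target: float,
              width: float = 3.0) -> List[Tuple[float, str]]:
    ranked = []
    for article in root.findall("article"):          # article children of proceedings
        degree = near_degree(float(article.get(attr)), target, width)
        if degree > 0:                                # drop non-matching nodes
            ranked.append((degree, article.get("title")))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


for degree, title in rank_near(ET.fromstring(DOC), "year", 2009):
    print(f"{title}: {degree:.2f}")
```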
Moreover, Campi et al. define a method that allows one to choose how the ranks for
a query are calculated. Users may associate with each part of a query a variable
whose value represents the degree of satisfaction of the corresponding conditions, and
they may define how the ranks are calculated by combining the values bound to these variables.
Finally, Campi et al. proposal allows users to use fuzzy quantifiers (e.g., tall)
and qualifiers (e.g., very) inside predicates (e.g., height = very tall).
A very similar approach to fuzzy querying is proposed by Goncalves and
Tineo [15].
Using a different approach, Amer-Yahia et al. [1] do not extend XPath expressions
with new predicates and operators; instead, they introduce fuzziness by query
relaxation. They define four operations (e.g., axis generalization and leaf deletion)
on the structure of queries that, given a query, produce a relaxed version of it (i.e.,
a query containing the original one). Relaxations broaden the scope of the path
expressions provided in the original query. A ranking strategy associates a penalty with
each modification applied to a query through a relaxation operation. Penalties are
then used to calculate how well the retrieved elements satisfy the original query.
Note that, in all proposals about fuzzy querying in the literature, query results are
sets of ranked elements where ranks represent the fulfillment degrees of retrieved
elements with respect to the query conditions.
9 Conclusion
In this work, we proposed a general XML Schema definition for representing fuzzy
information in XML documents. In our proposal, we represent different aspects of
fuzzy information by adapting a data type classification already proposed for the
relational database context, and by integrating different kinds of fuzzy information
to compose a complete definition.
For future work we plan to start from documents valid with respect to the
XML Schema proposed in this paper and to study topics related to querying and
References
1. Amer-Yahia, S., Lakshmanan, L.V.S., Pandit, S.: FleXPath: flexible structure and full-
text querying for XML. In: ACM (ed.) Proceedings of the 2004 ACM SIGMOD Interna-
tional Conference on Management of Data 2004, Paris, France, June 13–18, pp. 83–94.
ACM Press, New York (2004)
2. Bosc, D., Pivert, P.: Flexible queries in relational databases – the example of the division
operator. TCS: Theoretical Computer Science 171 (1997)
3. Braga, D., Campi, A., Damiani, E., Pasi, G., Lanzi, P.: FXPath: Flexible querying of
XML documents. In: Proceedings of EuroFuse 2002 (2002)
4. Buckles, B.P., Petry, F.E.: A fuzzy representation of data for relational databases. Fuzzy
Sets and Systems 7(3), 213–226 (1982)
5. Campi, A., Guinea, S., Spoletini, P.: A fuzzy extension for the XPath query language. In:
Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds.) FQAS
2006. LNCS (LNAI), vol. 4027, pp. 210–221. Springer, Heidelberg (2006)
6. Codd, E.F.: A relational model of data for large shared data banks. CACM: Communi-
cations of the ACM 13 (1970)
7. Codd, E.F.: Extending the database relational model to capture more meaning. ACM
Transactions on Database Systems 4(4), 397–434 (1979)
8. Codd, E.F.: The relational model for database management. Addison-Wesley Longman
Publishing Co. Inc., Boston (1990)
9. Damiani, E., Marrara, S., Pasi, G.: FuzzyXPath: Using fuzzy logic an IR features to ap-
proximately query XML documents. In: Melin, P., Castillo, O., Aguilar, L.T., Kacprzyk,
J., Pedrycz, W. (eds.) IFSA 2007. LNCS (LNAI), vol. 4529, pp. 199–208. Springer, Hei-
delberg (2007)
10. Damiani, E., Marrara, S., Pasi, G.: A flexible extension of xpath to improve XML query-
ing. In: Myaeng, S.H., Oard, D.W., Sebastiani, F., Chua, T.S., Leong, M.K. (eds.) Pro-
ceedings of the 31st Annual International ACM SIGIR Conference on Research and De-
velopment in Information Retrieval, SIGIR 2008, Singapore, July 20-24, pp. 849–850.
ACM, New York (2008)
11. Dubois, D., Prade, H.: Possibility Theory: An Approach to Computerized Processing of
Uncertainty. Plenum Press, New York (1988)
12. Elmasri, R.A., Navathe, S.B.: Fundamentals of Database Systems. Addison-Wesley
Longman Publishing Co. Inc., Boston (1999)
13. Galindo, J., Urrutia, A., Piattini, M.: Fuzzy Databases: Modeling, Design, and Imple-
mentation. IGI Publishing (2006)
14. Gaurav, A., Alhajj, R.: Incorporating fuzziness in XML and mapping fuzzy relational
data into fuzzy XML. In: Haddad, H. (ed.) Proceedings of the 2006 ACM Symposium
on Applied Computing, pp. 456–460. ACM, New York (2006)
15. Goncalves, M., Tineo, L.: A new step towards flexible XQuery. Avances en sistemas e
Informática 4, 27–34 (2007)
16. ISO: ISO 8879:1986: Information processing — Text and office systems — Standard
Generalized Markup Language, SGML (1986),
http://www.iso.ch/cate/d16387.html
17. Liu, Y., Kerre, E.E.: An overview of fuzzy quantifiers. (I). interpretations. Fuzzy Sets
Syst. 95(1), 1–21 (1998)
18. Liu, Y., Kerre, E.E.: An overview of fuzzy quantifiers (II). reasoning and applications.
Fuzzy Sets Syst. 95(2), 135–146 (1998)
19. Ma, Z.: Fuzzy Database Modeling with XML (The Kluwer International Series on Ad-
vances in Database Systems). Springer-Verlag New York, Inc. (2005)
20. Ma, Z.M., Yan, L.: Fuzzy XML data modeling with the UML and relational data models.
DKE 63(3), 972–996 (2007)
21. Medina, J.M., Pons, O., Vila, M.A.: GEFRED: A generalized model of fuzzy relational
databases. Information Sciences 76(1-2), 87–109 (1994)
22. Nebeker, F.: Calculating the Weather: Meteorology in the 20th Century. International
Geophysics Series, vol. 60. Academic Press, London (1995)
23. Paoli, J., Bray, T., Sperberg-McQueen, C.M., Yergeau, F., Maler, E.: Extensible markup
language (XML) 1.0 (fourth edition). W3C recommendation, W3C (2006),
http://www.w3.org/TR/2006/REC-xml-20060816
24. Prade, H.: Lipski’s approach to incomplete information databases restated and general-
ized in the setting of Zadeh’s possibility theory. Information Systems 9(1), 27–42 (1984)
25. Prade, H., Testemale, C.: Generalizing database relational algebra for the treatment of
incomplete or uncertain information and vague queries. Information Sciences 34, 115–
143 (1984)
26. Turowski, K., Weng, U.: Representing and processing fuzzy information - an XML-based
approach. Knowl.-Based Syst. 15(1-2), 67–75 (2002)
27. Umano, M.: FREEDOM-O: A fuzzy database system. In: Gupta, M.M., Sanchez, E.
(eds.) Fuzzy Information and Decision Processes, pp. 339–349. North-Holland, Amster-
dam (1982)
28. Umano, M., Fukami, S.: Fuzzy relational algebra for possibility-distribution-fuzzy-
relational model of fuzzy data. J. Intell. Inf. Syst. 3(1), 7–27 (1994)
Abstract. XML has become the de facto standard for information representation and
exchange over the web. In addition, imprecise and uncertain data are inherent in
the real world. Although fuzzy information has been extensively investigated in
the context of the relational model, the classical relational database model and its
fuzzy extension do not, to date, satisfy the need for modeling complex objects with
imprecision and uncertainty, especially when the fuzzy relational databases are
created by mapping the fuzzy conceptual data models and the fuzzy XML data
model. Based on possibility distributions, this chapter concentrates on fuzzy
information modeling in the fuzzy XML model and the fuzzy nested relational
database model. In particular, a formal approach to mapping a fuzzy DTD model
to a fuzzy nested relational database (FNRDB) schema is developed.
Li Yan
School of Software, Northeastern University, Shenyang, 110819, China
Jian Liu
School of Information Science & Engineering, Northeastern University, Shenyang, 110819, China
Z.M. Ma
School of Information Science & Engineering, Northeastern University, Shenyang, 110819, China
Z. Ma & L. Yan (Eds.): Soft Computing in XML Data Management, STUDFUZZ 255, pp. 35–54.
springerlink.com © Springer-Verlag Berlin Heidelberg 2010
36 L. Yan, J. Liu, and Z.M. Ma
1 Introduction
manage XML data, it is necessary to integrate XML and databases [3]. Various
databases, including relational, object-oriented, and object-relational databases, have
been used for mapping to and from XML documents. At the same time, some data
are inherently imprecise and uncertain, since their values are subjective in
real-world applications. For example, consider values representing the degree of
satisfaction with a film: different people may have different satisfaction degrees.
Information fuzziness has also been investigated in the context of EC and SCM
[25, 30, 31], where fuzzy set theory has been shown to be very useful in Web-based
business intelligence.
Fuzzy information has been extensively investigated in the context of the relational
model [6, 24, 26, 28]. However, the classical relational database model and its fuzzy
extension do not satisfy the need for modeling complex objects with imprecision and
uncertainty. The requirements of modeling complex objects together with information
imprecision and uncertainty can be found in many application domains (e.g.,
multimedia applications) and have challenged current database technology [2, 7].
In order to model uncertain data and complex-valued attributes, as well as complex
relationships among objects, current efforts have concentrated on conceptual data
models [15, 16, 21, 33], the fuzzy nested relational data model (also known as the NF2
data model) [34], and fuzzy object-oriented databases [4, 10, 12, 13, 20]. There
are also efforts to design fuzzy databases conceptually using fuzzy conceptual
data models [15, 16, 21, 33]. More recently, fuzzy object-relational databases have been
proposed [9], which combine the characteristics of fuzzy relational databases and fuzzy
object-oriented databases. Readers can refer to [17, 18] for recent surveys of these fuzzy
data models.
Although fuzzy values have been employed to model and handle imprecise
information in databases since Zadeh introduced the theory of fuzzy sets [35], relatively
little work has been carried out on extending XML towards the representation of
imprecise and uncertain concepts. Abiteboul et al. [1] provide a model for XML
documents and DTDs and a representation system for XML with incomplete
information. Representations of probabilistic data in XML are proposed in other
previous research papers, such as [14, 22, 27, 29]. Without presenting an XML
representation model, [11] discusses data fuzziness in XML documents directly in terms
of fuzzy relational databases, and also provides simple mappings from fuzzy relational
databases to fuzzy XML documents. Oliboni and Pozzani [23] propose an XML Schema
definition for representing fuzzy information, adapting a data type classification
originally proposed for relational databases to the XML data context. A fuzzy XML data
model based on the XML DTD is proposed in [19], in which the mappings from the fuzzy
UML data model to the fuzzy XML DTD (Document Type Definition) and from the fuzzy
XML DTD to the fuzzy relational database schema are discussed. In [32], a fuzzy
XML data model based on XML Schema is developed.
The classical relational database model and its fuzzy extension do not satisfy the
need for modeling complex objects with imprecision and uncertainty. This is also true
when the fuzzy relational databases are created by mapping the fuzzy conceptual data
models and the fuzzy XML data model. Being an extension of the relational data model,
the NF2 database model is able to handle complex-valued attributes and may be better
Formal Translation from Fuzzy XML to Fuzzy Nested Relational Database Schema 37
Different models have been proposed to handle different categories of data quality
(or lack thereof). Five basic kinds of imperfection have been identified in [5],
namely inconsistency, imprecision, vagueness, uncertainty, and ambiguity.
Instead of giving formal definitions of these kinds of imperfect information, we
explain their meanings informally here.
Inconsistency is a kind of semantic conflict, meaning the same aspect of the
real world is irreconcilably represented more than once in a database or in several
different databases. For example, the age of George is stored as 34 and 37
simultaneously. Information inconsistency usually comes from information
integration.
Intuitively, imprecision and vagueness relate to the content of an attribute
value: a choice must be made from a given range (interval or set) of values
without knowing which one to choose. In general, vague information is
represented by linguistic values. Assume, for example, that we do not know exactly
the ages of two persons named Michael and John, and only know that Michael’s age
may be 18, 19, 20, or 21, and that John is old. Then the information about Michael’s
age is imprecise, denoted by the set of values {18, 19, 20, 21}, while the
information about John’s age is vague, denoted by the linguistic value "old".
Uncertainty is related to the degree of truth of an attribute value. With
uncertainty, we can apportion some, but not all, of our belief to a given value or a
group of values. For example, the possibility that Chris is 35 years old right now
may be 98%. Random uncertainty, described using probability theory, is not
considered in this chapter. Ambiguity means that some elements of the model lack
complete semantics, leading to several possible interpretations.
Generally, several different kinds of imperfection can co-exist with respect to
the same piece of information. For example, Michael’s age may be one of the values
{18, 19, 20, 21}, with possibilities 70%, 95%, 98%, and 85%, respectively.
Imprecision, uncertainty, and vagueness are three major types of imperfect
information and can be modeled with fuzzy sets [35] and possibility theory [36].
Many of the existing approaches dealing with imprecision and uncertainty are
based on the theory of fuzzy sets.
The concept of fuzzy sets was originally introduced by Zadeh [35]. Let U be a
universe of discourse and F be a fuzzy set in U. A membership function
$$\mu_F : U \to [0, 1]$$
is defined for F, where $\mu_F(u)$, for each $u \in U$, denotes the membership degree of u
in the fuzzy set F. Thus, the fuzzy set F is described as follows:
$$F = \{\mu_F(u_1)/u_1,\ \mu_F(u_2)/u_2,\ \ldots,\ \mu_F(u_n)/u_n\}$$
Like a conventional set, the fuzzy set F consists of elements. Unlike a conventional
set, however, each element may or may not belong to F, and its membership degree
in F must be indicated explicitly. So in F, an element (say $u_i$) is associated with
its membership degree (say $\mu_F(u_i)$), and the two occur together in the form
$\mu_F(u_i)/u_i$. When the membership degrees of all elements in F are exactly 1,
the fuzzy set F reduces to a conventional one.
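For a finite universe, the notation above corresponds naturally to a mapping from elements to membership degrees. The sketch below uses that representation. The set “young” is a hypothetical example, and the helper names are ours. Elements that are not listed are taken to have degree 0.

```python
from typing import Dict, Hashable

FuzzySet = Dict[Hashable, float]


def make_fuzzy_set(degrees: FuzzySet) -> FuzzySet:
    """Validate that every membership degree lies in [0, 1]."""
    for u, d in degrees.items():
        if not 0.0 <= d <= 1.0:
            raise ValueError(f"degree {d} of {u!r} is outside [0, 1]")
    return dict(degrees)


def membership(f: FuzzySet, u: Hashable) -> float:
    return f.get(u, 0.0)               # unlisted elements do not belong to F


def notation(f: FuzzySet) -> str:
    """Render F in the mu_F(u)/u notation used in the text."""
    return "{" + ", ".join(f"{d:g}/{u}" for u, d in f.items()) + "}"


def is_conventional(f: FuzzySet) -> bool:
    """F reduces to a conventional set when all listed degrees are exactly 1."""
    return all(d == 1.0 for d in f.values())


young = make_fuzzy_set({20: 1.0, 25: 0.8, 30: 0.5})   # hypothetical fuzzy set
print(notation(young), is_conventional(young))
```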
When the membership degree $\mu_F(u)$ above is interpreted as a measure of the
possibility that a variable X has the value u, where X takes values in U, a fuzzy
value is described by a possibility distribution $\pi_X$ (Zadeh, 1978):
$$\pi_X = \{\pi_X(u_1)/u_1,\ \pi_X(u_2)/u_2,\ \ldots,\ \pi_X(u_n)/u_n\}$$
Here, $\pi_X(u_i)$, $u_i \in U$, denotes the possibility that $u_i$ is true. Let $\pi_X$ be the possibility
distribution representation of the fuzzy value of a variable X. This means that the
value of X is fuzzy: X may take one of the possible values $u_1, u_2, \ldots, u_n$,
and each possible value (say $u_i$) is associated with its possibility degree
(say $\pi_X(u_i)$).
Definition: A fuzzy set F of the universe of discourse U is convex if and only if,
for all $u_1, u_2$ in U,
$$\mu_F(\lambda u_1 + (1 - \lambda) u_2) \ge \min(\mu_F(u_1), \mu_F(u_2))$$
where $\lambda \in [0, 1]$.
Definition: A fuzzy set F of the universe of discourse U is called a normal
fuzzy set if $\exists u \in U$ such that $\mu_F(u) = 1$.
Definition: A fuzzy number is a fuzzy subset of the universe of discourse U that is
both convex and normal.
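On a finite, ordered universe, λu₁ + (1 − λ)u₂ need not belong to U. The sketch below therefore checks a discrete analogue of convexity: for every u₁ < u < u₂ listed in the set, μ(u) ≥ min(μ(u₁), μ(u₂)). This adaptation is our assumption, as is treating the dictionary keys as the whole universe. Applied to Michael’s age from the example above, the distribution turns out to be convex but not normal, because its maximum is 0.98. Hence it is not a fuzzy number in the sense of a set that is both convex and normal.

```python
from itertools import combinations
from typing import Dict

FuzzySet = Dict[float, float]   # ordered universe element -> membership degree


def is_normal(f: FuzzySet) -> bool:
    return any(d == 1.0 for d in f.values())


def is_convex(f: FuzzySet) -> bool:
    """Discrete analogue: mu(u) >= min(mu(u1), mu(u2)) for all u1 < u < u2."""
    points = sorted(f)
    for i, j in combinations(range(len(points)), 2):
        floor = min(f[points[i]], f[points[j]])
        if any(f[points[k]] < floor for k in range(i + 1, j)):
            return False
    return True


def is_fuzzy_number(f: FuzzySet) -> bool:
    return is_normal(f) and is_convex(f)


michael = {18: 0.70, 19: 0.95, 20: 0.98, 21: 0.85}        # from the example above
about_20 = {18: 0.5, 19: 0.8, 20: 1.0, 21: 0.8, 22: 0.5}  # hypothetical
print(is_convex(michael), is_normal(michael), is_fuzzy_number(about_20))
```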
distribution of Project Gutenberg™ works.
1.F.
1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
ebookbell.com