Zongmin Ma-Advances in Fuzzy Object-Oriented Databases Modeling and Applications-Idea Group Publishing (2004)

Advances in Fuzzy
Object-Oriented
Databases:
Modeling and Applications
Zongmin Ma
Université de Sherbrooke, Canada
Hershey London Melbourne Singapore
IDEA GROUP PUBLISHING
Acquisitions Editor: Mehdi Khosrow-Pour
Senior Managing Editor: Jan Travers
Managing Editor: Amanda Appicello
Development Editor: Michele Rossi
Copy Editor: Lori Eby
Typesetter: Jennifer Wetzel
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Idea Group Publishing (an imprint of Idea Group Inc.)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@idea-group.com
Web site: http://www.idea-group.com
and in the United Kingdom by
Idea Group Publishing (an imprint of Idea Group Inc.)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 3313
Web site: http://www.eurospan.co.uk
Copyright © 2005 by Idea Group Inc. All rights reserved. No part of this book may be
reproduced in any form or by any means, electronic or mechanical, including photocopying,
without written permission from the publisher.
Library of Congress Cataloging-in-Publication Data
Advances in fuzzy object-oriented databases : modeling and applications / Zongmin Ma,
editor.
p. cm.
Includes bibliographical references and index.
ISBN 1-59140-384-7 (h/c) ISBN 1-59140-385-5 (s/c) ISBN 1-59140-386-3 (eISBN)
1. Object-oriented databases. 2. Fuzzy systems. I. Ma, Zongmin, 1965-
QA76.9.D3A34833 2004
005.757dc22
2004017843
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed in
this book are those of the authors, but not necessarily of the publisher.
Advances in Fuzzy
Object-Oriented Databases:
Modeling and Applications
Table of Contents
Preface
SECTION I
Chapter I. A Constraint Based Fuzzy Object Oriented Database Model
G. de Tré, Ghent University, Belgium
R. de Caluwe, Ghent University, Belgium
Chapter II. Fuzzy and Probabilistic Object Bases
T. H. Cao, Ho Chi Minh City University of Technology, Vietnam
H. Nguyen, Ho Chi Minh City Open University, Vietnam
Chapter III. Generalization Data Mining in Fuzzy Object-Oriented Databases
Rafal Angryk, Tulane University, USA
Roy Ladner, Naval Research Laboratory, USA
Frederick E. Petry, Tulane University & Naval Research Laboratory, USA
Chapter IV. FRIL++ and Its Applications
J. M. Rossiter, University of Bristol, UK & Bio-Mimetic Control
Research Center, The Institute of Physical and Chemical
Research (RIKEN), Japan
T. H. Cao, Ho Chi Minh City University of Technology, Vietnam
SECTION II
Chapter V. Fuzzy Information Modeling with the UML
Zongmin Ma, Université de Sherbrooke, Canada
SECTION III
Chapter VI. A Framework to Build Fuzzy Object-Oriented Capabilities
Over an Existing Database System
Fernando Berzal, University of Granada, Spain
Nicolás Marín, University of Granada, Spain
Olga Pons, University of Granada, Spain
M. Amparo Vila, University of Granada, Spain
Chapter VII. Index Structures for Fuzzy Object-Oriented Database Systems
Sven Helmer, Universität Mannheim, Germany
Chapter VIII. Introducing Fuzziness in Existing Orthogonal
Persistence Interfaces and Systems
Miguel Ángel Sicilia, University of Alcalá, Spain
Elena García-Barriocanal, University of Alcalá, Spain
José A. Gutiérrez, University of Alcalá, Spain
SECTION IV
Chapter IX. An Object-Oriented Approach to Managing Fuzziness
in Spatially Explicit Ecological Models Coupled to a Geographic Database
Vincent B. Robinson, University of Toronto at Mississauga, Canada
Phil A. Graniero, University of Windsor, Canada
Chapter X. Object-Oriented Publish/Subscribe for Modeling and
Processing Imperfect Information
Haifeng Liu, University of Toronto, Canada
Hans Arno Jacobsen, University of Toronto, Canada
About the Authors
Index
Preface
A major goal of database research has been the incorporation of additional
semantics into the data model. Classical data models often cannot represent and
manipulate the imprecise and uncertain information that occurs in many
real-world applications. Since the early 1980s, Zadeh's fuzzy logic has been used
to extend various data models. The purpose of introducing fuzzy logic in data
modeling is to enhance the classical models so that uncertain and imprecise
information can be represented and manipulated. This has resulted in numerous
contributions, mainly with respect to the popular relational model or to some
related form of it.
However, rapid advances in computing power have brought opportunities for
databases in emerging applications such as CAD/CAM, multimedia, geographic
information systems, and knowledge management. These applications
characteristically require the modeling and manipulation of complex objects and
semantic relationships. The advances of object-oriented databases are
acknowledged beyond the research and academic worlds, and the object-oriented
paradigm has proved to lend itself extremely well to these requirements. Because
the classical relational database model and its fuzzy extensions do not satisfy
the need to model complex objects with imprecision and uncertainty, much current
research concentrates on fuzzy object-oriented database models, which deal with
complex objects and uncertain data together.
This book focuses on an important extension of the object-oriented paradigm
that allows for the inclusion of fuzzy information in this paradigm and presents
the latest research and application results in fuzzy object-oriented databases.
Some major issues on concepts, semantics, models, design, implementation, and
applications of fuzzy object-oriented databases will be investigated in the book.
The different chapters in the book were contributed by different authors and
provide possible solutions for the different types of technological problems con-
cerning fuzzy object-oriented databases. Each of the contributors to the book is
a leading researcher in the field of fuzzy object-oriented databases who has
made numerous contributions to fuzzy information engineering.
Introduction
This book is organized into four major sections. The first section, comprising the
first four chapters, discusses the representation, semantics, and models of fuzzy
object-oriented databases. Chapter V describes fuzzy object-oriented conceptual
data modeling and comprises the second section. The next three chapters, which
cover implementation issues in fuzzy object-oriented databases, comprise the third
section. Finally, the last two chapters, which comprise the fourth section, contain
applications of fuzzy object-oriented information modeling and fuzzy databases in
geographic information systems and publish/subscribe systems, respectively.
First, we will look at the problem of the representation, semantics, and models
of fuzzy object-oriented databases.
The authors of Chapter I, de Tré and de Caluwe, define a fuzzy object-oriented
formal database model that allows us to model and manipulate information in a
(true to nature) natural way. The presented model is built upon an object-oriented
type system and an elaborated constraint system, which, respectively, support the
definitions of types and constraints. Types and constraints are the basic building
blocks of object schemes, which, in turn, are used for defining database schemes.
Finally, the definition of the database model is obtained by providing adequate
data definition operators and data manipulation operators. Novelties in the
approach are the incorporation of generalized constraints and of extended
possibilistic truth values, which allow for a better representation of data(base)
semantics.
Cao and Nguyen introduce an extension of the probabilistic object base model.
Their model is not the same as the probabilistic object base model that was
investigated in the literature. Their model uses fuzzy sets for representing and
handling vague and imprecise values of object attributes. A probabilistic inter-
pretation of relations on fuzzy set values is proposed to integrate them into that
probability-based framework. Then, the definitions of fuzzy-probabilistic object
base schemas, instances, and algebraic operations are presented.
Angryk, Ladner, and Petry extend the attribute generalization algorithms that
were most commonly applied to relational databases and consider the applica-
tion of generalization-based data mining to fuzzy similarity based object-ori-
ented databases. A key aspect of generalization data mining is the use of a
concept hierarchy. The objects of the database are generalized by replacing
specific attribute values with the next higher-level term in the hierarchy. This
will eventually result in generalizations that represent a summarization of the
information in the database. The authors focus on the generalization of similar-
ity-based simple fuzzy attributes for an object-oriented database (OODB) us-
ing approaches to the fuzzy concept hierarchy developed from the given simi-
larity relation of the database. They then consider application of this approach
to complex structure-valued data in the fuzzy OODB.
Rossiter and Cao introduce a deductive probabilistic and fuzzy object-oriented
database language, called FRIL++, which can deal with both probability and
fuzziness. Its foundation is a logic-based probabilistic and fuzzy object-oriented
model in which a class property (i.e., an attribute or a method) can contain
fuzzy set values, and uncertain class membership and property applicability are
measured by lower and upper bounds on probability. Each uncertainly appli-
cable property is interpreted as a default probabilistic logic rule, which is defea-
sible. Probabilistic default reasoning on fuzzy events is proposed for uncertain
property inheritance and class recognition. The authors present the design,
implementation, and basic features of FRIL++. As described in Chapter IV, FRIL++
can be used as both a modeling and a programming language, as demonstrated by its
applications to machine learning, user modeling, and modeling with words.
The next section takes another look at the semantics and representation of
fuzzy object-oriented data modeling, but from the perspective of a conceptual
data model.
Conceptual data models were proposed for the conceptual design of databases
and for conceptual data modeling in some nontraditional areas. Ma concentrates
on the Unified Modeling Language (UML), a set of object-oriented modeling
notations and a standard of the Object Management Group (OMG), which can be
applied in many areas of software engineering and knowledge engineering. In
order to model complex objects and uncertain data, the author extends the UML
class by using fuzzy set and possibility distribution theory. Different levels of
fuzziness are introduced, and the corresponding graphical representations are
given. The class diagrams of the UML can thereby model fuzzy information.
In the third section, we see some implementation issues of fuzzy object-ori-
ented databases: building fuzzy object-oriented capabilities over an existing da-
tabase system, indexing fuzzy object-oriented database systems, and introduc-
ing fuzziness in existing orthogonal persistence interfaces and systems.
Berzal, Marín, Pons, and Vila describe both a framework and an architecture
that can be used to develop fuzzy object-oriented capabilities using the
conventional features of the object-oriented data paradigm. The authors present a
framework composed of a set of classical classes that gives support to fuzzily
described complex objects. They also explain how to deal with fuzzy extensions
of object-oriented features using conventional object-oriented features as a
basis. The proposal given in the chapter can be used to build a fuzzy
object-oriented database system on top of an existing database system, minimizing
the development effort.
Helmer gives an overview of indexing techniques suitable for fuzzy object-
oriented databases (FOODBS). First, the author identifies typical query pat-
terns used in FOODBS, namely single-valued, set-valued, navigational, and type
hierarchy access. Here, the description of the patterns does not follow a particular
fuzzy object-oriented data model but is kept general enough to be used in
different FOODBS contexts. Second, the author presents the index structures
for each query pattern, which support the efficient evaluation of these queries.
An explanation of the basic techniques from standard index structures (like
B-trees) to sophisticated access methods (like Join Index Hierarchies) is given
in the chapter rather than an exhaustive description.
Sicilia, García-Barriocanal, and Gutiérrez focus on how to integrate the models
and techniques that deal with imprecise and uncertain information in object data
stores with current database design and programming practices, so that the
benefits of fuzzy extensions can be easily adopted and seamlessly integrated in
current applications. The authors provide criteria for selecting the fuzzy
extensions that integrate most seamlessly into the current object storage paradigm
known as orthogonal persistence, in which programming language object models are
directly stored, so that database design becomes mainly a matter of object design.
They provide concrete examples and case studies as practical illustrations of the
introduction of fuzziness, both at the conceptual and the physical levels of this
kind of persistent system.
In the fourth section, we see the applications of fuzzy object-oriented informa-
tion modeling and fuzzy databases.
Robinson and Graniero use a spatially explicit, individual-based ecological mod-
eling problem to illustrate an approach to managing fuzziness in spatial data-
bases that accommodates the use of nonfuzzy as well as fuzzy representations
of geographic databases. The approach taken in the chapter uses the Exten-
sible Component Objects for Constructing Observable Simulation Models (ECO-
COSM) system loosely coupled with geographic information systems. The eco-
logical modeling problem described in the chapter is used to illustrate how com-
bining Probes and ProbeWrappers with Agent objects affords a flexible means
of handling semantic variation and serves as an effective approach to utilize
heterogeneous sources of spatial data.
In the publish/subscribe paradigm, information providers disseminate publications
to all consumers who have expressed interest by registering subscriptions with the
publish/subscribe system. Liu and Jacobsen observe that in all existing
publish/subscribe systems, neither subscriptions nor publications can capture the
uncertainty inherent in the information underlying the application domain. However,
in many situations, exact knowledge of either specific subscriptions or
publications is not available. To address this problem, the authors propose a new
object-oriented publish/subscribe model based on possibility theory and fuzzy set
theory to process imperfect information in subscriptions, in publications, or in
both. Furthermore, the authors define the approximate publish/subscribe matching
problem and develop and evaluate algorithms for solving it.
Acknowledgments
The editor would like to acknowledge the help of all involved in the collation
and review process of the book, without whose support the project could not
have been satisfactorily completed.
Most of the authors of chapters included in this book also served as referees
for papers written by other authors. Thanks go to all those who provided con-
structive and comprehensive reviews.
A special note of thanks goes to all the staff at Idea Group Publishing, whose
contributions throughout the whole process, from inception of the initial idea to
final publication, have been invaluable.
Special thanks go to the publishing team at Idea Group Publishing. In particular
to Mehdi Khosrow-Pour, whose enthusiasm motivated me to initially accept his
invitation for taking on this project, and to Michele Rossi, who continuously
prodded via e-mail to keep the project on schedule.
In closing, I wish to thank all of the authors for their insights and excellent
contributions to this book. I also want to thank all of the people who assisted in
the reviewing process. In addition, this book would not have been possible with-
out the ongoing professional support from Mehdi Khosrow-Pour and Jan Travers
at Idea Group Publishing.
Zongmin Ma, Ph.D.
Sherbrooke, Canada
April 2004
SECTION I
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
Chapter I
A Constraint Based
Fuzzy Object Oriented
Database Model
G. de Tré
Department of Telecommunications and Information Processing,
Ghent University, Belgium
R. de Caluwe
Department of Telecommunications and Information Processing,
Ghent University, Belgium
Abstract
The objective of this chapter is to define a fuzzy object-oriented formal
database model that allows us to model and manipulate information in a
(true to nature) natural way. Not all the elements (data) that occur in the
real world are fully known or defined in a perfect way. Classical database
models only allow the manipulation of accurately defined data in an
adequate way. The presented model was built upon an object-oriented type
system and an elaborated constraint system, which, respectively, support
the definitions of types and constraints. Types and constraints are the basic
building blocks of object schemes, which, in turn, are used for defining
database schemes. Finally, the definition of the database model was
obtained by providing adequate data definition operators and data
manipulation operators. Novelties in the approach are the incorporation of
generalized constraints and of extended possibilistic truth values, which
allow for a better representation of data(base) semantics.
Introduction
In this chapter, a formal object-oriented database model that is suited to model
both perfect and imperfect information is built. This model distinguishes itself
from existing fuzzy object-oriented models by integrating (generalized) con-
straints (Zadeh, 1997). These constraints are used to define the semantics and
integrity of the data and to define query criteria. Another novelty is its underlying
logical framework of extended possibilistic truth values (de Tré, 2002). Moreover,
the model is built upon the Object Data Management Group (ODMG) data
model (Cattell & Barry, 2000), as far as its crisp components are considered.
The starting point for the formalism is an algebraic foundation, in which sets of
objects, operators on these sets, and constraints that are defined for these sets
are central (de Tr, de Caluwe, & Van der Cruyssen, 2000). Special domain-
specific elements that are represented by the symbol, are used to formalize
undefined (or inapplicable) data. This foundation is formally defined on the
basis of a type system and a constraint system. Starting from this basis, object
schemes and database schemes are defined, which allow for databases to be
defined rather easily. Furthermore, querying is generalized to a manageable
closed set of operators.
Contrary to existing proposals that extend a crisp model, an approach based on
generalization allows databases to be defined that handle perfect data as special
cases of imperfect data. For the generalization, fuzzy set theory and possibility
theory are used. Moreover, the presented work shows how Zadeh's
theory on fuzzy information granulation and generalized constraints (Zadeh,
1996, 1997) can be applied within the context of a database model.
The underlying logic of the database model is many-valued and uses so-called
extended possibilistic truth values (de Tré, 2002), which are obtained by
considering the three truth values "true," "false," and "undefined" and
adding possibilistic uncertainty. This logic allows for a more epistemological
modeling of truth and, moreover, can explicitly handle those cases where some
of the data are not applicable.
The remainder of the chapter is organized as follows. In the next section, an
overview of different approaches in fuzzy object-oriented database modeling is
given. Furthermore, some preliminary concepts and definitions are introduced.
In the section entitled "Types and Type System," a type system, which supports
the formal definition of all data types defined in the database model, is presented.
These data types are compliant with the ODMG data model, as far as their crisp
counterparts are considered. In "Constraints and Constraint System," a
constraint system supporting the formalization of constraints is defined. Constraints
are important for defining database semantics and query criteria. In "Object
Schemes and Database Schemes," object (scheme) and database (scheme)
definitions are given. The data definition and data manipulation operators are
presented in "Database Model." Finally, the achieved results are summarized,
and some ideas for future research are discussed in the concluding section.
Some Preliminaries
As object-oriented database models have matured, research on fuzzy
object-oriented databases has received more attention. Several fuzzy
object-oriented database models now exist, and prototypes of some of them have
already been implemented.
Related Work
Among the existing fuzzy object-oriented database models are the following:
the object-centered model of Rossazza et al. (1990, 1997); the object-oriented
model of Tanaka et al. (1991); the similarity-based model of George et al. (1992,
1997); the fuzzy object-oriented data (FOOD) model of Bordogna et al. (1994,
1999, 2000); the fuzzy algebra of Rocacher et al. (1996); the UFO model of Van
Gyseghem (1998); the fuzzy association algebra of Na and Park (1997); the
FIRMS model of Mouaddib et al. (1997); the FOODM model of Marín et al.
(2000, 2001, 2003); and the rough object-oriented database of Beaubouef and
Petry (2002).
The Object-Centered Model of Rossazza et al.
In this model (Rossazza, 1990; Rossazza et al., 1997), all information is contained
in objects that are completely described by a set of attributes. For these objects,
no behavior is defined. Objects with the same attributes are collected in classes
that are organized in class hierarchies. A range of allowed values and a range
of typical values are specified for the attributes. These ranges may be fuzzy.
Various kinds of (graded) inclusion relations can be defined between classes.
The Object-Oriented Model of Tanaka et al.
In this model, fuzziness is considered on both structural and behavioral aspects
of objects (Tanaka, Kobayashi, & Sakanoue, 1991). Attribute values can be
fuzzy predicates. Furthermore, fuzziness is considered at the levels of instantiation,
of inheritance, and of the relationships between objects by introducing extra
special classes.
The Similarity-Based Model of George et al.
This model achieves an enhanced representation of different types of imprecision
by using a similarity relation to generalize equality to similarity (George, 1992;
George et al., 1997). Similarity permits the representation of imprecision in data
and imprecision in inheritance. An object algebra based on extensions of the five
classical operators (union, difference, product, projection, and selection) is
provided.
The FOOD Model of Bordogna et al.
This model (Bordogna, Lucarella, & Pasi, 1994; Bordogna, Pasi, & Lucarella,
1999) is based on a visualization paradigm that supports the representation of the
data semantics and the direct browsing of the information. It was defined as an
extension of a graph-based object model, in which the database scheme and
instances are represented as directed labeled graphs. A prototype of the model
was implemented (Bordogna, Leporati, Lucarella, & Pasi, 2000).
The Fuzzy Algebra of Rocacher et al.
This algebra (Rocacher & Connan, 1996) is an extension of the so-called
EQUAL-algebra, which is part of the object-oriented database model, Exten-
sible and Natural Common Object Resource (ENCORE) (Shaw & Zdonik,
1990). The extension is based on the ODMG data model (Cattell & Barry, 2000)
and is aimed at the modeling and manipulation of fuzzy data.
The UFO Model of Van Gyseghem
This model (Van Gyseghem, 1998) was an attempt to extend an object-oriented
database model as generally as possible in order to be able to deal with fuzziness
as well as with uncertainty. Different model levels were extended (attributes,
methods, objects, classes, inheritance, instantiation, etc.).
The Fuzzy Association Algebra of Na and Park
In this approach (Na & Park, 1997), a fuzzy object-oriented data model was built
by means of fuzzy classes and fuzzy associations. Fuzzy databases are repre-
sented by a fuzzy schema graph at the schema level and a fuzzy object graph at
the object instance level. Data manipulation is handled by means of a fuzzy
association algebra, which consists of operators that can operate on the fuzzy
association patterns of homogeneous and heterogeneous structures. As the
result of these operators, truth values are returned with the patterns.
The FIRMS Model of Mouaddib et al.
This model (Mouaddib & Subtil, 1997) can deal with fuzzy, uncertain, and
incomplete information. At the basis of the model are the concepts nuanced
value and nuanced domain. Furthermore, a fuzzy thesaurus is used to restrict
the allowed domain values of discrete attributes. A Chomsky grammar is used
to generate the characteristic membership functions of the thesaurus terms. In
the FIRMS model, no class hierarchies are supported.
The FOODM Model of Marín et al.
This model (Marín, Pons, & Vila, 2000; Blanco, Marín, Pons, & Vila, 2001)
shows how different sources of vagueness can be managed over a regular
object-oriented database model. It is founded on the concept of fuzzy type,
where properties are ranked in different levels of precision according to their
relationships with the type. Objects are created using α-cuts of their fuzzy types.
An architecture of a prototype implementation of the model was presented in the
literature (Berzal, Marín, Pons, & Vila, 2003).
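The notion of an α-cut used above can be illustrated with a minimal sketch. It assumes a fuzzy type can be approximated by a mapping from property names to membership degrees; the property names, the degrees, and the function name alpha_cut are hypothetical and are not taken from the FOODM model itself.

```python
def alpha_cut(fuzzy_set, alpha, strict=False):
    """Return the crisp set of elements whose membership degree reaches alpha.

    fuzzy_set maps elements to degrees in [0, 1]. With strict=True the
    comparison is '>' (strong alpha-cut) instead of '>='.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if strict:
        return {x for x, mu in fuzzy_set.items() if mu > alpha}
    return {x for x, mu in fuzzy_set.items() if mu >= alpha}


# Hypothetical fuzzy type: each property carries the degree to which it
# belongs to the type (illustrative values only).
employee_type = {"name": 1.0, "salary": 0.9, "office": 0.6, "parking_spot": 0.3}

core_properties = alpha_cut(employee_type, 1.0)     # properties fully in the type
typical_properties = alpha_cut(employee_type, 0.5)  # properties with degree >= 0.5
```

Lowering α admits more properties, so an object built from a low α-cut has a richer but less certain structure than one built from the 1-cut.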
The Rough Object-Oriented Database of Beaubouef and Petry
In this approach (Beaubouef & Petry, 2002), the indiscernibility relation and
approximation regions of rough set theory are used to incorporate uncertainty
and vagueness into the database model.
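As a rough illustration of the approximation regions mentioned above, the following sketch computes the lower and upper approximations of a target set under an indiscernibility relation, following the standard rough set definitions. The data and the names equivalence_classes and approximations are hypothetical and do not come from the Beaubouef and Petry model.

```python
def equivalence_classes(universe, key):
    """Partition the universe into classes of elements that share the same key."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())


def approximations(universe, key, target):
    """Return (lower, upper) approximations of target.

    lower: union of the classes entirely contained in target (certain members).
    upper: union of the classes that overlap target (possible members).
    """
    lower, upper = set(), set()
    for block in equivalence_classes(universe, key):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


# Hypothetical data: colours are indiscernible when they fall in the same family.
colour_family = {"navy": "blue", "azure": "blue",
                 "ivory": "white", "snow": "white", "scarlet": "red"}
universe = set(colour_family)
target = {"navy", "azure", "ivory"}

lower, upper = approximations(universe, colour_family.get, target)
```

The difference between the upper and lower approximations is the boundary region, which holds the elements whose membership in the target cannot be decided from the indiscernibility relation alone.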
The majority of these models do not conform to a single underlying object data
model, as a logical consequence of the present lack of (formal) object standards.
The ODMG proposal (Cattell & Barry, 2000) offers some perspectives.
However, it still suffers from some shortcomings, such as the absence of formal
semantics (Kim, 1994; Alagić, 1997) and its limited ability to deal with
constraints, even though thorough support for constraints is the most obvious
way to define the semantics of a database (Kuper, Libkin, & Paredaens, 2000;
de Tré & de Caluwe, 2000).
The presented fuzzy object-oriented database model is consistent with the
ODMG data model (as far as its crisp components are considered) and,
moreover, deals with constraints. Zadeh's generalized constraints (Zadeh, 1997)
are integrated in the framework and allow for a general, extensible definition
of the semantics and integrity of the data and of the query criteria. Furthermore,
a logic based on extended possibilistic truth values is used to cope explicitly
with missing information.
Generalized Constraints
The concept of generalized constraint was introduced by L. A. Zadeh (Zadeh,
1986, 1997) as the basis for a computational approach to meaning and knowledge
representation. The introduction of this concept was motivated by the fact that
conventional crisp constraints of the form X ∈ C, where X is a variable and C is
a set, are insufficient to represent the meaning of perceptions.
A generalized constraint is, in effect, a family of constraints and can be seen as
a generalization of an assignment statement (Zadeh, 1997).
Definition 1 (Generalized constraint): An unconditional generalized con-
straint on a variable X is defined by:
X isr R
where R is the constraining relation, and isr is a variable copula in
which the discrete-valued variable r defines the way in which R con-
strains X.
As specified in (Zadeh, 2002), the principal constraints are the following:
Equality constraint: r = e, i.e., X ise R. X equals R.
Possibilistic constraint: r = blank, i.e., X is R. R is the possibility
distribution of X (Zadeh, 1978; Dubois & Prade, 1988). For example, the
possibilistic constraint car A is expensive, on the price variable of car A,
in which expensive is a disjunctive fuzzy set with membership function
expensive
, denotes that
X
(x) =
expensive
(x), where x is a numerical value of
price and
X
(x) is the possibility that the price of car A is x.
Veristic constraint: r = v, i.e., X isv R. R is the verity distribution of X
(Zadeh, 1999). For example, the veristic constraint car A isv {(blue,0.1),
(white,1)}, in which {(blue,0.1), (white,1)} is a conjunctive fuzzy set,
denotes that the verity of the proposition car A is blue is 0.1, and the verity
of the proposition car A is white is 1. This expresses that car A is almost
white, but at same time also has some blue parts.
Probabilistic constraint: r = p, i.e., X isp R. R is the probability
distribution of X. For example, the probabilistic constraint consumption of
car A isp N(8,1.5) means that the consumption of car A is a normally
distributed random variable with mean 8 and variance 1.5.
Probability-value constraint: r = pv, i.e., X ispv R. X is the probability
of a fuzzy event (Zadeh, 1968), and R is its value. For example, the
proposition it is likely that car A is expensive can be modeled by the
probability-value constraint Prob(car A is expensive) ispv likely in which
likely is a fuzzy probability.
Random set constraint: r = rs, i.e., X isrs R. R is the fuzzy-set-valued
probability distribution of X. For example, if the price of car A is uncertain,
and the potential price values are modeled by the fuzzy sets around 4.000
USD, almost 5.000 USD, and more than 6.000 USD, with respec-
tive probabilities 0.5, 0.2, and 0.3, this can be expressed by the random-set
constraint car A isrs (0.5\around 4.000 USD + 0.2\almost 5.000 USD
+ 0.3\more than 6.000 USD).
Fuzzy graph constraint: r = fg, i.e., X isfg R. X is a function, and R is its
fuzzy graph (Zadeh, 1997). For example, if X is a function expressing the
relationship between speed and stopping distances of cars, and X is
approximated by the fuzzy graph $f^* = low \times short + average \times rather\ long
+ high \times very\ long$, this can be expressed by the fuzzy graph constraint
"X isfg $f^*$."
Usuality constraint: r = u, i.e., X isu R. This means that usually X is R.
A usuality constraint is a special case of a probability-value constraint. For
example, the usuality constraint "Mercedes isu expensive" should be
interpreted as an abbreviation of "Prob(Mercedes is expensive) ispv
usually."
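The possibilistic and veristic constraints above can be made concrete with a minimal Python sketch. The membership function `mu_expensive` and its breakpoints (20,000 and 40,000 USD) are purely illustrative assumptions and do not come from the chapter; only the identity $\pi_X(x) = \mu_{expensive}(x)$ and the veristic value {(blue,0.1), (white,1)} are taken from the text.

```python
def mu_expensive(price: float) -> float:
    """Hypothetical membership function of the fuzzy set 'expensive' (USD)."""
    if price <= 20_000:
        return 0.0
    if price >= 40_000:
        return 1.0
    return (price - 20_000) / 20_000  # linear ramp between the two breakpoints


def possibility_of_price(price: float) -> float:
    """Possibilistic constraint 'car A is expensive': pi_X(x) = mu_expensive(x)."""
    return mu_expensive(price)


# Veristic constraint 'car A isv {(blue,0.1), (white,1)}': a conjunctive fuzzy
# set, so every listed colour applies to the car, each to its own degree.
car_a_colour = {"blue": 0.1, "white": 1.0}

print(possibility_of_price(30_000))  # 0.5
print(car_a_colour["blue"])          # 0.1
```

The same dictionary shape serves both readings; what differs is the interpretation of the grades (degrees of possibility for a disjunctive set, degrees of verity for a conjunctive one).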
Extended Possibilistic Truth Values
The concept of extended possibilistic truth value (EPTV) (de Tré, 2002) is an
extension of the concept of possibilistic truth value that was originally introduced
in the literature by Prade (1982) and was further developed by De Cooman
(1995, 1999). EPTVs provide an epistemological representation of the truth of
a proposition, which allows us to reflect on our knowledge about the actual truth.
They were specifically designed to deal with those cases in which the truth value
of a proposition is either unknown or undefined.
The truth value of a proposition is unknown if, e.g., some data in the proposition
exist but are not available. For example, the truth value of the proposition "the
price of car A is 20.000 USD" is unknown if car A is for sale but no information
about its price is given. The truth value of a proposition is undefined if, e.g., the
proposition cannot be evaluated due to the nonapplicability of (some of) its
elements. For example, the truth value of the same proposition "the price of car
A is 20.000 USD" is considered to be undefined if it is known for sure that car
A is not for sale, in which case it does not make sense to ask for its price (on
the supposition that price information is not applicable to cars that are not
for sale).
Definition 2 (EPTV): With the understanding that P represents the universe
of all propositions, and $\tilde{\wp}(I^*)$ denotes the set of all regular, ordinary fuzzy
sets (hereby excluding the empty fuzzy set) that can be defined over the
universal set $I^* = \{T, F, \bot\}$ of truth values (where T represents "true,"
F represents "false," and $\bot$ represents an undefined truth value), the
EPTV $\tilde{t}^*(p)$ of a proposition $p \in P$ is formally defined by means of a
mapping:

$\tilde{t}^*: P \to \tilde{\wp}(I^*): p \mapsto \tilde{t}^*(p)$

that associates with each $p \in P$ a fuzzy set

$\tilde{t}^*(p) = \{(T, \mu_{\tilde{t}^*(p)}(T)), (F, \mu_{\tilde{t}^*(p)}(F)), (\bot, \mu_{\tilde{t}^*(p)}(\bot))\}$.

The semantics of this associated fuzzy set is defined in terms of a possibility
distribution. With the understanding that $t^*: P \to I^*$ is the mapping function
that associates the value T with p if p is true, that associates the value F with
p if p is false, and that associates the value $\bot$ with p if (some of) the elements
of p are not applicable, undefined, or not supplied, this means that:

$\forall x \in I^*: \pi_{t^*(p)}(x) = \mu_{\tilde{t}^*(p)}(x)$

where $\pi_{t^*(p)}(x)$ denotes the possibility that the value of $t^*(p)$ conforms to x,
and $\mu_{\tilde{t}^*(p)}(x)$ is the membership grade of x within the fuzzy set $\tilde{t}^*(p)$.
Special cases of EPTVs are as follows:

$\tilde{t}^*(p)$                      Interpretation
$\{(T,1)\}$                           p is true
$\{(F,1)\}$                           p is false
$\{(T,1), (F,1)\}$                    p is unknown
$\{(\bot,1)\}$                        p is undefined
$\{(T,1), (F,1), (\bot,1)\}$          p is unknown or undefined

As an example, consider the modeling of an unknown truth value by the
possibility distribution {(T,1), (F,1)}, which denotes that it is completely possible
that the proposition is true (T), but it is also completely possible that the
proposition is false (F).
New propositions can be constructed from existing propositions, using so-called
logical operators that have definitions based on the operators of a strong three-
valued Kleene logic (Rescher, 1969). A unary operator $\neg$ is provided for the
negation of a proposition. Binary operators $\wedge$, $\vee$, $\Rightarrow$, and $\Leftrightarrow$ are provided,
respectively, for the conjunction, disjunction, implication, and equivalence of
propositions. The arithmetic rules to calculate the EPTV of a composite
proposition and the algebraic properties of extended possibilistic truth values are
presented in de Tré (2002).
As illustrated in the literature (de Tré & de Caluwe, 2003), EPTVs can be used
to express query satisfaction in flexible database querying. Every object o in the
result set of a (flexible) query Q is assigned a calculated EPTV $\tilde{t}^*$(o satisfies
Q), where the membership grades of T, F, and $\bot$ denote, respectively, the
possibility that o satisfies Q, the possibility that o does not satisfy Q, and the
possibility that Q is not (fully) applicable to o.
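The special cases of EPTVs can be illustrated with a small Python sketch that stores an EPTV as a dictionary of membership grades over $I^*$. The names `make_eptv`, `interpret`, and `negate` are hypothetical helpers, not part of the chapter. The negation shown here assumes that the grades of T and F are exchanged while the grade of the undefined value is kept, which is what strong Kleene negation suggests; the authoritative rules are those of de Tré (2002).

```python
T, F, UNDEF = "T", "F", "undefined"  # the three elements of I*


def make_eptv(t=0.0, f=0.0, undef=0.0):
    """Build an EPTV as a fuzzy set over I*; the empty fuzzy set is excluded."""
    grades = {T: t, F: f, UNDEF: undef}
    if all(g == 0.0 for g in grades.values()):
        raise ValueError("the empty fuzzy set is not a valid EPTV")
    return grades


def interpret(eptv):
    """Map the special cases listed in the text to their interpretation."""
    if any(0.0 < g < 1.0 for g in eptv.values()):
        return "no special case"
    fully_possible = frozenset(x for x, g in eptv.items() if g == 1.0)
    special = {
        frozenset({T}): "true",
        frozenset({F}): "false",
        frozenset({T, F}): "unknown",
        frozenset({UNDEF}): "undefined",
        frozenset({T, F, UNDEF}): "unknown or undefined",
    }
    return special.get(fully_possible, "no special case")


def negate(eptv):
    """Assumed negation: swap the grades of T and F, keep the undefined grade."""
    return {T: eptv[F], F: eptv[T], UNDEF: eptv[UNDEF]}


print(interpret(make_eptv(t=1.0, f=1.0)))  # unknown
print(interpret(negate(make_eptv(t=1.0))))  # false
```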
Types and Type System
The common characteristics of a data collection can be described by means of
a type. For this reason, most database models, including the model presented in
this chapter, support some type notion.
Definition of Types
In order to give a complete definition of the concept of type, it is necessary to
provide the rules that define its syntax, as well as the rules that define its
semantics.
Definition 3 (Type): Each type supported by the type system is defined by
its syntax and its semantics.
The syntax of a type. The syntax rules for a type can be formally
described by means of some mathematical expressions.
The semantics of a type. The semantic definition of a type t can be fully
determined by:
    A set of domains $D_t$
    A designated domain $dom_t \in D_t$
    A set of operators $O_t$
    A set of axioms $A_t$
The designated domain $dom_t$ defines the set of valid values for the type and is
called the domain of the type. In order to deal with cases where a regular domain
value does not apply, the assumption was made that every domain $dom_t$ contains
a special, domain-specific value $\bot_t$, which is used to represent "undefined"
domain values. The set of operators $O_t$ contains the operators, which are defined
on the domain $dom_t$. The set of domains $D_t$ consists of the domains that are
involved in the definition of the operators of $O_t$, whereas the set of axioms $A_t$
consists of the axioms that are involved in the definition of the semantics of the
operators of $O_t$.
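As a rough illustration of Definition 3, the following Python sketch models the designated domain, the domain-specific undefined value, and the operator set of a type. The class names `Bottom` and `TypeSemantics` are assumptions made for this sketch; the set of domains $D_t$ and the axioms $A_t$ are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class Bottom:
    """Domain-specific 'undefined' value; one instance per type."""

    def __init__(self, type_name: str):
        self.type_name = type_name

    def __repr__(self) -> str:
        return f"bottom_{self.type_name}"


@dataclass
class TypeSemantics:
    name: str
    contains: Callable[[Any], bool]  # membership test for regular domain values
    operators: Dict[str, Callable] = field(default_factory=dict)

    def __post_init__(self):
        self.bottom = Bottom(self.name)
        # The bottom operator always results in the undefined domain value.
        self.operators["bottom"] = lambda *args: self.bottom

    def in_domain(self, value: Any) -> bool:
        """dom_t holds the regular values plus the undefined value of the type."""
        return value is self.bottom or self.contains(value)


integer = TypeSemantics(
    "Integer",
    contains=lambda v: isinstance(v, int) and not isinstance(v, bool),
    operators={"+": lambda a, b: a + b, "div": lambda a, b: a // b},
)

print(integer.in_domain(42), integer.in_domain(integer.bottom))  # True True
print(integer.operators["bottom"]())                             # bottom_Integer
```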
Type System
In order to define the types supported by the presented database model, a type
system (Lausen & Vossen, 1998) was built. The presented type system is
consistent with the specifications of the ODMG object model (Cattell & Barry,
2000). To guarantee this consistency, a distinction was made between a so-
called void type (which is the most primitive type of the system), literal types,
object types, and reference types (which are new with respect to the ODMG
model). Reference types enable us to refer to the instances of object types and
are used to formalize the binary relationships between the object types in a
database scheme.
Each type supported by the type system is formally defined as prescribed by
Definition 3. The syntax rules for the types of the presented type system are
defined as in Definition 4.
Definition 4 (Types: syntax rules): Let ID denote the set of valid identifiers,
and let the sets of type expressions that satisfy the syntax of a reference
type, a literal type, and an object type be denoted, respectively, as $T_{reference}$,
$T_{literal}$, and $T_{object}$, where:

The set $T_{reference}$ is defined by:

$T_{reference} = T_{single\_ref} \cup T_{multi\_ref}$

where

$T_{single\_ref} = \{Ref(t) \mid t \in T_{object}\}$

and

$T_{multi\_ref} = \{Set_{Ref}(t), Bag_{Ref}(t), List_{Ref}(t) \mid t \in T_{object}\}$

Type t is called the (most) significant type of the reference type.
The set $T_{literal}$ is defined by induction as follows:

Basic types:

$T_{basic} = \{Integer, Real, Boolean, Octet, String\} \subset T_{literal}$

Collection types:

$T_{collect} = \{Set(t), Bag(t), List(t), Array(t), Dict(t,t') \mid t,t' \in T_{literal}\} \subset T_{literal}$

Type t is called the significant type of the collection type. In the case of
nested collection types, the significant type of the innermost collection type
is called the most significant type of the collection type.
Enumeration types:

$T_{enum} = \{Enum\ id\ (id_1, id_2, \ldots, id_n) \mid \{id, id_1, id_2, \ldots, id_n\} \subseteq ID\} \subset T_{literal}$

The identifier id identifies the enumeration type, whereas $(id_1, id_2, \ldots, id_n)$
represents the ordered sequence of identifiers that is described by the type.
Structured types:

$T_{struct} = \{Struct\ id\ (id_1\ isr_1\ t_1;\ id_2\ isr_2\ t_2;\ \ldots;\ id_n\ isr_n\ t_n) \mid (\{id, id_1, id_2, \ldots, id_n\} \subseteq ID) \wedge [\forall\, 1 \le i \le n: (isr_i \in \{ise, is, isv\}) \wedge (t_i \in T_{literal} \cup T_{reference})]\} \subset T_{literal}$

Hereby, id identifies the structured type, whereas $(id_1\ isr_1\ t_1;\ id_2\ isr_2\ t_2;\ \ldots;\ id_n\ isr_n\ t_n)$
represents the components of the structured type. Each component
$id_i\ isr_i\ t_i$, $1 \le i \le n$, is a (generic) generalized constraint on a variable
$id_i$ with associated type $t_i \in T_{literal} \cup T_{reference}$.
If $isr_i$ = ise, the valid values of $id_i$ are restricted to the values of the
domain $dom_{t_i}$ of the associated type $t_i$.

If $isr_i$ = is, $id_i$ is interpreted as a disjunctive (possibilistic) variable,
with valid values that are restricted as follows:
    If $t_i \notin T_{collect} \cup T_{multi\_ref}$, the valid values are restricted to fuzzy sets
    that are defined over domain $dom_{t_i}$.
    If $t_i \in T_{collect} \cup T_{multi\_ref}$, the valid values are restricted to collections
    of fuzzy sets that are defined over the domain $dom_{t'_i}$ of the most
    significant type $t'_i$ of type $t_i$.
The membership grades of all fuzzy sets are interpreted as degrees of
possibility.

If $isr_i$ = isv, $id_i$ is interpreted as a conjunctive (veristic) variable, with
valid values that are restricted as follows:
    If $t_i \notin T_{collect} \cup T_{multi\_ref}$, the valid values are restricted to fuzzy sets
    that are defined over domain $dom_{t_i}$.
    If $t_i \in T_{collect} \cup T_{multi\_ref}$, the valid values are restricted to collections
    of fuzzy sets that are defined over the domain $dom_{t'_i}$ of the most
    significant type $t'_i$ of type $t_i$.
The membership grades of these fuzzy sets are interpreted as degrees
of verity.
The set $T_{object}$ is defined by the following:

Let $V_{signat}$ denote the set of all valid operator signatures, which is defined
as follows:

$\forall t' \in T_{literal} \cup T_{reference} \cup \{Void\}: Signat((\ ) \to t') \in V_{signat}$

$\forall t' \in T_{literal} \cup T_{reference} \cup \{Void\}, \forall \{id'_1, id'_2, \ldots, id'_p\} \subseteq ID, \forall isr_i \in \{ise, is, isv\}, \forall t'_i \in T_{literal} \cup T_{reference}, 1 \le i \le p:$
$Signat((id'_1\ isr_1\ t'_1;\ id'_2\ isr_2\ t'_2;\ \ldots;\ id'_p\ isr_p\ t'_p) \to t') \in V_{signat}$

Hereby, Void denotes the void type, which is used in situations where a
further type specification could not be given (Cattell & Barry, 2000).
Furthermore, t' is the type of the returned value(s) of the operator, and
$id'_i\ isr_i\ t'_i$, $1 \le i \le p$, are the input parameters of the operator. Each input
parameter is a (generic) generalized constraint on a variable $id'_i$ with
associated type $t'_i \in T_{literal} \cup T_{reference}$. These generalized constraints are
interpreted as specified previously.
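If $id \in ID$, $\{\widehat{id}_1, \widehat{id}_2, \ldots, \widehat{id}_m\} \subseteq ID \setminus \{id\}$, $\{id_1, id_2, \ldots, id_n\} \subseteq ID$,
$isr_i \in \{ise, is, isv\}$ and $s_i \in T_{literal} \cup T_{reference} \cup V_{signat}$, $1 \le i \le n$, then:

$Class\ id\ (id_1\ isr_1\ s_1;\ id_2\ isr_2\ s_2;\ \ldots;\ id_n\ isr_n\ s_n) \in T_{object}$

$Class\ id : \widehat{id}_1, \widehat{id}_2, \ldots, \widehat{id}_m\ (\ ) \in T_{object}$

$Class\ id : \widehat{id}_1, \widehat{id}_2, \ldots, \widehat{id}_m\ (id_1\ isr_1\ s_1;\ id_2\ isr_2\ s_2;\ \ldots;\ id_n\ isr_n\ s_n) \in T_{object}$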
The identifier id identifies the object type. Like many object models, the
ODMG Object Model includes an inheritance-based type-subtype hierarchy.
The identifiers $\widehat{id}_i$, $1 \le i \le m$, denote the supertypes of the object type
(if existent). The characteristics$^{1}$ of the object type are represented by
$(id_1\ isr_1\ s_1;\ id_2\ isr_2\ s_2;\ \ldots;\ id_n\ isr_n\ s_n)$.
Each characteristic $id_i\ isr_i\ s_i$, $1 \le i \le n$, is a (generic) generalized constraint
on a variable $id_i$ with associated specification $s_i \in T_{literal} \cup T_{reference} \cup V_{signat}$.
The semantics of the generalized constraints are the same as specified
previously. If $s_i \in T_{literal}$, the characteristic is called an attribute; if $s_i \in T_{reference}$,
the characteristic is called a binary relationship; whereas if $s_i \in V_{signat}$, the
characteristic is a method. For a method, the generalized constraint puts a
restriction on the return values of the operator. In addition to the
characteristics stated in its type specification, an object type inherits the
characteristics of its supertypes (if existent).
Then, the set T of all type expressions is defined by the following:

$T = \{Void\} \cup T_{reference} \cup T_{literal} \cup T_{object}$

Furthermore, the full semantics of the types $t \in T$ (cf. Definition 4) are defined
by providing an appropriate definition for the set of domains $D_t$, the domain $dom_t$
of the type, the set of operators $O_t$, and the set of axioms $A_t$. Below, some informal
descriptions are given:
Void type. The domain of the Void type is, by definition, $\{\bot_{Void}\}$. Its
corresponding set of operators is the singleton $\{\bot: \to dom_{Void}\}$ consisting
of the bottom operator $\bot$, which always results in an undefined domain
value (represented by the symbol $\bot_{Void}$).
Reference types. The reference types are all generic types, designated
by a type generator and an object type parameter. Reference types were
introduced in order to formalize binary association relationships between
object types. An association relationship between two object types has a
"one-to-one," a "one-to-many," or a "many-to-many" cardinality, which
denotes the maximum number of participating domain values of both types.
To support the notion of cardinality, a distinction was made between single-
valued and multivalued reference types. Multivalued reference types are
subdivided into "set-of-references," "bag-of-references," and "list-of-
references," in order to formalize the different ODMG definitions of one-
to-many and many-to-many relationships (Cattell & Barry, 2000).
    Single-valued reference types are denoted by the type generator
    Ref and an object-type parameter $t \in T_{object}$. The domain of the single-
    valued reference type Ref(t) consists of the undefined domain value
    $\bot_{Ref(t)}$ and of references to regular elements (objects) of $dom_t$. The
    associated set of operators consists of the operators =, $\neq$, dereference,
    and $\bot$. For example, with TPerson being the identifier of an object type
    that is used to represent information about persons, Ref(TPerson) is a
    single-valued reference type that allows reference to be made to a
    single-person object.
    Multivalued reference types include "set-of-references," "bag-of-
    references," and "list-of-references," and are denoted, respectively,
    by the type generators $Set_{Ref}$, $Bag_{Ref}$, and $List_{Ref}$ and by an object-type
    parameter $t \in T_{object}$. The domain of type $Set_{Ref}(t)$ [resp. $Bag_{Ref}(t)$ and
    $List_{Ref}(t)$] consists of the undefined domain value $\bot_{Set\_Ref(t)}$ [resp.
    $\bot_{Bag\_Ref(t)}$ and $\bot_{List\_Ref(t)}$] and of sets (resp. bags and lists) of references
    to regular elements of $dom_t$. Furthermore, the types $Set_{Ref}(t)$ [resp.
    $Bag_{Ref}(t)$ and $List_{Ref}(t)$] have the same semantics as a corresponding
    collection type that would be defined over the single-valued reference
    type Ref(t) (cf. description of collection types). For example, with TPerson
    being the identifier of an object type, $Set_{Ref}$(TPerson) is a multivalued
    reference type with domain values that are all sets of references to
    single-person objects.
Basic types. The definition of the basic types is straightforward. Each
basic type has a domain that consists of simple, noncomposite values. Its
corresponding set of operators consists of the usual operators defined over
its domain. For example, the domain of the Integer type consists of the
integer numbers and of the undefined value $\bot_{Integer}$. The set of operators
$O_{Integer}$ consists of the operators =, $\neq$, <, >, $\le$, $\ge$, +, -, *, div, mod, and the
bottom operator $\bot$, which always results in an undefined domain value.
Collection types. The collection types are all generic types, designated by
a type generator and one or two type parameters, e.g., the bag types are
denoted by the type generator Bag and a type parameter t. The domain of
the bag type Bag(t) consists of the undefined domain value $\bot_{Bag(t)}$ and of
unordered collections of elements of the domain of type t, in which
duplicates are allowed. The associated set of operators consists of =, $\neq$,
cardinality, is_empty, count, +, $\cup$, $\cap$, \, is_element, and $\bot$. For example,
the collection type Set(Integer) is used to model sets of integer numbers,
whereas the collection type Bag(Real) is used to model bags of real
numbers.
Enumeration types. The domain of an enumeration type $Enum\ id\ (id_1, id_2, \ldots, id_n)$
consists of the undefined domain value $\bot_{id}$ and of the
identifiers $id_1, id_2, \ldots, id_n$. Its corresponding set of operators consists of =, $\neq$,
<, >, $\le$, $\ge$, and $\bot$. For example, the enumeration type Enum TLang (French,
Dutch, German) defines the set of enumeration constants {French,
Dutch, German} and represents the official languages spoken by people
in Belgium.
Structured types. The domain of a structured type
$Struct\ id\ (id_1\ isr_1\ t_1;\ id_2\ isr_2\ t_2;\ \ldots;\ id_n\ isr_n\ t_n)$ contains the undefined
domain value $\bot_{id}$. All other domain values are composite and consist of
n values $id_i\ isr'_i\ v_i$, with $isr'_i \in \{ise, is\}$, i = 1,2,...,n. Each value in the
composition is, in turn, described by a generalized constraint, for which the
semantics are as follows:
    If $isr'_i$ = ise, the value for component $id_i$ equals $v_i$.
    If $isr'_i$ = is, the value for component $id_i$ is uncertain and is described by
    possibility distribution $v_i$.
For example, the structured type:
    Struct TCompany (
        Name ise String; #_Employees is Integer; Company_language isv TLang)
describes a simple representation for companies where the "Name ise
String" component denotes the company's name, the "#_Employees is
Integer" component is used to model the number of people employed by the
company, and the "Company_language isv TLang" component models
the main language(s) used in the company.
By combining the generalized constraint of the type specification, denoted
by the copula $isr_i$, with the generalized constraint of the domain value,
denoted by the copula $isr'_i$, we obtain the following interpretations for the
values $v_i$, i = 1,2,...,n:
$isr_i$ = ise
    If $isr'_i$ = ise, then $v_i \in dom_{t_i}$. The value of $id_i$ is crisply described.
    For example, the value Name ise "My_company" is a valid
    value for the "Name ise String" component of TCompany and
    denotes that the name of the represented company is certain and
    equals "My_company."
    If $isr'_i$ = is, then $v_i \in \tilde{\wp}(dom_{t_i})$, in which $\tilde{\wp}(dom_{t_i})$ denotes the
    fuzzy power set of the domain $dom_{t_i}$. The
    value of $id_i$ is uncertain. All candidate values are crisply
    described. For example, the value Name is {("My_companyA",1),
    ("My_companyB",0.4)} is a valid value for the "Name ise
    String" component of TCompany. It denotes that the name of
    the represented company is uncertain and is represented by the
    possibility distribution {("My_companyA",1),
    ("My_companyB",0.4)}, which denotes that it is completely
    possible that the name of the company is "My_companyA," and
    it is less possible that the name is "My_companyB."
$isr_i$ = is
    $isr'_i$ = ise
        If $t_i \notin T_{collect} \cup T_{multi\_ref}$, then $v_i \in \tilde{\wp}(dom_{t_i})$. The value of $id_i$
        is vague or imprecise. For example, the value #_Employees
        ise About_2000, where About_2000 is a possibility
        distribution defined over the set of integer values, is a valid value
        for the "#_Employees is Integer" component of TCompany
        and denotes that there are about 2000 employees in the
        considered company.
        If $t_i \in T_{collect} \cup T_{multi\_ref}$, then $v_i$ is a collection of vague or
        imprecise values, all specified by fuzzy sets over the domain
        $dom_{t'_i}$ of the most significant type $t'_i$ of $t_i$. For example,
        consider a component "Ages_of_children is Set(Integer)"
        that is used to represent the ages of the children of a person.
        Then, Ages_of_children ise Set(Around_6, Teenager)
        might be the value for a person with two children, the
        youngest being around six years old, the other being a
        teenager.
    $isr'_i$ = is
        If $t_i \notin T_{collect} \cup T_{multi\_ref}$, then $v_i \in \tilde{\wp}(\tilde{\wp}(dom_{t_i}))$, in which
        $\tilde{\wp}(\tilde{\wp}(dom_{t_i}))$ denotes the set of all Level 2 fuzzy sets that
        can be defined over $dom_{t_i}$ (Gottwald, 1979). The value of $id_i$
        is uncertain, which is described by the membership grades in
        the outer-level fuzzy set. Candidate values can be fuzzy or
        imprecise, which is described by the inner-level fuzzy sets
        (de Tré & de Caluwe, 2003a). For example, the value
        #_Employees is {(About_2000,1), (About_4000,1)} denotes
        that there are possibly about 2000 or possibly about
        4000 employees in the considered company.
        If $t_i \in T_{collect} \cup T_{multi\_ref}$, then $v_i$ is uncertain and is a fuzzy set
        of collections of vague or imprecise values, which, in turn,
        are all specified by fuzzy sets over the domain $dom_{t'_i}$ of the
        most significant type $t'_i$ of $t_i$. For example, the value
        Ages_of_children is {(Set(Around_6,Teenager),1),
        (Set(Around_6,Around_22),0.4)} denotes that the youngest
        child is around 6 years old, but the other child is either a
        teenager or, less possibly, around 22 years old.
$isr_i$ = isv
    $isr'_i$ = ise
        If $t_i \notin T_{collect} \cup T_{multi\_ref}$, then $v_i \in \tilde{\wp}(dom_{t_i})$. The value of $id_i$
        is veristic. For example, a value Company_language ise
        {(Dutch,1),(French,0.6)} for the "Company_language
        isv TLang" component of TCompany denotes that the main
        languages used in the company are Dutch and French, of
        which Dutch is mostly used.
        If $t_i \in T_{collect} \cup T_{multi\_ref}$, then $v_i$ is a collection of veristic
        values, all specified by fuzzy sets over the domain $dom_{t'_i}$ of
        the most significant type $t'_i$ of $t_i$.
    $isr'_i$ = is
        If $t_i \notin T_{collect} \cup T_{multi\_ref}$, then $v_i \in \tilde{\wp}(\tilde{\wp}(dom_{t_i}))$, in which
        $\tilde{\wp}(\tilde{\wp}(dom_{t_i}))$ denotes the set of all Level 2 fuzzy sets that
        can be defined over $dom_{t_i}$. The value of $id_i$ is uncertain, which
        is described by the membership grades in the outer-level
        fuzzy set. Candidate values are veristic, which is described by
        the inner-level fuzzy sets. For example, a value
        Company_language is {({(Dutch,1),(French,0.6)},1),
        ({(German,1)},0.2)} denotes that it is uncertain whether
        the main languages of the company are Dutch and French (in
        which case, Dutch is mostly used) or German.
        If $t_i \in T_{collect} \cup T_{multi\_ref}$, then $v_i$ is uncertain and is a fuzzy set
        of collections of veristic values, which, in turn, are specified
        by fuzzy sets over the domain $dom_{t'_i}$ of the most significant
        type $t'_i$ of $t_i$.
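The Level 2 fuzzy sets used for uncertain values can be pictured with a short Python sketch based on the "#_Employees" example. The grades inside `about_2000` and `about_4000` are made-up illustrative numbers, and the helper `flatten` applies a sup-min reduction to an ordinary possibility distribution; that reduction is one common reading for possibilistic inner sets and is an assumption of this sketch, not a rule stated in the chapter.

```python
# Inner-level fuzzy sets: vague candidate values (illustrative grades only).
about_2000 = {1900: 0.5, 2000: 1.0, 2100: 0.5}
about_4000 = {3900: 0.5, 4000: 1.0, 4100: 0.5}

# Level 2 fuzzy set: pairs (inner fuzzy set, outer membership grade), i.e.
# "#_Employees is {(About_2000,1), (About_4000,1)}".
employees = [(about_2000, 1.0), (about_4000, 1.0)]


def flatten(level2):
    """Sup-min reduction of a Level 2 fuzzy set to one possibility distribution."""
    result = {}
    for inner, outer_grade in level2:
        for x, inner_grade in inner.items():
            grade = min(outer_grade, inner_grade)
            result[x] = max(result.get(x, 0.0), grade)
    return result


print(flatten(employees)[2000])  # 1.0
print(flatten(employees)[3900])  # 0.5

# A less possible candidate lowers the grades of all of its elements.
print(flatten([(about_2000, 1.0), (about_4000, 0.4)])[4000])  # 0.4
```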
The associated set of operators consists of =, $\neq$, . (period member operator),
set_component, get_component, and $\bot$. In order to deal with values that are
represented by fuzzy sets or Level 2 fuzzy sets, the operators of the sets $O_{t_i}$,
i = 1,2,...,n, are extended with the following:
    Operators that are extensions of the original operators in $O_{t_i}$ and are
    obtained by applying Zadeh's extension principle (Zadeh, 1975) one
    time (for fuzzy sets) or two consecutive times (for Level 2 fuzzy sets)
    (de Tré & de Caluwe, 2003a). Due to this principle, almost every
    classical mathematical concept and structure based on (binary) logic
    and set theory can be fuzzified. Consider the ordinary sets $U_1, U_2, \ldots, U_n$
    and Y and a mapping R from $U_1 \times U_2 \times \cdots \times U_n$ to Y. The extension
    principle of Zadeh defines the extended mapping $\tilde{R}$ of R as:

    $\tilde{R}: \tilde{\wp}(U_1) \times \tilde{\wp}(U_2) \times \cdots \times \tilde{\wp}(U_n) \to \tilde{\wp}(Y)$
    $(\tilde{V}_1, \tilde{V}_2, \ldots, \tilde{V}_n) \mapsto \tilde{R}(\tilde{V}_1, \tilde{V}_2, \ldots, \tilde{V}_n)$

    with $\tilde{R}(\tilde{V}_1, \tilde{V}_2, \ldots, \tilde{V}_n)$ being defined as

    $\mu_{\tilde{R}(\tilde{V}_1, \tilde{V}_2, \ldots, \tilde{V}_n)}: Y \to [0,1]$
    $y \mapsto \sup_{R(x_1, x_2, \ldots, x_n) = y} \min(\mu_{\tilde{V}_1}(x_1), \mu_{\tilde{V}_2}(x_2), \ldots, \mu_{\tilde{V}_n}(x_n))$

    In the type system, "fuzzified" operators are defined using polymorphism
    and operator overloading, which allows a different meaning to
    be assigned to operators in different contexts. Operators then vary
    depending on whether their parameters are ordinary values, fuzzy sets,
    or Level 2 fuzzy sets.
    Operators intended for the handling of fuzzy sets and of Level 2 fuzzy
    sets. Examples include the operators =, $\cup$, $\cap$, co, normalize, support,
    core, $\alpha$-cut, strict $\alpha$-cut, and $\mu$ (where $\mu(F,x)$ returns the membership
    grade of element x within fuzzy set F). Each other operator preserves
    its usual semantics.
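The extension principle described above can be sketched in Python for fuzzy sets with finite support, where the supremum reduces to a maximum. The function name `extend` and the sample fuzzy sets `around_6` and `around_2` (with made-up grades) are assumptions of this sketch.

```python
from itertools import product


def extend(R):
    """Return the extension of the crisp mapping R (Zadeh's extension principle).

    Fuzzy sets are dictionaries {element: membership grade} with finite
    support, so the supremum over all pre-images of y becomes a maximum.
    """

    def R_ext(*fuzzy_sets):
        result = {}
        # Enumerate every combination (x1, ..., xn) of elements with grades.
        for combo in product(*(fs.items() for fs in fuzzy_sets)):
            xs = [x for x, _ in combo]
            grade = min(g for _, g in combo)          # min over the arguments
            y = R(*xs)
            result[y] = max(result.get(y, 0.0), grade)  # sup over pre-images
        return result

    return R_ext


fuzzy_add = extend(lambda a, b: a + b)

around_6 = {5: 0.5, 6: 1.0, 7: 0.5}
around_2 = {1: 0.5, 2: 1.0, 3: 0.5}

print(fuzzy_add(around_6, around_2))
# {6: 0.5, 7: 0.5, 8: 1.0, 9: 0.5, 10: 0.5}
```

Wrapping an ordinary operator in `extend` mirrors the overloading idea in the text: the same operator name can then be applied to crisp values or to fuzzy sets, depending on which version is called.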
Object types. The object types are the most elaborate types of the type
system. Each object type is characterized by a number of properties (which
describe its structure) and a number of explicitly defined operators, also
called methods (which describe its behavior).
As specified in Definition 4, a property is either an attribute or a binary
relationship. In order to define the binary relationships between object
types, a partial association relation $\leftrightarrow$ is defined over the set $T_{object}$.
($id_1 \leftrightarrow id_2$ denotes that object type $id_1$ is binary related to object type $id_2$.)
An object type can inherit properties and methods from its parent types
(Taivalsaari, 1996). In order to define the inheritance-based type-subtype
relationships between object types, a partial ordering relation < is defined
over the set $T_{object}$. ($id < \widehat{id}$ denotes that object type id inherits all
characteristics of object type $\widehat{id}$.)
The domain of an object type id contains the undefined domain value $\bot_{id}$
and the undefined domain values $\bot_{\widehat{id}}$ of the parent types $\widehat{id}$ of type id. Each
other domain value is composite and contains a value $id_i\ isr'_i\ v_i$, with
$isr'_i \in \{ise, is\}$, for each of the (inherited) properties $id_i\ isr_i\ s_i$,
$s_i \in T_{literal} \cup T_{reference}$, of the type. Each value in the composition is, in turn,
described by a generalized constraint, for which the semantics are the same as
those explained with the structured types. The set of operators associated with
a given object type is the union of a set of implicitly defined operators and
a set of explicitly defined operators. The implicitly defined operators are =,
$\neq$, . (period member operator), set_property, get_property, and $\bot$. The
explicitly defined operators are the (inherited) methods $id_i\ isr_i\ s_i$, $s_i \in V_{signat}$,
of the object type.
The type system TS, which defines all the valid types supported by the presented
database model, is defined by the following definition.
Definition 5 (Type system): The type system TS is defined by the quadruple:

$TS = [ID, T, \leftrightarrow, <]$

where
    ID is the set of the valid identifiers
    T is the set of valid types (cf. Definition 4)
    $\leftrightarrow: T_{object} \times T_{object} \to \{True, False\}$ is the partial relation, which is used
    to define the binary association relationships between object types
    $<: T_{object} \times T_{object} \to \{True, False\}$ is the partial ordering relation, which
    is used to define the inheritance-based type-subtype relationships
    between object types
Example 1: The type system allows for definitions like the following, which are
intended to describe a (simplified) type representing employees. With the
structured types
    Struct TAddress (Street ise String; City ise String)
    Struct TCompany (Name ise String; Location ise String)
    Struct TWorks (Company ise TCompany; Percentage is Real)
and the enumeration type
    Enum TLang (French, Dutch, German)
the object types TPerson and TEmployee can be defined by:
    Class TPerson ( Name ise String;
                    Age is Integer;
                    Address ise TAddress;
                    Languages isv TLang;
                    Children ise Set_Ref(TPerson);
                    Add_child ise Signat ((New_child ise TPerson) -> Void) )
and
    Class TEmployee:TPerson ( EmployeeID ise String;
                              Works_for ise Bag(TWorks) )
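The syntax rules of Definition 4 can be mirrored by plain data structures. The following Python sketch encodes part of Example 1 and classifies each characteristic of TPerson as an attribute, a binary relationship, or a method. All class and function names (`Basic`, `Struct`, `Ref`, `Signature`, `Component`, `ObjectType`, `kind`) are hypothetical and chosen only for this illustration; domains, operators, and inheritance of characteristics are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple

COPULAS = ("ise", "is", "isv")


@dataclass(frozen=True)
class Basic:                 # Integer, Real, Boolean, Octet, String
    name: str


@dataclass(frozen=True)
class EnumType:              # Enum id (id_1, ..., id_n)
    id: str
    constants: Tuple[str, ...]


@dataclass(frozen=True)
class Ref:                   # Ref, Set_Ref, Bag_Ref or List_Ref over an object type
    generator: str
    target: str


@dataclass(frozen=True)
class Component:             # "id isr spec": a generalized constraint
    id: str
    copula: str
    spec: object

    def __post_init__(self):
        if self.copula not in COPULAS:
            raise ValueError(f"unknown copula: {self.copula}")


@dataclass(frozen=True)
class Struct:                # Struct id (components)
    id: str
    components: Tuple[Component, ...]


@dataclass(frozen=True)
class Signature:             # Signat ((parameters) -> result)
    parameters: Tuple[Component, ...]
    result: object


@dataclass(frozen=True)
class ObjectType:            # Class id : supertypes (characteristics)
    id: str
    supertypes: Tuple[str, ...]
    characteristics: Tuple[Component, ...]


def kind(c: Component) -> str:
    """Attribute, binary relationship or method, depending on the specification."""
    if isinstance(c.spec, Ref):
        return "binary relationship"
    if isinstance(c.spec, Signature):
        return "method"
    return "attribute"


STRING, INTEGER = Basic("String"), Basic("Integer")
t_address = Struct("TAddress", (Component("Street", "ise", STRING),
                                Component("City", "ise", STRING)))
t_lang = EnumType("TLang", ("French", "Dutch", "German"))
t_person = ObjectType("TPerson", (), (
    Component("Name", "ise", STRING),
    Component("Age", "is", INTEGER),
    Component("Address", "ise", t_address),
    Component("Languages", "isv", t_lang),
    Component("Children", "ise", Ref("Set_Ref", "TPerson")),
    Component("Add_child", "ise",
              Signature((Component("New_child", "ise", "TPerson"),), "Void")),
))

for c in t_person.characteristics:
    print(c.id, "->", kind(c))
```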
Instances of Types
The instances of a reference type, a literal type, and an object type are,
respectively, called reference instances, literals, and objects, whereas the Void
type cannot have instances.
Definition 6 (Reference instance): Every reference instance r is defined as
a pair: [t,v] where $t \in T_{reference}$ and $v \in dom_t$.
Definition 7 (Literal): Every literal l is defined as a pair: [t,v] where
$t \in T_{literal}$ and $v \in dom_t$.
Depending on its lifetime, an object can be either transient or persistent.
Definition 8 (Transient object): A transient object o is defined as a triple
[t, v, $\tilde{t}^*$("o is an instance of t")] in which:
    $t \in T_{object}$ is the type of the object
    $v \in dom_t$ is the state of the object
    $\tilde{t}^*$("o is an instance of t") is the EPTV that expresses the truth value
    of the proposition "o is an instance of object type t"
Definition 9 (Persistent object): A persistent object o is defined as a
quintuple [oid, N, t, v, $\tilde{t}^*$("o is an instance of t")] in which:
    $t \in T_{object}$ is the type of the object
    $v \in dom_t$ is the state of the object
    oid is a unique object identifier
    N is a (finite) set of object names
    $\tilde{t}^*$("o is an instance of t") is the EPTV that expresses the truth value
    of the proposition "o is an instance of object type t"
The uniqueness of the object identifier has to be guaranteed over the whole database.
The object identifier oid is used to refer to the (state of the) object. The set of
object names N can be empty.
The set of all the instances of an object type $t \in T_{object}$ is written as $V_t^{instance}$. If
t is a subtype of another object type t', then $V_t^{instance} \subseteq V_{t'}^{instance}$. The extent of
an object type t is written as $V_t^{extent}$ and is defined as the set of all the persistent
instances of t within a particular database. Obviously, $V_t^{extent} \subseteq V_t^{instance}$. If t is a
subtype of another object type t', then $V_t^{extent} \subseteq V_{t'}^{extent}$.
Example 2: The instances of the object type TPerson of Example 1 are either
TPerson objects or TEmployee objects (because TEmployee is a subtype of
TPerson). Examples of persistent TPerson objects are as follows:
    [oid_1, { }, TPerson,
      ( Name ise "Ann";
        Age ise Around_14;
        Address ise (Street ise "Cross Street, 12"; City ise "Ghent");
        Languages is {({(Dutch,1)},1), ({(Dutch,1),(French,0.4)},0.8)};
        Children ise Set( ) ), {(T,1)}]
and
    [oid_2, { }, TPerson,
      ( Name ise "Tom";
        Age is {(Around_16,1), ({(19,1)},1)};
        Languages ise {(Dutch,1),(French,0.5),(German,0.7)};
        Children ise Set( ) ), {(T,1)}]
An example of a persistent TEmployee object is as follows:
    [oid_3, { }, TEmployee,
      ( Name ise "Joe";
        Age ise {(42,1)};
        Languages ise {(Dutch,1),(German,0.8),(French,1)};
        Children ise Set(oid_1, oid_2);
        EmployeeID ise "ID25";
        Works_for ise Bag(( Company ise
                              ( Name ise "XYZ";
                                Location ise "Brussels");
                            Percentage ise {(100,1)})) ), {(T,1)}]
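The persistent objects of Example 2 and the subset relation between extents can be sketched as follows. The record layout follows the quintuple of Definition 9, but the names `PersistentObject`, `SUPERTYPES`, `is_subtype`, and `extent` are hypothetical, and the object states are abbreviated to a few crisp values for brevity.

```python
from dataclasses import dataclass

# Supertypes as declared in Example 1 (TEmployee is a subtype of TPerson).
SUPERTYPES = {"TPerson": [], "TEmployee": ["TPerson"]}


def is_subtype(t: str, ancestor: str) -> bool:
    """True if t equals ancestor or inherits from it, directly or indirectly."""
    return t == ancestor or any(is_subtype(p, ancestor) for p in SUPERTYPES[t])


@dataclass
class PersistentObject:
    oid: str          # unique object identifier
    names: set        # (possibly empty) set of object names N
    type: str         # identifier of the object type t
    state: dict       # abbreviated state v
    membership: dict  # EPTV of the proposition "o is an instance of t"


database = [
    PersistentObject("oid_1", set(), "TPerson", {"Name": "Ann"}, {"T": 1.0}),
    PersistentObject("oid_2", set(), "TPerson", {"Name": "Tom"}, {"T": 1.0}),
    PersistentObject("oid_3", set(), "TEmployee",
                     {"Name": "Joe", "Children": {"oid_1", "oid_2"}}, {"T": 1.0}),
]


def extent(t: str) -> set:
    """Identifiers of all persistent instances of t, including subtype instances."""
    return {o.oid for o in database if is_subtype(o.type, t)}


print(sorted(extent("TPerson")))                  # ['oid_1', 'oid_2', 'oid_3']
print(sorted(extent("TEmployee")))                # ['oid_3']
print(extent("TEmployee") <= extent("TPerson"))   # True
```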
Constraints and Constraint System
Constraints can be formally seen as relations that must be satisfied. With respect
to database systems, constraints are considered to be an important and adequate
means with which to define the semantics of the database (Kuper, Libkin, &
Paredaens, 2000; de Tré & de Caluwe, 2000). For example, if information about
persons is handled, constraints can be used to define the full semantics of the
valid (domain) values for a person's age, height, and weight. Other constraints
can define the valid transitions for a person's salary (e.g., to specify that a salary
cannot decrease) or specify another integrity rule. An instance then belongs to
the database insofar as it satisfies all of its defining constraints.
Constraints can also be used to impose selection criteria for information
retrieval. In this case, every constraint defines a condition for the instances to
belong to the result of the retrieval. Every instance belongs to the result insofar
as it satisfies all the imposed criteria. For example, if someone wants to retrieve
all the persons who are around 20 years old and who live in Paris, two constraints
can be imposed: a constraint that selects all the persons around 20 years old and
a constraint that selects all the persons living in Paris.
Definition of (Specific) Constraints
In order to give a complete definition of a constraint, it is necessary to provide
the rules that define its syntax, as well as the rules that define its semantics.
Definition 10 (Constraint): Each constraint supported by the constraint
system is defined for a set of objects $V^{instance}$ and is fully specified by its
syntax and its semantics.
The syntax of a constraint. The syntax rules for a constraint can be
formally described by means of some mathematical expressions.
The semantics of a constraint. The semantic definition of a constraint
c is fully determined by a logical function of the following form:

$c: V^{instance} \to \tilde{\wp}(I^*): o \mapsto \tilde{t}^*$("o satisfies c")

that associates an EPTV $\tilde{t}^*$("o satisfies c") with each $o \in V^{instance}$. The
extra truth value $\bot$ (of EPTVs) is used to model the cases where
constraint c does not (completely) apply to object o (cf. Definition 2).
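A constraint in the sense of Definition 10 is a function from objects to EPTVs. The Python sketch below builds a crisp selection criterion ("lives in a given city") and shows how the unknown and undefined cases surface in the result. The sentinels `UNKNOWN` and `NOT_APPLICABLE`, the property name "City", and the helpers `lives_in` and `select` are assumptions of this sketch; fuzzy criteria such as "around 20 years old" would need the arithmetic rules of de Tré (2002), which are not reproduced here.

```python
T, F, UNDEF = "T", "F", "undefined"

UNKNOWN = object()         # the value exists but is not available
NOT_APPLICABLE = object()  # the property does not apply to this object


def lives_in(city):
    """Build a constraint c: object state -> EPTV of 'o satisfies c'."""

    def constraint(state):
        value = state.get("City", UNKNOWN)
        if value is NOT_APPLICABLE:
            return {UNDEF: 1.0}        # constraint does not apply to the object
        if value is UNKNOWN:
            return {T: 1.0, F: 1.0}    # satisfaction is unknown
        return {T: 1.0} if value == city else {F: 1.0}

    return constraint


def select(objects, constraint):
    """Keep every object for which satisfaction is at least possible."""
    result = []
    for obj in objects:
        eptv = constraint(obj)
        if eptv.get(T, 0.0) > 0.0:
            result.append((obj["Name"], eptv))
    return result


people = [
    {"Name": "Ann", "City": "Paris"},
    {"Name": "Tom", "City": "Ghent"},
    {"Name": "Joe"},                            # city not recorded
    {"Name": "Eve", "City": NOT_APPLICABLE},
]

for name, eptv in select(people, lives_in("Paris")):
    print(name, eptv)
# Ann {'T': 1.0}
# Joe {'T': 1.0, 'F': 1.0}
```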
Definition of the Constraint System
In order to define the constraints supported by the presented database model, a
constraint system was built. Different kinds of constraints are distinguished. A
first distinction is based on whether a constraint is defined for the instances of
one single object type or not (single-type dependent versus multitype dependent).
A second distinction is based on whether or not the entire extent of an object type
is involved in the evaluation of the constraint.
All the constraints supported by the constraint system are formally defined as
specified in Definition 10. Their syntax rules are defined as follows.
Definition 11 (Constraints: syntax rules): Let ID denote the set of valid
identifiers, and let the constraint expressions that satisfy the syntax of the
four distinguished categories be denoted, respectively, as $C_i^s$, $C_e^s$,
$C_i^m$, and $C_e^m$, where:
The set $C_i^s$ consists of single-type dependent constraints that are not
defined with respect to the entire extent of an object type and is defined
as follows:
Not null constraints: If $id \in ID$ is a path expression² that
denotes a property or component³ of an object type, then
$c_{\{id\}}^{not\_null}[\,] \in C_i^s$
Certainty constraints: If $id \in ID$ is a path expression that denotes
a property or component of an object type, then
$c_{\{id\}}^{certain}[\,] \in C_i^s$
Value constraints: If $id \in ID$ is a path expression that denotes a
property or component of an object type $t$, and $e$ is a logical
expression (resulting in an EPTV), without aggregation operators,
that is defined over the properties and components of $t$ and
its associated types and expresses a restriction for the domain
values of the property or component denoted by $id$, then
$c_{\{id\}}^{value}[e] \in C_i^s$
Transition constraints: If $id \in ID$ is a path expression that
denotes a property or component of an object type $t$, and $e$ is a
logical expression, without aggregation operators, that is defined
over the properties and components of $t$ and its associated types
and expresses a restriction for the transitions between old and
new domain values of the property or component denoted by $id$
(such transitions occur when the set_property or set_component
operator is applied), then
$c_{\{id\}}^{trans}[e] \in C_i^s$
Aggregate constraints: If $t \in T_{object}$, and $e$ is a logical expression
with at least one aggregation operator, that is defined over the
properties and components of $t$ and its associated types and
expresses a restriction for the set of instances $V_t^{instance}$ of $t$, then
$c_{\{t\}}^{aggr}[e] \in C_i^s$
The set $C_e^s$ consists of single-type dependent constraints that are
defined with respect to the entire extent of an object type, and it is
defined as follows:
Key constraints: If $t \in T_{object}$ and $\{id_1, id_2, \ldots, id_n\} \subseteq ID$
is a finite set of identifiers of properties of $t$, then
$c_{\{t\}}^{key}[id_1, id_2, \ldots, id_n] \in C_e^s$
The set $C_i^m$ consists of multitype dependent constraints that are not
defined with respect to the entire extent of an object type and is defined
as follows:
Value constraints: If $U = \{t_1, t_2, \ldots, t_n\} \subseteq T_{object}$, $n > 1$,
$id \in ID$ is a path expression that denotes a property or component of an
object type $t \in U$, and $e$ is a logical expression, without aggregation
operators, that is defined over the properties and components
of all types in $U$ and expresses a restriction for the domain values
of the property or component denoted by $id$, then
$c_{\{id,U\}}^{value}[e] \in C_i^m$
Transition constraints: If $U = \{t_1, t_2, \ldots, t_n\} \subseteq T_{object}$, $n > 1$,
$id \in ID$ is a path expression that denotes a property or component of an
object type $t \in U$, and $e$ is a logical expression, without aggregation
operators, that is defined over the properties and components
of all types in $U$ and expresses a restriction for the transitions
between old and new domain values of the property or component
denoted by $id$, then
$c_{\{id,U\}}^{trans}[e] \in C_i^m$
Aggregate constraints: If $U = \{t_1, t_2, \ldots, t_n\} \subseteq T_{object}$, $n > 1$,
$t \in U$, and $e$ is a logical expression with at least one aggregation
operator that is defined over the properties and components of all
types in $U$ and expresses a restriction for the set of instances
$V_t^{instance}$ of $t$, then
$c_{\{t,U\}}^{aggr}[e] \in C_i^m$
The set $C_e^m$ consists of multitype dependent constraints that are
defined with respect to the entire extent of an object type, and it is
defined as follows:
Uniqueness constraints: If $U \subseteq T_{object}$ and $t \in U$, then
$c_{\{t,U\}}^{oid}[\,] \in C_e^m$ and $c_{\{t,U\}}^{name}[\,] \in C_e^m$
Referential constraints: If $id \in ID$ is a path expression, which
denotes an association relationship of an object type $t$, then
$c_{\{id\}}^{reference}[\,] \in C_e^m$
If there exists an inverse association relationship in the referenced
object type $t'$ and $id' \in ID$ is the path expression, which
denotes this relationship, then
$c_{\{id,id'\}}^{reference}[\,] \in C_e^m$
Then the set $C$ of all constraint expressions is defined by:
$C = C_i^s \cup C_e^s \cup C_i^m \cup C_e^m$
The full semantics of the constraints c C are defined by providing an
appropriate definition for their corresponding logical function (cf. Definition 10).
Below, informal descriptions are given:
Not null constraints. A not null constraint $c_{\{id\}}^{not\_null}[\,]$ excludes the
undefined value $\bot_t$ from the domain of the type $t$ of the property or
component, which is denoted by the path expression $id$.
Certainty constraints. A certainty constraint $c_{\{id\}}^{certain}[\,]$ prevents the
use of the copula "is" in the allowed values for the property or component,
which is denoted by the path expression $id$. This implies that all allowed
values have to be described by a generalized constraint "id ise v", which
guarantees that no uncertainty exists about the value of property or
component $id$.
Value constraints. A value constraint $c_{\{id\}}^{value}[e]$ or $c_{\{id,U\}}^{value}[e]$ restricts
the domain of the type $t$ of the property or component that is denoted by the
path expression $id$. This is done by excluding the domain values for which
the expression $e$ evaluates to the EPTV {(F,1)} (i.e., false).
Transition constraints. A transition constraint $c_{\{id\}}^{trans}[e]$ or $c_{\{id,U\}}^{trans}[e]$
prevents the execution of an update of the value of the property or
component that is denoted by the path expression $id$, in the cases where this
update would result in an evaluation {(F,1)} (false) of the expression $e$.
Key constraints. A key constraint is used to define a key, i.e., an
irreducible set of one or more properties of an object type with value(s) that
are used together to uniquely identify the persistent instances of the object
type. A key constraint $c_{\{t\}}^{key}[id_1, id_2, \ldots, id_n]$ defines a key for the object type
$t$ that consists of the properties identified by the identifiers $id_1, id_2, \ldots, id_n$.
The constraint guarantees the (irreducibility of the) uniqueness of the
values of these properties over the extent $V_t^{extent}$ of type $t$. Furthermore, the
constraint guarantees that none of these values is undefined.
Aggregate constraints. An aggregate constraint $c_{\{t\}}^{aggr}[e]$ or $c_{\{t,U\}}^{aggr}[e]$
prevents the addition of a new instance to the set of instances $V_t^{instance}$ of
type $t$, in those cases where this addition would result in an evaluation
{(F,1)} (false) of the expression $e$.
Uniqueness constraints. A uniqueness constraint $c_{\{t,U\}}^{oid}[\,]$ is used to
guarantee the uniqueness of the object identifiers (oid) of the persistent
instances of type $t$ over the union of the extents of the types of set $U$.
A uniqueness constraint $c_{\{t,U\}}^{name}[\,]$ is used to guarantee the uniqueness of
the object names ( N) of the instances of type $t$ over the union of the
extents of the types of set $U$.
Referential constraints. Referential constraints are used to maintain the
referential integrity of the (binary) association relationships between
objects. A referential constraint $c_{\{id\}}^{reference}[\,]$ guarantees that all object
identifiers specified in a value of the relationship denoted by the path
expression $id$ exist (i.e., are identifiers of objects present in the database).
A referential constraint $c_{\{id,id'\}}^{reference}[\,]$ additionally guarantees that if an
object with identifier oid refers to an object with identifier oid' via its value
for the relationship $id$, then the object with identifier oid' inversely refers
to the object with identifier oid via its value for the relationship $id'$.
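The two forms of the referential constraint can be sketched as follows. This is a crisp simplification that returns plain Booleans instead of EPTVs, and it assumes a database modeled as a dictionary from object identifiers to objects; the function names and the inverse relationship name `Parents` are hypothetical.

```python
def check_reference(db: dict, rel: str) -> bool:
    """Every oid referenced via relationship `rel` must exist in the database."""
    return all(
        target in db
        for obj in db.values()
        for target in obj.get(rel, ())
    )


def check_inverse_reference(db: dict, rel: str, inverse_rel: str) -> bool:
    """Additionally, every referenced object must refer back via `inverse_rel`."""
    # check_reference runs first, so db[target] below cannot raise KeyError.
    return check_reference(db, rel) and all(
        oid in db[target].get(inverse_rel, ())
        for oid, obj in db.items()
        for target in obj.get(rel, ())
    )


db = {
    "oid1": {"Children": ["oid3"]},
    "oid3": {"Parents": ["oid1"]},
}
print(check_reference(db, "Children"))
print(check_inverse_reference(db, "Children", "Parents"))
```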
The constraint system CS, which defines all the valid constraints supported by
the presented database model, is defined by the following:
Definition 12 (Constraint system): The constraint system CS is formally
defined by the triple CS = [ID,E,C] where:
ID is the set of valid identifiers
E is the set of valid expressions
C is the set of valid constraints (cf. Definition 11)
Example 3: With respect to the object types TPerson and TEmployee presented
in Example 1, the following constraints can be considered:
$c_1 = c_{\{TEmployee.EmployeeID\}}^{not\_null}[\,]$
$c_2 = c_{\{TPerson.Age\}}^{value}[0 \le TPerson.Age \le around\_120]$
$c_3 = c_{\{TEmployee.Works\_for.Percentage\}}^{value}[0 \le TEmployee.Works\_for.Percentage \le 100]$
$c_4 = c_{\{TPerson\}}^{key}[TPerson.Name]$
$c_5 = c_{\{TPerson,\{TPerson,TEmployee\}\}}^{oid}[\,]$
$c_6 = c_{\{TPerson,\{TPerson,TEmployee\}\}}^{name}[\,]$
$c_7 = c_{\{TEmployee,\{TPerson,TEmployee\}\}}^{oid}[\,]$
$c_8 = c_{\{TEmployee,\{TPerson,TEmployee\}\}}^{name}[\,]$
$c_9 = c_{\{TPerson.Children\}}^{reference}[\,]$
Object Schemes and Database Schemes
The definitions of object scheme and database scheme rely on the definitions of
types and constraints.
The Object Scheme and Its Instances
The full semantics of an object are described by its object scheme. This scheme
in fine completely defines the object, now including the definitions of the
specific constraints that apply to it.
Definition 13 (Object scheme): Every object scheme is a quadruple
$os = [id, t, M, C_t]$ in which:
$id \in ID$ represents the name of the object scheme.
$t \in T_{object}$ is the type of the object scheme.
$M$ represents the meaning of the object scheme. $M$ is provided to add
comments, which are usually described in a natural language.
$C_t \in \tilde{\wp}(C_i^s)$ is a normalized fuzzy set of constraints, which all have to
be applied onto the objects of type $t$. The membership grades in $C_t$ are
interpreted as weights and denote the relative importance of the
constraints with respect to the definition of the object scheme.
The set of all existing object schemes is denoted as OS and is defined as the union
of the set of all the quadruples that satisfy Definition 13 and the singleton
$\{\bot_{OS}\}$, with an element that represents an undefined object scheme.
An instance $o$ of the object type $t$ is defined to be an instance of the object scheme
$os = [id, t, M, C_t]$ if and only if it satisfies [with an EPTV that differs from
{(F,1)}] all constraints in $C_t$ and all constraints in the fuzzy sets $C_{t'}$ of the object
schemes $[id', t', M', C_{t'}]$ that were defined for the supertypes $t'$ of $t$. In this way,
inheritance has an impact on the specific constraints that have to be satisfied.
The set of all the instances of an object scheme $os$ is denoted as $V_{os}^{instance}$,
whereas the set of all the persistent instances of $os$ is written as $V_{os}^{extent}$.
Obviously, $V_{os}^{instance} \subseteq V_t^{instance}$ and $V_{os}^{extent} \subseteq V_t^{extent}$.
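The instance condition of an object scheme can be sketched as follows, under simplifying assumptions: constraints are plain Python functions that return an EPTV as a dictionary, the constraints `c1_not_null` and `c3_percentage` are crisp stand-ins for the constraints of Example 3, and `Percentage` is stored as a flat attribute. The weights are carried along but, in line with the definition above, do not decide membership.

```python
from typing import Callable, Dict, Iterable, Tuple

EPTV = Dict[str, float]
FALSE: EPTV = {"T": 0.0, "F": 1.0, "bottom": 0.0}   # the EPTV {(F,1)}
TRUE: EPTV = {"T": 1.0, "F": 0.0, "bottom": 0.0}    # the EPTV {(T,1)}

Constraint = Callable[[dict], EPTV]
Weighted = Tuple[Constraint, float]


def c1_not_null(obj: dict) -> EPTV:
    """Crisp stand-in: EmployeeID must be defined."""
    return TRUE if obj.get("EmployeeID") is not None else FALSE


def c3_percentage(obj: dict) -> EPTV:
    """Crisp stand-in: 0 <= Percentage <= 100."""
    return TRUE if 0 <= obj.get("Percentage", 0) <= 100 else FALSE


def is_instance(obj: dict,
                own: Iterable[Weighted],
                inherited: Iterable[Weighted] = ()) -> bool:
    """True if no own or inherited constraint evaluates to {(F,1)}."""
    weighted = list(own) + list(inherited)
    return all(constraint(obj) != FALSE for constraint, _weight in weighted)


os_employee_constraints = [(c1_not_null, 1.0), (c3_percentage, 0.7)]
print(is_instance({"EmployeeID": "ID25", "Percentage": 80}, os_employee_constraints))
```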
Example 4: With the object types TPerson and TEmployee presented in
Example 1 and the constraints $c_1, c_2, \ldots, c_9$ presented in Example 3, the following
object schemes can be constructed:
OSPerson = [OSPerson, TPerson, "scheme to represent persons", $\{(c_2, 1)\}$]
and
OSEmployee = [OSEmployee, TEmployee, "scheme to represent employees",
$\{(c_1, 1), (c_3, 0.7)\}$]
The Database Scheme and Its Instances
A database scheme describes the full semantics of the objects stored in a
database.
Definition 14 (Database scheme): Every database scheme $ds$ is a quadruple
$ds = [id, D, M, C_D]$ in which:
$id \in ID$ is the name of the database scheme.
$D = \{os_1, os_2, \ldots, os_n\} \subseteq OS \setminus \{\bot_{OS}\}$ is a finite set of object schemes. Each
object scheme in $D$ has a different object type. If an object scheme
$os \in D$ is defined for an object type $t$, and $t'$ is a supertype of $t$ or $t'$
is an object type for which a binary relationship with $t$ has been
defined, then an object scheme $os' \in D$ has to be defined for $t'$.
$M$ denotes the meaning of the database scheme.
$C_D \in \tilde{\wp}(C_e^s \cup C_i^m \cup C_e^m)$ is a normalized fuzzy set of constraints that
impose extra conditions on the instances of the object schemes of $D$.
The membership grades in $C_D$ are interpreted as weights and denote
the relative importance of the constraints with respect to the definition
of the database scheme.
For every object scheme $os \in D$, uniqueness constraints exist in $C_D$ that
guarantee the uniqueness of the object identifiers and object names of
the instances of $os$. Furthermore, every constraint $c \in C_e^s \cup C_e^m$, for
which $\mu_{C_D}(c) > 0$, has to be defined over the extent of the type $t$ of an
object scheme $os \in D$.
The set of all existing database schemes is denoted as DS and is defined as the
union of the set of all the quadruples that satisfy Definition 14 and the singleton
$\{\bot_{DS}\}$, with an element that represents an undefined database scheme.
Every persistent instance $o$ of an object scheme $os \in D$ of a database scheme
$ds$ has to satisfy all the constraints in $C_D$, with an EPTV that differs from {(F,1)}.
An instance of a database scheme $ds$ is called a database and is defined as the
set of the extents of all the object schemes of $ds$. By this definition, every
database is a set of sets of objects.
Example 5: With the object schemes OSPerson and OSEmployee of Example
4 and the constraints $c_1, c_2, \ldots, c_9$ presented in Example 3, the following database
scheme can be constructed:
DSEmpl =
[DSEmployee, {OSPerson, OSEmployee}, "scheme for an employee database",
$\{(c_4, 1), (c_5, 1), (c_6, 1), (c_7, 1), (c_8, 1), (c_9, 1)\}$]
By considering the object identifiers of the persistent objects of Example 2, the
corresponding database can be represented by the following:
$\{\{oid_1, oid_2, oid_3\}, \{oid_3\}\}$
Database Model
The database model is finally obtained by extending the formalism with data
definition (DDL) and data manipulation operators (DML).
Data Definition Operators
For data definition purposes, the set of operators $O_{DDL}^{model}$ was introduced.
Definition 15 (Data definition operators):
$O_{DDL}^{model}$ = {create_DB, drop_DB, create_OS, drop_OS, add_Char, drop_Char,
add_OSC, drop_OSC, add_DBC, drop_DBC}
All the operators of $O_{DDL}^{model}$ operate on the set of all database schemes DS:
The operators create_DB and drop_DB are meant to create and remove
a database and its database scheme.
The operators create_OS and drop_OS, respectively, allow an object
scheme in a given database scheme to be created and an object scheme
from a given database scheme to be removed.
The operators add_Char and drop_Char are meant to add and drop a
characteristic, i.e., a property or a method, in the object type of a given
object scheme in a given database scheme.
The operators add_OSC and drop_OSC are used to add and remove a
weighted constraint to or from a given object scheme in a given database
scheme.
The operators add_DBC and drop_DBC are meant to add and remove a
weighted constraint to or from a given database scheme.
Data Manipulation Operators
The data manipulation operators provide a facility for inserting, deleting,
updating, and querying (database) objects. They operate on sets of instances
associated with an object scheme and result in a new object scheme with a new
associated set of instances. This way, every data manipulation operator can
operate on the result of every data manipulation operator. This principle of
compositionality guarantees the closure property of the algebra.
The set of data manipulation operators is denoted as $O_{DML}^{model}$ and is defined by
Definition 16.
Definition 16 (Data manipulation operators):
$O_{DML}^{model}$ = {$\cup$, $\cap$, $\setminus$, product, projection, extension, restriction, threshold,
make_transient, make_persistent}
These operators act as follows:
Union, intersection, and difference ($\cup$, $\cap$, and $\setminus$): The binary operators
union, intersection, and difference are only defined for object schemes that
are scheme compatible, i.e., object schemes that satisfy one of the following
conditions:
The types of both schemes have the same (inherited) characteristics,
and the associated fuzzy sets of constraints of both schemes are equal.
The types of both schemes are subtypes of a common ancestor type.
The type of one object scheme is a subtype of the type of the other
object scheme.
With the scheme-compatible object schemes $os_1 = [id_1, t_1, M_1, C_{t_1}]$ and
$os_2 = [id_2, t_2, M_2, C_{t_2}]$ as arguments, the operation $\cup(os_1, os_2)$ [resp. $\cap(os_1, os_2)$
and $\setminus(os_1, os_2)$] results in a new object scheme:
$\cup(os_1, os_2) = os' = [id', t', M', \emptyset]$
where
The object type $t'$ inherits all common characteristics of the types $t_1$ and
$t_2$, i.e., $t'$ inherits from the supertype or from the common ancestor
type, and has no specific characteristics of its own.
The fuzzy set of specific constraints $C_{t'}$ is empty, but, as a result of
inheritance, all constraints that were defined for the inherited
characteristics remain valid and must hold.
The set of all instances $V_{os'}^{instance}$ of $os'$ is constructed by preserving the objects
for which the state $v$ is in the union (resp. intersection and difference) of the sets
of states of the instances of $os_1$ and $os_2$ and by calculating the associated EPTVs
by applying the logical operators $\tilde{\vee}$, $\tilde{\wedge}$, and $\tilde{\neg}$ for EPTVs (as presented in
de Tré, 2002).
The set of all the persistent instances of $os'$ is defined to be empty, i.e.,
$V_{os'}^{extent} = \emptyset$.
(Cartesian) product: With the object schemes $os_1 = [id_1, t_1, M_1, C_{t_1}]$
and $os_2 = [id_2, t_2, M_2, C_{t_2}]$, the binary (Cartesian) product operation
$\mathrm{product}(os_1, os_2)$ returns a new object scheme:
$\mathrm{product}(os_1, os_2) = os' = [id', t', M', C_{t'}]$
where
The object type $t'$ is constructed by merging the (inherited)
characteristics of the types $t_1$ and $t_2$ of the given object schemes.
The fuzzy set of constraints $C_{t'}$ consists of all the single-type
dependent constraints (with associated membership grades) that were
defined for the characteristics of type $t'$ and necessarily have to be an
element of $C_{t_1}$, $C_{t_2}$, or $C_t$, with $t$ being an ancestor type of $t_1$ or $t_2$.
The set of all instances $V_{os'}^{instance}$ is constructed by calculating the Cartesian
product $V_{os_1}^{instance} \times V_{os_2}^{instance}$ and merging the states of the objects of the
resulting pairs. The associated EPTVs are calculated by applying the logical
conjunction operator $\tilde{\wedge}$ for EPTVs.
$V_{os'}^{extent} = \emptyset$
Projection: This operator is intended to select a number of characteristics
from the (inherited) characteristics of the type of an object scheme
and the (inherited) characteristics of the object types that are binary related
to this type (via the partial association relation). If $\{id_1, id_2, \ldots, id_n\} \subseteq ID$
is the set of the identifiers of the selected characteristics of the type $t$ of a
given object scheme $os = [id, t, M, C_t]$, then the operation
$\mathrm{projection}(os, \{id_1, id_2, \ldots, id_n\})$ results in a new object scheme:
$\mathrm{projection}(os, \{id_1, id_2, \ldots, id_n\}) = os' = [id', t', M', C_{t'}]$
where
The object type $t'$ has as characteristics the characteristics identified
by the identifiers $\{id_1, id_2, \ldots, id_n\}$.
The fuzzy set of constraints $C_{t'}$ consists of the single-type
dependent constraints (with associated membership grades) that were
defined for the characteristics with identifiers $id_1, id_2, \ldots, id_n$ and
necessarily have to be an element of $C_t$ or $C_{t''}$ with $t''$ being an ancestor
type of $t$, a type that is binary related to $t$, or an ancestor type of a type
that is binary related to $t$.
The set of all instances $V_{os'}^{instance}$ is constructed by adapting the state of the
objects of $V_{os}^{instance}$ by keeping only the values for the selected characteristics.
$V_{os'}^{extent} = \emptyset$
Extension: This operator adds a derived property to the type of a
given object scheme. Derived property values are calculated from the
values of other properties and cannot be changed by the user. If
$os = [id, t, M, C_t]$ is the given object scheme, "id isr s" with isr $\in$ {ise, is, isv}
and $s \in T_{literal} \cup T_{reference}$ is the new property, and $e \in E$ is the expression
that will be evaluated to obtain the values of this property, then the operation
$\mathrm{extension}(os, \text{id isr s}, e)$ results in a new object scheme:
$\mathrm{extension}(os, \text{id isr s}, e) = os' = [id', t', M', C_{t'}]$
where
Type $t'$ is obtained by adding the extra property "id isr s" to the
specification of type $t$ of the object scheme $os$.
The fuzzy set of constraints $C_{t'} = C_t$.
Because values for derived properties are not stored in the database, the set
of all instances $V_{os'}^{instance}$ equals $V_{os}^{instance}$.
$V_{os'}^{extent} = \emptyset$
Restriction: This operator allows extra restrictions to be imposed on
the set of instances of an object scheme. This is obtained by extending the
fuzzy set of constraints of the object scheme with an extra single-type
dependent constraint $c \in C_i^s$, which has to be applied onto the objects of
type $t$. For a given object scheme $os = [id, t, M, C_t]$ and a given constraint
$c \in C_i^s$ with associated weight $w$, the operation $\mathrm{restriction}(os, c, w)$ results in a new
object scheme:
$\mathrm{restriction}(os, c, w) = os' = [id', t', M', C_{t'}]$
where
The object type $t' = t$.
The fuzzy set of constraints $C_{t'} = C_t \cup \{(c, w)\}$ is obtained as the union
of the fuzzy sets $C_t$ and $\{(c, w)\}$.
The set of all instances $V_{os'}^{instance}$ consists of all instances in $V_{os}^{instance}$ for which
the extra condition that is imposed by constraint $c$ is satisfied [with an EPTV that
differs from {(F,1)}].
$V_{os'}^{extent} = \emptyset$
Threshold (): This operator is intended to restrict the set of instances of
a given object scheme by applying a threshold value for each of the
membership grades
t*(o is an instance of t)
(T),
(F), and
() of the EPTVs t*(o is an instance of t)
associated with the instances o of the (type of the) object scheme. For a
given object scheme os = [id,t,M,C
t
] and given threshold values
T
,
F
, and
, the operation (os,

T
,
F
,
) results in a new object scheme:

(os,
T
,
F
,
) = os' = [id',t',M',C
t'
]
where
The object type t' = t.
t'
= C
t
.
os'
instance
consists of all instances o of V
os
instance
for which
the threshold restriction:
[
t
~
*(o is an instance of t)
(T)
T
] [
t
~
(F)
F
]
[
t
~
()
]
is satisfied.
$V_{os'}^{extent} = \emptyset$
The operators make_persistent and make_transient: By definition, all the
instances of the resulting set of instances of the previous operators are
transient. Therefore, the operator make_persistent, as well as its counter-
part make_transient, were added in order to make transient objects (of a
given object scheme) persistent, and vice versa.
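The closure property of the operators, i.e., that each operator returns an object scheme that can be fed into the next operator, can be sketched as follows for restriction and projection. This is a simplified illustration, not the model's definition: the classes `Scheme` and `Constraint`, the crisp EPTVs, and the sample employee "Ann" with identifier "ID31" are all hypothetical, and constraints are tagged with the characteristic they are defined for so that projection can decide which ones to keep.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

EPTV = Dict[str, float]
FALSE: EPTV = {"T": 0.0, "F": 1.0, "bottom": 0.0}   # the EPTV {(F,1)}
TRUE: EPTV = {"T": 1.0, "F": 0.0, "bottom": 0.0}    # the EPTV {(T,1)}


@dataclass(frozen=True)
class Constraint:
    char_id: str                       # characteristic the constraint is defined for
    check: Callable[[dict], EPTV]


@dataclass
class Scheme:
    constraints: List[Tuple[Constraint, float]]   # weighted constraints
    instances: List[dict]                          # transient instances
    extent: List[dict] = field(default_factory=list)  # persistent instances


def restriction(os: Scheme, c: Constraint, w: float) -> Scheme:
    """Add (c, w) and keep instances whose EPTV for c differs from {(F,1)}."""
    kept = [o for o in os.instances if c.check(o) != FALSE]
    return Scheme(os.constraints + [(c, w)], kept)


def projection(os: Scheme, ids: List[str]) -> Scheme:
    """Keep only the selected characteristics and the constraints defined for them."""
    constraints = [(c, w) for c, w in os.constraints if c.char_id in ids]
    instances = [{i: o[i] for i in ids if i in o} for o in os.instances]
    return Scheme(constraints, instances)


speaks_dutch = Constraint(
    "Languages", lambda o: TRUE if "Dutch" in o["Languages"] else FALSE
)
employees = Scheme([], [
    {"EmployeeID": "ID25", "Name": "Joe", "Languages": ["Dutch", "French"]},
    {"EmployeeID": "ID31", "Name": "Ann", "Languages": ["English"]},
])
result = projection(restriction(employees, speaks_dutch, 1.0), ["EmployeeID", "Name"])
print(result.instances)
```

Both functions return a new `Scheme` with an empty extent, mirroring the rule that the results of data manipulation operators are transient until make_persistent is applied.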
Definition of the Database Model
Definition 17 (Database model): The database model DM is defined by the
following: DM = [TS, CS, OS, DS, $O_{DDL}^{model}$, $O_{DML}^{model}$], in which:
TS is the type system (Definition 5).
CS is the constraint system (Definition 12).
OS represents the set of all the object schemes.
DS represents the set of all the database schemes.
$O_{DDL}^{model}$ is the set of data definition operators (Definition 15).
$O_{DML}^{model}$ is the set of data manipulation operators (Definition 16).
Illustrative Example
As an illustration of the flexible querying facilities of the presented database
model, consider the database scheme DSEmpl as presented in Example 5.
Example 6: With the employee database with database scheme DSEmpl,
consider the query:
"Find the names and employee IDs of all young employees that are fluent in
Dutch and French" (the criterion "young" is less important than the criterion
"fluent in Dutch and French").
This query can be expressed by the following:
projection(restriction(restriction(OSEmployee,
$c_{\{TEmployee.Age\}}^{value}$[TEmployee.Age is young], 0.8),
$c_{\{TEmployee.Languages\}}^{value}$[(TEmployee.Languages, Dutch) is fluent $\wedge$
(TEmployee.Languages, French) is fluent], 1), {EmployeeID, Name})
This results in a new object scheme:
OSResult = [OSResult,TResult,Query result, ]
with
Class TResult ( EmployeeID ise String;
Name ise String )
With the understanding that "young" is defined by the fuzzy set with membership
function:
$\mu_{young}(x) = 1$ if $0 \le x \le 30$
$\mu_{young}(x) = -x/20 + 5/2$ if $30 < x < 50$
$\mu_{young}(x) = 0$ if $x \ge 50$
and "fluent" is defined by the fuzzy set with membership function
$\mu_{fluent}(x) = x$ if $0 \le x \le 1$
the set of all instances $V_{OSResult}^{instance}$ consists of all instances that satisfy the
query conditions [with an EPTV that differs from {(F,1)}] and equals:
$V_{OSResult}^{instance}$ = {[TResult, (EmployeeID ise "ID25"; Name ise "Joe"), {(T,0.4), (F,0.6)}]}
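The two membership functions of this example can be written down directly. The sketch below assumes crisp numeric inputs; the function names are chosen for illustration, and the age 42 is only an assumed value that happens to produce the grade 0.4 appearing in the result (the actual data of Example 2 are not reproduced here).

```python
import math


def mu_young(x: float) -> float:
    """Membership in 'young': 1 up to 30, linear down to 0 at 50."""
    if x < 0:
        raise ValueError("age must be non-negative")
    if x <= 30:
        return 1.0
    if x < 50:
        return -x / 20 + 5 / 2
    return 0.0


def mu_fluent(x: float) -> float:
    """Membership in 'fluent': the identity on [0, 1]."""
    if not 0 <= x <= 1:
        raise ValueError("fluency level must lie in [0, 1]")
    return x


print(mu_young(25), mu_young(40), mu_young(60))
print(math.isclose(mu_young(42), 0.4))
```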
The EPTV {(T,0.4), (F,0.6)} was calculated as follows:
First, the degree of satisfaction of constraint
$c_1 = c_{\{TEmployee.Age\}}^{value}$[TEmployee.Age is young]
is calculated. This is done by means of the following formula (as fully explained
in de Tré & de Baets, 2003):
$\mu_{\tilde{t}^*(A \text{ is } F)}(T) = \sup_{x \in dom_A} \min(\pi_A(x), \mu_F(x))$
$\mu_{\tilde{t}^*(A \text{ is } F)}(F) = \sup_{x \in dom_A \setminus \{\bot\}} \min(\pi_A(x), 1 - \mu_F(x))$
$\mu_{\tilde{t}^*(A \text{ is } F)}(\bot) = \min(\pi_A(\bot), 1 - \mu_F(\bot))$
where $\pi_A$ is the possibility distribution representing the value of attribute $A$, and
$\mu_F$ is the membership function representing the linguistic term $F$. Applying the
previous function yields
$\tilde{t}^*(c_1)$ = {(T,0.4), (F,0.6)}
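For a finite domain, the suprema in the formula above reduce to maxima, which gives the following sketch. It assumes that the possibility distribution is a dictionary from domain values to possibility grades, that the undefined value is stored under the key `"bottom"`, and that the undefined value has membership grade 0 in the linguistic term; the function name `eval_is` and the crisp age 42 are illustrative assumptions.

```python
import math
from typing import Callable, Dict, Hashable

BOTTOM = "bottom"   # stand-in key for the undefined domain value


def mu_young(x: float) -> float:
    """Membership function of 'young' from Example 6."""
    if x <= 30:
        return 1.0
    if x < 50:
        return -x / 20 + 5 / 2
    return 0.0


def eval_is(pi_a: Dict[Hashable, float],
            mu_f: Callable[[float], float]) -> Dict[str, float]:
    """EPTV of 'A is F' for a finite possibility distribution pi_a."""
    regular = [x for x in pi_a if x != BOTTOM]
    mu_f_bottom = 0.0   # assumption: the undefined value does not belong to F
    pi_bottom = pi_a.get(BOTTOM, 0.0)
    t = max((min(pi_a[x], mu_f(x)) for x in regular), default=0.0)
    t = max(t, min(pi_bottom, mu_f_bottom))
    f = max((min(pi_a[x], 1 - mu_f(x)) for x in regular), default=0.0)
    bottom = min(pi_bottom, 1 - mu_f_bottom)
    return {"T": t, "F": f, "bottom": bottom}


crisp_age = eval_is({42: 1.0}, mu_young)
fuzzy_age = eval_is({25: 1.0, 45: 0.5}, mu_young)
print(crisp_age)
print(fuzzy_age)
```

With the crisp age 42 the result is approximately {(T,0.4), (F,0.6)}, up to floating-point rounding; a fuzzy age yields grades for T and F that need not sum to 1.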
Second, the degree of satisfaction of constraint
$c_2 = c_{\{TEmployee.Languages\}}^{value}$[(TEmployee.Languages, Dutch) is fluent $\wedge$
(TEmployee.Languages, French) is fluent]
is calculated by applying the same formula two times and calculating the
conjunction of both resulting EPTVs, i.e.:
$\tilde{t}^*$((TEmployee.Languages, Dutch) is fluent) = {(T,1)}
$\tilde{t}^*$((TEmployee.Languages, French) is fluent) = {(T,1)}
so that
$\tilde{t}^*(c_2)$ = {(T,1)} $\tilde{\wedge}$ {(T,1)} = {(T,1)}
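The conjunction of two EPTVs can be sketched with a sup-min extension of a three-valued conjunction. The truth table used for the extra truth value below (false dominates, otherwise the undefined value propagates) is an assumption of this sketch and should be checked against de Tré (2002); for EPTVs without an undefined component, as in this example, that choice does not influence the result.

```python
from itertools import product
from typing import Dict

T, F, B = "T", "F", "bottom"


def and_crisp(a: str, b: str) -> str:
    """Assumed three-valued conjunction: F dominates, then the undefined value."""
    if a == F or b == F:
        return F
    if a == B or b == B:
        return B
    return T


def conjunction(t1: Dict[str, float], t2: Dict[str, float]) -> Dict[str, float]:
    """Sup-min extension of and_crisp to EPTVs (missing grades count as 0)."""
    result = {T: 0.0, F: 0.0, B: 0.0}
    for a, b in product((T, F, B), repeat=2):
        grade = min(t1.get(a, 0.0), t2.get(b, 0.0))
        key = and_crisp(a, b)
        result[key] = max(result[key], grade)
    return result


print(conjunction({T: 1.0}, {T: 1.0}))
print(conjunction({T: 0.4, F: 0.6}, {T: 1.0}))
```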
Next, the impact of the importance weights is calculated by applying the residual
implicator $f_{im}$ and co-implicator $f_{im}^{co}$ functions (as fully explained in de Tré & de
Baets, 2003). With $f_{im}$ being defined by
$f_{im}: [0,1]^2 \to [0,1]: (w, \mu) \mapsto \sup \{v \mid v \in [0,1] \wedge \min(w, v) \le \mu\}$
and $f_{im}^{co}$ being defined by
$f_{im}^{co}: [0,1]^2 \to [0,1]: (w, \mu) \mapsto \inf \{v \mid v \in [0,1] \wedge \max(w, v) \ge \mu\}$
the impact $g$ of weight $w$ on EPTV $t$ is calculated by the following:
$\mu_{g(w,t)}(T) = f_{im}(w, \mu_t(T))$
$\mu_{g(w,t)}(F) = f_{im}^{co}(1 - w, \mu_t(F))$
$\mu_{g(w,t)}(\bot) = f_{im}^{co}(1 - w, \mu_t(\bot))$
and yields
$g(0.8, \tilde{t}^*(c_1))$ = {(T,0.4), (F,0.6)} and $g(1, \tilde{t}^*(c_2))$ = {(T,1)}
so that the degree of satisfaction for all constraints imposed by the query yields
{(T,0.4), (F,0.6)} $\tilde{\wedge}$ {(T,1)} = {(T,0.4), (F,0.6)}
Note that with the first criterion being much less important, e.g., with weight 0.2,
this result would have been
{(T,1)} $\tilde{\wedge}$ {(T,1)} = {(T,1)}
because then
$g(0.2, \tilde{t}^*(c_1))$ = {(T,1)} and $g(1, \tilde{t}^*(c_2))$ = {(T,1)}
Furthermore,
$V_{OSResult}^{extent} = \emptyset$
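The weighting step can be checked numerically. The supremum and infimum in the definitions of the implicator and co-implicator have simple closed forms: the implicator returns 1 when the weight does not exceed the grade and the grade otherwise, and the co-implicator returns 0 when its first argument is at least the grade and the grade otherwise. The sketch below uses these closed forms; the function names and the dictionary representation of EPTVs are assumptions of the sketch.

```python
from typing import Dict


def f_im(w: float, mu: float) -> float:
    """sup{v in [0,1] : min(w, v) <= mu}."""
    return 1.0 if w <= mu else mu


def f_im_co(w: float, mu: float) -> float:
    """inf{v in [0,1] : max(w, v) >= mu}."""
    return 0.0 if w >= mu else mu


def g(w: float, t: Dict[str, float]) -> Dict[str, float]:
    """Impact of importance weight w on EPTV t (missing grades count as 0)."""
    return {
        "T": f_im(w, t.get("T", 0.0)),
        "F": f_im_co(1 - w, t.get("F", 0.0)),
        "bottom": f_im_co(1 - w, t.get("bottom", 0.0)),
    }


t_c1 = {"T": 0.4, "F": 0.6}
print(g(0.8, t_c1))   # weight 0.8 leaves the EPTV unchanged
print(g(0.2, t_c1))   # weight 0.2 relaxes it to {(T,1)}
```

Lowering the weight of a criterion can only raise the grade for T and lower the grades for F and the undefined value, which is why the less important criterion no longer penalizes the result at weight 0.2.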
Conclusions and Future Trends
With the foregoing definitions, the fundamentals of a mathematical framework
for the definition of a possibilistic, constraint-based object-oriented database
model were presented. This framework is based on an algebraic type system and
a related constraint system, which is meant to define the database semantics.
Central to the proposed database model are the concepts of object schemes and
database schemes. The proposed model is consistent with the ODMG data
model (as far as its crisp components are considered), which has proven to be very
useful in unifying attempts to define object database models and query languages.
The incorporation of constraints allows for a better definition of
database semantics and opens new perspectives to extend the model toward
other formalisms, e.g., supporting fuzzy spatio-temporal databases (de Tré, de
Caluwe, Hallez, & Verstraete, 2002).
Typical for the presented approach is the integration and use of Zadeh's
generalized constraints and of logic based on extended possibilistic truth values.
This allows for a general and extensible definition of the semantics and integrity
of the data and of the query criteria. As generalized constraints, only the equality
constraint, the possibilistic constraint, and the veristic constraint were integrated
in the presented framework. In future research, the incorporation of other
generalized constraints and the usability and applicability of Zadeh's so-called
"protoforms," which can be seen as generalizations of generalized constraints,
will be studied.
References
Alagić, S. (1997). The ODMG object model: Does it make sense? ACM
SIGPLAN Notices, 32(10), 253–270.
Beaubouef, T., & Petry, F. E. (2002). Uncertainty in OODB modeled by rough
sets. In Proceedings of the IPMU 2002 conference (Vol. III, pp. 1697–1703).
Annecy, France.
Berzal, F., Marín, N., Pons, O., & Vila, M. A. (2003). FoodBi: Managing fuzzy
object-oriented data on top of the Java platform. In Proceedings of the
10th IFSA World Congress (pp. 384–387). Istanbul, Turkey.
Blanco, I., Marín, N., Pons, O., & Vila, M. A. (2001). Softening the object-oriented
database model: Imprecision, uncertainty and fuzzy types. In
Proceedings of the IFSA/NAFIPS World Congress (pp. 2323–2328).
Vancouver, Canada.
Bordogna, G., & Pasi, G. (Eds.). (2000). Recent issues on fuzzy databases.
Heidelberg, Germany: Physica-Verlag.
Bordogna, G., Lucarella, D., & Pasi, G. (1994). A fuzzy object oriented data
model. In Proceedings of the Third IEEE International Conference on
Fuzzy Systems, FUZZ-IEEE'94 (pp. 313–318). Orlando, FL.
Bordogna, G., Pasi, G., & Lucarella, D. (1999). A fuzzy object-oriented data
model for managing vague and uncertain information. International
Journal of Intelligent Systems, 14(7), 623–651.
Bordogna, G., Leporati, A., Lucarella, D., & Pasi, G. (2000). The fuzzy object-oriented
database management system. In G. Bordogna, & G. Pasi (Eds.),
Recent issues on fuzzy databases (pp. 209–236). Heidelberg, Germany:
Physica-Verlag.
Cattell, R. G. G., & Barry, D. (Eds.). (2000). The object data standard: ODMG
3.0. San Francisco, CA: Morgan Kaufmann Publishers.
de Cooman, G. (1995). Towards a possibilistic logic. In D. Ruan (Ed.), Fuzzy set
theory and advanced mathematical applications (pp. 89–133). Boston,
MA: Kluwer Academic Publishers.
de Cooman, G. (1999). From possibilistic information to Kleene's strong multi-valued
logics. In D. Dubois, E. P. Klement, & H. Prade (Eds.), Fuzzy sets,
logics and reasoning about knowledge (pp. 315–323). Boston, MA:
Kluwer Academic Publishers.
de Tré, G. (2002). Extended possibilistic truth values. International Journal of
Intelligent Systems, 17, 427–446.
de Tré, G., & de Baets, B. (2003). Aggregating constraint satisfaction degrees
expressed by possibilistic truth values. IEEE Transactions on Fuzzy
Systems, 11(3), 361–368.
de Tré, G., & de Caluwe, R. (2000). The application of generalized constraints
to object-oriented database models. Mathware and Soft Computing,
VII(2–3), 245–255.
de Tré, G., & de Caluwe, R. (2003). Modelling uncertainty in multimedia
database systems: An extended possibilistic approach. International
Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,
11(1), 5–22.
de Tré, G., & de Caluwe, R. (2003a). Level-2 fuzzy sets and their usefulness in
object-oriented database modelling. Fuzzy Sets and Systems, 140, 29–49.
de Tré, G., de Caluwe, R., & Van der Cruyssen, B. (2000). A generalised object-oriented
database model. In G. Bordogna, & G. Pasi (Eds.), Recent issues
on fuzzy databases (pp. 155–182). Heidelberg, Germany: Physica-Verlag.
de Tré, G., de Caluwe, R., Hallez, A., & Verstraete, J. (2002). Fuzzy and
uncertain spatio-temporal database models: A constraint-based approach.
In Proceedings of the Ninth International Conference on Information
Processing and Management of Uncertainty in Knowledge-Based
Systems IPMU 2002 (pp. 1713–1720). Annecy, France.
Dubois, D., & Prade, H. (1988). Possibility theory. New York: Plenum Press.
Dubois, D., & Prade, H. (1997). The three semantics of fuzzy sets. Fuzzy Sets
and Systems, 90(2), 141–150.
George, R. (1992). Uncertainty management issues in the object-oriented
database model. Ph.D. thesis, Tulane University, New Orleans, LA.
George, R., Yazici, A., Petry, F. E., & Buckles, B. P. (1997). Modeling
impreciseness and uncertainty in the object-oriented data model: A
similarity-based approach. In R. de Caluwe (Ed.), Fuzzy and uncertain
object-oriented databases: Concepts and models (pp. 63–95). Singapore:
World Scientific.
Gottwald, S. (1979). Set theory for fuzzy sets of higher level. Fuzzy Sets and
Systems, 2(2), 125–151.
Kim, W. (1994). Observations on the ODMG-93 proposal for an object-oriented
database language. ACM SIGMOD Record, 23(1), 4–9.
Kuper, G., Libkin, L., & Paredaens, J. (Eds.). (2000). Constraint databases.
Berlin, Germany: Springer-Verlag.
Lausen, G., & Vossen, G. (1998). Models and languages of object-oriented
databases. Harlow, UK: Addison-Wesley.
Marín, N., Pons, O., & Vila, M. A. (2000). Fuzzy types: A new concept of type
for managing vague structures. International Journal of Intelligent
Systems, 15(11), 1061–1085.
Mouaddib, N., & Subtil, P. (1997). Management of uncertainty and vagueness
in databases: The FIRMS point of view. International Journal of Uncertainty,
Fuzziness and Knowledge-Based Systems, 5(4), 437–457.
Na, S., & Park, S. (1997). Fuzzy object-oriented data model and fuzzy association
algebra. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented
databases: Concepts and models (pp. 187–206). Singapore: World
Scientific.
Prade, H. (1982). Possibility sets, fuzzy sets and their relation to Lukasiewicz
logic. In Proceedings of the 12th International Symposium on Multiple-Valued
Logic (pp. 223–227).
Rescher, N. (1969). Many-valued logic. New York: McGraw-Hill.
Rocacher, D., & Connan, F. (1996). A fuzzy algebra for object oriented
databases. In Proceedings of the Fourth European Congress on
Intelligent Techniques and Soft Computing, EUFIT'96 (Vol. 2, pp. 871–876).
Aachen, Germany.
Rossazza, J.-P. (1990). Utilisation de hiérarchies de classes floues pour la
représentation de connaissances imprécises et sujettes à exception: le
système SORCIER. Ph.D. thesis, Université Paul Sabatier, Toulouse,
France.
Rossazza, J.-P., Dubois, D., & Prade, H. (1997). A hierarchical model of fuzzy
classes. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented
databases: Concepts and models (pp. 21–61). Singapore: World Scientific.
Shaw, G. M., & Zdonik, S. B. (1990). A query algebra for object-oriented
databases. In Proceedings of the Sixth International Conference on
Data Engineering, ICDE'90 (pp. 154–162). Los Angeles, CA.
Taivalsaari, A. (1996). On the notion of inheritance. ACM Computing Surveys,
28(3), 438–479.
Tanaka, K., Kobayashi, S., & Sakanoue, T. (1991). Uncertainty management in
object-oriented database systems. In D. Karagiannis (Ed.), Proceedings
of the International Conference on Database and Expert System
Applications, DEXA 1991 (pp. 251–256). Berlin, Germany: Springer-Verlag.
Van Gyseghem, N. (1998). Imprecision and uncertainty in the UFO database
model. Journal of the American Society for Information Science, 49(3),
236–252.
Zadeh, L. A. (1968). Probability measures of fuzzy events. Journal of Mathematical
Analysis and Applications, 23, 421–427.
Zadeh, L. A. (1975). The concept of linguistic variable and its application to
approximate reasoning (Parts I, II, and III). Information Sciences, 8, 199–251,
301–357; 9, 43–80.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets
and Systems, 1(1), 3–28.
Zadeh, L. A. (1986). Outline of a computational approach to meaning and
knowledge representation based on a concept of a generalized assignment
statement. In M. Thoma, & A. Wyner (Eds.), Proceedings of the
International Seminar on Artificial Intelligence and ManMachine
Systems (pp. 198211). Heidelberg, Germany: Springer.
Zadeh, L. A. (1996). Fuzzy logic = Computing with words. IEEE Transactions
on Fuzzy Systems, 4(2), 103111.
Zadeh, L. A. (1997). Toward a theory of fuzzy information granulation and its
centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems,
90(2), 111127.
Zadeh, L. A. (1999). From computing with numbers to computing with words
from manipulation of measurements to manipulation of perceptions. IEEE
Transactions on Circuit Systems, 45, 105119.
Zadeh, L. A. (2000). Toward a preception-based theory of probabilistic reason-
ing with imprecise probabilities. Journal of Statistical Planning and
Inference, 105, 233264.
Endnotes
1 The term "characteristic" is used to denote the properties (structure),
i.e., the attributes and relationships, and the operators (behavior) of the
object type.
2 Path expressions are an adequate means with which to identify the
components of a structured type or an object type. In this model, every path
expression is defined as an identifier, which is obtained by applying the
period member operator (.) an adequate number of times with the identifiers
of (the components or characteristics of the) type as arguments.
3 "Component of an object type" is a short notation for "component of a
structured type that is associated with an (inherited) attribute of that object
type."
46 Cao & Nguyen
Chapter II
Fuzzy and Probabilistic
Object Bases
T. H. Cao
Ho Chi Minh City University of Technology, Vietnam
H. Nguyen
Ho Chi Minh City Open University, Vietnam
Abstract
Database systems have evolved from relational databases to those integrating
different modeling and computing paradigms, in particular, object
orientation and probabilistic reasoning. This chapter introduces an
extension of the probabilistic object base model by Eiter et al. (2001), using
fuzzy sets for representing and handling vague and imprecise values of
object attributes. A probabilistic interpretation of relations on fuzzy set
values is proposed to integrate them into that probability-based framework.
Then, the definitions of fuzzy-probabilistic object base schemas, instances,
and selection operation are presented. Other algebraic operations, namely,
projection, renaming, Cartesian product, join, intersection, union, and
difference of the probabilistic object base model are also adapted for its
fuzzy extension.
Introduction
For modeling real-world problems and constructing intelligent systems, the
integration of different methodologies and techniques has been the quest and
focus of significant interdisciplinary research effort. The advantage of such a
hybrid system is that the strengths of its components are combined and
compensate for one another's weaknesses.
In particular, object orientation provides a hierarchical data abstraction scheme
and an information hiding and inheritance mechanism. Meanwhile, probability
theory and fuzzy logic provide measures and rules for representing and reasoning
with uncertainty and imprecision in the real world. Many uncertain and fuzzy
object-oriented models (e.g., George, Buckles, & Petry, 1993; Itzkovich &
Hawkes, 1994; Rossazza, Dubois, & Prade, 1997; Van Gyseghem & De Caluwe,
1997; Bordogna, Pasi, & Lucarella, 1999; Dubitzky et al., 1999; Yazici & George,
1999; Blanco et al., 2001; Cross, 2003) were proposed and developed. However,
only a few of them combine probability theory and fuzzy logic, in order to deal
with both uncertainty and imprecision.
Early works on fuzzy extension of object-oriented models were done by George,
Buckles, and Petry (1993) and Itzkovich and Hawkes (1994), which introduced
inclusion degrees between classes in a hierarchy. An inclusion degree of one
class to another could be computed on the basis of the fuzzy ranges of their
common attributes. For example, Rossazza, Dubois, and Prade (1997) defined
four inclusion degrees, depending on whether necessary ranges or typical ranges
were used for each of the two classes.
Arguing for flexible modeling, Van Gyseghem and De Caluwe (1997) introduced
the notion of fuzzy property as an intermediate between the two extreme notions
of required property and optional property. Each fuzzy property of a class was
associated with possibility degrees of applicability of the property to the class.
Meanwhile, Yazici and George (1999) presented a deductive fuzzy object-
oriented model but did not address uncertain applicability of properties. A general
data model including fuzzy attribute values as well as uncertain properties was
proposed by Bordogna, Pasi, and Lucarella (1999), where the treatment of
uncertainty was, however, based on possibility theory rather than on probability
theory.
As a first attempt to integrate both probabilistic and fuzzy measures into an
object-oriented model, Dubitzky et al. (1999) assumed that each property of a
concept had a probability degree for it occurring in exemplars of that concept.
However, the method therein for computing a membership degree of an object
to a concept, based on matching the object's properties with the uncertainly
applicable properties of the concept, is in our view not justifiable. Also, the work
did not address the problem of how inheritance is performed under the member-
ship and applicability uncertainty.
Recently, Blanco et al. (2001) and De Tré (2001) sketched out general models
to manage different sources of imprecision and uncertainty, including probabi-
listic ones, on various levels of an object-oriented database model. However, no
foundation was laid to integrate probability theory and fuzzy logic, in case
probability was used to represent uncertainty. Later, Cross (2003) reviewed
existing proposals and presented recommendations for the application of fuzzy
set theory in a flexible generalized object model.
Meanwhile, Cao (2001), Cao et al. (2002), and Cao and Rossiter (2003)
introduced a logic-based fuzzy and probabilistic object-oriented model, which
could represent and handle fuzzy attribute values as well as uncertain class
properties. Mass assignment theory (Baldwin, Martin, & Pilsworth, 1995;
Baldwin, Lawry, & Martin, 1996) was employed to compute with fuzzy sets and
probabilities in an integrated framework. Nevertheless, the definition of class
hierarchies in that model was crisp, that is, no uncertainty was considered on
class links.
In another direction, Eiter et al. (2001) developed an algebra to handle object bases
with uncertainty, called POBs, where the conditional probability for an object of
a class belonging to one of its subclasses was specified in the class hierarchy of
discourse. Also, for each attribute of an object, uncertainty about its value was
represented by lower-bound and upper-bound probability distribution functions
over a set of values.
However, the major shortcoming of the POB model is that it does not allow vague
and imprecise attribute values. For instance, in the Plant example therein, the
values of the attribute sun are chosen to be only enumerated symbols, such as
mild, medium, and heavy, without any interpretation. Meanwhile, in practice,
those values are inherently vague and imprecise over degrees of sunlight.
Moreover, without an interpretation, they cannot be measured, and their prob-
ability distributions cannot be calculated.
Because fuzzy set theory and fuzzy logic provide a basis for defining the
semantics of, and computing with, linguistic terms (Zadeh, 1978), we apply them
to extend the POB model to allow vague and imprecise attribute values. For
instance, the values mild, medium, and heavy of the attribute sun in the
aforementioned Plant example can be defined by fuzzy sets. Primary results of
this extension were presented by Cao and Nguyen (2002).
In this chapter, the second section presents fundamentals of probability and fuzzy
set theories and, in particular, introduces a probabilistic interpretation of relations
on fuzzy sets to integrate them into the probability-based framework of POBs.
Then, the third, fourth, fifth, and sixth sections present a fuzzy extension and
generalization of the definitions of POB schemas, instances, and algebraic
operations for fuzzy POBs (FPOBs). Finally, the last section concludes the
chapter and suggests further work.
Fundamentals of Probabilities and
Fuzzy Sets
Voting Model of Fuzzy Sets
In this work, for extending the probabilistic model of POBs with fuzzy set values,
we apply the voting model interpretation of fuzzy sets (Gaines, 1978; Baldwin,
Martin, & Pilsworth, 1995). That is, given a fuzzy set A on a domain U, each voter
has a subset of U as his or her own crisp definition of the concept that A
represents. For example, a voter may have the interval [0, 35] representing
human ages from 0 to 35 years as his or her definition of the concept young, while
another voter may have [0, 25] instead.
The membership function value $\mu_A(u)$ is then the proportion of voters whose crisp
definitions include u. This model defines a mass assignment (i.e., probability
distribution) on the power set of U, where the mass (i.e., probability value)
assigned to a subset of U is the proportion of voters who have that subset as a
crisp definition for the fuzzy concept A. As such, this mass assignment corre-
sponds to a family of probability distributions on U.
Let us take the Dice example given by Baldwin, Martin, and Pilsworth (1995).
Given the dice values from the set {1, 2, 3, 4, 5, 6}, suppose that a score high
is defined by the discrete fuzzy set {3:0.2, 4:0.5, 5:0.9, 6:1}, i.e., the membership
of value 3 is 0.2, and so on. The voting pattern for a group of 10 persons for this
score could be as shown in Table 1.
Table 1. Voting pattern for high dice values

Score | P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
------+--------------------------------------------------
  1   |
  2   |
  3   | x    x
  4   | x    x    x    x    x
  5   | x    x    x    x    x    x    x    x    x
  6   | x    x    x    x    x    x    x    x    x    x
That is, all voters, $P_1$ to $P_{10}$, vote for value 6 as a high score, while only two of
them, $P_1$ and $P_2$, vote for 3 as a high score, and so on. In other words, the crisp
definition of $P_{10}$ for the high score is {6}, while that of $P_1$ and $P_2$ is {3, 4, 5, 6},
for instance. An assumption made in this voting model is that any person who
accepts a value as a high score also accepts all values that have higher
membership grades in the fuzzy set high.
This model defines the following mass assignment (i.e., probability distribution)
on the power set of {1, 2, 3, 4, 5, 6}:
{6}:0.1 {5, 6}:0.4 {4, 5, 6}:0.3 {3, 4, 5, 6}:0.2
where the mass (i.e., probability value) assigned to a subset of {1, 2, 3, 4, 5, 6}
[e.g., $m_{high}(\{5, 6\}) = 0.4$] is the proportion of voters who have that subset as a
crisp definition for the fuzzy concept high score. This mass assignment
corresponds to a family of probability distributions on {1, 2, 3, 4, 5, 6}.
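The step from a discrete fuzzy set to its mass assignment can be sketched in a few lines of Python. This is only an illustration under the nested-voter assumption stated above, and it assumes a normalized fuzzy set (some element has membership 1); the function name `mass_assignment` is hypothetical and does not come from the chapter.

```python
from fractions import Fraction

def mass_assignment(fuzzy_set):
    """Return {focal_set: mass} for a normalized discrete fuzzy set.

    fuzzy_set maps each element to its membership grade in (0, 1].
    Assumption: voters are nested, i.e., accepting an element implies
    accepting every element with a higher membership grade.
    """
    # Distinct membership grades, highest first.
    grades = sorted(set(fuzzy_set.values()), reverse=True)
    masses = {}
    for i, grade in enumerate(grades):
        # Crisp definition shared by the voters at this grade level.
        focal = frozenset(x for x, mu in fuzzy_set.items() if mu >= grade)
        next_grade = grades[i + 1] if i + 1 < len(grades) else Fraction(0)
        # Proportion of voters whose crisp definition is exactly `focal`.
        masses[focal] = grade - next_grade
    return masses

# The fuzzy set "high" of the Dice example.
high = {3: Fraction(2, 10), 4: Fraction(5, 10), 5: Fraction(9, 10), 6: Fraction(1)}
m_high = mass_assignment(high)
for focal, mass in m_high.items():
    print(sorted(focal), float(mass))
```

Exact fractions are used so that the masses sum to exactly 1 and can be compared with the values listed in the text without rounding noise.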
Probabilistic Interpretation of Relations on Fuzzy Sets
On the basis of this voting model, we introduce a probabilistic interpretation of
the following binary relations on fuzzy sets. We write $\Pr(E_1 \mid E_2)$ to denote the
conditional probability of $E_1$ given $E_2$.
Definition 1. Let A be a fuzzy set on a domain U; B be a fuzzy set on a domain
V; and $\theta$ be a binary relation from $\{=, \leq, <, \subseteq, \in\}$ assumed to be valid on $(U \times V)$.
The probabilistic interpretation of a relation $A\ \theta\ B$, denoted by $\mathrm{prob}(A\ \theta\ B)$, is
a value in [0, 1] that is defined by
$$\mathrm{prob}(A\ \theta\ B) = \sum_{S \subseteq U,\ T \subseteq V} \Pr(u\ \theta\ v \mid u \in S, v \in T) \cdot m_A(S) \cdot m_B(T).$$
Intuitively, given fuzzy propositions $x \in A$ and $y \in B$, $\mathrm{prob}(A\ \theta\ B)$ is the probability
for $x\ \theta\ y$ being true. The rationale of the above probabilistic interpretation is that,
given each crisp definition S of A and T of B, the conditional probability of $u\ \theta\ v$ given
$u \in S$ and $v \in T$ is calculated and weighted by the product of the masses
associated with S and T. Then $\mathrm{prob}(A\ \theta\ B)$ is the sum of those weighted
conditional probability values. Also, we define $\mathrm{prob}(A \geq B) = \mathrm{prob}(B \leq A)$,
$\mathrm{prob}(A > B) = \mathrm{prob}(B < A)$, $\mathrm{prob}(A \supseteq B) = \mathrm{prob}(B \subseteq A)$, and $\mathrm{prob}(A \ni B) = \mathrm{prob}(B \in A)$.
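Example 1: In the Dice example above, suppose that about_5 is defined by the
fuzzy set {6:0.3, 5:1, 4:0.3}, whose mass assignment is:
{5}:0.7 {4, 5, 6}:0.3
Given $x \in \mathit{about\_5}$ and $y \in \mathit{high}$, $\mathrm{prob}(\mathit{about\_5} = \mathit{high})$ measures how likely
it is that x = y, as calculated below:
$$
\begin{aligned}
&\mathrm{prob}(\mathit{about\_5} = \mathit{high}) \\
&= \Pr(u = v \mid u \in \{5\}, v \in \{6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \cdot m_{\mathit{high}}(\{6\}) \\
&\quad + \Pr(u = v \mid u \in \{5\}, v \in \{5, 6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \cdot m_{\mathit{high}}(\{5, 6\}) \\
&\quad + \Pr(u = v \mid u \in \{5\}, v \in \{4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \cdot m_{\mathit{high}}(\{4, 5, 6\}) \\
&\quad + \Pr(u = v \mid u \in \{5\}, v \in \{3, 4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \cdot m_{\mathit{high}}(\{3, 4, 5, 6\}) \\
&\quad + \Pr(u = v \mid u \in \{4, 5, 6\}, v \in \{6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \cdot m_{\mathit{high}}(\{6\}) \\
&\quad + \Pr(u = v \mid u \in \{4, 5, 6\}, v \in \{5, 6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \cdot m_{\mathit{high}}(\{5, 6\}) \\
&\quad + \Pr(u = v \mid u \in \{4, 5, 6\}, v \in \{4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \cdot m_{\mathit{high}}(\{4, 5, 6\}) \\
&\quad + \Pr(u = v \mid u \in \{4, 5, 6\}, v \in \{3, 4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \cdot m_{\mathit{high}}(\{3, 4, 5, 6\}) \\
&= 0.0 \times 0.7 \times 0.1 + \tfrac{1}{2} \times 0.7 \times 0.4 + \tfrac{1}{3} \times 0.7 \times 0.3 + \tfrac{1}{4} \times 0.7 \times 0.2 \\
&\quad + \tfrac{1}{3} \times 0.3 \times 0.1 + \tfrac{1}{3} \times 0.3 \times 0.4 + \tfrac{1}{3} \times 0.3 \times 0.3 + \tfrac{1}{4} \times 0.3 \times 0.2 \\
&= 0.34
\end{aligned}
$$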
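The calculation in Example 1 can be reproduced with a short Python sketch of Definition 1. It rests on one assumption that the text does not state explicitly but that the numbers in the example are consistent with: within each pair of crisp definitions S and T, u and v are drawn uniformly and independently. The function name `prob_relation` is hypothetical.

```python
from fractions import Fraction
from itertools import product
import operator

def prob_relation(m_a, m_b, theta):
    """Sum of Pr(u theta v | u in S, v in T) * m_A(S) * m_B(T) over all S, T.

    Assumption: u and v are uniform and independent on S and T, and every
    focal set is non-empty.
    """
    total = Fraction(0)
    for (s, mass_s), (t, mass_t) in product(m_a.items(), m_b.items()):
        pairs = [(u, v) for u in s for v in t]
        hits = sum(1 for u, v in pairs if theta(u, v))
        total += Fraction(hits, len(pairs)) * mass_s * mass_t
    return total

# Mass assignments taken from the Dice example.
m_about_5 = {
    frozenset({5}): Fraction(7, 10),
    frozenset({4, 5, 6}): Fraction(3, 10),
}
m_high = {
    frozenset({6}): Fraction(1, 10),
    frozenset({5, 6}): Fraction(4, 10),
    frozenset({4, 5, 6}): Fraction(3, 10),
    frozenset({3, 4, 5, 6}): Fraction(2, 10),
}

p_eq = prob_relation(m_about_5, m_high, operator.eq)
print(float(p_eq))
```

Passing `operator.le` or `operator.lt` instead of `operator.eq` evaluates the same sum for the order relations on numeric domains.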
Definition 2. Let A and B be two fuzzy sets on a domain U. The probabilistic
interpretation of the relation $A \rightarrow B$, denoted by $\mathrm{prob}(A \rightarrow B)$, is a value in
[0, 1] that is defined by
$$\mathrm{prob}(A \rightarrow B) = \sum_{S, T \subseteq U} \Pr(u \in T \mid u \in S) \cdot m_A(S) \cdot m_B(T).$$
The intuitive meaning of $\mathrm{prob}(A \rightarrow B)$ is that it is the probability for $x \in B$ being
true given $x \in A$ being true. In other words, it is the fuzzy conditional probability
of $x \in B$ given $x \in A$ as defined by Baldwin, Martin, and Pilsworth (1995). We
note that the above probabilistic interpretation can also be adapted for fuzzy sets
on continuous domains, using integration instead of addition, as in the definition
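of fuzzy conditional probability (Baldwin, Lawry, & Martin, 1996) as follows:
$$\mathrm{prob}(A \rightarrow B) = \int_0^1 \!\! \int_0^1 \frac{\Pr(A_x \cap B_y)}{\Pr(A_x)} \, dx \, dy = \int_0^1 \!\! \int_0^1 \frac{|A_x \cap B_y|}{|A_x|} \, dx \, dy$$
where $A_x$ and $B_y$ are $\alpha$-cuts of the fuzzy sets A and B with $\alpha = x$ and $\alpha = y$,
respectively. We also define $\mathrm{prob}(A \leftarrow B) = \mathrm{prob}(B \rightarrow A)$.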
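Example 2: In the Dice example, one has:
$$
\begin{aligned}
&\mathrm{prob}(\mathit{high} \rightarrow \mathit{about\_5}) \\
&= \Pr(u \in \{5\} \mid u \in \{6\}) \cdot m_{\mathit{high}}(\{6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \\
&\quad + \Pr(u \in \{5\} \mid u \in \{5, 6\}) \cdot m_{\mathit{high}}(\{5, 6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \\
&\quad + \Pr(u \in \{5\} \mid u \in \{4, 5, 6\}) \cdot m_{\mathit{high}}(\{4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \\
&\quad + \Pr(u \in \{5\} \mid u \in \{3, 4, 5, 6\}) \cdot m_{\mathit{high}}(\{3, 4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{5\}) \\
&\quad + \Pr(u \in \{4, 5, 6\} \mid u \in \{6\}) \cdot m_{\mathit{high}}(\{6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \\
&\quad + \Pr(u \in \{4, 5, 6\} \mid u \in \{5, 6\}) \cdot m_{\mathit{high}}(\{5, 6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \\
&\quad + \Pr(u \in \{4, 5, 6\} \mid u \in \{4, 5, 6\}) \cdot m_{\mathit{high}}(\{4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \\
&\quad + \Pr(u \in \{4, 5, 6\} \mid u \in \{3, 4, 5, 6\}) \cdot m_{\mathit{high}}(\{3, 4, 5, 6\}) \cdot m_{\mathit{about\_5}}(\{4, 5, 6\}) \\
&= 0.0 \times 0.1 \times 0.7 + \tfrac{1}{2} \times 0.4 \times 0.7 + \tfrac{1}{3} \times 0.3 \times 0.7 + \tfrac{1}{4} \times 0.2 \times 0.7 \\
&\quad + 1.0 \times 0.1 \times 0.3 + 1.0 \times 0.4 \times 0.3 + 1.0 \times 0.3 \times 0.3 + \tfrac{3}{4} \times 0.2 \times 0.3 \\
&= 0.53
\end{aligned}
$$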
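Example 2 can be checked in the same way. The sketch below assumes, as the numbers in the example suggest, that Pr(u ∈ T | u ∈ S) is taken as |S ∩ T| / |S|, i.e., u is uniform on S; the function name `prob_implies` is hypothetical.

```python
from fractions import Fraction

def prob_implies(m_a, m_b):
    """prob(A -> B): sum of Pr(u in T | u in S) * m_A(S) * m_B(T).

    Assumption: u is uniform on each non-empty focal set S of A, so the
    conditional probability is |S & T| / |S|.
    """
    total = Fraction(0)
    for s, mass_s in m_a.items():
        for t, mass_t in m_b.items():
            total += Fraction(len(s & t), len(s)) * mass_s * mass_t
    return total

# Mass assignments taken from the Dice example.
m_high = {
    frozenset({6}): Fraction(1, 10),
    frozenset({5, 6}): Fraction(4, 10),
    frozenset({4, 5, 6}): Fraction(3, 10),
    frozenset({3, 4, 5, 6}): Fraction(2, 10),
}
m_about_5 = {
    frozenset({5}): Fraction(7, 10),
    frozenset({4, 5, 6}): Fraction(3, 10),
}

p_imp = prob_implies(m_high, m_about_5)
print(float(p_imp))
```

Unlike equality, this relation is not symmetric: swapping the two arguments evaluates prob(about_5 → high), which is in general a different value.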
Probabilistic Combination Strategies
Given two events $e_1$ and $e_2$ having probabilities in the intervals $[L_1, U_1]$ and
$[L_2, U_2]$, one may need to compute the probability intervals of the conjunction event
$e_1 \wedge e_2$, disjunction event $e_1 \vee e_2$, or difference event $e_1 \wedge \neg e_2$. In this chapter, we
employ the conjunction, disjunction, and difference strategies given by Lakshmanan
et al. (1997) and Eiter et al. (2001) as presented in Table 2, where $\otimes$, $\oplus$, and
$\ominus$ denote the conjunction, disjunction, and difference operators, respectively.
FPOB Types and Schemas
Overview of FPOB Conceptual Model
An architecture of FPOB systems is illustrated in Figure 1, which is adapted and
extended with fuzzy sets from that of POB systems. The user expresses
declarative queries in an FPOB-calculus through a graphical user interface.
Those queries are processed and converted into procedural queries in the FPOB
algebra that this chapter is presenting. They will then be executed by an FPOB
algebra execution engine, accessing data in an FPOB of discourse. All the
components refer to a library consisting of the following:
• A set of probabilistic combination strategies for the user to express
dependencies between events.
• A set of functions for the user to specify how probabilities are distributed
over the domain of values of attributes.
• A set of fuzzy sets for the user to express vague and imprecise values of
attributes.
Table 2. Examples of probabilistic combination strategies

Strategy: Ignorance
$[L_1, U_1] \otimes_{ig} [L_2, U_2] = [\max(0, L_1 + L_2 - 1), \min(U_1, U_2)]$
$[L_1, U_1] \oplus_{ig} [L_2, U_2] = [\max(L_1, L_2), \min(1, U_1 + U_2)]$
$[L_1, U_1] \ominus_{ig} [L_2, U_2] = [\max(0, L_1 - U_2), \min(U_1, 1 - L_2)]$

Strategy: Independence
$[L_1, U_1] \otimes_{in} [L_2, U_2] = [L_1 \cdot L_2, U_1 \cdot U_2]$
$[L_1, U_1] \oplus_{in} [L_2, U_2] = [L_1 + L_2 - (L_1 \cdot L_2), U_1 + U_2 - (U_1 \cdot U_2)]$
$[L_1, U_1] \ominus_{in} [L_2, U_2] = [L_1 \cdot (1 - U_2), U_1 \cdot (1 - L_2)]$

Strategy: Positive correlation (when $e_1$ implies $e_2$, or $e_2$ implies $e_1$)
$[L_1, U_1] \otimes_{pc} [L_2, U_2] = [\min(L_1, L_2), \min(U_1, U_2)]$
$[L_1, U_1] \oplus_{pc} [L_2, U_2] = [\max(L_1, L_2), \max(U_1, U_2)]$
$[L_1, U_1] \ominus_{pc} [L_2, U_2] = [\max(0, L_1 - U_2), \max(0, U_1 - L_2)]$

Strategy: Mutual exclusion (when $e_1$ and $e_2$ are mutually exclusive)
$[L_1, U_1] \otimes_{me} [L_2, U_2] = [0, 0]$
$[L_1, U_1] \oplus_{me} [L_2, U_2] = [\min(1, L_1 + L_2), \min(1, U_1 + U_2)]$
$[L_1, U_1] \ominus_{me} [L_2, U_2] = [L_1, \min(U_1, 1 - L_2)]$
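The strategies of Table 2 translate directly into interval arithmetic. The following Python sketch is one possible encoding, not the chapter's implementation: intervals are (lower, upper) pairs, and the function names `conj`, `disj`, and `diff` and the strategy keys are hypothetical labels chosen to mirror the subscripts ig, in, pc, and me.

```python
from fractions import Fraction

def conj(strategy, a, b):
    """Probability interval of e1 AND e2 under the given strategy."""
    (l1, u1), (l2, u2) = a, b
    return {
        "ig": (max(0, l1 + l2 - 1), min(u1, u2)),
        "in": (l1 * l2, u1 * u2),
        "pc": (min(l1, l2), min(u1, u2)),
        "me": (0, 0),
    }[strategy]

def disj(strategy, a, b):
    """Probability interval of e1 OR e2 under the given strategy."""
    (l1, u1), (l2, u2) = a, b
    return {
        "ig": (max(l1, l2), min(1, u1 + u2)),
        "in": (l1 + l2 - l1 * l2, u1 + u2 - u1 * u2),
        "pc": (max(l1, l2), max(u1, u2)),
        "me": (min(1, l1 + l2), min(1, u1 + u2)),
    }[strategy]

def diff(strategy, a, b):
    """Probability interval of e1 AND NOT e2 under the given strategy."""
    (l1, u1), (l2, u2) = a, b
    return {
        "ig": (max(0, l1 - u2), min(u1, 1 - l2)),
        "in": (l1 * (1 - u2), u1 * (1 - l2)),
        "pc": (max(0, l1 - u2), max(0, u1 - l2)),
        "me": (l1, min(u1, 1 - l2)),
    }[strategy]

# Two illustrative intervals (made-up values, not from the chapter).
e1 = (Fraction(1, 2), Fraction(3, 4))
e2 = (Fraction(1, 4), Fraction(1, 2))
print(conj("in", e1, e2), disj("in", e1, e2), diff("in", e1, e2))
```

Exact fractions are used for the sample intervals so that results can be compared by equality; floats work as well if small rounding differences are acceptable.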
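Figure 1. Architecture of FPOB systems
[Diagram: the USER works through a GUI to issue an FPOB-calculus query; the FPOB-Algebra Query Manager turns it into an FPOB-algebra query for the FPOB-Algebra Execution Engine, which accesses the FPOB; all components refer to a library of probabilistic combination strategies, probabilistic distributions, and fuzzy sets.]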
For FPOBs, we use the same definition of class hierarchy as that used for POBs.
Figure 2 shows an example POB hierarchy of plants given by Eiter et al. (2001),
which are classified as being either perennials or annuals and, alternatively, as
being vegetables, herbs, or flowers. Those subclasses of a class that are
connected to a d node are mutually disjoint (i.e., an object cannot belong to any
two of them at the same time), and they form a cluster of that class. In this
example, the class PLANTS has two clusters, namely, {ANNUALS, PERENNIALS} and
{VEGETABLES, HERBS, FLOWERS}.
The value in [0, 1] associated with the link between a class and one of its
immediate subclasses represents the probability for an object of the class
belonging to that subclass. For instance, the hierarchy says 60% of plants are
annuals, while the rest (40%) are perennials. Also, ANNUALS_HERBS is a common
subclass of ANNUALS and HERBS, where ANNUALS_HERBS constitute 40% and 80%
of annuals and herbs, respectively.
FPOB Types and Values
As in the classical object-oriented model, each class in POBs is characterized
by a number of attributes with values that are of particular types. For POBs,
types can be atomic types, set types, or tuple types. For FPOBs, we extend the
set types to be fuzzy set types as in the following definition.
Definition 3. Let $\mathcal{A}$ be a set of attributes and $\mathcal{T}$ be a set of atomic types. Then
types are inductively defined as follows:
1. Every atomic type from $\mathcal{T}$ is a type.
2. If $\tau$ is a type, then $\{\tau\}$ is the fuzzy set type of $\tau$.
3. If $A_1, A_2, \ldots, A_k$ are pairwise different attributes from $\mathcal{A}$ and $\tau_1, \tau_2, \ldots, \tau_k$ are
types, then $\tau = [A_1: \tau_1, A_2: \tau_2, \ldots, A_k: \tau_k]$ is the tuple type over $\{A_1, A_2, \ldots, A_k\}$.
One writes $\tau.A_i$ to denote $\tau_i$, and $A_1, A_2, \ldots, A_k$ are called top-level
attributes of $\tau$.
Figure 2. An example FPOB class hierarchy
[Diagram: class hierarchy rooted at PLANTS, with immediate subclasses ANNUALS, PERENNIALS, VEGETABLES, HERBS, and FLOWERS grouped into two clusters, and with ANNUALS_HERBS and PERENNIALS_FLOWERS below them; each edge is labeled with a conditional probability.]
Example 3: In the Plant example above, the attributes can be soil, sun, water,
which describe the conditions for a plant to grow, and name, size, width, and
height. Some atomic types can be integer, real, string, and soil-type. Some
fuzzy set and tuple types can be {real}, [soil: soil-type, sun: {real}, water:
integer], and [name: string, size: [height: integer, width: integer]].
Each type has a domain of its values as defined below (cf., Eiter et al., 2001).
Definition 4. Let every atomic type $\tau \in \mathcal{T}$ be associated with a domain $dom(\tau)$.
Then values are defined by induction as follows:
1. For every $\tau \in \mathcal{T}$, every $v \in dom(\tau)$ is a value of type $\tau$.
2. For every $\tau \in \mathcal{T}$, every fuzzy set on $dom(\tau)$ is a value of type $\{\tau\}$.
3. If $A_1, A_2, \ldots, A_k$ are pairwise different attributes from $\mathcal{A}$ and $v_1, v_2, \ldots, v_k$ are
values of types $\tau_1, \tau_2, \ldots, \tau_k$, then $[A_1: v_1, A_2: v_2, \ldots, A_k: v_k]$ is a value of type
$[A_1: \tau_1, A_2: \tau_2, \ldots, A_k: \tau_k]$.
We recall that a crisp set A on a domain U can be considered as a special fuzzy
set $A_f$ on U with membership defined by, for every $x \in U$, $\mu_{A_f}(x) = 1$ if $x \in A$ and
$\mu_{A_f}(x) = 0$ if $x \notin A$. Also, every $v \in U$ can be treated as a special fuzzy set $v_f$ on
U with membership defined by, for every $x \in U$, $\mu_{v_f}(x) = 1$ if $x = v$ and $\mu_{v_f}(x) = 0$
if $x \neq v$.
Example 4: In the Plant example, let soil-type be an enumerated type such that
dom(soil-type) = {loamy, swampy, sandy}, and mild, medium, and heavy are
linguistic labels of fuzzy sets on dom(real) as shown in Figure 3, with
membership functions as follows:
$$
\mu_{mild}(x) =
\begin{cases}
1 & \text{if } x \in [0, 5] \\
0.2(5 - x) + 1 & \text{if } x \in (5, 10) \\
0 & \text{otherwise}
\end{cases}
$$
$$
\mu_{medium}(x) =
\begin{cases}
0.2(x - 10) + 1 & \text{if } x \in [5, 10) \\
1 & \text{if } x \in [10, 15) \\
0.2(15 - x) + 1 & \text{if } x \in [15, 20] \\
0 & \text{otherwise}
\end{cases}
$$
$$
\mu_{heavy}(x) =
\begin{cases}
0.2(x - 20) + 1 & \text{if } x \in [15, 20) \\
1 & \text{if } x \in [20, 25] \\
0 & \text{otherwise}
\end{cases}
$$
Then, [soil: swampy, sun: mild, water: 3] is a value of the type [soil: soil-type,
sun: {real}, water: integer].
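For readers who want to experiment with these labels, the three membership functions can be written as ordinary Python functions. This is a sketch that assumes the piecewise-linear shapes of Example 4 and Figure 3 (plateaus joined by ramps of slope 0.2 between the breakpoints 5, 10, 15, and 20); the function names simply reuse the linguistic labels.

```python
def mild(x):
    """Membership of sunlight degree x in 'mild'."""
    if 0 <= x <= 5:
        return 1.0
    if 5 < x < 10:
        return 0.2 * (5 - x) + 1  # falls from 1 at x = 5 to 0 at x = 10
    return 0.0

def medium(x):
    """Membership of sunlight degree x in 'medium'."""
    if 5 <= x < 10:
        return 0.2 * (x - 10) + 1  # rises from 0 at x = 5 to 1 at x = 10
    if 10 <= x < 15:
        return 1.0
    if 15 <= x <= 20:
        return 0.2 * (15 - x) + 1  # falls from 1 at x = 15 to 0 at x = 20
    return 0.0

def heavy(x):
    """Membership of sunlight degree x in 'heavy'."""
    if 15 <= x < 20:
        return 0.2 * (x - 20) + 1  # rises from 0 at x = 15 to 1 at x = 20
    if 20 <= x <= 25:
        return 1.0
    return 0.0

for degree in (3, 7.5, 12, 17.5, 22):
    print(degree, mild(degree), medium(degree), heavy(degree))
```

At the midpoint of each ramp, two neighboring labels overlap with membership about 0.5 each, which is what makes a value such as 7.5 partly mild and partly medium.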
In POBs, for each attribute of an object there can be uncertainty about its value
measured by lower-bound and upper-bound probability distribution functions
over a set of values. For FPOBs, we adapt the definition of probabilistic tuple
values for POBs to represent that uncertain information for fuzzy set values as
well.
Definition 5. Let $A_1, A_2, \ldots, A_k$ be pairwise different attributes from $\mathcal{A}$ and, for
each i from 1 to k, $V_i$ be a finite set of values of type $\tau_i$, and $\alpha_i$, $\beta_i$ be probability
distribution functions over $V_i$. Then $ptv = [A_1: \langle V_1, \alpha_1, \beta_1 \rangle, A_2: \langle V_2, \alpha_2, \beta_2 \rangle, \ldots, A_k: \langle V_k, \alpha_k, \beta_k \rangle]$
is a fuzzy-probabilistic tuple value of type $[A_1: \tau_1, A_2: \tau_2, \ldots, A_k: \tau_k]$
over $\{A_1, A_2, \ldots, A_k\}$. One writes $ptv.A_i$ to denote $\langle V_i, \alpha_i, \beta_i \rangle$.
Example 5: Assume we know that the soil type of a thyme plant is loamy.
However, we are not sure whether the plant is French thyme, Silver thyme, or
Wooly thyme, with the same probability between 0.2 and 0.6 for each category.
Then this information can be represented by the fuzzy-probabilistic tuple value
[soil: ⟨{loamy}, u, u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]. Here, u
represents the uniform distribution function, and 0.6u and 1.8u denote the
distribution functions $\alpha(x) = 0.6 \times 1/3 = 0.2$ and $\beta(x) = 1.8 \times 1/3 = 0.6$ for every
x from {french, silver, wooly}.
Figure 3. Fuzzy set values of sunlight
[Plot: membership functions of mild, medium, and heavy, with values between 0 and 1, over sunlight degrees; axis ticks at 5, 10, 15, and 20.]
FPOB Schemas
FPOB schemas are now defined the same as POB schemas, as follows:
Definition 6. An FPOB schema is a quintuple $(\mathcal{C}, \tau, \Rightarrow, me, p)$, where:
1. $\mathcal{C}$ is a finite set of classes.
2. $\tau$ maps each class c to a tuple type $\tau(c)$ representing the attributes and their
types of that class.
3. $\Rightarrow$ is a binary relation on $\mathcal{C}$ such that $(\mathcal{C}, \Rightarrow)$ is a directed acyclic graph,
whereby each edge $c_1 \Rightarrow c_2$ means $c_1$ is an immediate subclass of $c_2$.
4. me maps each class $c \in \mathcal{C}$ to a partition of the set of all immediate
subclasses of c, such that the classes in each cluster of the partition me(c)
are mutually disjoint.
5. p maps each edge $c_1 \Rightarrow c_2$ in $(\mathcal{C}, \Rightarrow)$ to a rational number $p(c_1 \mid c_2)$ in [0, 1]
measuring the conditional probability for an object picked at random
uniformly from $c_2$ belonging to $c_1$.
Given $c_1 \Rightarrow c_2 \Rightarrow \cdots \Rightarrow c_k$, one can write $c_1 \Rightarrow^* c_k$, and, in particular, $c \Rightarrow^* c$ for
every $c \in \mathcal{C}$.
Example 6: An FPOB schema for the Plant example above may be defined as
follows:
$\mathcal{C}$ = {PLANTS, ANNUALS, PERENNIALS, VEGETABLES, HERBS, FLOWERS, ANNUALS_HERBS,
PERENNIALS_FLOWERS}.
$\tau$ is given as in Table 3 (cf., Eiter et al., 2001).
$(\mathcal{C}, \Rightarrow)$, me, and p are given as in Figure 2.
An FPOB schema as defined above may be inconsistent when there is no set of
objects that satisfies its class hierarchy and probability assignment. It is
consistent if and only if it has a taxonomic and probabilistic model as in the
following definition adapted from that of POBs.
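Definition 7. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema. An interpretation
of S is a mapping $\varepsilon$ from $\mathcal{C}$ to the set of all finite subsets of a set $\mathcal{O}$ of object
identifiers. It is said to be a model of S if and only if:
1. $\varepsilon(c) \neq \emptyset$ for every $c \in \mathcal{C}$
2. $\varepsilon(c) \subseteq \varepsilon(d)$ for all $c, d \in \mathcal{C}$ such that $c \Rightarrow d$
3. $\varepsilon(c) \cap \varepsilon(d) = \emptyset$ for all distinct $c, d \in \mathcal{C}$ such that c and d belong to the same cluster
defined by me
4. $|\varepsilon(c)| = p(c \mid d) \cdot |\varepsilon(d)|$ for all $c, d \in \mathcal{C}$ such that $c \Rightarrow d$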
Table 3. Type assignment of the plant example

c                    τ(c)
PLANTS               [name: string, soil: soil-type, water: integer]
ANNUALS              [name: string, soil: soil-type, water: integer, sun: {real}]
PERENNIALS           [name: string, soil: soil-type, water: integer, sun: {real},
                     exp-years: integer]
VEGETABLES           [name: string, soil: soil-type, water: integer, sun: {real},
                     exp-years: integer]
HERBS                [name: string, soil: soil-type, water: integer, sun: {real},
                     exp-years: integer, category: string]
FLOWERS              [name: string, soil: soil-type, water: integer, sun: {real},
ANNUALS_HERBS        [name: string, soil: soil-type, water: integer, sun: {real},
PERENNIALS_FLOWERS   [name: string, soil: soil-type, water: integer, sun: {real},
Example 7: As in an example given by Eiter et al. (2001), let S be the FPOB
schema in Example 6 and $\mathcal{O}$ be a set of cardinality 800 partitioned into pairwise
disjoint subsets $\mathcal{O}_1, \mathcal{O}_2, \ldots, \mathcal{O}_{10}$ having cardinalities 90, 27, 126, 45, 192, 21, 98, 35,
70, and 96, respectively. Then $\varepsilon$ given in Table 4 is a model of S.
FPOB Inheritance and Instances
FPOB Inherited Schemas
In Definition 6 of FPOB schemas, the attributes specified for a class are only the
top-level attributes of that class, which do not include those attributes inherited
from its superclasses. In practice, different inheritance strategies can be
employed to resolve multiple inheritance (Bertino & Martino, 1993; Meyer, 1997;
Cao, 2001).
Given an FPOB schema $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$, applying an inheritance strategy
on S induces another FPOB schema $S^* = (\mathcal{C}, \tau^*, \Rightarrow, me, p)$, which differs from
S only in the type assignment. Specifically, for each $c \in \mathcal{C}$,
$\tau^*(c) = [A_1: \tau(d_1).A_1, A_2: \tau(d_2).A_2, \ldots, A_k: \tau(d_k).A_k]$, where each $d_i$ is either c or a proper superclass of
c and, respectively, $A_i$ is a top-level attribute of c or one of $d_i$, which c inherits.
An FPOB schema S is said to be fully inherited if and only if $S = S^*$. From now
on, we assume that all FPOB schemas are consistent and fully inherited.
Table 4. A model of an FPOB schema

c                    ε(c)                       |ε(c)|
PLANTS               O1 ∪ O2 ∪ ... ∪ O10        800
ANNUALS              O1 ∪ O2 ∪ O3 ∪ O4 ∪ O5     480
PERENNIALS           O6 ∪ O7 ∪ O8 ∪ O9 ∪ O10    320
VEGETABLES           O1 ∪ O9                    160
HERBS                O2 ∪ O5 ∪ O6               240
FLOWERS              O3 ∪ O7 ∪ O10              320
ANNUALS_HERBS        O5                         192
PERENNIALS_FLOWERS   O10                        96
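The four model conditions of Definition 7 can be checked mechanically for the interpretation of Table 4. The Python sketch below is illustrative only. The edge probabilities are an assumption read from the plant hierarchy: 0.6 and 0.4 for ANNUALS and PERENNIALS and the 0.4/0.8 values for ANNUALS_HERBS are stated in the text, while 0.2, 0.3, and 0.4 for VEGETABLES, HERBS, and FLOWERS and 0.3/0.3 for PERENNIALS_FLOWERS are the values implied by the cardinalities in Table 4. The names `blocks`, `eps`, and `is_model` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Disjoint blocks O_1..O_10 of object identifiers with the sizes of Example 7.
sizes = [90, 27, 126, 45, 192, 21, 98, 35, 70, 96]
blocks, start = {}, 0
for i, n in enumerate(sizes, 1):
    blocks[i] = set(range(start, start + n))
    start += n

def union(*indices):
    return set().union(*(blocks[i] for i in indices))

# The interpretation of Table 4.
eps = {
    "PLANTS": union(*range(1, 11)),
    "ANNUALS": union(1, 2, 3, 4, 5),
    "PERENNIALS": union(6, 7, 8, 9, 10),
    "VEGETABLES": union(1, 9),
    "HERBS": union(2, 5, 6),
    "FLOWERS": union(3, 7, 10),
    "ANNUALS_HERBS": union(5),
    "PERENNIALS_FLOWERS": union(10),
}

# (subclass, superclass) -> p(subclass | superclass); assumed values, see above.
p = {
    ("ANNUALS", "PLANTS"): Fraction(6, 10),
    ("PERENNIALS", "PLANTS"): Fraction(4, 10),
    ("VEGETABLES", "PLANTS"): Fraction(2, 10),
    ("HERBS", "PLANTS"): Fraction(3, 10),
    ("FLOWERS", "PLANTS"): Fraction(4, 10),
    ("ANNUALS_HERBS", "ANNUALS"): Fraction(4, 10),
    ("ANNUALS_HERBS", "HERBS"): Fraction(8, 10),
    ("PERENNIALS_FLOWERS", "PERENNIALS"): Fraction(3, 10),
    ("PERENNIALS_FLOWERS", "FLOWERS"): Fraction(3, 10),
}

# me: clusters of mutually disjoint immediate subclasses.
clusters = [{"ANNUALS", "PERENNIALS"}, {"VEGETABLES", "HERBS", "FLOWERS"}]

def is_model(eps, p, clusters):
    nonempty = all(eps[c] for c in eps)                       # condition 1
    nested = all(eps[c] <= eps[d] for (c, d) in p)            # condition 2
    disjoint = all(not (eps[a] & eps[b])                      # condition 3
                   for cluster in clusters
                   for a, b in combinations(sorted(cluster), 2))
    sized = all(len(eps[c]) == p[(c, d)] * len(eps[d])        # condition 4
                for (c, d) in p)
    return nonempty and nested and disjoint and sized

print(is_model(eps, p, clusters))
```

Changing any one probability, for example setting p(HERBS | PLANTS) to 0.4, makes condition 4 fail for this interpretation, which is a quick way to see how the cardinalities pin down the edge labels.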
FPOB Instances
As for POBs, given an FPOB schema, an FPOB instance is defined as a base
of objects associated with their classes and fuzzy-probabilistic tuple values in
accordance with the schema. The following definition is adapted from that of
POBs.
Definition 8. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema and $\mathcal{O}$ be a set of
object identifiers. An FPOB instance over S is a pair $(\pi, \nu)$ where:
1. $\pi$ maps each $c \in \mathcal{C}$ to a finite subset of $\mathcal{O}$ such that, for different $c_1, c_2 \in \mathcal{C}$,
$\pi(c_1) \cap \pi(c_2) = \emptyset$.
2. For each $c \in \mathcal{C}$, $\nu$ maps each $o \in \pi(c)$ to a fuzzy-probabilistic tuple value
of type $\tau(c)$.
We note that, in the definition above, $\pi(c)$ denotes only the set of the identifiers
of the objects that are defined in the class c. Meanwhile, the set of the identifiers
of all the objects that belong to c (i.e., those that are defined in c or its proper
subclasses) is denoted by $\pi^*(c) = \bigcup \{\pi(d) \mid d \in \mathcal{C} \text{ and } d \Rightarrow^* c\}$. Also, one writes
$\pi(\mathcal{C})$ to denote $\bigcup \{\pi(c) \mid c \in \mathcal{C}\}$.
Example 8: An FPOB instance over the FPOB schema in Example 6 can be $(\pi, \nu)$,
where $\pi$ and $\pi^*$ are shown in Table 5 and $\nu$ in Table 6 (cf., Eiter et al., 2001).
Probabilistic Extents of Classes
In classical object bases, the extent of a class comprises all the objects that
belong to that class. In POBs as well as FPOBs, the probabilistic extent of a class
specifies the probability for each object belonging to that class. The following
definition is adapted from that of POBs.
Definition 9. Let $(\pi, \nu)$ be an FPOB instance over an FPOB schema $S = (\mathcal{C}, \tau,
\Rightarrow, me, p)$. Then, for each class $c \in \mathcal{C}$, the probabilistic extent of c, denoted by
ext(c), maps each $o \in \pi(\mathcal{C})$ to a set of rational numbers in [0, 1] as follows:
1. If $o \in \pi^*(c)$ then ext(c)(o) = {1}.
2. If $o \in \pi^*(d)$ and $\varepsilon(c) \cap \varepsilon(d) = \emptyset$ for every model $\varepsilon$ of S, then ext(c)(o) =
{0}.
3. Otherwise, ext(c)(o) = {p | p is the product of the edge probabilities on a
path from c up to d where $c \Rightarrow^* d$ with d being minimal and $o \in \pi^*(d)$}.
Example 9: For the FPOB instance in Example 8, one has:
ext(ANNUALS_HERBS)($o_1$) = {0.24}
ext(ANNUALS_HERBS)($o_2$) = {1}
ext(PERENNIALS_FLOWERS)($o_1$) = {0.12}
ext(PERENNIALS_FLOWERS)($o_2$) = {0}
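The values in Example 9 can be traced with a small Python sketch of Definition 9. Several simplifications are assumptions of this sketch rather than part of the chapter: the edge probabilities are the ones implied by Table 4 (0.3 for HERBS and 0.4 for FLOWERS under PLANTS), and the test "disjoint in every model" is replaced by a simpler sufficient check, namely that the two classes have superclasses (or are themselves) in the same cluster. All identifiers are hypothetical.

```python
from fractions import Fraction

# (subclass, superclass) -> edge probability; assumed values, see lead-in.
EDGES = {
    ("ANNUALS", "PLANTS"): Fraction(6, 10),
    ("PERENNIALS", "PLANTS"): Fraction(4, 10),
    ("VEGETABLES", "PLANTS"): Fraction(2, 10),
    ("HERBS", "PLANTS"): Fraction(3, 10),
    ("FLOWERS", "PLANTS"): Fraction(4, 10),
    ("ANNUALS_HERBS", "ANNUALS"): Fraction(4, 10),
    ("ANNUALS_HERBS", "HERBS"): Fraction(8, 10),
    ("PERENNIALS_FLOWERS", "PERENNIALS"): Fraction(3, 10),
    ("PERENNIALS_FLOWERS", "FLOWERS"): Fraction(3, 10),
}
CLUSTERS = [{"ANNUALS", "PERENNIALS"}, {"VEGETABLES", "HERBS", "FLOWERS"}]
# pi of Table 5: the class in which each object is defined.
PI = {
    "PLANTS": {"o1"},
    "ANNUALS_HERBS": {"o2", "o3", "o5", "o6", "o7"},
    "PERENNIALS_FLOWERS": {"o4"},
}

def ancestors_or_self(c):
    """All d with c =>* d."""
    seen, stack = {c}, [c]
    while stack:
        x = stack.pop()
        for (sub, sup) in EDGES:
            if sub == x and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def path_products(c, d):
    """Products of edge probabilities over all paths from c up to d."""
    if c == d:
        return {Fraction(1)}
    out = set()
    for (sub, sup), prob in EDGES.items():
        if sub == c:
            out |= {prob * q for q in path_products(sup, d)}
    return out

def ext(c, o):
    k = next(cls for cls, objs in PI.items() if o in objs)
    up_c, up_k = ancestors_or_self(c), ancestors_or_self(k)
    if c in up_k:  # case 1: o belongs to c
        return {Fraction(1)}
    # Case 2, simplified sufficient test for disjointness (an assumption).
    if any(a != b and a in cl and b in cl
           for cl in CLUSTERS for a in up_c for b in up_k):
        return {Fraction(0)}
    # Case 3: minimal common superclasses d of c that contain o.
    common = up_c & up_k
    minimal = [d for d in common
               if not any(e != d and d in ancestors_or_self(e) for e in common)]
    result = set()
    for d in minimal:
        result |= path_products(c, d)
    return result

print(ext("ANNUALS_HERBS", "o1"), ext("PERENNIALS_FLOWERS", "o2"))
```

For o1, both paths from ANNUALS_HERBS up to PLANTS (through ANNUALS and through HERBS) give the same product, which is why the extent is a singleton set in this example.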
Intuitively, as compared with relational databases, a POB/FPOB schema
corresponds to a relational schema, and each object of a POB/FPOB instance
corresponds to a tuple. However, two important differences are that objects can
have methods and identifiers (Garcia-Molina, Ullman, & Widom, 2000).
Table 5. Object mappings π and π* of an FPOB instance

c                    π(c)                    π*(c)
PLANTS               {o1}                    {o1, o2, o3, o4, o5, o6, o7}
ANNUALS              {}                      {o2, o3, o5, o6, o7}
PERENNIALS           {}                      {o4}
VEGETABLES           {}                      {}
HERBS                {}                      {o2, o3, o5, o6, o7}
FLOWERS              {}                      {o4}
ANNUALS_HERBS        {o2, o3, o5, o6, o7}    {o2, o3, o5, o6, o7}
PERENNIALS_FLOWERS   {o4}                    {o4}
Table 6. Value mapping ν of an FPOB instance

oid   ν(oid)
o1    [name: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩, soil: ⟨{loamy}, u, u⟩,
      water: ⟨{25, ..., 30}, u, u⟩]
o2    [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, soil: ⟨{loamy, sandy}, 0.7u, 1.3u⟩,
      water: ⟨{20, ..., 30}, u, u⟩, sun: ⟨{mild, medium}, 0.8u, 1.2u⟩,
      expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o3    [name: ⟨{Mint}, u, u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{20}, u, u⟩,
      sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩,
      category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o4    [name: ⟨{Aster, Salvia}, u, u⟩, soil: ⟨{loamy, sandy}, 0.6u, 1.4u⟩,
      water: ⟨{20, ..., 25}, u, u⟩, sun: ⟨{mild}, u, u⟩,
      expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o5    [name: ⟨{Thyme}, u, u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{20, ..., 25}, u, u⟩,
      sun: ⟨{mild, medium}, 0.8u, 1.2u⟩, expyears: ⟨{2, 3}, 0.8u, 1.2u⟩,
      category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o6    wooly, 0.6u, 1.4u]
o7    [name: ⟨{Sage}, u, u⟩, soil: ⟨{sandy}, u, u⟩, water: ⟨{20, 21}, u, u⟩,
      sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩,
      category: ⟨{red, tricolor}, 0.6u, 1.4u⟩]

FPOB Selection Operation
Selection Conditions
As for relational databases and object bases, selection is a basic operation for
FPOBs. Intuitively, the result of a selection query on an FPOB instance I over
an FPOB schema S is another FPOB instance I' over S such that the objects of
the classes in I' and their attribute values satisfy the selection condition of the
query.
Before defining the FPOB selection operation, we present the formal syntax and
semantics of selection conditions. We start with the syntax of path expressions
and selection expressions. The following definition of path expressions is given
by Eiter et al. (2001).
Definition 10. Given a type $\tau = [A_1: \tau_1, A_2: \tau_2, \ldots, A_k: \tau_k]$, path expressions are
inductively defined for every i from 1 to k as follows:
1. $A_i$ is a path expression for $\tau$.
2. If $P_i$ is a path expression for $\tau_i$, then $A_i.P_i$ is a path expression for $\tau$.
Example 10: Given the types in Example 3, name, size.height, and size.width are
path expressions for the type [name: string, size: [height: integer, width:
integer]].
For selection expressions on FPOBs, we generalize the binary relations in
selection expressions on POBs to the fuzzy ones, and add in the implication
relation on fuzzy set values, as in the following definition.
Definition 11. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema and $\mathcal{X}$ be a set of
object variables. Then fuzzy selection expressions are inductively defined as
having one of the following forms:
1. $x \in c$, where $x \in \mathcal{X}$ and $c \in \mathcal{C}$.
2. $x.P \;\theta\; v$, where $x \in \mathcal{X}$, P is a path expression, $\theta$ is a binary relation from
{=, , , , <, >, , , , , , }, and v is a value.
3. $x.P_1 =_\otimes x.P_2$, where $x \in \mathcal{X}$, $P_1$ and $P_2$ are path expressions, and $\otimes$ is a
probabilistic conjunction strategy of combining the probabilities for $x.P_1 = v_1$
and $x.P_2 = v_2$ such that $v_1 = v_2$.
4. $\varphi \wedge_\otimes \psi$, where $\varphi$ and $\psi$ are selection expressions over the same object
variable, and $\otimes$ is a probabilistic conjunction strategy of combining the
probabilities for $\varphi$ and $\psi$ being true.
5. $\varphi \vee_\oplus \psi$, where $\varphi$ and $\psi$ are selection expressions over the same object
variable, and $\oplus$ is a probabilistic disjunction strategy of combining the
probabilities for $\varphi$ and $\psi$ being true.
Those of the first three forms are called atomic fuzzy selection expressions.
Different probabilistic conjunction and disjunction strategies are given by Eiter
et al. (2001).
Example 11: In the Plant example above, the selection of all objects that require
a very mild sun can be done using the atomic expression:
$x.sun \to very\ mild$
where very mild is also a linguistic label of a fuzzy set on dom(real).
Meanwhile, the selection of all objects that require a very mild sun or over 21
units of daily water can be expressed by the query:
$x.sun \to very\ mild \;\vee_\oplus\; x.water > 21$
Selection conditions are now defined as selection expressions to be satisfied with
a probability in a given interval, as for POBs.
Definition 12. Fuzzy selection conditions are inductively defined as follows:
1. If $\varphi$ is a fuzzy selection expression and [l, u] is a subinterval of [0, 1], then
$(\varphi)[l, u]$ is a fuzzy selection condition.
2. If $\varphi$ and $\psi$ are fuzzy selection conditions, then $\neg\varphi$, $(\varphi \wedge \psi)$, and $(\varphi \vee \psi)$ are
fuzzy selection conditions.
Example 12: In the Plant example, the selection of all objects that require a very
mild sun with a probability of at least 0.4 and over 21 units of daily water with
a probability of at least 0.8 can be done using the following selection condition:
$(x.sun \to very\ mild)[0.4, 1] \wedge (x.water > 21)[0.8, 1]$
Semantics of Selection Conditions
For defining the semantics of selection conditions, interpretations of path
expressions and fuzzy selection expressions and conditions are introduced. First,
we present the interpretation of path expressions given by Eiter et al. (2001).
Definition 13. Given a type $\tau = [A_1: \tau_1, A_2: \tau_2, \ldots, A_k: \tau_k]$ and a value $v = [A_1: v_1, A_2: v_2, \ldots, A_k: v_k]$, the interpretation of a path expression P for $\tau$ under v, denoted
by v.P, is inductively defined as follows:
1. If $P = A_i$, then $v.P = v_i$.
2. If $P = A_i.P_i$ where $P_i$ is a path expression for $\tau_i$, then $v.P = v_i.P_i$.
Example 13: In the Plant example, the interpretations of the path expressions
name, size.height, and size.width under the value [name: Thyme, size: [height: 4,
width: 12]] are the values Thyme, 4, and 12, respectively.
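The inductive interpretation of path expressions can be sketched as a short recursive function. The representation is an assumption made for illustration: tuple values are modeled as nested Python dictionaries and path expressions as dotted strings; the name interpret is hypothetical.

```python
def interpret(value, path):
    """Return v.P for a nested tuple value (dict) and a dotted path expression."""
    attribute, _, rest = path.partition(".")
    sub_value = value[attribute]  # case 1: P = A_i gives v_i
    # case 2: P = A_i.P_i is interpreted as v_i.P_i
    return interpret(sub_value, rest) if rest else sub_value

plant = {"name": "Thyme", "size": {"height": 4, "width": 12}}
results = [interpret(plant, p) for p in ("name", "size.height", "size.width")]
```

Applied to the value of Example 13, the three path expressions yield Thyme, 4, and 12.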
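Definition 14. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema, $I = (\pi, \nu)$ be an
FPOB instance over S, and $o \in \pi(\mathcal{C})$. The probabilistic interpretation with
respect to S, I, and o, denoted by $prob_{S,I,o}$, is the partial mapping from the set of
all fuzzy selection expressions to the set of all closed subintervals of [0, 1] that
is inductively defined as follows:
1. $prob_{S,I,o}(x \in c) = [\min(ext(c)(o)), \max(ext(c)(o))]$.
2. $prob_{S,I,o}(x.P \;\theta\; v) = \left[\sum_{u \in V} \alpha(u) \cdot prob(u.P' \;\theta\; v),\; \min\left(1, \sum_{u \in V} \beta(u) \cdot prob(u.P' \;\theta\; v)\right)\right]$,
where $P = A.P'$ and $\nu(o).A = \langle V, \alpha, \beta \rangle$.
3. $prob_{S,I,o}(x.P_1 =_\otimes x.P_2) = \left[\sum_{u \in V} \alpha(u) \cdot prob(u_1.P_1' = u_2.P_2'),\; \min\left(1, \sum_{u \in V} \beta(u) \cdot prob(u_1.P_1' = u_2.P_2')\right)\right]$,
where $P_1 = A_1.P_1'$, $\nu(o).A_1 = \langle V_1, \alpha_1, \beta_1 \rangle$, $P_2 = A_2.P_2'$, $\nu(o).A_2 = \langle V_2, \alpha_2, \beta_2 \rangle$,
and $[\alpha(u), \beta(u)] = [\alpha_1(u_1), \beta_1(u_1)] \otimes [\alpha_2(u_2), \beta_2(u_2)]$ for all $u = (u_1, u_2) \in V = V_1 \times V_2$.
4. $prob_{S,I,o}(\varphi \wedge_\otimes \psi) = prob_{S,I,o}(\varphi) \otimes prob_{S,I,o}(\psi)$.
5. $prob_{S,I,o}(\varphi \vee_\oplus \psi) = prob_{S,I,o}(\varphi) \oplus prob_{S,I,o}(\psi)$.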
Intuitively, $prob_{S,I,o}(x \in c)$ is the interval of the probability for o belonging to c, and
$prob_{S,I,o}(x.A.P' \;\theta\; v)$ is the interval of the probability for the attribute A of o having
a value u such that $u.P' \;\theta\; v$. Also, $prob_{S,I,o}(x.A_1.P_1' =_\otimes x.A_2.P_2')$ is the interval of
the probability for the attributes $A_1$ and $A_2$ of o (with mutual dependency reflected
in the selected $\otimes$) having values $u_1$ and $u_2$, respectively, such that $u_1.P_1' = u_2.P_2'$. We
note that $P'$, $P_1'$, and $P_2'$ can be empty.
Definition 14 is actually an extension of the probabilistic interpretation for POBs,
where $prob(u.P' \;\theta\; v)$ and $prob(u_1.P_1' = u_2.P_2')$ can have values only in {0, 1},
because attribute values are crisp. In the case of FPOBs, they are evaluated to
values in [0, 1].
Example 14: For the FPOB instance in Example 8 and the fuzzy sets defining
mild and medium as in Example 4, one has:
$prob_{S,I,o_2}(x \in \text{ANNUALS\_HERBS}) = [1, 1]$
$prob_{S,I,o_2}(x.water > 21) = [9/11, 9/11] = [0.82, 0.82]$
Meanwhile:
$prob_{S,I,o_2}(x.sun \to mild)$
$= [0.8 \cdot u(mild) \cdot prob(mild \to mild) + 0.8 \cdot u(medium) \cdot prob(medium \to mild),$
$\quad \min(1, 1.2 \cdot u(mild) \cdot prob(mild \to mild) + 1.2 \cdot u(medium) \cdot prob(medium \to mild))]$
$= [0.8 \cdot 1/2 \cdot 0.903 + 0.8 \cdot 1/2 \cdot 0.068,\; \min(1, 1.2 \cdot 1/2 \cdot 0.903 + 1.2 \cdot 1/2 \cdot 0.068)]$
$= [0.39, \min(1, 0.59)] = [0.39, 0.59]$
because:
$$prob(mild \to mild) = \int_0^1\!\!\int_0^1 \frac{|mild_x \cap mild_y|}{|mild_x|}\,dx\,dy = \int_0^1\!\!\int_0^1 \frac{|[0, 10-5x] \cap [0, 10-5y]|}{|[0, 10-5x]|}\,dx\,dy = 0.903$$
$$prob(medium \to mild) = \int_0^1\!\!\int_0^1 \frac{|medium_x \cap mild_y|}{|medium_x|}\,dx\,dy = \int_0^1\!\!\int_0^1 \frac{|[5+5x, 20-5x] \cap [0, 10-5y]|}{|[5+5x, 20-5x]|}\,dx\,dy = 0.068$$
We recall that $prob(A \to A)$ is not necessarily equal to 1 when A is a fuzzy set
(Baldwin, Martin, & Pilsworth, 1995). Similarly, the probabilistic interpretation
of the above atomic fuzzy selection expressions with respect to other objects can
be computed as given in Table 7.
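The two double integrals of Example 14 can be checked numerically. The sketch below assumes that the level cuts of mild and medium are the intervals [0, 10 - 5t] and [5 + 5t, 20 - 5t] that appear in the integrals, and it uses a plain midpoint rule; the function names are hypothetical and the grid size is an arbitrary choice.

```python
def overlap(a, b):
    """Length of the intersection of closed intervals a and b (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def mild_cut(t):
    # Assumed level cut of "mild" at level t, read off the integrand.
    return (0.0, 10.0 - 5.0 * t)

def medium_cut(t):
    # Assumed level cut of "medium" at level t, read off the integrand.
    return (5.0 + 5.0 * t, 20.0 - 5.0 * t)

def prob_implies(cut_a, cut_b, n=200):
    """Midpoint-rule estimate of the double integral of |a_x ∩ b_y| / |a_x| over [0,1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a = cut_a((i + 0.5) * h)
        length_a = a[1] - a[0]
        for j in range(n):
            b = cut_b((j + 0.5) * h)
            total += overlap(a, b) / length_a
    return total * h * h

p_mild_mild = prob_implies(mild_cut, mild_cut)
p_medium_mild = prob_implies(medium_cut, mild_cut)
```

Under these assumptions the estimates come out close to the values 0.903 and 0.068 quoted in the example, and the first one illustrates that prob(A → A) can be below 1 for a fuzzy set A.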
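The following definitions are adapted from Eiter et al. (2001) for fuzzy selection
conditions in FPOBs.
Definition 15. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema, $I = (\pi, \nu)$ be an
FPOB instance over S, and $o \in \pi(\mathcal{C})$. The satisfaction of fuzzy selection
conditions under $prob_{S,I,o}$ is defined as follows:
1. $prob_{S,I,o} \models (\varphi)[l, u]$ if and only if $prob_{S,I,o}(\varphi) \subseteq [l, u]$.
2. $prob_{S,I,o} \models \neg\varphi$ if and only if $prob_{S,I,o} \models \varphi$ does not hold.
3. $prob_{S,I,o} \models (\varphi \wedge \psi)$ if and only if $prob_{S,I,o} \models \varphi$ and $prob_{S,I,o} \models \psi$.
4. $prob_{S,I,o} \models (\varphi \vee \psi)$ if and only if $prob_{S,I,o} \models \varphi$ or $prob_{S,I,o} \models \psi$.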
Table 7. Interpretation of atomic fuzzy selection expressions
oid   prob(x ∈ ANNUALS_HERBS)   prob(x.sun → mild)   prob(x.water > 21)
o1    [0.24, 0.24]              Undefined            [1.00, 1.00]
o2    [1.00, 1.00]              [0.39, 0.59]         [0.82, 0.82]
o3    [1.00, 1.00]              [0.90, 0.90]         [0.00, 0.00]
o4    [0.00, 0.00]              [0.90, 0.90]         [0.67, 0.67]
o5    [1.00, 1.00]              [0.39, 0.59]         [0.67, 0.67]
o6    [1.00, 1.00]              [0.90, 0.90]         [0.00, 0.00]
o7    [1.00, 1.00]              [0.90, 0.90]         [0.00, 0.00]
(Each column gives $prob_{S,I,o}$ of the expression for the object o in that row.)
Example 15: In the Plant example above, using the independence probabilistic
conjunction strategy, one has:
$prob_{S,I,o_2}(x \in \text{ANNUALS\_HERBS} \wedge_{\otimes_{in}} x.sun \to mild) = [1.00 \times 0.39, 1.0 \times 0.59] = [0.39, 0.59] \not\subseteq [0.3, 0.5]$
and
$prob_{S,I,o_2}(x.sun \to mild \wedge_{\otimes_{in}} x.water > 21) = [0.39 \times 0.82, 0.59 \times 0.82] = [0.32, 0.48] \subseteq [0.3, 0.5]$
therefore:
$prob_{S,I,o_2} \not\models (x \in \text{ANNUALS\_HERBS} \wedge_{\otimes_{in}} x.sun \to mild)[0.3, 0.5]$
$prob_{S,I,o_2} \models (x.sun \to mild \wedge_{\otimes_{in}} x.water > 21)[0.3, 0.5]$
Similarly, the probabilistic interpretation of these two fuzzy selection expressions
with respect to other objects can be computed as given in Table 8.
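The arithmetic of Example 15 can be reproduced with a small sketch. It assumes that the independence conjunction strategy multiplies lower bounds and upper bounds, which is what the computation in the example does; the function names are hypothetical.

```python
def conj_independence(a, b):
    """Independence conjunction of two probability intervals: [l1*l2, u1*u2]."""
    return (a[0] * b[0], a[1] * b[1])

def satisfies(interval, bounds):
    """prob |= (phi)[l, u] holds iff the interval of phi is contained in [l, u]."""
    return bounds[0] <= interval[0] and interval[1] <= bounds[1]

# Intervals for object o2, taken from Table 7.
member = (1.00, 1.00)   # x in ANNUALS_HERBS
sun = (0.39, 0.59)      # x.sun -> mild
water = (0.82, 0.82)    # x.water > 21

first = conj_independence(member, sun)
second = conj_independence(sun, water)
first_ok = satisfies(first, (0.3, 0.5))
second_ok = satisfies(second, (0.3, 0.5))
```

Here first_ok is false because the upper bound 0.59 exceeds 0.5, while second_ok is true.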
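Definition 16. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema, $I = (\pi, \nu)$ be an
FPOB instance over S, and $\varphi$ be a fuzzy selection condition over an object
variable x. The selection on I with respect to $\varphi$, denoted by $\sigma_\varphi(I)$, is the FPOB
instance $I' = (\pi', \nu')$ over S such that $\pi'(c) = \{o \in \pi(c) \mid prob_{S,I,o} \models \varphi\}$ and $\nu'$ is $\nu$
restricted to $\pi'(\mathcal{C})$.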
Table 8. Interpretation of fuzzy selection expressions
oid   prob(x ∈ ANNUALS_HERBS ∧ x.sun → mild)   prob(x.sun → mild ∧ x.water > 21)
o1    Undefined                                 Undefined
o2    [0.39, 0.59]                              [0.32, 0.48]
o3    [0.90, 0.90]                              [0.00, 0.00]
o4    [0.00, 0.00]                              [0.61, 0.61]
o5    [0.39, 0.59]                              [0.26, 0.40]
o6    [0.90, 0.90]                              [0.00, 0.00]
o7    [0.90, 0.90]                              [0.00, 0.00]
(Both conjunctions use the independence strategy $\otimes_{in}$; each column gives $prob_{S,I,o}$ for the object o in that row.)
Example 16: In the Plant example above, suppose that:
$\varphi = (x.sun \to mild)[0.39, 1.00] \wedge (x.water > 21)[0.80, 1.00]$
one has:
$\sigma_\varphi(I) = I'$ that contains only $o_2$.
That is because only $o_2$ satisfies the fuzzy selection condition as shown below:
$prob_{S,I,o_2}(x.sun \to mild) = [0.39, 0.59] \subseteq [0.39, 1.00]$
and
$prob_{S,I,o_2}(x.water > 21) = [0.82, 0.82] \subseteq [0.80, 1.00]$
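The selection of Example 16 can be sketched as a filter over the intervals of Table 7. The data layout and function names below are assumptions made for illustration; in particular, an undefined interpretation (object o1 has no sun attribute) is treated here as not satisfying the condition.

```python
# Intervals copied from Table 7; None marks an undefined interpretation.
TABLE_7 = {
    "o1": {"sun": None, "water": (1.00, 1.00)},
    "o2": {"sun": (0.39, 0.59), "water": (0.82, 0.82)},
    "o3": {"sun": (0.90, 0.90), "water": (0.00, 0.00)},
    "o4": {"sun": (0.90, 0.90), "water": (0.67, 0.67)},
    "o5": {"sun": (0.39, 0.59), "water": (0.67, 0.67)},
    "o6": {"sun": (0.90, 0.90), "water": (0.00, 0.00)},
    "o7": {"sun": (0.90, 0.90), "water": (0.00, 0.00)},
}

def within(interval, bounds):
    """True iff the interval is defined and contained in the closed bounds."""
    return (interval is not None
            and bounds[0] <= interval[0] and interval[1] <= bounds[1])

def select(table, sun_bounds, water_bounds):
    """Object identifiers satisfying both atomic conditions (a conjunction)."""
    return [oid for oid, p in table.items()
            if within(p["sun"], sun_bounds) and within(p["water"], water_bounds)]

selected = select(TABLE_7, (0.39, 1.00), (0.80, 1.00))
```

Every object other than o2 fails the water condition or has no sun value, so only o2 remains.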
Other FPOB Algebraic Operations
As for relational databases, other basic operations on object base instances are
projection, renaming, Cartesian product, join, intersection, union, and difference.
Those operations for POBs could be straightforwardly applied to FPOBs. For
this chapter to be self-contained, their definitions and examples given by Eiter et
al. (2001) are adapted and presented below.
Projection and Renaming
A projection of an FPOB instance on a set of attributes is a new instance in which
only the attributes in that set are considered for the type of each class and the
value of each object.
Definition 17. Let $I = (\pi, \nu)$ be an FPOB instance over an FPOB schema
$S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ and A be a set of attributes. The projection of I on A,
denoted by $\Pi_A(I)$, is $I' = (\pi', \nu')$ over the FPOB schema $\Pi_A(S)$ where:
1. $\Pi_A(S) = (\mathcal{C}, \tau', \Rightarrow, me, p)$ such that, for all $c \in \mathcal{C}$, $\tau'(c)$ is obtained from
$\tau(c) = [B_1: \tau_1, \ldots, B_k: \tau_k]$ by deleting all $B_j: \tau_j$ with $B_j \notin A$.
2. $\pi'(c) = \pi(c)$ for all $c \in \mathcal{C}$.
3. $\nu'(o) = \Pi_A(\nu(o))$ obtained from $\nu(o) = [B_1: \langle V_1, \alpha_1, \beta_1 \rangle, \ldots, B_k: \langle V_k, \alpha_k, \beta_k \rangle]$
by deleting all $B_j: \langle V_j, \alpha_j, \beta_j \rangle$ with $B_j \notin A$, for all $o \in \pi(\mathcal{C})$.
Example 17: Let $I = (\pi, \nu)$ be the FPOB instance in Example 8, and A = {name,
water}. Then the projection of I on A is the FPOB instance $I' = (\pi', \nu')$ on $\Pi_A(S)$,
where $\pi' = \pi$, and $\nu'$ is given in Table 9.
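On the value mapping, projection simply drops the attributes that are not in A. A minimal sketch follows, assuming that a value mapping is a dictionary from object identifiers to attribute dictionaries and that each fuzzy-probabilistic triple is kept as an opaque string; the name project is hypothetical.

```python
def project(nu, attributes):
    """Keep only the listed top-level attributes in every object's tuple value."""
    return {oid: {a: triple for a, triple in value.items() if a in attributes}
            for oid, value in nu.items()}

# Object o3 of the Plant example, restricted to three attributes for brevity.
nu = {"o3": {"name": "<{Mint}, u, u>",
             "soil": "<{loamy}, u, u>",
             "water": "<{20}, u, u>"}}
projected = project(nu, {"name", "water"})
```

The object mapping is left unchanged by the operation, so only the value mapping needs to be rewritten.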
The renaming operation replaces some of the top-level attributes in an FPOB
instance by new ones.
Definition 18. Let $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ be an FPOB schema and A be the set
of all top-level attributes of S. A renaming expression has the form $\vec{B} \leftarrow \vec{C}$,
where $\vec{B} = B_1, B_2, \ldots, B_m$ is a list of distinct attributes from A, and
$\vec{C} = C_1, C_2, \ldots, C_m$ is a list of distinct attributes from $\mathcal{A} - A$.
Definition 19. Let $I = (\pi, \nu)$ be an FPOB instance over an FPOB schema
$S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ and N be a renaming expression. The renaming in I with
respect to N, denoted by $\delta_N(I)$, is $I' = (\pi', \nu')$ over the FPOB schema $\delta_N(S)$
where:
1. $\delta_N(S) = (\mathcal{C}, \tau', \Rightarrow, me, p)$ such that, for all $c \in \mathcal{C}$, $\tau'(c)$ is obtained from $\tau(c) = [A_1: \tau_1, \ldots, A_k: \tau_k]$ by replacing each attribute $A_j = B_i$ for some $i \in \{1, 2, \ldots, m\}$
by the new attribute $C_i$.
2. $\pi'(c) = \pi(c)$ for all $c \in \mathcal{C}$.
3. $\nu'(o) = \delta_N(\nu(o))$ obtained from $\nu(o) = [A_1: \langle V_1, \alpha_1, \beta_1 \rangle, \ldots, A_k: \langle V_k, \alpha_k, \beta_k \rangle]$
by replacing each attribute $A_j = B_i$ for some $i \in \{1, 2, \ldots, m\}$ by the new
attribute $C_i$, for all $o \in \pi(\mathcal{C})$.
Table 9. $\nu'$ resulting from projection
oid   $\nu'$(oid)
o1    [name: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩, water: ⟨{25, …, 30}, u, u⟩]
o2    [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, water: ⟨{20, …, 30}, u, u⟩]
o3    [name: ⟨{Mint}, u, u⟩, water: ⟨{20}, u, u⟩]
o4    [name: ⟨{Aster, Salvia}, u, u⟩, water: ⟨{20, …, 25}, u, u⟩]
o5    [name: ⟨{Thyme}, u, u⟩, water: ⟨{20, …, 25}, u, u⟩]
o6    [name: ⟨{Mint}, u, u⟩, water: ⟨{20}, u, u⟩]
o7    [name: ⟨{Sage}, u, u⟩, water: ⟨{20, 21}, u, u⟩]
Example 18: Let I be the FPOB instance computed in Example 17. Then the
renaming in I with respect to the renaming expression name, water ← name2,
water2 is the FPOB instance $I' = (\pi', \nu')$, where $\pi' = \pi$, and $\nu'$ is given in
Table 10.
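Renaming can be sketched in the same style as a key substitution on every tuple value. The dictionary representation and the name rename are assumptions for illustration, and the triples are again kept as opaque strings.

```python
def rename(nu, old_names, new_names):
    """Replace each top-level attribute in old_names by its counterpart in new_names."""
    mapping = dict(zip(old_names, new_names))
    return {oid: {mapping.get(a, a): triple for a, triple in value.items()}
            for oid, value in nu.items()}

# Object o7 after the projection of Example 17.
nu = {"o7": {"name": "<{Sage}, u, u>", "water": "<{20, 21}, u, u>"}}
renamed = rename(nu, ["name", "water"], ["name2", "water2"])
```

Attributes that are not listed in the renaming expression keep their names.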
Cartesian Product
We recall that, in relational databases, the Cartesian product of two relations is
a new relation consisting of all tuples that are obtained by concatenating a tuple
in the first relation with a tuple in the second relation. Similarly, the Cartesian
product of two FPOBs should be a new one such that the property list of each
object is obtained by concatenating the property list of an object in the first FPOB
instance with the property list of an object in the second FPOB instance.
Meanwhile, in relational algebra, the Cartesian product of two relational schemas
is defined only if their sets of attributes are disjoint. Thus, in FPOB algebra, we
define the Cartesian product only for two FPOB schemas that do not have any
common top-level attribute.
Also, the Cartesian product operation on both schemas and relations is commutative.
For FPOB algebra, given two FPOB schemas $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$
and $S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$, that should mean $S_1 \times S_2 = S_2 \times S_1$, which implies
$\mathcal{C}_2 \times \mathcal{C}_1 = \mathcal{C}_1 \times \mathcal{C}_2$. The latter is achieved by using the following assumption.
Table 10. $\nu'$ resulting from renaming
oid   $\nu'$(oid)
o1    [name2: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩, water2: ⟨{25, …, 30}, u, u⟩]
o2    [name2: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, water2: ⟨{20, …, 30}, u, u⟩]
o3    [name2: ⟨{Mint}, u, u⟩, water2: ⟨{20}, u, u⟩]
o4    [name2: ⟨{Aster, Salvia}, u, u⟩, water2: ⟨{20, …, 25}, u, u⟩]
o5    [name2: ⟨{Thyme}, u, u⟩, water2: ⟨{20, …, 25}, u, u⟩]
o6    [name2: ⟨{Mint}, u, u⟩, water2: ⟨{20}, u, u⟩]
o7    [name2: ⟨{Sage}, u, u⟩, water2: ⟨{20, 21}, u, u⟩]
Assumption 1. It is assumed that for each FPOB schema $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$,
the set of classes $\mathcal{C}$ is a classical relation over a classical relation schema
$R(S) = \{A_1, A_2, \ldots, A_m\}$ associated with S. That is, each class $c \in \mathcal{C}$ is considered as
a tuple over R(S).
Definition 20. The FPOB schemas $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$ and $S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$
are Cartesian product-compatible if and only if $R(S_1)$ and $R(S_2)$ are
disjoint.
Definition 21. Let $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$ and $S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$ be two
Cartesian product-compatible FPOB schemas, and $R_1 = R(S_1)$ and $R_2 = R(S_2)$.
The Cartesian product of $S_1$ and $S_2$, denoted by $S_1 \times S_2$, is the FPOB schema
$S = (\mathcal{C}, \tau, \Rightarrow, me, p)$ such that:
1. $\mathcal{C} = \mathcal{C}_1 \times \mathcal{C}_2$.
2. For all classes $c \in \mathcal{C}$, $\tau(c[R_1], c[R_2]) = [A_1: \tau_1, \ldots, A_k: \tau_k, A_{k+1}: \tau_{k+1}, \ldots, A_{k+m}: \tau_{k+m}]$,
where $\tau_1(c[R_1]) = [A_1: \tau_1, \ldots, A_k: \tau_k]$ and $\tau_2(c[R_2]) = [A_{k+1}: \tau_{k+1}, \ldots, A_{k+m}: \tau_{k+m}]$.
3. The directed acyclic graph $(\mathcal{C}, \Rightarrow)$ is defined as follows. For all $c, d \in \mathcal{C}$:
$c \Rightarrow d$ iff $(c[R_1] \Rightarrow_1 d[R_1] \wedge c[R_2] = d[R_2]) \vee (c[R_2] \Rightarrow_2 d[R_2] \wedge c[R_1] = d[R_1])$.
4. The partitioning me is defined as follows. For all $c \in \mathcal{C}$:
$me(c) = \{P_1 \times \{c[R_2]\} \mid P_1 \in me_1(c[R_1])\} \cup \{\{c[R_1]\} \times P_2 \mid P_2 \in me_2(c[R_2])\}$.
5. The probability assignment p is defined as follows. For all $c \Rightarrow d$:
$$p(c \mid d) = \begin{cases} p_1(c[R_1] \mid d[R_1]) & \text{if } c[R_2] = d[R_2] \\ p_2(c[R_2] \mid d[R_2]) & \text{if } c[R_1] = d[R_1]. \end{cases}$$
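Items 3 and 5 of Definition 21 can be sketched as follows: an edge of the product hierarchy moves along exactly one component and inherits the probability of that component's edge. The encoding is an assumption for illustration (classes as strings, edges as dictionaries from (subclass, superclass) pairs to probabilities), the function name is hypothetical, and the two tiny hierarchies with their probabilities are illustrative rather than the full Plant schema.

```python
def product_edges(classes1, edges1, classes2, edges2):
    """Edges (c, d) of the product schema mapped to the probability p(c | d)."""
    edges = {}
    # Move along the first component, keep the second component fixed.
    for (c1, d1), prob in edges1.items():
        for c2 in classes2:
            edges[((c1, c2), (d1, c2))] = prob
    # Move along the second component, keep the first component fixed.
    for (c2, d2), prob in edges2.items():
        for c1 in classes1:
            edges[((c1, c2), (c1, d2))] = prob
    return edges

# Illustrative fragments: "pl" = plants, "an" = annuals, "he" = herbs.
classes1, edges1 = ["pl", "an"], {("an", "pl"): 0.4}
classes2, edges2 = ["pl", "he"], {("he", "pl"): 0.3}
edges = product_edges(classes1, edges1, classes2, edges2)
```

Because the two relation schemas are disjoint, no product edge changes both components at once, so the two loops never assign the same key.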
Example 19: Let $S_1$ and $S_2$ be the FPOB schemas of the FPOB instances
computed in Examples 17 and 18, respectively. Then the Cartesian product $S_1 \times S_2 = (\mathcal{C}, \tau, \Rightarrow, me, p)$ is given as follows:
$\tau(c)$ = [name: string, water: integer, name2: string, water2: integer] for every $c \in \mathcal{C}$.
A partial view on $\mathcal{C}$, me, and p is illustrated in Figure 4.
Definition 22. Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two FPOB instances over
the Cartesian product-compatible FPOB schemas $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$ and
$S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$, respectively, and let $R_1 = R(S_1)$ and $R_2 = R(S_2)$. The
Cartesian product of $I_1$ and $I_2$, denoted by $I_1 \times I_2$, is defined as the FPOB
instance $(\pi, \nu)$ over the FPOB schema $S = S_1 \times S_2$, where:
1. $\pi(c) = \pi_1(c[R_1]) \times \pi_2(c[R_2])$, for all $c \in \mathcal{C}_1 \times \mathcal{C}_2$.
2. $\nu(o) = \nu_1(o[R_1]) \times \nu_2(o[R_2])$, for all $o \in \pi(\mathcal{C}_1 \times \mathcal{C}_2)$, where $\nu_1(o[R_1])$ and
$\nu_2(o[R_2])$ are fuzzy-probabilistic tuple values over disjoint sets of attributes
$A_1$ and $A_2$, respectively, and $\nu(o)$ is the fuzzy-probabilistic tuple value over
$A_1 \cup A_2$ such that $\nu(o).A = \nu_1(o[R_1]).A$ if $A \in A_1$ or $\nu(o).A = \nu_2(o[R_2]).A$
if $A \in A_2$.
Example 20: Let $I_1$ and $I_2$ be the FPOB instances computed in Examples 17 and
18, respectively. Then the Cartesian product $I_1 \times I_2 = (\pi, \nu)$, where $\pi$ and $\nu$ are given
in Tables 11 and 12.
Figure 4. Some classes in the Cartesian product of the plant example
[Class hierarchy diagram: nodes are class pairs such as (pl, pl), (pl, an), (pl, pe), (pl, ve), (pl, he), (pl, fl), (an, pl), (pe, pl), (ve, pl), (he, pl), (fl, pl), (ah, pl), (pf, pl), (an, he), (pe, he), (ve, he), (he, he), and (fl, he); edges are labeled with probabilities and mutually disjoint subclasses are marked with d.]
Table 11. $\pi$ resulting from Cartesian product (partial view)
c          $\pi$(c)
(pl, pl)   (o1, o1)
(an, pl)
(ah, pl)   (o2, o1), (o3, o1), (o5, o1), (o6, o1), (o7, o1)
(pf, pl)   (o4, o1)
Join
In relational databases, the join operation is a generalization of the Cartesian
product operation. That is, in the join of two relations, the value of an attribute
of a tuple in the first relation and the value of the same attribute, if any, in the
second relation are combined. For that combination, the types of such a common
attribute name in both relations must be identical as defined below for FPOBs.
Definition 23. The FPOB schemas $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$ and $S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$
are join-compatible iff $R(S_1)$ and $R(S_2)$ are disjoint and, for all classes
$c_1 \in \mathcal{C}_1$ and $c_2 \in \mathcal{C}_2$, if an attribute A is defined for both $\tau_1(c_1)$ and $\tau_2(c_2)$ then
$\tau_1(c_1).A = \tau_2(c_2).A$.
Definition 24. Let $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$ and $S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$ be two
join-compatible FPOB schemas, and $R_1 = R(S_1)$ and $R_2 = R(S_2)$. The join of $S_1$ and
$S_2$, denoted by $S_1 \bowtie S_2$, is the FPOB schema $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$, where $\mathcal{C}$, $\Rightarrow$,
me, and p are as in the definition of $S_1 \times S_2$, and $\tau$ is defined such that, for all $c \in \mathcal{C}$,
the tuple type $\tau(c) = [A_1: \tau_1, \ldots, A_m: \tau_m]$ contains exactly all $A_i: \tau_i$ that belong
to either $\tau_1(c[R_1])$ or $\tau_2(c[R_2])$.
Table 12. $\nu$ resulting from Cartesian product (partial view)
oid        $\nu$(oid)
(o1, o1)   [name: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩, water: ⟨{25, …, 30}, u, u⟩, name2: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩, water2: ⟨{25, …, 30}, u, u⟩]
(o2, o1)   [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, water: ⟨{20, …, 30}, u, u⟩, …
(o3, o1)   [name: ⟨{Mint}, u, u⟩, water: ⟨{20}, u, u⟩, …
The following definitions are for combination of fuzzy-probabilistic tuple values
of objects in two FPOB instances.
Definition 25. Let $pt_1 = \langle V_1, \alpha_1, \beta_1 \rangle$ and $pt_2 = \langle V_2, \alpha_2, \beta_2 \rangle$ be two fuzzy-probabilistic
triples, and $\otimes$ be a probabilistic conjunction strategy. Then $pt_1 \otimes pt_2$
is the fuzzy-probabilistic triple $\langle V, \alpha, \beta \rangle$ where $V = \{v \in V_1 \cap V_2 \mid [\alpha(v), \beta(v)] = [\alpha_1(v), \beta_1(v)] \otimes [\alpha_2(v), \beta_2(v)] \neq [0, 0]\}$.
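Definition 25 can be sketched for the independence strategy, under which both bounds are multiplied. The representation is an assumption for illustration: a triple is a dictionary from values to (lower, upper) pairs, u denotes the uniform distribution over the value set, and the function names are hypothetical. The data are the water attributes of o2 and o1 from the Plant example.

```python
from fractions import Fraction

def uniform_triple(values):
    """Triple <V, u, u> with the uniform distribution u over V."""
    u = Fraction(1, len(values))
    return {v: (u, u) for v in values}

def conj_triples(pt1, pt2):
    """pt1 ⊗ pt2 under independence: common values, bounds multiplied, [0, 0] dropped."""
    result = {}
    for v in pt1.keys() & pt2.keys():
        lower = pt1[v][0] * pt2[v][0]
        upper = pt1[v][1] * pt2[v][1]
        if (lower, upper) != (0, 0):
            result[v] = (lower, upper)
    return result

water_o2 = uniform_triple(range(20, 31))   # <{20, ..., 30}, u, u>, u = 1/11
water_o1 = uniform_triple(range(25, 31))   # <{25, ..., 30}, u, u>, u = 1/6
combined = conj_triples(water_o2, water_o1)
```

The result keeps the six common values 25 to 30, each with probability 1/66, that is u/11 for the uniform distribution u = 1/6 over the resulting value set.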
Definition 26. Let $ptv_1$ and $ptv_2$ be two fuzzy-probabilistic tuple values over the
sets of attributes $A_1$ and $A_2$, respectively, such that for all $A \in A_1 \cap A_2$, the values
of $ptv_1.A$ and $ptv_2.A$ are of the same type. The join of $ptv_1$ and $ptv_2$ under a
probabilistic conjunction strategy $\otimes$, denoted by $ptv_1 \bowtie_\otimes ptv_2$, is the fuzzy-probabilistic
tuple value ptv over $A_1 \cup A_2$ defined by the following:
$ptv.A = ptv_1.A$ for all attributes $A \in A_1 - A_2$
$ptv.A = ptv_2.A$ for all attributes $A \in A_2 - A_1$
$ptv.A = ptv_1.A \otimes ptv_2.A$ for all attributes $A \in A_1 \cap A_2$
We are now ready to define the join of two FPOB instances as follows.
Definition 27. Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two FPOB instances over
the join-compatible FPOB schemas $S_1 = (\mathcal{C}_1, \tau_1, \Rightarrow_1, me_1, p_1)$ and $S_2 = (\mathcal{C}_2, \tau_2, \Rightarrow_2, me_2, p_2)$,
and $A_1$ and $A_2$ be the sets of top-level attributes of $S_1$ and $S_2$,
respectively, and let $R_1 = R(S_1)$ and $R_2 = R(S_2)$. The join of $I_1$ and $I_2$ under a
probabilistic conjunction strategy $\otimes$, denoted by $I_1 \bowtie_\otimes I_2$, is defined as the
FPOB instance $(\pi, \nu)$ over the FPOB schema $S_1 \bowtie S_2$, where:
1. $\pi(c) = \{(o_1, o_2) \in \pi_1(c[R_1]) \times \pi_2(c[R_2]) \mid$ for all $A \in A_1 \cap A_2$, if
$(\nu_1(o_1) \bowtie_\otimes \nu_2(o_2)).A = \langle V, \alpha, \beta \rangle$, then $V \neq \emptyset\}$, for all $c \in \mathcal{C}_1 \times \mathcal{C}_2$.
2. $\nu(o) = \nu_1(o[R_1]) \bowtie_\otimes \nu_2(o[R_2])$, for all $o \in \pi(\mathcal{C}_1 \times \mathcal{C}_2)$.
Example 21: Let $I_1$ be the FPOB instance in Example 17 and $I_2$ be the renaming
in I with respect to the renaming expression water2 ← water. Then the join of $I_1$
and $I_2$ under the independence probabilistic conjunction strategy is $I_1 \bowtie_{\otimes_{in}} I_2 = (\pi, \nu)$,
where $\pi$ is given in Table 13 and $\nu$ in Table 14.
Intersection, Union, and Difference
As for the intersection of two relations on the same schema, the intersection of
two FPOB instances on the same FPOB schema is a new FPOB instance in
which objects are common to both of the two instances, and the attribute values
of each object are obtained by combining the respective attribute values of that
object in the two instances. First, the intersection of two fuzzy-probabilistic tuple
values is defined as follows.
Definition 28. Let $ptv_1$ and $ptv_2$ be two fuzzy-probabilistic tuple values over the
same set of attributes A. The intersection of $ptv_1$ and $ptv_2$ under a probabilistic
conjunction strategy $\otimes$, denoted by $ptv_1 \cap_\otimes ptv_2$, is the fuzzy-probabilistic tuple
value ptv over A defined by $ptv.A = ptv_1.A \otimes ptv_2.A$ for all attributes $A \in A$.
Definition 29. Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two FPOB instances over
the same FPOB schema $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$. The intersection of $I_1$ and $I_2$ under
a probabilistic conjunction strategy $\otimes$, denoted by $I_1 \cap_\otimes I_2$, is the FPOB instance
$(\pi, \nu)$ over S, where:
1. $\pi(c) = \pi_1(c) \cap \pi_2(c)$ for every $c \in \mathcal{C}$.
2. $\nu(o) = \nu_1(o) \cap_\otimes \nu_2(o)$ for every $o \in \pi(\mathcal{C})$.
Table 13. $\pi$ resulting from join (partial view)
c          $\pi$(c)
(pl, pl)   (o1, o1)
(an, pl)
(ah, pl)   (o2, o1), (o5, o1)
(pf, pl)   (o4, o1)
Table 14. $\nu$ resulting from join (partial view)
oid        $\nu$(oid)
(o1, o1)   [name: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩, water: ⟨{25, …, 30}, u/6, u/6⟩, name2: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩]
(o2, o1)   [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, water: ⟨{25, …, 30}, u/11, u/11⟩, name2: ⟨{Lady-Fern, Ostrich-Fern}, u, u⟩]
(o5, o1)   [name: ⟨{Thyme}, u, u⟩, water: ⟨{25}, u/36, u/36⟩, …
(o4, o1)   [name: ⟨{Aster, Salvia}, u, u⟩, water: ⟨{25}, u/36, u/36⟩, …
Example 22: Let S be the FPOB schema in Example 6, and $I_1 = (\pi_1, \nu_1)$ and
$I_2 = (\pi_2, \nu_2)$ be the FPOB instances as defined in Tables 15, 16, and 17. Then the
intersection of $I_1$ and $I_2$ under the independence probabilistic conjunction
strategy is $I_1 \cap_{\otimes_{in}} I_2 = (\pi, \nu)$, where $\pi$ is given in Table 15 and $\nu$ in Table 18.
Table 15. Object mappings $\pi_1$, $\pi_2$, and $\pi$
c                     $\pi_1$(c)    $\pi_2$(c)    $\pi$(c)
PLANTS                {o1}          {o1}          {o1}
ANNUALS               {}            {}            {}
PERENNIALS            {}            {}            {}
VEGETABLES            {}            {}            {}
HERBS                 {}            {}            {}
FLOWERS               {}            {}            {}
ANNUALS_HERBS         {o2, o3}      {o5}          {}
PERENNIALS_FLOWERS    {o4}          {o4}          {o4}
Table 16. Value mapping $\nu_1$ of FPOB instance $I_1$
oid   $\nu_1$(oid)
o1    [… ⟨{25, …, 30}, u, u⟩]
o2    [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, soil: ⟨{loamy, sandy}, 0.7u, 1.3u⟩, water: ⟨{20, …, 30}, u, u⟩, sun: ⟨{mild, medium}, 0.8u, 1.2u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o3    [… wooly}, 0.6u, 1.8u⟩]
o4    …
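The union and difference operations are then defined similarly, on the basis of
the union and difference operations on fuzzy-probabilistic triples and tuple
values.
Definition 30. Let $pt_1 = \langle V_1, \alpha_1, \beta_1 \rangle$ and $pt_2 = \langle V_2, \alpha_2, \beta_2 \rangle$ be two fuzzy-probabilistic
triples, and $\oplus$ be a probabilistic disjunction strategy. Then $pt_1 \oplus pt_2$
is the fuzzy-probabilistic triple $\langle V, \alpha, \beta \rangle$ defined as follows:
$V = V_1 \cup V_2$.
$$[\alpha(v), \beta(v)] = \begin{cases} [\alpha_1(v), \beta_1(v)] & \text{if } v \in V_1 - V_2 \\ [\alpha_2(v), \beta_2(v)] & \text{if } v \in V_2 - V_1 \\ [\alpha_1(v), \beta_1(v)] \oplus [\alpha_2(v), \beta_2(v)] & \text{if } v \in V_1 \cap V_2. \end{cases}$$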
Table 17. Value mapping $\nu_2$ of FPOB instance $I_2$
oid   $\nu_2$(oid)
o1    [… ⟨{25, …, 30}, u, u⟩]
o4    …
o5    [name: ⟨{Thyme}, u, u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{20, …, 25}, u, u⟩, sun: ⟨{mild, medium}, 0.8u, 1.2u⟩, expyears: ⟨{2, 3}, 0.8u, 1.2u⟩, category: …
Table 18. $\nu$ resulting from intersection
oid   $\nu$(oid)
o1    [name: ⟨{Lady-Fern, Ostrich-Fern}, 0.5u, 0.5u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{25, …, 30}, u/6, u/6⟩]
o4    [name: ⟨{Aster, Salvia}, 0.5u, 0.5u⟩, soil: ⟨{loamy, sandy}, 0.18u, 0.98u⟩, water: ⟨{20, …, 25}, u/6, u/6⟩, sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.12u, 1.08u⟩, category: ⟨{french, silver, wooly}, 0.12u, 1.08u⟩]
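Definition 31. Let $ptv_1$ and $ptv_2$ be two fuzzy-probabilistic tuple values over the
same set of attributes A. The union of $ptv_1$ and $ptv_2$ under a probabilistic
disjunction strategy $\oplus$, denoted by $ptv_1 \cup_\oplus ptv_2$, is the fuzzy-probabilistic tuple
value ptv over A defined by $ptv.A = ptv_1.A \oplus ptv_2.A$ for all attributes $A \in A$.
Definition 32. Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two FPOB instances over
the same FPOB schema $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$. The union of $I_1$ and $I_2$ under a
probabilistic disjunction strategy $\oplus$, denoted by $I_1 \cup_\oplus I_2$, is the FPOB instance
$(\pi, \nu)$ over S, where:
1. $\pi(c) = \pi_1(c) \cup \pi_2(c)$ for every $c \in \mathcal{C}$.
2. $$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(\mathcal{C}) - \pi_2(\mathcal{C}) \\ \nu_2(o) & \text{if } o \in \pi_2(\mathcal{C}) - \pi_1(\mathcal{C}) \\ \nu_1(o) \cup_\oplus \nu_2(o) & \text{if } o \in \pi_1(\mathcal{C}) \cap \pi_2(\mathcal{C}) \end{cases}$$
for every $o \in \pi(\mathcal{C})$.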
Example 23: Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be the FPOB instances on S in
Example 22. Then the union of $I_1$ and $I_2$ under the ignorance probabilistic
disjunction strategy is $I_1 \cup_{\oplus_{ig}} I_2 = (\pi, \nu)$,
where $\pi$ is given in Table 19 and $\nu$ in Table 20.
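The union of two triples can be sketched as follows. The sketch assumes that the ignorance disjunction strategy combines [l1, u1] and [l2, u2] into [max(l1, l2), min(1, u1 + u2)], which is consistent with the entries of Table 20; triples are again modeled as dictionaries from values to (lower, upper) pairs, and the function names are hypothetical.

```python
from fractions import Fraction

def disj_ignorance(a, b):
    """Assumed ignorance disjunction: [max(l1, l2), min(1, u1 + u2)]."""
    return (max(a[0], b[0]), min(Fraction(1), a[1] + b[1]))

def union_triples(pt1, pt2):
    """pt1 ⊕ pt2: values in only one triple keep their interval, common values are combined."""
    result = {}
    for v in pt1.keys() | pt2.keys():
        if v in pt1 and v in pt2:
            result[v] = disj_ignorance(pt1[v], pt2[v])
        else:
            result[v] = pt1[v] if v in pt1 else pt2[v]
    return result

u = Fraction(1, 2)  # uniform distribution over two names
name_1 = {"Lady-Fern": (u, u), "Ostrich-Fern": (u, u)}
name_2 = {"Lady-Fern": (u, u), "Ostrich-Fern": (u, u)}
name_union = union_triples(name_1, name_2)
```

For the name attribute of o1 this gives the interval [u, 2u] for each value, as listed in Table 20.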
Table 19. $\pi$ resulting from union
c                     $\pi$(c)
PLANTS                {o1}
ANNUALS               {}
PERENNIALS            {}
VEGETABLES            {}
HERBS                 {}
FLOWERS               {}
ANNUALS_HERBS         {o2, o3, o5}
PERENNIALS_FLOWERS    {o4}
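Definition 33. Let $pt_1 = \langle V_1, \alpha_1, \beta_1 \rangle$ and $pt_2 = \langle V_2, \alpha_2, \beta_2 \rangle$ be two fuzzy-probabilistic
triples, and $\ominus$ be a probabilistic difference strategy. Then $pt_1 \ominus pt_2$
is the fuzzy-probabilistic triple $\langle V, \alpha, \beta \rangle$ defined as follows:
$V = V_1 - \{v \in V_1 \cap V_2 \mid [\alpha_1(v), \beta_1(v)] \ominus [\alpha_2(v), \beta_2(v)] = [0, 0]\}$.
$$[\alpha(v), \beta(v)] = \begin{cases} [\alpha_1(v), \beta_1(v)] & \text{if } v \in V - V_2 \\ [\alpha_1(v), \beta_1(v)] \ominus [\alpha_2(v), \beta_2(v)] & \text{if } v \in V \cap V_2. \end{cases}$$
Definition 34. Let $ptv_1$ and $ptv_2$ be two fuzzy-probabilistic tuple values over the
same set of attributes A. The difference of $ptv_1$ and $ptv_2$ under a probabilistic
difference strategy $\ominus$, denoted by $ptv_1 -_\ominus ptv_2$, is the fuzzy-probabilistic tuple value
ptv over A defined by $ptv.A = ptv_1.A \ominus ptv_2.A$ for all attributes $A \in A$.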
Table 20. $\nu$ resulting from union
oid   $\nu$(oid)
o1    [name: ⟨{Lady-Fern, Ostrich-Fern}, u, 2u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{25, …, 30}, u, 2u⟩]
o2    [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, soil: ⟨{loamy, sandy}, 0.7u, 1.3u⟩, water: ⟨{20, …, 30}, u, u⟩, sun: ⟨{mild, medium}, 0.8u, 1.2u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o3    [name: ⟨{Mint}, u, u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{20}, u, u⟩, sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o4    [name: ⟨{Aster, Salvia}, u, 2u⟩, soil: ⟨{loamy, sandy}, 0.6u, 2u⟩, water: ⟨{20, …, 25}, u, 2u⟩, sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 3u⟩, category: ⟨{french, silver, wooly}, 0.6u, 3u⟩]
o5    [name: ⟨{Thyme}, u, u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{20, …, 25}, u, u⟩, sun: ⟨{mild, medium}, 0.8u, 1.2u⟩, expyears: ⟨{2, 3}, 0.8u, 1.2u⟩, category: …
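Definition 35. Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two FPOB instances over
the same FPOB schema $S = (\mathcal{C}, \tau, \Rightarrow, me, p)$, and A be the set of top-level
attributes of S. The difference of $I_1$ and $I_2$ under a probabilistic difference
strategy $\ominus$, denoted by $I_1 -_\ominus I_2$, is the FPOB instance $(\pi, \nu)$ over S, where:
1. $\pi(c) = \pi_1(c) - \{o \in \pi_1(\mathcal{C}) \cap \pi_2(\mathcal{C}) \mid (\nu_1(o) -_\ominus \nu_2(o)).A = \langle \emptyset, \_, \_ \rangle$ for some $A \in A\}$
for every $c \in \mathcal{C}$.
2. $$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(\mathcal{C}) - \pi_2(\mathcal{C}) \\ \nu_1(o) -_\ominus \nu_2(o) & \text{if } o \in \pi_1(\mathcal{C}) \cap \pi_2(\mathcal{C}) \end{cases}$$
for every $o \in \pi(\mathcal{C})$.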
Example 24: Let $I_1 = (\pi_1, \nu_1)$ and
$I_2 = (\pi_2, \nu_2)$ be the FPOB instances on S in Example 22. Consider $I_2' = (\pi_2, \nu_2')$
that is different from $I_2$ only in $\nu_2'(o_1).soil = \langle\{loamy, sandy\}, u, u\rangle$. Then the
difference of $I_1$ and $I_2'$ under the independence probabilistic difference strategy
is $I_1 -_{\ominus_{in}} I_2' = (\pi, \nu)$, where $\pi$ is given in Table 21 and $\nu$ in Table 22.
We note that $o_4 \notin \pi(\text{PERENNIALS\_FLOWERS})$ because $\nu(o_4).sun = \langle \emptyset, \_, \_ \rangle$.
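The difference of two triples can be sketched in the same way. The sketch assumes that the independence difference strategy maps [l1, u1] and [l2, u2] to [l1(1 - u2), u1(1 - l2)], an assumption that reproduces the entries of Table 22; the dictionary representation of triples and the function names are hypothetical.

```python
from fractions import Fraction

def diff_independence(a, b):
    """Assumed independence difference: [l1 * (1 - u2), u1 * (1 - l2)]."""
    return (a[0] * (1 - b[1]), a[1] * (1 - b[0]))

def difference_triples(pt1, pt2):
    """pt1 ⊖ pt2: values only in pt1 are kept; common values are combined, [0, 0] dropped."""
    result = {}
    for v, interval in pt1.items():
        if v not in pt2:
            result[v] = interval
            continue
        combined = diff_independence(interval, pt2[v])
        if combined != (0, 0):
            result[v] = combined
    return result

one = Fraction(1)
sixth = Fraction(1, 6)
sun_o4 = {"mild": (one, one)}                            # <{mild}, u, u> in both instances
water_o1 = {w: (sixth, sixth) for w in range(25, 31)}    # <{25, ..., 30}, u, u>

sun_diff = difference_triples(sun_o4, sun_o4)
water_diff = difference_triples(water_o1, water_o1)
```

The sun attribute of o4 ends up with an empty value set, which is why o4 is removed from PERENNIALS_FLOWERS, while each water value of o1 receives 5/36, that is 5u/6 for u = 1/6.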
Table 21. $\pi$ resulting from difference
c                     $\pi$(c)
PLANTS                {o1}
ANNUALS               {}
PERENNIALS            {}
VEGETABLES            {}
HERBS                 {}
FLOWERS               {}
ANNUALS_HERBS         {o2, o3}
PERENNIALS_FLOWERS    {}
Conclusion
We presented an extension of the POB model with vague and imprecise values.
In order to integrate fuzzy set values into the probabilistic framework of POBs,
we employed a probability-based voting model of fuzzy sets and introduced a
probabilistic interpretation of relations on them. The definitions of FPOB
schemas, instances, and algebraic operations were then presented, generalizing
those of POBs. The obtained algebra provides a formal basis for development
of fuzzy and probabilistic object bases, as relational algebra does for relational
databases. A prototype of this model was demonstrated, and we are investigating
its full-scale implementation to be applied to build object bases for real-world
problems.
Table 22. $\nu$ resulting from difference
oid   $\nu$(oid)
o1    [name: ⟨{Lady-Fern, Ostrich-Fern}, 0.5u, 0.5u⟩, soil: ⟨{loamy}, 0.5u, 0.5u⟩, water: ⟨{25, …, 30}, 5u/6, 5u/6⟩]
o2    [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, soil: ⟨{loamy, sandy}, 0.7u, 1.3u⟩, water: ⟨{20, …, 30}, u, u⟩, sun: ⟨{mild, medium}, 0.8u, 1.2u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
o3    [name: ⟨{Mint}, u, u⟩, soil: ⟨{loamy}, u, u⟩, water: ⟨{20}, u, u⟩, sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 1.8u⟩, category: ⟨{french, silver, wooly}, 0.6u, 1.8u⟩]
References
Baldwin, J. M., Lawry, J., & Martin, T. P. (1996). A note on probability/possibility consistency for fuzzy events. In Proceedings of the Sixth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 521–525). Granada, Spain.
Baldwin, J. F., Martin, T. P., & Pilsworth, B. W. (1995). Fril - Fuzzy and
evidential reasoning in artificial intelligence. Taunton: Research Studies
Press/John Wiley.
Bertino, E., & Martino, L. (1993). Object-oriented database systems: Concepts
and architectures. Reading, MA: Addison-Wesley.
oriented database model: Imprecision, uncertainty and fuzzy types. In
Proceedings of the First International Joint Conference of the International
Fuzzy Systems Association and the North American Fuzzy
Information Processing Society (pp. 2323-2328). Vancouver, Canada.
model managing vague and uncertain information. International Journal
of Intelligent Systems, 14, 623-651.
Cao, T. H. (2001). Uncertain inheritance and recognition as probabilistic default
reasoning. International Journal of Intelligent Systems, 16, 781-803.
Cao, T. H., & Nguyen, H. (2002). Towards fuzzy and probabilistic object bases.
In Proceedings of the Third International Conference on Intelligent
Technologies and the Third Vietnam-Japan Symposium on Fuzzy
Systems and Application (pp. 35-41). Hanoi, Vietnam.
Cao, T. H., & Rossiter, J. M. (2003). A deductive probabilistic and fuzzy
object-oriented database language. Fuzzy Sets and Systems, 140, 129-150.
Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2002). On the
implementation of Fril++ for object-oriented logic programming with
uncertainty and fuzziness. In Bouchon-Meunier, B. et al. (Eds.), Technologies
for Constructing Intelligent Systems, Studies in Fuzziness and Soft
Computing (vol. 90, pp. 393-406). Heidelberg: Physica-Verlag.
Cross, V. V. (2003). Defining fuzzy relationships in object models: Abstraction
and interpretation. International Journal of Fuzzy Sets and Systems,
140, 5-27.
De Tré, G. (2001). An algebra for querying a constraint defined fuzzy and
uncertain object-oriented database model. In Proceedings of the First
International Joint Conference of the International Fuzzy Systems
Association and the North American Fuzzy Information Processing
Society (pp. 2138-2143). Vancouver, Canada.
Dubitzky, W., Büchner, A. G., Hughes, J. G., & Bell, D. A. (1999). Towards
concept-oriented databases. Data & Knowledge Engineering, 30, 23-55.
Eiter, T., Lu, J. J., Lukasiewicz, T., & Subrahmanian, V. S. (2001). Probabilistic
object bases. ACM Transactions on Database Systems, 26, 264-312.
Gaines, B. R. (1978). Fuzzy and probability uncertainty logics. Journal of
Information and Control, 38, 154-169.
Garcia-Molina, H., Ullman, J. D., & Widom, J. (2000). Database system
implementation. Upper Saddle River, NJ: Prentice Hall.
George, R., Buckles, B. P., & Petry, F. E. (1993). Modelling class hierarchies
in the fuzzy object-oriented data model. Fuzzy Sets and Systems, 60,
259-272.
Itzkovich, I., & Hawkes, L. W. (1994). Fuzzy extension of inheritance
hierarchies. Fuzzy Sets and Systems, 62, 143-153.
Lakshmanan, L. V. S. et al. (1997). ProbView: A flexible probabilistic database
system. ACM Transactions on Database Systems, 22, 419-469.
Meyer, B. (1997). Object-oriented software construction. Upper Saddle
River, NJ: Prentice Hall.
Nguyen, H. (2003). An algebra to handle fuzzy and probabilistic object bases.
Master's thesis, Faculty of Information Technology, Ho Chi Minh City
University of Technology.
classes. In R. De Caluwe (Ed.), Fuzzy and uncertain object-oriented
databases: Concepts and models (pp. 21-61). Singapore: World
Scientific.
Van Gyseghem, N., & De Caluwe, R. (1997). The UFO database model: Dealing
with imperfect information. In R. De Caluwe (Ed.), Fuzzy and uncertain
object-oriented databases: Concepts and models (pp. 123-185).
Singapore: World Scientific.
Yazici, A., & George, R. (1999). Fuzzy database modelling. Studies in
fuzziness and soft computing (vol. 26). Heidelberg: Physica-Verlag.
Zadeh, L. A. (1978). PRUF - A meaning representation language for natural
languages. International Journal of Man-Machine Studies, 10,
395-460.
Chapter III
Generalization Data Mining in Fuzzy Object-Oriented Databases
Rafal Angryk
Tulane University, USA
Roy Ladner
Naval Research Laboratory, USA
Frederick E. Petry
Tulane University & Naval Research Laboratory, USA
Abstract
In this chapter, we consider the application of generalization-based data
mining to fuzzy similarity-based object-oriented databases (OODBs).
Attribute generalization algorithms have been most commonly applied to
relational databases, and we extend these approaches. A key aspect of
generalization data mining is the use of a concept hierarchy. The objects
of the database are generalized by replacing specific attribute values by
the next higher-level term in the hierarchy. This will then eventually result
in generalizations that represent a summarization of the information in the
database. We focus on the generalization of similarity-based simple fuzzy
attributes for an OODB using approaches to the fuzzy concept hierarchy
developed from the given similarity relation of the database. Then
consideration is given to applying this approach to complex structure-
valued data in the fuzzy OODB.
Introduction
Data mining and knowledge discovery have grown in importance as the amount
of data from various sources has rapidly increased. Faced with such volumes of
data, data mining techniques attempt to make sense of the data by formulating
information of value for decision making. This can vary from deciding on
commercial sales promotions to environmental planning to national security
decisions. Much of the current work is in the context of conventional relational
databases. In this chapter, we will discuss how to apply one valuable data mining
approach, attribute-oriented generalization, to a similarity-based fuzzy
OODB.
Background
In this section, we survey the general area of data mining, discuss some of the
relevant work in fuzzy data mining, and then describe the specific technique of
attribute-oriented induction for generalization, which is the focus of this chapter.
Additionally, we describe the fuzzy object-oriented model based on similarity
relationships that is the context in which we investigate data generalization.
Data Mining
Data mining or knowledge discovery generally refers to a variety of techniques
that have developed in the fields of databases, machine learning, and pattern
recognition. The intent is to uncover useful patterns and associations from large
databases.
Although we are primarily interested here in specific algorithms for knowledge
discovery, we will first review the overall process of data mining (Feelders,
Daniels, & Holsheimer, 2000). The initial steps of data mining are concerned
with preparation of data, including data cleaning intended to resolve errors and
missing data and integration of data from multiple heterogeneous sources. Next
are the steps needed to prepare for actual data mining. These include selection
of the specific data relevant to the task and transformation of this data into a
format required by the data mining approach. These steps are sometimes
considered to be those in the development of a data warehouse, i.e., an organized
format of data available for various data mining tools. There is a wide variety of
specific knowledge discovery algorithms that were developed (Han & Kamber,
2000). These discover patterns that can then be evaluated based on some
interestingness measure used to prune the huge number of available patterns.
Finally, as true for any decision aid system, an effective user interface with
visualization and alternative representations must be developed for presentation
of the discovered knowledge.
Specific data mining algorithms can be considered as belonging to two catego-
ries: descriptive and predictive data mining. In the descriptive category are class
description, association rules, and classification. Class description can provide
characterization or generalization of data or comparisons between data classes
to provide class discriminations. Data generalization is a process of grouping
data, enabling transformation of similar item sets, stored originally in a database
at the low (primitive) level, into more abstract conceptual representations. This
process is a fundamental element of attribute-oriented induction, a descriptive
database mining technique, allowing compression of the original data set into a
generalized relation, which provides concise and summarative information about
the massive set of task-relevant data.
Association rules correspond to correlations among the data items (Agrawal,
Imielinski, & Swami, 1993). They are often expressed in rule form, showing
attribute-value conditions that commonly occur at the same time in some set of
data. An association rule of the form $X \Rightarrow Y$ can be interpreted as meaning that
the tuples in the database that satisfy the condition X also are likely to satisfy
Y; the word "likely" implies that this is not a functional dependency in the formal
database sense. Finally, a classification approach analyzes the training data (data
with known class membership) and constructs a model for each class based on
the features in the data. Commonly, the outputs generated are decision trees or
sets of classification rules. These can be used for the characterization of the
classes of existing data and to allow the classification of data in the future, and
so can also be considered predictive.
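To make the difference between an association rule and a functional dependency concrete, the following minimal sketch computes the support and confidence of a rule over a small set of tuples. The helper name `rule_stats`, the attribute names, and the toy data are hypothetical and serve only as illustration; they do not come from any of the cited systems.

```python
def rule_stats(rows, x, y):
    """Return (support, confidence) of the rule x => y.

    rows: list of dicts, one per tuple of the relation
    x, y: dicts of attribute-value conditions
    """
    def satisfies(row, cond):
        return all(row.get(attr) == val for attr, val in cond.items())

    n_x = sum(1 for r in rows if satisfies(r, x))
    n_xy = sum(1 for r in rows if satisfies(r, x) and satisfies(r, y))
    support = n_xy / len(rows) if rows else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical toy relation.
rows = [
    {"degree": "graduate", "dept": "engineering"},
    {"degree": "graduate", "dept": "engineering"},
    {"degree": "graduate", "dept": "business"},
    {"degree": "undergraduate", "dept": "business"},
]
support, confidence = rule_stats(
    rows, {"degree": "graduate"}, {"dept": "engineering"}
)
```

A confidence below 1.0, as in this toy relation, means that some tuples satisfy X without satisfying Y, which a functional dependency would not allow.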
Predictive analysis is also a very developed area of data mining. One common
approach is clustering. Clustering analysis identifies the collections of data
objects that are similar to each other. The similarity metric is often a distance
function given by experts or appropriate users. A good clustering method
produces high-quality clusters to yield low intercluster similarity and high
intracluster similarity. Prediction techniques are used to predict possible missing
data values or distributions of values of some attributes in a set of objects. First,
one must find the set of attributes relevant to the attribute of interest and then
predict a distribution of values based on the set of data similar to the selected
objects. A large variety of techniques is used, including regression analysis,
correlation analysis, genetic algorithms, and neural networks, to mention a few.
Finally, a particular case of predictive analysis is time-series analysis. This
technique considers a large set of time-based data to discover regularities and
interesting characteristics. One can search for similar sequences or subse-
quences, then mine sequential patterns, periodicities, trends, and deviations.
Fuzzy Data Mining
An early and continuing significant application of fuzzy sets has been in pattern
recognition, especially fuzzy clustering algorithms (Bezdek, 1974). Hence, much
of the effort in fuzzy data mining has been made by using fuzzy clustering and
fuzzy set approaches in neural networks and genetic algorithms (Hirota &
Pedrycz, 1999). In fuzzy set theory, an important consideration is the treatment
of data from a linguistic viewpoint. From this, an approach was developed that
uses linguistically quantified propositions to summarize the content of a database
by providing a general characterization of the analyzed data (Yager, 1991;
Kacprzyk, 1999; Dubois & Prade, 2000; Feng & Dillon, 2003). Fuzzy gradual
rules for data summarization were also considered (Cubero et al., 1999). A
common organization of data for data mining is the multidimensional data cube
in data warehouse structures. Treating the data cube as a fuzzy object has
provided another approach for knowledge discovery (Laurent et al., 2000).
Fuzzy data mining for generating association rules was considered by a number
of researchers. There are approaches using the set-oriented mining (SETM)
algorithm (Shu et al., 2001) and other techniques (Bosc & Pivert, 2001), but most
have been based on the Apriori algorithm (Delgado et al., 2003). Extensions
included fuzzy set approaches to quantitative data (Zhang, 1999; Kuok et al.,
1998), hierarchies or taxonomies (Chen et al., 2000; Lee, 2001), weighted rules
(Gyenesei, 2001), and interestingness measures (de Graaf et al., 2001; Gyenesei,
2001; Au & Chan, 2003).
Generalization Data Mining
The basis of a generalization data mining approach rests on three aspects (Han
& Kamber, 2000): the set of data relevant to a given data mining task; the
expected form of knowledge to be discovered; and the background knowledge,
which usually supports the whole process of knowledge acquisition. Generaliza-
tion of data is typically performed with utilization of concept hierarchies, which
in ordinary databases are considered to be part of background knowledge, and
are indispensable for the process.
Despite the progress in research on data mining algorithms, the phase of data
generalization remains a crucial activity. The choice of data to be analyzed as
well as of the concepts for its generalization has a fundamental influence on
retrieved results, regardless of applied knowledge acquisition techniques. Al-
though certain dependencies among data can be discovered at the primitive
concept level, much stronger and often far more interesting dependencies can be
determined at a higher concept level. With data generalization executed at the
initial stage of data mining, the process of knowledge extraction can be more
effective and bring concise results directly at the abstraction level desired by a
user. Moreover, many relations occurring at the lower level may not match the
requirement of minimum support assigned by data analysts to eliminate infre-
quent regularities, whereas after summarization via generalization they may
occur often enough to have significant meaning.
The idea of using concept hierarchies for attribute-oriented induction in data
mining was investigated by several research groups (Han et al., 1992; Han, 1995;
Carter & Hamilton, 1998; Hilderman et al., 1999). Generalization of database
objects is performed on an attribute-by-attribute basis, applying a separate
concept hierarchy for each of the generalized attributes included in the relation
of task-relevant data.
The basic steps and guidelines for attribute-oriented generalization in an OODB
are summarized below (Han, Nishio, & Kawano, 1994):
1. An initial query to the fuzzy OODB with a given similarity threshold
provides the starting generalization class $G_0$, which contains the set of data
that is relevant to the user's generalization interest.
2. Generalization should be performed on the smallest decomposable
components (or attributes) of the data objects in each generalization class $G_i$.
3. If there is a large set of distinct values for an attribute but there is no
higher-level concept provided for the attribute, the attribute should be removed in
the generalization process.
4. If a higher-level concept exists in the concept tree for an attribute
value of an object, the substitution of the value by its higher-level concept
generalizes the object. Minimal generalization should be enforced by
ascending the tree one level at a time.
5. Two generalized objects may become similar enough to be merged (see the
next section for merging of objects in a fuzzy OODB). So we include an
added attribute, count, to keep track of how many objects were merged to
form the current generalized object. The value of the count of an object
should be carried to its generalized object, and the counts should be
accumulated when merging identical objects in generalization.
6. The generalization is controlled by providing levels that specify how far the
process should proceed. If the number of distinct values of an attribute in
the given class is larger than the generalization threshold value, further
generalization on this attribute should be performed. If the number of
objects of a generalized class is larger than the generalization threshold
value, the generalization should proceed further.
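The core of these steps (climbing one level at a time, merging identical generalized objects, and accumulating counts) can be sketched as follows. This is a simplified illustration for a single attribute and a crisp concept tree, not the authors' implementation; the function names, the `parent` mapping, and the sample data are assumptions, and attribute removal (step 3) is left out.

```python
from collections import Counter


def generalize_once(objects, attribute, parent):
    """Climb one level of the concept tree for one attribute and merge
    objects that become identical, accumulating their counts.

    objects: list of (values_dict, count) pairs
    parent:  dict mapping a concept to its direct higher-level concept
    """
    merged = Counter()
    for values, count in objects:
        new_values = dict(values)
        # A value without a higher-level concept is left unchanged.
        new_values[attribute] = parent.get(values[attribute], values[attribute])
        merged[tuple(sorted(new_values.items()))] += count
    return [(dict(key), count) for key, count in merged.items()]


def generalize(objects, attribute, parent, threshold):
    """Repeat one-level generalization while the attribute has more distinct
    values than the threshold and the tree can still be climbed."""
    while len({v[attribute] for v, _ in objects}) > threshold:
        next_objects = generalize_once(objects, attribute, parent)
        if next_objects == objects:  # nothing changed: top of the tree
            break
        objects = next_objects
    return objects


# Hypothetical concept tree and initial class G_0 (every count starts at 1).
parent = {
    "freshman": "undergraduate", "sophomore": "undergraduate",
    "junior": "undergraduate", "senior": "undergraduate",
    "master of science": "graduate", "doctorate": "graduate",
    "undergraduate": "any", "graduate": "any",
}
g0 = [({"status": s}, 1)
      for s in ["freshman", "sophomore", "senior",
                "master of science", "doctorate"]]
result = generalize(g0, "status", parent, threshold=2)
```

Because counts are summed on every merge, the total count in `result` equals the number of objects in `g0`, which is the property the count attribute is meant to preserve.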
Attribute generalization should not be mistaken for simple record summarization.
Summaries of data usually have a more simplified character and tend to omit data
that do not occur originally in large quantities in order to simplify the final report.
Gradual generalization through concept hierarchies allows, in contrast, detailed
tracking of all data objects and can lead to the discovery of interesting patterns
among data at the lowest possible abstraction level of their occurrence, decreas-
ing, at the same time, the risk of omitting them due to overgeneralization. The
appropriate attribute-oriented generalization allows extraction of knowledge on
a specific abstraction level but without omitting even rare attribute values. It
might occur that such atypical values, despite being initially (at a low level of the
generalization hierarchy) infrequent, can sum up to impressive cardinalities
when generalized to a sufficiently high abstraction level, which can then
sometimes strongly influence the suspected proportions among the original data.
Depending on the approach and the intention of data analysts, generalization of
collected data can be treated as a final step of data mining (e.g., summary tables
are presented to users, allowing them to interpret overall information) or as an
introduction to further knowledge extraction (e.g., extraction of abstract asso-
ciation rules directly from the generalized data).
Fuzzy Object-Oriented Model
The OODB model and object-oriented programming languages arose out of the
necessity of dealing with the complexity of large software systems. Object-
oriented systems view the universe as consisting of objects and try to model the
interaction between objects. The object-oriented model is characterized by its
properties of abstraction, encapsulation, modularity, hierarchy, typing, concurrency,
and persistence.
The object-oriented model is a natural successor to record-based models with
explicit mechanisms to overcome their disadvantages (Bertino & Martino, 1991).
The object-oriented data model (OODM) models composite objects, thereby
capturing the IS-PART-OF concept, and relationships directly. Data are orga-
nized into classes, and classes are organized into an inheritance hierarchy. This
methodology is useful in capturing similarities among classes and data and
abstracting them to higher levels.
An object is completely specified by its identity, behavior, and state. The state
of an object consists of the values of its attributes. Its behavior is specified by
the set of methods that operate on the state. An object identifier maintains the
identity of an object, thereby distinguishing it from all others. The use of object
identifiers permits three different types of object equality (Khoshafian &
Copeland, 1986):
1. Identity (=): The identity predicate corresponds to the equality of refer-
ences or pointers in conventional languages.
2. Shallow equality (se): Two objects are shallow equal if their states or
contents are identical, i.e., the two objects need not be the same object,
but the contents of corresponding instance variables must be identical objects.
3. Deep equality (de): This ignores object identities and checks whether two
objects are instances of the same class (i.e., same structure or type) and
whether the values of the corresponding base objects are the same.
It is clear that identity is stronger than shallow equality, and shallow equality is
stronger than deep equality. If identity holds, the same can be said of shallow and
deep equality; if shallow equality holds, so does deep equality.
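The three predicates can be illustrated with a small sketch. The `Obj` class and the function names below are hypothetical, and the code follows one common reading of the definitions above: shallow equality compares sub-objects by reference, whereas deep equality compares only the base (atomic) values reached through them.

```python
class Obj:
    """Minimal object: a class name plus instance variables whose values
    are either atomic values or other Obj instances."""

    def __init__(self, cls, **state):
        self.cls = cls
        self.state = state


def identical(a, b):
    # Identity: both references denote the same object.
    return a is b


def shallow_equal(a, b):
    # Same class, and corresponding instance variables hold the same
    # sub-object (by reference) or the same atomic value.
    if a.cls != b.cls or a.state.keys() != b.state.keys():
        return False
    for name in a.state:
        x, y = a.state[name], b.state[name]
        if isinstance(x, Obj) or isinstance(y, Obj):
            if x is not y:
                return False
        elif x != y:
            return False
    return True


def deep_equal(a, b):
    # Same class, and the corresponding base values are the same,
    # regardless of the identities of intermediate objects.
    if a.cls != b.cls or a.state.keys() != b.state.keys():
        return False
    for name in a.state:
        x, y = a.state[name], b.state[name]
        if isinstance(x, Obj) and isinstance(y, Obj):
            if not deep_equal(x, y):
                return False
        elif isinstance(x, Obj) or isinstance(y, Obj) or x != y:
            return False
    return True


# Hypothetical sample objects.
addr1 = Obj("Address", city="New Orleans")
addr2 = Obj("Address", city="New Orleans")
p1 = Obj("Person", name="Ann", address=addr1)
p2 = Obj("Person", name="Ann", address=addr1)  # shares addr1 with p1
p3 = Obj("Person", name="Ann", address=addr2)  # equal but distinct address
```

Here `p1` and `p2` are shallow equal but not identical, while `p1` and `p3` are only deep equal, which matches the ordering of the three notions by strength.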
The most powerful aspect of an OODM is its ability to model inheritance. A class
may inherit all the methods and attributes of its superclass. When a class inherits
from one superclass, this is known as single inheritance. The situation in which
a class inherits from more than one superclass is called multiple inheritance, and
the inheritance structure forms a lattice. The class-subclass relationships form
a class hierarchy similar to a generalization-specialization relationship. Another
hierarchy that may originate at an attribute is the class composition hierarchy
(Kim, 1989). The class composition hierarchy is distinct and orthogonal to the
class hierarchy.
A Fuzzy Class Hierarchy
In this approach (George, Buckles, & Petry, 1993), two levels of imprecision may
be represented: first, the impreciseness of object membership in class values
(fuzzy class extents); and second, the fuzziness of object attribute values. The
class composition schema is enhanced to incorporate the similarities between
object instances, and the effects of the merge operator on class memberships
were considered.
A class is characterized by structure, methods, and extension, so a class is a pair
$C_i = (t_i, \mathrm{ext}(t_i))$, where $t_i$ is a type. Next, $C_i$ is a subclass of $C_i'$
($C_i \le C_i'$) iff:
1. The structure of $C_i'$ is less equally defined (more general) in comparison to
$C_i$.
2. A class possesses every method owned by its superclasses, though the
methods may be refined in the class.
A class hierarchy models class-subclass relationships and may be represented
as:
$$C_i \le C_{i+1} \le \cdots \le C_n$$
where $C_n$ represents the root (basic) class, and $C_i$ is the most refined (leaf) class.
Analysis of class-subclass relations indicates that they can be broadly divided
into two different types:
1. Specialization subclasses (also referred to as partial subclass or
object-oriented subclass), where the subclass is a specialization of its immediate
superclass, e.g., computer science is a specialization of engineering.
2. Subset subclasses, where the subclass is a subset of its immediate superclass,
e.g., the class of employees is a subset subclass of the class of persons.
A fuzzy hierarchy exists whenever it is judged subjectively that a subclass or
instance is not a full member of its immediate class. Consideration of a fuzzy
representation of the class hierarchy should take into account the different
requirements and characteristics of the class-subclass relations. We associate
with a subclass a grade of membership in its immediate class, $C_i \le C_{i+1}$,
represented as $\mu_{C_i}(C_{i+1})$. A subclass is represented now by a pair
$(C_i, \mu(C_{i+1}))$, the second element of which represents the membership of
$C_i$ in its immediate class $C_{i+1}$. The class hierarchy is now:
$$(o_i, \mu(C_i)) \le (C_i, \mu(C_{i+1})) \le (C_{i+1}, \mu(C_{i+2})) \le \cdots \le (C_n, \mu(C_{n+1}))$$
The nature of class-subclass relationships also depends on the type of ISA links
existing between the two. It is possible to have strong and weak ISA relationships
between a class and its subclass. In a weak ISA relationship, the membership of
a class in its superclasses is monotonically nonincreasing, while for the strong
ISA link, the membership is nondecreasing. A fuzzy hierarchy possesses the
following properties:
1. Membership of an instance/subclass in any of the superclasses in its
hierarchy is constant, monotonically nonincreasing, or monotonically
nondecreasing. If the membership is constant, the hierarchy is a subset
hierarchy; if nonincreasing, a weak ISA specialization hierarchy; and if
nondecreasing, a strong ISA specialization hierarchy.
2. For a weak ISA specialization hierarchy and a strong ISA specialization
hierarchy:
$$\mu_{C_i}(C_n) = f\bigl(\mu_{C_i}(C_{i+1}), \mu_{C_{i+1}}(C_{i+2}), \ldots, \mu_{C_{n-1}}(C_n)\bigr).$$
The function $f$, which is application dependent, may be a product, min, max,
etc.
3. For two objects $o$ and $o'$ such that $o, o' \in \mathrm{ext}(C_i)$, if $o$ de $o'$ or $o$ se $o'$, then
$\mu_o(C_i) = \mu_{o'}(C_i)$. In other words, two objects have the same membership in
a class (and all its superclasses) if they are value equal.
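As a small illustration of properties 1 and 2, the sketch below combines the direct memberships along a path with a chosen function f and classifies a sequence of memberships as constant, nonincreasing, or nondecreasing. The function names and the numeric values are hypothetical; the choice of f remains application dependent, as stated above.

```python
from functools import reduce


def chain_membership(direct, f):
    """Combine the direct memberships mu_{C_i}(C_{i+1}), ...,
    mu_{C_{n-1}}(C_n) along a path into mu_{C_i}(C_n) using f."""
    return reduce(f, direct)


def hierarchy_kind(memberships):
    """Classify the memberships of one instance in its successive
    superclasses, listed from the immediate class up to the root."""
    pairs = list(zip(memberships, memberships[1:]))
    if all(a == b for a, b in pairs):
        return "subset"
    if all(a >= b for a, b in pairs):
        return "weak ISA specialization"
    if all(a <= b for a, b in pairs):
        return "strong ISA specialization"
    return "mixed"


# Hypothetical direct memberships along a three-link path.
direct = [0.9, 0.8, 1.0]
by_product = chain_membership(direct, lambda a, b: a * b)
by_min = chain_membership(direct, min)
```

With the product, memberships can only shrink along the path, which suits a weak ISA hierarchy; max would behave in the opposite way.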
We have described a fuzzy hierarchy in which each instance/subclass is a
member of its immediate superclass with a degree of membership, and we
described the membership of an instance in a class as a function of the memberships
of the instance in the immediate classes that lie between the instance and the
class of interest. However, this may not always be possible, because hierarchies are
not always pure; mixed hierarchies are more the rule. In some applications,
it might be necessary to assume that the membership of an object (class) in its
class (superclass) is list directed.
Thus, the expression for the class hierarchy can be generalized to account for
the different types of links that can exist within an object hierarchy:
$$(o_i, \{\mu(C_i), \mu(C_{i+1}), \ldots, \mu(C_n)\}) \le (C_i, \{\mu(C_{i+1}), \mu(C_{i+2}), \ldots, \mu(C_n)\}) \le \cdots \le (C_n, \mu(C_{n+1}))$$
Fuzzy Class Schema
The OODM permits data to be viewed at different levels of abstraction based
on the semantics of the data and their interrelationships. By extending the model
to incorporate fuzzy and imprecise data, we allow data in a given class to be
viewed through another layer of abstraction, this time one based on data values.
This ability of the data model to chunk information further enhances its utility. In
developing the fuzzy class schema, the merge operator is defined, which
combines two object instances of a class into a single object instance, provided
predefined level values are achieved. The merge operator at the same time
maintains the membership relationship existing between the object/class and its
class/superclass.
Assume for generality two object members of a given class $C_i$ with list-directed
class/superclass memberships:
$$o = (i, \langle a_{k1}: i_{k1}, a_{k2}: i_{k2}, \ldots, a_{km}: i_{km} \rangle, \langle \mu_o(C_i), \mu_o(C_{i+1}), \ldots, \mu_o(C_n) \rangle)$$
$$o' = (i', \langle a_{k1}: i_{k1}', a_{k2}: i_{k2}', \ldots, a_{km}: i_{km}' \rangle, \langle \mu_{o'}(C_i), \mu_{o'}(C_{i+1}), \ldots, \mu_{o'}(C_n) \rangle)$$
So $o$ is a fuzzy object in $C_i$ if $o \in \mathrm{ext}(C_i)$ and $\mu_o(C_i)$ takes values in the range [0,1].
Now we must consider how the data values as described by similarity relations
behave (Petry, 1996). Assume attribute $a_{kj}$ of class $C_i$ with a noncomposite
domain $D_j$. By definition of fuzzy object, the domain of $a_{kj}$ is $d_{kj} \subseteq D_j$. So the
similarity threshold of $D_j$ is:
$$\mathrm{Thresh}(D_j) = \min \Bigl\{ \min_{x, y \in d_{kj}} [\, s(x, y) \,] \Bigr\}$$
where $o \in \mathrm{ext}(C_i)$ and $x$, $y$ are atomic elements.
The threshold of a composite object is undefined. A composite domain is
constituted of simple domains (at some level), each of which has a threshold
value, i.e., the threshold for a composite object is a vector. The threshold value
represents the minimum similarity of the values an object attribute may take. If
the attribute domain is strictly atomic for all objects of the class (i.e., cardinality
of $a_{ij}$ is 1), then the threshold = 1. As the threshold value ranges toward 0, larger
chunks of information are grouped together, and the information conveyed about
the particular attribute of the class decreases. A level value given a priori
determines the objects that may be combined by the set union of the respective
domains. Note that the level value may be specified via the query language with
the constraint that it may never exceed the threshold value.
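The threshold computation can be sketched as follows, assuming that each object stores its attribute value as a set of atomic elements and that the outer minimum is taken over the objects of the class. The function name `thresh`, the similarity table `SIM`, and the hair-color values are hypothetical.

```python
def thresh(attribute_values, s):
    """Thresh(D_j): the smallest similarity found between any two atomic
    elements inside a single attribute value d_kj, over all objects.

    attribute_values: one non-empty set of atomic elements per object
    s: similarity function returning a value in [0, 1]
    """
    return min(min(s(x, y) for x in d for y in d) for d in attribute_values)


# Hypothetical similarity relation on hair colors.
SIM = {
    frozenset(["black", "dark brown"]): 0.8,
    frozenset(["blond", "bleached"]): 0.9,
}


def s(x, y):
    # Reflexive and symmetric by construction; unlisted pairs default to 0.
    return 1.0 if x == y else SIM.get(frozenset([x, y]), 0.0)


hair = [{"black"}, {"black", "dark brown"}, {"blond", "bleached"}]
hair_threshold = thresh(hair, s)
atomic_threshold = thresh([{"black"}, {"blond"}], s)
```

When every attribute value is a singleton, only the reflexive pairs are compared and the threshold is 1, in line with the strictly atomic case described above.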
Merging Objects
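For objects $o_i$ and $o_i'$, assume that for all $a_{kj}$ the domain of $a_{kj}$ is noncomposite:
$$o_i'' = \mathrm{Merge}(o_i, o_i') = (i'', \langle a_{k1}: i_{k1}'', a_{k2}: i_{k2}'', \ldots, a_{kj}: i_{kj}'', \ldots, a_{km}: i_{km}'' \rangle, \langle \mu_{o''}(C_i), \mu_{o''}(C_{i+1}), \ldots, \mu_{o''}(C_n) \rangle)$$
where $o_{kj}'' = (i_{kj}'', \{i_{kj}, i_{kj}'\})$ and
$\mu_{o''}(C_m) = f(\mu_o(C_m), \mu_{o'}(C_m))$ for all $m$, $m = 1, \ldots, n$, such that
$$\forall\, \mathrm{val}(i_{kj}), \mathrm{val}(i_{kj}') \in d_{ij} \cup d_{ij}': \quad \min[\, s(\mathrm{val}(i_{kj}), \mathrm{val}(i_{kj}')) \,] > \mathrm{Level}(D_j)$$
and $\mathrm{Level}(D_j) \le \mathrm{Thresh}(D_j)$.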
The merge operator permits a reorganization of the objects belonging to a class
scheme by grouping them according to the similarity of an attribute object to
another. As in the definition of threshold, the definition can be extended to
composite objects.
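A minimal sketch of the merge operator for noncomposite attributes is given below. It assumes that objects are plain dictionaries, that attribute values are sets of atomic elements, and that memberships are combined element-wise by an application-chosen function f (max is used here only as a placeholder). All names and data are hypothetical, and the check that the level does not exceed the threshold is omitted for brevity.

```python
def redundant(o1, o2, s, level):
    """True if, for every attribute, all pairs of values drawn from the
    union of the two attribute values are more similar than Level(D_j)."""
    for attr in o1["values"]:
        union = o1["values"][attr] | o2["values"][attr]
        # Strict inequality, following the condition stated in the text.
        if min(s(x, y) for x in union for y in union) <= level[attr]:
            return False
    return True


def merge(o1, o2, s, level, f=max, new_id=None):
    """Merge two objects of the same class, or return None if they may not
    be combined at the given level values."""
    if not redundant(o1, o2, s, level):
        return None
    return {
        "id": new_id,
        "values": {a: o1["values"][a] | o2["values"][a] for a in o1["values"]},
        "memberships": [f(m1, m2) for m1, m2
                        in zip(o1["memberships"], o2["memberships"])],
    }


# Hypothetical similarity relation and objects.
SIM = {frozenset(["blond", "bleached"]): 0.9}


def s(x, y):
    return 1.0 if x == y else SIM.get(frozenset([x, y]), 0.0)


o1 = {"id": 1, "values": {"hair": {"blond"}}, "memberships": [1.0, 0.9]}
o2 = {"id": 2, "values": {"hair": {"bleached"}}, "memberships": [0.8, 0.9]}
o3 = {"id": 3, "values": {"hair": {"black"}}, "memberships": [1.0, 1.0]}
level = {"hair": 0.85}

merged = merge(o1, o2, s, level, new_id=4)
not_merged = merge(o1, o3, s, level, new_id=5)
```

Lowering the level value lets more objects merge, at the cost of coarser attribute values in the merged objects.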
Two objects in an OODBMS can be nonredundant even if they are shallow
equal. By introducing fuzziness into the model, however, we weaken this
property. Two objects that are shallow equal are redundant, as are objects
exhibiting deep equality. But equality alone does not determine redundancy, and
the following is the characteristic of redundancy:
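Two objects $o_i$ and $o_i'$ are redundant iff, for all $j$, $j = 1, 2, \ldots, m$, and $\mathrm{Level}(D_j)$ given
a priori:
$$\forall\, \mathrm{val}(i_{kj}), \mathrm{val}(i_{kj}') \in d_{ij} \cup d_{ij}': \quad \min[\, s(\mathrm{val}(i_{kj}), \mathrm{val}(i_{kj}')) \,] > \mathrm{Level}(D_j)$$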
This property of redundancy (Buckles & Petry, 1982) is directly responsible for
the property of value abstraction exhibited by the fuzzy database. It also ensures
that the results of database operations are unique.
Other Fuzzy Object-Oriented Approaches
For OODBs, Zicari (1990) considered issues of incompleteness, albeit without
use of fuzzy concepts. In particular, incomplete data in an object are handled by
the introduction of explicit null values in a similar manner to the relational and
nested relational models. Several researchers have been developing fuzzy
OODB approaches and studying related issues for a number of years (De
Caluwe, 1997; Lee et al., 1999; Pasi & Yager, 1999; Bordogna et al., 2000; De
Tré et al., 2000; Marin et al., 2000; Cao, 2001; Koyuncu & Yazici, 2003; Ma,
2000, 2004). Significant applications of fuzzy object modeling are in the areas of
complex spatial data and GIS (George et al., 1992; Morris & Petry, 1998; Cross
& Firat, 2000).
Generalization in Fuzzy OODB
The starting point for all generalization approaches must be based on the most
frequently encountered attribute values: single-valued nonnumeric and
numeric data values. We will extensively consider the issues related to
generalization for single-valued data and then show how this may extend to structured data
and class hierarchy issues.
Attribute Generalization and Concept Hierarchies
For the purpose of attribute-oriented generalization, the concept of hierarchy is
critical and in an environment of fuzzy data may lead to different interpretations
for generalization. Each concept hierarchy reflects background knowledge
about the domain to be generalized. These hierarchies should permit gradual,
similarity-based, aggregation of attribute values in the objects. Typically, a
hierarchy is built in the bottom-up manner, progressively increasing the abstrac-
tion of the generalization concepts at each new level. Creation of new concept
levels in generalization hierarchies is accompanied by an increase of the concept
abstraction and the decrease of cardinality (each higher level includes fewer data
descriptors, but the descriptors have more general meanings).
Hierarchical grouping (Han, 1995) was based on tree-like generalization hierar-
chies, where each of the concepts at the lower level of the generalization
hierarchy was allowed to have just one abstract concept at the level directly
above it. Fuzzy ISA hierarchies were later applied to data summarization (Lee
& Kim, 1997), allowing a single concept (attribute value) to partially belong to
more than one of the concepts placed at the next abstract level (direct abstracts).
However, this and a similar approach (Raschia & Mouaddib, 2002) lack certain
properties (exact count/vote propagation) that we find are needed in the
attribute-oriented generalization.
Because of the nature of fuzzy OODBs, we can restructure the original data (by
merging objects considered to be identical at a certain $\alpha$-cut level, according to
a given similarity relation) in order to begin attribute-oriented generalization from
a desired level of detail, the initial set $G_0$. This approach, when removing
unnecessary detail, must be applied with caution. When merging objects
according to their equivalence at the given similarity level (e.g., by using queries with
a high threshold level), we are not able to keep track of the number of original
objects merged into one object. This may result in a significant change of
balance among the objects in a class and lead to erroneous (not reflecting
reality) information presented later in the form of support and confidence of the
extracted knowledge. This problem, which we refer to as a count dilemma in the
count propagation, can easily be avoided by performing extraction of the initial
working class $G_0$ at a detailed level (i.e., $\alpha = 1.0$), where only identical values
are merged (e.g., Bleached and Light Blond will be unified), but no considerable
number of objects would be lost as the result of such redundancy removal.
Another issue that must be emphasized is an exact count propagation dilemma
(also derived from the principle of count propagation). When generalizing data
for data mining purposes, we have to preserve the number of objects and the
relationships between them in identical proportions at each level of generaliza-
tion. In other words, we have to assure that each object from the original class
will be counted once at each of the levels of the generalization hierarchy. This
leads to the two following properties, which must be maintained at each level of
the generalization hierarchy:
1. The set of concepts at each level of hierarchy should cover all of the
attribute values that occurred in the original database (so we are guaran-
teed not to lose the number of objects when generalizing their values).
2. Never allow any attribute value (or its abstract) to be counted more or less
than once at each level of the generalization hierarchy. (When we allow a
concept to partially belong to more than one of its direct abstracts, we have
to check each time that the sum of fractional memberships is equal to 1.0).
This aspect is especially important when we plan to apply attribute-oriented
generalization as a pre-analysis tool, to compress the initial data set to a
form more appropriate for the application of computationally complex data
mining algorithms (e.g., association rules mining).
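Both properties can be checked mechanically for one level of a hierarchy. The sketch below is an illustration under assumptions: the function name `check_level` and the dictionary layout are hypothetical, and the sample memberships are the first-level degrees of the text editor example of Lee and Kim (1997) quoted later in this section.

```python
def check_level(original_values, level_map, tol=1e-9):
    """Check coverage and exact count propagation for one hierarchy level.

    original_values: attribute values that occur in the database
    level_map: dict concept -> {direct abstract: membership degree}
    Returns a list of problems; an empty list means both properties hold.
    """
    problems = []
    # Property 1: every original value must be covered at this level.
    for value in original_values:
        if value not in level_map:
            problems.append(f"{value}: not covered")
    # Property 2: the fractional memberships of a concept must sum to 1.0.
    for concept, abstracts in level_map.items():
        total = sum(abstracts.values())
        if abs(total - 1.0) > tol:
            problems.append(f"{concept}: memberships sum to {total:g}")
    return problems


first_level = {
    "emacs": {"editor": 1.0, "documentation": 0.1},
    "vi": {"editor": 1.0, "documentation": 0.3},
    "word": {"documentation": 1.0, "spreadsheet": 0.1},
    "wright": {"spreadsheet": 1.0},
}
problems = check_level(["emacs", "vi", "word", "wright", "notepad"], first_level)
```

In this example three concepts violate the second property and one value (the hypothetical "notepad") is not covered, so counts generalized through this level would not add up to the original number of objects.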
For the purpose of further analysis, we distinguish three basic types of generali-
zation hierarchies:
1. Crisp concept hierarchy (Han, 1995; Hilderman et al., 1999): Here each
attribute variable (concept) at each level of the hierarchy can have only one
direct abstract (its direct generalization) to which it fully belongs. (There is
no consideration of the degree of relationship, e.g., {master of art, master
of science, doctorate} ⊂ graduate, {freshman, sophomore, junior, senior}
⊂ undergraduate.) This is as shown in the tree in Figure 1.
2. Fuzzy concept hierarchy (Lee & Kim, 1997; Raschia & Mouaddib, 2002):
The hierarchy of concepts here reflects the degree with which one concept
belongs to its direct abstract and more than one direct abstract of a single
concept is allowed. Because of the lack of guarantee of exact count
propagation, such a hierarchy seems to be more appropriate for simplified
data summarization, or for the cases when subjective results are to be
emphasized (when we purposely want to modify the roles or influences of
certain objects). Utilization of the four popular text editors could be
generalized as follows (Lee & Kim, 1997). We denote fuzzy generalization
of concept a to its direct abstract b with membership degree c as
$a \prec b\,|\,c$:
First level of abstraction: $\{\text{emacs} \prec \text{editor}\,|\,1.0;\
\text{emacs} \prec \text{documentation}\,|\,0.1;\ \text{vi} \prec \text{editor}\,|\,1.0;\
\text{vi} \prec \text{documentation}\,|\,0.3;\ \text{word} \prec \text{documentation}\,|\,1.0;\
\text{word} \prec \text{spreadsheet}\,|\,0.1;\ \text{wright} \prec \text{spreadsheet}\,|\,1.0\}$
Second level of abstraction: $\{\text{editor} \prec \text{engineering}\,|\,1.0;\
\text{documentation} \prec \text{engineering}\,|\,1.0;\ \text{documentation} \prec \text{business}\,|\,1.0;\
\text{spreadsheet} \prec \text{engineering}\,|\,0.8;\ \text{spreadsheet} \prec \text{business}\,|\,1.0\}$
Third level of abstraction: $\{\text{engineering} \prec \text{any}\,|\,1.0;\
\text{business} \prec \text{any}\,|\,1.0\}$
3. Consistent fuzzy concept hierarchy (recently proposed in Angryk &
Petry, 2003): Each degree of membership is normalized to preserve an
exact count propagation for each object when being generalized.
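The difference between an ordinary and a consistent fuzzy hierarchy can be checked mechanically. The following is a minimal sketch, not the procedure of Angryk and Petry (2003): it encodes the first-level generalizations of the text-editor example, sums the memberships leaving each concept, and rescales them so that each sum equals 1.0. The function names and the choice of dividing by the sum are assumptions made for illustration.

```python
from collections import defaultdict

# First-level fuzzy generalizations from the text-editor example:
# (concept, direct abstract) -> membership degree
FIRST_LEVEL = {
    ("emacs", "editor"): 1.0, ("emacs", "documentation"): 0.1,
    ("vi", "editor"): 1.0, ("vi", "documentation"): 0.3,
    ("word", "documentation"): 1.0, ("word", "spreadsheet"): 0.1,
    ("wright", "spreadsheet"): 1.0,
}

def membership_sums(edges):
    """Total membership each lower-level concept sends to its direct abstracts."""
    sums = defaultdict(float)
    for (concept, _abstract), degree in edges.items():
        sums[concept] += degree
    return dict(sums)

def is_consistent(edges, tol=1e-9):
    """True if every concept is counted exactly once (memberships sum to 1.0)."""
    return all(abs(s - 1.0) <= tol for s in membership_sums(edges).values())

def normalize(edges):
    """Assumed normalization: divide each degree by the concept's total."""
    sums = membership_sums(edges)
    return {(c, a): d / sums[c] for (c, a), d in edges.items()}

if __name__ == "__main__":
    print(membership_sums(FIRST_LEVEL))        # vi is counted about 1.3 times
    print(is_consistent(FIRST_LEVEL))
    print(is_consistent(normalize(FIRST_LEVEL)))
```

In the unnormalized hierarchy an object with the value vi contributes more than one count to the first level of abstraction, which is exactly the violation of the second property listed earlier.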
Extraction of Concept Hierarchies from Similarity Relations
Here we consider the nature of similarity relations as a mechanism for attribute-
oriented generalization. Commonly, the generalization of concepts in data mining
is based on the two types of ontological relations: (1) Part-Of (e.g., wheels, oil,
oil filter, and brake pads sold by Wal-Mart could be generalized to auto-service
items) and (2) Is-A (e.g., red, auburn, ruby, and scarlet could be described
in general as reddish colors). Part-Of emphasizes the similarity of concepts
to their abstract, while Is-A accentuates the similarity occurring between the
values from lower level of abstraction, trying then to define the descriptor fitting
its character. In practice we may find loose hybrids of these relations, because
the structure of the generalization hierarchy strongly depends on the character
of the data mining task or the personal preferences of the analyst. Each of these
relations among concepts can be reflected in a similarity relation, because the
user or data-mining analyst can be allowed to modify the values in the similarity
table in the individual user's view of the database to represent the similarity
between the concepts (attribute values) in the context of interest.
Figure 1. Crisp concept hierarchy (ANY generalizes graduate and undergraduate;
graduate generalizes M.A., M.S., and Ph.D.; undergraduate generalizes freshman,
sophomore, junior, and senior)
The existence of a similarity relation modeled for a particular domain can lead
to the extraction of a crisp concept hierarchy, allowing attribute-oriented
generalization. Let $S_\alpha$ be the $\alpha$-cut of the similarity relation S,
presented in Table 1.
It can be shown (Zadeh, 1970) that if S is a similarity relation on a given domain
$D_j$ (which is a single attribute in our case), then for each $\alpha \in (0,1]$,
$S_\alpha$ creates equivalence classes in the domain $D_j$. Now, let $\Pi_\alpha$
denote the equivalence class partition induced on domain $D_j$ by $S_\alpha$.
Clearly, $\Pi_{\alpha'}$ is a refinement of $\Pi_\alpha$ if $\alpha' \ge \alpha$.
A nested sequence of partitions $\Pi_{\alpha_1}, \Pi_{\alpha_2}, \ldots, \Pi_{\alpha_k}$
may be represented diagrammatically in the form of a partition tree.
The nested sequence of partitions in the form of a tree has a structure identical
to the crisp concept hierarchy for data mining generalization purposes (Figure 2).
The increase of abstraction in the partition tree is denoted by decreasing values
of $\alpha$; lack of abstraction during generalization (0-abstraction level at the
bottom of the generalization hierarchy) complies with the 1-cut of the similarity
relation ($\alpha = 1.0$), and can be denoted as $S_{1.0}$.
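The extraction of a partition tree from a similarity table can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: the table is entered with the Black-Red entry taken as 0.7 (as symmetry with the Red-Black entry requires), and the function names `alpha_cut_partition` and `partition_tree` are hypothetical. Because a similarity relation is reflexive, symmetric, and max-min transitive, comparing a value with one representative of each class is sufficient.

```python
VALUES = ["black", "d.brown", "auburn", "red", "blond", "bleached"]

# Similarity table for HAIR COLOR (Table 1); Black-Red is assumed to be 0.7,
# matching the symmetric Red-Black entry.
S = [
    [1.0, 0.8, 0.7, 0.7, 0.5, 0.5],
    [0.8, 1.0, 0.7, 0.7, 0.5, 0.5],
    [0.7, 0.7, 1.0, 0.8, 0.5, 0.5],
    [0.7, 0.7, 0.8, 1.0, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 1.0, 0.8],
    [0.5, 0.5, 0.5, 0.5, 0.8, 1.0],
]

def alpha_cut_partition(values, sim, alpha):
    """Equivalence classes induced on `values` by the alpha-cut of `sim`."""
    classes = []  # each class is a list of indices into `values`
    for i in range(len(values)):
        for cls in classes:
            # One representative per class suffices for a similarity relation.
            if sim[i][cls[0]] >= alpha:
                cls.append(i)
                break
        else:
            classes.append([i])
    return [[values[i] for i in cls] for cls in classes]

def partition_tree(values, sim):
    """Nested partitions for every distinct similarity level, highest first."""
    levels = sorted({x for row in sim for x in row}, reverse=True)
    return {alpha: alpha_cut_partition(values, sim, alpha) for alpha in levels}

if __name__ == "__main__":
    for alpha, partition in partition_tree(VALUES, S).items():
        print(alpha, partition)
```

Each lower level of $\alpha$ merges classes from the level above, which is what gives the result the shape of a crisp concept hierarchy.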
An advantage of attribute-oriented generalization with OODBs using similarity
relations is that such a hierarchy is implicit in the object-oriented fuzzy model
and can be extracted automatically, even by a user who has no background
knowledge about the particular domain. Experienced analysts not satisfied with
an existing similarity relation may then define their own similarity tables in user
views to better reflect their knowledge about the attribute values.
The only difference between the partition tree in Figure 2 and a crisp concept
hierarchy is the lack of abstract concepts used as labels characterizing the sets
of generalized (grouped) concepts. In our example, we could generalize the values
blond and bleached to one common descriptor BLONDISH, auburn and red to REDDISH,
and black and dark brown to DARKISH (to maintain consistency of the naming
convention at the first level of abstraction). At the next level of the
generalization hierarchy, we can keep the concept BLONDISH, because there is no
change in its components; however, according to the taxonomy presented in
Figure 2, the concepts DARKISH and REDDISH should be generalized and should have
a new descriptor, which we call DARK to emphasize the change. A term ANY is
usually placed at the highest level of the concept hierarchy, to emphasize that
the name describes all values possibly occurring in the particular domain. When
defining abstract names for generalized sets of attribute values, we need to
remember that a lower cut of the similarity relation (smaller values of $\alpha$)
represents a higher abstraction of generalization descriptors.
Table 1. Proximity table for a domain HAIR COLOR
             Black   Dark brown   Auburn   Red   Blond   Bleached
Black        1.0     0.8          0.7      0.7   0.5     0.5
Dark brown   0.8     1.0          0.7      0.7   0.5     0.5
Auburn       0.7     0.7          1.0      0.8   0.5     0.5
Red          0.7     0.7          0.8      1.0   0.5     0.5
Blond        0.5     0.5          0.5      0.5   1.0     0.8
Bleached     0.5     0.5          0.5      0.5   0.8     1.0
Due to the nested character of partitions as a result of $\alpha$-cuts of a
similarity relation, to specify a complete set of abstract descriptors it is
sufficient to choose one value of the attribute per equivalence class partition
at each level of the hierarchy, represented by $\alpha$ in Table 2. This is
sufficient to build the generalization hierarchy in Figure 3.
Because the similarity relation can generate only a nested sequence of
equivalence partitions via a decrease in similarity level, we cannot extract a
fuzzy concept hierarchy from the similarity table. The disjoint character of
equivalence classes generated from the similarity relation does not allow any
concept in the hierarchy to have more than one direct abstract at any level of
the generalization hierarchy. A similarity table can be utilized to form a crisp
generalization hierarchy. Such a hierarchy can be successfully applied as a
foundation for the development of a fuzzy concept hierarchy. Data-mining analysts
can extend the crisp hierarchy with additional edges to represent partial
membership of the lower-level concepts in their direct abstract descriptors.
Depending on the assigned memberships, reflecting preferences of the user, they
can create consistent or inconsistent fuzzy concept hierarchies.
Figure 2. Partition tree of domain HAIR COLOR for similarity relation
(Table 1). Leaves: AUBURN, BLOND, D.BROWN, BLACK, RED, BLEACHED; abstraction
levels $\alpha$ = 1.0, 0.8, 0.7, 0.5
Table 2. Abstract descriptors, for the generalization hierarchy in Figure 2,
where abstraction level is represented by value of $\alpha$
Original Attribute Value   Abstraction Level ($\alpha$)   Abstract Descriptor
Black                      0.8                            DARKISH
Red                        0.8                            REDDISH
Blond                      0.8                            BLONDISH
Black                      0.7                            DARK
Blond                      0.7                            BLONDISH
Black                      0.5                            ANY
Utilizing Similarity Relations to Define Abstract Concepts
A similarity relation can be interpreted in terms of fuzzy similarity classes S(x)
(Zadeh, 1970), where the membership of attribute variables in the class S(x) is
equal to the similarity level between these variables and the fuzzy similarity
class. In other words, the grade of membership of y in the fuzzy class S(x),
denoted by $\mu_{S(x)}(y)$, is $xSy$.
Based on this consideration, one can define abstract concepts by choosing their
basic representative attribute values (i.e., typical representative specializers)
and then using a similarity table to extract a more precise definition of such
abstract classes. For such extraction, we assume a certain level of similarity
($\alpha$), which should be interpreted as a level of precision reflected in our
abstract concept definition.
Typically, the more abstract the concepts to be used in data generalization, the
less certain experts are at assigning particular lower-level concepts to them;
often, some values can be easily generalized to abstracts, but others may raise
doubts among experts. In analyzing the problem of imprecise information, it was
noted (Dubois, Prade, & Rossazza, 1991) that each attribute has a domain (allowed
values), a range (actually occurring values), and a typical range (most common
values), and we apply this classification to the generalization process. With an
abstract concept we can usually identify its typical direct specializers, the
elements clearly belonging to it (e.g., we all would probably agree here that
black hair can be generalized to the descriptor DARK with 100% accuracy). This
can be represented as a core of the fuzzy set (abstract concept). However, there
are also lower-level concepts that cannot be definitely assigned to only one of
their direct abstracts (e.g., assigning blond fully to the abstract concept LIGHT
hair is problematic because there are many people with dark blond hair). We term
such cases possible direct specializers, concepts in the group of lower-level
concepts characterized by the given abstract descriptor (fuzzy set) with
membership $\le 1$. These are the support of a fuzzy set and are interpreted as
the range of the abstract concept.
Figure 3. Crisp generalization hierarchy formed using Tables 1 and 2
(abstraction level 1.0: AUBURN, BLOND, D.BROWN, BLACK, RED, BLEACHED;
level 0.8: REDDISH, BLONDISH, DARKISH; level 0.7: DARK, BLONDISH; level 0.5: ANY)
Now we define each abstract concept as a set of its typical original attribute
values with the level of doubt about its other possible specializers reflected by
the value of $\alpha$. Then we select the fuzzy similarity class created from the
$\alpha$-cut of the similarity relation for these predefined typical specializers
and analyze if this fits our needs. For instance, define the abstract concept
LIGHT hair by the attribute variable bleached with the level of similarity
$\alpha = 0.8$ to spread the range of this abstract descriptor (LIGHT is
predefined as the similarity class $\text{BLEACHED}_{0.8}$). From the similarity
relation presented in Table 1, we can derive:
$\text{LIGHT} = \text{BLEACHED}_{0.8} = \{\text{bleached}\,|\,1.0;\ \text{blond}\,|\,0.8\}$
Of course, each of the abstract concepts can be defined by more than one typical
representative element (in such a case we may also choose an intersection
operator, as best fits our preferences). Assume the descriptor DARK to be
principally defined by the following original values of the HAIR COLOR domain:
black, d.brown, and auburn. Assuming the similarity level to be 0.7, we would
obtain:
$\text{DARK} = \max(\text{BLACK}_{0.7};\ \text{D.BROWN}_{0.7};\ \text{AUBURN}_{0.7})
= \{\text{black}\,|\,1.0;\ \text{d.brown}\,|\,1.0;\ \text{auburn}\,|\,1.0;\ \text{red}\,|\,0.8\}$
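The two derivations above can be reproduced with a short sketch. It assumes the similarity table is stored as a nested dictionary (with the Black-Red entry taken as 0.7, in agreement with the Red-Black entry), and the helper names `similarity_class` and `union_max` are hypothetical. A fuzzy similarity class keeps the members whose similarity to the representative reaches $\alpha$, and several classes are combined with a pointwise maximum.

```python
VALUES = ["black", "d.brown", "auburn", "red", "blond", "bleached"]
ROWS = [
    [1.0, 0.8, 0.7, 0.7, 0.5, 0.5],  # black (Black-Red assumed to be 0.7)
    [0.8, 1.0, 0.7, 0.7, 0.5, 0.5],  # d.brown
    [0.7, 0.7, 1.0, 0.8, 0.5, 0.5],  # auburn
    [0.7, 0.7, 0.8, 1.0, 0.5, 0.5],  # red
    [0.5, 0.5, 0.5, 0.5, 1.0, 0.8],  # blond
    [0.5, 0.5, 0.5, 0.5, 0.8, 1.0],  # bleached
]
SIM = {x: dict(zip(VALUES, row)) for x, row in zip(VALUES, ROWS)}

def similarity_class(sim, x, alpha):
    """Fuzzy similarity class of x, restricted to members with degree >= alpha."""
    return {y: degree for y, degree in sim[x].items() if degree >= alpha}

def union_max(*classes):
    """Combine fuzzy classes by taking the maximum membership of each value."""
    combined = {}
    for cls in classes:
        for y, degree in cls.items():
            combined[y] = max(degree, combined.get(y, 0.0))
    return combined

LIGHT = similarity_class(SIM, "bleached", 0.8)
DARK = union_max(*(similarity_class(SIM, x, 0.7)
                   for x in ("black", "d.brown", "auburn")))

if __name__ == "__main__":
    print("LIGHT =", LIGHT)
    print("DARK  =", DARK)
```

Replacing `max` with `min` over the values shared by all classes would give the intersection variant mentioned in the text.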
Using both of these abstract concepts, with the assumption that only DARK and
LIGHT colors occur at the given level of HAIR COLOR generalization, we
construct the fuzzy generalization hierarchy (Figure 4).
The hierarchy in Figure 4 is called a simplified fuzzy concept hierarchy, because
the fractional memberships of low-level concepts to their abstract descriptors
make it similar to the fuzzy concept hierarchy described previously. Each of the
original attribute values belongs to only one direct abstract, creating a
simplified (crisp-hierarchy-like) structure. Abstract concepts extracted in this
way can also overlap. For instance, define an abstract concept BLACKISH as the
$\alpha$-cut from the similarity table for black at the level 0.7:
$\text{BLACKISH} = \text{BLACK}_{0.7} = \{\text{black}\,|\,1.0;\ \text{d.brown}\,|\,0.8;\
\text{auburn}\,|\,0.7;\ \text{red}\,|\,0.7\}$
Simultaneously introduce the abstract class BROWNISH at the same $\alpha$-level:
$\text{BROWNISH} = \text{D.BROWN}_{0.7} = \{\text{black}\,|\,0.8;\ \text{d.brown}\,|\,1.0;\
\text{auburn}\,|\,0.7;\ \text{red}\,|\,0.7\}$
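We can derive the fuzzy concept hierarchy and even modify the generalization
model to become consistent through the normalization of derived memberships:
$\text{BLACKISH} = \text{BLACK}_{0.7} = \{\text{black}\,|\,\tfrac{1.0}{1.8};\
\text{d.brown}\,|\,\tfrac{0.8}{1.8};\ \text{auburn}\,|\,\tfrac{0.7}{1.4};\
\text{red}\,|\,\tfrac{0.7}{1.4}\} \approx \{\text{black}\,|\,0.6;\
\text{d.brown}\,|\,0.4;\ \text{auburn}\,|\,0.5;\ \text{red}\,|\,0.5\}$
$\text{BROWNISH} = \text{D.BROWN}_{0.7} = \{\text{black}\,|\,\tfrac{0.8}{1.8};\
\text{d.brown}\,|\,\tfrac{1.0}{1.8};\ \text{auburn}\,|\,\tfrac{0.7}{1.4};\
\text{red}\,|\,\tfrac{0.7}{1.4}\} \approx \{\text{black}\,|\,0.4;\
\text{d.brown}\,|\,0.6;\ \text{auburn}\,|\,0.5;\ \text{red}\,|\,0.5\}$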
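The normalization step can be sketched as follows. This is an illustration of the arithmetic shown above, with the hypothetical helper `normalize_across`: each membership is divided by the total membership that the same attribute value has across all abstract concepts on the level, so that every value is counted exactly once.

```python
# Overlapping classes extracted at the 0.7 level, as in the text.
CLASSES = {
    "BLACKISH": {"black": 1.0, "d.brown": 0.8, "auburn": 0.7, "red": 0.7},
    "BROWNISH": {"black": 0.8, "d.brown": 1.0, "auburn": 0.7, "red": 0.7},
}

def normalize_across(classes):
    """Rescale memberships so each attribute value sums to 1.0 over all classes."""
    totals = {}
    for members in classes.values():
        for value, degree in members.items():
            totals[value] = totals.get(value, 0.0) + degree
    return {
        name: {value: degree / totals[value] for value, degree in members.items()}
        for name, members in classes.items()
    }

CONSISTENT = normalize_across(CLASSES)

if __name__ == "__main__":
    for name, members in CONSISTENT.items():
        # Rounded to one decimal place only for comparison with the text.
        print(name, {v: round(d, 1) for v, d in members.items()})
```

The unrounded memberships of black are 1.0/1.8 and 0.8/1.8, which the text reports as 0.6 and 0.4.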
Despite the formally correct appearance, this mechanism may be inappropriate.
We characterized two new generalization concepts (BLACKISH and BROWNISH) with a
low level of imprecision (each had only one typical direct specializer),
simultaneously choosing a relatively high degree of abstraction ($\alpha = 0.7$)
when extracting $\alpha$-cuts from the similarity relation. This resulted in two
fuzzy similarity classes ($\text{BLACK}_{0.7}$ and $\text{D.BROWN}_{0.7}$) that
were overlapping and led to the consistent fuzzy concept hierarchy in Figure 5
(derived through the normalization of membership degrees). Extraction of two
fuzzy classes from the similarity table at the similarity level where they were
considered to be equivalent (black and d.brown belong to the same equivalence
class partition at the similarity level 0.7), despite being formally possible,
often may not be semantically meaningful. This situation may occur when the
abstract concepts are characterized incorrectly at the particular level of
generalization (which is the case here) or the similarity relation represents the
similarity between these concepts in a perspective not compatible with the
context represented in the particular generalization hierarchy. It makes no sense
to define two or more general concepts at a level of abstraction so high that
they are interpreted as identical. This rationale found its natural reflection in
the distribution of memberships presented in the consistent fuzzy concept
hierarchy (Figure 5), where both of the introduced abstract concepts have almost
identical compositions of their direct specializers.
Figure 4. Simplified fuzzy generalization hierarchy for the attribute HAIR
COLOR (black, d.brown, and auburn belong to DARK with membership 1.0 and red with
0.8; bleached belongs to LIGHT with membership 1.0 and blond with 0.8; DARK and
LIGHT generalize to ANY)
Some guidelines are needed when characterizing abstract concepts via their
typical direct specializers and trying to extract their full definition (range of
possible direct specializers) using a similarity table:
1. We need to assure that the intuitively assumed value of $\alpha$ extracts the
cut (subset) of attribute values that corresponds closely to the definition of
the abstract descriptor for which we were looking. The strategy for choosing
the most appropriate level of $\alpha$-cut when extracting the abstract concept
definitions arises from the guideline of minimal generalization (the minimal
concept tree ascension strategy described in the second section). Based on
this strategy, we would recommend always choosing a definition extracted
at the highest possible level of similarity (biggest $\alpha$), where all
predefined typical components of the desired abstract descriptor are already
embraced (where they occur for the first time).
Figure 5. Consistent fuzzy concept hierarchy for the attribute HAIR COLOR
(BLACK, D.BROWN, AUBURN, and RED each split between two overlapping abstract
concepts with memberships 0.6/0.4, 0.4/0.6, 0.5/0.5, and 0.5/0.5)
2. The problem of selecting appropriate representative elements without
external knowledge about a particular attribute remains; however, it can
now be supported by the analysis of the values stored in the similarity table.
Choosing typical values and then extracting a detailed definition with all
possible components of the desired abstract concepts from the similarity
table seems to be easier than describing generalized components in detail.
3. Moreover, we should be aware that if low-level concepts, predefined as
typical components of the particular abstract descriptor, do not occur in the
common similarity class, then the contexts of the generalized descriptor and
of the similarity relation are not in agreement, and revision of the similarity
table (or the abstract) is recommended.
4. We cannot directly specify a restriction stating that all abstract concepts in
the generalization hierarchy have to be at the same level of similarity when
extracted from the similarity relation. Moreover, the definitions extracted for
the example presented in Figure 4 show that this situation is acceptable.
However, when using this approach, we should generally not put abstract
descriptors that overlap with others at the same level of the generalization
hierarchy. This can easily occur when trying to place an abstract defined
via the original concepts on a level of the generalization hierarchy where
this abstract is already represented by the generalized concept derived from
the equivalence class partition with the higher similarity level.
We have to remember that the abstract concepts derived from the similarity
relation have a nested character, and placing one concept simultaneously
with the other may lead to the partial overlapping of partitions (because it
is its actual refinement), which contradicts the character of a similarity
relation.
The approach described here seems to allow us to form only flat (one-level)
generalization hierarchies or to derive the generalized concepts at the first level
of abstraction in the concept hierarchy. Each abstract concept defined with this
method is a generalization of original attribute values, and therefore cannot be
placed at the higher level of the concept hierarchy. However, there is no obstacle
preventing these concepts from being further generalized.
The lack of ability to derive multilevel hierarchical structures does not prevent
this approach from being appropriate, and actually convenient, for rapid data
summarization or something we term selective attribute-oriented generaliza-
tion. To summarize the given data set, we may prefer to not perform gradual
(hierarchical) generalization but replace it with a one-level hierarchy covering a
whole domain of attribute values. Such an appropriately built flat hierarchy
would represent the majority of dependencies between the original low-level
concepts, which are to be generalized, by the propagation of fractions of counts
coming from each attribute value, instead of having to perform detailed hierar-
chical generalization.
In selective generalization, we generalize all attribute values from a specific
point of view, which is dictated by the character of the data mining task. Assume
that we are interested in association rules regarding only people who have dark
hair. Using the similarity relation, we derive the following:
$\text{DARKISH} = \max(\text{BLACK}_{0.7};\ \text{D.BROWN}_{0.7})
= \{\text{black}\,|\,1.0;\ \text{d.brown}\,|\,1.0;\ \text{auburn}\,|\,0.7;\ \text{red}\,|\,0.7\}$
This reflects the following interpretation: All people who have black or dark
brown hair are considered to have DARKISH hair, and 70% of redheads and
people with auburn hair have it in a dark shade. This is sufficient to explain the
difference between selective generalization and the application of data selection
when building the initial data-mining class $G_0$. In both cases, we omit all
objects with blond or bleached hair; however, in the case of selective
generalization, 70% of each count represented by each object with red or auburn
hair color remains. This is obviously not equivalent to the extraction of all
objects with values red and auburn and then randomly choosing 70% of them for
further generalization. With selective generalization, we do not omit the objects
but decrease their influence to an appropriate representation of their importance
for the given data-mining problem.
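A small sketch may help to contrast selective generalization with plain data selection. The object counts below are invented purely for illustration, and the helper `selective_count` is hypothetical; the DARKISH memberships are those derived above. Every object keeps a fraction of its count equal to its membership, and values outside the class contribute nothing.

```python
# DARKISH as derived in the text from the 0.7-cuts for black and d.brown.
DARKISH = {"black": 1.0, "d.brown": 1.0, "auburn": 0.7, "red": 0.7}

def selective_count(counts, abstract):
    """Sum of object counts, each weighted by its membership in `abstract`."""
    return sum(n * abstract.get(value, 0.0) for value, n in counts.items())

def weighted_counts(counts, abstract):
    """Per-value fractional counts that are propagated to the abstract concept."""
    return {v: n * abstract[v] for v, n in counts.items() if v in abstract}

# Hypothetical number of objects per hair color (illustrative data only).
COUNTS = {"black": 10, "d.brown": 5, "auburn": 10, "red": 10,
          "blond": 20, "bleached": 4}

if __name__ == "__main__":
    print(weighted_counts(COUNTS, DARKISH))
    print(selective_count(COUNTS, DARKISH))  # about 29 of the 59 objects
```

All ten auburn objects remain in the generalized data, each carrying a count of 0.7, rather than seven randomly chosen objects carrying a count of 1.0.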
We should finally point out that consistent fuzzy hierarchies are not appropriate
tools for selective attribute-oriented generalization. In this case, we do not
want to normalize the count values to preserve exact count propagation; we
instead want to preserve an unbalanced relation between the objects, as this
reflects dependencies occurring in real-life data. The ordinary fuzzy hierarchies
seem to be the most appropriate for such purposes.
Although we focused on nonnumeric data in this discussion of fuzzy concept
hierarchies, the generalization of numeric attributes can be performed in a
similar manner. Of course, the numeric hierarchy can be based on similarity
relationships for fuzzy numbers, such as was already developed for fuzzy
databases (Buckles & Petry, 1984; Petry, 1996), and used as described above for
nonnumeric data. In the case of numeric data, it is possible to analyze the data
distribution characteristics. It may then not be necessary to have predefined
concept hierarchies. For example, consider an income range study in which the
incomes can be clustered into several groups, {<20K, 20-35K, 35-45K, 45-50K,
>50K}, based on some statistical clustering tool. Obviously, further clustering
can be done on these groupings to form a multilevel hierarchy. Linguistic terms
can be assigned to groups, {very low, low, medium, high, very high}, to provide
labels in the hierarchy for generalization. This may be a crisp hierarchy, but it
is also possible to formulate a fuzzy hierarchy by techniques such as use of
fuzzy agglomerative clustering (Yager, 2000).
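For the crisp case, assigning linguistic labels to the income groups amounts to a simple interval lookup. The sketch below assumes incomes are given in thousands and that a value lying exactly on a boundary falls into the higher group; the text does not specify the boundary handling, and the function name `income_label` is hypothetical.

```python
from bisect import bisect_right

# Group boundaries in thousands, taken from the income example in the text.
BOUNDS_K = [20, 35, 45, 50]
LABELS = ["very low", "low", "medium", "high", "very high"]

def income_label(income_k):
    """Map an income (in thousands) to the linguistic label of its group.

    Assumption: boundary values belong to the higher group.
    """
    return LABELS[bisect_right(BOUNDS_K, income_k)]

if __name__ == "__main__":
    for income in (12, 30, 40, 47, 80):
        print(income, income_label(income))
```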
Generalization of Structured Data Values and Class Hierarchies
In general, we may have complex structured data such as set and list valued data
or data with nested structures. First let us consider an attribute that may be
multi- or set-valued. Each value in a set can first be generalized into its
higher-level concept. For example, if we have the multivalued attribute skills
for an employee, we might have the set of values: {German, Programming, Pilot}.
If the next level in a concept hierarchy were to classify skills as mental or
physical, then we would have the set {(Physical Skills, $count_p$), (Mental
Skills, $count_m$)}, where $count_p$ is the value 1, and $count_m$ is 2, but each
is scaled as appropriate depending on the type of the concept hierarchy being
utilized.
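For a crisp hierarchy, the generalization of a set-valued attribute reduces to counting how many members of the set map to each higher-level concept. The sketch below assumes the mapping implied by the counts in the example (German and Programming as mental skills, Pilot as a physical skill); the names `SKILL_CLASS` and `generalize_set` are hypothetical.

```python
from collections import Counter

# Assumed crisp mapping of skills to their direct abstracts.
SKILL_CLASS = {
    "German": "Mental Skills",
    "Programming": "Mental Skills",
    "Pilot": "Physical Skills",
}

def generalize_set(values, hierarchy):
    """Replace each set member by its direct abstract and count occurrences."""
    return Counter(hierarchy[value] for value in values)

if __name__ == "__main__":
    print(generalize_set({"German", "Programming", "Pilot"}, SKILL_CLASS))
```

With a fuzzy hierarchy, each member would instead add its membership degree to every abstract it belongs to, which is the scaling mentioned in the text.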
For more complex data, we still base the approach on set-valued data as above.
A list-valued attribute can be generalized in the same manner, element by
element, except that a generalized form of the list order must be used in the
generalization process. For structured data, we can consider that same approach
but must evaluate alternatives to structure generalization. When generalizing
individual attribute values, we may maintain the shape of the structure or provide
some generalization of the structure, such as flattening the structure or removing
low-level values and summarizing them. Recall also that we have a fuzzy class
hierarchy:
$(o_i, \mu(C_i)) \subseteq (C_i, \mu(C_{i+1})) \subseteq (C_{i+1}, \mu(C_{i+2}))
\subseteq \ldots \subseteq (C_n, \mu(C_{n+1}))$,
so that when we generalize the object $o_i$, we must account for the degree of
membership in its particular class. This can be done by scaling the object's
count, $o_i.count$, by the membership $\mu(C_i)$ for the current class of $o_i$.
If the generalization of $o_i$ moves up through the hierarchy, then the
appropriate weighting must be taken into account for $o_i.count$.
Conclusions
We considered in detail the issues relative to concept hierarchies for attribute
generalization, as the use of a concept hierarchy is the essential component of
the generalization process. As we have seen, there are several approaches that
can be taken depending on the exact intention of the data-mining application. This
allows one to be more flexible in dealing with fuzzy objects in the similarity-based
fuzzy OODB model we described, in particular, due to the ability to create
hierarchies from the given similarity relationships for the data domains.
There are several directions that can be profitably followed in this area for
OODBs that we have not considered to date. Two of particular interest that we
are currently studying are the issues of generalization of methods and the use of
aggregation as a structuring mechanism. As an application area, the problem of
generalization of multimedia data, especially spatial data (Ladner, Petry, &
Cobb, 2003), in a fuzzy OODB is of particular interest. Also, we have been
considering the extension of fuzzy hierarchy development in a database utilizing
proximity relationships (Angryk & Petry, 2003) and plan on extending the fuzzy
OODM to accommodate generalization via proximity relations.
ACKNOWLEDGMENTS
We would like to thank the Naval Research Laboratory's Base Program,
Program Element No. 0602435N for sponsoring this research.
References
Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules
between sets of items in large databases. In Proceedings of the 1993
ACM-SIGMOD International Conference on Management of Data
(pp. 207-216). New York: ACM Press.
Angryk, R., & Petry, F. (2003). Consistent fuzzy concept hierarchies for
attribute generalization. In Proceedings IASTED International Conference
on Information and Knowledge Sharing (IKS 2003) (pp. 158-193).
Angryk, R., & Petry, F. (2003). Data mining fuzzy databases using
attribute-oriented generalization. In Proceedings of the IEEE International
Conference Data Mining Workshop on Foundations and New Directions
in Data Mining (pp. 8-15). Melbourne, FL.
Au, W., & Chan, K. (2003). Mining fuzzy association rules in a bank-account
database. IEEE Transactions on Fuzzy Systems, 11(2), 238-248.
Bertino, E., & Martino, L. (1991). Object-oriented database management
systems: Concepts and issues. IEEE Computer, 24, 65-81.
Bezdek, J. (1974). Cluster validity with fuzzy sets. Journal of Cybernetics, 3,
58-72.
Bordogna, G., Leporati, A., Lucarella, D., & Pasi, G. (2000). The fuzzy
object-oriented database management system. In G. Bordogna, & G. Pasi (Eds.),
Recent issues on fuzzy databases (pp. 209-236). Heidelberg: Physica-Verlag.
Bosc, P., & Pivert, O. (2001). On some fuzzy extensions of association rules. In
Proceedings of IFSA-NAFIPS 2001 (pp. 1104-1109). Piscataway, NJ:
IEEE Press.
Buckles, B., & Petry, F. (1982). A fuzzy representation for relational data bases.
International Journal of Fuzzy Sets and Systems, 7, 213-226.
Buckles, B., & Petry, F. (1984). Extending the fuzzy database with fuzzy
numbers. Information Sciences, 34, 45-55.
Cao, T. (2001). Uncertain inheritance and recognition as probabilistic default
Carter, C., & Hamilton, H. (1998). Efficient attribute-oriented generalization for
knowledge discovery from large databases. IEEE Transactions on Knowledge
and Data Engineering, 10(2), 193-208.
Chaudhri, A., & Loomis, M. (Eds.). (1998). Object databases in practice.
New York: Prentice Hall.
Chen, G., Wei, Q., & Kerre, E. (2000). Fuzzy data mining: Discovery of fuzzy
generalized association rules. In G. Bordogna, & G. Pasi (Eds.), Recent
issues on fuzzy databases (pp. 45-66). Heidelberg: Physica-Verlag.
Cross, V., & Firat, A. (2000). Fuzzy objects for geographical information
systems. International Journal of Fuzzy Sets and Systems, 113, 19-36.
Cubero, J., Medina, J., Pons, O., & Vila, M. (1999). Data summarization in
relational databases through fuzzy dependencies. Information Sciences,
121(3-4), 233-270.
De Caluwe, R. (Ed.). (1997). Fuzzy and uncertain object-oriented databases:
Concepts and models. Singapore: World Scientific.
de Graaf, J., Kosters, W., & Witteman, J. (2001). Interesting fuzzy association
rules in quantitative databases. In Principles of Data Mining and
Knowledge Discovery LNAI 2168 (pp. 140-151). Heidelberg: Springer-Verlag.
De Tré, G., De Caluwe, R., & Van der Cruyssen, B. (2000). A generalized
object-oriented database model. In G. Bordogna, & G. Pasi (Eds.), Recent issues
on fuzzy databases (pp. 155-182). Heidelberg: Physica-Verlag.
Delgado, M., Marin, N., Sanchez, D., & Vila, M. (2003). Fuzzy association rules:
General model and applications. IEEE Transactions on Fuzzy Systems,
11(2), 214-225.
Dubois, D., & Prade, H. (2000). Fuzzy sets in data summaries: Outline of a new
approach. In Proceedings of the Eighth International Conference on
Information Processing and Management of Uncertainty in
Knowledge-Based Systems (pp. 1035-1040). Madrid, Spain.
Dubois, D., Prade, H., & Rossazza, J. (1991). Vagueness, typicality and
uncertainty in class hierarchies. International Journal of Intelligent
Systems, 6, 167-183.
Feelders, A., Daniels, H., & Holsheimer, M. (2000). Methodological and
practical aspects of data mining. Information and Management, 37,
271-281.
Feng, L., & Dillon, T. (2003). Using fuzzy linguistic representations to provide
explanatory semantics for data warehouses. IEEE Transactions on
Knowledge and Data Engineering, 15(1), 86-102.
George, R., Buckles, B., Petry, F., & Yazici, A. (1992). Uncertainty modeling
in object-oriented geographical information systems. In 1992 Proceedings
of Conference on Database & Expert System Applications (pp. 77-86).
Heidelberg: Springer-Verlag.
George, R., Buckles, B., & Petry, F. (1993). Modeling class hierarchies in the
fuzzy object-oriented data model. Int. J. of Fuzzy Sets and Systems, 60,
259-272.
Gyenesei, A. (2001a). A fuzzy approach for mining quantitative association
rules. Acta Cybernetica, 15, 305-320.
Gyenesei, A. (2001b). Interestingness measures for fuzzy association rules. In
Principles of data mining and knowledge discovery LNAI 2168 (pp.
152-164). Heidelberg: Springer-Verlag.
Han, J. (1995). Mining knowledge at multiple concept levels. In Proceedings of
the Fourth International Conference on Information and Knowledge
Management (pp. 19-24). New York: ACM Press.
Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques. San
Diego, CA: Academic Press.
Han, J., Nishio, S., & Kawano, H. (1994). Knowledge discovery in
object-oriented and active databases. In F. Fuchi, & T. Yokoi (Eds.), Knowledge
building and knowledge sharing (pp. 221-230). Singapore: IOS Press.
Han, J., Nishio, S., Kawano, H., & Wang, W. (1998). Generalization-based data
mining in object-oriented databases using an object-cube model. Data and
Knowledge Engineering, 25(1-2), 55-97.
Hilderman, R., Hamilton, H., & Cercone, N. (1999). Data mining in large
databases using domain generalization graphs. Journal of Intelligent
Information Systems, 13(3), 195-234.
Hirota, K., & Pedrycz, W. (1999). Fuzzy computing for data mining.
Proceedings of the IEEE, 87, 1575-1599.
Kacprzyk, J. (1999). Fuzzy logic for linguistic summarization of databases. In
Proceedings of the Eighth International Conference on Fuzzy Systems
(pp. 813-818). Seoul, Korea.
Kacprzyk, J., & Zadrozny, S. (2000). On combining intelligent querying and data
mining using fuzzy logic concepts. In G. Bordogna, & G. Pasi (Eds.),
Recent issues on fuzzy databases (pp. 67-81). Heidelberg: Physica-Verlag.
Khoshafian, S., & Copeland, G. (1986). Object identity. In Proceedings of the
OOPSLA '86 Conference (pp. 406-416). New York: ACM Press.
Kim, W. (1989). A model of queries for object-oriented databases. In
Proceedings of 15th International Conference on Very Large Databases (pp.
45-54).
Koyuncu, M., & Yazici, A. (2003). IFOOD: An intelligent fuzzy object-oriented
database architecture. IEEE Transactions on Knowledge and Data
Engineering, 15(5), 1137-1154.
Kuok, C., Fu, A., & Wong, H. (1998). Mining fuzzy association rules in
databases. ACM SIGMOD Record, 27, 41-46.
Ladner, R., Petry, F., & Cobb, M. (2003). Fuzzy set approaches to spatial data
mining of association rules. Transactions in GIS, 7(1), 123-138.
Laurent, A., Bouchon-Meunier, B., Doucet, A., Gancarski, S., & Marsala, C.
(2000). Fuzzy data mining from multidimensional databases. Studies in
Fuzziness and Soft Computing, 54, Proceedings of ISCI (pp. 245-256).
Lee, D., & Kim, M. (1997). Database summarization using fuzzy ISA
hierarchies. IEEE Transactions on Systems, Man, and Cybernetics Part B,
27(1), 68-78.
Lee, J., Xue, N., Hsu, K., & Yang, J. (1999). Modeling imprecise requirements
with fuzzy objects. Information Sciences, 118, 101-119.
Lee, K. (2001). Mining generalized fuzzy quantitative association rules with
fuzzy generalization hierarchies. In Proceedings of IFSA-NAFIPS 2001
(pp. 2977-2982). Piscataway, NJ: IEEE Press.
Ma, Z., Zhang, W., Ma, W., & Chen, G. (2001). Conceptual design of fuzzy
object-oriented databases using extended entity-relationship model.
International Journal of Intelligent Systems, 16, 697-711.
Ma, Z., Zhang, W., & Ma, W. (2004). Extending object-oriented databases for
fuzzy information modeling. To appear in Information Systems.
Marín, N., Vila, M., & Pons, O. (2000). Fuzzy types: A new concept of type for
managing vague structures. International Journal of Intelligent
Systems, 15, 1061-1085.
Morris, A., Petry, F., & Cobb, M. (1998). Fuzzy object-oriented database
modeling of spatial data. In Proceedings IPMU Conference (pp. 604-611).
Paris: EDK Press.
Pasi, G., & Yager, R. (1999). Calculating attribute values using inheritance
structures in fuzzy object-oriented data models. IEEE Transactions on
Systems, Man, and Cybernetics Part B, 29(4), 556-564.
Petry, F. (1996). Fuzzy databases: Principles and applications. Boston, MA:
Kluwer Academic Publishers.
Raschia, G., & Mouaddib, N. (2002). SAINTETIQ: A fuzzy set-based approach
to database summarization. Fuzzy Sets and Systems, 129, 137-162.
Shu, J., Tsang, E., & Yeung, D. (2001). Query fuzzy association rules in
relational databases. In Proceedings of IFSA-NAFIPS 2001 (pp. 2989-2993).
Piscataway, NJ: IEEE Press.
Yager, R. (1991). On linguistic summaries of data. In G. Piatetsky-Shapiro, &
W. Frawley (Eds.), Knowledge discovery in databases (pp. 347-363).
Boston, MA: MIT Press.
Yager, R. (2000). Intelligent control of the hierarchical agglomerative clustering
process. IEEE Transactions on Systems, Man, and Cybernetics Part
B, 30(6), 835-845.
Zadeh, L. (1970). Similarity relations and fuzzy orderings. Information
Sciences, 3, 177-200.
Zhang, W. (1999). Mining fuzzy quantitative association rules. In Proceedings
of IEEE International Conference on Tools with Artificial Intelligence.
Zicari, R. (1990). Incomplete information in object-oriented databases. SIGMOD
Record, 19, 33-40.
FRIL++ and Its Applications 113
Chapter IV
FRIL++ and
Its Applications
J. M. Rossiter
University of Bristol, UK &
Bio-Mimetic Control Research Center, The Institute of Physical and
Chemical Research (RIKEN), Japan
T. H. Cao
Ho Chi Minh City University of Technology, Vietnam
Abstract
We introduce a deductive probabilistic and fuzzy object-oriented database
language, called FRIL++, which can deal with both probability and
fuzziness. Its foundation is a logic-based probabilistic and fuzzy object-
oriented model where a class property (i.e., an attribute or a method) can
contain fuzzy set values, and uncertain class membership and property
applicability are measured by lower and upper bounds on probability.
Each uncertainly applicable property is interpreted as a default probabilistic
logic rule, which is defeasible, and probabilistic default reasoning on fuzzy
events is proposed for uncertain property inheritance and class recognition.
The design, implementation, and basic features of FRIL++ are presented.
FRIL++ can be used as both a modeling and a programming language, as
demonstrated by its applications to machine learning, user modeling, and
modeling with words herein.
114 Rossiter & Cao
Introduction
For modeling real-world problems and constructing intelligent systems, the
integration of different methodologies and techniques has been the quest and
focus of significant interdisciplinary research effort. The advantage of such a
hybrid system is that the strengths of its partners are combined and
compensate for each other's weaknesses.
In particular, object orientation provides a hierarchical data abstraction scheme
and an information hiding and inheritance mechanism; probabilistic/fuzzy rea-
soning provides measures and rules for representing and reasoning with uncer-
tainty and imprecision in the real world; logic programming provides a declarative
way for problem specification and well-founded semantics for formal reasoning.
However, research on combining all three modeling and computing paradigms
appears to be sporadic.
In Eiter et al. (2001), the authors developed an algebra to handle object bases with
uncertainty, where conditional probabilities for an object of a class being a
member of its subclasses are given, and membership of an object to a class is
expressed by a probability value, but fuzzy values are not allowed in class
properties. Meanwhile, there have been many fuzzy object-oriented models
developed, such as those of Bordogna et al. (1999), George et al. (1993),
Itzkovich and Hawkes (1994), Rossazza et al. (1997), and Van Gyseghem and
De Caluwe (1997), but they are not deductive. Yazici and George (1999) present
a deductive fuzzy object-oriented model that, however, does not address
uncertain applicability of properties.
In Dubitzky et al. (1999), each property of a concept is assumed to have a
probability degree for it occurring in exemplars of that concept. However, the
method therein for computing a membership degree of an object to a concept,
based on matching the object's properties with the uncertainly applicable
properties of the concept, is in our view not justifiable. Also, the work does not
address the problem of how inheritance is performed under the membership and
applicability uncertainty.
Recently, Blanco et al. (2001) and De Tré (2001) sketched general models to
manage different sources of imprecision and uncertainty, including probabilistic
ones, on various levels of an object-oriented database model. However, no
foundation was laid for integrating probability theory and fuzzy logic when
probability is used to represent uncertainty. In Cross (2003), the author
reviewed existing proposals and presented recommendations for the application
of fuzzy set theory in a flexible generalized object model.
In this chapter, we summarize the main features of a logic-based probabilistic
and fuzzy object-oriented model where a class property can contain fuzzy sets
interpreted as families of probability distributions, and uncertain class member-
ship and property applicability are measured by lower and upper bounds on
probability. On the basis of this model, we present the development of FRIL++,
which extends FRIL (Baldwin et al., 1995) with object-oriented features, as a
modeling and programming language for probabilistic and fuzzy object-oriented
deductive databases and knowledge bases, in the same way as predicate logic
programming languages (e.g., Datalog) have been used for classical deductive
databases and knowledge bases. Various applications of FRIL++ are then
demonstrated.
The next section presents the logic-based probabilistic and fuzzy object-oriented
model. In the following section, we introduce probabilistic default reasoning and
its application to fuzzy events as a suitable approach to uncertain property
inheritance and class recognition. We then present our solutions for uncertain
inheritance of attributes, uncertain inheritance of methods, and uncertain recog-
nition of classes. Subsequent sections present the implementation and the basic
features of FRIL++. In the final two sections, we present our application of
FRIL++ to machine learning, user modeling, and modeling with words. Finally,
we conclude the chapter and suggest future research.
Probabilistic and Fuzzy Object-Oriented
Model
As in the classical object-oriented model, a class is represented by a finite set
of properties. A property is either an attribute or a method. The model we are
introducing is logic-based, and attributes and methods are represented by Horn-
like clauses.
In the classical object-oriented model, each object is certainly a member of a
class, and all properties of a class certainly apply to its objects. However, in the
real world, such membership and applicability can be uncertain. Moreover,
attribute values can be more imprecise than ones expressible by intervals.
Arguing for flexible modeling, Van Gyseghem and De Caluwe (1997) introduced
the notion of fuzzy property as an intermediate between the two extreme notions
of required property and optional property. Each fuzzy property of a class is
associated with possibility degrees of applicability of the property to the class.
Recently, Dubitzky et al. (1999) addressed the issue by contrasting the prototype
concept model with the classical model. A severe defect of the classical concept
model is noted by the fact that there is no commonly agreed set of defining (i.e.,
necessary and sufficient) properties for many natural, scientific, artificial, and
ontological concepts. Rather, each property of a concept is assumed to have a
probability degree for it occurring in exemplars of that concept.
Here, we propose uncertain class membership and property applicability to be
represented by support pairs defining probability lower and upper bounds, as in
FRIL (Baldwin et al., 1995), a logic programming language that handles both
probability and fuzziness. Specifically, in this probabilistic and fuzzy
object-oriented model, each attribute in a class C has the following form:
$\alpha \leftarrow [l, u]$
where $\alpha$ is a fuzzy atom, that is, a predicate with argument values that can be
fuzzy sets, and $l, u \in [0, 1]$ are interpreted as $l \le \Pr(\alpha \mid C) \le u$. We assume that
$\Pr(\alpha \mid \neg C)$ is unknown.
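Similarly, each method has the following form:
$\alpha \leftarrow \beta\ [l_1, u_1]\ [l_2, u_2]$
where $\alpha$ is a fuzzy atom, $\beta$ is a conjunction of fuzzy atoms, and
$l_1, u_1, l_2, u_2 \in [0, 1]$ are interpreted as $l_1 \le \Pr(\alpha \mid \beta, C) \le u_1$ and
$l_2 \le \Pr(\alpha \mid \neg\beta, C) \le u_2$. We also assume that $\Pr(\alpha \mid \beta, \neg C)$ and
$\Pr(\alpha \mid \neg\beta, \neg C)$ are unknown.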
For a class hierarchy in a probabilistic and fuzzy object-oriented system, we
assume that a class is totally subsumed by any of its superclasses, or, in other
words, a class totally subsumes any of its subclasses. This assumption is
discussed in detail in Cao (2001).
The totally subsuming subclass relation imposes a constraint on membership
degrees of an object to classes as stated in the following assumptions (Cao &
Creasy, 2000):
1. If an object is a member of a class with some positive characteristic
degree, then it is a member of any superclass of that class with the same
degree.
2. If an object is a member of a class with some negative characteristic
degree, then it is a member of any subclass of that class with the same
degree.
As a consequence of this subsumption assumption, if an object is a member of
a class with a support pair [l, u], then it is a member of any superclass of that class
with the support pair [l, 1], and a member of any subclass of that class with the
support pair [0, u]. This is in agreement with Rossazza et al. (1997), for instance,
who stated that the membership degree of an object to a class is at least equal
to its membership degree to a subclass of that class. In fact, if $C_1$ is a subclass
of $C_2$, then $\Pr(C_1) \le \Pr(C_2)$.
Probabilistic Default Reasoning on
Fuzzy Events
A well-known fundamental problem in object-oriented modeling is one of
multiple inheritance, that is, how to combine the same property inherited from
different classes. For example, one can have property fly[.9, .95] in a class BIRD,
expressing that 90% to 95% of birds can fly. At the same time, one can also have
fly[0, .05] in a class PENGUIN, expressing that at most 5% of penguins can fly.
Given PENGUIN being a subclass of BIRD, the problem is that a penguin has two
support pairs for its property fly, namely, [.9, .95] from BIRD and [0, .05] from
PENGUIN, which are inconsistent with each other as $[.9, .95] \cap [0, .05] = \emptyset$. One
may say that, in this case, [0, .05] overrides [.9, .95], but such a simple solution
is not adequate. For instance, how would we deal with the case when an object
is not certainly a penguin, or when such support pairs come from classes neither
of which is a subclass of the other? For the general case, there are two
extreme solutions to the problem. The most pessimistic one is to assume that no
given Pr(p | C) is applicable to a specific object, and Pr(p | C, E) where E is a
set of evidences, has to be used instead. The most optimistic one is to assume
that any Pr(p | C) remains valid when applied to a specific object, and thus
multiple answers are combined by conjunction.
The drawback of the most pessimistic solution is that, in general, we have no
knowledge of Pr(p | C, E). For instance, to obtain it from Pr(p | C) using the total
probability theorem
$\Pr(p \mid C) = \Pr(p \mid C, E)\,\Pr(E \mid C) + \Pr(p \mid C, \neg E)\,\Pr(\neg E \mid C)$,
one must know at least Pr(E | C). Meanwhile, the drawback of the
most optimistic solution is that it often leads to inconsistency.
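The inconsistency produced by the most optimistic solution can be seen by intersecting the inherited support pairs. The sketch below is illustrative only; `intersect` is a hypothetical helper that returns `None` for an empty intersection.

```python
from typing import Optional, Tuple

SupportPair = Tuple[float, float]  # (lower, upper) bounds on a probability


def intersect(a: SupportPair, b: SupportPair) -> Optional[SupportPair]:
    """Conjunctive combination of two support pairs; None if they conflict."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower <= upper else None


bird_fly = (0.9, 0.95)     # fly[.9, .95] in class BIRD
penguin_fly = (0.0, 0.05)  # fly[0, .05] in class PENGUIN

print(intersect(bird_fly, penguin_fly))  # None: the two pairs are inconsistent
```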
Between these two extreme approaches there is one of default reasoning
(Geffner & Pearl, 1992). The basic idea of default reasoning is to consider a set
of rules as defeasible, so that when they are inconsistent with particular
evidences, only selective consistent subsets of the set are used for inference. A
selection of consistent subsets relies on a priority ordering among the default
rules and a preference ordering, based on that priority ordering, among the
subsets of the set.
An early work of the default reasoning approach to inheritance and recognition
is that of Shastri (1989), which is based on the principle of maximum entropy for
resolving conflicting information. The work, however, has the shortcomings that
inheritance is performed only with certain membership of objects to classes, and
recognition just selects a class that is considered as best matched with an object
rather than provides different membership degrees of the object to different
classes. Also, only class attributes, not class methods, are considered therein.
Recently, Lukasiewicz (2000) extended classical default reasoning to probabi-
listic default reasoning and showed that the latter is intractable in the general
case. The computational complexity is mainly due to checking consistency and
performing global inference on a probabilistic knowledge base. So, in applying
that framework to uncertain inheritance and recognition for probabilistic and
fuzzy object-oriented systems, we propose an approximation using Jeffrey's rule
(Jeffrey, 1965) and its inverse for a weaker notion of consistency and for local
inference, in order to reduce the computational complexity.
A probabilistic default theory is defined to be a pair (T, D), where T is a set of
formulas to be always satisfied, and D is a set of defaults. Each formula in T or
D has the form ( | )[l, u], expressing l Pr( | ) u. When = true, we simply
write [l, u], and when l = u = 1, we may write ( | ) only.
The main characteristics of a default reasoning system are its priority ordering
and preference ordering. A priority ordering, denoted by $\prec$, is an irreflexive
and transitive binary relation on D. Given a model M, let
$D_M = \{d \in D \mid M \text{ satisfies } d\}$.
A model M is said to be preferred to a model M* iff (i.e., if and only if)
$D_{M^*} \ne D_M$ and $\forall d^* \in D_{M^*} \setminus D_M\ \exists d \in D_M \setminus D_{M^*} : d^* \prec d$.
A model M is called a preferred model iff there is no model being preferred to M.
A subset D* of D is said to be in conflict with a default $(\psi \mid \varphi)[l, u]$ iff
$T \cup (D^* \cup \{(\psi \mid \varphi)[l, u]\}) \cup \{\varphi\}$ is inconsistent. A priority ordering is said to be
admissible iff every subset $D^* \subseteq D$ in conflict with a default $d \in D$ contains d*
such that $d^* \prec d$. A formula F is a default consequence of an evidence set E
iff, for every admissible priority ordering, every preferred model of $T \cup E$ is
a model of F. The admissibility of a priority ordering is to guarantee that if
$(\psi \mid \varphi)[l, u] \in D$ and $E = \{\varphi\}$ (i.e., only $\varphi$ is known), then $(\psi \mid \varphi)[l, u]$ is a
default consequence of E.
As proven in Cao (2001), this definition of default consequence can be
equivalently restated in terms of preferred default subsets instead of preferred
models. For every $D_1, D_2 \subseteq D$, $D_1$ is said to be preferred to $D_2$ iff
$T \cup D_1 \cup E$ is consistent, and (1) $T \cup D_2 \cup E$ is inconsistent, or (2) $D_1 \ne D_2$
and $\forall d \in D_2 \setminus D_1\ \exists d^* \in D_1 \setminus D_2 : d \prec d^*$.
For every $D_1 \subseteq D$, $D_1$ is called a preferred default subset of D iff there is no
$D_2 \subseteq D$ being preferred to $D_1$; in particular, $D_1$ is preferred to $D_2$ if
$D_2 \subset D_1$ and $T \cup D_1 \cup E$ is consistent. Then a formula F is a default
consequence of E iff, for every admissible priority ordering and every preferred
default subset D* of D, F is a logical consequence of $T \cup D^* \cup E$.
The practical significance of this latter definition is that one needs to consider only
the preferred default subsets and deduction on them in order to obtain default
consequences. Specifically, if $P_1, P_2, \ldots, P_n$ are all the preferred default subsets
of D and, for every i from 1 to n, $F_i$ is a logical consequence of $T \cup P_i \cup E$, then
$F_1 \vee F_2 \vee \ldots \vee F_n$ is a default consequence of E. In particular, with
$F_i = \psi[l_i, u_i]$ for every i from 1 to n, $\psi[l, u]$ is a default consequence of E, where
$[l, u] = \bigcup_{i=1,n} [l_i, u_i]$, that is, $l = \min_{i=1,n}\{l_i\}$ and $u = \max_{i=1,n}\{u_i\}$.
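The combination of the support pairs obtained from the individual preferred default subsets amounts to taking the smallest lower bound and the largest upper bound. A minimal sketch, with a hypothetical function name and made-up input values:

```python
from typing import Iterable, Tuple

SupportPair = Tuple[float, float]  # (lower, upper) bounds on a probability


def combine_consequences(pairs: Iterable[SupportPair]) -> SupportPair:
    """Disjunctive combination: l = min of lower bounds, u = max of upper bounds."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one support pair is required")
    return (min(l for l, _ in pairs), max(u for _, u in pairs))


# Support pairs derived from two preferred default subsets (illustrative values).
print(combine_consequences([(0.2, 0.4), (0.3, 0.7)]))  # (0.2, 0.7)
```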
For fuzzy events characterized by fuzzy sets, in this work, we apply the voting
model interpretation of fuzzy sets (Baldwin et al., 1995; Gaines, 1978), whereby,
given a fuzzy set A on a domain U, each voter has a subset of U as his or her own
crisp definition of the concept that A represents. The membership function value
$\mu_A(u)$ is then the proportion of voters whose crisp definitions include u. As such,
A defines a probability distribution on the power set of U across the voters, and
thus a fuzzy proposition "x is A" defines a family of probability distributions of the
variable x on U. Fuzzy events are said to be consistent with each other iff the
intersection of their characterizing fuzzy sets is a normal fuzzy set (i.e., one with
a maximal membership function value of 1). Baldwin et al. (1995, 1996) describe
the conditioning operations over fuzzy sets and the tractable calculation of the
expected fuzzy set used in this default reasoning framework.
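The voting model and the consistency test for fuzzy events can be illustrated on a small discrete domain. The sketch below is an assumption-laden toy: fuzzy sets are plain dictionaries, intersection is taken as the pointwise minimum, and the voters and height values are invented for the example.

```python
from fractions import Fraction


def membership_from_votes(votes, domain):
    """Membership of u = proportion of voters whose crisp set contains u."""
    n = len(votes)
    return {u: Fraction(sum(u in crisp for crisp in votes), n) for u in domain}


def intersection(a, b):
    """Pointwise minimum of two discrete fuzzy sets."""
    return {u: min(a.get(u, 0), b.get(u, 0)) for u in set(a) | set(b)}


def is_normal(fuzzy_set):
    """A fuzzy set is normal if its maximal membership value is 1."""
    return max(fuzzy_set.values(), default=0) == 1


heights = [170, 180, 190]
# Each voter gives a crisp definition of "tall" as a subset of the domain.
votes_tall = [{180, 190}, {190}, {180, 190}, {170, 180, 190}]
tall = membership_from_votes(votes_tall, heights)
# tall[170] == 1/4, tall[180] == 3/4, tall[190] == 1

very_tall = {170: 0, 180: Fraction(1, 2), 190: 1}
short = {170: 1, 180: Fraction(1, 4), 190: 0}

print(is_normal(intersection(tall, very_tall)))  # True: consistent fuzzy events
print(is_normal(intersection(tall, short)))      # False: inconsistent fuzzy events
```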
Property Inheritance and Class
Recognition
In the classical object-oriented model, without exceptions, a class fully inherits
all the properties of its superclasses, and thus, an object certainly has all
properties of the classes to which it is a member. In the uncertain object-oriented
model, due to uncertain applicability of a property and uncertain membership of
an object to a class, inheritance naturally becomes uncertain. The problem of
uncertain inheritance can be formalized in the framework of default reasoning
as follows. For a particular attribute named $\alpha$, suppose that there are n classes
$C_1, C_2, \ldots, C_n$ with attributes $\alpha(A_1)[l_1, u_1], \alpha(A_2)[l_2, u_2], \ldots, \alpha(A_n)[l_n, u_n]$,
respectively, where $A_1, A_2, \ldots, A_n$ are fuzzy sets on the same domain. Then one
has the default theory (T, D) where
$T = \{(C_i \mid C_j) \mid C_j \text{ is a subclass of } C_i,\ 1 \le i, j \le n\}$ and
$D = \{(\alpha(A_i) \mid C_i)[l_i, u_i] \mid 1 \le i \le n\}$. Also, suppose an evidence set
$E = \{C_i[\gamma_i, \delta_i] \mid 1 \le i \le n\} \cup \{\alpha(A_0)[l_0, u_0]\}$, where each $[\gamma_i, \delta_i]$ is a support
pair for an object of discourse O being a member of $C_i$, while $\alpha(A_0)[l_0, u_0]$ is a prior
attribute given to O.
The problem is to derive A such that $\alpha(A)$ being applicable to O is a default
consequence of E. We assume that E is consistent with T, whereby, if
$C_i[\gamma_i, \delta_i], C_j[\gamma_j, \delta_j] \in E$ and $(C_i \mid C_j) \in T$, then $\gamma_j \le \gamma_i$, in accordance with the
constraint $\Pr(C_j) \le \Pr(C_i)$ mentioned previously. As presented in the preceding section,
default reasoning with respect to a default theory comprises the following main
steps:
1. Determine admissible priority orderings on the set of defaults.
2. For each admissible priority ordering, compute preferred default subsets.
3. For each preferred default subset, derive a logical consequence.
As shown in Lukasiewicz (2000), all three steps are intractable in the probabi-
listic case. The computational complexity is mainly due to checking consistency
and performing global inference on a probabilistic knowledge base. In applying
that framework to uncertain inheritance for the uncertain object-oriented model,
we propose an approximation for default consequences correspondingly as
follows:
1. Consider only one priority ordering based on the class specificity ordering.
2. Use a weaker notion of consistency for computing preferred default
subsets.
3. Apply local inference using Jeffrey's rule for deriving logical
consequences. Details are explained below.
Let D be partitioned into $D_0, D_1, \ldots, D_k$ such that, for every i and j from 1 to n,
if $C_j$ is a subclass of $C_i$, $(\alpha(A_i) \mid C_i)[l_i, u_i] \in D_s$ and $(\alpha(A_j) \mid C_j)[l_j, u_j] \in D_t$, then
s < t. Intuitively, $D_0$ comprises the defaults for $\alpha$ of the classes that are not
subclasses of any other; $D_1$ comprises the defaults for $\alpha$ of the classes that are
the immediate subclasses of those classes; and so on. The priority ordering $\prec$ is
then defined such that $d \prec d^*$ iff $d \in D_s$, $d^* \in D_t$, and s < t.
For every i from 1 to n, Jeffrey's rule gives:
$\Pr(\alpha(A_i)) = \Pr(\alpha(A_i) \mid C_i)\,\Pr(C_i) + \Pr(\alpha(A_i) \mid \neg C_i)\,\Pr(\neg C_i)$
with $l_i \le \Pr(\alpha(A_i) \mid C_i) \le u_i$, $\gamma_i \le \Pr(C_i) \le \delta_i$, and $\Pr(\neg C_i) = 1 - \Pr(C_i)$. On the
assumption that only $0 \le \Pr(\alpha(A_i) \mid \neg C_i) \le 1$ is known, one obtains:
$l_i \gamma_i \le \Pr(\alpha(A_i)) \le u_i \gamma_i + (1 - \gamma_i)$
That is, O inherits $\alpha(A_i)[l_i \gamma_i,\ u_i \gamma_i + (1 - \gamma_i)]$ from each $C_i$, which can be transformed
into $\alpha(B_i)$, where $B_i$ is the expected fuzzy set of $A_i[l_i \gamma_i,\ u_i \gamma_i + (1 - \gamma_i)]$. We note
that, in general, lower and upper bounds of $\Pr(\alpha(A_i))$ also depend on $\delta_i$, but not
in this case when $\Pr(\alpha(A_i) \mid \neg C_i)$ is unknown.
Let $B_0$ be the expected fuzzy set of $A_0[l_0, u_0]$, and, for every i from 1 to n,
$A_i^* = B_i \cap B_0$. Our notion of weak consistency is now introduced as follows. Let D* be
a subset of D. Without loss of generality, assume that
$D^* = \{(\alpha(A_i) \mid C_i)[l_i, u_i] \mid 1 \le i \le m \le n\}$. Then $T \cup D^* \cup E$ is said to be
w-consistent wrt (i.e., with respect to) $\alpha$ iff $\bigcap_{i=1,m} A_i^*$ is a normal fuzzy set. For
computing the preferred default subsets of D, instead of considering the subsets of D
that are consistent with T and E, one now considers those that are w-consistent wrt
$\alpha$ with T and E.
As such, the preferred default subsets of D can be obtained in the two following
steps:
1. Find the largest (wrt $\subseteq$) consistent subsets of $\{A_i^* \mid 1 \le i \le n\}$, the
intersection of the fuzzy sets in each of which is a normal fuzzy set.
2. Compare those consistent subsets to select the ones that none of the others
is preferred to, based on the priority ordering on D defined above.
The multiple-inherited attribute for O is then $\alpha(A)$, with A being the union of those
intersection fuzzy sets obtained from the preferred default subsets.
The reason for taking only the largest consistent subsets in Step 1 is that, as noted
previously, a consistent set of defaults is always preferred to its proper subsets.
For this step, we employ the algorithm in Dubois et al. (2000), which has the
computational complexity $O(n^2)$ and shows that the maximal number of the
consistent subsets is n. For the second step, as shown in Cao (2001), each
comparison takes time proportional to the sizes of the two involved subsets, while
the number of the comparisons is of the square order of the number of the
consistent subsets. Because the maximal size and the maximal number of the
consistent subsets are n, the computational complexity of this step is $O(n^3)$. Thus,
the overall computational complexity of the above multiple inheritance procedure
is $O(n^3)$.
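Step 1 of the procedure can be illustrated with a brute-force search for the maximal subsets whose intersection is a normal fuzzy set. This is deliberately not the algorithm of Dubois et al. (2000): it enumerates all subsets and is exponential in n, so it only serves to show what is being computed. Fuzzy sets are assumed to be discrete dictionaries, intersection is the pointwise minimum, and all names and values are hypothetical.

```python
from itertools import combinations


def intersection(fuzzy_sets, domain):
    """Pointwise minimum of several discrete fuzzy sets over a shared domain."""
    return {x: min(fs.get(x, 0.0) for fs in fuzzy_sets) for x in domain}


def is_normal(fuzzy_set):
    """A fuzzy set is normal if its maximal membership value is 1."""
    return max(fuzzy_set.values(), default=0.0) == 1.0


def maximal_consistent_subsets(named_sets, domain):
    """Return the largest (wrt set inclusion) subsets with a normal intersection."""
    names = sorted(named_sets)
    found = []
    # Visit larger subsets first so that smaller ones contained in them are skipped.
    for size in range(len(names), 0, -1):
        for combo in combinations(names, size):
            if any(set(combo) <= bigger for bigger in found):
                continue
            if is_normal(intersection([named_sets[n] for n in combo], domain)):
                found.append(set(combo))
    return [tuple(sorted(s)) for s in found]


domain = [1, 2, 3]
starred = {
    "A1": {1: 1.0, 2: 0.5, 3: 0.0},
    "A2": {1: 1.0, 2: 1.0, 3: 0.0},
    "A3": {1: 0.0, 2: 0.2, 3: 1.0},
}
print(maximal_consistent_subsets(starred, domain))  # [('A1', 'A2'), ('A3',)]
```

Step 2, the comparison of these subsets under the priority ordering, is not covered by this sketch.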
The proposal for uncertain inheritance of attributes presented above can be
extended for uncertain inheritance of methods as follows. Let $C_1, C_2, \ldots, C_n$ be
the classes that contain methods with heads that share the same predicate $\alpha$. For
each i from 1 to n, let the set of those methods in $C_i$ be
$\{\alpha(A_{iq}) \leftarrow \beta_{iq}\ [l_{iq1}, u_{iq1}]\ [l_{iq2}, u_{iq2}] \mid 1 \le q \le m_i\}$, and denote
$\bigcup_{q=1,m_i} \{(\alpha(A_{iq}) \mid \beta_{iq}, C_i)[l_{iq1}, u_{iq1}],\ (\alpha(A_{iq}) \mid \neg\beta_{iq}, C_i)[l_{iq2}, u_{iq2}]\}$
by $S_i$.
We now consider each $S_i$ as an elementary default. Then one has the default
theory (T, D), where $T = \{(C_i \mid C_j) \mid C_j \text{ is a subclass of } C_i,\ 1 \le i, j \le n\}$, and
$D = \{S_i \mid 1 \le i \le n\}$. Also, suppose an evidence set
$E = \{C_i[\gamma_i, \delta_i] \mid 1 \le i \le n\} \cup S_0$, where
$S_0 = \bigcup_{q=1,m_0} \{(\alpha(A_{0q}) \mid \beta_{0q})[l_{0q1}, u_{0q1}],\ (\alpha(A_{0q}) \mid \neg\beta_{0q})[l_{0q2}, u_{0q2}]\}$. Here
each $[\gamma_i, \delta_i]$ is a support pair for an object of discourse O being a member of $C_i$, while
$S_0$ gives prior methods to O.
For a priority ordering $\prec$, D is also partitioned into $D_0, D_1, \ldots, D_k$ in a similar way
as in the case of uncertain inheritance of attributes. That is, for every i and j from
1 to n, if $C_j$ is a subclass of $C_i$, $S_i \in D_s$, and $S_j \in D_t$, then s < t; $S \prec S^*$ iff
$S \in D_s$, $S^* \in D_t$, and s < t.
Suppose that $\alpha(A) \leftarrow \beta\ [l_1, u_1]\ [l_2, u_2]$ is a method in class C and $[\gamma, \delta]$ is a support
pair for an object of discourse O being a member of C. Jeffrey's rule gives:
$\Pr(\alpha(A)) = \Pr(\alpha(A) \mid \beta, C)\,\Pr(\beta, C) + \Pr(\alpha(A) \mid \neg\beta, C)\,\Pr(\neg\beta, C)$
$\qquad + \Pr(\alpha(A) \mid \beta, \neg C)\,\Pr(\beta, \neg C) + \Pr(\alpha(A) \mid \neg\beta, \neg C)\,\Pr(\neg\beta, \neg C)$
One then obtains the lower bound x and the upper bound y for $\Pr(\alpha(A))$, as proved
in Cao (2001), as follows:
1. $x = \max\{l_2 \gamma,\ l_1 \gamma - (l_1 - l_2)(1 - \Pr(\beta)_{\min})\}$ if $l_2 \le l_1$, or
$x = \max\{l_1 \gamma,\ l_2 \gamma - (l_2 - l_1)\Pr(\beta)_{\max}\}$ otherwise.
2. $y = 1 - \max\{(1 - u_1)\gamma,\ (1 - u_2)\gamma - (u_1 - u_2)\Pr(\beta)_{\max}\}$ if $u_2 \le u_1$, or
$y = 1 - \max\{(1 - u_2)\gamma,\ (1 - u_1)\gamma - (u_2 - u_1)(1 - \Pr(\beta)_{\min})\}$ otherwise.
Then the combination of $\alpha(A)$ obtained from different methods in different
classes can also be carried out as a multiple inheritance of an attribute, as
presented previously.
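The bounds x and y can be computed directly from the two support pairs of a method, the lower bound of the object's class membership, and the bounds on the probability of the method body. The sketch below transcribes the case analysis given above; the function and parameter names are assumptions made for illustration.

```python
from typing import Tuple


def method_support(l1: float, u1: float, l2: float, u2: float,
                   member_lower: float,
                   body_min: float, body_max: float) -> Tuple[float, float]:
    """Bounds [x, y] on the head of a method with support pairs [l1, u1][l2, u2].

    member_lower: lower bound of the object's membership in the class.
    body_min, body_max: lower and upper bounds on the probability of the body.
    """
    g = member_lower
    if l2 <= l1:
        x = max(l2 * g, l1 * g - (l1 - l2) * (1.0 - body_min))
    else:
        x = max(l1 * g, l2 * g - (l2 - l1) * body_max)
    if u2 <= u1:
        y = 1.0 - max((1.0 - u1) * g, (1.0 - u2) * g - (u1 - u2) * body_max)
    else:
        y = 1.0 - max((1.0 - u2) * g, (1.0 - u1) * g - (u2 - u1) * (1.0 - body_min))
    return (x, y)


# Full membership and a certainly true body: the first support pair is recovered.
print(method_support(0.8, 0.9, 0.0, 1.0, 1.0, 1.0, 1.0))

# Full membership and a certainly false body: the second support pair is recovered.
print(method_support(0.8, 0.9, 0.0, 1.0, 1.0, 0.0, 0.0))
```

The two limiting cases act as a sanity check: when the body is certain, the result collapses to [l1, u1], and when the body is certainly false, it collapses to [l2, u2].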
For the computation of $\Pr(\beta)_{\min}$ and $\Pr(\beta)_{\max}$ in the above expressions, suppose
that $\beta$ is a conjunction of $\beta_1, \beta_2, \ldots, \beta_k$. One has:
$\Pr(\beta)_{\min} = \max\{0,\ \Pr(\beta_1) + \Pr(\beta_2) + \ldots + \Pr(\beta_k) - (k - 1)\}$
$\Pr(\beta)_{\max} = \min\{\Pr(\beta_1),\ \Pr(\beta_2),\ \ldots,\ \Pr(\beta_k)\}$
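These two expressions can be evaluated as follows. Since each atom of the body is itself known only through lower and upper bounds, the sketch assumes that the lower bounds are used in the first expression and the upper bounds in the second; that reading, like the function name, is an assumption made here for illustration.

```python
from typing import Sequence, Tuple

SupportPair = Tuple[float, float]  # (lower, upper) bounds on a probability


def conjunction_bounds(atoms: Sequence[SupportPair]) -> SupportPair:
    """Bounds on the probability of a conjunction of k atoms.

    Lower bound: max{0, sum of lower bounds - (k - 1)}.
    Upper bound: min of the upper bounds.
    """
    if not atoms:
        raise ValueError("a conjunction needs at least one atom")
    k = len(atoms)
    lower = max(0.0, sum(lo for lo, _ in atoms) - (k - 1))
    upper = min(hi for _, hi in atoms)
    return (lower, upper)


# Two body atoms with bounds [0.8, 0.9] and [0.7, 1.0] (illustrative values).
body_min, body_max = conjunction_bounds([(0.8, 0.9), (0.7, 1.0)])
print(round(body_min, 4), round(body_max, 4))
```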
For every i from 1 to k, suppose that $\beta_i = \alpha_i(A_i)$ is in $\beta$ and $\alpha_i(B_i)$ is the final
multiple-inherited attribute for O with respect to $\alpha_i$, where $A_i$ and $B_i$ are fuzzy
sets on the same domain. Then lower and upper bounds of $\Pr(\beta_i)$, from which
$\Pr(\beta)_{\min}$ and $\Pr(\beta)_{\max}$ can be evaluated, are the lower and upper bounds of the
conditional probability $\Pr(A_i \mid B_i)$ as introduced previously.
The uncertain recognition problem can be regarded as the inverse of the
uncertain inheritance problem. It can be stated as follows: given an object having
a set of properties associated with support pairs, derive support pairs for that
object being members of the classes having that set of properties. Default
reasoning can also be applied to combine the derived support pairs to be
consistent with the subclass relation between the classes. In this case, we
consider only uncertain recognition based on attributes. Cao (2001) described
uncertain class recognition within the proposed default reasoning framework.
Implementation of FRIL++
The probabilistic and fuzzy object-oriented model presented above provides a
formal basis for the design and implementation of FRIL++ (Baldwin et al., 2000;
Cao et al., 2002; Cao et al., 2001; Rossiter et al., 2000), the object-oriented
extension of FRIL (Baldwin et al., 1995), a PROLOG-like logic programming
language dealing with both probability and fuzziness. Like any other object-
oriented system, a FRIL++ system is associated with a class hierarchy. Besides
particular classes for the domain of the system, there is a special class, namely,
FRIL++, which is common to all FRIL++ systems. The class FRIL++ is at the
top of a class hierarchy, containing all FRIL++ built-in predicates, which can be
inherited by all classes in a FRIL++ system.
As in Moss (1994), objects are also treated as classes situating at the bottom of
a FRIL++ class hierarchy, so that they can have their own properties, which may
not be defined in any class. The reason for this is that in reality, a class can
describe only a finite set of common properties of a group of objects, which may
have other properties. Furthermore, in FRIL++, objects can be changed not only
in the values of their properties, but also in the properties themselves, which
can be added or deleted, as happens in the real world.
In McCabe (1992), object-oriented logic programs were translated into normal
logic programs of a logic programming system, such as Prolog, to be executed
by the theorem prover of the system. In order to employ FRIL's probabilistic and
fuzzy theorem prover, we follow this approach in the implementation of FRIL++
by writing a compiler in FRIL that translates a FRIL++ source program into a
FRIL target program to be executed by FRIL.
Following McCabe (1992), the execution of an object-oriented logic program is
considered as having two phases, namely, the label phase and the body phase.
In the label phase, the system determines the actual classes with definitions for
the currently called property that are to be executed. Then, once those classes
are determined, the system enters the body phase to execute the property as
defined in the bodies of the classes.
Corresponding to the label phase and the body phase are the label clauses and body
clauses of a target program, which is a normal logic program translated from an
object-oriented logic program. The label clauses perform inheritance, providing
entry points to the definitions that are to be executed for a property call.
Meanwhile, the body clauses are the translation of definitions of class properties.
However, due to uncertain class membership and uncertain property applicability,
there are important differences between the classical object-oriented model
and the uncertain one, which are outside the scope of McCabe (1992):
1. In the classical model, an object as an instance of a class inherits properties
only from that class or its superclasses. Whereas, because in the uncertain
model an object can be a partial member of a class, it can partially (i.e., with
uncertainty degrees) inherit properties from any class.
2. In the classical model, a property of a class is fully applicable to every object
of the class. Whereas, in the uncertain model, that applicability can be
uncertain, and, moreover, an associated uncertainty degree is not determin-
able at translation time if the membership degree of an object to a class can
change at run time.
These two points make the translation of an uncertain object-oriented
logic program differ from that of a classical one, for both the label clauses
and the body clauses.
In the classical object-oriented model, with overriding inheritance, an object or
a class does not inherit properties from its superclasses if they have their own
definitions of those properties. In the uncertain case, from the point of view of
default reasoning, the properties from the superclasses could still be inherited as
long as they are consistent with those defined in the object or the class.
On the one hand, for a FRIL++ program to behave in the same way as a classical
object-oriented program when there is no uncertainty involved, we adopt
overriding inheritance as a default. That is, a property (possibly associated with
a support pair) in an object or a class is assumed to override properties of the
same names in their superclasses. On the other hand, we provide FRIL++ with
built-in predicates for combining multiple-inherited properties in a user-defined
way, including the default reasoning one presented above.
In the uncertain case, the uncertain membership of an object to a class raises a
new issue regarding overriding inheritance. Specifically, if an object is not a full
member of a class, then the question is whether a property that the object inherits
from the class would override properties of the same name in superclasses of the
class. In FRIL++, we assume that overriding inheritance is effective only with
the full membership, i.e., with the support pair (1 1).
As such, if the membership degree of an object to a class can change at run time,
overriding inheritance is not determinable at translation time. Therefore, in order
to gain execution efficiency at run time, we distinguish static objects from
dynamic objects, so that membership degrees of the former to classes cannot be
changed after they are created. Thus, overriding inheritance can be determined
at translation time.
Basic Features of FRIL++
FRIL's syntax is Lisp-like, with the list as the primary data and program structure.
A FRIL atom, i.e., a predicate, has the following form:
(predicate-name arg1 arg2 ... argN)
where values of arg1, arg2, ... and argN can be fuzzy sets. The form of a FRIL
clause is as follows:
(h-atom b-atom1 b-atom2 ... b-atomN) : supp
where h-atom is the head and b-atom1, b-atom2, ..., b-atomN are the body of
the clause. Meanwhile, supp is either $(l_1\ u_1)$ or $((l_1\ u_1)\ (l_2\ u_2))$, representing
support pairs for the clause; the default values of $(l_1\ u_1)$ and $(l_2\ u_2)$ are (1 1) and
(0 1), respectively.
A FRIL++ program, which contains class definitions and logical clauses, has the
same list format as a FRIL program. A FRIL++ class definition contains the
following sections: superclass declaration, constant declaration, part declaration,
and property definition. The superclass section declares the immediate
superclasses of the class. The constant section declares the constant labels and
their values associated with the class, which can be inherited or overridden as
can class properties. The part section declares the identifiers and classes of the
objects to be included as parts of an instance of the class. The property section
defines the properties (i.e., attributes and methods) of the class, which are
represented by logical clauses, as presented previously. Constants, parts, and
properties can each have one of the visibility modifiers public, protected, or
private as in C++ (Stroustrup, 1997), with public as the default. As an example,
we use the following simple class hierarchy:

[Figure: class hierarchy in which TALLMAN is a subclass of PERSON, and
TALLNOTSLIMMAN and TALLNOTFATMAN are subclasses of TALLMAN]
The definitions of the classes are written in the following FRIL++ program:
((public class Person extends (Universal))
(constants
(tall [0:0 1.5:0 1.8:1 2.5:1] )
(notSlim [0:1 16:1 22:0 28:1 45:1])
(notFat [0:1 22:1 28:0 45:0]) )
(properties
((height _ ))
((weight _ ))
((bodyMassIndex B)
(height H)
(times H H H2)
(weight W)
(times B H2 W))
((Person H W)
(setprop ((height H)) )
(setprop ((weight W)) )) ))
((public class TallMan extends (Person))
(properties
((handsome)) : (.9 1)
((isa TallMan)
(height H)
(match tall H)) ))
((public class TallNotSlimMan extends (TallMan))
(properties
((handsome)) : (0 .5)
((isa TallNotSlimMan)
(isa TallMan)
(bodyMassIndex B)
(match notSlim B)) ))
((public class TallNotFatMan extends (TallMan))
(properties
((isa TallNotFatMan)
(isa TallMan)
(bodyMassIndex B)
(match notFat B)) ))
((public class MainClass extends (Universal))
(properties
((main)
(new John ((Person 1.75 70)) )
(qs ((John.handsome)) )
(new Bill ((Person 1.75 85)) )
(qs ((Bill.handsome)) )) )).
In the class PERSON, the constant section declares the fuzzy sets that define the
linguistic labels tall, notSlim, and notFat. Here, [0:0 1.5:0 1.8:1 2.5:1] represents
the fuzzy set on [0, 2.5], with a membership function that takes value 0 on [0, 1.5],
value 1 on [1.8, 2.5], and is linearly increasing on [1.5, 1.8]; [0:1 16:1 22:0 28:1
45:1] represents the fuzzy set on [0, 45] with a membership function that takes
value 1 on [0, 16] and [28, 45], is linearly decreasing on [16, 22], and linearly
increasing on [22, 28]; [0:1 22:1 28:0 45:0] represents the fuzzy set on [0, 45] with
a membership function that takes value 1 on [0, 22], value 0 on [28, 45], and is
linearly decreasing on [22, 28]. The property bodyMassIndex defines the body
mass index of a person given his or her height and weight. The property Person
is a constructor for initializing properties of a new object of the class PERSON.
The properties isa in the classes TALLMAN, TALLNOTSLIMMAN, and
TALLNOTFATMAN are methods for computing support pairs for an object that is
a member of these classes. There, match is a FRIL++ built-in predicate that
computes the conditional probability of its first argument given its second
argument. We note that in FRIL++, isa properties are placed in respective
classes just for better readability of a program. Logically, however, they belong
to the universal class, as mentioned previously, to which every object has full
membership (1 1).
The property ((handsome)) : (.9 1) in the class TALLMAN expresses that "at least
90% of tall men are handsome." Meanwhile, ((handsome)) : (0 .5) in the class
TALLNOTSLIMMAN expresses that "at most 50% of men who are tall and not slim
are handsome."
The property main in the class MAINCLASS provides the entry point for executing
a FRIL++ program. In this example, John is created as a person of height 1.75
and weight 70, and a support pair for him being a handsome man is computed by
the FRIL++ built-in support query qs. Similarly, Bill is created as a person of
height 1.75 and weight 85, and a support pair for him being a handsome man is
computed.
As such, John is a member of TALLMAN and TALLNOTSLIMMAN with the support
pairs [.833, 1] and [.119, 1], respectively, and thus inherits handsome [.75, 1]
from TALLMAN and handsome [0, .941] from TALLNOTSLIMMAN. So the support
pair for John being handsome is [.75, 1] ∩ [0, .941] = [.75, .941].
Meanwhile, Bill is a member of TALLMAN and TALLNOTSLIMMAN with the support
pairs [.833, 1] and [.799, 1], respectively, and thus inherits handsome [.75, 1]
from TALLMAN and handsome [0, .601] from TALLNOTSLIMMAN. In this case,
because [.75, 1] ∩ [0, .601] = ∅ and ((handsome)) : (0 .5) in TALLNOTSLIMMAN
is assumed to have a higher priority than ((handsome)) : (.9 1) in TALLMAN, the
support pair for Bill being handsome is [0, .601], using default reasoning.
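The support pairs quoted for John and Bill can be retraced with a short Python sketch. It is not FRIL++ and not the authors' implementation: the function names are hypothetical, and the two combination rules (product for the conjunction in the isa clauses, and an inherited pair of [l_h * l_c, 1 - l_c * (1 - u_h)] for a class with lower membership l_c) are assumptions chosen because they agree with the figures in the text up to rounding.

```python
def membership(points, x):
    """Piecewise-linear membership of x in a fuzzy set given as (value, degree) breakpoints."""
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)
    return 0.0  # x lies outside the declared domain


# Fuzzy sets copied from the constants section of class Person.
TALL = [(0, 0), (1.5, 0), (1.8, 1), (2.5, 1)]
NOT_SLIM = [(0, 1), (16, 1), (22, 0), (28, 1), (45, 1)]


def inherit(class_lower, prop):
    """Assumed rule: weaken a class property by the lower bound of class membership."""
    lo, hi = prop
    return (class_lower * lo, 1.0 - class_lower * (1.0 - hi))


def combine(specific, general):
    """Intersect two support pairs; if they are disjoint, the more specific class wins."""
    lo, hi = max(specific[0], general[0]), min(specific[1], general[1])
    return (lo, hi) if lo <= hi else specific


def handsome(height, weight):
    tall = membership(TALL, height)                   # lower support for isa TallMan
    bmi = weight / height ** 2
    tall_not_slim = tall * membership(NOT_SLIM, bmi)  # assumed product conjunction
    from_tall_man = inherit(tall, (0.9, 1.0))
    from_tall_not_slim = inherit(tall_not_slim, (0.0, 0.5))
    return combine(from_tall_not_slim, from_tall_man)


john = handsome(1.75, 70)  # approximately (0.75, 0.94)
bill = handsome(1.75, 85)  # approximately (0.0, 0.60)
```

The small differences in the third decimal place relative to the text come from rounding of intermediate membership degrees.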
FRIL++ for Machine Learning
Machine learning has become an important area of artificial intelligence, which
allows computers to acquire knowledge automatically or semiautomatically, i.e.,
to learn from experience, in order to perform a particular task well. Fuzzy
set theory and fuzzy logic have been applied in this area, using soft partitions
defined by fuzzy sets on attribute domains, to enhance both the transparency of the
acquired knowledge and the performance of existing machine-learning algorithms that
use crisp partitions (e.g., Baldwin et al., 1998). Briefly, the better
transparency is due to the use of linguistic labels for partitions, while the better
performance is due to the tolerance of soft partitions in learning processes.
However, as shown by theoretical results, there is no best learning algorithm for
all tasks. That was the motivation of Kohavi et al. (1996) when developing
MLC++ to help choose appropriate algorithms for a particular task, by comparing
different ones and creating new algorithms, and especially by combining existing
ones. It exploits the advantages of the object-oriented methodology, which are
information encapsulation and hiding organized in class hierarchies, using C++
to build a library of different components of a machine-learning system. In
particular, learning algorithms are categorized into classes to be compared or
combined with each other. Inspired by that work, we used FRIL++ to develop
a similar system for fuzzy machine learning.
In Mitchell (1997), machine learning is described as a process of constructing
computer programs from training experience. Using knowledge as an umbrella
term, we view such computer programs and experience as knowledge represented
in different forms. A machine learner is then a kind of knowledge
processor, namely an inducer, that induces knowledge in a high-level form, such
as rule bases, from knowledge in a low-level form, such as relational data tables.
A tester of machine learning is another kind of knowledge processor that
operates on the knowledge induced by a machine learner to evaluate its learning
performance.
Therefore, a machine-learning process can be viewed as involving objects of
three classes KNOWLEDGEBASE, INDUCER, and TESTER with their main
properties as illustrated in Figure 1. The class KNOWLEDGEBASE can be divided
into subclasses as depicted in Figure 2. Meanwhile, INDUCER can be placed in the
hierarchy of knowledge processor classes in Figure 3, with different subclasses
of fuzzy logic-based, Bayesian network-based, neural network-based, and
support vector machine-based machine learning algorithms. This object-oriented
view allows us to incrementally develop a toolkit for machine learning in
particular and knowledge processing in general.
Figure 1. Three main classes of objects in machine learning
[KNOWLEDGEBASE: Content, Input/Output, Edit, Query.
INDUCER: Induction Parameters, Induction Method.
TESTER: Testing Parameters, Testing Method.]
Figure 2. Hierarchy of knowledge bases
[Top level: KNOWLEDGEBASE. Second level: RELATIONALKB, DEDUCTIVEKB, GRAPHKB.
Third level: DATATABLE, RULEBASE, DECISIONTREE, CONCEPTUALGRAPH.]
We used FRIL++ to implement a particular class of fuzzy machine-learning
techniques called data browser (Baldwin & Martin, 1995), as shown in
Figure 3, which learns fuzzy rules from relational data tables. The four main
classes of the data browser are DATATABLE, RULEBASE, DATABROWSER, and
TESTER. Objects of the class DATATABLE are relational data tables used as
training or testing data sets, while those of the class RULEBASE are sets of fuzzy
rules learned from relational data.
The class DATABROWSER implements the fuzzy machine-learning technique that
computes frequency distributions of the given values of input attributes with
respect to an output attribute in a training data table, and then converts those
frequency distributions into fuzzy sets (Baldwin et al., 1995) for the antecedents
of the corresponding fuzzy rule. Meanwhile, the class TESTER implements a
procedure for testing a learned fuzzy rule base against a testing data table. The
implemented data browser was demonstrated on two well-known benchmark
problems in machine learning: the ellipse and the face problems.
The ellipse problem is to learn the points inside an ellipse and those points outside,
based on their two-dimensional coordinates. The data browser approach is to
learn that by producing two fuzzy rules for the inside and outside points,
respectively, in the following forms:
ellipse_point is inside ← (x_coordinate is A) ∧ (y_coordinate is B)
ellipse_point is outside ← (x_coordinate is A) ∧ (y_coordinate is B)
where A and B are fuzzy sets on the respective partitions of the x and y
coordinates. In this example, there are 121 training instances and 127 testing
instances, and the domain [-1.5, 1.5] of the x and y coordinates is partitioned into
10 equal triangular fuzzy sets with an overlapping degree of 0.5. The obtained
accuracy is 96.85%.
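To make the partitioning step concrete, the following Python sketch builds 10 triangular fuzzy sets on [-1.5, 1.5]. It is only an illustration under a stated assumption: "an overlapping degree of 0.5" is read here as adjacent triangles crossing at membership 0.5, which is one common convention and may differ from what the FRIL++ partition method actually does. The function name triangular_partition is hypothetical.

```python
def triangular_partition(lo, hi, n):
    """Return (peaks, sets): n triangular fuzzy sets with evenly spaced peaks on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    peaks = [lo + i * step for i in range(n)]

    def make_set(peak):
        # Membership falls linearly from 1 at the peak to 0 at the neighbouring peaks.
        return lambda x: max(0.0, 1.0 - abs(x - peak) / step)

    return peaks, [make_set(p) for p in peaks]


peaks, sets = triangular_partition(-1.5, 1.5, 10)
degrees = [s(0.2) for s in sets]  # only the two sets nearest to 0.2 are non-zero
```

Under this reading, every point of the domain belongs to at most two neighbouring sets and its membership degrees sum to 1, which is the kind of soft boundary the text contrasts with crisp partitions.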
Figure 3. Hierarchy of knowledge processors
[Top level: KNOWLEDGEPROCESSOR. Second level: DEDUCER, INDUCER, ABDUCER.
Third level: FLBASED, BNBASED, NNBASED, SVMBASED.
Fourth level: DATABROWSER, FUZZYID3.]
The face problem is to learn which faces are male and which faces are female,
based on measurements of 18 attributes of human faces. In this example, there
are 138 training instances and 30 testing instances. The attribute domains are
partitioned into 20 equal triangular fuzzy sets with an overlapping degree of 0.5.
The obtained accuracy is 83.33%. The following FRIL++ code shows the
structures and main properties of the above-mentioned classes of the data
browser, and the main class for running the ellipse and face examples:
((public class DataTable extends (RelationalKB))
(public (parts
/* A data table is associated with an attribute schema of class
AttributeSchema */
(schema AttributeSchema) ))
(private (properties
/* The content of a data table is a list of instances, each of which
corresponds to a row in the table. Each instance is an object of class
Instance */
((instance _rowIndex _instObj)) ))
(public (properties
/* The number of rows of a data table */
((num_row _naturalNumber))
/* To get an instance of a data table */
((get_instance INSTANCE)
. )
/* To display a data table */
((display)
. )
/* Constructor constructs a data table from a data file */
((DataTable DATA_FILE)
. ) )))
((public class RuleBase extends (DeductiveKB))
(public (parts
/* A rule base is associated with an attribute schema of class
AttributeSchema */
(schema AttributeSchema) ))
(private (properties
/* The content of a rule base is a list of rules. Each rule is an object of
class Rule */
((rule _index _ruleObj)) ))
(public (properties
/* The number of rules in a rule base */
((num_rule _naturalNumber))
/* To get a rule in a rule base */
((get_rule RULE)
. )
/* To display a rule base */
((display)
. ) )))
((public class DataBrowser extends (FlBased))
(public (properties
/* To induce a rule base from a data table, given one output (or
categorizing) attribute and a list of input attributes */
((induce DATA_TABLE (OUT_ATTR | IN_ATTR_LIST)
RULE_BASE)
. ) )))
((public class Tester extends (Universal))
(public (properties
/* To test a rule base on a data table with respect to a given output
attribute */
((test RULE_BASE DATA_TABLE OUT_ATTR)
. ) )))
((public class MainClass extends (Universal))
(public (properties
((main)
/* To create a data browser */
(new myDataBrowser ((DataBrowser)) )
/* To create a tester */
(new myTester ((Tester)) )
/* To run the ellipse example */
(ellipse_exe)
/* To run the face example */
(face_exe) )
((ellipse_exe)
/* To create a training data table */
(new ellipseTrainTable ((DataTable ellipse_train)) )
/* To display the created training data table */
(ellipseTrainTable.display)
/* To specify output and input attributes */
(eq _outAttr ellipse_point)
(eq _inAttrList (x_coordinate y_coordinate) )
/* To generate fuzzy set labels and partition input attribute domains */
(forall ((List.member X _inAttrList))
((gensym ELABEL S)
(ellipseTrainTable.(schema).get_attribute X A)
(A.partition triangle S 10 0.5)) )
/* To induce fuzzy rules */
(myDataBrowser.induce ellipseTrainTable
( _outAttr | _inAttrList) _ellipseRuleBase)
/* To display the induced rule base */
( _ellipseRuleBase.display)
/* To create a testing data table */
(new ellipseTestTable ((DataTable ellipse_test)) )
/* To display the created testing data table */
(ellipseTestTable.display)
/* To test the induced rule base */
(myTester.test _ellipseRuleBase ellipseTestTable
_outAttr))
((face_exe)
(new faceTrainTable ((DataTable face_train)) )
(faceTrainTable.display)
(eq _outAttr class)
(eq _inAttrList (attribute1 attribute2 attribute3
attribute4 attribute5 attribute6
attribute16 attribute17 attribute18) )
(forall ((List.member X _inAttrList))
((gensym FLABEL S)
(faceTrainTable.(schema).get_attribute X A)
(A.partition triangle S 20 0.5)) )
/* Learning */
(myDataBrowser.induce faceTrainTable
( _outAttr | _inAttrList) _faceRuleBase)
( _faceRuleBase.display)
/* Testing */
(new faceTestTable ((DataTable face_test)) )
(myTester.test _faceRuleBase faceTestTable _outAttr) ))).
FRIL++ for User Modeling
In recent years, user modeling has become a major topic of academic and
commercial research. This focus has been driven by a combination of two
factors: first, the construction of huge databases of information about our daily
lives; and second, by the desire of organizations to use these data to understand
the people they deal with and, hence, to improve their services.
In this section, we present a new approach to incremental user recognition in
fuzzy environments, where user classification is updated within an
object-oriented epistemological model. First, we examine and generalize the FILUM
approach to flexible user modeling (Martin, 2000). We then extend Einhorn and
Hogarth's anchor and adjustment method (Hogarth & Einhorn, 1992), derived
from a study of human behavior, from the point value representation of belief and
evidence to the case where belief and evidence are imprecise, expressed by
subintervals of [0, 1].
User Recognition Problem
There are two main questions to ask when modeling users:
1. How do we generate the appropriate user models?
2. How do we classify a user into appropriate models?
The problem of user recognition centers on the temporal aspect of user behavior.
We have some set of known user types $\{U_1, \ldots, U_n\}$, the behaviors of which we
know and to which we provide a corresponding set of services. An unknown user
u at time t behaves in the fashion $b_t$, where behavior is commonly the outcome
of some crisp or fuzzy choice, such as whether or not to buy expensive wine. We
wish to determine the similarity of u to each of $U_1, \ldots, U_n$ in order to provide the
appropriate service to u at time t. We must repeat this process as t increases.
In an object-oriented environment, we construct a hierarchy of n user classes,
$\{C_1, \ldots, C_n\}$, and we try to determine the support $S_t(u \in C_m)$ for user u
belonging to user class $C_m$ at time t. This support is some function f of the current
behavior $b_t$ and the history of behaviors $\{b_1, \ldots, b_{t-1}\}$. This is shown more
generally in Equation 1.
$$S_t(u \in C_m) = f(\{b_1, \ldots, b_t\}) \qquad (1)$$
We can solve this problem at time t if we have the whole behavior series up to
t. Unfortunately, at time t + 1, we will have to do the whole calculation again.
Where t is very large, the storage of the whole behavior series and the cost of
the support calculation may be too expensive. An alternative approach is to view
the support $S_t(u \in C_m)$ as some belief in the statement "user u belongs to class
$C_m$"; this belief is updated whenever a new behavior is encountered. This belief
updating approach is more economical in space, because the whole behavior
series no longer needs to be stored. In computation, this approach is more
efficient, because we now must calculate some function g of just the previous
$S_{t-1}(u \in C_m)$ and the latest behavior $b_t$. This belief updating approach is shown
more generally in Equation 2.
$$S_t(u \in C_m) = g(S_{t-1}(u \in C_m), b_t) \qquad (2)$$
In this section, we examine the case where belief is represented by a support pair,
which is a subinterval of [0, 1].
Simple User Recognition Example
Let us take the example where we classify food consumers into one of the
classes CANDYEATER, COOKIEEATER, or CAKEEATER. We may wish to represent
these consumer classes in a FRIL++ class hierarchy, as shown in Figure 4. In
the same way, a hierarchy can be constructed for the food these consumers eat,
as shown in Figure 5.
The consumer classes also define the prototypical behaviors of these consumers
through the following statements:
a candy-eater eats lots of candy most of the time
a cookie-eater eats lots of cookies most of the time
a cake-eater eats lots of cake most of the time
Figure 4. A consumer class hierarchy
[PERSON at the top, CONSUMER below it, and CANDYEATER, COOKIEEATER, and
CAKEEATER as subclasses of CONSUMER]
Figure 5. A food class hierarchy
[FOOD at the top, SWEETFOOD below it, and CANDY, COOKIE, and CAKE as
subclasses of SWEETFOOD]
In a simple representation, we could use a conditional probability interval to
represent the "most" qualifier. For example, if we find that the statement "eats
lots of candy most of the time" is true for eight or more cases out of every 10
candy-eaters, then we can assign an interval [0.8, 1] to the conditional probability
Pr(eats | lots of candy). This approach gives us the following FRIL++ class
definition for the class CandyEater:
((public class CandyEater extends (Consumer) )
(public (properties
((eats X)
(X.isa Candy)
(X.quantity lots )) : (0.8 1) )))
Now consider a new food consumer u who makes a decision whether or not to
eat food x, and we wish to determine u's membership to the classes CANDYEATER,
COOKIEEATER, and CAKEEATER. The only information we have is the decision that
u made with respect to eating food x. We can determine memberships by
comparing u's decision with the decision that would be made by a prototypical
member of each of the classes CANDYEATER, COOKIEEATER, and CAKEEATER,
given food x. Food x may be an uncertain member of any or all of the classes
CANDY, COOKIE, and CAKE. For example, if x is a sweet iced biscuit, then x is
clearly a member of the class COOKIE but may also have nonzero membership to
the class CANDY.
The remainder of this section is concerned with the case where we wish to
update u's membership to the classes CANDYEATER, COOKIEEATER, and CAKEEATER,
as u chooses whether or not to eat each item of food in the ordered stream
$x_1, \ldots, x_n$.
Belief Updating for User Recognition
When a new behavior is encountered, it is interpreted as some evidence for or
against the statement "user u belongs to class $C_m$." When updating beliefs in
response to new evidence, we can evaluate the evidence in two ways. Either we
take the evidence to be absolute and update our beliefs to a degree defined
entirely by the new evidence, or we can take the evidence in the context of our
current beliefs and update our beliefs relatively. In this section, we will examine
the FILUM updating method, which is an absolute belief updating model, and
Einhorn and Hogarth's anchor and adjustment belief revision, which is
relative.
Generalized FILUM User Recognition
The FILUM flexible incremental learning approach (Martin, 2000) relies on a
moving average to calculate current support $S_{n+1}$ for a hypothesis given the
previous support $S_n$ and new evidence $x_{n+1}$, as shown in Equation 3. Note that
$s(x_{n+1})$ is the support for the given hypothesis provided by evidence $x_{n+1}$:
$$S_{n+1} = \frac{n S_n + s(x_{n+1})}{n + 1} \qquad (3)$$
This approach is notable in its inflexibility with regard to the weight of impact of
new evidence. That is, new evidence always has a weight 1/(n + 1), and current
belief has a weight n/(n + 1). A more flexible generalization that can be used to
give a higher or lower weighting to new evidence is shown in Equation 4:
$$S_{n+1} = \frac{n^{\lambda} S_n + n^{1-\lambda} s(x_{n+1})}{n^{\lambda} + n^{1-\lambda}} \qquad (4)$$
where $\lambda$ would typically lie in the interval [0, 1]. If $\lambda = 1$, we have Equation 3,
where current belief is n times as important as new evidence. If $\lambda = 0$, we have
an expression that weights new evidence n times as important as current belief.
This flexibility may be important in cases where we know that users change their
behavior often and must therefore be reclassified quickly.
The advantage of the FILUM approach is its simplicity. It also updates support
where evidence is presented as either a support pair or a point value.
Disadvantages include the inflexibility of the model and the large primacy bias.
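As a rough illustration of Equations 3 and 4, the generalized FILUM update can be written in a few lines of Python. The function names are hypothetical, the exponent parameter of Equation 4 is called lam, and applying the same update to each bound of a support pair is an assumption about how the interval case is handled, not something the text spells out.

```python
def filum_update(prev, evidence, n, lam=1.0):
    """Generalized FILUM update (Equation 4) for point-valued supports; assumes n >= 1."""
    w_old = n ** lam           # weight of the current belief
    w_new = n ** (1.0 - lam)   # weight of the new evidence
    return (w_old * prev + w_new * evidence) / (w_old + w_new)


def filum_update_pair(prev, evidence, n, lam=1.0):
    """Assumed extension to support pairs: update lower and upper bounds separately."""
    return tuple(filum_update(p, e, n, lam) for p, e in zip(prev, evidence))


moving_average = filum_update(0.5, 1.0, n=4, lam=1.0)  # Equation 3: (4 * 0.5 + 1) / 5
evidence_heavy = filum_update(0.5, 1.0, n=4, lam=0.0)  # (0.5 + 4 * 1) / 5
```

With lam = 1 the weight of each new observation shrinks as 1/(n + 1), which is where the primacy bias mentioned above comes from: after many observations a single new behavior barely moves the support.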
Anchor and Adjustment Belief Revision
If we are to classify human users, it would seem prudent to look at how humans
might perform this classification task. Hogarth and Einhorn have done much
work on models of belief updating that bear some relation to human behavior
(Einhorn & Hogarth, 1985; Hogarth & Einhorn, 1992). They suggested that the
strength of current belief can have a major effect on how new evidence updates
that belief. For example, the stronger the belief a person has in the
trustworthiness of a friend, the greater the reduction in this belief when the friend commits
an act of dishonesty. The typical pattern of behavior is shown in Figure 6. Here
the negative evidence $e^-$ has two differing effects depending on how large the
belief was before $e^-$ was presented. Likewise, there are two differing effects
from the same positive evidence $e^+$.
Figure 6. Order effects in anchor and adjustment
[Plot of belief against time t, showing the effects of negative evidence $e^-$ and
positive evidence $e^+$ presented in different orders]
Figure 7. Order effects in interval anchor and adjustment
[Plot of belief against time t, showing the effects of positive evidence $e^+$ and
negative evidence $e^-$ presented in different orders]
The anchor and adjustment belief revision model by Hogarth and Einhorn (1992)
updates a belief given new evidence through two processes. Equation 5a shows
how belief $S_k$ is updated given new negative evidence. Equation 5b shows how
the same belief $S_k$ is updated given new positive evidence.
$$S_k = S_{k-1} + \alpha S_{k-1} (s(x_k) - R) \quad \text{for } s(x_k) \le R \qquad (5a)$$
$$S_k = S_{k-1} + \beta (1 - S_{k-1}) (s(x_k) - R) \quad \text{for } s(x_k) > R \qquad (5b)$$
R is a reference point for determining if the support $s(x_k)$ for evidence $x_k$ is
positive or negative, and typically R = 0 or $R = S_{k-1}$. The constants $\alpha$ and $\beta$ define
how sensitive the model is to negative or positive evidence, respectively, where
$0 \le \alpha \le 1$ and $0 \le \beta \le 1$.
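The point-valued anchor and adjustment update of Equations 5a and 5b can be sketched in Python as follows. The function name is hypothetical; alpha and beta stand for the sensitivities to negative and positive evidence, and the reference point defaults to the previous belief, which is one of the two typical choices named above.

```python
def anchor_adjust(prev, evidence, alpha, beta, reference=None):
    """One anchor and adjustment step (Equations 5a and 5b) for point-valued belief."""
    r = prev if reference is None else reference
    if evidence <= r:
        # Negative evidence: the adjustment is proportional to the current belief.
        return prev + alpha * prev * (evidence - r)
    # Positive evidence: the adjustment is proportional to the remaining doubt.
    return prev + beta * (1.0 - prev) * (evidence - r)


strong = anchor_adjust(0.8, 0.0, alpha=0.3, beta=0.3)  # 0.8 - 0.3 * 0.8 * 0.8
weak = anchor_adjust(0.2, 0.0, alpha=0.3, beta=0.3)    # 0.2 - 0.3 * 0.2 * 0.2
```

The same negative evidence lowers the strong belief by 0.192 but the weak belief by only 0.012, which is the contrast effect described for the trusted friend.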
Anchor and Adjustment with Interval Supports
Because belief and support in our uncertain environment can be presented as a
support pair, we must consider the implications of an interval representation on
the anchor and adjustment model. For a piece of evidence e with the associated
support pair [l, u], we can view l as the positive evidence associated with e and
1-u as the negative evidence associated with e. The general principle is that,
given a current belief [n, p] and a piece of evidence with support [l, u], belief
increases by an amount proportional to 1-p and belief decreases by an amount
proportional to n.
We can apply Equations 5a and 5b to the support pair to yield Equations 6a to 6d,
where $S^-$ and $S^+$ are the lower bound and the upper bound of belief, respectively:
$$S^-_k = S^-_{k-1} + \alpha S^-_{k-1} (s^-(x_k) - R^-) \quad \text{for } s^-(x_k) \le R^- \qquad (6a)$$
$$S^-_k = S^-_{k-1} + \beta (1 - S^+_{k-1}) (s^-(x_k) - R^-) \quad \text{for } s^-(x_k) > R^- \qquad (6b)$$
$$S^+_k = S^+_{k-1} + \alpha S^-_{k-1} (s^+(x_k) - R^+) \quad \text{for } s^+(x_k) \le R^+ \qquad (6c)$$
$$S^+_k = S^+_{k-1} + \beta (1 - S^+_{k-1}) (s^+(x_k) - R^+) \quad \text{for } s^+(x_k) > R^+ \qquad (6d)$$
Note that $R^-$ is a reference point for determining if the lower bound of the
presented evidence is positive or negative with respect to the lower bound of
belief, and $R^+$ is the corresponding reference point for the upper bound of belief.
Here, we choose $R^- = S^-_{k-1}$ and $R^+ = S^+_{k-1}$, where $0 \le \alpha \le 1$ and $0 \le \beta \le 1$.
Figure 7 shows the order effects of this interval belief updating model. The
precise effects of negative evidence $e^-$ and positive evidence $e^+$ are determined
by $\alpha$ and $\beta$, respectively. The effect of new evidence is dependent on the most
recent belief only, and not on t. This is a known characteristic of the anchor and
adjustment model. This recency behavior contrasts with the primacy bias of the
FILUM approach.
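A minimal Python sketch of the interval update in Equations 6a to 6d follows, with the reference points set to the previous lower and upper bounds as chosen above. The function name and the tuple representation of support pairs are illustrative assumptions, with alpha and beta again denoting the sensitivities to negative and positive evidence.

```python
def interval_anchor_adjust(belief, evidence, alpha, beta):
    """One interval anchor and adjustment step on support pairs (lower, upper)."""
    lo, hi = belief
    e_lo, e_hi = evidence

    def step(bound, ev):
        # Reference point is the previous value of the bound being updated.
        if ev <= bound:
            return bound + alpha * lo * (ev - bound)      # Equations 6a and 6c
        return bound + beta * (1.0 - hi) * (ev - bound)   # Equations 6b and 6d

    return step(lo, e_lo), step(hi, e_hi)


raised = interval_anchor_adjust((0.4, 0.8), (1.0, 1.0), alpha=0.3, beta=0.3)
lowered = interval_anchor_adjust((0.4, 0.8), (0.0, 0.0), alpha=0.3, beta=0.3)
```

Only the previous support pair is needed for each step, so no behavior history has to be stored, and setting alpha and beta to different values lets negative and positive evidence be weighted separately.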
This new interval version of Hogarth and Einhorn's belief updating model has
a number of advantages over the FILUM method. Recency characteristics allow
the anchor and adjustment model to reclassify users quickly. The order effects
of this model are related to human behavior, and this seems to be an important
consideration when we are recognizing human users. In addition, this method
allows us to control the effects of positive and negative evidence separately. This
last feature may be especially important in medical user modeling applications,
where false-negative classifications have far more serious consequences than
false-positive classifications.
Iterated Prisoner's Dilemma (IPD) Problem in FRIL++
The n-player iterated prisoner's dilemma problem (Axelrod, 1985) is a good test
bed for user recognition due to the production of streams of behavior that are a
result of user interactions in pairs. The problem is most easily understood by
looking at the noniterated problem with n = 2. Two prisoners are due to be
sentenced. Each has the choice to cooperate with the other or to defect. If the
players both cooperate, they will both serve one year. If they both defect, they
will both serve three years. If they choose to behave differently, then the defecting
player will serve zero years but the cooperating player will serve five years. The
iterated problem simply repeats the game over successive rounds.
A wide range of strategies are possible, ranging from trusting behavior (always
cooperate) to defective behavior (always defect) and including more complex
strategies such as conditional cooperation (cooperating unless the opponent's
last m behaviors were defect). The n-player prisoner's dilemma is a difficult
problem from which to identify user classes, because a single player p interacts
with unknown and randomly selected partners. As a result, the behavior stream
generated by p is not determined exclusively by the class of p.
If we were to construct a class hierarchy of prisoners in FRIL++, it could
resemble Figure 8. The subclasses of prisoner are the classes that define
prototypical prisoners and their behaviors. The goal of user recognition in this
problem is to determine the class of an unknown prisoner from the unknown
prisoner's past and current behaviors. The behaviors of these prototypical
prisoners are described in Table 1. An example of a FRIL++ class definition for
a prototypical prisoner is given in Rossiter et al. (2001a).
Figure 8. A class hierarchy for the prisoner's dilemma problem
[PERSON at the top, PRISONER below it, and COOPERATIVE, UNCOOPERATIVE,
TITFORTAT, RANDOM, and RESPD as subclasses of PRISONER]
A population of 10 prisoners is created, and a game of 75 rounds is initiated. Each
round involves picking pairs of prisoners at random from the population until none
is left, and for each pair recording the behaviors the prisoners exhibit (defect,
cooperate, etc.). From the past history of each player, and using the techniques
described earlier (with $\alpha = \beta = 0.3$), they are classified into the five behavior
classes. The winning class is taken as the class in which minimum membership
(i.e., the lower bound of the membership interval) is greatest. If the winning class
matches the actual class in Table 2, then the classification is recorded as a
success.
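The selection of the winning class can be illustrated with a one-line Python helper. The function name and the membership values below are made up for the example; only the selection rule (greatest lower bound of the membership interval) comes from the text.

```python
def winning_class(memberships):
    """Return the class whose membership interval has the greatest lower bound."""
    return max(memberships, key=lambda name: memberships[name][0])


# Hypothetical support pairs for one prisoner after some number of rounds.
memberships = {
    "Cooperative": (0.10, 0.90),
    "Uncooperative": (0.05, 0.40),
    "TitForTat": (0.35, 0.70),
    "Random": (0.20, 0.95),
    "Respd": (0.15, 0.60),
}
winner = winning_class(memberships)  # "TitForTat": its lower bound 0.35 is the largest
```

Ranking by the lower bound is a cautious choice: a class with a wide interval such as (0.20, 0.95) does not win merely because its membership could be high.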
To recreate the situation where user behavior changes, after 60 rounds, the
behaviors of all 10 prisoners are changed, as shown in the third column of
Table 2. After this point, the game is continued for 15 rounds. We compare
classification results using the interval anchor and adjustment belief updating
method with the FILUM method described in Martin (2000). The whole process
is repeated five times, and the mean of the results is taken.
Table 1. Behavior classes
Behavior        Description
Cooperative     Always cooperate with opponent
Uncooperative   Always defect against opponent
Tit-for-tat     Cooperate unless last opponent defected
Random          Equal random chance of defect or cooperate
Respd           Defect unless the last six opponents chose to cooperate
Table 2. The prisoner population
Individual   Behavior before 60th round   Behavior after 60th round
1            Random                       Cooperative
2            Random                       Uncooperative
3            Cooperative                  Tit-for-tat
4            Cooperative                  Respd
5            Uncooperative                Random
6            Uncooperative                Respd
7            Tit-for-tat                  Cooperative
8            Tit-for-tat                  Random
9            Respd                        Tit-for-tat
10           Respd                        Uncooperative
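For readers who want to reproduce the behavior streams, the five behavior classes of Table 1 can be sketched in Python as below. This is not the FRIL++ class definition from Rossiter et al. (2001a). The function name is hypothetical, and two details are assumptions: opponent_moves is taken to be the list of moves that this player's previous opponents made against it, oldest first, and a Respd player with fewer than six recorded opponents is assumed to defect.

```python
import random

COOPERATE, DEFECT = "cooperate", "defect"


def choose(behavior, opponent_moves, rng=random):
    """Pick a move for a prototypical prisoner given its previous opponents' moves."""
    if behavior == "Cooperative":
        return COOPERATE
    if behavior == "Uncooperative":
        return DEFECT
    if behavior == "Tit-for-tat":
        # Cooperate unless the most recent opponent defected.
        return DEFECT if opponent_moves[-1:] == [DEFECT] else COOPERATE
    if behavior == "Random":
        return rng.choice([COOPERATE, DEFECT])
    if behavior == "Respd":
        last_six = opponent_moves[-6:]
        all_cooperated = len(last_six) == 6 and all(m == COOPERATE for m in last_six)
        return COOPERATE if all_cooperated else DEFECT
    raise ValueError(f"unknown behavior class: {behavior}")


first_move = choose("Tit-for-tat", [])                           # no history yet
after_defection = choose("Tit-for-tat", [COOPERATE, DEFECT])     # last opponent defected
```

Because partners are drawn at random each round, the history passed to choose mixes moves from many different opponents, which is why a player's behavior stream does not depend on its own class alone.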
As can be seen from Table 3, classification results before the 60th round (the
point of behavior change) are similar between the two methods. After the 60th
round, however, there is a marked difference in the results, with a large fall in
the performance of the FILUM approach. These results show the primacy
effects present in the FILUM method and the recency effects characteristic of
the interval anchor and adjustment approach. We highlight these effects as
important points to consider when implementing user recognition in any specific
user modeling application.
Results from the iterated prisoner's dilemma test bed suggest that the recency
bias of the anchor and adjustment approach is more suitable to the problem of
object-oriented user modeling, where the behaviors of users change over time.
Future work in this area will consider the cases where user behavior is
represented by fuzzy sets, for example, "a user buys a large number of
inexpensive items." More investigation is also needed in determining ranges for
the values of $R^-$ and $R^+$ in the interval anchor and adjustment approach.
Table 3. Classification results
Method                           Before 60th round   After 60th round
Interval anchor and adjustment   63.6%               57.3%
FILUM                            63.3%               22.2%
FRIL++ for Modeling with Words
In this section, we discuss how uncertain object-oriented logic programming may
be used for the implementation of object-oriented modeling with words (Rossiter
et al., 2001b). We consider modeling with words to be an extension of computing
with words. Where computing with words can be thought of as performing
operations using linguistic labels, we interpret modeling with words to mean the
generation of models using linguistic labels. In modeling with words, the modeling
process as well as the final model can be based upon a calculus of linguistic
labels.
The goal of modeling with words is the generation of linguistic models from a
combination of data and background information. Commonly, the background
information can be elicited from domain-specific experts in the form of linguistic
rules. An important feature of modeling with words is that the models generated
must in some way be insightful. By insightful, we mean that some useful
information can be gained by examining the model without having to apply the
model to any classification or prediction problem.
Modeling with words uses simple linguistic variables and sentences to build
models that can be interpreted by all, including those with no technical training.
This is in contrast with many conventional machine-learning paradigms, where
insight into the learned model is restricted by the representation, which is
typically numeric (e.g., $x = 0.98$), comparative (e.g., $n < p$), or algebraic
(e.g., $z = an + bn^2$). These representations are comprehensible to those experts
trained to understand them, but are frequently incomprehensible to nonexperts.
We might say that numeric, comparative, and algebraic representations result in
"black box" models, which require some degree of technical skill to interpret.
With linguistic models, on the other hand, some of the blackness of the model is
cleared and the goal is to produce "glass box," or transparent, models.
A typical approach to modeling with words involves modeling individual words
as information granules, as proposed by Zadeh (1996). The modeling of granular
information can also be modulated by studies into computing with perceptions
(Zadeh, 1999). The resulting granules correspond to a vocabulary of words that
can then be used for modeling with words. Unfortunately, the restrictions of
granular computation and computing with perceptions result in a vocabulary that
is also restricted.
As a result of this restricted vocabulary, the models generated are not, in fact,
perfectly transparent. Rather, the models are "grey," "murky," or "foggy" in
nature. Clearly, this is less than ideal and may result in some reduction in model
comprehension. Even so, we can say that a murky insight into a model is better
than no insight. In other words, we will accept the restriction imposed by this
restricted vocabulary in order to at least gain some insight into the linguistic
model, and hence, the problem domain.
The restricted vocabulary described above enables us to create simple linguistic
sentences such as "the tree is tall." In the real world, however, humans find it
natural to classify real-world concepts into taxonomical hierarchies, or at least
into a set of related ontological specifications. We therefore propose extending
modeling with words with taxonomical (and, hence, ontological) information.
Consider the simple hierarchy of trees in Figure 9. In our extended modeling with
words framework, we can now create slightly more sophisticated linguistic
sentences such as "the tree is a tall evergreen," where "evergreen" is an
ontological specification that is more specific than that described by the class
"trees."
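As a rough illustration of this extension, the following Python sketch combines a linguistic label with a class from the Figure 9 hierarchy to produce a sentence of the kind described above. It is not FRIL++ code. The membership function for "tall" (a ramp between 5 m and 20 m), the complementary word "short," and the helper name `describe` are all assumptions made for the example.

```python
def tall(height_m: float) -> float:
    """Assumed membership function for the word 'tall' (ramp from 5 m to 20 m)."""
    return min(1.0, max(0.0, (height_m - 5.0) / 15.0))


def short(height_m: float) -> float:
    """Assumed membership function for 'short', taken as the complement of 'tall'."""
    return 1.0 - tall(height_m)


# Restricted vocabulary: each word is modeled as a fuzzy set over heights.
VOCABULARY = {"tall": tall, "short": short}


class Tree:
    label = "tree"

    def __init__(self, height_m: float):
        self.height_m = height_m


class Evergreen(Tree):
    label = "evergreen"


class Deciduous(Tree):
    label = "deciduous tree"


def describe(tree: Tree) -> str:
    """Pick the best-matching word and combine it with the class label."""
    word = max(VOCABULARY, key=lambda w: VOCABULARY[w](tree.height_m))
    return f"the tree is a {word} {tree.label}"


print(describe(Evergreen(18.0)))
print(describe(Deciduous(6.0)))
```

The class label supplies the taxonomical part of the sentence, while the fuzzy vocabulary supplies the linguistic part; a richer vocabulary or a deeper hierarchy would extend the same pattern.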
The suggested extension of modeling with words with taxonomical information
implies that our modeling environment contains taxonomical concepts, or more
specifically, class hierarchies. When considering taxonomical hierarchies, it is
common to think of a class definition as a theory. Because we can introduce
uncertainty into a theory in the form of linguistic terms, a class definition can be
thought of as a linguistic construct as well as a taxonomical construct.
To this end, we propose an object-oriented framework for modeling with words
using linguistic descriptors and taxonomical hierarchies. This object-oriented
approach to modeling with words enables rich models to be generated, while at
the same time promotes the compactness and efficiency of the resulting models.
Object-Oriented Modeling with Words
An object-oriented approach to modeling with words has the following features.
Clear Representation
A hierarchical representation of classes reflects our natural taxonomic view of
the real world. Take, for example, the scientific classification of all living
organisms. The top-most superclass is called ORGANISM; the next level in the
hierarchy defines the domains EUKARYA, EUBACTERIA, and ARCHAEA; the next
level defines the kingdoms (e.g., ANIMALIA); the next defines the phyla (e.g.,
VERTEBRATE); and so on until we reach the species MAN. We apply class
hierarchies to all parts of our lives, even when we do not have specific scientific
knowledge such as in the previous example. For example, we may classify trees
into LARGETREE and SMALLTREE. We may then split LARGETREE into QUITELARGETREE
and VERYLARGETREE. The important thing to see here is that the linguistic terms
commonly used in computing with words ("large," "small," "very large," etc.) may
also be integral to class descriptions.
Figure 9. A tree hierarchy (TREE at the root, with subclasses DECIDUOUS and EVERGREEN)
Scalability of Knowledge Representation
Modeling with words has been successful in many small-scale toy problems. The
question now arises: how can modeling with words be scaled to larger, real-world
problems? In object-oriented modeling with words, we naturally have a measure
of scale, namely, our perspective of the hierarchy. If we build a model that has
hundreds of classes in our hierarchy, we can focus on the appropriate level of
the hierarchy for the appropriate linguistic description of the model for which we
are looking.
Summarizing the model can be done at as many levels as there are levels in the
hierarchy. A complex summary involves knowledge from lower down the
hierarchy, while a more general summary involves knowledge from the top of the
hierarchy. Figure 10 illustrates the perspective projection of classes from the top
and bottom of the complex hierarchy on the left onto the simple hierarchy of trees
on the right. In this example, we summarized the complex relationships between
the classes of TREE, SPRUCE, and OAK.
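The idea of viewing a hierarchy at different levels can be sketched in a few lines of Python. The intermediate classes EVERGREEN and DECIDUOUS are borrowed from Figure 9 and their placement above SPRUCE and OAK is an assumption; the function names `summarize` and `project` are hypothetical.

```python
# Hypothetical hierarchy: class -> parent class (None marks the root).
PARENT = {
    "TREE": None,
    "EVERGREEN": "TREE",
    "DECIDUOUS": "TREE",
    "SPRUCE": "EVERGREEN",
    "OAK": "DECIDUOUS",
}


def depth(cls: str) -> int:
    """Number of edges between a class and the root."""
    d = 0
    while PARENT[cls] is not None:
        cls = PARENT[cls]
        d += 1
    return d


def summarize(max_depth: int) -> list:
    """Classes visible when the hierarchy is viewed down to max_depth."""
    return sorted(c for c in PARENT if depth(c) <= max_depth)


def project(kept: set) -> dict:
    """Project the hierarchy onto the classes in `kept`, linking each kept
    class to its nearest kept ancestor."""
    result = {}
    for cls in kept:
        ancestor = PARENT[cls]
        while ancestor is not None and ancestor not in kept:
            ancestor = PARENT[ancestor]
        result[cls] = ancestor
    return result


print(summarize(1))
print(project({"TREE", "SPRUCE", "OAK"}))
```

Here `summarize` gives the general view from the top of the hierarchy, while `project` keeps classes from both the top and the bottom and skips the levels in between, which is the kind of simplified hierarchy the perspective projection produces.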
Figure 10. Perspective and scalability (perspective projection of the classes TREE, SPRUCE, and OAK from a complex hierarchy onto a simple hierarchy)
Power of Inheritance, Overriding, Encapsulation, and Information Hiding
From the knowledge representation point of view, inheritance helps reduce
unnecessary repetition in the class hierarchy. This aids model conciseness.
Overriding, on the other hand, enables us to form a richer hierarchical model, and
in extreme cases, to implement forms of nonmonotonic reasoning. Take, for
example, the hierarchy of birds. We might say that all birds can fly. Yet if we
define a subclass of bird called penguin, we find that penguins cannot fly. A
nonmonotonic contradiction exists, because penguins inherit the ability to fly
from the bird superclass, and yet penguins cannot fly. Here we need the concept
of overriding to mitigate the contradiction. The problem of nonmonotonic
inference is discussed in more detail in preceding sections.
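The bird example can be written down directly in an ordinary object-oriented language. The sketch below uses crisp Python classes only to show how overriding resolves the contradiction; it does not model the uncertain inheritance of FRIL++, and the class `Sparrow` is a hypothetical addition used for contrast.

```python
class Bird:
    def can_fly(self) -> bool:
        # Default property: birds can fly.
        return True


class Sparrow(Bird):
    # Inherits can_fly unchanged, so nothing needs to be repeated here.
    pass


class Penguin(Bird):
    def can_fly(self) -> bool:
        # Overrides the inherited default: penguins cannot fly.
        return False


for bird in (Sparrow(), Penguin()):
    print(type(bird).__name__, bird.can_fly())
```

The more specific definition in `Penguin` takes precedence over the inherited one, so the general rule in `Bird` can stay in place without producing a contradiction.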
Encapsulation (the grouping of methods and attributes within a class) and
information hiding (restricting access to properties within a class) are features
that can make the final model more robust when put to practical use. These are
programming aids that are useful in modeling with words, where the models are
produced to solve real-world problems.
Uncertain Classes and Objects
Any object-oriented system for modeling with words needs to be able to
represent concepts using words. The system needs to model the uncertainty that
is inherent in the way humans use words. To this end, a class consists of a set
of properties, each of which can involve some degree of uncertainty. Properties
can be methods (they do things) or attributes (they represent facts). Attributes
can be defined by fuzzy sets representing words, probability values, or any other
established uncertainty representation. Methods, on the other hand, may call
upon uncertain attributes and may thus define uncertain actions.
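A minimal Python sketch of such a class is given below. It assumes discrete fuzzy sets stored as dictionaries and uses the standard sup-min (possibility) match to compare an uncertain attribute with a word. The class `Tree`, the fuzzy set `TALL`, and the attribute `p_evergreen` are illustrative assumptions, not part of FRIL++.

```python
from dataclasses import dataclass
from typing import Dict

# A discrete fuzzy set: value -> membership degree in [0, 1].
FuzzySet = Dict[float, float]


def possibility(a: FuzzySet, b: FuzzySet) -> float:
    """Sup-min match between two discrete fuzzy sets."""
    return max((min(m, b[u]) for u, m in a.items() if u in b), default=0.0)


# Assumed meaning of the word 'tall' over heights in metres.
TALL: FuzzySet = {10.0: 0.2, 15.0: 0.6, 20.0: 1.0}


@dataclass
class Tree:
    height: FuzzySet            # uncertain attribute: fuzzy set over heights
    p_evergreen: float = 0.5    # uncertain attribute: a probability value

    def is_tall(self) -> float:
        """Uncertain method: degree to which the fuzzy height matches 'tall'."""
        return possibility(self.height, TALL)


tree = Tree(height={10.0: 1.0, 15.0: 0.7})
print(tree.is_tall())
```

Because the method reads an uncertain attribute, its result is a degree rather than a crisp truth value, which is the sense in which methods may define uncertain actions.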
Given a vocabulary of words, a suitable calculus based on these words, and the
uncertain object-oriented techniques described in this chapter, it is clear that we
can implement object-oriented modeling with words in FRIL++. We propose
FRIL++ as a useful tool for object-oriented modeling with words. This approach
seeks to combine modeling with words with uncertain class hierarchies to give
a richer and more powerful mechanism for the representation of high-level
expert knowledge and the induction of insightful models from data.
Conclusions
We introduced a logic-based probabilistic and fuzzy object-oriented model in
which each class property is represented by a fuzzy rule weighted by probability
lower and upper bounds. We then proposed probabilistic default reasoning on
fuzzy events as a suitable approach to uncertain property inheritance and class
recognition problems. The intractable steps of general probabilistic default
reasoning are reduced to polynomial-time ones, using Jeffrey's rule and its
inverse for a weaker notion of consistency and for local inference.
On the formal basis of this model, we designed and implemented FRIL++ as the
object-oriented extension of FRIL, a logic programming language dealing with
both probability and fuzziness. We presented the basic features of FRIL++ with
an example, and showed the important differences between the translation of a
probabilistic and fuzzy object-oriented logic program and that of a classical one,
due to uncertain class membership and property applicability. FRIL++ can thus
be used as a modeling and programming language for probabilistic and fuzzy
object-oriented deductive databases and knowledge bases, in the same way as
predicate logic programming languages have been used for classical deductive
databases and knowledge bases.
In particular, we presented the application of FRIL++ to machine learning, user
modeling, and modeling with words. For machine learning, FRIL++ has been
used to build a library of classes of fuzzy machine learning algorithms so that they
can be compared or combined with each other, as there is no best learning
algorithm for all tasks. For user modeling, prototypical user classes can be
modeled as FRIL++ classes whose properties can be inherited by a user
with uncertainty degrees depending on the user's membership in those classes.
For modeling with words, we propose FRIL++ as a good language for
object-oriented development and implementation. We are also
revising FRIL++, optimizing the compiler and adding more utilities to the
language, in order to make it a powerful tool for modeling and constructing
intelligent systems.
References
Axelrod, R. (1985). The evolution of cooperation. New York: Basic Books.
Baldwin, J. F., & Martin, T. P. (1995). Refining knowledge from uncertain
relations - a fuzzy data browser based on fuzzy object-oriented programming
in FRIL. In Proceedings of the Fourth IEEE International Conference
on Fuzzy Systems (pp. 27-34).
Baldwin, J. F., Lawry, J., & Martin, T. P. (1996). A note on probability/possibility
consistency for fuzzy events. In Proceedings of the Sixth International
Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems (pp. 521-526).
Baldwin, J. F., Lawry, J., & Martin, T. P. (1996). Efficient algorithms for
semantic unification. In Proceedings of the Sixth International Conference
on Information Processing and Management of Uncertainty in
Knowledge-Based Systems (pp. 527-532).
Baldwin, J. F., Lawry, J., & Martin, T. P. (1998). The application of generalised
fuzzy rules to machine learning and automated knowledge discovery.
International Journal of Uncertainty Fuzziness and Knowledge-Based
Systems, 6, 459-487.
Baldwin, J. F., Martin, T. P., & Pilsworth, B. W. (1995). FRIL - Fuzzy and
evidential reasoning in artificial intelligence. Hertfordshire, United
Kingdom: Research Studies Press.
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000). Towards soft
computing object-oriented logic programming. In Proceedings of the
Ninth IEEE International Conference on Fuzzy Systems (pp. 768-773).
oriented database model: Imprecision, uncertainty & fuzzy types. In
Proceedings of the First International Joint Conference of the International
Fuzzy Systems Association and the North American Fuzzy
Information Processing Society (pp. 2323-2328).
model managing vague and uncertain information. International Journal
of Intelligent Systems, 14, 623-651.
Cao, T. H., & Creasy, P. N. (2000). Fuzzy types: A framework for handling
uncertainty about types of objects. International Journal of Approximate
Reasoning, 25, 217-253.
Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2002). On the
implementation of FRIL++ for object-oriented logic programming with
uncertainty and fuzziness. In B. Bouchon-Meunier et al. (Eds.), Technologies
for constructing intelligent systems, studies in fuzziness and soft
computing (Vol. 90, pp. 393-406). Heidelberg: Physica-Verlag.
Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2001). Inheritance
and recognition in uncertain and fuzzy object-oriented models. In Proceedings
of the First International Joint Conference of the International
Fuzzy Systems Association and the North American Fuzzy Information
Processing Society (pp. 2317-2322).
Cross, V. V. (2003). Defining fuzzy relationships in object models: Abstraction
and interpretation. International Journal of Fuzzy Sets and Systems,
140, 5-27.
De Tré, G. (2001). An algebra for querying a constraint defined fuzzy and
uncertain object-oriented database model. In Proceedings of the First
International Joint Conference of the International Fuzzy Systems
Association and the North American Fuzzy Information Processing
Society (pp. 2138-2143).
Dubitzky, W., Büchner, A. G., Hughes, J. G., & Bell, D. A. (1999). Towards
concept-oriented databases. Data & Knowledge Engineering, 30, 23-55.
Dubois, D., Fargier, H., & Prade, H. (2000). Multiple-sources information fusion
- a practical inconsistency-tolerant approach. In Proceedings of the
Eighth International Conference on Information Processing and
Management of Uncertainty in Knowledge-Based Systems (pp. 1047-1054).
Einhorn, H. J., & Hogarth, R. M. (1985). Ambiguity and uncertainty in
probabilistic inference. Psychological Review, 93, 433-461.
Eiter, T., Lu, J. J., Lukasiewicz, T., & Subrahmanian, V. S. (2001). Probabilistic
object bases. ACM Transactions on Database Systems, 26, 264-312.
Gaines, B. R. (1978). Fuzzy and probability uncertainty logics. Journal of
Information and Control, 38, 154-169.
Geffner, H., & Pearl, J. (1992). Conditional entailment: Bridging two approaches
to default reasoning. Artificial Intelligence, 53, 209-244.
in the fuzzy object-oriented data model. International Journal for Fuzzy
Sets and Systems, 60, 259-272.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The
belief-adjustment model. Cognitive Psychology, 24, 1-55.
Itzkovich, I., & Hawkes, L. W. (1994). Fuzzy extension of inheritance
hierarchies. International Journal for Fuzzy Sets and Systems, 62, 143-153.
Jeffrey, R. (1965). The logic of decision. New York: McGraw-Hill.
Kohavi, R., Sommerfield, D., & Dougherty, J. (1996). Data mining using
MLC++: A machine learning library in C++. In Tools with Artificial
Intelligence (pp. 234-245). Washington: IEEE Computer Society Press.
Lukasiewicz, T. (2000). Probabilistic default reasoning with conditional
constraints. Proceedings of the Eighth International Workshop on
Non-Monotonic Reasoning, Special Session on Uncertainty Frameworks in
Non-Monotonic Reasoning.
Martin, T. P. (2000). Incremental learning of user models - an experimental
testbed. In Proceedings of the Eighth International Conference on
Information Processing and Management of Uncertainty in
Knowledge-Based Systems (pp. 1419-1426).
McCabe, F. G. (1992). Logic and objects. New York: Prentice Hall.
Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
Moss, C. (1994). Prolog++: The power of object-oriented and logic
programming. Reading, MA: Addison-Wesley.
classes. In R. De Caluwe (Ed.), Fuzzy and uncertain object-oriented
databases: Concepts and models (pp. 21-61). Singapore: World Scientific.
Rossiter, J. M., Cao, T. H., Martin, T. P., & Baldwin, J. F. (2000). A FRIL++
compiler for soft computing object-oriented logic programming. In
Proceedings of the Sixth International Conference on Soft Computing (pp.
340-345).
Rossiter, J. M., Cao, T. H., Martin, T. P., & Baldwin, J. F. (2001a). User
recognition in uncertain object-oriented user modelling. In Proceedings of
the 10th IEEE International Conference on Fuzzy Systems.
Rossiter, J. M., Cao, T. H., Martin, T. P., & Baldwin, J. F. (2001b).
Object-oriented modelling with words. In Proceedings of the 10th IEEE
International Conference on Fuzzy Systems, Workshop on Modelling with
Words.
Shastri, L. (1989). Default reasoning in semantic networks: A formalization of
recognition and inheritance. Artificial Intelligence, 39, 283-355.
Stroustrup, B. (1997). The C++ programming language (3rd ed.). Reading,
MA: Addison-Wesley.
Van Gyseghem, N., & De Caluwe, R. (1997). The UFO database model: Dealing
with imperfect information. In R. De Caluwe (Ed.), Fuzzy and uncertain
object-oriented databases: Concepts and models (pp. 123-185).
Singapore: World Scientific.
Yazici, A., & George, R. (1999). Fuzzy database modelling. Studies in
fuzziness and soft computing (Vol. 26). Heidelberg: Physica-Verlag.
Zadeh, L. A. (1996). Fuzzy logic = computing with words. IEEE Transactions
on Fuzzy Systems, 4, 103-111.
Zadeh, L. A. (1999). From computing with numbers to computing with words
- from manipulation of measurements to manipulation of perceptions. IEEE
Transactions on Circuits and Systems, 45, 105-119.
SECTION II
Chapter V
Fuzzy Information Modeling with the UML
Zongmin Ma
Université de Sherbrooke, Canada
Abstract
Computer applications in nontraditional areas have placed new requirements on
conceptual data modeling. Several conceptual data models have been proposed
as tools for database design. However, information in real-world
applications is often vague or ambiguous, and so far little research has
been done on modeling imprecision and uncertainty in conceptual data
models. The UML (Unified Modeling Language) is a set of object-oriented
modeling notations and is a standard of the Object Management
Group (OMG). It can be applied in many areas of software engineering
and knowledge engineering. Increasingly, the UML is being applied to
data modeling. In this chapter, different levels of fuzziness are introduced
into the class of the UML, and the corresponding graphical representations
are given. The class diagrams of the UML can thereby model fuzzy
information.
Introduction
One of the major areas of research in databases has been the continuous effort
to enrich existing database models with a more extensive collection of semantic
concepts. Databases have gone through the development from hierarchical and
network databases to relational databases. As computer technology moves into
nontraditional applications such as CAD/CAM, knowledge-based systems,
multimedia, and Internet systems, many feel the limitations of relational data-
bases in these data-intensive application systems. Therefore, some nontradi-
tional data models for databases, such as the entity-relationship (ER) data model
(Chen, 1976), the object-oriented data model, and the logic data model, have
been proposed as tools for modeling databases.
One of the semantic needs not adequately addressed by traditional models is that
of uncertainty. Traditional models assume the database model to be a correct
reflection of the world being captured and assume that the data stored is known,
accurate, and complete. It is rarely the case in real life that all or most of these
assumptions are met. Different models have been proposed to handle different
categories of data quality (or lack thereof). Five basic kinds of imperfection have
been identified: inconsistency, imprecision, vagueness, uncertainty, and ambigu-
ity (Bosc & Prade, 1993). Inconsistency is a kind of semantic conflict when some
aspect of the real world is irreconcilably represented more than once in a
database or in several different databases. Inconsistency has traditionally been
applied to data. In the context of multidatabases, where multiple sources are
integrated, attention was given to inconsistency at the modeling level. Impreci-
sion and vagueness are two closely related qualities. They both relate to the
context in which the value attributed to an attribute (or the interpretation assigned
to a concept) is known to come from a given interval (or set of values) but we
do not know exactly which one to choose at present. In general, vague
information is represented by linguistic values. Uncertainty refers to those
situations in which we can apportion some, but not all, of our belief to the fact
that an attribute took a given value or a group of values. Random uncertainty,
described using probability theory, is not considered in this chapter. Finally,
ambiguity means that some elements of the model lack complete semantics,
leading to several possible interpretations. Generally, several different kinds of
imperfection coexist with respect to the same piece of information. A large
number of models have been proposed to handle uncertainty and vagueness.
Most of these models are based on the same paradigms. Vagueness and
uncertainty are generally modeled with fuzzy sets and possibility theory (Zadeh,
1965, 1978). Many of the existing approaches dealing with imprecision and
uncertainty are based on the theory of fuzzy sets. Fuzzy information has been
extensively investigated in the context of the relational model (Buckles & Petry,
1982; Ma, Zhang, & Ma, 1999; Prade & Testemale, 1984; Raju & Majumdar,
1988). Recent efforts have extended these results to object-oriented databases
by introducing the related notions of classes, generalization/specialization, and
inheritance (Bordogna, Pasi, & Lucarella, 1999; Cross, Caluwe, & Vangyseghem,
1997; Cross & Firat, 2000; Dubois, Prade, & Rossazza, 1991; George, Srikanth,
Petry, & Buckles, 1996; Gyseghem & Caluwe, 1998; Lee et al., 1999; Ma,
Zhang, & Ma, 2004; Marín, Vila, & Pons, 2000; Marín et al., 2003). However,
most of this research focuses on modeling uncertainty at the data level; fewer
results exist when it comes to uncertainty at the conceptual model level. It is
especially true for modeling uncertain information in object-oriented data
models.
The UML (Booch, Rumbaugh, & Jacobson, 1998; OMG, 2001) is a set of
object-oriented modeling notations that was standardized by the Object
Management Group (OMG). The power of
the UML can be applied to many areas of software engineering and knowledge
engineering (Mili, Shen, et al., 2001). The complete development of relational and
object relational databases from business requirements can be described by the
UML. The database has traditionally been described by notations called entity-
relationship (ER) diagrams, using graphic representation that is similar but not
identical to that of the UML. Using the UML for database design has many
advantages over the traditional ER notations (Naiburg, 2000). The UML is based
largely upon the ER notations and includes the ability to capture all information
that is captured in a traditional data model. The additional compartment in the
UML for methods or operations allows you to capture items like triggers,
indexes, and the various types of constraints directly as part of the diagram. By
modeling this, rather than using tagged values to store the information, it is now
visible on the modeling surface, making it more easily communicated to everyone
involved. So, increasingly, the UML is being applied to data modeling (Ambler,
2000a, 2000b; Blaha & Premerlani, 1999; Naiburg, 2000). More recently, the
UML was used to model XML conceptually (Conrad, Scheffiner, & Freytag,
2000).
Note that while the UML reflects some of the best object-oriented modeling
experiences available, it lacks some necessary semantics. One
such gap is the ability to handle imprecise and uncertain
information. To our knowledge, issues concerning a fuzzy UML data model have not
been addressed in the literature, although imprecise and uncertain information
exists in knowledge engineering and database systems and has been studied
extensively. In this chapter, different levels of fuzziness are introduced into
the class in the UML, and the corresponding graphical representations are given.
The class diagrams of the UML can thereby model fuzzy information. The
contribution of this chapter is a fully developed object-oriented conceptual
modeling methodology for fuzzy information modeling.
The remainder of this chapter is organized as follows. The second section gives
basic knowledge concerning fuzzy set and possibility distribution theories as well
as the UML class model. The fuzzy extension to the class model in the
UML is presented in the third section. The fourth section discusses related work,
and the last section concludes this chapter.
Basic Knowledge
Fuzzy Set and Possibility Distribution
The concept of fuzzy sets was originally introduced by Zadeh (1965). Let $U$ be
a universe of discourse. A fuzzy value on $U$ can be characterised by a fuzzy set
$F$ in $U$. A membership function $\mu_F: U \to [0,1]$ is defined for the fuzzy
set $F$, where $\mu_F(u)$, for each $u \in U$, denotes the degree of membership
of $u$ in the fuzzy set $F$. Thus, the fuzzy set $F$ is described as follows:
$$F = \{\mu_F(u_1)/u_1, \mu_F(u_2)/u_2, \ldots, \mu_F(u_n)/u_n\}$$
where the pair $\mu_F(u_i)/u_i$ represents the value $u_i$ and its membership
degree $\mu_F(u_i)$.
The membership function $\mu_F(u)$ can be interpreted as a measure of the possibility
that the value of variable $X$ is $u$. A fuzzy set is equivalently represented by its
associated possibility distribution $\pi_X$ (Zadeh, 1978):
$$\pi_X = \{\pi_X(u_1)/u_1, \pi_X(u_2)/u_2, \ldots, \pi_X(u_n)/u_n\}$$
Here, $\pi_X(u_i)$, $u_i \in U$, denotes the possibility that $u_i$ is true. Let
$\pi_X$ and $F$ be the possibility distribution representation and the fuzzy set
representation for a fuzzy value, respectively. It is apparent that $\pi_X = F$
is true (Raju & Majumdar, 1988).
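To make the notation concrete, the following Python sketch stores a discrete fuzzy set as a mapping from each element u to its membership degree. The fuzzy set `young`, its degrees, and the helper names are assumptions chosen for illustration only.

```python
from typing import Dict

# F = {mu_F(u_1)/u_1, ..., mu_F(u_n)/u_n}, stored as u -> mu_F(u).
# Assumed example: the fuzzy value "young" over a universe of ages.
young: Dict[int, float] = {20: 1.0, 25: 0.8, 30: 0.5, 35: 0.2}


def membership(fuzzy_set: Dict[int, float], u: int) -> float:
    """Degree of membership of u; elements not listed have degree 0."""
    return fuzzy_set.get(u, 0.0)


def as_notation(fuzzy_set: Dict[int, float]) -> str:
    """Render the set in the degree/element notation used in the text."""
    return "{" + ", ".join(f"{m}/{u}" for u, m in fuzzy_set.items()) + "}"


# The possibility distribution pi_X of a variable X (here, an age) that takes
# the fuzzy value "young" has the same degrees as the fuzzy set itself.
pi_age: Dict[int, float] = dict(young)

print(as_notation(young))
print(membership(pi_age, 30))
```

Reading the same mapping as a fuzzy set gives degrees of membership, while reading it as a possibility distribution gives the possibility that the variable takes each value.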
UML Class Model
UML provides a collection of models to capture the many aspects of a software
system. From the database modeling point of view, the most relevant model is the
class model. The building blocks in this class model are those of classes and
relationships. We briefly review these building blocks.
Classes
Being the descriptor for a set of objects with similar structure, behavior, and
relationships, a class represents a concept within the system being modeled.
Classes have data structure and behavior and relationships to other elements. A
class is drawn as a solid-outline rectangle with three compartments separated by
horizontal lines. The top name compartment holds the class name and other
general properties of the class (including stereotype); the middle list compart-
ment holds a list of attributes; the bottom list compartment holds a list of
operations. Either or both of the attribute and operation compartments may be
suppressed. A separator line is not drawn for a missing compartment. If a
compartment is suppressed, no inference can be drawn about the presence or
absence of elements in it. Figure 1 shows a class.
Relationships
Another main structural component in the class diagram of the UML is the
relationship, which represents a connection between classes or class
instances. UML supports a variety of relationships:
1. Aggregation and composition: An aggregation captures a whole-part
relationship between an aggregate, a class that represents the whole, and a
constituent part. An open diamond is used to denote an aggregation
relationship. Here the class touched by the open diamond is the aggregate class,
denoting the whole.
Figure 2 shows an aggregation relationship.
Figure 1. The class icon (a rectangle with three compartments: class name, attributes, and operations)
Figure 2. Simple aggregation relationship (Car as the aggregate of Engine, Interior, and Chassis)
Composition is a special case of aggregation in which the constituent parts are
directly dependent on the whole and cannot exist independently.
Composition mainly applies to attribute composition. A composition
relationship is represented by a black diamond.
2. Generalization: Generalization is used to define a relationship between
classes to build a taxonomy of classes: one class is a more general description
of a set of other classes. The generalization relationship is depicted by a
triangular arrowhead. This arrowhead points to the superclass. One or
more lines proceed from the base of the arrowhead, connecting it to
the subclasses.
Figure 3 shows a generalization relationship.
3. Association: Associations are relationships that describe connections
among class instances. An association is a more general relationship than
aggregation or generalization. A role may be assigned to each class taking
part in an association, making the association a directed link. An association
relationship is expressed by a line with an arrowhead drawn between the
participating classes.
Figure 4 shows an association relationship.
4. Dependency: A dependency indicates a semantic relationship between
two classes. It relates the classes and does not require a set of instances
for its meaning. It indicates a situation in which a change to the target class
may require a change to the source class in the dependency. A dependency
is shown as a dashed arrow between two classes. The class at the tail of
the arrow depends on the class at the arrowhead.
Figure 5 shows a dependency relationship.
Figure 3. Simple generalization relationship (Vehicle as the superclass of Car and Truck)
Figure 4. Simple association relationship (CD Player and Car linked by the association "installing")
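For readers who think in code rather than diagrams, the four relationships can be loosely mirrored in Python using the classes from Figures 2 to 5. This is only an analogy: UML relationships are modeling constructs, and the attribute and method names below, as well as the direction of the dependency (Dependent depending on Employee), are assumptions.

```python
class Engine:
    pass


class Interior:
    pass


class Chassis:
    pass


class CDPlayer:
    pass


class Vehicle:
    """Superclass in the generalization of Figure 3."""


class Truck(Vehicle):
    pass


class Car(Vehicle):  # generalization: a Car is a kind of Vehicle
    def __init__(self, engine: Engine, interior: Interior, chassis: Chassis):
        # Aggregation: the Car is the whole; the parts are created outside
        # the Car and passed in, so they can exist independently of it.
        self.engine = engine
        self.interior = interior
        self.chassis = chassis
        self.cd_player = None

    def install(self, cd_player: CDPlayer) -> None:
        # Association ("installing"): a link between two class instances.
        self.cd_player = cd_player


class Employee:
    def __init__(self, name: str):
        self.name = name


class Dependent:
    def __init__(self, name: str, employee: Employee):
        # Dependency: Dependent relies on Employee's interface (here, .name),
        # so a change to Employee may require a change to Dependent.
        self.description = f"{name} (dependent of {employee.name})"


car = Car(Engine(), Interior(), Chassis())
car.install(CDPlayer())
print(isinstance(car, Vehicle), Dependent("Ann", Employee("Bob")).description)
```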
UML Modeling of Fuzzy Data
In this section, we extend the UML class diagrams to model fuzzy data. Because
the constructs of the UML class diagram are classes and relationships, these
constructs are extended on the basis of fuzzy sets.
Fuzzy Class
Objects with the same properties are gathered into classes that are organized into
hierarchies. Theoretically, a class can be considered from two different view-
points:
1. An extensional class, where the class is defined by the list of its object
instances
2. An intensional class, where the class is defined by a set of attributes and
the admissible values of the attributes
A class may therefore be fuzzy for several reasons. First, some
objects are fuzzy ones, which have similar properties. A class defined by these
objects may be fuzzy. These objects belong to the class with a membership degree
in [0, 1]. Second, when a class is intensionally defined, the domain of an attribute
may be fuzzy, and a fuzzy class is formed. Third, the subclass produced by a
fuzzy class by means of specialization and the superclass produced by some
classes (in which there is at least one class that is fuzzy) by means of
generalization are also fuzzy.
Following in the footsteps of Zvieli and Chen (1986), we define three levels of
fuzziness. In the context of classes, the three levels of fuzziness are defined as
follows:
1. Fuzziness in the extent to which the class belongs in the data model as well
as fuzziness on the content (in terms of attributes) of the class
2. Fuzziness related to whether some instances are instances of a class; even
though the structure of a class is crisp, it is possible that an instance of the
class belongs to the class with a degree of membership
3. Fuzziness on attribute values of the instances of the
class; an attribute in a class defines a value domain, and when this domain
is a fuzzy subset or a set of fuzzy subsets, the fuzziness of an attribute value
appears
Figure 5. Simple dependency relationship (classes Dependent and Employee)
In order to model the first level of fuzziness, i.e., an attribute or a class with
a degree of membership, the attribute or class name should be followed by a pair
of words WITH mem DEGREE, where 0 ≤ mem ≤ 1 indicates the
degree to which the attribute belongs to the class or the class belongs to the data
model (Gyseghem & Caluwe, 1998; Marín, Vila, & Pons, 2000). For example,
"Employee WITH 0.6 DEGREE" and "Office Number WITH 0.8 DEGREE" are
a class and an attribute with the first level of fuzziness, respectively. Generally, an
attribute or a class will not be declared when its degree is 0. In addition, "WITH
1.0 DEGREE" can be omitted when the degree of an attribute or a class is 1. It
should be noted that attribute values might be fuzzy. In order to model the third
level of fuzziness, a keyword FUZZY is introduced and is placed in front of the
attribute. In the second level of fuzziness, we must indicate the degree of
membership to which an instance of the class belongs to the class. For this
purpose, an additional attribute is introduced into the class to represent the instance
membership degree to the class, with an attribute domain that is [0, 1]. We denote
this special attribute by μ. In order to differentiate a class with the second
level of fuzziness, we use a dashed-outline rectangle to denote such a class.
Figure 6 shows a fuzzy class Ph.D. student. Here, attribute Age may take fuzzy
values; that is, its domain is fuzzy. Ph.D. students may or may not have
offices, so it is not known for sure whether class Ph.D. student has attribute
Office. We do know, however, that Ph.D. students have offices with a high
possibility, say 0.8. Attribute Office therefore belongs to the class with
uncertainty: the class has fuzziness at the first level, and we write WITH 0.8
DEGREE in the class definition to describe it. In addition, because the class is
fuzzy, we may not be able to determine whether an object is an instance of the
class, so the additional attribute $\mu$ is introduced into the class for this purpose.
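The notation described above can be mirrored in a small data structure. The following Python sketch is illustrative only: the names FuzzyAttribute, FuzzyClass, and declare are assumptions made for this example and do not come from the chapter or from any UML tool, and the special membership attribute is spelled mu.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FuzzyAttribute:
    """Hypothetical attribute record for this sketch."""
    name: str
    fuzzy: bool = False   # third level: attribute takes fuzzy values (FUZZY keyword)
    degree: float = 1.0   # first level: degree to which the attribute belongs to the class

    def declare(self) -> str:
        text = ("FUZZY " if self.fuzzy else "") + self.name
        if self.degree < 1.0:              # WITH 1.0 DEGREE is omitted
            text += f" WITH {self.degree} DEGREE"
        return text


@dataclass
class FuzzyClass:
    """Hypothetical class record for this sketch."""
    name: str
    attributes: List[FuzzyAttribute] = field(default_factory=list)
    degree: float = 1.0            # first level: degree to which the class belongs to the model
    fuzzy_instances: bool = False  # second level: instances carry a membership degree

    def declare(self) -> List[str]:
        header = self.name
        if self.degree < 1.0:
            header += f" WITH {self.degree} DEGREE"
        # Attributes with degree 0 are not declared at all.
        lines = [header] + [a.declare() for a in self.attributes if a.degree > 0]
        if self.fuzzy_instances:
            lines.append("mu")     # additional attribute with domain [0, 1]
        return lines


phd_student = FuzzyClass(
    "Ph.D. student",
    [
        FuzzyAttribute("ID"),
        FuzzyAttribute("Name"),
        FuzzyAttribute("Age", fuzzy=True),
        FuzzyAttribute("Office", degree=0.8),
    ],
    fuzzy_instances=True,
)
print("\n".join(phd_student.declare()))
```

Keeping the three levels as separate fields reflects that they are independent in the text: a class may carry any combination of them.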
Figure 6. A fuzzy class (class Ph.D. student with attributes ID, Name, FUZZY Age, and Office WITH 0.8 DEGREE)
Fuzzy Generalization
The concept of subclassing is one of the basic building blocks of the object model.
A new class, called subclass, is produced from another class, called superclass,
by means of inheriting some attributes and methods of the superclass, overriding
some attributes and methods of the superclass, and defining some new attributes
and methods. Because a subclass is the specialization of the superclass, any one
object belonging to the subclass must belong to the superclass. This character-
istic can be used to determine if two classes have a subclass-superclass
relationship.
However, classes may be fuzzy. A class produced from a fuzzy class must be
fuzzy. If the former is still called the subclass and the latter the superclass,
the subclass-superclass relationship is fuzzy. In other words, a class is then a
subclass of another class with a membership degree in [0, 1]. Correspondingly,
we have the following conditions for determining a subclass-superclass
relationship:
1. For any (fuzzy) object, the membership degree to which it belongs to the
subclass is less than or equal to the membership degree to which it belongs
to the superclass.
2. The membership degree to which it belongs to the subclass is greater than or
equal to the given threshold.
The subclass is then a subclass of the superclass with a membership degree
equal to the minimum of the membership degrees to which these objects belong
to the subclass.
Formally, let A and B be (fuzzy) classes and $\beta$ be a given threshold. We say B
is a subclass of A if
$$(\forall e)\,(\beta \le \mu_B(e) \le \mu_A(e)).$$
The membership degree to which B is a subclass of A should be
$$\min_{\mu_B(e) \ge \beta} \mu_B(e).$$
Here, e is an object instance of A and B in the universe of discourse, and
$\mu_A(e)$ and $\mu_B(e)$ are the membership degrees of e to A and B, respectively.
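The extensional test above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name subclass_degree and the dictionary representation of membership degrees are hypothetical, and objects whose membership degree to B falls below the threshold are treated as not belonging to B (one reading of the restriction under the minimum).

```python
from typing import Dict, Optional


def subclass_degree(mu_a: Dict[str, float],
                    mu_b: Dict[str, float],
                    beta: float) -> Optional[float]:
    """Return the degree to which B is a subclass of A, or None if it is not.

    mu_a and mu_b map object identifiers to membership degrees in A and B.
    Assumption of this sketch: only objects with mu_b >= beta count as
    instances of B; an object missing from mu_a has membership degree 0 in A.
    """
    instances = [e for e, degree in mu_b.items() if degree >= beta]
    if not instances:
        return None
    for e in instances:
        if mu_b[e] > mu_a.get(e, 0.0):   # violates mu_B(e) <= mu_A(e)
            return None
    return min(mu_b[e] for e in instances)


# Hypothetical membership degrees for classes Youth (A) and Young Student (B).
mu_youth = {"ann": 0.9, "bob": 0.7, "eve": 0.6}
mu_young_student = {"ann": 0.8, "bob": 0.7, "eve": 0.2}

print(subclass_degree(mu_youth, mu_young_student, beta=0.5))
```

With these example values, ann and bob pass the threshold and satisfy the inequality, so the resulting degree is the smaller of their memberships in B.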
Note, however, that in the fuzzy generalization relationship above we assume
that classes A and B have only the second level of fuzziness. Classes A and B
may also be classes with membership degrees, namely, with the first level of
fuzziness. Assume that we have two classes A and B as follows:
A WITH degree_A DEGREE
B WITH degree_B DEGREE
Then B is a subclass of A if
$$(\forall e)\,(\beta \le \mu_B(e) \le \mu_A(e)) \wedge (\beta \le \mathit{degree\_B} \le \mathit{degree\_A}).$$
That is, B is a subclass of A only if two groups of conditions hold. At the
instance level, the membership degrees of all objects to A and B must be
greater than or equal to the given threshold, and the membership degree of any
object to A must be greater than or equal to its membership degree to B. At the
class level, the membership degrees of A and B must be greater than or equal to
the given threshold, and the membership degree of A must be greater than or
equal to the membership degree of B.
Consider a fuzzy superclass A and its fuzzy subclasses B1, B2, ..., Bn with
instance membership degrees $\mu_A$, $\mu_{B1}$, $\mu_{B2}$, ..., and $\mu_{Bn}$,
respectively, which may have the degrees of membership degree_A, degree_B1,
degree_B2, ..., and degree_Bn, respectively. Then the following relationship
is true:
$$(\forall e)\,(\max(\mu_{B1}(e), \mu_{B2}(e), \ldots, \mu_{Bn}(e)) \le \mu_A(e)) \wedge (\max(\mathit{degree\_B1}, \mathit{degree\_B2}, \ldots, \mathit{degree\_Bn}) \le \mathit{degree\_A})$$
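The class-level condition and the relationship for several subclasses can be checked mechanically. The sketch below is only an illustration: the function names first_level_ok and superclass_dominates, the dictionary representation, and the sample degrees are all assumptions of this example.

```python
from typing import Dict, List, Tuple


def first_level_ok(degree_a: float, degree_b: float, beta: float) -> bool:
    """Class-level condition: beta <= degree_B <= degree_A."""
    return beta <= degree_b <= degree_a


def superclass_dominates(mu_a: Dict[str, float],
                         degree_a: float,
                         subclasses: List[Tuple[Dict[str, float], float]]) -> bool:
    """Check max_i mu_Bi(e) <= mu_A(e) for all e, and max_i degree_Bi <= degree_A.

    Each entry of subclasses is (membership map, class degree); the list is
    assumed to be non-empty. An object missing from a map has degree 0 there.
    """
    objects = set().union(*(mu for mu, _ in subclasses))
    instance_ok = all(
        max(mu.get(e, 0.0) for mu, _ in subclasses) <= mu_a.get(e, 0.0)
        for e in objects
    )
    class_ok = max(degree for _, degree in subclasses) <= degree_a
    return instance_ok and class_ok


# Hypothetical data: Youth is the superclass of Young Student and Young Faculty.
mu_youth = {"ann": 0.9, "bob": 0.7}
young_student = ({"ann": 0.8}, 0.7)
young_faculty = ({"ann": 0.3, "bob": 0.6}, 0.6)

print(first_level_ok(degree_a=0.9, degree_b=0.7, beta=0.5))
print(superclass_dominates(mu_youth, 0.9, [young_student, young_faculty]))
```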
It can be seen that we can assess fuzzy subclass-superclass relationships by
utilizing the inclusion degree of objects to the class. Clearly such assessment is
based on the extensional viewpoint of class. When classes are defined with the
intensional viewpoint, there is no object available. Therefore, the method given
above cannot be used. At this point, we can use the inclusion degree of a class
with respect to another class to determine the relationships between fuzzy
subclass and superclass. The notion of inclusion degree was originally developed
in Ma, Zhang, and Ma (1999) for assessment of data redundancy in fuzzy
relational databases. In Ma, Zhang, and Ma (2004), the inclusion degree is
extended to evaluate the membership degree of an object to a class and further
the relationships between fuzzy subclass and superclass.
Formally, let A and B be (fuzzy) classes, and let the degree to which B is a
subclass of A be denoted by $\mu(A, B)$. For a given threshold $\beta$, we say B is a
subclass of A if
$$\mu(A, B) \ge \beta.$$
The membership degree to which B is a subclass of A is clearly $\mu(A, B)$.
Now let us consider the situation in which classes A or B are classes with
membership degrees, namely, with the first level of fuzziness. Assume that we
have two classes A and B as follows:
A WITH degree_A DEGREE
B WITH degree_B DEGREE
Then B is a subclass of A if
$$(\mu(A, B) \ge \beta) \wedge (\beta \le \mathit{degree\_B} \le \mathit{degree\_A}).$$
This means that B is a subclass of A only if, in addition to the requirement that
the inclusion degree of A with respect to B must be greater than or equal to the
given threshold, the membership degrees of A and B must be greater than or equal
to the given threshold, and the membership degree of A must be greater than or
equal to the membership degree of B.
The inclusion degree of a (fuzzy) subclass with respect to the (fuzzy) superclass
can be calculated according to the inclusion degree of the attribute domains of
the subclass with respect to the attribute domains of the superclass as well as the
weight of attributes. The methods for evaluating the inclusion degree of fuzzy
attribute domains and further evaluating the inclusion degree of a subclass with
respect to the superclass were developed in Ma, Zhang, and Ma (2004). It should
be noted that in this work (Ma, Zhang, & Ma, 2004), the relationship between
subclass and superclass with the first level of fuzziness was not discussed.
In subclass-superclass hierarchies, a critical issue is multiple inheritance.
Ambiguity arises when more than one of the superclasses have common
attributes and the subclass does not declare explicitly the class from which the
attribute is inherited. In this case, which of the conflicting attributes the
subclass inherits depends on their weights with respect to the corresponding
superclasses (Liu & Song, 2001; Ma, Zhang, & Ma, 2004). It should also be
noted that in a fuzzy multiple inheritance hierarchy, the subclass has different
degrees with respect to different superclasses, which is not the case in
classical object-oriented database systems.
In order to represent a fuzzy generalization relationship, a dashed triangular
arrowhead is used. Figure 7 shows a fuzzy generalization relationship. Classes
Young Student and Young Faculty are both classes with the second level of
fuzziness; that is, they may have instances (objects) that belong to them with
a membership degree. These two classes can be generalized into class Youth,
which is also a class with the second level of fuzziness.
Fuzzy Aggregation
An aggregation captures a whole-part relationship between an aggregate and its
constituent parts. These constituent parts can exist independently. Therefore,
every instance of an aggregate can be projected into a set of instances of
constituent parts. Let A be an aggregation of constituent parts B1, B2, ..., and Bn.
For $e \in A$, the projection of e to Bi is denoted by $e{\downarrow}_{Bi}$. Then we have
$(e{\downarrow}_{B1}) \in B1$, $(e{\downarrow}_{B2}) \in B2$, ..., $(e{\downarrow}_{Bn}) \in Bn$.
A class aggregated from fuzzy constituent parts must be fuzzy. If it is still
called the aggregate, the aggregation is fuzzy: the class is an aggregation of
its constituent parts with a membership degree in [0, 1]. Correspondingly, we
have the following conditions for determining a fuzzy aggregation relationship:
1. For any (fuzzy) object, the membership degree to which it belongs to the
aggregate is less than or equal to the membership degree to which its
projection to each constituent part belongs to the corresponding constituent
part.
2. The membership degree to which it belongs to the aggregate is greater than
or equal to the given threshold.
The aggregate is then an aggregation of the constituent parts with a membership
degree equal to the minimum of the membership degrees to which the
projections of these objects to these constituent parts belong to the corresponding
constituent parts.
Figure 7. A fuzzy generalization relationship (classes Young Student and Young Faculty generalized into class Youth)
Let A be a fuzzy aggregation of fuzzy class sets B1, B2, ..., and Bn, with instance
membership degrees $\mu_A$, $\mu_{B1}$, $\mu_{B2}$, ..., and $\mu_{Bn}$, respectively. Let $\beta$ be
a given threshold. Then,
$$(\forall e)\,(e \in A \wedge \beta \le \mu_A(e) \le \min(\mu_{B1}(e{\downarrow}_{B1}), \mu_{B2}(e{\downarrow}_{B2}), \ldots, \mu_{Bn}(e{\downarrow}_{Bn})))$$
That is, a fuzzy class A is the aggregate of a group of fuzzy classes B1,
B2, ..., and Bn if, for any (fuzzy) instance object, the membership degree to
which it belongs to class A is less than or equal to the membership degree to
which its projection to each Bi ($1 \le i \le n$) belongs to class Bi. In addition,
for any (fuzzy) instance object, the membership degree to which it belongs to
class A must be greater than or equal to the given threshold. The membership
degree to which A is an aggregation of class sets B1, B2, ..., and Bn should be
$$\min_{\mu_{Bi}(e{\downarrow}_{Bi}) \ge \beta} \mu_{Bi}(e{\downarrow}_{Bi}) \quad (1 \le i \le n).$$
Here, e is an object instance of A.
Now let us consider the first level of fuzziness in the above-mentioned classes
A, B1, B2, ..., and Bn, namely, they are fuzzy classes with membership
degrees. Let
A WITH degree_A DEGREE,
B1 WITH degree_B1 DEGREE,
B2 WITH degree_B2 DEGREE,
...
Bn WITH degree_Bn DEGREE.
Then A is an aggregate of B1, B2, ..., and Bn if
$$(\forall e)\,(e \in A \wedge \beta \le \mu_A(e) \le \min(\mu_{B1}(e{\downarrow}_{B1}), \mu_{B2}(e{\downarrow}_{B2}), \ldots, \mu_{Bn}(e{\downarrow}_{Bn})) \wedge \mathit{degree\_A} \le \min(\mathit{degree\_B1}, \mathit{degree\_B2}, \ldots, \mathit{degree\_Bn})).$$
Here $\beta$ is a given threshold.
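The extensional aggregation test can be sketched as follows. This is an illustration rather than a definitive implementation: the function name aggregation_degree, the tuple representation of instances, and the sample values for an Old Car aggregate are assumptions of this example. Each instance is given as its membership degree to the aggregate together with the membership degrees of its projections to the constituent parts.

```python
from typing import List, Optional, Sequence, Tuple


def aggregation_degree(instances: List[Tuple[float, Sequence[float]]],
                       beta: float,
                       degree_a: float = 1.0,
                       part_degrees: Sequence[float] = ()) -> Optional[float]:
    """Return the degree to which A is an aggregation of B1..Bn, or None.

    instances: one (mu_A(e), [mu_B1(e|B1), ..., mu_Bn(e|Bn)]) pair per object e,
               each with at least one part membership.
    part_degrees: optional first-level degrees degree_B1..degree_Bn; when given,
                  degree_A must not exceed their minimum.
    """
    if part_degrees and degree_a > min(part_degrees):
        return None
    result: Optional[float] = None
    for mu_a, part_memberships in instances:
        weakest_part = min(part_memberships)
        if not (beta <= mu_a <= weakest_part):
            return None
        result = weakest_part if result is None else min(result, weakest_part)
    return result


# Hypothetical Old Car instances: parts are Old Engine, Interior, Chassis.
old_cars = [
    (0.7, [0.8, 1.0, 1.0]),
    (0.6, [0.6, 1.0, 1.0]),
]
print(aggregation_degree(old_cars, beta=0.5))
```

Because every accepted instance satisfies the threshold on its membership to the aggregate, all of its part memberships are at least as large as the threshold, so the restriction under the minimum is met automatically.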
It should be noted that the assessment of fuzzy aggregation relationships given
above is based on the extensional viewpoint of class. Clearly these methods
cannot be used if the classes are defined with the intensional viewpoint, because
there is no object available. In the following, we present how to determine a fuzzy
aggregation relationship using the inclusion degree.
Let A be a fuzzy aggregation of fuzzy class sets B1, B2, ..., and Bn, and $\beta$ be a
given threshold. Also let the projection of A to Bi be denoted by $A{\downarrow}_{Bi}$. Then,
$$\min(\mu(B1, A{\downarrow}_{B1}), \mu(B2, A{\downarrow}_{B2}), \ldots, \mu(Bn, A{\downarrow}_{Bn})) \ge \beta.$$
Here $\mu(Bi, A{\downarrow}_{Bi})$ ($1 \le i \le n$) is the degree to which Bi semantically
includes $A{\downarrow}_{Bi}$. The membership degree to which A is an aggregation of B1,
B2, ..., and Bn is
$$\min(\mu(B1, A{\downarrow}_{B1}), \mu(B2, A{\downarrow}_{B2}), \ldots, \mu(Bn, A{\downarrow}_{Bn})).$$
Furthermore, the expression above can be extended for the situation in which
A, B1, B2, ..., and Bn may have the first level of fuzziness, namely, they may be
fuzzy classes with membership degrees. Let $\beta$ be a given threshold and
A WITH degree_A DEGREE
B1 WITH degree_B1 DEGREE
B2 WITH degree_B2 DEGREE
...
Bn WITH degree_Bn DEGREE
Then A is an aggregate of B1, B2, ..., and Bn if
$$\min(\mu(B1, A{\downarrow}_{B1}), \mu(B2, A{\downarrow}_{B2}), \ldots, \mu(Bn, A{\downarrow}_{Bn})) \ge \beta \wedge \mathit{degree\_A} \le \min(\mathit{degree\_B1}, \mathit{degree\_B2}, \ldots, \mathit{degree\_Bn})$$
A dashed open diamond is used to denote a fuzzy aggregation relationship, as
shown in Figure 8. A car is an aggregate of an engine, an interior, and a
chassis. In Figure 8, the engine is old, so we have a fuzzy class Old Engine
with the second level of fuzziness. Class Old Car, aggregated from classes
Interior and Chassis and fuzzy class Old Engine, is therefore also fuzzy, with
the second level of fuzziness.
Figure 8. A fuzzy aggregation relationship (class Old Car aggregated from classes Old Engine, Interior, and Chassis)
Fuzzy Association
Two levels of fuzziness can be identified in the association relationship. The first
level of fuzziness means that an association relationship fuzzily exists between
two associated classes; that is, the association relationship occurs with a degree
of possibility. It is also possible that the association relationship definitely
exists between the two classes, but it is not known for certain whether two
particular instances, one from each associated class, have the given association
relationship. This is the second level of fuzziness in the association
relationship, and it arises because an instance belongs to a given class with a
membership degree. The two levels of fuzziness may occur in an association
relationship simultaneously: the two classes have a fuzzy association
relationship at the class level, and their instances may have a fuzzy
association relationship at the instance level.
We can place the words WITH mem DEGREE ($0 \le mem \le 1$) after the role
name of an association relationship to represent the first level of fuzziness in the
association relationship. We use a double line with an arrowhead to denote the
second level of fuzziness in the association relationship. Figure 9 shows the two
levels of fuzziness in fuzzy association relationships. In part (a), it is uncertain
whether the CD player is installed in the car, and the possibility is 0.8: classes
CD Player and Car have the association relationship installing with a 0.8
membership degree. In part (b), it is certain at the class level that the CD player
is installed in the car, so classes CD Player and Car have the association
relationship installing with a 1.0 membership degree. At the level of instances,
however, particular instances of classes CD Player and Car may or may not have
the association relationship installing. In part (c), the two kinds of fuzzy
association relationships in parts (a) and (b) arise simultaneously.
Figure 9. Fuzzy association relationships between classes CD Player and Car: (a) installing WITH 0.8 DEGREE; (b) installing, drawn with a double line; (c) installing WITH 0.8 DEGREE, drawn with a double line
It has been shown above that three levels of fuzziness can occur in classes.
Classes with the second level of fuzziness generally result in the second level of
fuzziness in the association, if this association definitely exists (that is, there
is no first level of fuzziness in the association). Let A and B be two classes with
the second level of fuzziness. Then, an instance e of A has membership
degree $\mu_A(e)$, and an instance f of B has membership degree $\mu_B(f)$.
Assume that the association relationship between A and B, denoted ass (A, B),
is one without the first level of fuzziness. The association relationship between
e and f, denoted ass (e, f), is then one with the second level of fuzziness, i.e.,
with a membership degree, which can be calculated as follows:
$$\mu(\mathrm{ass}(e, f)) = \min(\mu_A(e), \mu_B(f))$$
The first level of fuzziness in the association relationship can be indicated
explicitly by the designers, even if the corresponding classes are crisp. Assume
that A and B are two crisp classes and ass (A, B) is an association relationship
with the first level of fuzziness, denoted ass (A, B) WITH degree_ass DEGREE.
In this case, $\mu_A(e) = 1.0$ and $\mu_B(f) = 1.0$. Then,
$$\mu(\mathrm{ass}(e, f)) = \mathit{degree\_ass}$$
Classes with the first level of fuzziness generally result in the first level of
fuzziness of the association, if this is not indicated explicitly. Let A
and B be two classes with only the first level of fuzziness, denoted A WITH
degree_A DEGREE and B WITH degree_B DEGREE, respectively. Then the
association relationship between A and B, denoted ass (A, B), is one with the first
level of fuzziness, namely, ass (A, B) WITH degree_ass DEGREE. Here
degree_ass is calculated as follows:
$$\mathit{degree\_ass} = \min(\mathit{degree\_A}, \mathit{degree\_B})$$
For the instance e of A and the instance f of B, in which $\mu_A(e) = 1.0$ and
$\mu_B(f) = 1.0$, we have:
$$\mu(\mathrm{ass}(e, f)) = \mathit{degree\_ass} = \min(\mathit{degree\_A}, \mathit{degree\_B})$$
Finally, let us focus on the situation in which the classes have both the first
level and the second level of fuzziness, and there is an explicitly indicated
association relationship with the first level of fuzziness between them. Let
A and B be two classes with the first level of fuzziness, denoted A WITH
degree_A DEGREE and B WITH degree_B DEGREE, respectively. Let ass
(A, B) be the association relationship with the first level of fuzziness between A
and B, explicitly indicated by WITH degree_ass DEGREE. Also, let
the instance e of A have membership degree $\mu_A(e)$, and the instance f of B
have membership degree $\mu_B(f)$. Then we have:
$$\mu(\mathrm{ass}(e, f)) = \min(\mu_A(e), \mu_B(f), \mathit{degree\_A}, \mathit{degree\_B}, \mathit{degree\_ass})$$
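The four cases above are all instances of the last formula when every degree that is not fuzzy is set to 1.0. The following Python sketch makes that explicit; the function name association_degree and its keyword parameters are hypothetical names chosen for this illustration.

```python
def association_degree(mu_e: float = 1.0,
                       mu_f: float = 1.0,
                       degree_a: float = 1.0,
                       degree_b: float = 1.0,
                       degree_ass: float = 1.0) -> float:
    """Membership degree of ass(e, f) as the minimum of all contributing degrees.

    mu_e, mu_f:          second-level degrees of instances e and f (1.0 if crisp)
    degree_a, degree_b:  first-level degrees of classes A and B (1.0 if crisp)
    degree_ass:          explicitly indicated degree of the association
                         (1.0 if none is given)
    """
    return min(mu_e, mu_f, degree_a, degree_b, degree_ass)


# Only the instances are fuzzy (second level of fuzziness).
print(association_degree(mu_e=0.7, mu_f=0.9))
# Crisp classes, association explicitly declared WITH 0.8 DEGREE.
print(association_degree(degree_ass=0.8))
# Classes with first-level fuzziness, association degree implied.
print(association_degree(degree_a=0.6, degree_b=0.9))
# All sources of fuzziness present at once.
print(association_degree(0.7, 0.9, 0.95, 0.85, 0.8))
```

Defaulting every argument to 1.0 works because 1.0 is the neutral element of the minimum on [0, 1].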
Fuzzy Dependency
Let us now focus on the fuzzy dependency relationship between the source class
and the target class. The dependency relationship relates only to the classes
and does not require a set of instances for its meaning. Therefore, the
second-level and third-level fuzziness in a class do not affect the dependency
relationship.
A fuzzy dependency relationship is a dependency relationship with a degree of
possibility. Like the fuzzy association relationship above, the fuzzy dependency
relationship can be indicated explicitly by the designers or be implied by the
source class, because the target class is decided by the source class. Assume
that the source class is fuzzy, with the first level of fuzziness. The target
class must then be fuzzy, with the first level of fuzziness, and the degree of
possibility that the target class is decided by the source class is the same as
the membership degree of the source class. For source class Employee
WITH 0.85 DEGREE, for example, the target class Employee Dependent
should be Employee Dependent WITH 0.85 DEGREE, and the dependency
relationship between Employee and Employee Dependent should be fuzzy, with
a 0.85 degree of possibility. Notice that, unlike the fuzzy association
relationship, only one level of fuzziness can be identified in a dependency
relationship; it is implied by the first level of fuzziness of the source class if
it is not given explicitly.
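The implicit propagation of the degree from the source class to the target class and to the dependency itself can be sketched as follows. The helper names with_degree and derive_dependency are assumptions of this example, and the sketch covers only the implicit case, in which the designer gives no explicit degree.

```python
from typing import Tuple


def with_degree(name: str, degree: float) -> str:
    """Render a declaration, omitting WITH 1.0 DEGREE for crisp elements."""
    return name if degree == 1.0 else f"{name} WITH {degree} DEGREE"


def derive_dependency(source: str, source_degree: float,
                      target: str) -> Tuple[str, str, float]:
    """Return (source declaration, target declaration, dependency degree).

    In the implicit case the target class and the dependency relationship
    both take the first-level membership degree of the source class.
    """
    dependency_degree = source_degree
    return (with_degree(source, source_degree),
            with_degree(target, dependency_degree),
            dependency_degree)


print(derive_dependency("Employee", 0.85, "Employee Dependent"))
```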
Figure 10. Fuzzy dependency relationship (Employee WITH 0.5 DEGREE and Dependent WITH 0.5 DEGREE)
Because the fuzziness of a dependency relationship is denoted implicitly by the
first level of fuzziness of the source class, a dashed line with an arrowhead can
still be used to denote the fuzziness in the dependency relationship. Figure 10
shows a fuzzy dependency relationship.
An Illustrative Example
In Figure 11, we give a simple fuzzy UML data model utilizing some of the
notations introduced in this chapter. Class Car is a superclass, and New Car and
Old Car are its two fuzzy subclasses; that is, they may have fuzzy instances.
Similarly, class Employee has three fuzzy subclasses: Young Employee, Middle
Employee, and Old Employee. Classes Employee and Car have a fuzzy association
relationship using, which has fuzziness at the second level. Fuzzy
classes Young Employee and New Car have a fuzzy association relationship
liking, which has fuzziness at the first level. In addition, class Car is
aggregated from three classes: Engine, Chassis, and Interior. Class Engine has
three attributes. The attributes ID and Turbo have crisp values, whereas Size is
a fuzzy attribute that can take a fuzzy value. Classes Chassis and Interior are
crisp classes; they have no fuzziness at any of the three levels.
Figure 11. A fuzzy UML data model (classes Employee, Young Employee, Middle Employee, Old Employee, Dependent, Car, New Car, Old Car, Engine with attributes ID, Turbo, and FUZZY Size, Interior with attributes ID, Dashboard, and Seat, and Chassis with attribute ID; associations using and liking WITH 0.9 DEGREE)
Related Work
By using fuzzy set theory, Zvieli and Chen (1986) introduced three levels of
fuzziness in the ER model, corresponding to three levels of database abstraction:
schema (metadata), instance (data), and value (data element). At the first level,
entity sets, relationships, and attribute sets may be fuzzy; that is, they have an
associated membership degree in the model. The second level is related to the
fuzzy occurrences of entities and relationships. Such fuzziness means that
instances have a membership degree with respect to the entity or the
relationship. The third level concerns the fuzzy attribute values of special
entities and relationships. Consequently, ER algebra was fuzzily extended to
manipulate fuzzy data. Based on a fuzzy ER model (Chaudhry, Moyne, & Rundensteiner,
1999), a methodology for design and development of fuzzy relational databases
was proposed through the rules developed for mapping fuzzy ER schema to fuzzy
relational databases.
Based on fuzzy set theory and the fuzzy ER model (Chen & Kerre, 1998), several
major notions in the EER model were extended, including fuzzy extension to
generalization/specialization, and shared subclass/category as well as fuzzy
selective inheritance, and fuzzy inheritance for derived attributes. The full fuzzy
extension to EER and the graphical representations were presented in Ma,
Zhang, Ma, and Chen (2001). In particular, the formal approach to mapping a
fuzzy EER model to a fuzzy object-oriented database schema was provided in
Ma, Zhang, Ma, and Chen (2001).
In addition to the ER/EER model, the IFO data model (Abiteboul & Hull, 1987)
is a mathematically defined conceptual data model that incorporates the funda-
mental principles of semantic database modeling within a graph-based represen-
tational framework. The extensions of IFO to deal with fuzzy information were
proposed in the literature (Vila, Cubero, Medina, & Pons, 1996; Yazici, Buckles,
& Petry, 1999). In Vila, Cubero, Medina, and Pons (1996), several types of
imprecision and uncertainty, such as the values without semantic representation,
the values with semantic representation and disjunctive meaning, the values with
semantic representation and conjunctive meaning, and the representation of
uncertain information, were incorporated into the attribute domain of the object-
based data model. However, some major concepts in object-based modeling, i.e.,
superclass-subclass, class inheritance, etc., were not discussed. In addition to
the attribute-level uncertainty, the uncertainty was considered to be at the object
and class level in Vila, Cubero, Medina, and Pons (1996). In Yazici, Buckles, and
Petry (1999), two levels of uncertainty, namely, the level of attribute values and
the level of entity instances, were considered, and the ExIFO model was thereby
developed. In addition, the mapping that transforms the ExIFO model into fuzzy
nested relational databases was provided in Yazici, Buckles, and Petry (1999).
It should be pointed out, however, that the fuzzy extensions to EER and IFO in the
literature (Chen & Kerre, 1998; Ma, Zhang, Ma, & Chen, 2001; Vila, Cubero,
Medina, & Pons, 1996; Yazici, Buckles, & Petry, 1999) took into account only
the second level of fuzziness of entity/class when the major notions in these data
models were extended. This chapter differs from this literature in two ways:
first, several notions specific to the UML, such as association and dependency,
are extended; and second, both the first level and the second level of fuzziness
in entity/class are considered when the major notions in the UML are extended.
Conclusions
We present a fuzzy extended UML to cope with fuzzy as well as complex objects
in the real world at a conceptual level. Different levels of fuzziness are
introduced into the class diagram of the UML, and the corresponding graphical
representations are developed. It is not difficult to see that the classical UML is
essentially a subset of the fuzzy UML. When there is not any fuzziness in the
universe of discourse, the fuzzy UML can be reduced to the classical UML.
The focus of this chapter is on fuzzy data modeling in the UML. As we know,
the UML can be used for knowledge modeling, and knowledge may generally be
imprecise and uncertain. In future work, we will concentrate on the study of class
operations, constraints, and rules in the fuzzy UML modeling. In addition,
mapping the fuzzy UML data model into object-oriented databases will be
interesting.
References
Abiteboul, S., & Hull, R. (1987). IFO: A formal semantic database model. ACM
Transactions on Database Systems, 12(4), 525–565.
Ambler, S. W. (2000a). The design of a robust persistence layer for relational
databases. Retrieved from the World Wide Web: http://www.ambysoft.com/
persistenceLayer.pdf
Ambler, S. W. (2000b). Mapping objects to relational databases. Retrieved from
the World Wide Web: http://www.AmbySoft.com/mappingObjects.pdf
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000). Toward soft
Blaha, M., & Premerlani, W. (1999). Using UML to design database applica-
tions. Retrieved from the World Wide Web: http://www.therationaledge.com/
rosearchitect/mag/archives/9904/f8.html
Booch, G., Rumbaugh, J., & Jacobson, I. (1998). The Unified Modeling
Language user guide. Reading, MA: Addison-Wesley.
model for managing vague and uncertain information. International
Journal of Intelligent Systems, 14, 623–651.
Bosc, P., & Prade, H. (1993). An introduction to fuzzy set and possibility theory
based approaches to the treatment of uncertainty and imprecision in
database management systems. In Proceedings of the Second Workshop
on Uncertainty Management in Information Systems: From Needs to
Solutions.
Buckles, B. P., & Petry, F. E. (1982). A fuzzy representation of data for
relational database. Fuzzy Sets and Systems, 7(3), 213–226.
Chaudhry, N. A., Moyne, J. R., & Rundensteiner, E. A. (1999). An extended
database design methodology for uncertain data management. Information
Sciences, 121(1–2), 83–112.
Chen, G. Q., & Kerre, E. E. (1998). Extending ER/EER concepts towards fuzzy
conceptual data modeling. In Proceedings of the 1998 IEEE International
Conference on Fuzzy Systems, 2, 1320–1325.
Chen, P. P. (1976). The entity-relationship model: Toward a unified view of data.
ACM Transactions on Database Systems, 1(1), 9–36.
Conrad, R., Scheffiner, D., & Freytag, J. C. (2000). XML conceptual modeling
using UML. In Proceedings of the 19th International Conference on
Conceptual Modeling (pp. 558–571).
systems. Fuzzy Sets and Systems, 113, 19–36.
Cross, V., Caluwe, R., & Vangyseghem, N. (1997). A perspective from the
Fuzzy Object Data Management Group (FODMG). In Proceedings of the
1997 IEEE International Conference on Fuzzy Systems, 2, 721–728.
Dubois, D., Prade, H., & Rossazza, J. P. (1991). Vagueness, typicality, and
uncertainty in class hierarchies. International Journal of Intelligent
Systems, 6, 167–183.
George, R., Srikanth, R., Petry, F. E., & Buckles, B. P. (1996). Uncertainty
management issues in the object-oriented data model. IEEE Transactions
on Fuzzy Systems, 4(2), 179–192.
Gyseghem, N. V., & Caluwe, R. D. (1998). Imprecision and uncertainty in UFO
database model. Journal of the American Society for Information
Science, 49(3), 236–252.
Lee, J., Xue, N. L., Hsu, K. H., & Yang, S. J. (1999). Modeling imprecise
requirements with fuzzy objects. Information Sciences, 118, 101–119.
Liu, W. Y., & Song, N. (2001). The fuzzy association degree in semantic data
models. Fuzzy Sets and Systems, 117(2), 203–208.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (1999). Assessment of data redundancy
in fuzzy relational databases based on semantic inclusion degree. Information
Processing Letters, 72(1–2), 25–29.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2004). Extending object-oriented
databases for fuzzy information modeling. Information Systems, 29(5),
421–435.
Ma, Z. M., Zhang, W. J., Ma, W. Y., & Chen, G. Q. (2001). Conceptual design
of fuzzy object-oriented databases using extended entity-relationship model.
International Journal of Intelligent Systems, 16, 697–711.
Marín, N., Medina, J. M., Pons, O., Sánchez, D., & Vila, M. A. (2003). Complex
object comparison in a fuzzy context. Information and Software Technology,
45(7), 431–444.
Marín, N., Vila, M. A., & Pons, O. (2000). Fuzzy types: A new concept of type
Systems, 15, 1061–1085.
Mili, F., Shen, W., Martinez, I., Noel, Ph., Ram, M., & Zouras, E. (2001).
Knowledge modeling for design decisions. Artificial Intelligence in
Engineering, 15, 153–164.
Naiburg, E. (2000). Database modeling and design using Rational Rose 2000.
Retrieved from the World Wide Web: http://www.therationaledge.com/
rosearchitect/mag/current/spring00/f5.html
OMG. (2001). Unified Modeling Language (UML), version 1.4. Retrieved from
the World Wide Web: http://www.omg.org/technology/documents/formal/
uml.htm
Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for
the treatment of incomplete or uncertain information and vague queries.
Information Sciences, 34, 115–143.
Raju, K. V. S. V. N., & Majumdar, K. (1988). Fuzzy functional dependencies
and lossless join decomposition of fuzzy relational database systems. ACM
Transactions on Database Systems, 13(2), 129–166.
Vila, M. A., Cubero, J. C., Medina, J. M., & Pons, O. (1996). A conceptual
approach for dealing with imprecision and uncertainty in object-based data
models. International Journal of Intelligent Systems, 11, 791–806.
Yazici, A., Buckles, B. P., & Petry, F. E. (1999). Handling complex and
uncertain information in the ExIFO and NF² data models. IEEE Transactions
on Fuzzy Systems, 7(6), 659–676.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets
and Systems, 1(1), 3–28.
Zvieli, A., & Chen, P. P. (1986). Entity-relationship modeling and fuzzy
databases. In Proceedings of the 1986 IEEE International Conference
on Data Engineering (pp. 320–327).
SECTION III
Chapter VI
A Framework to Build
Fuzzy Object-Oriented
Capabilities Over
an Existing
Database System
Fernando Berzal
University of Granada, Spain
Nicolás Marín
Olga Pons
M. Amparo Vila
Abstract
Fuzzy object-oriented database models allow the representation, storage,
and retrieval of complex imperfect information according to the object-
oriented data paradigm. This chapter describes both a framework and an
architecture that can be used to develop fuzzy object-oriented capabilities
using the conventional features of the object-oriented data paradigm. We
present a framework composed of a set of classical classes, which gives
support to fuzzily described complex objects. We also explain how to deal
with fuzzy extensions of object-oriented features using conventional
object-oriented features as a basis. This proposal can be used to build
a fuzzy object-oriented database system on top of an existing
database system while minimizing the development effort.
Introduction
In the last decade, an important group of database researchers focused its
studies on the adaptation of existing data models to imperfect information
management, most using the Fuzzy Subset Theory, which has proven to be a good
tool for handling this kind of information. At the same time, the object-oriented
data paradigm increased in popularity among programmers and designers, mainly
due to its powerful modeling capabilities.
Most of the commercial database management systems that allow the manipu-
lation of objects belong to the following two categories:
1. Object-oriented database management systems (OODBMSs) (Berler et
al., 2000)
2. Object-relational database management systems (ORDBMSs) (Stonebraker
et al., 1999)
On the one hand, object-oriented databases are designed to easily work with
object-oriented programming languages such as Java, C#, and C++. OODBMSs
use the same model as object-oriented programming languages. In spite of the
difficulties and complexity that this approach entails, some commercial products
can be found (such as O2, ObjectStore, Objectivity, and Versant), although they
represent only a small part of the market.
On the other hand, ORDBMSs span object and relational technology. Many of
the traditional relational products now incorporate the object-relational
framework (such as Oracle and Postgres).
Nowadays, most of the development efforts in the software world use the object-
oriented data paradigm to represent and manipulate their data. When these
applications are related to soft computing, then fuzzy modeling and representa-
tion capabilities are required.
In the world of databases, this fact has motivated the study and development of
fuzzy object-oriented database modeling tools. They arise from the combination
of object-oriented and fuzzy concepts in order to permit the representation of
complex imperfect information (Kuo et al., 2001; Caluwe, 1997).
Background
The study of fuzziness in object-oriented models began in close
relation with advanced semantic data models (Ruspini, 1986; Zvieli et al., 1986;
Vandenberghe et al., 1991). After these initial steps, many relevant works can
be found in the literature:
1. J.-P. Rossazza et al. (1998) introduced a hierarchical model of fuzzy
classes, explaining important notions (e.g., the typicality concept) by means
of fuzzy sets.
2. George et al. (1993) began to use similarity relationships in order to model
attribute value imperfection. The work of George et al. was completed by
Koyuncu et al. (2003), giving as a result, IFOOD, an intelligent fuzzy object-
oriented model.
3. G. Bordogna et al. (1994) introduced an extended graphical notation to
represent fuzzy object-oriented information.
4. N. Van Gyseghem et al. (1998) developed the UFO model, one of the most
complete proposals that can be found in the literature.
Other relevant works in this area can be consulted (Na et al., 1996, 1996b;
Baldwin et al., 2000, 2000b; Cao, 2001). Some define complex algebraic models,
while others are focused on the logic world. Even the entity/relationship model
is being studied as a design tool for object-oriented databases (Ma et al., 2001).
Motivation and Organization
Different trends exist in this research area. Proposals vary from new fuzzy
object-oriented data models to adaptations of the classical object-oriented model
to allow the storage of fuzzy information. From our point of view, the main
drawback of the first kind of proposal is that when a new fuzzy object-oriented
data model is proposed, a new system needs to be developed in order to allow
users to interact with the new data model. Developing specific systems from
scratch is a hard task that may not be profitable from a commercial point of view.
(In fact, OODBMSs, which have a wider set of intended users, have had many
problems from the commercial point of view.)
However, the approaches belonging to the second trend need only the implemen-
tation of a translation layer over an existing database system. Performance
would probably be lower, but the development effort would be much lower too.
Recently, we developed a proposal to represent fuzzy information in an
object-oriented data model. Our research is mainly motivated by the need for an
easy-to-use, transparent mechanism to develop applications dealing with fuzzy
information. Following our proposal, programmers and designers should be able to
directly use new structures developed to store fuzzy information without the
need for any special treatment, without altering the underlying programming
platform, and with as much transparency as possible. Our proposal allows the
programmer to handle data imperfection in many of the situations
where it can appear in an object-oriented software development
effort.
This proposal resulted in the implementation of a framework that can be used in
two ways:
1. Programmers and designers can directly use our proposal over an existing
conventional database system.
2. The proposal could be the basis for the development of a fuzzy object-
oriented database system built over an existing conventional database
system.
As we will see in this chapter, the underlying system must include some
object-oriented capabilities among its characteristics. In fact, though existing
OODBMSs are a good choice, some advanced ORDBMSs could also be used, such as the
latest versions of Oracle RDBMS.
This chapter is devoted to the explanation of our proposal and is organized as
follows. The section "Fuzziness and Object Orientation" describes the main
features of our proposal for dealing with fuzziness in an object-oriented context.
The section "A Supporting Framework" explains how to deal with fuzziness
using classical object-oriented concepts as the basis of the discussion. The section
"A FOODBS Architecture" presents an architecture that can be used to develop a
system able to store fuzzy information in a classical object-oriented system using
the framework described in previous parts of the chapter. Some concluding
remarks and future work trends end the chapter.
Fuzziness and Object Orientation
Fuzziness and object orientation can be combined from different points of view.
We can consider the case of fuzzily described objects, that is, those objects with
attribute values that are fuzzy values. On the other hand, we can consider the
fuzzification of different concepts of the object-oriented data model, such as the
concept of type or the concept of inheritance. The following two subsections
describe these two matters.
Objects Fuzzily Described
Considering an object to be fuzzily described implies that its state
is fuzzy, that is, that its attributes have fuzzy values. The capability of handling
fuzzy attribute values must be built into the system, taking into account different
semantic interpretations of the domains where these values are defined.
Models exist (George et al., 1993; Yazici et al., 1998) that make a clear
distinction between conjunctive and disjunctive semantics of fuzzy attribute
values. Models also exist (Bordogna et al., 1994) that allow labels to be used in
attribute values, as these labels are possibility distributions defined over a
reference set of possible values. We take as a basis these theoretical consider-
ations and present a complete way to treat imperfect information at this level: the
expressiveness of linguistic labels is used to set imprecise values, but we
consider the different semantics that these labels may have according to the
characteristics of the domain in which they are defined.
Imprecise Attribute Values
The reasons why an attribute value can be ill-defined may differ, from actual
imperfect knowledge of the datum to an inherent imprecision affecting the domain of
the attribute. Suppose that we want to describe rooms in our database. Let us
consider the following sentences that describe a given room:
1. The room is big.
2. The room is of high quality.
Both sentences are expressed with some lack of precision. We use linguistic
labels to express the imprecise values in each of these sentences, but each label
matches a different semantic pattern. An underlying basic domain exists below
the label used in the first sentence (positive real numbers). In contrast, it is not
easy to find such an underlying domain for the label high of the last sentence.
Labels without Semantic Representation
This case brings together those sentences similar to sentence (2), "The room is
of high quality." In those situations, the domain of the attribute is a set of
linguistic labels (e.g., high, regular, low). We cannot define the semantics of the
labels by means of fuzzy sets built over an underlying domain. The imprecision
embedded in each of the labels forces us to use resemblance relationships to
compare values of the domain, instead of the classical equality. (For example,
Table 1 contains the definition of a resemblance relation for quality labels.)
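A resemblance relation over a domain of labels can be stored as a small lookup table. The following Java sketch is only an illustration: it encodes the values of Table 1 and assumes the relation is symmetric, which is why the lower triangle mirrors the upper one. The class and method names (QualityResemblance, resemblance) are hypothetical and are not part of the framework described in this chapter.

```java
final class QualityResemblance {
    enum Quality { HIGH, REGULAR, LOW }

    // Values from Table 1, mirrored below the diagonal because a
    // resemblance relation is assumed to be symmetric.
    private static final double[][] TABLE = {
        { 1.0, 0.8, 0.0 },   // HIGH    vs HIGH, REGULAR, LOW
        { 0.8, 1.0, 0.8 },   // REGULAR vs HIGH, REGULAR, LOW
        { 0.0, 0.8, 1.0 }    // LOW     vs HIGH, REGULAR, LOW
    };

    /** Degree in [0, 1] to which two quality labels resemble each other. */
    static double resemblance(Quality a, Quality b) {
        return TABLE[a.ordinal()][b.ordinal()];
    }

    public static void main(String[] args) {
        System.out.println(resemblance(Quality.HIGH, Quality.REGULAR)); // prints 0.8
        System.out.println(resemblance(Quality.HIGH, Quality.LOW));     // prints 0.0
    }
}
```

A lookup table replaces classical equality here because the labels have no underlying numeric domain from which a degree could be computed.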
Labels with Semantic Representation
We saw the way to represent the quality of a room. Let us now see how to
represent the extension of the room. In this case, the attribute has a well-defined
basic domain (usually a bounded subset of the real interval), and the labels that
stand for ill-defined values can easily be described by means of fuzzy subsets
defined over the aforesaid basic domain. In fact, the actual value will be one of
the values of the support set. That is, the semantics of the label is a possibility
distribution of values of the underlying basic domain.
Consider, for example, the attribute extension of the class Room. The basic
underlying domain of this attribute is the interval [0, ∞), and we add to the domain
the set of labels {small, middle-size, big}, whose definitions are represented in
Figure 1.
In this case, we can also use the concept of resemblance relationship to compare
labels. Nevertheless, this relationship must be built as an extension of the
classical equality that holds in every set, because we have an underlying basic
domain with values that must be taken into account. If D stands for the domain,
B stands for the basic underlying domain, and L stands for the set of labels,
Equation 1 shows a possible resemblance relationship:
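\[
\mu_S(x, y) =
\begin{cases}
1 & \text{if } (x = y) \wedge (x, y \in B) \\
0 & \text{if } (x \neq y) \wedge (x, y \in B) \\
\mu_l(z) & \text{if } ((x = l \in L) \wedge (y = z \in B)) \vee ((y = l \in L) \wedge (x = z \in B)) \\
\sup_{z \in B} (\mu_x(z) \wedge \mu_y(z)) & \text{otherwise}
\end{cases}
\tag{1}
\]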
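The four cases of Equation 1 can be sketched in Java as three overloaded methods: two basic values, a label and a basic value, and two labels. This is a minimal sketch under stated assumptions. The trapezoidal breakpoints of small, middle-size, and big are hypothetical (Figure 1 is not reproduced here), minimum is assumed as the conjunction, and the supremum is approximated on a grid from 0 to 200 instead of being computed exactly. All names are illustrative.

```java
import java.util.function.DoubleUnaryOperator;

final class ExtensionDomain {
    /** Trapezoidal label: 0 outside [a, d], 1 on [b, c], linear in between. */
    static DoubleUnaryOperator trapezoid(double a, double b, double c, double d) {
        return z -> {
            if (z < a || z > d) return 0.0;
            if (z >= b && z <= c) return 1.0;
            return z < b ? (z - a) / (b - a) : (d - z) / (d - c);
        };
    }

    // Hypothetical breakpoints in square meters, chosen only for illustration.
    static final DoubleUnaryOperator SMALL = trapezoid(0, 0, 15, 25);
    static final DoubleUnaryOperator MIDDLE_SIZE = trapezoid(15, 25, 30, 40);
    static final DoubleUnaryOperator BIG =
        trapezoid(30, 40, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY);

    /** Two basic values: classical equality (first two cases of Equation 1). */
    static double resemblance(double x, double y) {
        return x == y ? 1.0 : 0.0;
    }

    /** A label and a basic value: membership of the value in the label. */
    static double resemblance(DoubleUnaryOperator label, double z) {
        return label.applyAsDouble(z);
    }

    /** Two labels: sup over z of min(mu_x(z), mu_y(z)), approximated on a grid. */
    static double resemblance(DoubleUnaryOperator x, DoubleUnaryOperator y) {
        double sup = 0.0;
        for (int i = 0; i <= 400; i++) {           // z = 0, 0.5, ..., 200
            double z = i * 0.5;
            sup = Math.max(sup, Math.min(x.applyAsDouble(z), y.applyAsDouble(z)));
        }
        return sup;
    }

    public static void main(String[] args) {
        System.out.println(resemblance(30.0, 30.0));          // two basic values
        System.out.println(resemblance(SMALL, 20.0));         // label vs. basic value
        System.out.println(resemblance(SMALL, MIDDLE_SIZE));  // label vs. label
        System.out.println(resemblance(SMALL, BIG));          // disjoint labels
    }
}
```

With these breakpoints, small and middle-size overlap only between 15 and 25, so their resemblance is the height of the crossing point of the two slopes, while small and big do not overlap at all.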
Table 1. Quality attribute values
            High    Regular    Low
High        1       0.8        0
Regular             1          0.8
Low                            1
Fuzzy Collections
We now know how to deal with disjunctive fuzzy sets of values. However, we
may have to use fuzzy collections of values in order to express some information
about the object we want to represent in the system. These collections of values
have a conjunctive interpretation and, thus, need special treatment. For example,
consider that we want to represent the set comprised of students who attend their
lessons in a given room. We can relate each student with a room, taking into
account the amount of daytime he or she spends in this room attending his or her
lessons. According to this, the set of students of a room may be expressed as
follows:
\[ \mu(st_1)/st_1 + \mu(st_2)/st_2 + \dots + \mu(st_n)/st_n \]
where $st_i$ is a student, and $\mu(st_i)$ is the degree with which the student
belongs to the room.
If we want to represent this kind of fuzzy value in our system, we also need
suitable operators to compare the fuzzy values, taking into account that, now, the
semantics of the fuzzy set are conjunctive.
Figure 1. Labels for attribute extension
Conjunctive fuzzy set comparison is often done by means of the concept of
inclusion:
\[ A = B \text{ if and only if } (A \subseteq B) \wedge (B \subseteq A) \]
To compute the inclusion degree of a fuzzy set A in a fuzzy set B, we can use the
following operator (Rossazza et al., 1998):
\[ N(B|A) = \min_{u \in U} \{ I(\mu_A(u), \mu_B(u)) \} \]
where I stands for a fuzzy implication operator, and U is the reference set where
A and B are defined. The implication operator can be chosen in accordance with
the properties we want the inclusion degree to fulfill. For example, you can use
the following:
\[
I(x, y) =
\begin{cases}
1 & \text{if } x \leq y \\
y / x & \text{otherwise}
\end{cases}
\]
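The inclusion degree N(B|A) with the implication operator above can be computed directly when the fuzzy sets are finite. The Java sketch below assumes fuzzy sets are stored as maps from elements to membership degrees. The equality method, which takes the minimum of both inclusion degrees, is an assumption made for illustration, since the text only states the crisp equivalence. All names are hypothetical.

```java
import java.util.Map;

final class FuzzyInclusion {
    /** The implication operator given above: 1 if x <= y, y / x otherwise. */
    static double implication(double x, double y) {
        return x <= y ? 1.0 : y / x;
    }

    /** N(B|A): degree to which fuzzy set a is included in fuzzy set b. */
    static <T> double inclusion(Map<T, Double> a, Map<T, Double> b) {
        double degree = 1.0;
        // Elements outside the support of a contribute I(0, .) = 1,
        // so iterating over the entries of a is enough.
        for (Map.Entry<T, Double> e : a.entrySet()) {
            double muB = b.getOrDefault(e.getKey(), 0.0);
            degree = Math.min(degree, implication(e.getValue(), muB));
        }
        return degree;
    }

    /** Assumed equality degree: the minimum of both inclusion degrees. */
    static <T> double equality(Map<T, Double> a, Map<T, Double> b) {
        return Math.min(inclusion(a, b), inclusion(b, a));
    }

    public static void main(String[] args) {
        Map<String, Double> a = Map.of("stu1", 0.5, "stu2", 1.0);
        Map<String, Double> b = Map.of("stu1", 0.8, "stu2", 0.5);
        System.out.println(inclusion(a, b)); // prints 0.5 (stu2 is only half covered by b)
        System.out.println(equality(a, b));
    }
}
```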
It frequently happens that the elements of U are fuzzily described objects. In
these situations, for a given element in the set A, it is not clear which element of
B has to be taken in order to compare the membership degrees. To perform
comparisons among this kind of fuzzy collections, we proposed (Marín et al.,
2003) the following set of operators:
1. An inclusion operator ($\Theta$), which takes into account resemblance between
the elements being compared ($\otimes$ stands for a t-norm):
\[ \Theta_S(B|A) = \min_{x \in U} \max_{y \in U} \theta_{A,B,S}(x, y) \]
where
\[ \theta_{A,B,S}(x, y) = \otimes(I(\mu_A(x), \mu_B(y)), \mu_S(x, y)) \]
2. A generalized resemblance operator ($\aleph$), which considers both inclusion
directions and which can be weighted with a cardinality ratio ($\Phi$):
\[ \aleph_{S,\otimes}(A, B) = \otimes(\Theta_S(B|A), \Theta_S(A|B)) \]
\[
\Phi(A, B) =
\begin{cases}
1 & \text{if } A = \emptyset \wedge B = \emptyset \\
\dfrac{\min(|A|, |B|)}{\max(|A|, |B|)} & \text{otherwise}
\end{cases}
\]
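The two operators can be sketched in Java for finite fuzzy sets whose elements are compared through a resemblance function. This is an illustrative sketch, not the authors' implementation. It assumes minimum as the t-norm, the implication operator I(x, y) = 1 if x ≤ y and y/x otherwise, and the size of the support as the cardinality used in the ratio. The ageResemblance function is a hypothetical resemblance relation. The sketch does not show how the ratio weights the resemblance, because the text does not specify it.

```java
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

final class ResemblanceDrivenComparison {
    static double implication(double x, double y) {
        return x <= y ? 1.0 : y / x;
    }

    /** Resemblance-driven inclusion of a in b: min over x in a of max over y in b. */
    static <T> double inclusion(Map<T, Double> a, Map<T, Double> b,
                                ToDoubleBiFunction<T, T> s) {
        double result = 1.0;
        for (Map.Entry<T, Double> x : a.entrySet()) {
            double best = 0.0;
            for (Map.Entry<T, Double> y : b.entrySet()) {
                double theta = Math.min(implication(x.getValue(), y.getValue()),
                                        s.applyAsDouble(x.getKey(), y.getKey()));
                best = Math.max(best, theta);
            }
            result = Math.min(result, best);
        }
        return result;
    }

    /** Generalized resemblance: both inclusion directions combined with min. */
    static <T> double generalizedResemblance(Map<T, Double> a, Map<T, Double> b,
                                             ToDoubleBiFunction<T, T> s) {
        return Math.min(inclusion(a, b, s), inclusion(b, a, s));
    }

    /** Cardinality ratio, taking the number of stored elements as cardinality. */
    static double cardinalityRatio(Map<?, Double> a, Map<?, Double> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        return (double) Math.min(a.size(), b.size()) / Math.max(a.size(), b.size());
    }

    /** Hypothetical resemblance between ages: falls linearly to 0 at a gap of 10. */
    static double ageResemblance(Integer x, Integer y) {
        return Math.max(0.0, 1.0 - Math.abs(x - y) / 10.0);
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = Map.of(20, 1.0, 30, 0.5);
        Map<Integer, Double> b = Map.of(21, 1.0);
        System.out.println(inclusion(b, a, ResemblanceDrivenComparison::ageResemblance));
        System.out.println(generalizedResemblance(a, b, ResemblanceDrivenComparison::ageResemblance));
        System.out.println(cardinalityRatio(a, b)); // prints 0.5
    }
}
```

In this example, every element of b has a close counterpart in a, but the element 30 of a has no close counterpart in b, so the two inclusion directions differ and the generalized resemblance takes the lower one.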

Comparison of Fuzzily Described Objects
We have seen how to deal with imprecision when it appears in attribute values.
Linguistic labels and resemblance measures from fuzzy subsets theory are the
tools that allow us to handle fuzzily described objects. At this point, we can
compare a pair of attribute values if they are defined in standard basic domains
or in the set of imprecise domains we described in the previous paragraphs. But
how can we compare two complex objects of a given class when they are fuzzily
described?
The example illustrated in Figure 2 will help us to introduce the problem of object
comparison. The figure depicts the information of two objects of a given class
Room. Every room is described by its quality and extension (as we considered
in previous examples), as well as the floor each room is on. The set of students
who attend their lessons in each room is fuzzy, and students are also fuzzily
described by name, age, and height. Notice that the description of the objects
belonging to class Room is imprecise due to the following reasons:
1. The quality is expressed by an imprecise label.
2. The attributes extension and floor can be expressed using a numerical
value or an imprecise label.
3. The set of students is fuzzy, taking into account the percentage of time each
student spends receiving the lessons in each room to compute the membership
degrees.
Room 1: (0.5) Quality: high; (0.8) Extension: 30 m²; (1) Floor: 4; (1) Students: 1/stu_1 + 1/stu_2 + 0.8/stu_3 + 0.5/stu_4
Room 2: Quality: regular; Extension: big; Floor: high; Students: 1/stu_1 + 1/stu_5 + 0.75/stu_3 + 0.6/stu_6
Student 1: (1) Name: John; (0.75) Age: young; (0.75) Height: 1.85 m
Student 2: Name: Peter; Age: young; Height: 1.7 m
Student 3: Name: Mary; Age: middle-aged; Height: short
Student 4: Name: Tom; Age: 24; Height: tall
Student 5: Name: Peter; Age: 25; Height: medium
Student 6: Name: Tom; Age: young; Height: 1.9 m
Figure 2. Problem of object comparison (is Room 1 = Room 2?)
To compare both rooms, we need to compare every pair of attribute values.
To do that, we know how to handle resemblance in basic domains (quality,
extension, and floor) and how to compare fuzzy collections of fuzzily described
objects (set of students). Nevertheless, we need to solve two extra problems:
1. We need to use recursion in order to deal with complex objects (objects with
attribute values that are also objects). That is, to compute resemblance
between rooms, we have to compare students. During this process, we
have to deal with the possible presence of cycles in the data graph (i.e., it
is possible that we have to compute the resemblance of objects $o_1$ and $o_2$
in order to compute the resemblance between objects $o_1$ and $o_2$).
2. We need to aggregate the resemblance information that we collected by
studying particular attributes. Then we must compute a general resemblance
opinion for the objects as a whole.
Taking into account the ideas presented in Marín et al. (2003), we can define the
calculus of the resemblance between two objects $o_1$ and $o_2$ of a given class C,
with a type that is made up of the set of attributes $Str_C = \{a_1, a_2, \dots, a_n\}$,
by means of a function FE:
\[ FE: F_C \times O(F_C) \times O(F_C) \times P(P_2(O(F_C))) \times P(P_2(O(F_C))) \rightarrow [0, 1] \]
where $F_C$ is the family of all the classes, and $O(F_C)$ is the set of all the class
instances. P stands for the power set, and $P_2$ represents those members of the
power set whose cardinality is 2 (i.e., pairs of objects in our context). The
calculus of $FE(C, o_1, o_2, \mathit{visited}, \mathit{approx})$ involves the recursive
computation described below.
There are two basic cases:
1. When the identity equality holds between the objects:
If $o_1 = o_2$, then FE = 1.
2. When a known defined resemblance relation exists in the class:
If there exists a resemblance relation S defined in C, then $FE = \mu_S(o_1, o_2)$.
As a particular example, when we compare two fuzzy sets of objects, we can use
a generalized resemblance degree (Marín et al., 2003) that recursively
compares the elements in the sets. That is, if $o_1$ and $o_2$ are fuzzy sets, then:
\[ FE = \aleph_{FE,\otimes}(o_1, o_2) = \otimes(\Theta_{FE}(o_2|o_1), \Theta_{FE}(o_1|o_2)) \]
where
\[ \Theta_{FE}(o|o') = \min_{x \in Spp(o')} \max_{y \in Spp(o)} \{ I(\mu_{o'}(x), \mu_o(y)) \otimes FE(C_D, x, y, \mathit{visited}, \mathit{approx}) \} \]
and $C_D$ stands for the class that is the reference universe of the sets, I is
an implication operator, Spp(o) is the support set of o, and $\otimes$ is a t-norm.
A third case provides a general recursive model that applies an aggregation
operator over recursive calls that compute the resemblance between pairs of
attribute values. When aggregating, not all the attributes have the same
importance ($w_{a_i}$ weights the importance of the attribute $a_i$). The aggregation
is founded on the semantics of a quantifier Q (through $o_Q$, the orness of Q).
If $\{o_1, o_2\} \notin \mathit{visited}$, then $FE = V_Q(W, R)$, where R contains the resemblance
values $r_{a_i} = FE(C_{a_i}, o_1.a_i, o_2.a_i, \mathit{visited} \cup \{\{o_1, o_2\}\}, \mathit{approx})$ if defined
($C_{a_i}$ is the domain class of the attribute $a_i$), W contains the weights for
attributes $a_i$, and $V_Q$ is Vila's aggregation operator (Vila et al., 1995), which
is defined as:
\[ V_Q(W, R) = o_Q \max_{i: r_{a_i} \in R} \{ w_{a_i} \wedge r_{a_i} \} + (1 - o_Q) \min_{i: r_{a_i} \in R} \{ r_{a_i} \vee (1 - w_{a_i}) \} \]
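The aggregation step can be illustrated with a short Java sketch that evaluates the operator for arrays of weights and resemblance values, with minimum and maximum playing the roles of conjunction and disjunction. The weights below are the attribute importances of the Room example, while the resemblance values and the class and method names are hypothetical.

```java
final class VilaAggregation {
    /**
     * Weighted aggregation of resemblance values:
     * orness * max_i min(w_i, r_i) + (1 - orness) * min_i max(r_i, 1 - w_i).
     */
    static double aggregate(double orness, double[] weights, double[] resemblances) {
        double best = 0.0;   // optimistic part: some important attribute matches
        double worst = 1.0;  // pessimistic part: every important attribute matches
        for (int i = 0; i < resemblances.length; i++) {
            best = Math.max(best, Math.min(weights[i], resemblances[i]));
            worst = Math.min(worst, Math.max(resemblances[i], 1.0 - weights[i]));
        }
        return orness * best + (1.0 - orness) * worst;
    }

    public static void main(String[] args) {
        double[] w = { 0.5, 0.8, 1.0, 1.0 };   // quality, extension, floor, students
        double[] r = { 0.8, 0.6, 1.0, 0.4 };   // hypothetical per-attribute resemblances
        System.out.println(aggregate(0.0, w, r)); // prints 0.4 (behaves like "all")
        System.out.println(aggregate(1.0, w, r)); // prints 1.0 (behaves like "at least one")
        System.out.println(aggregate(0.5, w, r)); // a compromise between both
    }
}
```

An orness of 0 makes the quantifier behave like "all important attributes match", an orness of 1 like "some important attribute matches", and intermediate values interpolate between the two.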
The fourth and fifth cases use the variables $\mathit{visited}$ and $\mathit{approx}$ to deal with the
existence of cycles:
1. The first time that the pair $\{o_1, o_2\}$ produces a cycle, which is detected
because $\{o_1, o_2\}$ is already in $\mathit{visited}$, the pair is inserted into $\mathit{approx}$
in order to compute an approximation that focuses only on nonproblematic
attributes (those that do not lead to cycles):
\[ FE = FE(C, o_1, o_2, \emptyset, \mathit{approx} \cup \{\{o_1, o_2\}\}) \]
2. If the pair of objects is in $\mathit{approx}$ (i.e., its resemblance is currently being
approximated), then we do not calculate a resemblance value, and the
function FE is undefined:
Otherwise, when $\{o_1, o_2\} \in \mathit{visited} \wedge \{o_1, o_2\} \in \mathit{approx}$, FE is undefined.
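The five cases can be put together in a compact Java sketch. It is a simplification made under several assumptions: objects are generic nodes with named attributes, classical equality stands in for the known resemblance relations of the second case, all attribute weights are 1, the orness is fixed at 0.5, and null represents an undefined value. The names (RecursiveResemblance, Node, fe) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class RecursiveResemblance {
    /** Hypothetical complex object: attribute values are crisp values or other nodes. */
    static final class Node {
        final Map<String, Object> attrs = new LinkedHashMap<>();
        Node set(String name, Object value) { attrs.put(name, value); return this; }
    }

    static final double ORNESS = 0.5; // assumed orness of the quantifier Q

    /** Entry point: both bookkeeping sets start empty. */
    static Double resemblance(Node a, Node b) {
        return fe(a, b, new HashSet<>(), new HashSet<>());
    }

    /** Returns a degree in [0, 1], or null when FE is undefined. */
    static Double fe(Object o1, Object o2, Set<Set<Node>> visited, Set<Set<Node>> approx) {
        if (o1 == o2) return 1.0;                                  // case 1: identity
        if (!(o1 instanceof Node) || !(o2 instanceof Node)) {      // case 2: known relation
            return Objects.equals(o1, o2) ? 1.0 : 0.0;             // (crisp equality here)
        }
        Node a = (Node) o1, b = (Node) o2;
        Set<Node> pair = new HashSet<>(Arrays.asList(a, b));
        if (visited.contains(pair) && approx.contains(pair)) return null;  // case 5
        if (visited.contains(pair)) {                              // case 4: first cycle
            Set<Set<Node>> approx2 = new HashSet<>(approx);
            approx2.add(pair);
            return fe(a, b, new HashSet<>(), approx2);
        }
        Set<Set<Node>> visited2 = new HashSet<>(visited);          // case 3: recursion
        visited2.add(pair);
        List<Double> r = new ArrayList<>();
        for (Map.Entry<String, Object> e : a.attrs.entrySet()) {
            Double v = fe(e.getValue(), b.attrs.get(e.getKey()), visited2, approx);
            if (v != null) r.add(v);                               // skip undefined values
        }
        if (r.isEmpty()) return null;
        double max = 0.0, min = 1.0;
        for (double v : r) { max = Math.max(max, v); min = Math.min(min, v); }
        return ORNESS * max + (1.0 - ORNESS) * min;  // aggregation with all weights equal to 1
    }

    public static void main(String[] args) {
        Node room1 = new Node().set("floor", 4);
        Node tutor1 = new Node().set("name", "Ann").set("room", room1);
        room1.set("tutor", tutor1);                  // cycle: room1 -> tutor1 -> room1
        Node room2 = new Node().set("floor", 5);
        Node tutor2 = new Node().set("name", "Ann").set("room", room2);
        room2.set("tutor", tutor2);
        System.out.println(resemblance(room1, room2)); // prints 0.375
    }
}
```

When the recursion reaches the pair of rooms a second time, it restarts the comparison with that pair marked as being approximated, so the cyclic attribute is skipped and the approximation relies on the remaining attributes.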
The above function is a resemblance relation, because the properties of the
operators used in each of the basic cases are those of a resemblance relation (see
Marín et al., 2003, for a more in-depth study).
Fuzzy Object-Oriented Concepts
In the previous section, we studied how to deal with objects that have fuzzy
attribute values. However, this is not the only level where fuzziness may appear
in an object-oriented context. Many proposals can be found in the literature (Kuo
et al., 2001; Caluwe, 1997) where object-oriented concepts are softened so that
fuzzy object-oriented models could appear.
The addition of fuzziness can be considered at different levels of the object-
oriented model (Caluwe, 1997):
1. Attribute values
2. Relationships among objects
3. Class extents
4. Inheritance relationships
5. Definition of the type of a class
Let us describe how fuzziness can be added in these levels in order to improve
the modeling capability of the object-oriented model.
Explicit Uncertainty in Attribute Values
In the previous section, we studied the representation of imprecise attribute
values of different kinds. Nevertheless, there exists a close relationship between
imprecision and uncertainty. For example, when we say that the age of a student
is young, we use an imprecise value to express the age. However, there is an
implicit uncertainty about the age of the student: we do not know exactly the
value of the age. This implicit uncertainty is well represented by the possibility
distribution used to express the semantics of the label young. But, there may be
situations where, as well as an implicit uncertainty, we have to deal with an
explicit uncertainty. For example, consider the following sentences:
1. It is sure that the student is young.
2. It is very possible that the student is young.
3. It is probable that the student is young.
We can use different scales to express the explicit uncertainty that affects an
attribute value: we can use probability (or possibility) measures defined
within the [0,1] interval, linguistic labels of probability (or possibility) whose
semantic representation is a disjunctive fuzzy set, certainty measures,
evidences, etc. Although we can express both imprecision and explicit uncertainty,
to deal with them we have to take into account that they are convertible (Gonzalez
et al., 1999).
Semantically Enhanced Relationships among Objects
Semantic data models usually offer two ways for connecting objects (that can
be directly translated to the object-oriented data model):
1. Attribute values, which involve a functional approach: Using this
alternative, we relate a class with the classes that are its attribute domains.
2. The aggregation construct, which is used to model those relationships that
explicitly need the definition of a class in order to represent them.
It should be noted that assuming any kind of imperfection in the functional
connection between an object and one of its attribute values is usually equivalent
to considering that the attribute domain is affected by imperfection. We studied
in previous paragraphs how to deal with imprecise and uncertain matters in
connection with this first way of relating objects.
In case the programmer decides to represent aggregations as classes, we should
take into account the following considerations:
1. We may want to represent the fact of having partial knowledge about
whether a given group of objects (usually a pair) is related. That is, we want to
associate some truth value to the relationship.
2. We may want to consider that the connection among the objects admits
degrees of importance. This situation arises when not all the relationship
instances have the same strength. In these situations, as suggested by
Bordogna et al. (1994), we can use numerical or linguistic values to express
this strength.
Notice that the strength has a semantic interpretation different from the one
given to uncertainty. We know that the objects are related, but we consider
different strengths in their connection. Moreover, both semantic nuances can be
used at the same time, if required.
Fuzzy Class Extents
When designing the schema of a given application, we may want to use fuzzy
extents in the classes; that is, our application semantic needs may require the
gradual expression of the object membership to the class it belongs to (for
example, we can use this membership degree to express to what extent the object
is compliant with a prototypical object in the class). The membership degree is
normally valued within the [0,1] interval, changing the set of objects that make up
the class (i.e., the class extent) into a fuzzy set of objects.
There are many proposals in the literature that suggest different ways in which
the membership degree of a given object to a certain class can be computed, in
case this membership degree must be inferred from its attribute values in relation
to the archetypical or expected ones for the class. Most of these approaches are
founded on concepts such as inclusion or typicality (Rossazza et al., 1998;
Bordogna et al., 1994).
The presence of imperfection around the objects is not only translated into the
consideration of gradual membership of the objects, but it also generates
important problems in the classification process. Before inserting an object in our
database, we have to answer two relevant questions:
1. What class best represents the object?
2. Does this object already exist in the database?
The answers to these questions are not trivial and could lead us to situations in
which we are not sure about an object of a given class. In such situations, the
gradual membership is substituted by an uncertain membership.
Softening the Inheritance Relationships Level
Specialization processes create subclasses from an existing class according to
one of the following ways:
1. By constraining the description of a property, i.e., the attribute domain of
an existing class (e.g., RedCar is a subclass of Car).
2. By specifying an additional set of properties (e.g., Employee is a subclass
of Person).
Both kinds of specializations could lead to imprecise structures by considering
flexible ways for characterizing the corresponding subclasses. Therefore, it may
be interesting to add a degree in the inheritance connection between two classes.
Uncertainty and Precision Levels in the Schema
The presence of uncertainty in the definitions of the structures that characterize
the schema of a given problem must be avoided. Indeed, one of the most
important aims of a designer is to eliminate uncertainty in the schema, trying to
find the hierarchy of classes that best represents the problem being modeled.
In contrast, knowledge of the problem could lead us to manage different
levels of precision in the structures.
The structure associated with a given class can be viewed as a set of attributes
or properties, with a series of associated ranges. This concept of structure, that
we call crisp structure, fulfills a large proportion of the needs related to types
when the hierarchical structure of a given application is being found. However,
there are other problems for which this concept of structure is not suitable, and
a softening process is needed. Examples of these problems are the representa-
tions of concepts with different levels of precision, semistructured or unstruc-
tured data management, or the handling of incomplete information. These kinds
of problems require the use of more expressive and powerful techniques to
define the structure of a certain class of objects.
In Marín et al. (2000), we presented a new concept of type that assists in solving
some of these problems. Let us now look at a brief summary of this concept and
its most important characteristics.
Fuzzy Types
Our new concept of type is founded on the idea of fuzzy data structure. A fuzzy
structure is a fuzzy set defined over the set of all the possible attributes in the
model. Taking this definition into account, a fuzzy type is a type with a structural
part S that is a fuzzy structure.
The support set of the fuzzy structure associated with the type is the set of
attributes that can be used to characterize the type at any moment. The kernel
set contains the basic attributes of the type, while each of the α-cuts of the fuzzy
structure defines a precision degree with which the type can be considered.
So far, in the object-oriented model, every instance of a class could reference any
of the attributes of the class (instance variables). However, with our new kind
of type, an instance of a given class may not incorporate certain attributes
depending on the α-cut of the class structure with which it was created.
Each one of the methods defined in a class must have an associated precision
level (as is the case with the attributes or instance variables) that indicates the
minimum precision that an instance must have to incorporate a method in its
behavior. This level of precision depends on attributes and other methods
referenced in the code of the method.
The change proposed in the concept of type involves modifications to the idea of
instantiation. In order to create a new object of a given class, we must be able
to choose the α-cut of properties of the type that will be used to represent it. To
do that, the model has a generic method new(α) (with α ∈ (0, 1]), called fuzzy
constructor. The receiver of this method can be any class C, while the parameter
is the level α of the structure of this class C needed to represent the new object.
The effect of sending the message new(α) to a class C with structural
component S and behavior component B consists of creating an object
incorporating the set $S_\alpha$ of attributes. The set $B_\alpha$ of methods defines the
behavior of this object.
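The idea of a fuzzy structure and of the fuzzy constructor can be sketched in Java as a map from attribute names to membership degrees. This is only an illustration of the concept: the class name FuzzyTypeSketch, its methods, and the membership degrees used in the example are assumptions, and the behavior component is omitted.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class FuzzyTypeSketch {
    // Fuzzy structure: each attribute name with its membership degree.
    private final Map<String, Double> structure = new LinkedHashMap<>();

    FuzzyTypeSketch attribute(String name, double degree) {
        structure.put(name, degree);
        return this;
    }

    /** The alpha-cut: attributes whose membership degree is at least alpha. */
    Set<String> alphaCut(double alpha) {
        Set<String> cut = new LinkedHashSet<>();
        for (Map.Entry<String, Double> e : structure.entrySet()) {
            if (e.getValue() >= alpha) cut.add(e.getKey());
        }
        return cut;
    }

    /** Sketch of new(alpha): the instance only has the attributes of the alpha-cut. */
    Map<String, Object> newInstance(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        Map<String, Object> instance = new LinkedHashMap<>();
        for (String name : alphaCut(alpha)) instance.put(name, null); // values unset
        return instance;
    }

    public static void main(String[] args) {
        FuzzyTypeSketch room = new FuzzyTypeSketch()
            .attribute("quality", 1.0).attribute("extension", 1.0)
            .attribute("floor", 0.8).attribute("students", 0.5);   // hypothetical degrees
        System.out.println(room.newInstance(1.0).keySet()); // prints [quality, extension]
        System.out.println(room.newInstance(0.8).keySet()); // prints [quality, extension, floor]
    }
}
```

Creating an instance with α = 1 yields only the kernel attributes, while lower values of α add the less central attributes of the type.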
The inheritance mechanism H must enable part of the class structure and
behavior to be inherited by its subclasses. As we have done with the instantiation
mechanism, we add a threshold to indicate what proportion of the properties we
want to be inherited. Two different forms of inheritance can be considered:
1. Incorporating inherited attributes and methods to the kernel set of the
structural and behavior components of the subclass, respectively: In this
way, the vagueness of the inherited properties will be eliminated. This type
of inheritance will be called inheritance without propagation, $H_{crisp}$.
2. Keeping the vagueness, by inheriting both properties and methods affected
by the corresponding membership degree: This type of inheritance will be
called inheritance with propagation, $H_{fuzzy}$.
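Both forms of inheritance can be contrasted with a small Java sketch that works on a fuzzy structure stored as a map from property names to membership degrees. It assumes that the threshold selects the properties whose degree reaches it, and it ignores the properties that the subclass defines on its own. The class and method names and the degrees in the example are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FuzzyInheritance {
    /**
     * Properties inherited from the parent structure at the given threshold.
     * With propagate = false (inheritance without propagation) they enter the
     * kernel of the subclass with degree 1; with propagate = true (inheritance
     * with propagation) they keep their membership degree.
     */
    static Map<String, Double> inherit(Map<String, Double> parent,
                                       double threshold, boolean propagate) {
        Map<String, Double> inherited = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : parent.entrySet()) {
            if (e.getValue() >= threshold) {
                inherited.put(e.getKey(), propagate ? e.getValue() : 1.0);
            }
        }
        return inherited;
    }

    public static void main(String[] args) {
        Map<String, Double> person = new LinkedHashMap<>();
        person.put("name", 1.0);
        person.put("age", 0.8);
        person.put("height", 0.4);   // hypothetical degrees
        System.out.println(inherit(person, 0.5, false)); // prints {name=1.0, age=1.0}
        System.out.println(inherit(person, 0.5, true));  // prints {name=1.0, age=0.8}
    }
}
```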
A Supporting Framework
As we mentioned in the introduction, our research is mainly motivated by the
need for an easy-to-use transparent mechanism to develop applications dealing
with fuzzy information. Following our proposal, programmers and designers
should be able to directly use new structures developed to store fuzzy information
without the need for any special treatment, without altering the underlying
programming platform, and with as much transparency as possible. Let us
explain how.
Allowing the Use of Fuzzily Described Objects
This section introduces the class hierarchy that we developed in order to give
support to the model described in the previous section. The hierarchy is
developed using classical object-oriented concepts and allows for the manage-
ment and comparison of fuzzily described complex objects.
Following the principles that guided our research, let us describe the way this
theoretical approach can be implemented in a modern programming platform, so
that programmers can easily design their classes to handle imprecise objects and
compare the fuzzy objects of these classes with a minimum of effort.
We used the reflection capability that many modern programming languages
offer to develop a framework that can be used by user-defined classes in order
to compare objects. Reflection is a feature of many of the modern programming
languages [e.g., Java (java.sun.com) and C# (www.microsoft.com)]. This
feature allows an executing program to examine or introspect itself and
manipulate internal properties of the program. For example, it is possible for a
class to obtain the names of all its members and display them.
Because our final aim is to allow the programmer to define classes and perform
fuzzy comparisons (within queries, for example) without having to write complex
specific code for each class written, we can define a generic FuzzyObject class
that will serve as a base class for the definition of any class with objects that need
fuzzy representation and comparison capabilities. We can avoid duplicating code
in different classes if we write a generic fuzzyEquals method at the FuzzyObject
class. Taking into account that the fuzzyEquals method requires access to the
particular object fields, the only way we can implement such a general version
of this operator is through reflection.
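The role of reflection can be illustrated with the following Java sketch, which uses only the standard java.lang.reflect API. It is not the implementation of the framework: the class names (FuzzyObjectSketch, QualityLabel, RoomSketch) are hypothetical, only public fields are inspected, the per-field degrees are combined with a plain minimum instead of a weighted aggregation, and cycles are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

final class ReflectiveComparisonSketch {
    /** Hypothetical stand-in for a generic base class with a fuzzyEquals method. */
    abstract static class FuzzyObjectSketch {
        public double fuzzyEquals(FuzzyObjectSketch other) {
            if (this == other) return 1.0;
            if (other == null || getClass() != other.getClass()) return 0.0;
            double degree = 1.0;
            try {
                for (Field field : getClass().getFields()) {      // public fields only
                    if (Modifier.isStatic(field.getModifiers())) continue;
                    Object mine = field.get(this);
                    Object theirs = field.get(other);
                    double r;
                    if (mine instanceof FuzzyObjectSketch && theirs instanceof FuzzyObjectSketch) {
                        r = ((FuzzyObjectSketch) mine).fuzzyEquals((FuzzyObjectSketch) theirs);
                    } else {
                        r = Objects.equals(mine, theirs) ? 1.0 : 0.0;  // crisp fields
                    }
                    degree = Math.min(degree, r);   // simplification: plain minimum
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            return degree;
        }
    }

    /** A domain class that overrides fuzzyEquals with its own resemblance logic. */
    static final class QualityLabel extends FuzzyObjectSketch {
        public final String name;   // "high", "regular", or "low"
        QualityLabel(String name) { this.name = name; }
        @Override public double fuzzyEquals(FuzzyObjectSketch other) {
            if (!(other instanceof QualityLabel)) return 0.0;
            String o = ((QualityLabel) other).name;
            if (name.equals(o)) return 1.0;
            return (name.equals("regular") || o.equals("regular")) ? 0.8 : 0.0; // Table 1
        }
    }

    /** A user class that inherits the generic, reflection-based comparison. */
    static final class RoomSketch extends FuzzyObjectSketch {
        public final QualityLabel quality;
        public final Integer floor;
        RoomSketch(QualityLabel quality, Integer floor) {
            this.quality = quality;
            this.floor = floor;
        }
    }

    public static void main(String[] args) {
        RoomSketch a = new RoomSketch(new QualityLabel("high"), 4);
        RoomSketch b = new RoomSketch(new QualityLabel("regular"), 4);
        System.out.println(a.fuzzyEquals(b)); // prints 0.8
    }
}
```

The user class RoomSketch contains no comparison code at all: the inherited method discovers its fields at run time and delegates to the fuzzyEquals logic of each domain class.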
Just by extending this general FuzzyObject, the programmer can define his or
her own classes to represent fuzzy objects. Our framework, as depicted in Figure
3, also includes some classes to represent common kinds of domains for
imprecision, such as linguistic labels without underlying representation
(DomainWithoutRepresentation), domains where labels are possibility distri-
butions over an underlying basic domain (DisjunctiveDomain and its subclasses
to represent labels with finite support set, basic domain values, and functional
representations of labels with infinite support set, like trapezoidal ones), and,
finally, fuzzy collections of fuzzy objects (ConjunctiveDomain). These classes
define their proper fuzzyEquals logic to handle the different cases we previously
discussed in the chapter.
To illustrate how this framework can be used when writing a soft computing
application on one of the foremost programming platforms, consider the
following Java code for the example of rooms and students (Figure 2).
Figure 3. A framework to deal with fuzzily described complex objects [class diagram showing FuzzyObject, with its method fuzzyEquals(fuzzyObject), together with DomainWithoutRepresentation, DisjunctiveDomain, ConjunctiveDomain, ConjunctiveFiniteObject, DisjunctiveFiniteObject, TrapezoidalObject, and BasicObject]
1. To represent the classrooms:
public class Room extends FuzzyObject
{
    // Instance variables
    public Quality quality;
    public Extension extension;
    public Floor floor;
    public StudentCollection students;
    // Constructor
    public Room (Quality quality, Extension extension, Floor floor,
                 StudentCollection students) {
        this.quality = quality;
        this.extension = extension;
        this.floor = floor;
        this.students = students;
    }
    // Field importance
    public static float fieldImportance (String fieldname) {
        String fields[] = new String[] { "quality", "extension",
                                         "floor", "students" };
        float importance[] = new float[] { 0.5f, 0.8f, 1.0f, 1.0f };
        for (int i=0; i<fields.length; i++)
            if (fields[i].equals(fieldname))
                return importance[i];
        return 1.0f;
    }
}
The fieldImportance method is specified to set the importance of each attribute
(it can be omitted if the user gives the same importance to all of
them). The imprecise attributes of a room can be easily implemented by
extending the classes provided by our framework without having to worry
about the fuzzyEquals implementation.
2. The imprecise room quality is an object of a class (Quality) that extends
DomainWithoutRepresentation, without having to add any special code.
3. The extension and floor attributes are both particular cases of
DisjunctiveDomain and, as such, they can be basic values, trapezoids, or
finitely described labels. The programmer only has to extend the class
DisjunctiveDomain, without having to write specialized code (again).
4. Finally, the set of students is a fuzzy collection of students, where the fuzzy
collection StudentCollection inherits from ConjunctiveFiniteObject, and
students are defined in the same way as classrooms.
The following code shows the creation of both rooms, once the classes
mentioned above are defined:
// Label definitions for students
Age young = new Age (new Label("young"), 0, 0, 23, 33 );
Age middle = new Age (new Label("middle-aged"), 23, 33, 44, 48 );
Height shortHeight = new Height(new Label("short"),0, 0, 150, 160);
Height mediumHeight=new Height(new Label("medium"),150,160,
170, 180);
Height tall = new Height ( new Label("tall"), 170, 180, 300, 300);
//Student definition
Student student1 = new Student ("John", young, new Height(185) );
Student student2 = new Student ("Peter", young, new Height(170) );
Student student3 = new Student ("Mary", middle, shortHeight );
Student student4 = new Student ("Tom", new Age(24), tall );
Student student5 = new Student ("Peter",new Age(25),mediumHeight );
Student student6 = new Student ("Tom", young, new Height(190) );
// Label definitions for rooms:
// highQuality, mediumQuality, highFloor... (as above)
// Sets of students
Vector vector1 = new Vector();
vector1.add (new MembershipDegree (1.0f, student1 ) );
StudentCollection set1 = new StudentCollection ( vector1 );
Vector vector2 = new Vector();
StudentCollection set2 = new StudentCollection ( vector2 );
//Room definitions
Room room1 = new Room ( highQuality, new Extension(30),
new Floor(4), set1 );
Room room2 = new Room ( mediumQuality, big, highFloor, set2 );
We can compare the rooms by invoking their fuzzyEquals method, as in
System.out.println("room1 fvs room2=" + room1.fuzzyEquals(room2));
which returns an approximate value of 0.81. Thus, we encapsulated fuzzy object
comparisons in our framework classes so that programmers can now freely
compare imprecisely described objects without having to code any comparison
logic.
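The four numbers passed to labels such as young (0, 0, 23, 33) in the listing above read naturally as the corners of a trapezoidal possibility distribution. The following is a minimal sketch under that assumption, taking the parameters (a, b, c, d) to mean support [a, d] and core [b, c]; the class name Trapezoid and its method are hypothetical and not part of the framework.

```java
// Hypothetical helper: a trapezoidal membership function with
// support [a, d] and core [b, c] (assumed meaning of the four parameters).
final class Trapezoid {
    private final double a, b, c, d;

    Trapezoid(double a, double b, double c, double d) {
        if (!(a <= b && b <= c && c <= d))
            throw new IllegalArgumentException("expected a <= b <= c <= d");
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    // Degree in [0, 1] to which the crisp value x is compatible with the label.
    double membership(double x) {
        if (x < a || x > d) return 0.0;          // outside the support
        if (x >= b && x <= c) return 1.0;        // inside the core
        if (x < b) return (x - a) / (b - a);     // rising edge
        return (d - x) / (d - c);                // falling edge
    }
}
```

Under this reading, an age of 28 would be compatible with young to degree 0.5, halfway down the falling edge between 23 and 33.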
The capability of comparing fuzzily described objects is the basis for querying in
the system. Every query describes a pattern, and we have to find the objects in
the database that match this pattern. The same set of operators used to develop
the fuzzyEquals method can be directly used to compare real objects with object
patterns.
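How a comparison operator turns into a query can be sketched as a simple filter: keep every stored object whose resemblance to the pattern reaches a threshold. The interface FuzzyComparable, the class PatternQuery, and the toy class Approx below are hypothetical names introduced only for this illustration; the use of a fixed threshold is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal contract; in the framework this role is played by FuzzyObject.
interface FuzzyComparable<T> {
    float fuzzyEquals(T other);   // resemblance degree in [0, 1]
}

final class PatternQuery {
    // Returns the candidates whose resemblance to the pattern reaches the threshold.
    static <T extends FuzzyComparable<T>> List<T> select(
            List<T> candidates, T pattern, float threshold) {
        List<T> result = new ArrayList<>();
        for (T candidate : candidates)
            if (candidate.fuzzyEquals(pattern) >= threshold)
                result.add(candidate);
        return result;
    }
}

// Toy domain for illustration: numbers resemble each other when they are close.
final class Approx implements FuzzyComparable<Approx> {
    final double value;
    Approx(double value) { this.value = value; }
    public float fuzzyEquals(Approx other) {
        return (float) Math.max(0.0, 1.0 - Math.abs(value - other.value) / 10.0);
    }
}
```

A linear scan like this is only the simplest possible evaluation strategy; it says nothing about how a real system would optimize such queries.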
Allowing the Use of Fuzzy Object-Oriented Concepts
The previous section introduced a framework that allows programmers and
designers to deal with fuzzily described objects. Following the idea that guides
our proposal, this section explains how to deal with fuzzy object-oriented
concepts using as a basis a conventional object-oriented system or an advanced
object-relational system.
Support for Fuzzy Extensions of Object-Oriented Concepts
We have two alternatives when dealing with the fuzzy extensions of object-
oriented features that we described in the previous section:
1. To develop a new system that implements fuzzy object-oriented features
intrinsically
2. To represent new fuzzy extensions using classical object-oriented struc-
tures as the basis
The first alternative implies the implementation of a whole database system,
while the second implies the implementation of an interface that translates fuzzy
concepts into classical ones that are managed by an underlying classical
database system. Moreover, interested users who have to use some fuzzy
features to solve their problems can directly use the proposed classical structures
without needing any special software.
As we saw in the previous section, we can consider the following list of fuzzy
extensions of classical object-oriented concepts:
1. Explicit uncertainty in attribute values
2. Semantically enhanced relationships among objects
3. Fuzzy class extents
4. Fuzzy inheritance connections
5. Fuzzy type definitions
All of these characteristics can be directly translated into classical object-
oriented structures. Table 2 summarizes how to deal with these fuzzy extensions
according to our proposal (Blanco et al., 2001).
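Two of the translations summarized in Table 2 can be sketched with ordinary classes: explicit uncertainty as an aggregation of a value and a certainty degree, and a fuzzy class extent as one extra membership attribute. The class names Uncertain and Landmark, their fields, and the choice of [0, 1] as the scale are assumptions made for illustration only.

```java
// Hypothetical wrapper: an attribute value aggregated with the degree of
// certainty attached to it (explicit uncertainty in attribute values).
final class Uncertain<T> {
    final T value;
    final double certainty;   // assumed scale: the interval [0, 1]

    Uncertain(T value, double certainty) {
        if (certainty < 0.0 || certainty > 1.0)
            throw new IllegalArgumentException("certainty must lie in [0, 1]");
        this.value = value;
        this.certainty = certainty;
    }
}

// Hypothetical class with a fuzzy extent: every instance carries its own
// membership degree in the class, stored as one extra attribute.
class Landmark {
    final String name;
    final Uncertain<Integer> yearBuilt;   // value plus belief in that value
    final double membership;              // degree of membership in the class extent

    Landmark(String name, Uncertain<Integer> yearBuilt, double membership) {
        this.name = name;
        this.yearBuilt = yearBuilt;
        this.membership = membership;
    }
}
```

Both structures are plain classical classes, which is what allows an unmodified object-oriented or object-relational system to store them.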
A FOODBS Architecture
In this section, we present an architecture that can be used to develop a system
able to store fuzzy information in a classical object-oriented system using the
model described in the previous parts of the chapter. According to the principle
that guided our approach, all the proposed extensions of the object-oriented data
model are built by means of structures that can be directly translated into a set
of standard classes. This feature allows us to decrease the development effort
needed to implement a fuzzy object-oriented database system with the capabili-
ties we propose.
Let us briefly examine our development strategy. Figure 4 depicts a simplification
of the ANSI/SPARC standard database architecture with a slight modification.
External views are organized in such a way that the user can transparently
manage data imperfection. This is the fuzzy view of the system. At the same
time, the conceptual schema is divided into two different layers: the upper layer
contains fuzzy schemata definitions, while the lower layer holds the correspond-
ing classical object-oriented representation needed to support these fuzzy
schemata. The internal schema is that of the classical database system being
used as the basis for the fuzzy database system.
The strategy discussed leads us to an architecture organized into three levels
(see Figure 5):
1. The classical database management system will provide most of the
management functions and will store the objects created using the user-
defined schemata.
Table 2. Fuzzy concepts and object orientation

Fuzzy concept: Explicit uncertainty in attribute values
Classical implementation: To consider that some kind of uncertainty is
associated with some attribute implies that the attribute domain is an
aggregation of the actual attribute domain and the scale where this lack of
certainty is measured.

Fuzzy concept: Semantically enhanced relationships among objects
Classical implementation: If we want to represent the fact of having partial
knowledge about whether given object groups (usually a pair) are related, we
can include in the class a new attribute standing for the belief in the
corresponding aggregation, using the appropriate truth scales for dealing with
explicit uncertainty.
If we want to represent that the connection among the objects admits degrees
of importance, we can use numerical or linguistic values to express this
strength. For example, we can consider the set {"high", "medium", "low"} of
labels, with each label represented as a disjunctive fuzzy set in [0,1]. That
is, we can add to the class an extra attribute that expresses this importance
or strength in the connection.

Fuzzy concept: Fuzzy class extents
Classical implementation: We only have to add an extra attribute to the class
that we want to extend in a fuzzy way. The domain of this attribute could be:
- The interval [0,1]
- A set of linguistic labels that express membership, defined over the
  aforementioned interval

Fuzzy concept: Fuzzy inheritance connections
Classical implementation: Some important models (Rossaza et al., 1998; George
et al., 1993) consider that the superclass-subclass relationship can admit the
use of degrees, founded on the idea of inclusion or matching between typical
subclass attribute values and typical superclass attribute values. This
characteristic can be represented in a classical object-oriented model by means
of the use of static variables that express these connection degrees using
suitable scales.

Fuzzy concept: Fuzzy-type definitions
Classical implementation: This new way of considering the type definition can
be easily modeled over a traditional object-oriented model, using the concept
of 1-ramified hierarchy of classes (Marín et al., 2001).
A 1-ramified hierarchy of classes is defined as a series of classes
$C_1, \ldots, C_{i-1}, C_i, C_{i+1}, \ldots, C_n$ verifying the following
properties:
- For any $i \in 1..n-1$, $Sub\{C_i\} = \{C_{i+1}\}$ ($Sub\{C_i\}$ stands for
  the set of subclasses of $C_i$).
- For any $i \in 2..n$, $Sup\{C_i\} = \{C_{i-1}\}$ ($Sup\{C_i\}$ stands for
  the set of superclasses of $C_i$).
- A finite sequence of values $\alpha_i$ exists, associated with the hierarchy,
  such that $\alpha_1 = 1$, $\alpha_n > 0$, and $\alpha_i > \alpha_{i+1}$.
Each class of the hierarchy is used to represent an $\alpha$-cut of the type
being defined.

2. The conceptual fuzziness handler will augment the classical system capa-
bilities to allow imperfect data manipulation.
3. The interface will communicate with the previous level, hiding the under-
lying complexity, and will allow users to develop their fuzzy object-oriented
databases.
Metadata and general data persistence depend on two storage areas:
1. A metadata catalog will store the fuzzy schemata defined by the user.
2. A classical database will support the storage and management of user
application objects.
Figure 4. New layer to deal with fuzziness
Figure 5. System architecture
A Prototype
In order to experiment with the mentioned architecture and framework, a
prototype was developed. FoodBi is a graphical system that allows the creation
and management of fuzzy object-oriented schemata. By means of this interface,
the user can build a hierarchy of classes with fuzzy types, using, at the same time,
suitable attribute domains for imperfection handling.
This prototype uses Java as the target object-oriented language and Oracle 9i (an
advanced object-relational DBMS) as the DBMS back-end.
An Example: Class Inspector
The core part of FoodBi is devoted to facilitating the creation of classes with
extended characteristics. The information the user is asked for when defining a
new class is as follows:
1. General metadata that describe the class: identifier, kind of extent (crisp or
fuzzy), description, and so on;
2. Set of attributes that characterize its structural component (which can be
fuzzy);
3. Set of methods that make up its behavioral component (which can also be
fuzzy); and
4. A model of inheritance, using the proposed fuzzy inheritance extensions.
Figure 6 illustrates the FoodBi class inspector during the definition of the structural part of
a class Image, which is organized in three levels of precision and has some
attributes that may have imprecise values (age and quality). The information
provided by the user when defining an attribute determines the way in which
fuzziness will be handled:
1. In the case of attributes with imprecise values, the user can build labeled
domains by choosing among different semantics: with or without underlying
basic domain, disjunctive, conjunctive, etc.
2. In case the attribute value can be affected by explicit uncertainty, the user
can attach to the attribute domain a set of linguistic labels or the [0,1]
interval in order to express this explicit uncertainty.
3. The user can even graduate the relationship expressed by the attribute,
combining the attribute domain with a suitable linguistic domain for express-
ing strength values.
Once the class description is completed, FoodBi translates it into a set of
standard Java classes that implement it, following the guidelines of the fuzzy
object-oriented model presented in previous sections of this chapter.
Conclusions
In this chapter, we studied several suitable strategies to face the representation
of the different kinds of imperfections that may arise when a database is being
designed in an object-oriented paradigm, according to the level at which these
imperfections may occur.
As part of our proposal, we demonstrated how to implement reusable fuzzy
comparison capabilities in modern programming platforms through the use of
reflection and theoretical results that help us apply fuzzy techniques in object-
oriented models.
We also presented an architecture for the development of a fuzzy object-oriented
database management system. This architecture is founded on the idea of
minimizing the development effort needed to obtain data imperfection management
capabilities. As the new structures needed to support data imperfection are
implemented using standard object-oriented techniques, we can use an existing
classical database system as the basis for our fuzzy one. This way, we only have
to develop an upper layer on top of the classical system, avoiding the effort
required by the implementation of a whole new system. A prototype was
developed to verify the viability of our proposals.
Figure 6. Class inspector
The theoretical approach is currently being extended in order to deal with
queries: in fact, the fuzzyEquals method described in the chapter is being used
as the basis for performing object queries (Marín et al., 2004).
The prototype is currently the basis for two main development efforts:
1. Toward its completion as a fuzzy object-oriented data management system;
and
2. Toward the achievement of a general object-oriented class library that can
be used to manage fuzzy information without the need for any additional
interface.
Acknowledgment
This work was partially supported by the Spanish Comisión Interministerial de
Ciencia y Tecnología under grants TIC2003-08687-C02-02 and TIC2002-
04021-C02-02.
References
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000). Toward soft
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000b). Implementing
Fril++ for uncertain object-oriented logic programming. In Proceedings of
the Eighth IEEE International Conference on Information Processing
and Management of Uncertainty in Knowledge-Based Systems (pp.
496-503).
Berler, M., Eastman, J., Jordan, D., Russell, C., Schadow, O., Stanienda, T., &
Velez, F. (2000). The object data standard: ODMG 3.0. New York: Morgan
Kaufmann Publishers.
Blanco, I. J., Marín, N., Pons, O., & Vila, M. A. (2001). Softening the object-
oriented database-model: Imprecision, uncertainty, and fuzzy types. In
Proceedings of IFSA/NAFIPS World Congress.
model. In Proceedings of FUZZ-IEEE (pp. 313-317).
Caluwe, R. de. (1997). Fuzzy and uncertain object-oriented databases:
Concepts and models. Advances in fuzzy systems - applications and
theory (Vol. 13). Singapore: World Scientific.
Cubero, J. C., Marín, N., Medina, J. M., Pons, O., & Vila, M. A. (2004). Fuzzy
object management in an object-relational framework. In Proceedings of
IPMU (pp. 1767-1774).
in the fuzzy object-oriented data model. Fuzzy Sets and Systems, 60, 259-
272.
Gonzalez, A., Pons, O., & Vila, M. A. (1999). Dealing with uncertainty and
imprecision by means of fuzzy numbers. International Journal of Ap-
proximate Reasoning, 21, 233-256.
Gyseghem, N. Van, & Caluwe, R. de. (1998). Imprecision and uncertainty in the
UFO database model. Journal of the American Society for Information
Science, 49, 236-252.
database architecture. IEEE Transactions on Knowledge and Data
Engineering, 15(5), 1137-1154.
Kuo, J.-Y., Lee, J., & Xue, N.-L. (2001). A note on current approaches to
extend fuzzy logic to object oriented modeling. International Journal of
Intelligent Systems, 16, 807-820.
Ma, Z. M., Zhang, W. J., Ma, W. Y., & Chen, C. Q. (2001). Conceptual design
of fuzzy object-oriented databases using extended entity-relationship model.
International Journal of Intelligent Systems, 16, 697-711.
Marín, N., Pons, O., & Vila, M. A. (2001). A strategy for adding fuzzy types to
an object-oriented database system. International Journal of Intelligent
Systems, 16, 863-880.
ogy, 45, 431-444.
Marín, N., Pons, O., & Vila, M. A. (2000). Fuzzy types: A new concept of type
Systems, 15, 1061-1085.
Na, S. L., & Park, S. (1996). Management of fuzzy objects with fuzzy attribute
values in new fuzzy object oriented data model. In Proceedings of the
Second International Workshop on FQAS (pp. 19-40).
Na, S. L., & Park, S. (1996b). A fuzzy association algebra based on fuzzy object
oriented data model. In Proceedings of the 20th International Confer-
ence on Compsac (pp. 624-630).
classes. In Fuzzy and uncertain object-oriented databases. Concepts
and models, Advances in fuzzy systems - applications and theory (Vol.
13, pp. 21-61).
Ruspini, E. H. (1986). Imprecision and uncertainty in the entity-relationship
model. In H. Prade, & C. V. Negiota (Eds.), Fuzzy logic and knowledge
engineering (pp. 18-28). Heidelberg: Verlag TUV Reheiland.
Stonebraker, M., & Brown, P. (1999). Object/relational DBMSs: Tracking
the next great wave. New York: Morgan Kaufmann Publishers.
Vanderberghe, R. M., & Caluwe, R. de. (1991). An entity-relationship approach
to the modeling of vagueness in databases. In Proceedings of ECSQAU
Symbolic and quantitative approaches to uncertainty (pp. 338-343).
Vila, M. A., Cubero, J. C., Medina, J. M., & Pons, O. (1995). The generalized
selection: An alternative way for the quotient operations in fuzzy relational
databases. In B. Bouchon-Meunier, R. Yager, & L. Zadeh (Eds.), Fuzzy
logic and soft computing. Singapore: World Scientific Press.
Yazici, A., George, R., & Aksoy, D. (1998). Design and implementation issues
in the fuzzy object-oriented data model. Journal of Information Sciences,
108, 241-260.
Zivieli, A., & Chen, P. P. (1986). Entity-relationship modeling and fuzzy
databases. In Proceedings of the Second International Conference on
Data Engineering IEEE (pp. 18-28).
206 Helmer
Chapter VII
Index Structures for Fuzzy Object-Oriented
Database Systems
Sven Helmer
Universität Mannheim, Germany
Abstract
This chapter gives an overview of indexing techniques suitable for fuzzy
object-oriented databases (FOODBSs). First, typical query patterns used
in FOODBSs are identified, namely, single-valued, set-valued, navigational,
and type hierarchy access. The description of the patterns does not follow
a particular fuzzy object-oriented data model but is kept general enough to
be used in different FOODBS contexts. Second, for each query pattern,
index structures are presented that support the efficient evaluation of these
queries. These range from standard index structures (like B-trees) to
sophisticated access methods (like Join Index Hierarchies). Due to space
constraints, an explanation of the basic techniques is given rather than an
exhaustive description. However, the interested reader is supplied with a
broad list of references for further reading. Finally, a summary and
outlook conclude the chapter.
Index Structure for Fuzzy Object-Oriented Database Systems 207
Introduction
One important technique used to accelerate the associative access in database
management systems (DBMS) is the use of index structures. When searching
for data, we want to avoid the worst case, i.e., having to scan through the whole
database and test every data object, because this is inefficient. Index structures
help here as they allow fast access to data by content.
Due to the semantic richness of object-oriented DBMSs, we have different
methods for indexing than, e.g., in relational DBMSs. Adding fuzziness increases
the number of possibilities even further. Unfortunately, publications on indexing
in fuzzy object-oriented DBMSs are few and far between. Although indexing in
advanced DBMSs (e.g., object-oriented, spatial, image, temporal, or XML
databases) is an established research topic (for overviews see Bertino, 1997;
Liu, 1996; Luk, 2002; Manolopoulos, 1999; Mueck, 1997), indexing in fuzzy
databases has not yet received much attention.
This chapter is organized as follows. First, we give a brief introduction to the
concepts of object-oriented DBMSs needed in the remainder of the chapter.
Next, we give an overview of the different aspects of accessing data in fuzzy
object-oriented DBMSs. In the next section, we investigate several index
structures supporting these access patterns. We then express our opinion on
future trends in the area of access methods for FOODBS systems. Finally, in the
last section, we conclude with a brief summary.
Preliminaries
Storage Hierarchy
In every computing system, and therefore in every DBMS, we have several layers of
storage (Figure 1). Generally, the higher a memory type is positioned in this
hierarchy, the faster, the costlier, and the smaller it becomes. The differences
between the levels are usually several orders of magnitude. We divide this
hierarchy into three subcategories: primary, secondary, and tertiary storage.
Primary storage consists of CPU-registers, cache memory, and main memory;
secondary storage comprises the disk level; and tertiary storage includes the tape
level. We restrict ourselves to the levels that are most important for index
structures in DBMSs: main memory and disks.
Object Model
Now we present a brief introduction to a (nonfuzzy) object-oriented database
model. For a detailed definition see the standard by the Object Data Management
Group (ODMG) (Cattell, 2000). We introduce fuzziness to this model in the next
section when describing the access patterns.
Central to the object-oriented model are objects, which are database entities
described by their identities, their types, and their states. The identity of an
object is defined by a unique object identifier (OID), which never changes
during the lifetime of the object. Each object is also an instance of a certain type
(this also does not change for an object). The type determines the behavior and
structure of an object. The behavior is constituted by a set of operations the
object is able to execute. The structure, in turn, is described by a set of attributes
and the possible relationships the object can enter into with other objects.
Attributes are not restricted to domains with atomic values but are allowed to be
collections, like sets, lists, or tuples. At each point in time, an object has an
internal state. The state of an object is defined by the values of its attributes and
the current relationships it sustains.
Figure 1. Levels of storage hierarchy
A type can inherit its basic structure and behavior from another type and extend
this structure and behavior. In this case, we speak of inheritance: a subtype
inherits properties from a supertype. All objects belonging to a type (and all its
subtypes) are combined in an extent of this type. Another important feature of
an object model is substitutability, i.e., an object can be used at any place in
which an object of one of its supertypes is used. Last but not least, there is
polymorphism. A polymorphic operation is defined for a set of types, not only for
a single type. In this way, types that may otherwise be unrelated can show the
same behavior. For example, the operator + (addition) has to be implemented
differently for integers than it does for floats, but it has the same semantics.
Classification of Access Patterns
We have to distinguish between several different ways of representing and
accessing data in fuzzy object-oriented DBMSs. The access methods presented
later will reflect this, i.e., it will not be possible to support all query patterns with
a single index structure. As many different fuzzy object models have been
developed in recent years, we try to keep the description of the data represen-
tation general enough to demonstrate the applicability of different index struc-
tures in the context of FOODBSs. We differentiate between the following
access patterns:
1. Single-valued attributes associated with a degree of uncertainty
2. Multivalued attributes that are described by fuzzy sets or possibility
distributions
3. Navigational access via paths, i.e., objects are linked together with pointers
[Not all fuzzy object-oriented data models support fuzzy associations
between objects; those that do include the models by Bordogna (1994), Na (1996),
and Yazici (1997, 1998).]
4. Access via type hierarchies, i.e., queries may refer to specific types or a
subhierarchy of types [Again, not all fuzzy object-oriented models support
fuzzy type hierarchies; those that do include the models by Bordogna (1994),
George (1992), Na (1996), and Yazici (1997, 1998).]
Single-Valued Attributes
For our first access pattern, we are going to look at single-valued attributes that
have a grade of certainty (usually ranging from 0 to 1) attached to their values.
This grade reflects the level of belief in this value and is based on certainty theory
(Durkin, 1994; Shortliffe, 1975). Assume that we have a database for the
administration of a university. We could have a type called Staff that holds the
data for employees:
class Staff {
attribute String Name;
attribute String Position : degree;
attribute Integer Age : degree;
}
The addition of the clause : degree after the attributes Position and Age tells
us that the values of these two attributes can be uncertain. So, if we are unsure
whether a person works as an assistant professor, we can store the value
"Assistant Professor" (0.6) in the attribute Position. (Note that this
approach could also be modeled in crisp object models by adding another
attribute holding the corresponding degree for each attribute that can contain
uncertain data.)
Possible queries in this context would be: "Give me the names of all staff that
work as an assistant professor with at least a degree of 0.7" or "Give me the
names and positions of all persons who are younger than 30 with a certainty of
0.4."
This approach is widely applied for inexact reasoning in expert systems. As
a matter of fact, the expert system MYCIN provided the basis for certainty
theory.
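The crisp encoding mentioned above, with one companion degree per uncertain attribute, and the first example query can be sketched as follows. The class names StaffRecord and StaffQueries are hypothetical, and the query is evaluated by a plain scan rather than through an index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical crisp encoding: each uncertain attribute gets a companion degree.
final class StaffRecord {
    final String name;
    final String position;
    final double positionDegree;   // certainty attached to the position
    final int age;
    final double ageDegree;        // certainty attached to the age

    StaffRecord(String name, String position, double positionDegree,
                int age, double ageDegree) {
        this.name = name;
        this.position = position;
        this.positionDegree = positionDegree;
        this.age = age;
        this.ageDegree = ageDegree;
    }
}

final class StaffQueries {
    // Names of all staff holding the given position with at least minDegree certainty.
    static List<String> namesByPosition(List<StaffRecord> staff,
                                        String position, double minDegree) {
        List<String> names = new ArrayList<>();
        for (StaffRecord s : staff)
            if (s.position.equals(position) && s.positionDegree >= minDegree)
                names.add(s.name);
        return names;
    }
}
```

Because the condition combines an exact match on the value with a range condition on the degree, it has the shape of a conventional two-attribute predicate.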
Set-Valued Attributes
A more flexible approach than the previous one is to represent the value of an
attribute by means of a (disjunctive) fuzzy set. Look at the following example
(again, we use a general notation for fuzzy attributes):
class Staff {
attribute String Name;
fuzzy attribute String Position;
fuzzy attribute Integer Age;
}
Now the two attributes Position and Age are declared as fuzzy. What does
this mean? If we want to express that it is perfectly possible that a person works
as a research assistant or assistant professor, maybe is even an associate
professor, but probably not a full professor, we can describe this fact by the fuzzy
set in Figure 2(a). Describing the age of this person as young could be done with
the fuzzy set described by the membership function in Figure 2(b).
Querying on fuzzy sets is more flexible but also more complex than querying on
single-valued attributes. One popular approach is based on the possibility theory
by Prade and Testemale (1984). We therefore concentrate on this
technique and give a brief description in the following. We want to fetch all
objects (with a fuzzy attribute $A$) that satisfy a query condition $\theta\, a$,
meaning $A\ \theta\ a$ is satisfied, where $\theta$ is a (fuzzy) comparison
operator and $a$ is a (fuzzy) constant, represented by $\mu_\theta$ and $\mu_a$,
respectively. As the values of $A$ (and the query
condition) can be fuzzy, there is some uncertainty as to whether a data item
satisfies the condition or not. Two fuzzy measures are used to express this
degree of uncertainty. One is the possibility measure defined as follows:

$$\forall X \in P(\Omega):\quad \Pi(X) = \max_{\omega \in X} \pi_{A(o_i)}(\omega) \tag{1}$$

where $\Omega$ is the domain of attribute $A$, while $P(\Omega)$ denotes the power
set of $\Omega$. The value of attribute $A$ of object $o_i$ is described by a
possibility function $\pi_{A(o_i)}$ on $\Omega$
(which basically is a normalized fuzzy set, i.e., at least one item has a membership
degree of 1.0). Associated with each possibility measure is a necessity measure
$N(X)$:

$$\forall X \in P(\Omega):\quad N(X) = 1 - \Pi(\overline{X}) \tag{2}$$

Figure 2. Examples for fuzzy sets: (a) Possible positions [membership degrees between 0 and 1.0 over Res. Assis., Assis. Prof., Assoc. Prof., and Full Prof.]; (b) Young age [membership degrees between 0 and 1.0 over ages from 20 to 80]

The possibility that the value of attribute $A$ of data item $o_i$ belongs to the
set of values determined by $\theta$ and $a$ is equal to

$$\Pi(a \circ \theta \mid A(o_i)) = \max_{\omega \in \Omega} \min\bigl(\mu_{a \circ \theta}(\omega),\ \pi_{A(o_i)}(\omega)\bigr) \tag{3}$$

with

$$\mu_{a \circ \theta}(\omega) = \max_{\omega' \in \Omega} \min\bigl(\mu_\theta(\omega, \omega'),\ \mu_a(\omega')\bigr) \tag{4}$$

The necessity of belonging to this set is equal to

$$N(a \circ \theta \mid A(o_i)) = \min_{\omega \in \Omega} \max\bigl(\mu_{a \circ \theta}(\omega),\ 1 - \pi_{A(o_i)}(\omega)\bigr) \tag{5}$$

Let us also present an example query for this access pattern. We want to find
all persons who are approximately young [see Figure 2(b) for the fuzzy set
young]. The comparison operator for "approximately equal to" could be
defined similarly to the one found in Prade (1984):

$$\mu_\theta(\omega, \omega') = \begin{cases} 1 - \dfrac{|\omega - \omega'|}{5} & \text{for } |\omega - \omega'| \le 5 \\ 0 & \text{else} \end{cases} \tag{6}$$

This formula assumes that $\Omega$ is represented by a range of numbers (as in this
example, an age). The comparator determines the degree of similarity between
age $\omega$ and age $\omega'$. As we are working with a constant fuzzy value
($\mu_{young}$), we can calculate the query condition
$\mu_{young \circ \theta}(\omega)$ beforehand:

$$\mu_{young \circ \theta}(\omega) = \max_{\omega' \in \Omega} \min\bigl(\mu_\theta(\omega, \omega'),\ \mu_{young}(\omega')\bigr)$$
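The possibility and necessity degrees defined above, together with the "approximately equal to" comparator, can be evaluated directly when the domain is a finite range of integers. The following is a minimal sketch under that assumption: the domain is taken to be 0..n-1, fuzzy sets are stored as arrays of membership degrees indexed by domain value, and the class name FuzzyMatch and its methods are hypothetical.

```java
// Hypothetical helper evaluating a fuzzy selection over the integer domain 0..n-1.
final class FuzzyMatch {

    // "Approximately equal to": similarity falls linearly and is 0 beyond distance 5.
    static double approxEqual(int w, int w2) {
        return Math.max(0.0, 1.0 - Math.abs(w - w2) / 5.0);
    }

    // Composition of the fuzzy constant a with the comparator:
    // out[w] = max over w2 of min(approxEqual(w, w2), a[w2]).
    static double[] compose(double[] a) {
        double[] out = new double[a.length];
        for (int w = 0; w < a.length; w++)
            for (int w2 = 0; w2 < a.length; w2++)
                out[w] = Math.max(out[w], Math.min(approxEqual(w, w2), a[w2]));
        return out;
    }

    // Possibility: max over the domain of min(condition, value distribution).
    static double possibility(double[] condition, double[] value) {
        double p = 0.0;
        for (int w = 0; w < value.length; w++)
            p = Math.max(p, Math.min(condition[w], value[w]));
        return p;
    }

    // Necessity: min over the domain of max(condition, 1 - value distribution).
    static double necessity(double[] condition, double[] value) {
        double n = 1.0;
        for (int w = 0; w < value.length; w++)
            n = Math.min(n, Math.max(condition[w], 1.0 - value[w]));
        return n;
    }
}
```

Since the query constant is fixed, compose needs to run only once per query; possibility and necessity are then computed once per stored object. For a stored age that is "either 30 or 40" compared with the constant 30, the possibility is 1 (30 matches fully) while the necessity is 0 (40 does not match at all).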
Navigational Access
Relationships between objects are described by references from one object to
another. In Figure 3, we see a schema graph describing the fact that a
department employs several people who are engaged in different projects.
In a FOODBS system, the relationships may be fuzzy, i.e., each link from one
object to another has a degree of uncertainty associated with it. Figure 4 shows
an excerpt of an instantiation of the above schema. Looking at this example, we
see that the person with identification $s_1$ is certainly employed at the
department $d_1$, while we are not 100% sure that this person is working on
project $p_1$.
A possible query in this context could be: "Give me all departments that probably
(with a degree larger than 0.8) employ people who are almost surely (with a
degree greater than 0.95) involved in the projects $p_3$ or $p_4$."
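A query of this kind can be answered by walking the links backward, from the projects to the staff and from the staff to the departments. The sketch below assumes that each threshold is applied to its own link separately, without combining degrees along the path; the classes FuzzyLink and NavigationQuery are hypothetical, and no link data from Figure 4 is reproduced.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical fuzzy reference: a link between two objects with a degree attached.
final class FuzzyLink {
    final String from;
    final String to;
    final double degree;

    FuzzyLink(String from, String to, double degree) {
        this.from = from;
        this.to = to;
        this.degree = degree;
    }
}

final class NavigationQuery {
    // Departments that employ (degree > minEmploy) some person who is
    // involved (degree > minWork) in at least one of the given projects.
    static Set<String> departments(List<FuzzyLink> employs, List<FuzzyLink> worksOn,
                                   Set<String> projects,
                                   double minEmploy, double minWork) {
        // Step 1: staff members with a sufficiently certain link to a wanted project.
        Set<String> staff = new HashSet<>();
        for (FuzzyLink link : worksOn)
            if (projects.contains(link.to) && link.degree > minWork)
                staff.add(link.from);
        // Step 2: departments with a sufficiently certain link to such a staff member.
        Set<String> result = new TreeSet<>();
        for (FuzzyLink link : employs)
            if (staff.contains(link.to) && link.degree > minEmploy)
                result.add(link.from);
        return result;
    }
}
```

Both steps scan complete link lists, which is exactly the cost that an index on paths would aim to avoid.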
Type Hierarchies
Figure 3. Relationships between object types
Figure 4. Instantiation of the schema in Figure 3 [graph linking departments $d_1$ and $d_2$, staff members $s_1$ to $s_4$, and projects $p_1$ to $p_5$; each link is labeled with a degree between 0.3 and 1.0]
A query in an object-oriented database system may refer to objects of a certain
type or to a certain type and all its subtypes. Look at the hierarchy of types
depicted in Figure 5. In the case of FOODBS systems, we may have objects that
are not clearly assigned to a certain type. We do not look at how the membership
grades are determined exactly but assume that we are able to compute them in
some way.
A typical query involving type hierarchies might be: List the names of all
academics who are older than 40 years. Make sure that the degree of
membership to the class Academics or a subclass is at least 0.9. As we will see,
efficiently evaluating queries in which type hierarchies and other properties are
mixed is not straightforward.
Index Structures for Access Patterns
After having introduced different access patterns in the last section, we now
show how these queries can be supported by various index structures. The
outline of this section follows that of the last section.
Single-Valued Attributes
Accesses to single-valued attributes are easiest to handle, as we can use the
standard index structures of (relational) DBMSs. We present two of the most
widely known index structures: B-trees and external hashing.
Figure 5. Type hierarchy [Staff at the root; Administrative and Academic on the second level; Technical, Teaching, and Research on the third level]
B-trees
B-trees (Bayer, 1972) (or the more advanced B+-trees) are the standard index
structures in relational database systems. They are balanced multiway trees, i.e.,
in contrast to binary trees, a node can have more than one key and more than two
children (multiway), and all leaves are on the same level (balanced). The keys
in a node N are sorted, and a subtree is assigned to each key. All keys in a subtree
are less than the assigned key. All keys greater than the keys in node N are saved
in an additional subtree (see Figure 6 for an example).
In a database system, the nodes of a B-tree are mapped to pages in the secondary
storage. A B-tree is much shallower than a binary tree, because the fan-out is
much higher. For this reason and because of the balancing, only a few page
accesses are necessary to find a key. To increase branching even further,
B+-trees are used. In B+-trees, all records are kept in the leaves; the inner nodes
contain only reference keys. Normally, these keys are much smaller than the
records. Thus, the level of branching is increased, and the height of the tree
decreases.
More details on B-trees and B+-trees can be found in standard textbooks on
database systems (e.g., Silberschatz, 2001).
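To make the lookup procedure concrete, here is a minimal in-memory sketch of a B-tree search in Python. The `Node` class and the `pages_read` counter are illustrative assumptions, and the sample tree is one plausible arrangement of the keys that appear in Figure 6, not a statement about the exact layout of that figure.

```python
from bisect import bisect_left


class Node:
    """A B-tree node: sorted keys plus one child per key and one extra child."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, key, pages_read=0):
    """Return (found, number of nodes visited) for a key lookup."""
    pages_read += 1
    # Index of the first key that is >= the search key.
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True, pages_read
    if not node.children:
        return False, pages_read
    # Child i holds the keys smaller than keys[i]; the last child holds the
    # keys greater than all keys of this node.
    return search(node.children[i], key, pages_read)


# Assumed arrangement of the keys shown in Figure 6.
ROOT = Node([19], [
    Node([8, 14], [Node([5, 7]), Node([9, 11]), Node([17, 18])]),
    Node([27, 56], [Node([20, 23]), Node([38, 54]), Node([58, 63])]),
])
```

With this tree, any key is found (or reported missing) after visiting at most three nodes, which corresponds to three page accesses if each node is mapped to one page.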
Figure 6. B-tree [root key 19; inner nodes (8, 14) and (27, 56); leaves (5, 7), (9, 11), (17, 18), (20, 23), (38, 54), and (58, 63)]

External Hashing
We describe an extendible hashing index here, as it is a typical representative of
an external hashing scheme. An extendible hashing index is divided into two
parts: a directory and buckets (for details, see also Fagin, 1979). In the buckets,
we store the full hash keys of and pointers to the indexed data items. We
determine the bucket into which a data item is inserted by looking at a prefix
$h_d$ of d bits of the hash key h. For each possible bit combination of the prefix,
we find an entry in the directory pointing to the corresponding bucket. The
directory has $2^d$ entries, where d is called global depth (see also Figure 7).
When a bucket overflows, it is split, and all its entries are divided among the two
resulting buckets. In order to determine the new home of a data item, the length
of the inspected hash key prefix has to be increased until at least two data items
have different hash key prefixes. The size of the current prefix d' of a bucket is
called local depth. If we notice after a split that the local depth d' of a bucket is
larger than the global depth d, we have to increase the size of the directory. This
is done by doubling the directory as often as needed to have a new global depth
d equal to the local depth d'. For the bucket that was split, the new pointers are
put into the directory. For the other buckets, the directory entries are copied.
B-trees and external hashing assume that we want to submit queries involving
one attribute: "List the names of all persons that are 35 years old" or "Return all
persons on whose age we are certain (degree = 1.0)." Usually, queries will
combine attribute values with certainty degrees and will even use ranges: "I want
to have a list of all persons older than 40 with a certainty degree of at least 0.8."
In such cases, B-trees and external hashing will not be efficient. We need
multidimensional access methods like grid files or k-d trees (to name prominent
representatives).
Grid Files
A grid file can be seen as a generalization of hashing to multiple dimensions
(Nievergelt, 1984). Let us assume that we want to index the attribute Age with
its corresponding degree of uncertainty. Figure 8 shows an example of a grid file
for that case.
The data space is partitioned into cells. The cells can share data pages as
indicated by the dashed lines in Figure 8. For each dimension, we provide a linear
scale that partitions the particular dimension in a uniform way, mapping the
domain to an index. Accesses to the grid are done via these linear scales to
determine the correct cell index. Range queries pose no problems. We just have
to be careful to eliminate false drops caused by the page sharing.

Figure 7. Extendible hashing [directory with global depth d = 3 and the entries 000 to 111; four buckets with local depths d' = 2, 3, 3, and 1 for the prefixes 00, 010, 011, and 1]
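The role of the linear scales and of page sharing can be illustrated with a small Python sketch. The scale boundaries, the directory layout, and the class and function names are assumptions made for this example; they are not taken from Figure 8 or from Nievergelt (1984).

```python
from bisect import bisect_right

# Assumed linear scales: boundaries that split each dimension into four slots.
AGE_SCALE = [20, 40, 60]
DEGREE_SCALE = [0.25, 0.5, 0.75]


def cell_of(age, degree):
    """Map a point to its cell index via the two linear scales."""
    return bisect_right(AGE_SCALE, age), bisect_right(DEGREE_SCALE, degree)


class GridFile:
    def __init__(self, page_of_cell):
        self.page_of_cell = page_of_cell  # directory: cell index -> page id
        self.pages = {}                   # page id -> list of (age, degree, ref)

    def insert(self, age, degree, ref):
        page = self.page_of_cell[cell_of(age, degree)]
        self.pages.setdefault(page, []).append((age, degree, ref))

    def range_query(self, age_lo, age_hi, deg_lo, deg_hi):
        a0, d0 = cell_of(age_lo, deg_lo)
        a1, d1 = cell_of(age_hi, deg_hi)
        page_ids = {self.page_of_cell[(a, d)]
                    for a in range(a0, a1 + 1) for d in range(d0, d1 + 1)}
        # Shared pages may contain points of other cells (false drops),
        # so every fetched entry is checked against the query range.
        return sorted(ref for page in page_ids
                      for age, deg, ref in self.pages.get(page, [])
                      if age_lo <= age <= age_hi and deg_lo <= deg <= deg_hi)


# Hypothetical directory: one page per cell, except that cells (0, 0) and
# (0, 1) share page 0.
DIRECTORY = {(a, d): 4 * a + d for a in range(4) for d in range(4)}
DIRECTORY[(0, 1)] = DIRECTORY[(0, 0)]

GRID = GridFile(DIRECTORY)
for entry in [(18, 0.1, "o1"), (18, 0.3, "o2"), (45, 0.9, "o3"), (62, 0.85, "o4")]:
    GRID.insert(*entry)
```

The query "older than 40 with a certainty degree of at least 0.8" becomes `GRID.range_query(41, 200, 0.8, 1.0)`; the final range check inside `range_query` is what removes entries that were fetched only because their cell shares a page with a queried cell.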
K-d Trees
The original k-d tree is a generalization of a binary tree to many dimensions
(Bentley, 1975). In an ordinary (balanced) binary tree, each node splits the
remaining data objects beneath it roughly into two halves. All objects with values
smaller than the node value are found to the left of the node, all those with greater
values are found to the right of the node. At each level of a k-d tree, a different
dimension is chosen to divide the data objects. In our running example, we would
first split according to age, then according to the uncertainty degree, then age
again, and so on.
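As a rough illustration of the alternating split, the following Python sketch builds an in-memory k-d tree over (age, degree) pairs and answers a range query. It is a textbook-style simplification under the assumption that all points are known in advance; the sample points are invented.

```python
def build(points, depth=0):
    """Build a 2-d tree; even levels split on age, odd levels on the degree."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),       # values <= split value
        "right": build(points[mid + 1:], depth + 1),  # values >= split value
    }


def range_search(node, low, high, found=None):
    """Collect all points p with low[i] <= p[i] <= high[i] in both dimensions."""
    if found is None:
        found = []
    if node is None:
        return found
    point, axis = node["point"], node["axis"]
    if all(low[i] <= point[i] <= high[i] for i in range(2)):
        found.append(point)
    # Descend only into subtrees that can overlap the query range.
    if low[axis] <= point[axis]:
        range_search(node["left"], low, high, found)
    if point[axis] <= high[axis]:
        range_search(node["right"], low, high, found)
    return found


# Invented sample data: (age, uncertainty degree).
PEOPLE = [(25, 0.9), (35, 1.0), (42, 0.85), (47, 0.6), (51, 0.95), (63, 0.7)]
TREE = build(PEOPLE)
```

Here `range_search(TREE, (41, 0.8), (200, 1.0))` corresponds to the running example query for persons older than 40 with a certainty degree of at least 0.8.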
As binary trees are not well suited for secondary storage structures, several
extensions and modifications to k-d trees were proposed, e.g., k-d B-trees
(Robinson, 1981) and hB$^\Pi$-trees (Evangelidis, 1995). (For a general overview of
multidimensional access methods, see Gaede, 1998.)
Set-Valued Attributes
This query type is more flexible than the previous one on single-valued attributes.
Therefore, this is the area where the most work has been done (Bosc, 1989, 1988;
Boss, 1999; Helmer, 2001). (Additionally, all of these techniques can be used in
other fuzzy DBMSs and are not restricted to object-oriented DBMS.) The basic
principle (as introduced by Prade, 1984) is to look at fuzzy attribute values in
terms of possibility distributions. Simplifying, we can see a possibility distribution
as a disjunctive normalized fuzzy set, i.e., at least one value from the domain has
a membership degree of one.

Figure 8. Grid file [two-dimensional grid with the cell indexes 0 to 3 on each axis; the age scale runs from <20 to >60, and the degree scale starts at <0.25]
Expressions (3) and (5) are unwieldy in terms of calculating them efficiently.
Bosc and Galibourg show in Bosc (1989) how to simplify the evaluation of these
expressions. A data item $o_i$ belongs to the set of data items possibly satisfying
the query, iff

$$\Pi(a \mid A(o_i)) > 0 \iff L_{>0}(\pi_{A(o_i)}) \cap L_{>0}(\mu_{\theta \circ a}) \neq \emptyset \qquad (7)$$

where $L_{>0}$ are $\alpha$-cuts of fuzzy sets. An $\alpha$-cut of a fuzzy set F is
defined as

$$L_{\alpha}(\mu_F) = \{\omega \mid \mu_F(\omega) \ge \alpha\} \quad (0 \le \alpha \le 1) \qquad (8)$$

We talk of strict $\alpha$-cuts whenever

$$L_{>\alpha}(\mu_F) = \{\omega \mid \mu_F(\omega) > \alpha\} \qquad (9)$$

There are two special $\alpha$-cuts, the core $L_1(\mu_F)$ and the support
$L_{>0}(\mu_F)$ of a fuzzy set F.
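For a discrete domain, the cuts of equations (8) and (9) reduce to simple set comprehensions. The sketch below represents a fuzzy set as a Python dictionary from values to membership degrees; this representation and the degrees of the sample set `POSITION` are assumptions chosen for illustration.

```python
def alpha_cut(membership, alpha, strict=False):
    """Return the alpha-cut (8) or, with strict=True, the strict alpha-cut (9)."""
    if strict:
        return {value for value, degree in membership.items() if degree > alpha}
    return {value for value, degree in membership.items() if degree >= alpha}


def core(membership):
    """The core is the cut at level 1."""
    return alpha_cut(membership, 1.0)


def support(membership):
    """The support is the strict cut at level 0."""
    return alpha_cut(membership, 0.0, strict=True)


# Hypothetical fuzzy set over academic positions (degrees are invented).
POSITION = {
    "Research Assistant": 1.0,
    "Assistant Professor": 1.0,
    "Associate Professor": 0.4,
    "Full Professor": 0.0,
}
```

Cores and supports computed this way are the sets that the index structures discussed in the remainder of this section store and compare.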
For more selective queries, an acceptance threshold $\alpha$ can be provided by the
user. Determining qualifying data items then boils down to

$$\Pi(a \mid A(o_i)) > \alpha \implies L_{\alpha}(\pi_{A(o_i)}) \cap L_{\alpha}(\mu_{\theta \circ a}) \neq \emptyset \qquad (10)$$

$$\implies L_{>0}(\pi_{A(o_i)}) \cap L_{\alpha}(\mu_{\theta \circ a}) \neq \emptyset \qquad (11)$$

In both cases, the appropriate $\alpha$-cut of $\mu_{\theta \circ a}$ can be calculated
beforehand, and the supports of the fuzzy sets describing the attribute values of
the data items can be used in filtering data items during query evaluation.
The case for the necessity measure is handled similarly. A data item $o_i$ belongs
to the set of data items necessarily satisfying the query, iff

$$N(a \mid A(o_i)) > 0 \iff L_{1}(\pi_{A(o_i)}) \subseteq L_{>0}(\mu_{\theta \circ a}) \qquad (12)$$

For an acceptance threshold of $\alpha$ we get

$$N(a \mid A(o_i)) > \alpha \implies L_{1-\alpha}(\pi_{A(o_i)}) \subseteq L_{\alpha}(\mu_{\theta \circ a}) \qquad (13)$$

$$\implies L_{1}(\pi_{A(o_i)}) \subseteq L_{\alpha}(\mu_{\theta \circ a}) \qquad (14)$$

Hereafter, when searching for supports that intersect with the $\alpha$-cut of the
query predicate, we call this a nonempty intersection query. When looking for cores
that are a subset of the query predicate, we call this a subset query.
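The two filter conditions can be checked against the measures themselves on a small discrete example. The sketch assumes the usual max-min form of the possibility measure and min-max form of the necessity measure (expressions (3) and (5) are not repeated here, so this form is an assumption), and all fuzzy sets and names are invented for illustration.

```python
def alpha_cut(membership, alpha, strict=False):
    if strict:
        return {v for v, degree in membership.items() if degree > alpha}
    return {v for v, degree in membership.items() if degree >= alpha}


def possibility(query, value, domain):
    """Assumed form: max over the domain of min(query degree, value degree)."""
    return max(min(query.get(w, 0.0), value.get(w, 0.0)) for w in domain)


def necessity(query, value, domain):
    """Assumed form: min over the domain of max(query degree, 1 - value degree)."""
    return min(max(query.get(w, 0.0), 1.0 - value.get(w, 0.0)) for w in domain)


def passes_intersection_filter(query, value, alpha):
    """Nonempty intersection query: support of the value meets the query cut."""
    return bool(alpha_cut(value, 0.0, strict=True) & alpha_cut(query, alpha))


def passes_subset_filter(query, value, alpha):
    """Subset query: core of the value is contained in the query cut."""
    return alpha_cut(value, 1.0) <= alpha_cut(query, alpha)


DOMAIN = ["Research Assistant", "Assistant Professor",
          "Associate Professor", "Full Professor"]
QUERY = {"Assistant Professor": 1.0, "Associate Professor": 0.7}
O1 = {"Research Assistant": 1.0, "Assistant Professor": 1.0}
O2 = {"Assistant Professor": 1.0, "Associate Professor": 0.3}
```

For a threshold of 0.5, `O1` passes the intersection filter and indeed has possibility 1.0, but it fails the subset filter, which is consistent with its necessity of 0. `O2` passes the subset filter and has a necessity of about 0.7. Items that pass a filter still have to be verified, as the filters are only necessary conditions.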
Queries using this principle are supported by indexing the cores and supports of
the fuzzy sets, respectively. In the literature we find two different approaches.
The first approach assumes that the cores and supports may contain an infinite
number of elements from a (continuous) domain. However, we have to be able
to describe the cores and supports by closed intervals (Bosc, 1989). The second
approach assumes that the cores and supports contain a finite number of
elements from a (discrete) domain. An advantage here is that we are not
restricted to intervals. In the following, we are going to discuss index structures
capable of supporting the interval-based approach and then continue with those
for discrete values.
Relational Interval Trees (RI-trees)
Before introducing the RI-tree, we will present a brief introduction of the
underlying principle, the interval tree by Edelsbrunner (Preparata, 1993). The
backbone of an interval tree is a balanced binary tree on the domain from which
the endpoints of the intervals are taken (see Figure 9 for dividing up the domain).
When inserting an interval $i = (l_i, u_i)$ with lower bound $l_i$ and upper bound
$u_i$, we attach it to the highest node v in the tree for which $l_i \le v \le u_i$.
The intervals associated with a node are stored in two lists: $L_v$ and $U_v$. In
$L_v$ all intervals are sorted in ascending order by $l_i$, and in $U_v$ they are
sorted in descending order by $u_i$. [See also Figure 9 for an example after
inserting the intervals (1,3), (2,5), (3,3), (3,7), (5,6), (5,7) into an interval tree.]
The sorting of the two lists accelerates the query evaluation, as we will see in the
following.
When querying with an interval (or a point, which can be seen as an interval with
$l_i = u_i$), we traverse down from the root to a subset of leaves. Let $\lambda$ and
$\upsilon$ be the lower and upper bounds of our query interval q, respectively.
While descending down the tree, we have to distinguish three different cases.
Figure 10 (taken from Kriegel, 2000) illustrates this for intersection queries. When
$v < \lambda$, we have to check $U_v$ for possibly intersecting intervals. As soon
as we fail to find intersecting intervals, we can stop, as $U_v$ is sorted by $u_i$.
We then continue by following the reference to the right child. When
$\upsilon < v$, we have to check $L_v$ for possible query answers and continue
down the left child of v. In case of $\lambda \le v \le \upsilon$, we output all
intervals in $L_v$ (or $U_v$) and visit both children.
Searching for subintervals of q (in the case of subset queries) is not hard to do
either. We just have to look at the nodes for which $\lambda \le v \le \upsilon$ and
search for candidates in $L_v$ and $U_v$. We can utilize the ordering of the lists by
searching them from back to front.
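The insertion rule and the three cases of an intersection query can be sketched as follows in Python. The backbone is not stored but derived from an integer domain (here 1 to 7, as in the example with the six intervals), which is an assumption of this sketch; the class and method names are invented, and re-sorting the lists on every insert is a simplification.

```python
from itertools import takewhile


class IntervalTree:
    def __init__(self, low, high):
        self.low, self.high = low, high  # integer domain of the backbone
        self.L = {}  # node value -> intervals sorted ascending by lower bound
        self.U = {}  # node value -> intervals sorted descending by upper bound

    def insert(self, l, u):
        low, high = self.low, self.high
        while True:
            v = (low + high) // 2
            if u < v:
                high = v - 1
            elif v < l:
                low = v + 1
            else:  # highest node with l <= v <= u
                self.L.setdefault(v, []).append((l, u))
                self.L[v].sort()
                self.U.setdefault(v, []).append((l, u))
                self.U[v].sort(key=lambda interval: -interval[1])
                return

    def intersecting(self, q_low, q_high):
        result = []

        def visit(low, high):
            if low > high:
                return
            v = (low + high) // 2
            if v < q_low:    # scan U_v and stop at the first upper bound < q_low
                result.extend(takewhile(lambda i: i[1] >= q_low, self.U.get(v, [])))
                visit(v + 1, high)
            elif q_high < v:  # scan L_v and stop at the first lower bound > q_high
                result.extend(takewhile(lambda i: i[0] <= q_high, self.L.get(v, [])))
                visit(low, v - 1)
            else:             # v lies inside the query: all intervals at v qualify
                result.extend(self.L.get(v, []))
                visit(low, v - 1)
                visit(v + 1, high)

        visit(self.low, self.high)
        return sorted(result)


TREE = IntervalTree(1, 7)
for interval in [(1, 3), (2, 5), (3, 3), (3, 7), (5, 6), (5, 7)]:
    TREE.insert(*interval)
```

After these insertions, the root node 4 holds (2,5) and (3,7), node 2 holds (1,3), node 3 holds (3,3), and node 6 holds (5,6) and (5,7).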
For the relational interval tree, the backbone is not actually materialized, as it has
a regular structure. We create three different relations: $i(v, l_i, u_i)$ for the
intervals, $l(v, l_i)$ for the lists $L_v$, and $u(v, u_i)$ for the lists $U_v$ (each
with an appropriate index). Querying is done by computing the numbers of the
visited nodes and submitting the corresponding range queries to the list relations l
and u.

Figure 9. An interval tree [backbone with root 4, inner nodes 2 and 6, and leaves 1, 3, 5, and 7; node 4 holds (2,5) and (3,7), node 2 holds (1,3), node 3 holds (3,3), and node 6 holds (5,6) and (5,7), each in a list $L_v$ and a list $U_v$]

Figure 10. Querying on an interval tree
G-trees
G-trees are a combination of grid files with B-trees using a clever partition
numbering. In order to describe the index structure in an understandable way, we
restrict ourselves to the two-dimensional case (which is also the case we need
to index fuzzy data). Assume that each partition can hold no more than two data
objects. We start with an initial partitioning as depicted in Figure 11(a) (taken
from Kumar, 1994), where we split along the first dimension and number the
partitions using the binary strings 0 and 1. After inserting some more objects, we
have to split the partition 0 [see Figure 11(b)]. We do so along the second
dimension, numbering the newly created partitions 00 and 01. As more overflows
occur, we alternate between the two dimensions and number the partitions
accordingly [see Figures 11 (c) and (d)]. This regular numbering scheme has
several advantages, e.g., finding parent and child partitions is straightforward, as
is finding complements of a partition (for details see Kumar, 1994).
Figure 11. Partitioning scheme in a G-tree [(a) partitions 0 and 1; (b) partitions 00, 01, and 1; (c) partitions 00, 010, 011, and 1; (d) partitions 00, 010, 0110, 0111, and 1]
The partitions are indexed using a B-tree-like structure called G-tree. First, the
binary numbers are converted to decimals in the following way. All binary
numbers are brought to the same length by padding them with trailing 0s. In our
example in Figure 11, we would have 0 (0000), 4 (0100), 6 (0110), 7 (0111), and
8 (1000). These numbers are inserted into the G-tree like into a B-tree. When
searching the tree, we have to compute the relevant partition numbers and then
look them up. When inserting and deleting objects, we have to adjust the
partitioning scheme accordingly (for details see Kumar, 1994).
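The conversion of partition strings to numbers is simple enough to state in a few lines of Python. The function name is an invention for this sketch; the partition strings are those of Figure 11(d).

```python
def partition_numbers(partitions):
    """Pad each binary partition string with trailing 0s and read it as a number."""
    width = max(len(p) for p in partitions)
    return {p: int(p.ljust(width, "0"), 2) for p in partitions}


PARTITIONS = ["00", "010", "0110", "0111", "1"]
NUMBERS = partition_numbers(PARTITIONS)
```

The resulting numbers are the keys that are inserted into the B-tree-like structure.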
Liu et al. adapted G-trees for fuzzy data by mapping fuzzy queries onto range
searches (Liu, 1996). The intervals of supports (and cores) of fuzzy sets are
mapped to two-dimensional space by considering the lower bound of the interval
as x-value and the upper bound as y-value. For a query interval with lower bound
$\lambda$ and upper bound $\upsilon$, possible candidates for nonempty
intersection queries are found by retrieving objects for which
$0 \le x \le \upsilon$ and $y \ge \lambda$. For subset queries, we need to check
$x \ge \lambda$ and $0 \le y \le \upsilon$.
General Two-Dimensional Indexes
The principle used in Liu (1996) for G-trees can also be applied to other two-
dimensional index structures, like the aforementioned grid files and k-d trees. For
example, an approach using a multilevel grid file was introduced by Yazici and
Cibiceli (1999).
Signatures
We will now turn to index structures assuming a finite set of discrete values in
the cores and supports of the fuzzy sets to be indexed. First, we give a brief
review of the superimposed coding technique, and then we will discuss index
structures built around this method.
Superimposed coding is based on the idea of hashing values into random k-bit
codes in a b-bit field and superimposing the codes for each value in a signature
(Knuth, 1973). The fixed size b is called the signature length. We use signatures
to represent the $\alpha$-cuts of $\mu_{\theta \circ a}$ and the supports and cores
of the indexed fuzzy sets. There are two advantages to signatures. One is their
constant length; keys of constant length are easier to manage than keys of variable
length. The other advantage is the great speed with which signatures can be
compared by using only bit operations.
Example 1: An example for encoding the core of the fuzzy set position from
Figure 2 in an 8-bit signature with k = 2 is:

value                bitcode
Research Assistant   1001 0000
Assistant Professor  0001 0010
Signature            1001 0010

We cannot assume that the signatures of distinct sets are distinct. Still

$$s \subseteq t \implies \mathrm{sig}(s) \subseteq \mathrm{sig}(t) \quad \text{and} \quad s \cap t \neq \emptyset \implies \mathrm{sig}(s) \cap \mathrm{sig}(t) \neq \emptyset \qquad (15)$$

where s and t are arbitrary sets, and the two relations on signatures and
$|\mathrm{sig}(s)|$ are defined as

$$\begin{aligned}
\mathrm{sig}(s) \subseteq \mathrm{sig}(t) &:\iff \mathrm{sig}(s) \,\&\, \neg\,\mathrm{sig}(t) = 0 \\
\mathrm{sig}(s) \cap \mathrm{sig}(t) \neq \emptyset &:\iff |\mathrm{sig}(s) \,\&\, \mathrm{sig}(t)| \ge k \\
|\mathrm{sig}(s)| &:= \text{number of bits set in } \mathrm{sig}(s), \text{ also called the weight of } \mathrm{sig}(s)
\end{aligned}$$

with & denoting bitwise and and $\neg$ denoting bitwise complement. Hence, a
pretest based on signatures can be fast because it involves only bit operations.
Now, instead of comparing $L_{\alpha}(\mu_{\theta \circ a})$ to the support or core of
each $\pi_{A(o_i)}$, we first compare the signature of
$L_{\alpha}(\mu_{\theta \circ a})$ to the signature of each support or core. During
the evaluation of a query, if
$\mathrm{sig}(L_{>0}(\pi_{A(o_i)})) \cap \mathrm{sig}(L_{\alpha}(\mu_{\theta \circ a})) \neq \emptyset$
or
$\mathrm{sig}(L_{1}(\pi_{A(o_i)})) \subseteq \mathrm{sig}(L_{\alpha}(\mu_{\theta \circ a}))$
holds, we call $o_i$ a drop. Additionally, if $\Pi(a \mid A(o_i)) > \alpha$ or
$N(a \mid A(o_i)) > \alpha$ also holds, we have a right drop, else $o_i$ is a
false drop. After determining all data items that are drops, we have to filter out
the false drops. [The probabilities that data items turn out to be false drops have
been studied intensively in Ishikawa (1993). We will not go into detail here.]
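The following Python sketch shows one possible way to build signatures and run the two bit-level pretests. The hash-based code generator `code` is an assumption made for this example (any function that yields k-bit codes in a b-bit field would do), so the codes it produces differ from the bit codes of Example 1; the last lines reproduce the superimposition of Example 1 directly.

```python
import hashlib


def code(value, b=8, k=2):
    """Hypothetical code generator: a b-bit code with exactly k bits set."""
    bits, counter = 0, 0
    while bin(bits).count("1") < k:
        digest = hashlib.sha256(f"{value}:{counter}".encode()).digest()
        bits |= 1 << (digest[0] % b)
        counter += 1
    return bits


def sig(values, b=8, k=2):
    """Superimpose (bitwise or) the codes of all values of a set."""
    signature = 0
    for value in values:
        signature |= code(value, b, k)
    return signature


def weight(signature):
    """Number of bits set in a signature."""
    return bin(signature).count("1")


def sig_subset(s, t, b=8):
    """Pretest for subset: sig(s) & ~sig(t) == 0, restricted to b bits."""
    return (s & ~t & ((1 << b) - 1)) == 0


def sig_intersects(s, t, k=2):
    """Pretest for nonempty intersection: at least k common bits."""
    return weight(s & t) >= k


# Superimposing the two bit codes of Example 1.
EXAMPLE_SIGNATURE = 0b10010000 | 0b00010010
```

Both pretests can report a match for sets that do not actually satisfy the relation, which is why drops have to be verified against the stored sets afterwards.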
SSF/Compressed SSF
A sequential signature file (SSF) (Ishikawa, 1993) is a simple index structure. It
consists of a sequence of pairs of signatures (of supports or cores, depending on
the supported query type) and references to data items.
During retrieval, the SSF is scanned and all data items $o_i$ with matching
signatures are fetched and tested for false drops. Boss and Helmer (1999)
showed that SSF and its compressed counterpart, compressed signature file
(CSF), can be used to index fuzzy sets, and that this approach is faster than
scanning all fuzzy sets. In the following section we will discuss how the usual
ways of structuring indexes, namely, hierarchical organization and partitioning,
are applied to signatures.
Hierarchical Signature Organization
A signature tree (ST) is a hierarchical version of a signature file. The internal
structure of STs is similar to that of R-trees (Guttman, 1984). The leaf nodes of
an ST (Deppisch, 1986; Hellerstein, 1994) contain pairs of signatures and
references. So we find the same information in the leaf nodes of an ST as in an SSF.
We can construct a single signature representing a leaf node by superimposing
all signatures found in the leaf node (with a bitwise or-operation, denoted by |).
This corresponds to uniting the sets in the leaf nodes. We call a union of sets from
lower levels in the tree a bounding set. An inner node contains signatures of and
references to each child node (Figure 12). The meaning of
$\mathrm{sig}(L_x(\pi_{A(o_i)}))$ is the signature of the support (x = >0) or core
(x = 1).
When we evaluate a query, we begin by searching the root for matching
signatures. We recursively access all child nodes with signatures that match and
work our way down to the leaf nodes. In inner nodes, a signature matches if
$\mathrm{sig}(\text{inner node}) \cap \mathrm{sig}(L_{\alpha}(\mu_{\theta \circ a})) \neq \emptyset$.
In leaf nodes, we check
$\mathrm{sig}(L_{>0}(\pi_{A(o_i)})) \cap \mathrm{sig}(L_{\alpha}(\mu_{\theta \circ a})) \neq \emptyset$
or
$\mathrm{sig}(L_{1}(\pi_{A(o_i)})) \subseteq \mathrm{sig}(L_{\alpha}(\mu_{\theta \circ a}))$,
depending on the query type.

Figure 12. A signature tree (ST) [three leaf nodes with the entries [sig(o_j.A), ref(o_j)] for o_1 to o_9; each inner-node entry holds the superimposed signature of one leaf, e.g., sig(o_1.A) | sig(o_2.A) | sig(o_3.A), and a reference to it; | denotes bitwise or]

An alternative for evaluating nonempty intersection queries is searching for all
supersets of the singletons in $L_{\alpha}(\mu_{\theta \circ a})$ and then forming the
union of all retrieved answer sets. At first glance, this looks inefficient, as we
have to start a subquery for each value in $L_{\alpha}(\mu_{\theta \circ a})$.
However, the performance of signature-based access methods for superset queries
is significantly better than for nonempty intersection queries because the false
drop rate can be kept much lower for superset queries.
Partitioned Signature Organization
An extendible signature hashing index (ESH) (Helmer, 2003) is based on
extendible hashing. As already mentioned, it is divided into two parts: a directory
and buckets. In the buckets we store the signature/reference pairs of all data
items (see Figure 13). We determine the bucket into which a signature/reference
pair is inserted by looking at a prefix of d bits of a signature (where d is the global
depth of the hash table).
In order to find all subsets of $L_{\alpha}(\mu_{\theta \circ a})$, we determine all
buckets to be fetched. We do this by generating all subsets of
$\mathrm{sig}(L_{\alpha}(\mu_{\theta \circ a}))$. Then we access the
corresponding buckets sequentially (by ascending page number), taking care not
to access a bucket more than once. Afterwards we check the full signatures and
eliminate the false drops.
ESH has a disadvantage compared to the other signature-based index structures:
we cannot evaluate nonempty intersection queries directly with this kind of
index. This is due to the fact that we store partial signatures in the directory of
ESH. Let $\mathrm{part}_d(\mathrm{sig}(s))$ denote the first d bits of the signature
of set s. Then,

$$\mathrm{sig}(s) \subseteq \mathrm{sig}(t) \implies \mathrm{part}_d(\mathrm{sig}(s)) \subseteq \mathrm{part}_d(\mathrm{sig}(t))$$

but

$$\mathrm{sig}(s) \cap \mathrm{sig}(t) \neq \emptyset \;\not\!\!\implies \mathrm{part}_d(\mathrm{sig}(s)) \cap \mathrm{part}_d(\mathrm{sig}(t)) \neq \emptyset.$$

This means that we cannot deduce nonempty intersection by looking at partial
signatures. We can, however, evaluate this kind of query similarly to the
alternative technique used for ST by searching for all supersets of the singletons
in $L_{\alpha}(\mu_{\theta \circ a})$.

Figure 13. Extendible signature hashing (ESH) [directory with global depth d = 3 and the entries 000 to 111; buckets 1 to 4 with local depths 2, 3, 3, and 1 for the prefixes 00, 010, 011, and 1; the buckets hold pairs [sig(o_i.A), ref(o_i)]]
Inverted Files
An inverted file (see Figure 14) consists of a directory containing all distinct
values in the domain $\Omega$, and a list for each value consisting of the
references to the data items for which the support or core of $\pi_{A(o_i)}$
contains this value. For an overview on traditional inverted files, see Kitagawa
(1996) and Sacks-Davis (1997). As done frequently, we can hold the search values
of the directory in a B+-tree. Moreover, the lists are modified by storing the
cardinality of the cores with each data item reference (denoted by
$|o_{i_{xy}}.A|$ in Figure 14). This enables us to answer subset queries efficiently
by using the cardinalities as a quick pretest. The lists can also be compressed
using, for example, lightweight compression techniques (Westmann, 2000).

Figure 14. Inverted file [directory with the values v_1 to v_n; each value points to a list of entries of the form [ref(o_i), |o_i.A|]]

When evaluating a nonempty intersection query, we simply fetch the lists for all
items in $L_{\alpha}(\mu_{\theta \circ a})$ and form the union of the retrieved data
items.
When evaluating a subset query, we traverse all lists associated with the values
in $L_{\alpha}(\mu_{\theta \circ a})$. We count the number of occurrences for each
reference appearing in a retrieved list. When the counter for a reference is not
equal to the cardinality of its core, we eliminate that reference. We can do this
because this reference also appears in lists associated with values that are not in
$L_{\alpha}(\mu_{\theta \circ a})$. Hence, the referenced core cannot be a subset of
$L_{\alpha}(\mu_{\theta \circ a})$.
In cases of subset and nonempty intersection queries, we have to check whether
the retrieved data items satisfy the query possibly (or necessarily) as the
supports (and cores) serve only as filters.
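The counting technique for subset queries can be sketched in Python as follows. The directory is a plain dictionary here rather than a B+-tree, and the function names and sample cores are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_inverted_file(cores):
    """Map each value to a list of (reference, cardinality of the core) entries."""
    index = defaultdict(list)
    for ref, core in cores.items():
        for value in core:
            index[value].append((ref, len(core)))
    return index


def intersection_candidates(index, query_cut):
    """Union of the lists of all values in the query cut."""
    return sorted({ref for value in query_cut for ref, _ in index.get(value, [])})


def subset_candidates(index, query_cut):
    """Keep a reference only if it occurs once for every element of its core."""
    counts, cardinality = Counter(), {}
    for value in query_cut:
        for ref, size in index.get(value, []):
            counts[ref] += 1
            cardinality[ref] = size
    return sorted(ref for ref, seen in counts.items() if seen == cardinality[ref])


# Invented cores of three data items.
CORES = {
    "o1": {"Research Assistant", "Assistant Professor"},
    "o2": {"Assistant Professor"},
    "o3": {"Full Professor"},
}
INDEX = build_inverted_file(CORES)
QUERY_CUT = {"Assistant Professor", "Associate Professor"}
```

For this query cut, `o1` is counted only once although its core has two elements, so it is eliminated from the subset candidates, whereas it remains a candidate for the nonempty intersection query.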
Paths
In this section, we investigate index structures for indexing paths in object-
oriented DBMSs and show how they can be adapted to fuzzy object-oriented
DBMSs. We are going to look at two index structures in particular: access
support relations (ASRs) (Kemper, 1992) and join index hierarchies (Han, 1999)
and their respective adaptations to fuzzy DBMSs.
Access Support Relations (ASRs)
ASRs relate objects to each other and may span over reference chains. These
chains may even include collection-valued components, and, depending on the
applications, several different variants of ASRs can be used.
We start by describing binary ASRs, which encode paths of length one. Figure
15 shows the binary ASRs for the example in Figure 4. We dropped the objects
d2, s3, s4, and p5 to keep the example manageable. We added several new objects
to show how objects not taking part in all relationships are handled. Another
important change is the inclusion of the uncertainty degrees, which are not part
of the original ASRs, to support fuzzy queries.
When merging binary ASRs together (in order to support longer paths), we
distinguish between four different extensions: canonical, left-complete, right-
complete, and full. Let us illustrate the properties of these different extensions
by means of our example.
Canonical extensions contain only information on complete paths, i.e., paths that
start at department objects and end at the names of projects (Figure 16).
Left-complete extensions include all paths starting at department objects but not
necessarily ending at projects (Figure 17).
Similar to this are right-complete extensions, which end at names of projects but
do not necessarily go all the way to department objects (Figure 18).
Full extensions also comprise all partial paths (Figure 19).
Usually we do not materialize all extensions but a mix of different extensions and
decompositions. A decomposition of an ASR is a projection on relevant
(consecutive) attributes of an extension. Those access relations that are
materialized are indexed using B+-trees, speeding navigational accesses
considerably. For details on how to optimize ASRs for specific applications see
Kemper (1989).
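The four extensions can be derived from the binary ASRs with outer joins. The Python sketch below uses the tuples of Figure 15; computing the full extension first and selecting the other three from it is a simplification chosen for this example, not the construction given by Kemper, and the function names are invented.

```python
# Binary ASRs of Figure 15, including the uncertainty degrees.
EMPLOYS = [("d1", "s1", 1.0), ("d1", "s2", 0.8), ("d3", "s8", 0.2)]
WORKS_IN = [("s1", "p1", 0.9), ("s1", "p2", 1.0), ("s2", "p2", 0.8),
            ("s2", "p3", 0.4), ("s2", "p4", 0.6), ("s6", "p6", 0.7),
            ("s7", "p7", 0.4)]
NAME = {"p1": "Natix", "p2": "Timber", "p3": "Tamino", "p4": "Rainbow",
        "p6": "Galax", "p8": "IPSI-XQ"}


def full_extension():
    """Full outer join of the three binary ASRs; None marks a missing part."""
    rows = []
    employed = {s for _, s, _ in EMPLOYS}
    for d, s, deg1 in EMPLOYS:
        links = [(p, deg2) for staff, p, deg2 in WORKS_IN if staff == s]
        if links:
            rows += [(d, deg1, s, deg2, p) for p, deg2 in links]
        else:
            rows.append((d, deg1, s, None, None))
    rows += [(None, None, s, deg2, p) for s, p, deg2 in WORKS_IN
             if s not in employed]
    full = [row + (NAME.get(row[4]),) for row in rows]
    reached = {row[4] for row in rows}
    full += [(None, None, None, None, p, n) for p, n in NAME.items()
             if p not in reached]
    return full


FULL = full_extension()
LEFT = [r for r in FULL if r[0] is not None]    # paths starting at a department
RIGHT = [r for r in FULL if r[5] is not None]   # paths ending at a project name
CANONICAL = [r for r in FULL if r[0] is not None and r[5] is not None]


def departments(min_employs, min_works_in, projects):
    """Departments linked to one of the projects with sufficient degrees."""
    return sorted({d for d, deg1, _, deg2, p, _ in CANONICAL
                   if p in projects and deg1 >= min_employs
                   and deg2 >= min_works_in})
```

A fuzzy path query then becomes a selection on the materialized relation; for example, `departments(0.8, 0.6, {"p3", "p4"})` returns the departments that employ, with a degree of at least 0.8, people who work on p3 or p4 with a degree of at least 0.6.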
Figure 15. Binary ASRs

Department.employs
d1  s1  1.0
d1  s2  0.8
d3  s8  0.2

Staff.worksIn
s1  p1  0.9
s1  p2  1.0
s2  p2  0.8
s2  p3  0.4
s2  p4  0.6
s6  p6  0.7
s7  p7  0.4

Project.name
p1  Natix
p2  Timber
p3  Tamino
p4  Rainbow
p6  Galax
p8  IPSI-XQ

Figure 16. Canonical extension

ASR_can: Department.employs.worksIn.name
d1  1.0  s1  0.9  p1  Natix
d1  1.0  s1  1.0  p2  Timber
d1  0.8  s2  0.8  p2  Timber
d1  0.8  s2  0.4  p3  Tamino
d1  0.8  s2  0.6  p4  Rainbow
Figure 17. Left-complete extension

ASR_left
d1  1.0  s1  0.9  p1  Natix
d1  1.0  s1  1.0  p2  Timber
d1  0.8  s2  0.8  p2  Timber
d1  0.8  s2  0.4  p3  Tamino
d1  0.8  s2  0.6  p4  Rainbow
d3  0.2  s8

Figure 18. Right-complete extension

ASR_right
d1  1.0  s1  0.9  p1  Natix
d1  1.0  s1  1.0  p2  Timber
d1  0.8  s2  0.8  p2  Timber
d1  0.8  s2  0.4  p3  Tamino
d1  0.8  s2  0.6  p4  Rainbow
         s6  0.7  p6  Galax
                  p8  IPSI-XQ

Figure 19. Full extension

ASR_full
d1  1.0  s1  0.9  p1  Natix
d1  1.0  s1  1.0  p2  Timber
d1  0.8  s2  0.8  p2  Timber
d1  0.8  s2  0.4  p3  Tamino
d1  0.8  s2  0.6  p4  Rainbow
d3  0.2  s8
         s6  0.7  p6  Galax
                  p8  IPSI-XQ
         s7  0.4  p7

Join Index Hierarchies (JIHs)
One problem with ASRs can be their sheer size for long paths, even when they are
decomposed. Usually, only the endpoints of paths in an ASR are indexed,
which makes updates on links in between costly (as we have to scan the whole
relation).
JIHs generalize the decomposition principle by allowing the omission of
intermediate objects in a path (Han, 1999). For example, we could have an index that
jumps from department objects right to projects, skipping staff objects (see
Figure 20). A complete JIH schema for our example can be seen in Figure 21(a)
(d = Department, s = Staff, p = Project, n = Name). The lower part is the base
JIH, which consists of all binary relationships. Due to space constraints, usually
only part of a full JIH is materialized [see Figure 21(b)].

Figure 20. Example for a join index hierarchy

JIH: Department → Project
d1  p1  Natix
d1  p2  Timber
d1  p3  Tamino
d1  p4  Rainbow

However, two difficulties have to be overcome. We have to guarantee the
correctness of updates on intermediate links in paths and have to find a way to
handle the intermediate uncertainty degrees.
Updates
Look at part of the schema instantiation in Figure 22. Clearly, there are two paths
from d1 to p2. When deleting one of them (e.g., d1 → s2, because s2 starts working
at another department), we have to decide what to do with our relationship
d1 → p2 in Figure 20. By looking at the JIH in Figure 20, we cannot decide whether
d1 → p2 should be deleted or not.
Han et al. solved this problem by counting the number of links between each pair
of objects. For the base JIH, this is trivial. For our example in Figure 22, we would
store the following four tuples in the appropriate base JIH relations: (d1, s1, 1),
(d1, s2, 1), (s1, p2, 1), and (s2, p2, 1). This is also done for the relations on higher
levels, e.g., in the tuple (d1, p2, 2). When deleting a link in the base JIH, we
propagate these changes to the higher levels. In this case, we would subtract one
from the counter for d1 → p2 and would know that d1 and p2 are still connected.

Figure 21. Full JIH vs. partial JIH [(a) full JIH; (b) partial JIH]

Figure 22. Multiple paths
Uncertainty Degrees
The question remains how we handle the uncertainty degrees of the intermediate
links we cut away on the higher levels of a JIH. (It is no problem to store them
in the base JIH.) One possible solution is to allocate space in the levels above the
base JIH in which to store all the intermediate uncertainty degrees in lists.
However, we expect this to bloat the index significantly.
A more elegant solution can be found if we are interested in an overall
uncertainty degree of all paths. Assume that the function used to compute this
overall uncertainty degree is reversible, like multiplying the degrees along each
path and averaging all paths. For example, in Figure 22, we would store the sum
of the products of the uncertainty degrees (1.0 × 1.0 + 0.8 × 0.8 = 1.64) and the
number of paths in the tuple (d1, p2, 2, 1.64). When deleting a path, the sum is
reduced by the appropriate value, and the counter is decremented by one.
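A minimal Python sketch of this bookkeeping is given below. The class `JoinIndex` and its methods are hypothetical names; the sketch assumes, as in the example, that the degree of a path is the product of its link degrees and that the overall degree is the average over all paths.

```python
from math import prod


class JoinIndex:
    """Higher-level JIH relation storing a path counter and a sum of products."""

    def __init__(self):
        self.entries = {}  # (source, target) -> [number of paths, sum of products]

    def add_path(self, source, target, degrees):
        entry = self.entries.setdefault((source, target), [0, 0.0])
        entry[0] += 1
        entry[1] += prod(degrees)

    def remove_path(self, source, target, degrees):
        entry = self.entries[(source, target)]
        entry[0] -= 1
        entry[1] -= prod(degrees)
        if entry[0] == 0:  # no path left: the objects are no longer connected
            del self.entries[(source, target)]

    def overall_degree(self, source, target):
        count, total = self.entries[(source, target)]
        return total / count


JIH = JoinIndex()
JIH.add_path("d1", "p2", [1.0, 1.0])  # path via s1
JIH.add_path("d1", "p2", [0.8, 0.8])  # path via s2
```

Deleting the path via s2 with `JIH.remove_path("d1", "p2", [0.8, 0.8])` leaves one path and a sum of about 1.0, so d1 and p2 remain connected. Repeated floating-point additions and subtractions may accumulate small rounding errors, which a real implementation would have to account for.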
Type Hierarchies
In this section, we will briefly present the conventional techniques used for type
hierarchy indexing in non-FOODBS systems. In a second step, we will show how
to combine and extend these methods for FOODBS systems. One difficulty in
indexing type hierarchies is that we can either group the objects by type or by key
values. Each approach has its advantages and disadvantages, as we will see.
SC-trees
An SC-tree (Kim, 1989) is straightforward. Basically, we build a separate
B+-tree for each type. When querying a subhierarchy of our example in Figure 5, we
determine all types included in the subhierarchy and evaluate a query on each
corresponding B+-tree. When interested in all academics, we have to query the
B+-trees for the classes Academic, Teaching, and Research.
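The evaluation strategy can be sketched in Python with one sorted list per type standing in for each B+-tree. The sorted lists, the class `SCIndex`, and the sample objects are assumptions of this sketch; the type hierarchy follows Figure 5.

```python
from bisect import bisect_left, insort

# Type hierarchy of Figure 5 (direct subtypes only).
SUBTYPES = {
    "Staff": ["Administrative", "Academic"],
    "Administrative": ["Technical"],
    "Academic": ["Teaching", "Research"],
}


def subhierarchy(root):
    """Return the root type followed by all of its direct and indirect subtypes."""
    types = [root]
    for child in SUBTYPES.get(root, []):
        types.extend(subhierarchy(child))
    return types


class SCIndex:
    def __init__(self):
        self.trees = {}  # type name -> sorted list of (key, reference)

    def insert(self, type_name, key, ref):
        insort(self.trees.setdefault(type_name, []), (key, ref))

    def range_query(self, root, low, high):
        result = []
        for type_name in subhierarchy(root):  # one lookup per type
            entries = self.trees.get(type_name, [])
            start = bisect_left(entries, (low,))
            for key, ref in entries[start:]:
                if key > high:
                    break
                result.append(ref)
        return sorted(result)


INDEX = SCIndex()
for type_name, age, ref in [("Academic", 45, "o1"), ("Research", 52, "o2"),
                            ("Teaching", 38, "o3"), ("Technical", 47, "o4")]:
    INDEX.insert(type_name, age, ref)
```

A query for all academics older than 40, `INDEX.range_query("Academic", 41, 200)`, has to probe three separate structures, which illustrates the cost of grouping objects by type.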
H-trees
While an SC-tree maintains a set of isolated structures for each type, an H-tree
(Low, 1992) nests these B+-trees to avoid a full search of each component. This
means that the nodes of a superclass B+-tree may contain pointers to nodes of
subclass B+-trees. There are two important rules for nesting the B+-trees. First,
we have to make sure that the ranges of the nesting node and the nested node
are compatible, so that we do not accidentally end up in a different part of the
domain when traversing pointers to a different B+-tree. Second, all leaf nodes in
a subclass B+-tree have to be reachable from the corresponding superclass
B+-tree. Due to space constraints, we are not going to present the details on how
this is done (for further explanations see Low, 1992).
CH-index
A CH-index (Kim, 1989) uses a different approach than SC- or H-trees. Here,
the objects are indexed using a single B+-tree structure, and the inner pages look
like the inner pages of a regular B+-tree storing the values of the indexed
attributes. The leaf pages look different, however. In the leaf pages, we
distinguish between the different types of objects. Figure 23 shows a simplified
view of a CH-index (for details see Kim, 1989) indexing the ages of staff
members (with a path from the root node to a leaf). For each value (in a leaf
page), we have a list for each type for which objects exist that have this value.
CG-trees
Depending on the size of the indexed type hierarchy, we have many entries in a
leaf page of a CH-index that we are not interested in during query evaluation.
For example, if we want to retrieve all academics, we can ignore objects of the
types Staff, Administrative, Technical, and Nontechnical. Unfortunately, point-
ers to these objects are contained in the leaf pages of a CH-index.
In a CG-tree, we have at most one pointer per type. Figure 24 shows the two
lowest levels of a (slightly simplified) CG-tree (for implementation details, see
Kim, 1989). The objects belonging to the type Academic are stored on the pages
P1, P2, and P3, and those for type Research are stored on pages P4 and P5. As
objects of different types are probably not distributed in the same way, pages can
be shared. So, for example, if pages P1 and P2 are only lightly filled, they can be
merged to one page that is shared between the two entries for Academic on the
level above (for details on how to balance the leaf pages, see also Kilger, 1994).

Figure 23. Example for a CH-index [inner pages with the keys 25, 35 and 28, 33; the leaf page lists, for each value such as 28 or 29, the types (e.g., Academic, Research) for which objects with this value exist]
Multikey Index
The basic idea in using multikey indexes is to consider the type information as just
another dimension describing an object. The main problem with this approach is
the partial ordering of the types. We would like to impose a total order on the
types in such a way that all queries regarding subhierarchies map to contiguous
range queries. Assume that we want to retrieve all academics between the ages
of 25 and 50. Figure 25(a) shows an optimal way to linearize all the types of the
staff hierarchy, while Figure 25(b) shows a suboptimal solution. When the
objects are optimally arranged by type on disk, we can (for all subtype
hierarchies) retrieve all objects belonging to a certain subtype hierarchy via one
sequential scan without gaps. Mueck and Polaschek gave an algorithm that finds
an optimal linearization (if one exists) (Mueck, 1996, 1997).
Figure 24. CG-tree
Figure 25. Linearizing type hierarchies: (a) optimal, (b) suboptimal
After linearizing the type hierarchy, we can use any standard multikey index
structure. However, it is not always possible to find an optimal linearization in the
case of multiple inheritance. This is also important in the context of FOODBS
systems, because fuzzy membership of objects in classes may lead to similar
problems.
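For a single-inheritance hierarchy, one simple way to obtain a linearization in which every subhierarchy is contiguous is a preorder traversal. The sketch below illustrates this; it is not the algorithm of Mueck and Polaschek, and the parent-child structure of the staff hierarchy, the function names, and the sample objects are assumptions made for the example.

```python
# Assumed single-inheritance staff hierarchy (parent -> children).
HIERARCHY = {
    "Staff": ["Administrative", "Academic"],
    "Administrative": ["Technical", "Nontechnical"],
    "Academic": ["Teaching", "Research"],
}


def linearize(root, children):
    """Assign preorder positions and return {type: (first, last)}, where
    the interval [first, last] covers the type and all of its subtypes."""
    ranges = {}
    counter = 0

    def visit(node):
        nonlocal counter
        first = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        ranges[node] = (first, counter - 1)

    visit(root)
    return ranges


RANGES = linearize("Staff", HIERARCHY)

# "All academics between the ages of 25 and 50" becomes a rectangle query:
# type position in [first, last] and age in [25, 50].
staff = [("Research", 31), ("Technical", 44), ("Teaching", 52), ("Academic", 27)]
first, last = RANGES["Academic"]
hits = [(t, age) for t, age in staff
        if first <= RANGES[t][0] <= last and 25 <= age <= 50]
```

With multiple inheritance a type has more than one parent, so a single preorder numbering cannot in general keep every subhierarchy contiguous, which is the difficulty noted above.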
Indexing Fuzzy Type Hierarchies
The technique used for SC-trees can be adapted to fuzzy type hierarchies in a
straightforward manner by exchanging the B+-trees for other data structures.
However, SC-trees are the least efficient of all presented indexes, as we have
to do a full search for each subtype. H-trees are too closely tied to the structure
of B+-trees, which are not necessarily ideal data structures for FOODBS
systems. Multikey indexes perform well if we are able to linearize the type
hierarchy well enough. We expect that this may be difficult to do for fuzzy type
hierarchies. For these reasons, we opt for adapting the methods used in
CH-indexes and CG-trees to fuzzy type hierarchies.
Looking at Figure 8, we see that each cell in a grid file has a pointer to a data page
(which it may share with other cells). Similar to CG-trees, we propose that cells
should contain a pointer for each different object type that is present. As we
expect that not all cells are filled evenly, objects of different types will probably
share different data pages. There are still a couple of open questions regarding
the optimization of this data structure, e.g., how do we exactly determine the cell
sharing for each type, and how do we balance the data pages in this two-
dimensional case? Another, more general problem affecting all indexes for fuzzy
type hierarchies is the fact that we store objects belonging to more than one type
redundantly for each type. Adding another level of indirection will solve this
particular problem but will also add inefficiency.
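A minimal sketch of the proposed cell layout follows, assuming that each grid cell keeps one data-page pointer per object type and that lightly filled pages may be shared between cells. All class names, page identifiers, and object identifiers are hypothetical, and the open questions mentioned above (how to decide on sharing, how to balance pages) are not addressed.

```python
from dataclasses import dataclass, field


@dataclass
class DataPage:
    page_id: str
    oids: list = field(default_factory=list)


@dataclass
class GridCell:
    # One pointer per object type present in the cell (assumed layout).
    pages: dict = field(default_factory=dict)   # type name -> DataPage


def pages_to_read(cells, wanted_types):
    """Return the distinct data pages a query has to read for the given
    cells and types; a page shared by several cells is listed once."""
    distinct = {}
    for cell in cells:
        for type_name in wanted_types:
            page = cell.pages.get(type_name)
            if page is not None:
                distinct[page.page_id] = page
    return list(distinct.values())


shared = DataPage("P1", ["o1", "o4"])   # lightly filled, shared by two cells
research = DataPage("P4", ["o3"])
technical = DataPage("P6", ["o2"])

cell_a = GridCell({"Academic": shared, "Technical": technical})
cell_b = GridCell({"Academic": shared, "Research": research})

pages = pages_to_read([cell_a, cell_b], ["Academic", "Research"])
page_ids = [p.page_id for p in pages]
```

A query for academics and researchers over both cells never touches the page holding technical staff, which is the benefit carried over from CG-trees.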
Future Trends
Developing new and improving existing index structures for FOODBS systems
will remain a viable research topic in the future, as many open problems still exist.
Let us name a few important ones here.
Multidimensional index structures have demonstrated their usefulness for
FOODBS systems through their flexibility. However, they have a general problem
with high-dimensional data (the so-called curse of dimensionality). Fortunately,
when indexing fuzzy data, we remain at the lower end of the dimensionality
range. This, and the fact that fuzzy data is an application for multidimensional index
structures with special needs and requirements, makes us hopeful that further
improvements can be found.
Efficient support of navigational accesses to objects via paths still lacks
satisfactory handling of uncertainty degrees. For example, the more efficient
JIHs (compared to access support relations) have problems dealing with
uncertainty degrees on intermediate paths that have been cut away in the index.
Indexing fuzzy type hierarchies is also not yet perfect. Details on how to optimize
the index structures for certain applications are missing, and we have some
redundancy in storing objects belonging to more than one type.
Conclusions
Efficient data retrieval is a necessity for a database system in order to be
accepted by end users. The history of database systems is full of examples that
demonstrate this. Relational systems were able to replace network and hierarchical
database systems only after their performance had improved considerably.
Nonfuzzy object-oriented database systems can only be found in niche applications today, as their
performance and scalability could not keep up with relational systems. One issue
today is the performance of native XML database systems, which still lags
behind expectations. In our opinion, the fate of each new kind of database system
will be partly decided by whether or not its performance will improve significantly
over time.
This is also true for FOODBS systems. The task of improving their performance
will not be easy, because in addition to the fuzzy components, the regular object-
oriented components also need to be improved. One important step in improving
the efficiency of a database system is the introduction of powerful index
structures. Although a promising start has been made for FOODBS systems, this
research area has not yet received enough attention. Especially in the area of
path accesses and fuzzy type hierarchies, there are still plenty of opportunities
left for future research.
References
Bayer, R., & McCreight, E. (1972). Organization and maintenance of large
ordered indexes. Acta Informatica, 1, 173–189.
Bentley, J. L. (1975). Multidimensional binary search trees used for associative
searching. Communications of the ACM, 18(9), 509–517.
Bertino, E., Ooi, B. C., Sacks-Davis, R., Tan, K.-L., Zobel, J., Shidlovsky, B.,
& Catania, B. (1997). Indexing techniques for advanced database
systems. Dordrecht: Kluwer Academic Publishers.
model. In Proceedings of the Third IEEE Conference on Fuzzy Systems
(pp. 313–318).
Bosc, P., & Galibourg, M. (1989). Indexing principles for a fuzzy database.
Information Systems, 14(6), 493–499.
Bosc, P., Galibourg, M., & Hamon, G. (1988). Fuzzy querying with SQL:
Extensions and implementation aspects. Fuzzy Sets and Systems, 28,
333–349.
Boss, B., & Helmer, S. (1999). Index structures for efficiently accessing fuzzy
data including cost models and measurements. Fuzzy Sets and Systems,
108(1), 11–37.
Cattell, R., Barry, D. K., Berler, M., Eastman, J., Jordan, D., Russell, C.,
Schadow, O., Stanienda, T., & Velez, F. (Eds.). (2000). The Object Data
Standard: ODMG 3.0. San Francisco: Morgan Kaufmann.
Deppisch, U. (1986). S-tree: A dynamic balanced signature index for office
retrieval. In Proceedings of the 1986 ACM Conference on Research
and Development in Information Retrieval (pp. 77–87).
Durkin, J. (1994). Expert systems: Design and development. Upper Saddle
River, NJ: Prentice Hall.
Evangelidis, G., Lomet, D., & Salzberg, B. (1995). The hB^Π-tree: A modified
hB-tree supporting concurrency, recovery and node consolidation. In
Proceedings of the 21st VLDB Conference (pp. 551–561).
Fagin, R., Nievergelt, J., Pippenger, N., & Strong, H. R. (1979). Extendible
hashing: A fast access method for dynamic files. ACM Transactions on
Database Systems, 4(3), 315–344.
Gaede, V., & Günther, O. (1998). Multidimensional access methods. ACM
Computing Surveys, 30(2), 170–231.
George, R., Buckles, B. P., & Petry, F. E. (1992). An object-oriented data model
to represent uncertainty in coupled artificial intelligence-database systems.
In M. P. Papazoglou, & J. Zeleznikow (Eds.), The next generation of
information systems: From data to knowledge: A selection of papers
presented at two IJCAI-91 workshops, Sydney, Australia, August 26,
1991 (Vol. 611 of Lecture Notes in Computer Science, pp. 37–48). Berlin:
Springer.
Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching.
In Proceedings of the 1984 ACM SIGMOD (pp. 47–57).
Han, J., Xie, Z., & Fu, Y. (1999). Join index hierarchy: An indexing structure for
efficient navigation in object-oriented databases. IEEE Transactions on
Knowledge and Data Engineering, 11(2), 321–337.
Hellerstein, J. M., & Pfeffer, A. (1994). The RD-tree: An index structure for
sets. Technical Report 1252. Madison: University of Wisconsin.
Helmer, S. (2001). Indexing fuzzy data. In Proceedings of the Joint Ninth
IFSA World Congress and 20th NAFIPS International Conference (pp.
2120–2125).
Helmer, S., & Moerkotte, G. (2003). A performance study of four index
structures for set-valued attributes of low cardinality. VLDB Journal,
12(3), 244–261.
Ishikawa, Y., Kitagawa, H., & Ohbo, N. (1993). Evaluation of signature files as
set access facilities in OODBs. In Proceedings of the 1993 ACM
SIGMOD (pp. 247–256).
Kemper, A., & Moerkotte, G. (1989). Access support in object bases. Technical
Report 17/89. Karlsruhe: University of Karlsruhe.
Kemper, A., & Moerkotte, G. (1992). Access support relations: An indexing
method for object bases. Information Systems, 17(2), 117–146.
Kilger, C., & Moerkotte, G. (1994). Indexing multiple sets. In Proceedings of
20th International Conference on Very Large Data Bases (pp.
180–191).
Kim, W., Kim, K.-C., & Dale, A. (1989). Indexing techniques for
object-oriented databases. In W. Kim, & F. H. Lochovsky (Eds.),
Object-oriented concepts, databases, and applications (pp. 371–394).
Reading, MA: Addison-Wesley.
Kitagawa, H., & Fukushima, K. (1996). Composite bit-sliced signature file: An
efficient access method for set-valued object retrieval. In Proceedings of
the International Symposium on Co-operative Database Systems for
Advanced Applications (CODAS) (pp. 388–395).
Knuth, D. E. (1973). The art of computer programming (Vol. 3): Sorting and
searching. Reading, MA: Addison Wesley.
Kriegel, H.-P., Pötke, M., & Seidl, T. (2000). Managing intervals efficiently in
object-relational databases. In Proceedings of the 26th VLDB
Conference (pp. 407–418).
Kumar, A. (1994). G-tree: A new data structure for organizing multidimensional
data. Transactions on Knowledge and Data Engineering, 6(2),
341–347.
Liu, C., Ouksel, A. M., Sistla, A. P., Wu, J., Yu, C. T., & Rishe, N. (1996).
Performance evaluation of G-tree and its application in fuzzy databases. In
CIKM '96, Proceedings of the Fifth International Conference on
Information and Knowledge Management (pp. 235–242).
Low, C. C., Ooi, B. C., & Lu, H. (1992). H-trees: A dynamic associative search
index for OODB. In Proceedings of the 1992 ACM SIGMOD
Conference (pp. 134–143).
Luk, R. W. P., Leong, H. V., Dillon, T. S., Chan, A. T. S., Croft, W. B., & Allan,
J. (2002). A survey in indexing and searching XML documents. Journal of
the American Society for Information Science and Technology, 53(6),
415–437.
Manolopoulos, Y., Theodoridis, Y., & Tsotras, V. J. (1999). Advanced
database indexing. Dordrecht: Kluwer Academic Publishers.
Mueck, T. A., & Polaschek, M. L. (1996). Indexing type hierarchies with
multikey structures. In Proceedings of the Seventh Workshop on
Persistent Object Systems (POS) (pp. 184–193).
Mueck, T. A., & Polaschek, M. L. (1997). Index data structures in
object-oriented databases. Dordrecht: Kluwer Academic Publishers.
Na, S., & Park, S. (1996). A fuzzy association algebra based on a fuzzy object
oriented data model. In Proceedings of the 20th Computer Software and
Applications Conference (COMPSAC '96) (pp. 276–281).
Nievergelt, J., & Hinterberger, H. (1984). The grid file: An adaptable, symmetric
multikey file structure. ACM Transactions on Database Systems, 9(1),
38–71.
Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for
the treatment of incomplete or uncertain information and vague queries.
Information Sciences, 34, 115–143.
Preparata, F. P., & Shamos, M. I. (1993). Computational geometry: An
introduction. Berlin: Springer.
Robinson, J. T. (1981). The k-d B-tree. In Proceedings of the 1981 ACM
SIGMOD (pp. 10–18).
Sacks-Davis, R., & Zobel, J. (1997). Text databases. In Indexing techniques
for advanced database systems (pp. 151–184). Dordrecht: Kluwer
Academic Publishers.
Shortliffe, E. H., & Buchanan, B. G. (1975). A model of inexact reasoning in
medicine. Mathematical Biosciences, 23, 351–379.
Silberschatz, A., Korth, H. F., & Sudarshan, S. (2001). Database system
concepts. New York: McGraw-Hill.
Westmann, T., Kossmann, D., Helmer, S., & Moerkotte, G. (2000). The
implementation and performance of compressed databases. SIGMOD
Record, 29(3), 55–67.
Yazici, A., & Cibiceli, D. (1999). An access structure for similarity-based fuzzy
databases. Information Sciences, 115(1–4), 137–163.
Yazici, A., & Koyuncu, M. (1997). Fuzzy object-oriented database modeling
coupled with fuzzy logic. Fuzzy Sets and Systems, 89(1), 1–26.
in the fuzzy object-oriented data model. Information Sciences, 108(1–4),
241–260.
Chapter VIII
Introducing Fuzziness in Existing Orthogonal Persistence Interfaces and Systems
Miguel Ángel Sicilia
University of Alcalá, Spain
Elena García-Barriocanal
José A. Gutiérrez
Abstract
Previous research has resulted in generalizations of the capabilities of
OODB models and query languages to cope with imprecise and uncertain
information in several ways, informed by previous research in fuzzy
relational databases. As a result, a number of models and techniques to
integrate fuzziness in its various facets in object data stores are available
for researchers and practitioners, and even extensions to commercial
systems have been implemented. Nonetheless, for those models and
techniques to become widespread in industrial contexts, more attention
should be paid to their integration with current database design and
programming practices, so that the benefits of fuzzy extensions could be
easily adopted and seamlessly integrated in current applications. This
chapter attempts to provide some criteria to select the fuzzy extensions that
more seamlessly integrate in the current object storage paradigm known as
orthogonal persistence, in which programming-language object models
are directly stored, so that database design becomes mainly a matter of
object design. Concrete examples and case studies are provided as practical
illustrations of the introduction of fuzziness both at the conceptual and the
physical levels of this kind of persistent system.
Introduction
A number of research groups have investigated the problem of modeling fuzziness
in the context of object-oriented databases (OODBs), e.g., De Caluwe (1998) and
Ma, Zang, and Ma (2003), and some of their results include research
implementations on top of commercial systems, e.g., those reported in Yazici, George, and
Aksoy (1998) and in Schenker, Last, and Kandel (2001). Despite the
considerable amount of significant research in the field, no commercial system is available
today that supports fuzziness explicitly in its core physical or logical model, and
existing database standards regarding object persistence, like those
of the Object Data Management Group (ODMG) (Cattell, 2000) and Java Data
Objects (JDO) (Russell et al., 2001), do not support vagueness or any other
kind of generalized uncertainty information representation (Klir & Wierman,
1998) in their data models.
One possible reason for this lack of integration of fuzziness in industrial practices
may be found in the relative complexity of modeling with fuzzy mechanisms,
which makes it difficult for average practitioners to fully understand and exploit
the potential of fuzzy techniques. Studies coming from the field of psychology of
programming, like those by Green and Petre (1996) and Kao and Archer (1997),
may serve as points of departure to investigate how fuzziness affects the mental
models of programmers and designers. In any case, further research is needed
on how to extend existing (crisp) database programming technology to its fuzzy
generalization in a way that is acceptable and usable for the average developer. In
addition, some of these generalizations may eventually lead to reduced
performance and other inefficiencies, precluding a priori their acceptability. This
chapter aims at providing an overview of some of the issues regarding the
situation just described, and at serving as a point of departure for further research in
the area.
The rest of this chapter is structured as follows. The second section provides a
brief review of existing research on extending OODB models, and the motivation
for research on usability and acceptability of fuzzy constructs in orthogonal
persistence systems and programming interfaces. The third section deals with
the introduction of specific fuzzy constructs in orthogonal persistence systems,
according to their similarities to existing crisp conceptual modeling elements. The
fourth section briefly sketches some of the representational and physical storage
issues that must be taken into account when introducing fuzzy constructs. Finally,
some concrete illustrations of the issues are provided in the fifth section.
Background
Several fuzzy OODB models and applications have been reported to date.
Similarity-based models like the one described in Aksoy, Yazici, and George
(1996) provide class definitions based on similar value ranges of instances.
Models based on possibility theory (Dubois, Prade, & Rossazza, 1991) are able
to represent vagueness and uncertainty in class hierarchies by introducing
constraints in attribute values. Models like UFO (De Caluwe, 1998) provide a
variety of representations for imperfect information, separating concerns for
vagueness and for uncertainty. Other authors proposed fuzzy sets as first-class
programming objects (Inoue, Yamamoto, & Yasunobu, 1991). Existing applica-
tions of fuzzy object databases include geographical information systems (Cross
& Firat, 2000), applications to multimedia (Koprulu, Cicekli, & Yazici, 2003), and
retrieval in image databases (Nepal, Ramakrishna, & Thom, 1999).
Database models like FOOD (Yazici & Koyuncu, 1997) and FRIL++ (Cao &
Rossiter, 2003) integrate with logics or deductive capabilities to provide support
for fuzzy inference, but we will not deal with this issue here, because most
current industrial applications do not include reasoning and are not based on
any kind of knowledge representation formalism, in the sense given by Davis, Shrobe,
and Szolovits (1993).
Despite the fact that current approaches to uncertainty and imprecision in object
databases are fairly diverse in their supporting mathematical frameworks and
assumptions, for now, they are relegated to research systems for specific
applications. In fact, fuzzy object models are not considered in standard modeling
languages like the Unified Modeling Language (UML), and they are not
supported by any kind of free or commercial persistence system. This situation
is aggravated by the fact that object databases are currently considered niche
technologies (Kim, 2003) that have not reached a state of wide industrial
adoption, except for specialized applications like CAD/CAM, resulting in a lack
of common physical and distribution architectures.
Consequently, the case for fuzzy extensions to object databases requires the
practical integration of research models in existing products and programming
interfaces. Such pragmatically directed integration efforts should take as a point
of departure the existing mindset shaped by the most widely used object-oriented
languages (like Java or C++) and database systems (converging on ODMG and,
more recently, on JDO), considering consistency and ease of understanding as
the primary concerns. Extensions to database or object design artifacts should
first come in the forms of strictly additive increments, so that the (crisp)
semantics of the previous models remain unaffected for backward compatibility.
But this is not always easy, because generalizations often require changes in
basic model definitions, like those of existing extensions to ODMG type systems
(De Tré & De Caluwe, 2003) and to UML basic cardinality definitions (Sicilia,
García, & Gutiérrez, 2002). This chapter describes a concrete selection of basic
fuzzy extensions and their rationales, along with some implementation concerns
regarding their suitability in practical settings.
Introducing Fuzziness in Orthogonal
Persistence Interfaces
One view of fuzzy extensions to object database technology is that of providing
the most comprehensive range of conceptual elements, so as to obtain the richest model
in terms of features for the representation of uncertainty and imprecision in its
various facets (Smets, 1997). This view is mainly oriented toward obtaining
mathematical models that integrate a large number of features and techniques
in a single model. An example of such an integrated system in the fuzzy relational
database arena is GEFRED (Medina, Pons, & Vila, 1994). But such an approach
does not consider a priori issues of usability and adequacy of the extensions
being included, from the perspective of technology adoption. An alternative
view of extending object database models with fuzzy constructs is that of taking
existing database concepts as points of departure, and selecting for inclusion first
those fuzzy extensions that are closer to existing modeling concepts, in an
attempt to form a set of extensions that integrates seamlessly with existing
orthogonal persistence systems and programming practices. This latter
approach, which has received little attention to date, is the one adopted in this chapter.
The rest of this section therefore addresses general criteria for the introduction of
fuzziness and general extensions to existing and widespread data modeling
concepts.
Criteria for the Introduction of Fuzzy Constructs
Here we are concerned with the selection of fuzzy extensions to the object
database model that are closer to existing widespread object-oriented modeling
and programming concepts, instead of focusing on other kinds of technical
considerations described elsewhere (Aksoy & Yazici, 1993). From a cognitive
perspective, database models and associated programming models require the
construction of mental models, and some assumptions are required to select
fuzzy information artifacts. This perspective leads us to consider the usability of
fuzzy constructs as the general criterion. Usability must be understood here as
the extent to which a given fuzzy extension matches the existing concepts that
are commonly dealt with by practitioners. This concept of usability must be
broken down into more concrete attributes, which are discussed in what follows.
According to the cognitive dimensions framework (Green, 2000),
role-expressiveness is a dimension of information artifacts that refers to how easy it is to
discover the rationale for structures. The study of visual programming
languages by Green and Petre (1996) also mentions the dimensions of
closeness of mapping of the representation to the domain, and consistency,
which states that similar semantics should be expressed in similar syntactical
structures. These three dimensions can be adapted to become criteria for the
introduction of extensions for fuzziness in object database models, taking as a
point of departure the actual design and programming interfaces of OODBs.
Imperative OODB application programming interfaces stay close to the
semantics and syntax of the object-oriented programming languages in which they are
embedded (see, for example, Atkinson et al., 1996), facilitating the
construction of research prototypes that extend commercial systems by adding
a software layer that acts as a proxy filter (Gamma et al., 1995) for the underlying
nonfuzzy languages. JDO, ODMG, and other nonstandardized
programming interfaces all follow to some extent the principles of orthogonal persistence,
so that the problem of introducing fuzziness can be viewed as the problem of
fuzzifying common object-oriented design relationships and design tactics.
This is the approach taken in this chapter, which focuses on widespread design
and programming practices like UML (OMG, 1999) modeling and JDO- or
ODMG-based programming.
Consequently, the criteria considered for our purposes can be stated as follows:
1. The extensions must be consistent with existing OODB design or
implementation elements. That is, they must be recognizable as generalized or
decorated variants of well-known elements.
2. To enhance role-expressiveness, extensions that do not require the
understanding of nontrivial mathematical properties or frameworks will be
selected first.
3. The selected extensions at the conceptual level must not express a concrete
imprecision or uncertainty handling procedure but only reflect properties
that can be captured by average modelers from the domain being modeled.
This set of criteria may be considered controversial, but it represents a first
attempt to come up with a framework to reason about fuzzy technology adoption
in general. The criteria led us to adopt a method to design extensions that
essentially proceeds by extending the main concepts in the UML and in related
object database Application Programming Interfaces (APIs) with the simplest
fuzzy counterpart. This is intended as a first step for adoption that would ideally
be followed by subsequent assessment and redesign steps, all aimed at finally
coming up with full-fledged fuzzy database models that incorporate all the
expressive power currently contained in fuzzy models (De Caluwe, 1996).
From Fuzzy Conceptual Modeling to Fuzzy Databases:
Extending the UML
Currently, the UML is defined in the framework of a four-layer meta-modeling
architecture. The meta-meta-model layer (M3) is a language for the specifica-
tion of meta-models (oriented toward building repositories of modeling lan-
guages) and is loosely connected with the meta-model layer. In turn, the meta-
model layer (M2) contains the essential definition of the UML modeling
constructs. Levels M1 (user model layer) and M0 (user object layer)
correspond with the definition of UML models, and instances of the elements in
these models, respectively.
Extensions to the UML are achieved at the M2 level, and, although this approach
has been recently criticized (Atkinson & Kühne, 2000), the majority of the
current extensions are carried out in that way. The relationship between layers
in the UML architecture is conceived exclusively in terms of instance-of
relationships. More specifically, elements at layer M1 are instances of elements
at layer M2, and elements in the M0 layer are instances of both M1 and M2
layers (this is considered a loose meta-modeling approach). The main extension
mechanism in the UML is the concept of Stereotype, which defines a virtual
subclass of a UML metaclass, allowing for the definition of new meta-attributes
and extended semantics. A profile is a stereotyped UML Package that contains
a set of extensions. Tag definitions can be defined independently of any
stereotype, in which case their tagged values can be attached to any ModelElement
instance, as we require.
Fuzzifying Classes and Objects
According to the UML 1.5 specification, a class is the descriptor for a set of
objects with similar structure, behavior, and relationships. The model is con-
cerned with describing the intension of the class, that is, the rules that define it.
This definition precludes approaches to fuzzy classes that are defined by
extension or that allow for partial degrees of applicability for attributes, if
maximum consistency with previous semantics is required. In addition, definition
by intension is difficult to remove from current object-oriented programming
languages. Consequently, the type of fuzziness selected provides a path for
partial membership of instances, but with conventional attribute definitions.
Practical examples of such kinds of models can be found in the literature (Sicilia,
García, Díaz, & Aedo, 2002b). Class variants that vary in attribute definitions can
be introduced by standard means through multiple classification via inheritance,
interface implementation, or specialized design patterns, if necessary.
Figure 1 shows an example UML diagram with a class in which instances are
allowed partial membership. In most cases, membership is a function of the
actual values of attributes, so that methods to specify the computation of such
degrees have to be provided (e.g., through using some specific tagged values).
In Figure 1, examples of fuzzy attributes are provided. Attribute a is defined in
the domain of a datatype AValueScale that can be used to represent standard
fuzzy values. The details and forms of the membership functions and other
properties could be represented at the conceptual level through UML tagged
values that could be eventually used to generate database code. Attribute b is
stereotyped with <<interval>> denoting that its values can be given in interval
form, and attribute c is stereotyped with <<poss>> indicating that its values are
possibilities. From the perspective of the developer, all these extensions are
simply specialized data types, expressed through the conventional UML nota-
tion. Their implementation does not require specialized database structures,
provided that interpretation and subsequent elaboration are kept as part of the
class responsibilities.
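To illustrate the point that these extensions are simply specialized data types, the following sketch shows one possible rendering of the three attributes of class A. The names AValueScale, a, b, and c come from the figure; everything else is an assumption made for the example, in particular the reading of the interval stereotype as a closed numeric interval and of the poss stereotype as a discrete possibility distribution.

```python
from dataclasses import dataclass
from enum import Enum


class AValueScale(Enum):
    """Ordered linguistic labels used as the domain of attribute a."""
    VERY_LOW = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


@dataclass(frozen=True)
class Interval:
    """Value given in interval form (assumed reading of <<interval>>)."""
    low: float
    high: float

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    def contains(self, value):
        return self.low <= value <= self.high


class PossibilityDistribution:
    """Candidate value -> possibility degree (assumed reading of <<poss>>)."""

    def __init__(self, degrees):
        for degree in degrees.values():
            if not 0.0 <= degree <= 1.0:
                raise ValueError("degrees must lie in [0, 1]")
        self.degrees = dict(degrees)

    def possibility(self, value):
        return self.degrees.get(value, 0.0)


@dataclass
class A:
    a: AValueScale
    b: Interval
    c: PossibilityDistribution


obj = A(a=AValueScale.HIGH,
        b=Interval(1.5, 2.5),
        c=PossibilityDistribution({"red": 1.0, "orange": 0.6}))
```

Because the interpretation of these values lives in ordinary classes, an orthogonal persistence layer could store instances of A like any other object, without specialized database structures.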
Figure 1. Example UML diagram with fuzziness at attribute and class levels
According to fuzzy class semantics, flexible inheritance imposes a constraint on
the membership of instances. In concrete terms, if A is a subclass of B, the
membership of any instance x in A must not be greater than its membership in B;
otherwise, it would contradict the crisp case:
$$\mu_A(x) \le \mu_B(x) \qquad \forall A, B \in \mathit{Class}$$
This expression is only a special case of the fuzzy generalization-specialization
(gen-spec) relationship, as described by Chen (1998). Stricter requirements may be
enforced through the Object Constraint Language (OCL). In any case, the
interpretation does not interfere with the conventional monotonic interpretation
of inheritance, according to which subclassing is a way of extending, but never
of constraining, some of the semantics of the subclasses.
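The membership constraint on subclasses can be enforced at update time. The sketch below assumes a single-inheritance hierarchy and uses hypothetical class and method names; it rejects any update that would give an instance a higher degree in a subclass than in its direct superclass.

```python
class FuzzyClassMemberships:
    """Membership degrees of objects in the classes of a hierarchy."""

    def __init__(self):
        self.superclass = {}   # class name -> direct superclass name (or None)
        self.degrees = {}      # (object id, class name) -> degree in [0, 1]

    def add_class(self, name, superclass=None):
        self.superclass[name] = superclass

    def degree(self, oid, class_name):
        return self.degrees.get((oid, class_name), 0.0)

    def set_degree(self, oid, class_name, degree):
        if not 0.0 <= degree <= 1.0:
            raise ValueError("degree must lie in [0, 1]")
        parent = self.superclass[class_name]
        if parent is not None and degree > self.degree(oid, parent):
            raise ValueError("subclass membership exceeds superclass membership")
        for sub, sup in self.superclass.items():
            if sup == class_name and self.degree(oid, sub) > degree:
                raise ValueError("superclass membership falls below subclass membership")
        self.degrees[(oid, class_name)] = degree


m = FuzzyClassMemberships()
m.add_class("B")
m.add_class("A", superclass="B")   # A is a subclass of B, as in the text
m.set_degree("x", "B", 0.8)
m.set_degree("x", "A", 0.6)        # accepted: 0.6 <= 0.8
try:
    m.set_degree("x", "A", 0.9)    # rejected: 0.9 > 0.8
    rejected = False
except ValueError:
    rejected = True
```

Checking only direct superclasses and subclasses is enough here, because every update goes through set_degree and the constraint therefore holds along each inheritance edge.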
Note that the kind of fuzziness described for classes and inheritance is introduced
at the M0 level. In addition, all the elements in a (static) UML model can be given
a grade of belonging to the model. This concept is similar to that of the Fuzzy-
EER at level L1 for entities, relationships, and attributes, so that, for example, the
set of entities in a model can be given a membership grade (Chen, 1998, p. 64).
This can be interpreted, for example, as uncertainty about the role that element
E plays in the context of the model. This fuzziness at M1 has other interesting
applications. For example, numeric distance between classes and subclasses
can be used in the construction of applications that consider conceptual
structures (Sicilia, García, Aedo, & Díaz, 2003). Because all these M1-level elements
are found in specialized, knowledge-based applications, we will not deal with
them here.
Introducing Fuzzy Associations
Associations are considered mathematical relations among instances. A crisp
relation represents the presence or absence of interconnectedness between the
elements of two or more sets. This concept is referred to as association when
applied to object-oriented modeling. According to the UML, an association
defines a semantic relationship between classifiers, and the instances of an
association can be considered a set of tuples relating instances of these
classifiers, where each tuple value may appear, at most, once. A binary
association may involve one or two fuzzy relations (i.e., the unidirectional and
bidirectional cases), although due to the semantic interpretation of associations,
they are in many cases considered to convey the same information (i.e., the
association between authors and books is interpreted in the same way despite the
navigation direction).
Fuzzy relations are generalizations of the concept of a crisp relation in which
various degrees of strength of relation are allowed (Klir, 1988). A binary fuzzy
relation R on X × Y is a fuzzy subset of that Cartesian product as denoted in
Expression (1):
$$R = \{\, ((x, y), \mu_R(x, y)) \mid (x, y) \in X \times Y \,\} \qquad (1)$$
All the relation concepts can be extended to the n-ary case, where
$$R(X_1, X_2, \ldots, X_n) \subseteq X_1 \times X_2 \times \cdots \times X_n \qquad (2)$$
We will restrict ourselves to the binary case, because it is the most common case
in database applications. Fuzzy associations can be represented as literal tuples
between model elements that hold an additional value representing their mem-
bership grade to the association. This assumption implies some constraints in the
implementation of bidirectional associations, because both association ends
should be aware of updates on the other.
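The representation just described, tuples that carry an additional membership grade, can be sketched as follows. This is only an illustration under stated assumptions: the names FuzzyTuple and FuzzyRelation are hypothetical, a plain list is used as the underlying store, and a grade of zero is taken to mean that the link is absent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a binary fuzzy relation stored as graded tuples.
final class FuzzyRelationSketch {

    /** One link of the association: a pair of instances plus its grade. */
    static final class FuzzyTuple<X, Y> {
        final X source;
        final Y target;
        final double grade;

        FuzzyTuple(X source, Y target, double grade) {
            this.source = source;
            this.target = target;
            this.grade = grade;
        }
    }

    /** Binary fuzzy relation in which each (source, target) pair appears at most once. */
    static final class FuzzyRelation<X, Y> {
        private final List<FuzzyTuple<X, Y>> tuples = new ArrayList<>();

        /** Adds or updates a link; a grade of 0 removes it (assumption of this sketch). */
        void put(X x, Y y, double grade) {
            if (grade < 0.0 || grade > 1.0) {
                throw new IllegalArgumentException("grade must lie in [0,1]");
            }
            tuples.removeIf(t -> t.source.equals(x) && t.target.equals(y));
            if (grade > 0.0) {
                tuples.add(new FuzzyTuple<>(x, y, grade));
            }
        }

        /** Returns the membership grade of the pair, or 0 when it is not linked. */
        double membership(X x, Y y) {
            for (FuzzyTuple<X, Y> t : tuples) {
                if (t.source.equals(x) && t.target.equals(y)) {
                    return t.grade;
                }
            }
            return 0.0;
        }

        int size() {
            return tuples.size();
        }
    }

    public static void main(String[] args) {
        FuzzyRelation<String, String> wrote = new FuzzyRelation<>();
        wrote.put("author1", "book1", 0.8);
        wrote.put("author1", "book1", 0.5); // updates the existing tuple
        System.out.println(wrote.membership("author1", "book1"));
    }
}
```

A bidirectional association would need both ends to observe such updates, which is the implementation constraint mentioned in the text.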
Fuzzy associations are represented in UML models by simply adding a <<fuzzy>>
stereotype, for the sake of maximum consistency, as first proposed by Gutierrez,
Sicilia, and Garcia (2002). The interpretation of the association is expressed by
additional substereotypes, but at the modeling and database representation level,
the top stereotype could suffice in most common domain modeling situations.
Additional restrictions on associations are represented, as usual, with OCL
constraints. The use of fuzzy cardinalities would require a change in the UML
meta-model, so that we could use annotations for the many (denoted by the
symbol *) cardinality to specify them. In any case, cardinality restrictions do not
affect physical representation but only update semantics, which are usually
enforced by the application, even in the crisp case. An example of association
design will be described later.
Issues of Representation and Efficiency
in Integrating Fuzziness in Object
Sources
Once a number of conceptual-level fuzzy extensions to the object model, such as
those described in the previous section, are selected, the feasibility of
integrating such extended elements in existing database systems must be
addressed. In this section, a number of concrete issues regarding the physical
integration of fuzziness in existing systems are briefly sketched, and empirical
techniques for their assessment are considered. The collection of issues covered
in what follows is not intended to be comprehensive but to provide an overview
of the kind of inquiry efforts required.
Fuzziness and Physical Storage Models in Object Bases
Object databases are diverse in their models of physical storage, with
architectures that range from server-based query resolution, like that of
CA-Jasmine, to models based on client caches of objects that distribute the
workload of query processing to the client applications. The latter architectures
put the burden of computing membership values on the client, requiring special
considerations for physical clustering, as will be illustrated later in the
context of a case study.
Despite the fact that standards for object database access were proposed (e.g.,
ODMG or JDO), no common storage and distribution architecture currently
exists. Consequently, the provision of fuzzy extensions must be carefully
examined with regard to existing data architectures.
One common feature of object databases is their navigational capabilities, which
entail some concept of database object reference that generalizes the notion of
pointer or reference of programming language objects. Such database references
use a concrete form of indirection mechanism from secondary storage to main
memory (Tarr, 1995). This entails that in many cases, object databases tend to
keep objects at the same physical address, due to the cost of changing all the
references to a given object when moving it. In addition, objects of the same
class are frequently clustered together for performance reasons. Consequently,
classification by extension depending on attribute values seems to interfere
with storage models, so that models that retain intensional class definitions
appear to integrate better with physical structures.
Representing Classes and Associations Through α-Cuts
Membership degrees in fuzzy classes or degrees of participation in fuzzy
associations are usually represented through infinite domains, e.g., the [0,1]
interval. This entails that every object in a class or association may eventually
be associated to a different membership degree, so that processing of collections
of objects would entail time-consuming iterations. This is a cross-cutting concern
of fuzziness in database systems, because fuzzy queries inherently require the
sorting of query results by degree, or perhaps in some cases, the selection of a
subset of results that satisfies a given requirement on membership degrees, e.g.,
a degree threshold for queries.
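The two operations mentioned above, selecting results over a degree threshold and sorting them by degree, can be sketched over a naive representation in which each object simply carries its own grade. The sketch assumes that grades are held in a map; the class and method names are hypothetical. Every call visits all entries, which is the iteration cost the text refers to.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of threshold selection and ordering by degree.
final class FuzzyQueryResultSketch {

    /** Keeps entries whose grade reaches the threshold, ordered by descending grade. */
    static <T> List<Map.Entry<T, Double>> rank(Map<T, Double> grades, double threshold) {
        return grades.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<T, Double>comparingByValue().reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> grades = new HashMap<>();
        grades.put("e1", 0.9);
        grades.put("e2", 0.1);
        grades.put("e3", 0.5);
        for (Map.Entry<String, Double> e : rank(grades, 0.2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```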
Representations based on level-cuts have been proposed as a way to efficiently
access fuzzy structures (Boss & Helmer, 1999). But in the case of orthogonal
object persistence filters, the design of such structures has to be done at the class
design level (Sicilia, Gutiérrez, & García, 2002). In the following section, a case
study provides details about this approach.
Techniques for Assessing Representation Adequacy
Some previous work addressed the problem of benchmarking object databases
that provide diverse read and update mechanisms (Hosking, 1995). Performance
metrics can be broken down into read and update categories. Read metrics
are concerned with the mechanism of object faulting, that is, the check that the
referred object is in memory for any pointer or reference use, leading eventually
to data transfer from the server. Update metrics are related to the propagation
of updates on objects to the server, according to the transactional semantics that
are common to practically every object database system. In the latter case, eager
or lazy approaches to updates can be implemented.
In the case of dealing with fuzziness, the key performance determinant is the
retrieval of collections of fuzzy objects and the possible combinations of
membership values with standard fuzzy operators (conjunctive, disjunctive,
negation, hedges, and the like). Consequently, conventional measurement tech-
niques must be informed with attributes related to fuzziness, most notably
including:
1. Extent cardinality for fuzzy classes
2. Fuzzy relation cardinality for fuzzy associations
3. Degree of granulation permitted for instances of fuzzy classes or links in
fuzzy associations
The three elements can be used to make a choice for the underlying collections
supporting them, which may eventually be changed dynamically, reflecting
changes in the cardinality of the participating instances. Cardinalities of classes
and associations become the raw data required to build benchmarking suites,
which should also consider the tolerance of the queries of each given application
to low membership (relevance) of retrieved objects. Tolerance thus becomes a
dimension that must be considered when evaluating a fuzzy OODBMS.
Information granulation is viewed as a form of compression inspired by human
perceptual processes (Zadeh, 1997). As such, the degree of granulation a given
application tolerates affects the storage requirements and the domain of the
types that hold the information, and it also constitutes a dimension in the
assessment of database systems for which further research would be necessary.
In addition, the adequacy of fuzzy databases can be approached from the
perspective of the concept of epistemological adequacy, proposed by McCarthy
(1981). Here the perspective is that of assessing the matching of the represen-
tational structures used with the actual forms of uncertainty or imprecision
inherent to the domain being modeled. Currently, this kind of assessment can only
be carried out by contrasting taxonomies of information imperfection (Smets,
1997) with an explicit modeler's concern for these kinds of imperfection in the
domain.
Case Studies
In this section, we illustrate some of the issues described in the previous sections
through concrete technological artifacts. First, the extension of JDO database
programming interfaces is discussed, and then performance issues regarding a
small footprint persistence engine and a full-fledged database server are
described.
Fuzzification of Standardized Interfaces: The Case of
JDO
The Java Data Objects (JDO) API¹ is a standard interface-based Java model
abstraction of persistence, developed under the auspices of the Java Community
Process, and somewhat continuing the efforts of the ODMG group. In essence,
JDO provides a standard API for the storage of Java object models in any kind
of supporting database technology, including relational, object-relational or
object databases. Consequently, it provides orthogonal persistence irrespective
of the final physical storage.
Persistence-capable instances in JDO must belong to a class that implements the
PersistenceCapable interface. Classes may directly implement the interface, or
it can be added by enhancer tools, which automatically modify the Java source
code or bytecode. JDO provides navigational and declarative access to persistent
instances by means of a query API and a query language called JDOQL.
Navigational access can be carried out, for example, by calling the getExtent
method of the persistence manager, which returns an Extent with all the
instances belonging to a given class. JDO provides a method makePersistent in
PersistenceManager to make concrete instances persistent, and it also provides
persistence by reachability, so that any instance linked to a persistent one
(transitively) is also made persistent. Consequently, adding fuzzy classes to JDO
requires two sets of extensions. On the one hand, the programming interfaces
must be extended to include the option of explicitly handling membership grades
(in a way consistent with existing programming practices). On the other hand, the
query language must be extended to a flexible one that (ideally) deals with the
extension without obscuring the syntax or semantics of the original.
Extending navigational access is basically a matter of providing class extents that
somehow embody membership values for each instance. Providing such support
without changing Java collection semantics can be done by means of the
genericity of Java container classes, which is based on storing any reference type,
i.e., any instance belonging to the Object class. This approach is similar to the one
described in Sicilia, García, Díaz, and Aedo (2002) to extend RecordSets with
membership grades. It becomes necessary to wrap an existing persistence
manager with a new class FuzzyPersistentManager providing the same interface
but handling internally the processing of membership degrees:
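Extent e = null;
try {
    pm.currentTransaction().begin();
    // pm is the FuzzyPersistentManager wrapper; the third argument carries
    // the fuzzy-set options as a string
    e = pm.getExtent(myclasses.X.class, true, "asc;min=0.2");
} catch (javax.jdo.JDOException ex) {...}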
The second parameter (e.g., true) passed to getExtent() indicates that instances
of subclasses must also be retrieved. The third parameter ("asc;min=0.2")
indicates properties of the fuzzy set being retrieved, which can be the ordering
(ascending or descending by membership value) or cuts using thresholds or
ranges of membership values. A typical example of extent iteration is sketched
in what follows:
FuzzyExtent fe = (FuzzyExtent) e;
Iterator it = fe.fuzzyIterator();
while (it.hasNext()) {
    // each element wraps a persistent instance together with its grade
    FuzzyObject aux = (FuzzyObject) it.next();
    X anX = (X) aux.getObject();
    double mu = aux.getMembership();
}
The iterator internally points to fuzzy objects that provide membership informa-
tion. If the iterator() method is used instead of fuzzyIterator(), conventional (crisp)
iteration semantics are provided. This retrieval and processing schema leaves
the semantics of JDO interfaces unaffected, guaranteeing backwards compat-
ibility.
The query language JDOQL uses Java syntax for the specification of queries,
which are essentially Boolean filters on instance collections. Because queries
are specified as Strings, the approach, to provide maximum consistency and role-
expressiveness, is that of leaving the syntax unaffected and simply handling
fuzziness implicitly in operators. A typical extended query example is the
following:
String filter =
    "address.state == state && " +
    "salary >= sal && " +
    "department.name.startsWith(deptName) && " +
    "projects.contains(proj) && " +
    "proj.budget > 10000000";
Extent extent = pm.getExtent(ProductiveEmployee.class, true, "asc;min=0.01");
Query query = pm.newFuzzyQuery(extent, filter);
((FuzzyQuery) query).interpretAllFuzzy();
query.declareImports("import Project");
query.declareVariables("Project proj");
query.declareParameters(
    "String state, String deptName, int sal");
Collection result = (Collection) query.execute(
    "Georgia", "Network", new Integer(100000));
In the above example, ProductiveEmployee is a fuzzy subclass of the employees
who performed properly in the last quarter, according to imprecise criteria. Its
extent is filtered with a degree of 0.01, and then a conventional JDOQL query
is passed to a query object with fuzzy capabilities. The invocation of
interpretAllFuzzy indicates to the query resolution process that all the operators in
its filters are to be interpreted in fuzzy terms, and consequently, the logical and
operator (&&) will also produce the combination of scores according to a T-norm.
Alternatively, the interfaces of FuzzyQuery could be used to force the
interpretation of fuzziness only in some of the filters that affect the query. This
approach to extending JDOQL is similar to that used in fJDBC (Sicilia, García,
Díaz, & Aedo, 2002), and makes fuzziness an optional feature, because
subsequent iteration may choose to discard membership values. It should also be
noted that complex approaches to object comparison (Marín, Medina, Pons,
Sánchez, & Vila, 2003) could be implemented without changing the JDOQL
syntax, thanks to the provision of abstract comparison methods in the Java
language.
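The combination of scores by a T-norm when the && operator is interpreted in fuzzy terms can be sketched as follows. The text does not state which T-norm is applied, so the minimum and product T-norms are shown here only as two common choices; the class and method names are hypothetical and are not part of JDO or of the extension described.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration of fuzzy conjunction; not part of JDO or JDOQL.
final class TNormSketch {

    /** Minimum T-norm. */
    static final DoubleBinaryOperator MIN = Math::min;

    /** Product T-norm. */
    static final DoubleBinaryOperator PRODUCT = (a, b) -> a * b;

    /** Folds the per-condition scores with the given T-norm; 1.0 is the neutral element. */
    static double conjunction(DoubleBinaryOperator tNorm, double... scores) {
        double result = 1.0;
        for (double score : scores) {
            result = tNorm.applyAsDouble(result, score);
        }
        return result;
    }

    public static void main(String[] args) {
        // Scores of three filter conditions for one candidate instance.
        System.out.println(conjunction(MIN, 0.8, 0.6, 0.9));
        System.out.println(conjunction(PRODUCT, 0.5, 0.5));
    }
}
```

Passing the T-norm as a parameter keeps the choice of operator separate from the filter, in the same spirit as leaving the query syntax unchanged.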
Fuzzification in Persistence Engines: The Case of db4o
The db4o² object database is a lightweight OODB engine that provides a
seamless Java language binding (it uses reflection run-time capabilities to avoid
the need to modify existing classes to make their instances storable) and a novel
query-by-example (QBE) interface based on the results of the SODA³ (Simple
Object Database Access) initiative. In what follows, we will discuss a concrete
representational structure for fuzzy items that acts as an indexed structure. Such
physical representation issues are justified by the fact that fuzzy queries often
retrieve many more objects than crisp ones, which resulted in the investigation
of concrete access mechanisms to improve performance, like the relational
access structure described in Yazici and Cibiceli (1999).
Here we will describe a concrete approach to fuzzy association design. Because
it is common practice to develop object-oriented software from previously
defined UML models, we can consider UML semantics as a model from which
associations are implemented in specific object-oriented programming lan-
guages. This occurs through the process of association design that essentially
consists of the selection of the concrete data structure that better fits the
requirements of the association (e.g., Rumbaugh et al., 1996). Therefore, the
process of fuzzy association design will be an extension of conventional
association design practices.
A common representation for fuzzy relations is an n-dimensional array (Klir,
1988), but this representation does not fit well in the object paradigm, in which
a particular object (element of one of the domains in the relation) is aware only
of the tuples to which it belongs (the links), and uses them to navigate to other
instances. We extended the association concept to design fuzzy relations
attached to classes in a programming language so that a particular instance has
direct links (i.e., knows) to instances associated with it. Access to the entire
relation (that is, the union of the individual links of all the instances in the
association) is provided as a class responsibility, as will be described later.
The membership values of the relation must be kept apart from the instances of
the classes that participate in the association. A first approach could be that of
building Proxies for the instances, which will hold a reference to the instance at
the other side of the association and the membership grade, and storing them in
a standard collection. The main benefit of this approach is simplicity, because
a single class, called, for example, FuzzyLink (FL from now on), solves the
representation problem. That is enough for the case of associations with
cardinality 1. We used this first approach for comparison purposes with our
final design.
A drawback of the FL approach for associations with multiple cardinalities is that
the responsibility of preserving relation properties is left to the domain-class
designer. This is one of the reasons that prompted us to develop a second
approach in which the collection semantics, and not the element semantics, are
extended.
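The first, proxy-based approach and its drawback can be sketched as follows. Only the name FuzzyLink comes from the text; its fields and accessors, as well as the User class, are assumptions made for illustration. Because the links live in a standard collection, nothing prevents the same target from being linked twice, so the domain class itself would have to preserve the relation properties.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the proxy-based (FuzzyLink) design.
final class FuzzyLinkSketch {

    /** Proxy holding the instance at the other side of a link and the link's grade. */
    static final class FuzzyLink<T> {
        private final T target;
        private final double membership;

        FuzzyLink(T target, double membership) {
            this.target = target;
            this.membership = membership;
        }

        T getTarget() {
            return target;
        }

        double getMembership() {
            return membership;
        }
    }

    /** Domain class that keeps its links in a standard collection. */
    static final class User {
        final List<FuzzyLink<String>> interests = new ArrayList<>();

        void addInterest(String subject, double membership) {
            // No uniqueness check: the domain-class designer must add one.
            interests.add(new FuzzyLink<>(subject, membership));
        }
    }

    public static void main(String[] args) {
        User u = new User();
        u.addInterest("sports", 0.8);
        u.addInterest("sports", 0.3); // a second, conflicting link is accepted
        System.out.println(u.interests.size());
    }
}
```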
The base of our fuzzy collection framework is a FuzzyAssociationEnd (FAE)
interface that defines common behavior for all fuzzy associations. Concrete
classes implement that interface to provide different flavors of associations. In
this work, we will restrict our discussion to a FuzzyUnorderedAssociationEnd
(FUAE) class. The class diagram in Figure 2 shows how a unidirectional fuzzy
association [Figure 2(b)] from class A to class B can be designed with our
framework [Figure 2(a)].
It should be noted that the put method can be used to add and remove objects
from the relation. The latter case can be carried out by specifying a zero
membership. (We considered in this implementation that zero membership is
equivalent to the lack of a link.) Because many associations that store different
information may exist between the same pair of classes, associations must be
named. The class-instance FUAE is responsible for maintaining a collection of
the associations that are maintained as instances of it (i.e., this behavior is
modeled as a class responsibility). These different associations are represented
by instances of a FuzzyUnorderedAssociation (FUA) class. Therefore, FUA
instances represent entire generic associations and store the union of the links
that belong to them.
Figure 2. Unidirectional binary association design
[Figure: (a) design of the association with the framework: class A holds an
-assoc reference (multiplicity 1) to a FuzzyUnorderedAssociationEnd, which
implements the FuzzyAssociationEnd interface (+put()) and refers to many (*)
instances of B; (b) the modeled fuzzy association from A to B, with role -assoc
and multiplicities 1 and *.]

Using dictionaries with fixed-precision membership values as keys provides
performance benefits in common operations on fuzzy sets, like α-cuts,
outperforming common container classes (bags, sets, and lists). The rationale
behind this organization is that association traversal would often be done by
specifying a minimum membership grade, that is, to obtain an element of the
partition of the fuzzy relation. This way, we are representing the relation by its
resolution form defined by Equation (3):
$$R = \bigcup_{\alpha \in \Lambda_R} \alpha R_\alpha \qquad (3)$$
where $\Lambda_R$ is the level set of R, $R_\alpha$ denotes an α-cut of the fuzzy
relation, and $\alpha R_\alpha$ is a fuzzy relation as defined in Equation (4):
$$\mu_{\alpha R_\alpha}(x, y) = \alpha \cdot \mu_{R_\alpha}(x, y) \qquad (4)$$
The implementation is an extension of Java's HashMap collection, which
essentially substitutes the add behavior with that of a link operation sketched as
follows:
public Object link(Object key, Object value) {
    if (key.getClass() == Double.class) {
        double mu = ((Double) key).doubleValue();
        // truncates to current precision:
        mu = truncateTo(mu);
        // Get the set of elements with the given mu:
        HashSet elements = (HashSet) this.get(Double.valueOf(mu));
        if (elements == null) {
            HashSet aux = new HashSet();
            aux.add(value);
            super.put(Double.valueOf(mu), aux);
        } else {
            elements.add(value);
        }
    }
    // Inform the association that a new link has been added:
    if (association != null) {
        association.put(key, this, value);
    }
    return null;
}
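To show how such a dictionary supports α-cuts, the following self-contained sketch stores linked elements in sets keyed by truncated membership values and answers a cut by visiting only the distinct levels. It is an illustration under stated assumptions rather than the implementation used in the study: the class and method names are hypothetical, BigDecimal keys are used here to avoid floating-point rounding when truncating, and moving an element between levels is not handled.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a membership-keyed dictionary with alpha-cut retrieval.
final class LevelCutIndexSketch<T> {

    private final Map<BigDecimal, Set<T>> levels = new HashMap<>();
    private final int decimals; // fixed precision of the membership keys

    LevelCutIndexSketch(int decimals) {
        this.decimals = decimals;
    }

    /** Truncates a grade to the fixed precision, e.g. 0.456 becomes 0.45 for two decimals. */
    private BigDecimal truncate(double mu) {
        return BigDecimal.valueOf(mu).setScale(decimals, RoundingMode.DOWN);
    }

    /** Stores the element in the set that corresponds to its truncated grade. */
    void link(double mu, T value) {
        if (mu < 0.0 || mu > 1.0) {
            throw new IllegalArgumentException("grade must lie in [0,1]");
        }
        levels.computeIfAbsent(truncate(mu), k -> new HashSet<>()).add(value);
    }

    /** Returns every element whose stored level is at least alpha. */
    Set<T> alphaCut(double alpha) {
        BigDecimal threshold = BigDecimal.valueOf(alpha);
        Set<T> result = new HashSet<>();
        for (Map.Entry<BigDecimal, Set<T>> level : levels.entrySet()) {
            if (level.getKey().compareTo(threshold) >= 0) {
                result.addAll(level.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LevelCutIndexSketch<String> interests = new LevelCutIndexSketch<>(2);
        interests.link(1.0, "sports");
        interests.link(0.456, "music");
        interests.link(0.2, "news");
        System.out.println(interests.alphaCut(0.4));
    }
}
```

The number of keys is bounded by the chosen precision (at most 101 levels for two decimals), so a cut inspects a small number of sets regardless of how many elements are linked. The price is that grades are only known up to the truncation step.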
Figure 3 illustrates our design by showing an "is interested in" relation between
the set U of the users of a Web site and the set S of subjects of the pages it serves.
Experimental studies pointed out the performance benefits of this approach
(Sicilia, Gutiérrez, & García, 2002). Because the activationDepth parameter of
db4o determines the number of reference traversals that are read in advance, it
should be considered an important factor in achieving such results. It must
be reduced from the default value 5 to 2 or 1 to obtain a significant
improvement, because with the default value, the entire object graph is always
retrieved.
The resolution form of a fuzzy relation is a convenient way to represent and
subsequently store fuzzy associations in orthogonal persistence engines. Addi-
tional constraints on link insertion semantics can be added to obtain specialized
relations like similarity relations, as described in Gutierrez, Sicilia, and Garcia
(2002).
Figure 3. An example of the "is interested in" fuzzy relation
[Figure: object diagram linking the User instances u1, u2, and u3 to the Subject
instances sports and music through FUAE objects, each holding Sets of linked
instances keyed by membership value (mu = 1, 0.45, 0.2, 0.6, and 0.8).]
Interaction of Fuzziness with Physical Structures: Case
of ObjectStore
The ObjectStore database system⁴ is one of the most mature and stable products
in the OODB market, currently in its 6.1 version. It provides a fast performance
architecture originally called the Virtual Memory Mapping Architecture (VMMA)
that enables programmers to design physical structures that minimize response
time by clustering objects that are likely to be used together (Hansen, Adams, &
Gracio, 1999). Essentially, it is a client-server architecture that can be tuned to
minimize data transfer from the ODB to the client by carefully distributing
objects in fixed-size containers called clusters, which reside in expandable
storage containers called segments.
The VMMA relies on a mechanism on the client (application) side that produces
page faults in the virtual memory of a process each time a pointer or
reference to a persistent object is dereferenced. If the object is in the client's
address space, it is directly mapped to the application address space, so that only
in cases when the object is not found in the client does the page fault handler go
to secondary storage.
Cache affinity is the generic term that describes the degree to which data
accessed within a program overlaps with data already retrieved on behalf of a
previous request (Visnick, 2003). Cache affinity is critical for the performance
of the applications, because it minimizes client-server data transfer, due to a
larger number of hits that are resolved locally in the cache of the client. Data
affinity depends on the set of database pages a client needs at a given time
(working set). Therefore, objects that are normally used together must be put
together in physical storage, so they will be retrieved in the same data pages, thus
minimizing the client's requests to ObjectStore. Conversely, objects rarely used
must be kept apart from those frequently used. Clustering refers to that process
of putting together the data that are read or updated frequently at the same time,
and several design criteria are provided in documentation related to ObjectStore
to guide physical design, including uses of indexes, selection of physical storage
structures, and even refactoring of class design.
When dealing with fuzzy classes, flexible queries often act as filters that use
membership grades to select objects depending on a given α-cut. Because fuzzy
querying is not a feature of ObjectStore, the provision of that filtering behavior
would reside with the client, and hence, it is required that the full collection of
membership degrees be retrieved before resolving the query. If we use object
clustering, membership grades would be represented as a field inside the physical
structure of the object, so that each fuzzy query would require the transfer of the
entire object structure, significantly slowing the performance of functionalities
requiring instance selection based on fuzzy degrees. This situation points out the
necessity of separating the fuzzy mappings from the rest of the information on
fuzzy objects. That separation of objects and their membership degrees is a
concrete realization of the Head-Body Split technique described in Visnick
(2003). As a general database design pattern, it can be synthesized in the
following Java-like declarations using a simple delegation scheme:
// Original class
public class FuzzyClass {
    // field declarations:
    private X1 x1;
    private X2 x2;
    // ...
    private XN xN;

    // membership grade:
    private double mu;

    // methods
}

// Result of the split
public class FuzzyClass {
    // membership grade:
    private double mu;
    // helper instance:
    private FuzzyClass_Crisp _aux;

    // constructor
    public FuzzyClass() {
        _aux = new FuzzyClass_Crisp(..);
    }

    // accessors for membership
    public double getMu() {
        return mu;
    }

    // methods delegated to the
    // FuzzyClass_Crisp
}

public class FuzzyClass_Crisp {
    // all the method and field
    // declarations
    // except those related
    // to fuzziness.
}

Once the split into two classes is done, the database designer must allocate
instances of FuzzyClass_Crisp classes in separate physical units, so that only the
lighter version of the instances of fuzzy class X is required to filter by
membership, resulting in decreased data transfer loads.
In the case of fuzzy associations, the collections that hold the mappings of pairs
of instances should be isolated in independent clusters, so that clients are able to
first retrieve the entire fuzzy subset of the Cartesian product, select the fuzzy
links that are interesting for the given functionality, and then retrieve the subset
of pairs of instances that are relevant according to their degrees. The rationale
for such a technique is analogous to the Isolate Index technique described in
Visnick (2003). To summarize, cache-based object architectures require that
computations with membership grades be handled on the client side, so that
degrees of fuzzy classes or associations that are in a working set should be
clustered together.
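The effect of keeping membership degrees apart from the rest of the object state can be sketched with a small simulation. It assumes that the lightweight part holds only an identifier and a grade, and that the heavier part is obtained through a loader function standing in for the data transfer from the server; all names are hypothetical and none of them belongs to the ObjectStore API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical simulation of filtering on lightweight heads before loading bodies.
final class HeadBodySplitSketch {

    /** Lightweight part of a fuzzy object: an identifier and its membership grade. */
    static final class Head {
        final long id;
        final double mu;

        Head(long id, double mu) {
            this.id = id;
            this.mu = mu;
        }
    }

    /**
     * Filters on the heads only and resolves a body just for the selected ones.
     * The loader stands in for the (expensive) retrieval of the crisp state.
     */
    static <B> List<B> select(List<Head> heads, double alpha, Function<Long, B> bodyLoader) {
        List<B> selected = new ArrayList<>();
        for (Head head : heads) {
            if (head.mu >= alpha) {
                selected.add(bodyLoader.apply(head.id));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Head> heads = List.of(new Head(1L, 0.9), new Head(2L, 0.1), new Head(3L, 0.5));
        int[] loads = {0};
        List<String> bodies = select(heads, 0.5, id -> {
            loads[0]++; // counts simulated body transfers
            return "body" + id;
        });
        System.out.println(bodies + " loaded with " + loads[0] + " transfers");
    }
}
```

In this sketch, the number of simulated transfers equals the number of selected instances rather than the size of the whole extent, which is the saving that the separate allocation aims for.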
Future Trends
The eventual widespread adoption of fuzzy object-oriented technology will be,
necessarily, accompanied by a generalized interest in fuzziness as a first-class
citizen in conceptual models and programming technology. Fuzziness generalizes
common crisp modeling constructs to a higher level of flexibility that is not always
required, so that a careful and progressive selection of the fuzzy extensions that
are introduced becomes crucial. A modular extension for fuzziness of the UML
language continuing previous work (Sicilia, García, & Gutiérrez, 2002) and
leveraging existing research on fuzzy conceptual models (Chen, 1998) may
represent an important step in that direction, especially now that its 2.0 major
version provides improved extension mechanisms.
Moreover, one of the major current drivers of database technology is the
specificity of Web information, which benefits from the navigational structure of
object stores. Recent advances in Web information storage and management
(May & Lausen, 2004) go a step further in the integration of object models with
the specifics of the hypermedia structure of the Web. In addition, provided that
the vision of a Semantic Web (Berners-Lee, Hendler, & Lassila, 2001) eventu-
ally becomes a reality, the amount of metadata expressed in XML-based
languages like RDF will call for new requirements on object models and
databases, and also new query languages (Karvounarakis et al., 2003). Conse-
quently, research on the integration of fuzziness in languages for the description
of Web resources represents an important direction that has yet to be addressed
in a number of research works regarding fuzzy description logics (see, for
example, Straccia, 2001) and their practical applications for Web management
issues (Sicilia, 2003).
With respect to the design and implementation of ODB systems, aspect-oriented
design (AOD) represents a promising new technology that may eventually be
used to add fuzziness to object database models, isolating the storage and
computation of membership degrees from the functionality that is not affected
by them, extending existing related work (Rashid & Sawyer, 2001). Fuzziness
can thus be considered a cross-cutting concern in information systems, and its
management can be modularized in aspects or other similar design-level
constructs to clearly differentiate it (Sicilia & García, 2004). This would
eventually result in aspect-enabled object data stores enabling the storage and
handling of uncertainty and imprecision at the programming language level (e.g.,
using the popular AspectJ Java extension⁵), without changing the crisp
classes. This would result in a cleaner separation of concerns than that obtained
using conventional inheritance (Yazici, George, & Aksoy, 1998).
Conclusions
The introduction of fuzziness in existing OODB models must be carried out by
considering existing database design and programming practices to make the
extensions easier to understand and adopt by practitioners not knowledgeable in
fuzzy set theory or related mathematical frameworks for uncertainty. This
approach is proposed as a way to foster fuzzy technology adoption by the
community of orthogonal-persistence developers. Using consistency and self-
and domain closeness as general criteria, a restricted subset of the rich array of
proposed fuzzy extensions is selected, comprising fuzzy classes and inheritance
(respecting intensional definitions), fuzzy associations as specific fuzzy relations,
and fuzziness at the attribute level implemented as class responsibilities.
A number of issues regarding the physical storage and representation of such
fuzzy extensions were described and illustrated through case studies. First, the
integration of fuzziness with standard database access interfaces was
illustrated with the JDO API. Second, the importance of representing
membership degrees in compact form was illustrated through a case study about
the db4o database engine. This association design approach provides improved
performance in operations that involve link retrieval by membership value, and
adds no significant time overhead in common collection iteration processes. In
addition, it was illustrated how cache-based architectures for ODBs like that of
ObjectStore call for physical grouping techniques that must take into account
the fact that computation with membership degrees occurs prior to actual
data transfer processes.
References
Aksoy, D., & Yazici, A. (1993). Criteria for evaluating fuzzy object oriented
database models. In E. Gelenbe (Ed.), Proceedings of the Eighth International
Symposium on Computer and Information Sciences (pp. 136-143).
Aksoy, D., Yazici, A., & George, R. (1996). Extending similarity-based fuzzy
object-oriented data model. In K. M. George, J. H. Carroll, D. Oppemheim,
& J. Hightower (Eds.), Proceedings of the 1996 ACM Symposium on
Applied Computing (pp. 542-546). New York: ACM Press.
Atkinson, C., & Kühne, T. (2000). Strict profiles: Why and how. In A. Evans,
S. Kent, & B. Selic (Eds.), UML 2000: The Unified Modeling
Language, Third International Conference (Lecture Notes in Computer
Science 1939, pp. 309-322). New York: Springer.
Atkinson, M. P., Daynes, L., Jordan, M. J., Printezis, T., & Spence, S. (1996).
An orthogonally persistent Java. ACM Sigmod Record, 25(4), 6875.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic Web.
Scientific American, 284(5), 3443.
Boss, B., & Helmer, S. (1999). Index structures for efficiently accessing fuzzy
data including cost models and measurements. Fuzzy Sets and Systems
108(1), 1137.
Cao, T. H., & Rossiter, J. M. (2003). A deductive probabilistic and fuzzy OODB
language. Fuzzy Sets and Systems 140(1), 129150.
Cattell, R., Barry, D., Berler, M., Eastman, J., Jordan, D., Russell, C., et al.
(2000). The object data standard: ODMG 3.0. San Francisco, CA:
Morgan Kaufmann Publishers.
Chen, G. (1998). Fuzzy logic in data modeling: Semantics, constraints, and
database design. Norwell, MA: Kluwer.
systems. Fuzzy Sets and Systems 113(1), 1936.
Davis, R., Shrobe, H., & Szolovits, P. (1993) What is a knowledge representa-
tion? AI Magazine, 14(1), 1733.
de Caluwe, R. (Ed.). (1998). Fuzzy and uncertain object-oriented data-
bases: Concepts and models (Advances in Fuzzy Systems, Applications
and Theory, Vol. 13). River Edge, NJ: World Scientific.
de Tr, G., & De Caluwe, R. (2003). Level-2 fuzzy sets and their usefulness in
object-oriented database modeling. Fuzzy Sets and Systems 140(1), 29
49.
Dubois, D., Prade, H., & Rossazza, J. P. (1991). Vagueness, typicality and
uncertainty in class hierarchies. Int. Journal Intelligent Systems, 6, 167
183.
Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns:
Elements of reusable object oriented design. Boston, MA: Addison
Wesley.
Green, T. R. G. (2000). Instructions and descriptions: Some cognitive aspects of
programming and similar activities. In V. Di Ges, S. Levialdi, & L.
Tarantino (Eds.), Proceedings of Working Conference on Advanced
Visual Interfaces (pp. 2128). New York: ACM Press.
Green, T. R. G., & Petre, M. (1996). Usability analysis of visual programming
environments: A cognitive dimensions framework. Journal of Visual
Languages and Computing, 7(2), 131174.
Gutirrez, J. A., Sicilia, M. A., & Garca, E. (2002). Integrating fuzzy associa-
tions and similarity relations in object oriented database systems. In
Proceedings of the International Conference on Fuzzy Sets Theory
and Its Applications (pp. 6667).
Hansen, D., Adams, D., & Gracio, D. (1999). In the trenches with ObjectStore.
Theory and Practice of Object Systems, 5(1) 201207.
Hosking, A. (1995). Benchmarking persistent programming languages: Quanti-
fying the language/database interface. In Proceedings of the OOPSLA95
Workshop on Object Database Behavior, Benchmarks, and Perfor-
mance.
Inoue, Y., Yamamoto, S., & Yasunobu, S. (1991). Fuzzy set object: Fuzzy set as
first-class object. In Proceedings of IFSA 1991 (pp. 7073).
Kao, D., & Archer, N. P. (1997) Abstraction in conceptual model design.
International Journal of HumanComputer Studies, 46(1), 125150.
Karvounarakis, G., Magkanaraki, A., Alexaki, S., Christophides, V., Plexousakis,
D., Scholl, M., et al. (2003). Querying the semantic Web with RQL.
Computer Networks, 42(5), 617640.
Kim, W. (2003). A retrospection on niche database technologies. Journal of
Object Technology, 2(2), 3542.
Klir, G., & Wierman, M. (1998). Uncertainty-based information: Elements of
generalized information theory (Studies in Fuzziness and Soft Comput-
ing, Vol. 15). New York: Springer-Verlag.
Koprulu, M., Cicekli, N. K., & Yazici, A. (2003). Spatio-temporal querying in
video databases. Information Sciences (to appear).
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2003). Extending object-oriented
databases for fuzzy information modeling, Information Systems (in press).
ogy, 45(7), 431444.
May, W., & Lausen, G. (2004). A uniform framework for integration of
information from the Web. Information Systems, 29(1), 5991.
McCarthy, J. L. (1981). Epistemological problems of artificial intelligence. In B.
L. Webber, & N. J. Nilsson (Eds.), Readings in artificial intelligence (pp.
459465). Los Altos, CA: Kaufmann.
Medina, J. M., Pons, O., & Vila, M. A. (1994). GEFRED. A generalized model
of fuzzy relational databases. Information Sciences, 76(12), 87109.
Nepal, A., Ramakrishna, M. V., & Thom, J. A. (1999). A fuzzy object query
language (FOQL) for image databases. In A. L. P. Chen, & F. H.
Lochovsky (Eds.), Proceedings of the Sixth International Conference
on Database Systems for Advanced Applications (pp. 117127).
Piscataway, NJ: IEEE Press.
Object Management Group: OMG Unified Modeling Language Specifica-
tion, Version 1.3 (1999).
Rashid, A., & Sawyer, P. (2001). Aspect-orientation and database systems: An
effective customisation approach. IEE Proceedings Software, 148(5),
156164.
Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., & Lorenson, W. (1996).
Object oriented modeling and design. Upper Saddle River, NJ: Prentice
Hall.
Russell, C. et al. (2001). Java Data Objects (JDO) Version 1.0 proposed final
draft, Java Specification Request JSR000012.
Schenker, A., Last, M., & Kandel, A. (2001). Fuzzification of an object-oriented
database system. International Journal of Fuzzy Systems, 3(2), 432
441.
Sicilia, M. A. (2003). The role of vague categories in semantic and adaptive Web
interfaces. In R. Meersman, & Z. Tari (Eds.), Proceedings of the
Workshop on Human Computer Interface for Semantic Web and Web
Applications (Lecture Notes in Computer Science 2519, pp. 210222).
New York: Springer Verlag.
Sicilia, M. A., & Garca, E. (2004). On imperfection in information as an early
crosscutting concern and its mapping to aspect-oriented design. In Pro-
ceedings of the Early Aspects Workshop: Aspect-Oriented Require-
ments Engineering and Architecture Design (to appear).
Sicilia, M. A., Garca, E., & Gutirrez, J. A. (2002). Integrating fuzziness in
object oriented modelling languages: Towards a fuzzy-UML. In Proceed-
ings of the International Conference on Fuzzy Sets Theory and its
Applications (pp. 6667).
Sicilia, M. A., Garca, E., Aedo, I., & Daz, P. (2003). Representation of concept
specialization distance through resemblance relations. In J. M. Benitez, O.
Cordon, F. Hoffmann, & R. Roy (Eds.), Advances in Soft Computing
Engineering, Design and Manufacturing (Springer Engineering series,
pp. 173182). New York: Springer Verlag.
Sicilia, M. A., Garca, E., Daz, P., & Aedo, I. (2002). Extending relational data
access programming libraries for fuzziness: The fJDBC framework. In T.
Andreasen, A. Motro, H. Christiansen, & H. L. Larsen (Eds.), Proceed-
ings of the Flexible Query Answering Systems International Confer-
ence (Lecture Notes in Artificial Intelligence 2522, pp. 314328). New
York: Springer.
Sicilia, M. A., Garca, E., Daz, P., & Aedo, I. (2002b). Fuzziness in adaptive
hypermedia models. In J. Keller, & O. Nasraoui (Eds.), Proceedings of
the North American Fuzzy Information Processing Society Conference
Sicilia, M. A., Gutirrez, J. A., & Garca, E. (2002). Designing fuzzy relations in
orthogonal persistence object-oriented database engines. In F. J. Garijo, J.
C. Riquelme, & M. Toro (Eds.), Advances in artificial intelligence
(Lecture Notes in Computer Science 2527, pp. 243253). New York:
Springer.
Smets, P. (1997). Imperfect information: Imprecision-uncertainty. In A. Motro,
& P. Smets (Eds.), Uncertainty management in information systems:
From needs to solutions (pp. 225254). Norwell, MA: Kluwer Academic
Publishers.
Straccia, U. (2001). Reasoning within fuzzy description logics. International
Journal of Artificial Intelligence Research, 14, 137166.
Tarr, C. (1995). Identity indirection design pattern. In Proceedings of the
OOPSLA 95 workshop on design patterns for concurrent, parallel,
and distributed object-oriented systems.
Visnick, L. (2003). Clustering techniques in ObjectStore. Technical white
paper. Retrieved September 2003 from the World Wide Web: http://
www.objectstore.net
Yazici, A., & Cibiceli, D. (1999). An access structure for similarity-based fuzzy
databases. Information Sciences, 115(14), 137163.
Yazici, A., & Koyuncu, M. (1997). Fuzzy object-oriented database modeling
coupled with fuzzy logic. Fuzzy Sets and Systems 89(1), 126.
in the fuzzy object-oriented data model. Information Sciences, 108(14),
241260.
Zadeh, L. (1997). Toward a theory of fuzzy information granulation and its
centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems,
90(2), 111127.
Endnotes
1 http://java.sun.com/products/jdo/
2 http://www.db4o.com/
3 http://sodaquery.sourceforge.net/
4 http://www.objectstore.net/
5 http://eclipse.org/aspectj/
SECTION IV
Chapter IX
An Object-Oriented Approach to Managing Fuzziness in Spatially Explicit Ecological Models Coupled to a Geographic Database
Vincent B. Robinson
University of Toronto at Mississauga, Canada
Phil A. Graniero
University of Windsor, Canada
Abstract
This chapter uses a spatially explicit, individual-based ecological modeling
problem to illustrate an approach to managing fuzziness in spatial databases
that accommodates the use of nonfuzzy as well as fuzzy representations of
geographic databases. The approach taken here uses the Extensible
Component Objects for Constructing Observable Simulation Models (ECO-
COSM) system loosely coupled with geographic information systems. ECO-
COSM Probe objects flexibly express the contents of a spatial database
within the context of an individualized fuzzy schema. It affords the ability
to transform traditional nonfuzzy spatial data into fuzzy sets that capture
the uncertainty inherent in the data and model's semantic structure. The
ecological modeling problem was used to illustrate how combining Probes
and ProbeWrappers with Agent objects affords a flexible means of handling
semantic variation and is an effective approach to utilizing heterogeneous
sources of spatial data.
Introduction
Progress in global connectivity has led to a situation where we now need to deal
with more heterogeneous information consisting of a broad variety of digital
spatial/geographical data and address operational sources, such as simulation
models, which create new data and information. The scale of the problem has
changed from just a few databases to thousands, perhaps millions, of geographical
information resources. Such new resources are most often added independently
to the accessible set of resources without regard to the myriad end uses
to which they may be put (Mackay, 1999). Thus, spatially explicit information
resources may be used in many different contexts without regard for the
underlying uncertainties of the data, or their relationships to the semantics of the
problem domain (Robinson & Frank, 1985; Burrough & Frank, 1996). Although
such uncertainties in geographic databases have been recognized for decades,
it would be extraordinary to have institutional databases contain anything as
detailed as fuzzy membership values or other detailed measures of uncertainty
attached to objects or tuples.
Geographic databases with no explicitly recorded uncertainty measures are
commonly used as the basis for computationally intensive investigations of
complex ecological systems. One major approach that developed over the past
few decades is individual-based modeling (IBM) (Grimm, 1999; Lomnicki, 1999;
Bian, 2003). It is a computational approach to modeling a system through the
interaction of atomic models of each individual inhabiting the system. Individual-based
models provide several advances over traditional ecosystem models. Foremost among
the advances is the fact that they discard the assumption that there is some
average, or mean, individual that adequately represents every individual in a
population. They also dispose of the assumption that significant interactions take
place evenly across populations. Such models are usually spatially explicit,
allowing interaction between individuals to occur over a wide range of space.
Importantly, they are able to represent the biological, physiological, and behav-
ioral distinctions seen in individuals in the real world. Because the individual is
the atomic unit, the simulation is able to take spatially explicit localized interac-
tions into account. Thus, a model of higher-order entities (e.g., populations)
emerges from the dynamics of individual interactions in much the same manner
as the higher-order phenomena observed in the real world (Anderson, 2002).
One such problem domain concerns the simulation of dispersal behavior of
animals across a landscape.
Previous research suggested that errors in dispersal parameters such as
misclassification of habitat suitability or incorrect estimation of how far a
disperser can travel can have larger consequences for predicting dispersal
success than do errors in landscape classification (Ruckelshaus et al., 1997).
There are crucial parameters in models of movement, such as perceptual range,
that cannot be precisely specified from field and experimental work (Mech &
Zollner, 2002). However, classification errors can still have significant conse-
quences. Ruckelshaus et al. (1997) showed that uncertainty in the model
parameters and in the underlying data stored in a database is a significant
problem to be addressed by ecological modeling efforts. This led to a detailed
suggestion that these problems be investigated by integrating fuzzy information
processing, computational simulation modeling, and spatial database issues with
intelligent systems research while maintaining a direct interplay with real-world
ecological research (Robinson, 2002).
The approach taken here uses the Extensible Component Objects for Construct-
ing Observable Simulation Models (ECO-COSM) system loosely coupled with
GRASS, an open-source GIS (Neteler & Mitasova, 2002), and ArcGIS (McCoy
& Johnston, 2001). ECO-COSM is a simulation modeling framework used to
build spatially explicit ecological models (Graniero, 2001). Its component-based
structure allows a model design to evolve by replacing or adding individual model
components that change the overall behavior. The simulation framework pro-
vides a library of modular software objects that manage the structure of space
and time within a simulation model. It includes mechanisms to handle concurrent
activity among objects within the simulation. Objects that have embedded
assumptions about the spatial or temporal structure of the simulated world are
packaged into replaceable modules. In this illustrative example, the goal is to
simulate the detailed dispersal movements of a population of squirrels in a
spatially explicit manner, using behavioral modules that fuzzify the spatial
database contrasted to modules that do not.
This effort can be related to several themes in the fuzzy object-oriented database
literature. Like several others, we emphasize the importance of incorporating
some form of intelligence in the system (Bordogna & Chiesa, 2003; Koyuncu &
Yazici, 2003; Petry et al., 2003). As noted by Cross and Firat (2000), one
recognized stream of research in fuzzy databases focused on developing front-
end fuzzy querying capabilities on top of conventional database systems.
Sometimes those databases are object-oriented (Koyuncu & Yazici, 2003) and
sometimes they are conventional (Petry et al., 2003).
A query-directed approach was taken when examining the incorporation of
fuzziness in a system for managing spatially explicit ecohydrologic simulations,
namely the Knowledge-Based Land Information Manager and Simulation
(KBLIMS) system (Robinson, 2000). Like KBLIMS, we focus on a kind of
ecological simulation. However, we have taken a different, albeit object-
oriented, approach to handling fuzziness. Our use of agents with probes allows
us to address issues of fuzziness for individual-based modeling that could not be
adequately addressed by a system such as KBLIMS. The object database
described by Robinson (2000) had to be constructed, originally, from non-object-
based GIS database information. We address this practical issue by concentrat-
ing our object-oriented techniques within the ECO-COSM framework, thus
allowing straightforward access to heterogeneous spatial data sources that are
required for such simulations. By taking this loose-coupling approach, we differ
from those such as Koyuncu and Yazici (2003) who take a tightly coupled
approach to incorporating intelligence in a fuzzy object-oriented architecture.
Like Mackay (1999), we recognize a distinction between the ontology upon
which individual information sources are constructed and the ontology of an end-
user of the information sources. In our case, the end-user is an individual agent
that views its surrounding world in order to make a movement decision, and our
system must manage the queries and actions for populations of agents so that the
results of a simulation may be represented in a GIS database (e.g., for
visualization). Fuzzy database models have been defined for dealing with
imperfect information, either in the database (Robinson, 1988; Petry, 1996), in
the queries (Koyuncu & Yazici, 2003), or in both data and queries (Bordogna &
Chiesa, 2003; Morris, 2003). In a sense, we integrate all three approaches in the
work presented here. Agents use Probe objects to query a database. At this
stage, the database is assumed to be a conventional nonfuzzy, GIS database.
However, as we note later in the chapter, this approach is easily extensible to
accommodate coupling with an object-oriented database, fuzzy or crisp. There-
fore, imperfect information is dealt with at the query end by the Probe objects
that in effect allow each Agent an object-oriented database of its own upon
which the Agent poses queries to gather fuzzy information to support a decision
to either move to a new location or remain in place. The use of Probes allows us
to incorporate knowledge about not only the data and its fuzziness, but also about
the problem domain that is a function of the Agent's role within the simulation
model. Thus, the combination of Probe and Agent allows semantics to be
modeled within each Agent. In the framework presented below, each Agent
class has an ontology of its own in which the semantics of its problem domain are
defined. However, the framework we laid out is flexible enough to be able to
incorporate additional object-oriented representation schemes.
We show how ECO-COSM's Probe objects afford the ability to
express the contents of a spatial database within the context of a particular,
individualized fuzzy schema. A traditional crisp spatial database can be easily
transformed into fuzzy sets that capture the following:
1. Uncertainty inherent within the database's contents
2. Uncertainty inherent within the model's semantic structure
3. Ambiguity or vagueness in the meaning of the database's contents that is
generated by the different semantic requirements of different agents and
the natural variability among individuals within an ecological population
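The idea of reading one crisp database through several individualized fuzzy schemas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes `CrispProbe` and `FuzzifyingWrapper`, the tree-cover layer, and the threshold values are hypothetical stand-ins and are not the ECO-COSM Probe or ProbeWrapper classes themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Cell = Tuple[int, int]

# Hypothetical crisp raster layer: percent tree cover per cell (illustrative values).
TREE_COVER: Dict[Cell, float] = {(0, 0): 10.0, (0, 1): 45.0, (1, 0): 80.0}


def ramp(low: float, high: float) -> Callable[[float], float]:
    """Membership function rising linearly from 0 at `low` to 1 at `high`."""
    def mu(value: float) -> float:
        if value <= low:
            return 0.0
        if value >= high:
            return 1.0
        return (value - low) / (high - low)
    return mu


@dataclass
class CrispProbe:
    """Reads the stored (nonfuzzy) value for a cell."""
    layer: Dict[Cell, float]

    def read(self, cell: Cell) -> float:
        return self.layer[cell]


@dataclass
class FuzzifyingWrapper:
    """Wraps a probe and reports membership in one agent's own fuzzy set."""
    probe: CrispProbe
    membership: Callable[[float], float]

    def read(self, cell: Cell) -> float:
        return self.membership(self.probe.read(cell))


shared = CrispProbe(TREE_COVER)
# Two individuals with different notions of "acceptable habitat" (assumed thresholds).
cautious = FuzzifyingWrapper(shared, ramp(40.0, 90.0))
bold = FuzzifyingWrapper(shared, ramp(20.0, 60.0))

for cell in TREE_COVER:
    print(cell, shared.read(cell), cautious.read(cell), bold.read(cell))
```

The stored data are never modified in this sketch; each individual simply sees a different fuzzy set over the same crisp values, which is one way to represent variability among individuals without altering the database.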
The next section outlines the key concepts that link individual-based ecological
models, agent-based modeling, object-oriented design, and GIS databases, and
presents the primary challenges of representing fuzziness in such a complex
application domain. Then we present a conceptual overview of the squirrel
dispersal model we use as an illustrative example throughout this chapter. The
architecture of the modeling framework that was used to implement the model
is then described, and some of its key features that provide a solution to the
challenges of this problem domain are explained. The section on fuzzy spatial
relations and database query illustrates how context-specific fuzzy spatial
relations can be created ad hoc to constrain database queries. Then we present
an innovative way to add fuzzy information to a conventional, nonfuzzy GIS
database not only within a model's context, but also within the variable context
of individual model objects. The next section demonstrates the utility of deriving
fuzzy information from a nonfuzzy GIS database at the individual level by
presenting differences in modeled squirrel dispersal according to individual
variation in perception of the environment and variation in the decision-making
process. We conclude the chapter with discussion of the strengths, limitations,
and future possibilities of this approach.
Objects, Agents, Geographic Databases,
and Ecological Models
Spatially explicit ecological models are used to study plausible connections
between landscape patterns and species viability (Ruckelshaus et al., 1997). In
an information-based approach to modeling the movement of animals, such
models may link behavioral ecology with landscape-level ecological processes
(Lima & Zollner, 1996). A computing environment that supports development of
spatially explicit individual-based modeling should support, among other require-
ments, the following: mobility, evaluating and interacting with other individuals,
and acquiring and maintaining knowledge about the surrounding landscape
(Westervelt, 2002). Because the behavior of an IBM emerges from individual
behaviors, a more comprehensive, flexible, and accurate model is obtained by
modeling the intelligence inherent in individual inhabitants via implementation as
computational agents (Anderson, 2002; Bian, 2000; Rickel et al., 1998; Westervelt,
2002; Westervelt & Hopkins, 1999). However, the geographic databases in
support of IBMs are usually not object-oriented databases but are repositories
of data that are queried for information by objects in an object-oriented model
environment. Once the data are served to the querying object, they are
incorporated within the object-oriented environment of the model (Westervelt &
Hopkins, 1999; Robinson, 2002). This hybrid object-oriented approach has
allowed the combination of GIS and agent-based models in a variety of
environmental and social contexts not limited to the modeling of animal move-
ments (Gimblett, 2002; Westervelt, 2002; Harper et al., 2002; Petry et al., 2002;
Leclercq et al., 1999; Graniero & Robinson, 2003).
An agent is a program that perceives its environment and acts upon it (Anderson,
2002; Russell & Norvig, 1995). In this modeling domain, it is information drawn
from a GIS database that will supply an agent with its perception of its
environment. The concept of this relationship is illustrated in Figure 1. The
implication of this relationship is that much of the general research related to
geographic, or spatial, databases and geographic information systems (GISs)
may provide relevant support for advancing the development of IBMs.
Figure 1. Conceptual illustration of major components of a spatially
explicit ecological model that focuses on movement behavior of individual
animals, e.g., natal dispersal (Note the loosely coupled relationship with the
geographic information system.)
Agent-based approaches have begun to be applied to a number of problems using
GIS databases. Various uses of agent-based applications at the systems level
were reviewed by Li et al. (2001). These authors concluded with a suggestion
of a GeoAgent; that is, a mobile Agent that can enhance its abilities by using
Wrapper Agents in an assemble-on-demand fashion, and use geospatial knowl-
edge to deal with geospatial problems. Wrapper Agents are agents in their own
right, but they are also designed to provide additional layers of processing or
decision-making support to client Agents. The GeoAgent is particularly well-
suited to geospatial problems dealing with a WebGIS (Li et al., 2001). The
distributed environment afforded by the concept of the WebGIS has led to other
applications of agent-based techniques applied to GIS.
To support GIS interoperability, a semantic mediation approach was presented
that utilizes the object-oriented nature of agents and agent wrappers to resolve
semantic differences among systems across a Web-based environment (Leclercq
et al., 1999). Further research in the management of uncertainty in distributed
spatial information systems demonstrated the potential utility of fuzzy sets in
addressing issues of semantic heterogeneity. The suggested approach incorpo-
rates an object-oriented data model that supports the intelligent conflation of
uncertain geographic features in response to a spatial query (Cobb et al., 2000).
Exploiting advances in the representation and processing of fuzzy spatial
relations, this approach was extended to develop a system that retrieves, filters,
integrates/conflates, and validates geospatial data from multiple sources using
intelligent agents. It was argued that the use of intelligent agent technology in this
context offers advantages over the standard client-server architecture (Petry et
al., 2002), which is consistent with experiences in developing spatially explicit
ecological models that depend on spatially explicit databases.
Like other efforts (Anderson, 2002; Rickel et al., 1998; Westervelt & Hopkins,
1999; Harper et al., 2002), we approach the problem of building individual-based,
spatially explicit simulation models from an object-oriented perspective utilizing
spatial databases and mobile agents. It was suggested that the choice of this
problem domain provides the ability to investigate issues that integrate compu-
tational simulation modeling and spatial database issues with intelligent systems
research while maintaining a direct interplay with real-world ecological research
(Robinson, 2002). Figure 1 shows the major components of a spatially explicit
ecological model and the relationship between each of them. Of critical
importance in all the models is some representation of the landscape. Such
information is typically stored in a spatial database that feeds a simulation
model. Here the landscape is treated as a spatial database from which the animal
objects will receive information about their surroundings. A particular challenge
in individual-based ecological modeling is that the ways in which landscape data
are collected and stored in the spatial database are often different from the ways
in which the modeled agents should perceive the same landscape if they are
to remain operationally consistent with the modeled domain.
In this approach to modeling individual animals dispersing across a landscape,
animal objects pose spatial queries to the landscape to acquire information. Like
their counterparts in the real world, they are able to acquire information about the
landscape only within a certain distance determined by the animal's perceptual
range (Mech & Zollner, 2002; Zollner, 2000) or finite range of vision (Fahse et
al., 1998). That information is then processed to determine the specifics of which
movement behavior to pursue. Ruckelshaus et al. (1997) suggested that errors
in dispersal parameters have much larger consequences for predicting dispersal
success than do errors in landscape classification. Their conclusions suggest that
uncertainty surrounding dispersal parameters is a significant problem that
ecological models and modelers must face.
The role of fuzzy sets in the representation of objects in geographic databases
for a variety of applications has received considerable attention. However, the
usual approach is to address the representation of uncertainty directly, in some
fashion, with the objects stored in a database (Cross & Firat, 2000; Yazici &
Akkaya, 2000) or as part of the query subsystem (Yazici & Akkaya, 2000;
Morris, 2003). Although appropriate in many applications, such approaches have
limitations when using geographic databases in the context of information-based
simulation modeling of complex environmental and ecological processes. The
simulation models have their own semantics that may be distinct from or
unknown to the database author, the user, or other models (or submodels). This
is especially relevant when trying to reconcile the semantics of the original
observations with the semantics of a simulation modeling domain. In addition,
most complex environmental modeling domains contain many models and
submodels that interact with one another, consequently generating semantic
errors (see Mackay & Robinson, 2000; Mackay, 1999). Furthermore, Robinson
(2000) showed that in an object-oriented database with a visual query system,
environmental simulation models may be embedded in the query or in the query
results. In this case, the user may have one set of semantics in mind that may,
or may not, be consistent with the semantics of the simulation models being used
to generate the answer to the query. In fact, there may be no reconciliation
process. That led to research into methods for modeling semantic agreement and
model self-evaluation (Mackay & Robinson, 2000; Mackay, 1999) and would
seem to justify embedding more intelligence into such systems. Therefore, we
use the concept of Probes in an object-oriented, agent-based system as a
practical means of addressing issues of fuzziness in spatially explicit data, while
at the same time maintaining the integrity of large, complex simulation projects.
From a modeling perspective, this approach can substantially reduce artifacts
caused by parameter uncertainty (Robinson & Graniero, in press).
Overview of the Dispersal Model
Now, we briefly present an overview of the natal dispersal model described in
more detail elsewhere (Robinson & Graniero, in press). In this model, the
dispersal movement process of each animal object consists of two major
decisions: movement and residence. If the object is to move from its current
location, then it must decide on a destination location. Once at the new location,
it will need to assess its surroundings to gather information that is used to make
a residence decision. In other words, has the animal object found a suitable
location, or will it need to continue the dispersal movement? In the following
sections we present a simple fuzzy decision-making process for each decision.
The decision model used is one in which relevant goals and constraints are
expressed in terms of fuzzy sets, and a decision is determined by an appropriate
aggregation of the fuzzy sets (Bellman & Zadeh, 1970).
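The alternation between the movement decision and the residence decision can be sketched as a simple loop. This is a minimal outline, assuming the two decisions are supplied as callables; the names `disperse`, `choose_destination`, `accepts_residence`, and the `max_moves` cap are hypothetical and are not taken from the model's implementation.

```python
from typing import Callable, List, Tuple, TypeVar

Location = TypeVar("Location")


def disperse(
    start: Location,
    choose_destination: Callable[[Location], Location],
    accepts_residence: Callable[[Location], bool],
    max_moves: int = 100,
) -> Tuple[List[Location], bool]:
    """Alternate movement and residence decisions until the animal settles.

    Returns the path taken and a flag saying whether residence was established.
    `max_moves` is an assumed safeguard so the loop always terminates.
    """
    location = start
    path = [location]
    for _ in range(max_moves):
        # Movement decision: pick a destination from the current location.
        location = choose_destination(location)
        path.append(location)
        # Residence decision: assess the new surroundings and stop if suitable.
        if accepts_residence(location):
            return path, True
    return path, False


# Toy usage on a one-dimensional "landscape" of integer positions.
path, settled = disperse(0, lambda x: x + 1, lambda x: x >= 3)
print(path, settled)
```

In the toy usage the animal moves one step at a time and settles at the first position that passes the residence test; in a real model both callables would be driven by fuzzy information gathered from the spatial database.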
Fundamental to either the movement or residence decision is information about
the surrounding landscape and conspecifics (other animals of the same species
already residing in nearby locations). This is usually confined to a perceptual
range (Mech & Zollner, 2002) or finite range of vision (Fahse et al., 1998).
Because an animal's perceptual range represents its informational window onto
the larger landscape, it determines how much of the area surrounding the
individual it can perceive. In the spatially explicit simulation model outlined in
Figure 1, this is tantamount to the perceptual range being a spatial constraint on
a query to the GIS database.
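Treating the perceptual range as a spatial constraint on a query can be sketched as follows. The sketch assumes a raster landscape held as a list of rows, distances measured in cell units between cell centres, and a query that excludes the animal's own cell; the function name `cells_within_range` and all of these choices are illustrative assumptions rather than features of any particular GIS.

```python
import math
from typing import List, Sequence, Tuple


def cells_within_range(
    grid: Sequence[Sequence[float]],
    origin: Tuple[int, int],
    perceptual_range: float,
) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for every cell within range of `origin`."""
    r0, c0 = origin
    reach = int(math.floor(perceptual_range))
    hits = []
    # Scan only the bounding window around the origin, clipped to the grid.
    for r in range(max(0, r0 - reach), min(len(grid), r0 + reach + 1)):
        for c in range(max(0, c0 - reach), min(len(grid[0]), c0 + reach + 1)):
            if (r, c) == origin:
                continue  # assumed: the current cell is not a candidate
            if math.hypot(r - r0, c - c0) <= perceptual_range:
                hits.append((r, c, grid[r][c]))
    return hits


# Toy 5 x 5 landscape whose cell value encodes its position.
landscape = [[10 * r + c for c in range(5)] for r in range(5)]
print(cells_within_range(landscape, (2, 2), 1.0))
print(len(cells_within_range(landscape, (2, 2), 1.5)))
```

A larger perceptual range widens the window and returns more cells, so an individual with a greater range receives more information from the same database.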
The basic decision model used here is one in which relevant goals ($G_M$) and
constraints ($C_M$) are expressed in terms of fuzzy sets, and a decision is
determined by an appropriate aggregation of the fuzzy sets (Bellman & Zadeh,
1970; Klir & Yuan, 1995). More detailed discussion of the goals and constraints
is presented in Robinson and Graniero (in press). In the movement decision
model, the constraints consist of two major sets of locations. One set includes
those locations that are within the visible perceptual range (). The other
constraint relates to distance from conspecifics. Some species are attracted to
concentrations of conspecifics and others are not; locations under consideration
must satisfy the individual's tolerance of nearby conspecifics. The goal of an
individual is to find a location as near the edge of the perceptual range as possible
that is considered to be acceptable habitat and fits the set of constraints. Thus,
the goal set ($G_M$) is a function of the spatial arrangement of habitat and what we
call dispersal imperative, the details of which are presented in Robinson and
Graniero (in press).
On the first move, the degree to which each location within the perceptual range
falls in the decision set ($D_M$) is defined by $D_M = C_M \cap G_M$. Movement is
to the location with the highest value for $D_M$, i.e.,
$\{x \in X \mid D_M(x) = \max D_M\}$. However, given the nature of the problem,
more than one location may have the same maximum value. Should there be ties,
the first one in the list is chosen (i.e., a lazy sufficing strategy). On moves
beyond the first, there is the question of directional bias. Based on previous
work reported in the ecological literature, a bias to move in the general
direction of the last move is incorporated in the decision set. In that case,
should there be ties, a random location among the candidate set ($D_M$) is
chosen (i.e., an exploratory sufficing strategy).
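The movement rule just described can be sketched in code. The following Java
fragment is a minimal illustration and not the ECO-COSM implementation: it
assumes the standard (minimum) intersection for combining the constraint and
goal sets, represents candidate locations simply as array indices, and omits
the directional bias. The class and method names (MovementDecision,
decisionSet, chooseFirst, chooseRandom) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of a Bellman-Zadeh style movement decision (hypothetical names). */
final class MovementDecision {

    /** Decision membership D(x) = min(C(x), G(x)) for each candidate location x. */
    static double[] decisionSet(double[] constraint, double[] goal) {
        if (constraint.length != goal.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double[] decision = new double[goal.length];
        for (int i = 0; i < goal.length; i++) {
            decision[i] = Math.min(constraint[i], goal[i]);
        }
        return decision;
    }

    /** Indices of all candidates that share the maximum decision value. */
    static List<Integer> bestCandidates(double[] decision) {
        double best = Double.NEGATIVE_INFINITY;
        for (double d : decision) {
            best = Math.max(best, d);
        }
        List<Integer> ties = new ArrayList<>();
        for (int i = 0; i < decision.length; i++) {
            if (decision[i] == best) {
                ties.add(i);
            }
        }
        return ties;
    }

    /** First move: take the first tied candidate in list order. */
    static int chooseFirst(double[] decision) {
        return bestCandidates(decision).get(0); // assumes at least one candidate
    }

    /** Later moves: pick one of the tied candidates at random. */
    static int chooseRandom(double[] decision, Random rng) {
        List<Integer> ties = bestCandidates(decision);
        return ties.get(rng.nextInt(ties.size())); // assumes at least one candidate
    }
}
```

Keeping the tie-breaking policy in its own method means the first-move and
later-move strategies can be swapped without touching the aggregation step.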
Once the animal object has moved to a location, it must then decide whether it
is a location suitable for stopping its dispersal movement. Like the movement
decision model, this is one in which relevant goals ($G_R$) and constraints ($C_R$) are
expressed in terms of fuzzy sets, and a decision is determined by an appropriate
aggregation of the fuzzy sets (Bellman & Zadeh, 1970; Klir & Yuan, 1995). In
the residence decision model, the animal is constrained by whether or not its
current location is sufficiently spatially separated from conspecifics that a home
range can be established, while the goal is to have habitat of sufficient area.
Finally, a decision rule is applied to the decision set that leads to the animal taking
up residence at the location or attempting a move to another location. The details
of this decision model are presented in Robinson and Graniero (in press).
Because this work is focused on modeling natal dispersal, we use the residence
decision primarily as a stopping rule. Future elaborations will incorporate
exploratory movement so that the agent explores the vicinity around its destina-
tion and uses that information in a more sophisticated decision process than
presented here, to choose whether to establish a home range or not. However,
at the present, we simplified the decision to address just a few key criteria that
were suggested by the literature (Allen, 1987; Wolff, 1999).
Architecture of the ECO-COSM System
The computational environment presented in this section meets the two require-
ments that allow functioning intelligent agents in a simulation model. One
requirement is that a model of the agent's behavior be constructed with a facility
for implementing the agent's decision-making abilities. The second requirement
is that the simulated world functions as both an environment unto itself and a
virtual reality to the agents inhabiting it (Anderson, 2002). Our approach uses the
ECO-COSM system (Graniero, 2001) loosely coupled with the Geographic
Resources Analysis Support System (GRASS), an open-source GIS (Neteler &
Mitasova, 2002), and ArcGIS (McCoy & Johnston, 2001). ECO-COSM is a
simulation-modeling framework used to build spatially explicit ecological models.
It has a component-based structure that permits a model design to evolve by
replacing and adding individual model components. The changed, or additional,
components, in turn, change the overall system behavior. The simulation
framework provides a library of modular software objects that manage the
structure of space and time within a simulation model and mediate the behavior
of model components within that structure. It includes mechanisms to handle
concurrent activity among objects within the simulation. Objects that have
embedded assumptions about the spatial or temporal structure of the simulated
world are thereby packaged into replaceable modules. For the spatially explicit
model builder, this feature provides superior control over simulation behavior.
The framework of each simulation program is comprised of a Simulation object
that contains the three interacting Scheduling, Modeling, and Instrumentation
subsystems (Figure 2). The Simulation object is used to describe the overall
structure and relationships between the components comprising the model. It
also looks after the mechanics of receiving external parameters, executing the
simulation, and managing the overhead required to acquire and release comput-
ing resources needed to run the program.
Scheduling Subsystem
Central to the operation of the system is the Scheduling subsystem. The Clock
and Schedule objects are the primary component objects of the Scheduling
subsystem. Each program is constrained to include only one instance of each.
Figure 2. The main subsystems that compose a Simulation object
(Note that in the World object the Agents cannot know about Layers except
through a Probe in the Instrumentation interface.)
Any object in the simulation program may access the Clock's time or add actions
to the Schedule. The Schedule object keeps track of all pending actions. It
decides which action should occur next and triggers that event. Currently,
scheduling is an event-driven structure, but discrete time step models may be
constructed by adding regularly occurring step actions that reschedule them-
selves every time step.
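To make the event-driven structure concrete, the following Java sketch shows
one way a schedule with self-rescheduling step actions could be arranged. It is
an illustration under stated assumptions rather than the ECO-COSM code: the
names SketchSchedule, Action, and stepAction are hypothetical, time is a plain
double, and ties between simultaneous actions are broken by insertion order.

```java
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of an event-driven schedule; names are illustrative only. */
final class SketchSchedule {

    /** An action that fires at a given simulation time. */
    interface Action {
        void fire(SketchSchedule schedule, double now);
    }

    /** A pending action ordered by time, then by insertion order. */
    private static final class Entry implements Comparable<Entry> {
        final double time;
        final long seq;
        final Action action;

        Entry(double time, long seq, Action action) {
            this.time = time;
            this.seq = seq;
            this.action = action;
        }

        @Override
        public int compareTo(Entry other) {
            int byTime = Double.compare(time, other.time);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Entry> pending = new PriorityQueue<>();
    private long counter = 0;
    private double clock = 0.0;

    double now() {
        return clock;
    }

    void add(double time, Action action) {
        pending.add(new Entry(time, counter++, action));
    }

    boolean isFinished() {
        return pending.isEmpty();
    }

    /** Advance the clock to the earliest pending action and trigger it. */
    void triggerNext() {
        Entry next = pending.poll(); // callers check isFinished() first
        clock = next.time;
        next.action.fire(this, clock);
    }

    /** A step action that reschedules itself every dt until the end time. */
    static Action stepAction(double dt, double end, List<Double> log) {
        return new Action() {
            @Override
            public void fire(SketchSchedule schedule, double now) {
                log.add(now); // stand-in for the real per-step work
                if (now + dt <= end) {
                    schedule.add(now + dt, this);
                }
            }
        };
    }
}
```

A discrete time step model is then obtained by adding a single stepAction at
time zero and draining the schedule until isFinished() returns true.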
Modeling Subsystem
The Modeling subsystem provides the main components for constructing a
simulated world. The spatial and temporal structure of the world is defined by the
specific choice of object modules. The primary high-level object is the World,
which organizes the model components into collections of landscape objects
and individual agent objects. The landscape collection is made up of Layers
representing various attributes of the study area's extent. Layers are typically
represented using a Grid, though other spatial representations are possible.
Figure 3 illustrates the relationship between the World, Layer, and Grid objects
along with many of the methods attached to each object. Of particular relevance
to this work are methods such as getProbe() attached to the Layer objects and
getValueAt() attached to the Grid object. Grid is a specialized subclass of
Layer, and a World object is composed of one or more Layer objects.
Figure 3. World, Layer, and Grid object classes, components, relationships
(Note that BoundaryTopology is an abstract class that defines how Locations
outside the physical boundary of a Grid behave topologically by throwing
an exception or logically remapping the Location into the physical extent
of the Grid.)
(Adapted from Graniero, 2001)
Although Grids can be generated and their grid cell values populated entirely
within the simulation, Grids can also reference an external, abstracted GridSource
to set the grid geometry and populate the grid cell values. For example, an
EsriAsciiGridSource would import data layers exported from the ArcGIS
GRID module (McCoy & Johnston, 2001), or other GridSource specializations
might directly read and write native GIS formats.
Each Layer can have a StepRule that, when triggered by the Schedule, can
calculate a new state for each grid cell based on the current state of the cell and
its neighbors, as well as the state in other Layers at the corresponding location.
This allows the landscape to evolve following ecological processes operating in
the simulated ecosystem.
The individual agent collection is organized into one or more Populations,
each of which contains zero or more Agents. A Population is used to group
Agents that share common traits, with a separate Population for each type of
Agent. Populations can also be used to organize Agents that are of similar
type, but in different fundamental states. In addition, population-level monitoring
is useful for controlling the simulation Schedule. For example, it may be used to
add a TerminateAction when all Agents are in "dead" or "home" Populations,
and there are no Agents left in the "active" Population.
An Agent is a model component that operates autonomously, located on the
landscape and obtaining information about other agents or the local landscape in
order to make decisions about changes in its own state, movement on the
landscape, or changes to the local state of one or more landscape Layers.
Access to information about other model components is controlled by Probe
objects described below. All Agent specializations share a similar data-access
and processing structure but differ in the specific details of their information-
processing and decision-making algorithms. Such differences are what can
evoke important differences in behavior across Agent types. Each individual
instance of a particular type of Agent shares the same decision-making
algorithm. Variations in individual responses are easily achieved by using
different values for fundamental parameters or by using different information-
gathering filters that modify the individuals' perceptions of their surroundings.
Instrumentation Subsystem
The Instrumentation subsystem provides the information-access structures that
allow model components to discover the state of other components in a controlled
and safe fashion, ensuring the consistency and integrity of the source databases
and the model's overall operating state. The ability to collect data from the
running model is made possible by the Probe/Probeable interface mechanism.
Many of the objects in the Modeling subsystem implement the Probeable
interface as well as fulfill their own modeling functions. Probes can only be
created by Probeable objects; a request is made to the target Probeable object
via its getProbe() method, specifying the desired type of Probe using a keyword.
Each type of Probe is designed to query a specific aspect of the Probeable
object's state. Whenever the Probe's probe() method is invoked (e.g., by a
ProbeCommand on the Schedule, or by an Agent requiring current information
about another object), the Probeable's appropriate private data access
method or database query is invoked. As an example, in order to access the data
within a Grid (which is a Probeable object), the client object must call the
Grid's getProbe() method, and the Grid will return an appropriate Probe
object. When that Probe's probe() method is invoked, it will invoke its target
Grid's getValueAt() method using the Probe's current Location as a parameter.
The Grid returns its state at that Location to the Probe, which in turn
passes the result to the object using the Probe. Using
this structure, a Probeable object only exposes attributes that are deemed
public knowledge to external objects. In order to keep other attributes
inaccessible, it does not distribute Probe objects that expose those attributes. At
the same time, the Probeable object keeps the access mechanism for those
attributes hidden from public knowledge. All Probes simply respond to a
probe() method, and what happens within that method is kept opaque to the user.
This allows database sources, implementations, or architectures to change with
no effect on the other model components. All Probes create read-only
mechanisms; external objects never have direct access to the Probeable object's
state, which means that they cannot accidentally change the object due to
programming errors.
Figure 4. Structure of the Probe, Probeable, and ProbeWrapper relationship
(Adapted from Graniero, 2001)
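The Probe/Probeable contract can be illustrated with a short Java sketch. This
is a simplified reading of the mechanism described above, not the ECO-COSM
source: the type names SketchProbe, SketchProbeable, SketchGrid, and SpotProbe
are hypothetical, a Location is reduced to a row and column, and only the
"spot" keyword is recognized.

```java
/** Read-only query handle; clients only ever call probe(). */
interface SketchProbe {
    Object probe();
}

/** An object that hands out probes for the attributes it chooses to expose. */
interface SketchProbeable {
    SketchProbe getProbe(String keyword);
}

/** Minimal grid whose cell values are reachable only through a probe. */
final class SketchGrid implements SketchProbeable {
    private final double[][] cells;

    SketchGrid(double[][] cells) {
        this.cells = cells;
    }

    /** Private accessor: external objects cannot call this directly. */
    private double getValueAt(int row, int col) {
        return cells[row][col];
    }

    /** Probe that reads a single cell at its current location. */
    final class SpotProbe implements SketchProbe {
        private int row;
        private int col;

        void moveTo(int row, int col) {
            this.row = row;
            this.col = col;
        }

        @Override
        public Object probe() {
            // Delegates to the grid's private accessor; there is no write path.
            return getValueAt(row, col);
        }
    }

    @Override
    public SketchProbe getProbe(String keyword) {
        if ("spot".equals(keyword)) {
            return new SpotProbe();
        }
        throw new IllegalArgumentException("no probe for keyword: " + keyword);
    }
}
```

Because the client holds only a probe, the grid could later read from a file
or a database without any change to client code.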
ProbeWrappers extend the power of the Probe mechanism. A ProbeWrapper
is a specialized Probe that has another Probe embedded within it (Figure 4). A
ProbeWrapper is used to modify the pure result retrieved from a Probeable
object in some way (Figure 5). For example, the land-cover type observed at a
distance may be subject to random misclassification due to limits of perceptual
range. Alternatively, the state's description scheme may be modified to suit the
purpose of the observer: the grid cell may be described as "mature oak" in the
land-cover Layer, but the observing Agent may perceive it as "suitable location
for inhabiting."
Because ProbeWrappers are also Probes, an object (such as an Agent) can
use either pure Probes or Probes that are modified by ProbeWrappers
transparently, with no knowledge of the difference. By wrapping Probes in
slightly different ways for different individual Agents of a common type, it is
possible for the modeler to introduce variation in an individual's ability to perceive
the world, while using the same basic decision-making process. ProbeWrappers
may be nested as deeply as desired, so highly sophisticated perceptual filters
may be constructed. In addition, some specialized ProbeWrapper objects can
take the results of many nested Probes and combine their results in some
fashion, for example, returning the land-cover class that appears in the majority
of grid cells in a 5 × 5 window centered on the Probe's Location. In this way,
it is possible to create views of the modeled world and its components at different
scales of observation, yet treat them all in the decision-making process as
identical, localized observations.
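A ProbeWrapper of the kind described here might be sketched as follows. The
fragment is illustrative only and makes several assumptions: probes return a
plain double, land-cover classes are encoded as numeric codes, and the
"majority" filter returns the most frequent value among its embedded probes.
The names ValueProbe, TransformWrapper, and MajorityWrapper are hypothetical
and do not come from ECO-COSM.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

/** Simplified probe that returns a numeric observation. */
interface ValueProbe {
    double probe();
}

/** A wrapper is itself a probe: it returns F(x) for the embedded probe's x. */
final class TransformWrapper implements ValueProbe {
    private final ValueProbe inner;
    private final DoubleUnaryOperator transform;

    TransformWrapper(ValueProbe inner, DoubleUnaryOperator transform) {
        this.inner = inner;
        this.transform = transform;
    }

    @Override
    public double probe() {
        return transform.applyAsDouble(inner.probe());
    }
}

/** Combines many embedded probes by returning the most frequent value. */
final class MajorityWrapper implements ValueProbe {
    private final List<ValueProbe> inners;

    MajorityWrapper(List<ValueProbe> inners) {
        this.inners = inners;
    }

    @Override
    public double probe() {
        Map<Double, Integer> counts = new HashMap<>();
        double best = Double.NaN; // returned when there are no embedded probes
        int bestCount = 0;
        for (ValueProbe p : inners) {
            double value = p.probe();
            int count = counts.merge(value, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = value;
            }
        }
        return best;
    }
}
```

Since both wrappers implement ValueProbe, they can be nested to any depth and
handed to an Agent in place of a pure probe.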
The Instrumentation subsystem also allows the modeler to instrument the
operating simulation model in order to monitor the model's evolution and collect
data for later analysis. A Sampler is made up of a set of one or more Probes
that perform the actual queries about system state. The Sampler will typically
take the Probe results and format them in an organized fashion for output to a
file on disk, or for periodic output to the computer console to inform the user on
progress. Data files produced by a Sampler may be used in other separate
analysis programs to generate summary statistics from a large number of model
runs.
Figure 5. When the client object invokes the Probe's (in this case a
ProbeWrapper) probe() method, the call passes through to the embedded
Probe. The Probeable object returns the state value x to the Probe, which
passes the value on to the ProbeWrapper. The ProbeWrapper transforms
the value by some function F(x), and returns the transformed value to the
client.
The Simulation object acts as the core engine of the simulation model. It
manages the interaction of the components in the three subsystems. The setup()
method structures the simulation appropriately for the desired model, attaches
any instrumentation desired, and acquires any necessary memory or file re-
sources required for the model. The run() method is simple: until the Schedule
is finished, it will trigger the next pending item on the Schedule. The teardown()
method releases any memory or file resources and gets ready for program
termination. The Simulation object may be instantiated and executed as an
independent, stand-alone program. It can also act as a pure object that is
contained in a larger program, such as a simulation experiment that executes
many instances of the Simulation object, each of which has slight variations in
its selection and configuration of model components.
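The setup/run/teardown life cycle can be summarized in a few lines of Java.
The sketch below is a hedged illustration, not the ECO-COSM Simulation class:
SimulationSketch and its members are hypothetical, the schedule is reduced to
a queue of Runnable items, and "resources" are represented by a single flag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of a simulation object with a setup/run/teardown life cycle. */
final class SimulationSketch {
    private final Deque<Runnable> schedule = new ArrayDeque<>();
    private boolean resourcesHeld = false;
    int executed = 0;

    /** Structure the model and acquire resources (here: queue some steps). */
    void setup(int steps) {
        resourcesHeld = true;
        for (int i = 0; i < steps; i++) {
            schedule.add(() -> executed++);
        }
    }

    /** Until the schedule is finished, trigger the next pending item. */
    void run() {
        while (!schedule.isEmpty()) {
            schedule.poll().run();
        }
    }

    /** Release resources and get ready for termination. */
    void teardown() {
        schedule.clear();
        resourcesHeld = false;
    }

    boolean holdsResources() {
        return resourcesHeld;
    }

    /** Run one complete simulation; teardown happens even if run() fails. */
    static int execute(int steps) {
        SimulationSketch sim = new SimulationSketch();
        sim.setup(steps);
        try {
            sim.run();
        } finally {
            sim.teardown();
        }
        return sim.executed;
    }
}
```

A simulation experiment would call execute() repeatedly, varying the
configuration passed to setup() on each run.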
Fuzzy Spatial Relations and Database Query
A crucial concept implemented in many spatially explicit IBMs is the perceptual
range of individuals. In our application domain, an Agent's perceptual range
represents its informational window on its surroundings. It determines how much
of the surrounding area an individual can perceive in terms of habitat quality and
presence of conspecifics. Thus, the perceptual range of an agent is equivalent
to specifying a fuzzy spatial relation that constrains the Agent's view of the data
to a particular fuzzy region.
Let $X = \{x\}$ be a finite set of locations bounded by the limits of the study
area. Let $d_x^c$ be the Euclidean distance from the location of the dispersing
animal object, c, to location x. P(x) is the fuzzy set defining the perceptual
range for a single individual. Thus, the support of P is ${}^{0+}P$ and can be
used to limit the extent of data operated on or retrieved from the spatial
database used to support the model.
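$$
\mu_P(x) = P(x; \alpha, \beta) =
\begin{cases}
1 & \text{if } d_x^c \le \alpha \\
1 - \beta\,(d_x^c - \alpha) & \text{if } \alpha < d_x^c < \alpha + 1/\beta \\
0 & \text{if } d_x^c \ge \alpha + 1/\beta
\end{cases}
\qquad (1)
$$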
Using lyrPerceptualRange.setValueAt(), the Layer object lyrPerceptualRange
is populated by membership values based on Equation (1). The support of P is
${}^{0+}P$ and is defined by regPerceptualRange as the set of locations for Agent,
where lyrPerceptualRange.valueAt() > 0. The statement
regPerceptualRange = FuzzySpatialOp.support( lyrPerceptualRange )
creates a Region object named regPerceptualRange that contains only those
locations where fuzzy membership in lyrPerceptualRange > 0.0. This
regPerceptualRange is what is referenced by the Agent as its individual
perceptual range at that particular location at that time step in the simulation.
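The construction of the perceptual range and its support can be sketched in
Java as follows. The fragment assumes a piecewise-linear reading of Equation
(1): full membership within distance alpha, a linear decline with slope beta,
and zero membership at alpha + 1/beta and beyond. It also assumes a square
grid with a uniform cell size. The class PerceptualRangeSketch and its methods
are hypothetical and do not reproduce FuzzySpatialOp.support().

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a distance-based perceptual range and its support. */
final class PerceptualRangeSketch {

    /** Membership in P for distance d, under the assumed piecewise-linear form. */
    static double membership(double d, double alpha, double beta) {
        if (d <= alpha) {
            return 1.0;
        }
        if (d >= alpha + 1.0 / beta) {
            return 0.0;
        }
        return 1.0 - beta * (d - alpha);
    }

    /**
     * Cells {row, col} with membership greater than zero, i.e. the support
     * of P around the agent located at (cRow, cCol).
     */
    static List<int[]> support(int rows, int cols, int cRow, int cCol,
                               double cellSize, double alpha, double beta) {
        List<int[]> region = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double d = cellSize * Math.hypot(r - cRow, c - cCol);
                if (membership(d, alpha, beta) > 0.0) {
                    region.add(new int[] {r, c});
                }
            }
        }
        return region;
    }
}
```

All later processing can then iterate over the returned list instead of the
whole grid, which is the point of restricting work to the support.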
Using regPerceptualRange to specify the Region defined by ${}^{0+}P$, the support
of the perceptual range P for a particular Agent, and then limiting all further
processing and decision-making to regPerceptualRange provides three
benefits:
1. Semantics: We effectively shrink the simulated world to align with the
Agent's entire perceptual world for the duration of that Agent's processing
and decision making. Although the World's extent may be larger than
that of the defined Region, the Agent has no way to access it without
changing its Location.
2. Performance: We ensure that we only iterate over layer locations that
require processing. This saves unnecessary processing in zero locations,
hence, boosting computational performance. In the illustrated case of a
squirrel dispersing within a National Recreation Area, this can be a
significant savings.
3. Object-oriented design integrity: By creating an object that defines the
processing region and controls access to that region, we guarantee that
other client objects do not accidentally process inappropriate locations.
Control over the processing region is handled by one object (namely, the
regPerceptualRange Region), whereas control over processing behavior
is handled by another object (namely, the CompSquirrel Agent). This
enforces clear lines of responsibility within the object model. Furthermore,
the method of determining the processing region can be modified by
changing the code that creates the region. This code is isolated from the
processing steps, which means that we can reduce the likelihood of
introducing erroneous programming artifacts (thereby increasing confi-
dence in model results), and it becomes easy to make variants in perceptual
definition for different agents. To do so, encapsulate the region definition
code in an interchangeable object, and the rest of the model is left
unchanged.
Once the perceptual range over an assumed flat surface, i.e., P, is specified, the
next step is to determine to what degree each location is within the visible
perceptual range. In other words, the influence of local topography is taken into
consideration. Let $L: X \to [0,1]$ be the fuzzy set describing the degree to which
location x is visible from a particular squirrel. The membership function $\mu_L$ is
defined by Equation (2) as a closed-form triangular function, where $los_x^c$ is the
angle at which location x is visible from location c. It is based on the output style
of GRASS GIS (Neteler & Mitasova, 2002), where 90° is looking straight ahead,
below the line of sight is less than 90°, and above the line of sight is greater than
90°. If the local terrain creates a physical obstruction to visibility between c and
x, then L = 0.
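$$
\mu_L(x) = L(x; \alpha, \beta, \gamma) =
\max\left(\min\left(\frac{los_x^c - \alpha}{\beta - \alpha},\;
\frac{\gamma - los_x^c}{\gamma - \beta}\right),\; 0\right)
\qquad (2)
$$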
The degree to which a cell is both visible and falls within the perceptual range
is defined by the intersection $P \cap L$. This operation takes into account the
level plain perceptual distance and the potential effect topography may have on
the ability of an object to perceive a location. To make it an efficient process,
we need only calculate the value of L for the locations that fall in ${}^{0+}P$;
thus, ${}^{0+}P$ defines the spatial extent over which information from the spatial
database is extracted and utilized by the individual agent. In the code for
defining an Agent, the statement
spots = regPerceptualRange.getAllLocations();
in effect limits the calculation of L to those locations x (held in spots) that fall
within the set ${}^{0+}P$. Subsequently, the membership values in lyrPerceptualRange and
lyrVisibility are combined using an aggregation operator to arrive at a spatial
object, lyrVisiblePerceptual, which is referenced by an Agent as its individual
visible perceptual range at that particular location at that time step in the
simulation.
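The combination of perceptual range and visibility can be sketched in Java.
This is an illustration under explicit assumptions: the visibility membership
uses the standard closed-form triangular function with feet a and c and peak
b, the aggregation operator is taken to be the minimum (the chapter does not
name the operator used), and locations are flattened into arrays. The class
VisiblePerceptualSketch and the angle values used when testing it are
hypothetical.

```java
/** Sketch: visibility evaluated only inside the support of the range. */
final class VisiblePerceptualSketch {

    /** Standard triangular membership with feet a, c and peak b (a < b < c). */
    static double triangular(double x, double a, double b, double c) {
        return Math.max(Math.min((x - a) / (b - a), (c - x) / (c - b)), 0.0);
    }

    /**
     * Combine range and visibility memberships per location.
     * Locations with zero range membership are skipped, so the
     * line-of-sight angle is never evaluated outside the support.
     */
    static double[] visiblePerceptual(double[] range, double[] losAngle,
                                      boolean[] blocked,
                                      double a, double b, double c) {
        double[] out = new double[range.length]; // defaults to 0.0
        for (int i = 0; i < range.length; i++) {
            if (range[i] <= 0.0) {
                continue; // outside the support of the perceptual range
            }
            double visibility = blocked[i] ? 0.0 : triangular(losAngle[i], a, b, c);
            out[i] = Math.min(range[i], visibility); // assumed aggregation: min
        }
        return out;
    }
}
```

Swapping Math.min for another aggregation operator changes only one line,
which mirrors the separation between region definition and processing
described above.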
Representing and Processing Fuzzy Geographic Data
It is almost unheard of for spatially explicit ecological models to use GIS data that
are represented as fuzzy data in a fuzzy object-oriented database. In fact, there
are no practical cases known to us. Therefore, in our illustrative example, we will
first show how a World object is derived from an open-source GIS in such a way
that the Probes associated with an Agent are able to retrieve fuzzy information
from nonfuzzy database representations. Then we will discuss how this ap-
proach can be extended to account for other fuzzy representations that may be
relevant to this modeling domain.
The use of the Grid as a representation framework allows for a straightforward
interface to most common raster-based GIS databases. Figure 6 shows how the
specialized GridSource, called GrassAsciiGridSource, is used to create a
World object from GIS data layers stored in a GRASS GIS. The question arises
at this point whether we extract the raw, nonfuzzy, GIS data from GRASS and
manipulate it with ECO-COSM to fuzzify it in a manner meaningful to this
particular problem, or we preprocess the raw, nonfuzzy GIS data to produce
fuzzified GIS data that is then integrated as Layer objects into ECO-COSM. We
will first present the latter, as it was already demonstrated (Robinson & Graniero,
in press), and then we will discuss how the former may be implemented. For
illustrative purposes, let us consider a key component of the residence decision
model goal set. In our formulation, the quality of the habitat (LC) at the Agent's
location and the area of the habitat patch (HA) are combined to define the goal
set $H = LC \cap HA$ (Robinson & Graniero, in press). Typically, habitat quality is
inferred from a layer where each grid cell is classified as particular land-cover
type. Taking from our previous study, the degree to which a land-cover type is
considered quality habitat for a gray squirrel is summarized in Equation (3). For
simplicity, the layers of land-cover type in the GRASS GIS were processed so
that each grid cell was coded with its membership value according to Equation
(3) before the data are loaded into the model.
prbLCHabitat = (SpatialProbe) world.getLayerProbe( "lchabitat", "spot" );
prbForestArea = (SpatialProbe) world.getLayerProbe( "forarea", "spot" );

Figure 6. Code example extracted from an Agent constructor, showing how
Probes are retrieved from the modeled World (The notation "spot" indicates
that the probe should access a single Location (i.e., grid cell), as opposed
to a moving window or other spatial construct.)

$$
\mu_{LC}(x) = LC(x) =
\begin{cases}
1.0 & \text{if oak\_forest} \\
0.9 & \text{if oak/deciduous\_bottomland} \\
0.75 & \text{if deciduous\_forest} \\
0.0 & \text{if conifer\_forest} \\
0.0 & \text{if early\_successional\_deciduous\_forest} \\
0.0 & \text{if wetland, pasture, grassland, agriculture} \\
0.0 & \text{if water}
\end{cases}
\qquad (3)
$$
When the Agent must assess the habitat quality, it does so by requesting the
habitat quality membership value from its corresponding Probe, which acts as
its sensory interaction with the surrounding environment. Operationally, the
Probe queries the spatial database for the habitat quality membership value at
the Agent's current location and returns that membership value to the Agent.
In addition to land cover, we use the size of an oak/deciduous forest patch as an
important factor in the residence decision. In Equation (4), we define a fuzzy set,
HA, to express the degree to which a location falls within the class of
minimum_habitat_area. The setting of the parameters $\alpha_{HA}$ and $\beta_{HA}$
will vary depending on the species being modeled. The area measurement is based
on the sizes of patches formed from contiguous cells that were classified as oak,
deciduous, or oak/deciduous bottomland. Let farea(x) be the area in hectares of
the oak/deciduous forest patch within which location x falls.
Cognitively, the Agent is assessing the size of the oak/deciduous forest patch;
operationally, it is calculating a new fuzzy membership based on forest patch
sizes encoded in a raster, which resulted from a "clumping" operation on the
same land-cover raster used for evaluating habitat quality. The minimum area
Probe accesses the value of the forest patch grid cell corresponding to the
Agent's location and returns the value to the Agent, which then calculates the
fuzzy membership according to Equation (4).
Thus, each Agent has a number of SpatialProbes, that is, Probes that can each
be directed to a specified Location on a target Layer in order to collect
information from that specific Layer. Figure 6 shows how an Agent gets the
SpatialProbe prbLCHabitat for the Probeable Layer "lchabitat", which
corresponds to LC above, and the SpatialProbe prbForestArea for the
Probeable Layer "forarea", which corresponds to farea() in Equation (4).

$$
\mu_{HA}(x) = HA(x) =
\max\left(0,\; \min\left(1,\;
\frac{farea(x) - \alpha_{HA}}{\beta_{HA} - \alpha_{HA}}\right)\right)
\qquad (4)
$$
Recall from above that the land-cover type layer was preprocessed by the GIS;
lchabitat contains the fuzzy membership value, not the actual land-cover type.
This means that the SpatialProbe retrieves the fuzzy membership value and
passes it to Agent without any intermediate processing. Notice that in the case
of forarea, the Agent must do additional processing on the Probe's result
before forming the goal set, as shown in Figure 7. In contrast to the preprocessed
fuzziness for LC, HA is fuzzified after crisp data are queried from the database.
In the earlier description of how an Agent uses a Probe to assess the local
habitat suitability, the entire land-cover raster was preprocessed according to
Equation (3), and the Probe accessed the grid cell values in the transformed
raster. This approach requires that each grid cell be converted only once rather
than every time the grid cell is considered by an Agent, thus streamlining the
computation. However, this restricts the flexibility for more sophisticated IBM
models, because it presumes that all Agents in the system perceive the habitat
quality of a particular land-cover type in the same way.
Different animal species, and perhaps even different individuals of the same
species, may map land-cover classes to slightly different membership values.
This necessitates the calculation of separate rasters for each remap equation,
which creates a much larger database. It also creates risk for database integrity
should the original land-cover map change and the remapped rasters not be
updated accordingly. Also, consider the case of a more intelligent agent that
evolves its perception of habitat quality as it gains experience over its lifetime.
Each change to the remap equation, i.e., each evolution in the Agent's
perception, would require a recalculation of its corresponding habitat quality
raster, increasing the computational burden for the model.
// Fuzzify the crisp patch area returned by the Probe, following Equation (4).
_HabitatArea = Math.max(
    0,
    Math.min(
        1,
        (((Number)(prbForestArea.probe())).doubleValue() - _HAalpha) / (_HAbeta - _HAalpha)
    )
);
// The lchabitat Layer already stores memberships, so its Probe result is used as is.
HabitatGoal = FuzzyOp.compensatoryIntersection(
    ((Number)(prbLCHabitat.probe())).doubleValue(),
    _HabitatArea
);

Figure 7. Code example of Probes being used in the formulation of the
residence model goal set (The code operates within the Agent's model
logic, perceiving its environment via the probe() methods of its associated
Probes.)
The ProbeWrapper provides the key mechanism with which to avoid these
problems. Recall that every instance of a ProbeWrapper implementation has
another Probe (possibly a ProbeWrapper) embedded within it. When the
ProbeWrapper's probe() method is invoked, it, in turn, invokes the embedded
Probe's probe() method. When it receives the embedded Probe's result, the
ProbeWrapper may perform any kind of operation on it before passing it on as
its own result.
As such, the remap equation can be embedded within a habitat quality
ProbeWrapper that contains the following:
1. A Probe to access the land-cover database
2. Program logic to transform the land-cover query result according to the
remap table
3. An association with an Agent to receive and act on the result
Each Agent can be assigned a customized ProbeWrapper with a slightly
different remap table. For the case of minimum habitat area, a minimum habitat
area ProbeWrapper would contain the following:
1. A Probe to access the patch area database
2. Program logic that applies Equation (4) with the ProbeWrapper's particular
parameters
3. An association with an Agent to receive and act on the result
The transformation code shown in Figure 7 moves out of the Agent and into its
minimum habitat area ProbeWrapper.
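The two wrappers just listed can be sketched in Java. The fragment is a
minimal illustration rather than ECO-COSM code: java.util.function.Supplier
stands in for the Probe interface, land-cover classes are plain strings, and a
class missing from the remap table is assumed to have zero membership. The
class names HabitatQualityWrapper and MinHabitatAreaWrapper are hypothetical.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Remaps a crisp land-cover class to an agent-specific habitat membership. */
final class HabitatQualityWrapper implements Supplier<Double> {
    private final Supplier<String> landCoverProbe; // embedded crisp probe
    private final Map<String, Double> remapTable;  // per-agent remap table

    HabitatQualityWrapper(Supplier<String> landCoverProbe,
                          Map<String, Double> remapTable) {
        this.landCoverProbe = landCoverProbe;
        this.remapTable = remapTable;
    }

    @Override
    public Double get() {
        // Assumption: classes absent from the table have membership 0.0.
        return remapTable.getOrDefault(landCoverProbe.get(), 0.0);
    }
}

/** Applies Equation (4) to a crisp patch area with per-wrapper parameters. */
final class MinHabitatAreaWrapper implements Supplier<Double> {
    private final Supplier<Double> areaProbe; // embedded crisp probe (hectares)
    private final double alpha;
    private final double beta;

    MinHabitatAreaWrapper(Supplier<Double> areaProbe, double alpha, double beta) {
        this.areaProbe = areaProbe;
        this.alpha = alpha;
        this.beta = beta;
    }

    @Override
    public Double get() {
        double area = areaProbe.get();
        return Math.max(0.0, Math.min(1.0, (area - alpha) / (beta - alpha)));
    }
}
```

Two Agents can share one land-cover probe while holding wrappers with
different remap tables, so neither the raster nor the Agent code needs to
change when a perception scheme is adjusted.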
By using the ProbeWrapper approach, the Agent "directly" perceives the
habitat quality of its current position according to its own value scheme, and all
model logic occurs within the universe of discourse defined in the fuzzy problem
domain. The Probe handles the mechanics of accessing the spatial database,
thereby insulating the Agent's model logic from database-dependent programming
issues. The ProbeWrapper takes the query result from the Probe and
independently manages the transformation from the GIS's relatively
application-neutral, crisp land-cover scheme to the Agent's application-specific,
fuzzy perception of habitat quality. All Agents may access a single, shared
land-cover raster, and they may modify their perceptions of habitat quality at any
time, with no risk of compromising the database integrity or the behavioral
integrity of other Agents.
There are many other ways in which the ProbeWrapper structure may be used
to control fuzzification of a spatial database. To illustrate, take an example based
on an early work demonstrating the use of fuzzy sets in the query of land-cover
databases (Robinson, 1988). Rather than simply retrieving a membership value
that was assigned to a land-cover class that represents the land-cover class's
degree of membership in the habitat set, let there be a similarity relation between
the land-cover types. This similarity relation can be a function of the degree of
confidence, or accuracy, felt to be likely at the location. If we assume that a
location has deciduous forest, then we would expect the similarity to be greater
with other forest types, especially oak. Consequently, the ProbeWrapper may
take this knowledge into consideration by assigning a final membership in the set
habitat based on a combination of the land-cover type at a location and its
similarity relation with other land-cover types. A more realistic approach would
be to take into consideration the surrounding cells as an additional information
channel to inform the ProbeWrapper how similar the location is to surrounding
locations. This can provide additional information to be used to estimate how well
the location fits in habitat. For example, a deciduous forest cell surrounded by
water, i.e., an island, would be poor habitat, whereas a deciduous forest cell
surrounded by deciduous forest might be judged high-quality habitat.
As another example, it is well known that no land-cover database is error free.
One long-standing problem has been the mixed pixel problem, where one grid cell
may have more than one land cover present but be forced by classification
methods to be classified as being in a single type of land cover (Robinson &
Thongs, 1986). The inherently fuzzy nature of land-cover classifications was
discussed by many researchers (Robinson & Frank, 1985; Robinson, 2002;
Matsakis et al., 2000; Cross & Firat, 2000; Hagen, 2003; Foody, 1996; Zhang &
Stuart, 2001). In ECO-COSM, ProbeWrappers can be used to implement an
information-processing function that applies a mixed pixel model to the underly-
ing land-cover data, allowing the Agent to evaluate how closely its current
location conforms to a particular land-cover type.
Because land-cover classification is accomplished using remote sensing or other
classification methods that can incorporate fuzziness, the process can be used
to generate fuzzy geographical objects (Matsakis et al., 2000; Cross & Firat,
2000; Foody, 1996). In a simple case that is analogous to the Semantic Import
model (Robinson, 1988), each cell would have a vector of membership values
indicating the degree to which it belonged to a particular land-cover type. Thus,
a ProbeWrapper can use a Probe to access that information and process it
before passing it to the Agent. For example, a vector might look like {0.8, 0.75,
0.66, 0.3, 0.2, 0.2, 0.0}. Now, what information is passed to the Agent? Perhaps
the whole vector is passed, which means that the Agent would need to have a
method of combining it with the function that determines how well the location
fits the set habitat. Notice in Equation (3) that each land-cover type is associated
with a membership in LC, and that the vector {0.8, 0.75, 0.66, 0.3, 0.2, 0.2, 0.0},
associated with a single grid cell, provides information on the degree to which
that grid cell belongs to each land-cover class. Let $\mu_{LC}^{k}(x)$ be the
membership in LC of land-cover type k, while $\mu_{GIS}^{k}(x)$ is the membership value
of grid cell x in land cover k. Thus, we have two vectors, LC and GIS:

$$
LC = \begin{pmatrix} \mu_{LC}^{1} \\ \mu_{LC}^{2} \\ \vdots \\ \mu_{LC}^{k} \end{pmatrix},
\qquad
GIS = \begin{pmatrix} \mu_{GIS}^{1} \\ \mu_{GIS}^{2} \\ \vdots \\ \mu_{GIS}^{k} \end{pmatrix}
$$

that can be used to arrive at:

$$
\min(LC, GIS) = \begin{pmatrix} \mu_{LC,GIS}^{1} \\ \mu_{LC,GIS}^{2} \\ \vdots \\ \mu_{LC,GIS}^{k} \end{pmatrix},
\qquad
\mu_{LC,GIS}^{i} = \min\left(\mu_{LC}^{i}, \mu_{GIS}^{i}\right).
$$
Then take the maximum value from this vector to represent the degree to which
the grid cell falls in the set habitat. This formulation can be computed quickly
by a ProbeWrapper, and no changes would be required in the
Agent code. In this manner, the Agent sees only the information presented
to it by the ProbeWrapper; it focuses strictly on the behavioral elements of
the model and leaves the retrieval or derivation of the fuzzy value to the
ProbeWrapper. Thus, with this simple example, we have illustrated how fuzziness
can be represented in a geographic database in two different ways and be used
by a ProbeWrapper to deliver meaningful fuzzy information to an Agent, with
no need to adjust the decision model of the Agent.
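The min-max combination just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic a ProbeWrapper might perform, not ECO-COSM code: the function name habitat_membership is hypothetical, and the LC membership values below are invented for the example (only the GIS vector comes from the text).

```python
def habitat_membership(lc_membership, gis_membership):
    """Combine two membership vectors by element-wise minimum, then take the maximum.

    lc_membership[k]  : membership of land-cover type k in the set LC
    gis_membership[k] : membership of the grid cell in land-cover type k
    """
    if len(lc_membership) != len(gis_membership):
        raise ValueError("both vectors must cover the same land-cover types")
    combined = [min(lc, gis) for lc, gis in zip(lc_membership, gis_membership)]
    return max(combined)


# Cell vector from the text; the LC vector is a made-up example.
gis = [0.8, 0.75, 0.66, 0.3, 0.2, 0.2, 0.0]
lc = [0.1, 1.0, 0.9, 0.5, 0.0, 0.0, 0.0]

print(habitat_membership(lc, gis))  # 0.75
```

Because the combination happens before anything reaches the Agent, the Agent receives a single membership value, exactly as it would from a crisp land-cover layer.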
The other major informational component of the habitat portion of the residence
decision model is membership in HA, the minimum habitat area. It is a function
of the area of the forest patch. The forest patch is defined in a raster GIS as a
collection of grid cells contiguous with one another and of the same type. In a
vector representation, it would be a polygon. One approach is to represent a
fuzzy region, A, as composed of three parts: the core, the indeterminate
boundary, and the exterior. The indeterminate edge can further be decomposed
into the inside edge and the outside edge. If Z is a referential set of a finite number
of attributes and region A is a fuzzy subset defined in a two-dimensional space
$\mathbb{R}^2$ over Z, the membership function of A can be defined as
$\mu_A : X \times Y \times Z \rightarrow [0,1]$.
Each point is assigned a membership value for attribute z, where $z \in Z$ (Zhan,
1998). This suggests several possible approaches to representing forest area
patches in this problem domain. In the current illustrative example, the forest
patches are determined according to a crisp membership rule of adjacency, and
then the area is calculated, followed by calculation of HA for each grid cell. This
means that the Layer forarea is composed of grid cells, each of which is coded
with membership values that are a function of the area of the patch in which it
belongs. However, if forest patches are fuzzy regions, then this simplistic
approach would need to be changed. Because the grid cell is the atomic spatial
element in our GIS database, the upshot of this approach would be that each cell
(i.e., location) could be a member of more than one patch. In other words, a patch
object may share a location (cell) with another patch object. This problem has
received some attention in the fuzzy database community (Yazici & Akkaya, 2000; Cross &
Firat, 2000; Cheng et al., 2002; Robinson, 2000; Bordogna & Chiesa, 2003). Of
course, this implies that when estimating the area of a patch for habitat selection
purposes, a location (cell) will contribute to the area of more than one patch.
Hence, fuzzy set theory effectively expands the conventional assumptions
regarding the total area extent of thematic map classes used in nonfuzzy
geographic databases (Ricotta & Avena, 1999). Due to this characteristic of
fuzzy regions, a number of approaches were suggested for estimating the area
of a fuzzy region (Ricotta & Avena, 1999; Schneider, 2001; Yuan & Shen, 2001).
There has been some work on modeling fuzzy regions that exploits the concept
of the α-cut, some of which is explicitly linked to the query process (Morris, 2003;
Zhan, 1998; Schneider, 2001; Schneider, 2000). Previous work suggests that the
area of a fuzzy region might be computed as a weighted sum of the areas of all
α-level regions (Schneider, 2001). Consider that if $\tilde{F}$ is a fuzzy region, i.e., a
forest patch, and consists of a finite collection $\{F_{\alpha_1}, \ldots, F_{\alpha_n}\}$ of crisp α-level
regions, then the area of $\tilde{F}$ can be computed as in Equation (5):

$$
area(\tilde{F}) = \sum_{i=1}^{n} \alpha_i \iint_{(x,y) \in F_{\alpha_i}} dx\,dy = \sum_{i=1}^{n} \alpha_i \, area(F_{\alpha_i}) \qquad (5)
$$

In this case, $area(\tilde{F})$ is a real number that could be used in Equation (4),
corresponding to farea(). There is a problem with this straightforward linkage,
because it is entirely possible, given the nature of fuzzy region objects, that a
single cell will be associated with more than one fuzzy region with a membership
level greater than 0.0. In such a case, a simple rule can be used such that $area(\tilde{F})$
is calculated for the fuzzy region that bestows the highest membership value on
the cell.
An Agent obtains information about the area of the forest patch through the Probe
prbForestArea that samples the Layer forarea that contains the value of
farea(). Likewise, it is possible to construct a Layer forarea that would be the
value of $area(\tilde{F})$ for that region to which the cell belongs to the greatest degree.
Using Probes and ProbeWrappers, it would be possible to develop a multi-
Layer, multi-Probe approach so that all degrees of membership could be seen
by the Agent. This would necessitate the management of multiple Probes by a
ProbeWrapper. It is possible that the ProbeWrapper might then combine that
information before passing it to the Agent, which would still rely on something
like Equation (4) in its decision-making model. Thus, the decision model would
be kept essentially the same, but through the use of Probes and ProbeWrappers,
the values of inputs used in the decision model would be changed.
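The weighted-sum area estimate and the "highest membership" rule discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the patch is given as a small grid of membership values, each α-level region is taken to be the set of cells with membership at or above that level, and the weight of each level defaults to the level itself. Other weightings appear in the literature, so the weights are left as a parameter. All function names are hypothetical.

```python
def alpha_cut_area(membership_grid, alpha, cell_area=1.0):
    """Area of the crisp alpha-level region: cells with membership >= alpha."""
    count = sum(1 for row in membership_grid for m in row if m >= alpha)
    return count * cell_area


def fuzzy_area(membership_grid, alphas, weights=None, cell_area=1.0):
    """Weighted sum of alpha-level region areas.

    By default each level is weighted by its own alpha (an assumption);
    pass explicit weights to use a different scheme.
    """
    if weights is None:
        weights = alphas
    if len(weights) != len(alphas):
        raise ValueError("one weight is required per alpha level")
    return sum(w * alpha_cut_area(membership_grid, a, cell_area)
               for a, w in zip(alphas, weights))


def dominant_patch(memberships_by_patch):
    """Pick the patch that bestows the highest membership on a cell."""
    return max(memberships_by_patch, key=memberships_by_patch.get)


patch = [[1.0, 0.5],
         [0.5, 0.0]]
print(fuzzy_area(patch, alphas=[1.0, 0.5]))                  # 1.0*1 + 0.5*3 = 2.5
print(dominant_patch({"patch_a": 0.3, "patch_b": 0.7}))      # patch_b
```

A forarea Layer built this way would store, for each cell, the area of the patch returned by dominant_patch, so the Agent's decision model would still receive a single real number.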
Modeling Semantic Variation
One of the rationales for the use of IBMs in ecological modeling is the ability to
explicitly model variations in individual behavior. However, this variation is
typically induced by resorting to drawing choices from a random distribution
rather than endeavoring to explicitly model variations in decision making among
individuals. The combination of fuzzy sets and object-oriented modeling allows
for the construction of variations in individual behaviors without resorting to
random draws.
Fuzzy sets research and related fields are exceptionally rich in methods for
aggregation and combination. It was shown elsewhere that differing schemes of
aggregation can be used to operationalize the movement and residence decision
models. Compensatory, noncompensatory, Yager, and crisp versions of the
decision models can be constructed (Robinson & Graniero, in press). Each
describes a particular class of Agents. In our simulation modeling effort, the
program SquirrelDispersal manages a simulation of squirrel dispersal. This
program not only creates the World object from layers drawn from the GRASS
GIS database but also activates different Agent classes. The classes
CompSquirrel, NoncompSquirrel, YagerSquirrel, and CrispSquirrel cor-
respond, respectively, to Agents using decision models based on compensatory,
noncompensatory, Yager, and crisp aggregation methods. Thus, each class
would view the landscape somewhat differently as a consequence of the
methods underlying the decision models. It is important to note that all the Agent
classes use Probes and ProbeWrappers to retrieve data from the same
database of Layers held within the World object. Figure 8 illustrates one
example of how the behaviors of four individual Agents varied according to the
decision models used to model the dispersal process. Thus, although the
information contained in the World object is the same, the way it is viewed and
processed by each Agent can lead to variations in spatial behavior.
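To make the idea of semantic variation concrete, the sketch below applies four aggregation schemes to the same pair of membership values. It is not the code of CompSquirrel, NoncompSquirrel, YagerSquirrel, or CrispSquirrel; the operators are common textbook choices assumed here for illustration (minimum for noncompensatory, weighted mean for compensatory, the Yager intersection with parameter p, and a threshold test for crisp), and the threshold and p values are arbitrary.

```python
def noncompensatory(values):
    """Minimum: a poor criterion cannot be offset by a good one."""
    return min(values)


def compensatory(values, weights=None):
    """Weighted mean: high values partly compensate for low ones."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def yager(values, p=2.0):
    """Yager intersection: 1 - min(1, (sum (1 - v)^p)^(1/p))."""
    distance = sum((1.0 - v) ** p for v in values) ** (1.0 / p)
    return 1.0 - min(1.0, distance)


def crisp(values, threshold=0.5):
    """Crisp rule: every criterion must reach the threshold."""
    return 1.0 if all(v >= threshold for v in values) else 0.0


# Same inputs (e.g., land-cover and habitat-area memberships), four views.
inputs = [0.9, 0.4]
for name, rule in [("noncompensatory", noncompensatory),
                   ("compensatory", compensatory),
                   ("yager", yager),
                   ("crisp", crisp)]:
    print(name, round(rule(inputs), 3))
```

Each rule ranks the same location differently, which is the sense in which each Agent class views the landscape somewhat differently while reading the same Layers.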
Concluding Discussion
We used a spatially explicit individual-based model of a small mammal species'
natal dispersal behavior across a real-world landscape to illustrate an object-
oriented approach to creating and managing operational fuzzy information in a
spatial database for use in a spatially explicit simulation model. Even though a
small subset of problems in spatially explicit ecological modeling was addressed
in this chapter, it highlights the breadth and depth of the problems that can be
usefully explored in this problem domain. The illustrative problems presented
here have demonstrated that this is a database and modeling domain rich in fuzzy
information-processing challenges. Hence, it is a scientific field of endeavor that
can benefit greatly from advances in fuzzy database modeling and application.
We would also argue that advances in the theoretical realm of fuzzy object-
oriented databases could result by devoting attention to the needs of this problem
domain.
Figure 8. Resulting dispersal behavior of Agents with the same starting
location but using different fuzzy aggregation methods in the decision
model
One of the major consequences of our use of the ECO-COSM modeling
framework has been our demonstration of the utility of using Probe objects and
ProbeWrappers to manage the interface between individual objects and the
spatial database. Combining the Probes and ProbeWrappers with Agent objects
is a promising avenue of research for using fuzzy sets in an object-oriented
environment to retrieve, manage, and process geographical data that may be
represented in a nonfuzzy or a fuzzy scheme. The ability to handle semantic
variations was demonstrated to be feasible using this approach, as each type of
Agent saw the same data but interpreted it differently as a function of
variations in the semantics of the underlying decision process of an Agent.
Progress in global connectivity has led to a situation in which we now need to deal
with more heterogeneous geographic information that may be spatially distrib-
uted in a Web-based environment. Because other research (Leclercq et al.,
1999; Cobb et al., 2000; Petry et al., 2002) has found the object-oriented, agent-
based approach to be effective, it is reasonable to suggest that the ECO-COSM
approach can be expanded to address problems of combining heterogeneous
geographic data from spatially distributed sources. The flexibility and power of
the Probe/ProbeWrapper concept could easily be extended so that mobile agents
are able to move about on a spatially distributed network to identify and assemble
required information and computational resources to support a large-scale IBM
effort.
Acknowledgments
Partial support in the form of operating research grants to each of the authors
from the Natural Sciences and Engineering Research Council (NSERC) of
Canada is gratefully acknowledged. We are especially grateful to Professor
Haluk Cetin and the Mid-America Remote Sensing Center (MARC) for
graciously providing the digital elevation and Kentucky GAP land-cover datasets.
Comments by anonymous reviewers improved the quality of this chapter.
References
Allen, A. W. (1987). Habitat suitability index models: Gray squirrel, revised
(United States Fish and Wildlife Service Biological Report 82 10.135).
Washington, D.C.: United States Department of the Interior.
Anderson, J. (2002). Providing a broad spectrum of agents in spatially explicit
simulation models: The Gensim approach. In H. R. Gimblett (Ed.),
Integrating geographic information systems and agent-based modeling
techniques for simulating social and ecological processes (pp. 21–58).
Oxford: Oxford University Press.
Bellman, R. E., & Zadeh, L. A. (1970). Decision-making in a fuzzy environment.
Management Science, 17(4), 141–164.
Bian, L. (2000). Object-oriented representation for modelling mobile objects in
an aquatic environment. International Journal of Geographical
Information Science, 14(7), 603–623.
Bian, L. (2003). The representation of the environment in the context of
individual-based modeling. Ecological Modelling, 159(2–3), 279–296.
Bordogna, G., & Chiesa, S. (2003). A fuzzy object-based data model for
imperfect spatial information integrating exact objects and fields.
International Journal of Uncertainty Fuzziness and Knowledge-Based
Systems, 11(1), 23–41.
Burrough, P. A., & Frank, A. U. (1996). Geographic objects with
indeterminate boundaries. London: Taylor & Francis.
Cheng, T., Molenaar, M., & Lin, H. (2002). Formalizing fuzzy objects from
uncertain classification results. International Journal of Geographical
Information Science, 15(1), 27–42.
Cobb, M., Foley, H., Petry, F., & Shaw, K. (2000). Uncertainty in distributed and
interoperable spatial information systems. In G. Bordogna, & G. Pasi
(Eds.), Recent issues on fuzzy databases (pp. 85–108). Berlin:
Springer-Verlag.
systems. Fuzzy Sets and Systems, 113(1), 19–36.
Fahse, L., Wissel, C., & Grimm, V. (1998). Reconciling classical and
individual-based approaches in theoretical population ecology: A protocol for
extracting population parameters from individual-based models. The American
Naturalist, 162(6), 838–852.
Foody, G. M. (1996). Fuzzy modelling of vegetation from remotely sensed
imagery. Ecological Modelling, 85, 3–12.
Gimblett, H. R. (2002). Integrating geographic information systems and
agent-based modeling techniques for simulating social and ecological
processes. New York: Oxford University Press.
Graniero, P. A. (2001). The effect of spatiotemporal sampling strategies and
data acquisition accuracy on the characterization of dynamic ecological
systems and their behaviours. Ph.D. dissertation, University of
Toronto.
Graniero, P. A., & Robinson, V. B. (2003). A real-time adaptive sampling
method for field mapping in patchy, heterogeneous environments.
Transactions in GIS, 7(1), 31–54.
Grimm, V. (1999). Ten years of individual-based modelling in ecology: What
have we learned and what could we learn in the future? Ecological
Modelling, 115, 129–148.
Hagen, A. (2003). Fuzzy set approach to assessing similarity of categorical
maps. International Journal of Geographical Information Science,
17(3), 235–249.
Harper, S. J., Westervelt, J. D., & Shapiro, A.-M. (2002). Modeling the
movements of cowbirds: Application towards management at the
landscape scale. Natural Resource Modeling, 15(1), 111–131.
Klir, G. J., & Yuan, B. (1995). Fuzzy sets and fuzzy logic: Theory and
applications. Upper Saddle River, NJ: Prentice-Hall.
database architecture. IEEE Transactions on Knowledge and Data
Engineering, 15(5), 1137–1154.
Leclercq, E., Benslimane, D., & Yetongnon, K. (1999). ISIS: A semantic
mediation model and an agent based architecture for GIS interoperability.
In Database Engineering and Applications, IDEAS '99 International
Symposium Proceedings (pp. 87–91). Washington, D.C.: IEEE Press.
Li, Q., Huang, X., & Wu, S. (2001). Applications of agent techniques on GIS. In
Proceedings International Conferences on Info-Tech and Info-Net
(ICII) (pp. 238–243). Washington, D.C.: IEEE Press.
Lima, S. L., & Zollner, P. A. (1996). Towards a behavioral ecology of ecological
landscapes. Trends in Ecology and Evolution, 11(3), 131–135.
Lomnicki, A. (1999). Individual-based models and individual-based approach to
population ecology. Ecological Modelling, 115, 191–198.
Mackay, D. S. (1999). Semantic integration of environmental models for
application to global information systems and decision-making. SIGMOD
Record, 28(1), 13–19.
Mackay, D. S., & Robinson, V. B. (2000). A multiple criteria decision support
system for testing integrated environmental models. Fuzzy Sets and
Systems, 113, 53–67.
Matsakis, P., Andrefouet, S., & Capolsini, P. (2000). Evaluation of fuzzy
partitions. Remote Sensing of Environment, 74, 516–533.
McCoy, J., & Johnston, K. (2001). Using ArcGIS spatial analyst: GIS by
ESRI. Redlands, CA: Environmental Systems Research Institute.
Mech, S. G., & Zollner, P. A. (2002). Using body size to predict perceptual
range. Oikos, 98, 47–52.
Morris, A. (2003). A framework for modeling uncertainty in spatial databases.
Transactions in GIS, 7(1), 83–103.
Neteler, M., & Mitasova, H. (2002). Open source GIS: A GRASS GIS
approach. Boston, MA: Kluwer Academic Publishers.
Petry, F. E. (1996). Fuzzy databases, principles, and applications. Boston,
MA: Kluwer Academic Publishers.
Petry, F. E., Cobb, M. A., Ali, D., Angryk, R., Paprzycki, M., Rahimi, S., Wen,
L., & Yang, H. (2002). Fuzzy spatial relationships and mobile agent
technology in geospatial information systems. In P. Matsakis, & L. M.
Sztandera (Eds.), Applying soft computing in defining spatial relations
(pp. 121–155). Heidelberg: Physica-Verlag.
Petry, F. E., Cobb, M. A., Wen, L., & Yang, H. (2003). Design of system for
managing fuzzy relationships for integration of spatial data in querying.
Fuzzy Sets and Systems, 140, 51–73.
Rickel, B. W., Anderson, B., & Pope, R. (1998). Using fuzzy systems,
object-oriented programming, and GIS to evaluate wildlife habitat. AI
Applications, 12(1–3), 31–40.
Ricotta, C., & Avena, G. C. (1999). The influence of fuzzy set theory on the areal
extent of thematic map classes. International Journal of Remote
Sensing, 20(1), 201–205.
Robinson, V. B. (1988). Some implications of fuzzy set theory applied to
geographic databases. Computers, Environment, and Urban Systems,
12(2), 89–97.
Robinson, V. B. (2000). On fuzzy sets and the management of uncertainty in an
intelligent geographic information system. In G. Bordogna, & G. Pasi
(Eds.), Recent issues on fuzzy databases (pp. 109–127). Berlin:
Springer-Verlag.
Robinson, V. B. (2002). Using fuzzy spatial relations to control movement
behavior of mobile objects in spatially explicit ecological models. In P.
Matsakis, & L. M. Sztandera (Eds.), Applying soft computing in defining
spatial relations (pp. 158–178). Heidelberg: Physica-Verlag.
Robinson, V. B., & Frank, A. U. (1985). About different kinds of uncertainty in
collections of spatial data. In Proceedings of Seventh International
Symposium on Automated Cartography (Auto-Carto 7) (pp. 440–450).
Bethesda, MD: American Society for Photogrammetry and Remote
Sensing and American Congress on Surveying and Mapping.
Robinson, V. B., & Graniero, P. A. (in press). Spatially explicit individual-based
ecological modeling with mobile fuzzy agents. In M. A. Cobb, F. E. Petry,
& V. B. Robinson (Eds.), Fuzzy modeling with spatial information for
geographic problems. Heidelberg: Springer.
Robinson, V. B., & Thongs, D. (1986). Fuzzy set theory applied to the mixed
pixel problem of multispectral landcover databases. In B. K. Opitz (Ed.),
Geographic information systems in government (pp. 871–885).
Hampton, VA: A. Deepak Publishing.
Ruckelshaus, M., Hartway, C., & Kareiva, P. (1997). Assessing the data
requirements of spatially explicit dispersal models. Conservation Biology,
11(6), 1298–1306.
Russell, S., & Norvig, P. (1995). Artificial intelligence: A modern approach.
Upper Saddle River, NJ: Prentice Hall.
Schneider, M. (2000). Metric operations on fuzzy spatial objects in databases.
In Proceedings of the Eighth ACM International Symposium on
Advances in Geographic Information Systems (pp. 21–26). New York:
ACM Press.
Schneider, M. (2001). Fuzzy topological predicates, their properties, and their
integration into query languages. In Proceedings of the Ninth ACM
International Symposium on Advances in Geographic Information
Systems (pp. 9–14). New York: ACM Press.
Westervelt, J. D. (2002). Geographic information systems and agent-based
modeling. In H. R. Gimblett (Ed.), Integrating geographic information
systems and agent-based modeling techniques for simulating social
and ecological processes (pp. 83–103). Oxford: Oxford University Press.
Westervelt, J. D., & Hopkins, L. D. (1999). Modeling mobile individuals in
dynamic landscapes. International Journal of Geographical
Information Science, 13(3), 191–208.
Wolff, J. O. (1999). Behavioral model systems. In G. W. Barrett, & J. D. Peles
(Eds.), Landscape ecology of small mammals (pp. 11–26). New York:
Springer.
Yazici, A., & Akkaya, K. (2000). Conceptual modeling of geographic
information system. In G. Bordogna, & G. Pasi (Eds.), Recent issues on fuzzy
databases (pp. 129–151). Berlin: Springer-Verlag.
Yuan, X., & Shen, Z. (2001). Notes on Fuzzy plane geometry I, II. Fuzzy Sets
and Systems, 121, 545–547.
Zhan, F. B. (1998). Approximate analysis of binary topological relations between
geographic regions with indeterminate boundaries. Soft Computing, 2,
28–34.
Zhang, J., & Stuart, N. (2001). Fuzzy methods for categorical mapping with
image-based land cover data. International Journal of Geographical
Information Science, 15(2), 175–195.
Zollner, P. A. (2000). Comparing the landscape level perceptual abilities of
forest sciurids in fragmented agricultural landscapes. Ecology, 80(3),
1019–1030.
Object-Oriented Publish/Subscribe for Modeling 301
Chapter X
Object-Oriented
Publish/Subscribe
for Modeling and
Processing Imperfect
Information
Haifeng Liu, University of Toronto, Canada
Hans Arno Jacobsen, University of Toronto, Canada
Abstract
In the publish/subscribe paradigm, information providers disseminate
publications to all consumers who expressed interest by registering
subscriptions with the publish/subscribe system. This paradigm has found
widespread applications, ranging from selective information dissemination
to network management. In all existing publish/subscribe systems, neither
subscriptions nor publications can capture uncertainty inherent to the
information underlying the application domain. However, in many situations,
knowledge of either specific subscriptions or publications is not available.
To address this problem, this chapter proposes a new object-oriented
publish/subscribe model based on possibility theory and fuzzy set theory to
process imperfect information for expressing subscriptions, publications,
or both combined. Furthermore, the approximate publish/subscribe matching
problem based on fuzzy measures is defined, and the real-world A-ToPSS
system is described.
Introduction
A new data-processing paradigm, publish/subscribe, is becoming increasingly
popular for information dissemination applications. Publish/subscribe systems
anonymously interconnect information providers with information consumers
in a distributed environment. Information providers publish information in the
form of publications, and information consumers register their interests in the
form of subscriptions. The publish/subscribe system performs the matching task
and ensures the timely delivery of published events (a.k.a. notifications) to all
interested subscribers. Publish/subscribe has been well studied, and many
systems have been developed supporting this paradigm. Existing research
prototypes include, among others, Gryphon (Aguilera, 1999), LeSubscribe
(Fabret, 2001), and ToPSS (Liu, 2002); industrial-strength systems include
various implementations of JMS (Happner, 2002; Monson-Haefel, 2000), the
CORBA Notification Service (OMG, 2002), and TIB/RV. All of these systems
are based on a crisp data model, which means that neither subscribers nor
publishers can express imperfect information in subscriptions and publications,
respectively. In this crisp model, subscriptions are evaluated to be true or false
for a given publication. Moreover, most of these systems do not expose a well-
structured subscription language model and publication data model.
However, in many situations, knowledge to specify subscriptions or publications
is not available. In these cases, uncertainty about the state of the world has to
be cast into the crisp data model that defines absolute limits. Moreover, for a user
of the publish/subscribe system, it may be simpler to describe the state of the
world with imperfect concepts (we say, in an approximate manner).
In a selective information dissemination context, for instance, users may want to
submit subscriptions about an apartment with the constraint that the rent is "cheap."
On the other hand, information providers may not have exact information for all
items published. In a secondhand market, a seller may not know the exact age
of a vase, so the seller can describe it as an "old" vase but cannot describe it with
an exact age. Temperature and humidity information collected by sensors is
often not precise but only correct within a certain error interval around the value
measured. It would be more appropriate to publish such imperfect information
rather than a wrong exact value if such publish/subscribe capabilities were
possible. Moreover, the underlying publish/subscribe system may need to store
the publications submitted for later processing (i.e., for subscriptions that are
submitted to the system after publication submission). For these reasons, it is an
advantage to provide a publish/subscribe data model and a matching scheme that
allow for the expression and processing of imperfect information for both
subscriptions and publications.
In a publish/subscribe system, we are concerned with two major types of
imperfect information as defined in Smets (1997): imprecision and uncertainty.
Imprecision is related to the content of the statement. Publications and
subscriptions are statements about events and users' interests. The expressions
may be incomplete, ambiguous, or not well-defined, but involve the content of the
statements. Thus, we refer to this type of imperfection in publications and
subscriptions as imprecision. Another type of imperfection exists in the matching
between publications and subscriptions, which we refer to as uncertainty.
Uncertainty concerns the state of knowledge about the relationship between the
world and the statement about the world. All publish/subscribe systems devel-
oped to date are based on the assumption that a match between a subscription
and a publication is either true or false. However, it is difficult to decide whether
a publication matches a subscription involving imprecision in the publication and
the subscription. We call the imperfection inherent to the matching problem
uncertainty. To illustrate the difference between imprecision and uncertainty,
consider these two examples: (1) "Charles is a tall guy, and I am sure of it." (2)
"Charles is six feet tall, but I am not sure of it." The height of Charles is imprecise
in the former case, but it is certain. In the latter statement, the height is precise
but uncertain.
To support imperfect information in publish/subscribe, we extend current
subscription and publication languages to incorporate the expression of impreci-
sion at the language level and develop a matching mechanism to support
processing of the extended language in publish/subscribe systems. To simplify
the terminology, we use "approximate" as a general term for all types of
imperfection involved. The extended subscriptions and publications supporting
imprecision will be called approximate subscriptions and publications. The
matching between approximate publications and approximate subscriptions is
called approximate matching. The systems (or models) that support
approximate subscriptions/publications and implement approximate matching
are called approximate publish/subscribe systems (or models). "Crisp" is used
to refer to the traditional publish/subscribe systems.
There are five interesting cases according to the different combinations of
subscriptions and publications with imprecision. These are as follows:
1. Crisp subscriptions and crisp publications (conventional publish/subscribe)
2. Approximate subscriptions and crisp publications
3. Crisp subscriptions and approximate publications
4. Approximate subscriptions and approximate publications
5. A combination of crisp and approximate constraints in subscriptions and
publications
Models 2 to 5 constitute new publish/subscribe system models not previously
investigated. All existing publish/subscribe systems are based on a crisp data
model that cannot process imprecision in publications or subscriptions. The
exception is A-ToPSS, the Approximate Matching-Based Toronto Publish/
Subscribe System (Liu, 2002, 2003, 2004a, 2004b) that introduced a subscription
language model and a publication data model that can express imprecise
information, such as "cheap," "large," and "close to," as constraints. In this
chapter, we discuss how to efficiently support all the above cases with the
A-ToPSS approach. This raises questions regarding matching between crisp
subscriptions and approximate publications, as well as matching between
approximate subscriptions and approximate publications. We propose a novel
object-oriented data model that can model all five cases described above. We
also define a matching mechanism that applies to the cases involving uncertain-
ties. Moreover, our approach follows an object-oriented design, treating sub-
scriptions as objects, publications as objects, and notifications as objects. The
latter entities are modeled by classes, thus supporting a well-structured design
that can be cleanly integrated with other object-oriented technologies (object-
oriented databases, distributed objects systems, etc.).
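As a rough illustration of what matching under imprecision can look like, the sketch below evaluates an approximate subscription ("rent is cheap") against an approximate publication ("rent is about 700") using the standard possibility and necessity measures of possibility theory over a discretized domain. It is not the A-ToPSS matching mechanism; the membership functions, their breakpoints, and all names are assumptions made for the example.

```python
def possibility(subscription_mu, publication_pi, domain):
    """Possibility of a match: sup over x of min(mu(x), pi(x))."""
    return max(min(subscription_mu(x), publication_pi(x)) for x in domain)


def necessity(subscription_mu, publication_pi, domain):
    """Necessity of a match: inf over x of max(mu(x), 1 - pi(x))."""
    return min(max(subscription_mu(x), 1.0 - publication_pi(x)) for x in domain)


def cheap(rent):
    """Hypothetical fuzzy constraint: fully cheap up to 500, not cheap from 1000."""
    if rent <= 500:
        return 1.0
    if rent >= 1000:
        return 0.0
    return (1000 - rent) / 500


def about_700(rent):
    """Hypothetical imprecise publication: triangular distribution around 700."""
    return max(0.0, 1.0 - abs(rent - 700) / 100)


rents = range(0, 2001, 50)  # discretized rent domain
print(possibility(cheap, about_700, rents))  # about 0.6
print(necessity(cheap, about_700, rents))    # about 0.5
```

With a crisp publication (a distribution that is 1 at a single value and 0 elsewhere), both measures reduce to the subscription's membership at that value, which corresponds to the approximate-subscription, crisp-publication case in the list above.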
From a database point of view, publications in the publish/subscribe system can
be seen as data items (e.g., tuples, columns, or tables) in a database model, and
subscriptions closely resemble database queries. Publish/subscribe systems
solve a problem inverse to database query processing. Therefore, a well-
structured, object-oriented subscription language model and publication data
model will give rise to a clean integration of the publish/subscribe paradigm with
(object-oriented) database technology complementing database query evalua-
tion techniques with publish/subscribe query indexing techniques.
Information Dissemination with
Publish/Subscribe Systems
Publish/Subscribe Messaging Paradigm
The publish/subscribe paradigm is an interaction model that consists of informa-
tion providers who publish events to the system, and information consumers who
subscribe to specific interests in events within the system. The publish/subscribe
system matches events with subscriptions and ensures the timely notification of
subscribers upon event occurrence. Figure 1 shows the paradigm of publish/
subscribe systems.
Events are published in the form of publications, and users' interests are
registered in the form of subscriptions. A publication describes the attributes of
a real-world artifact. A subscription defines a user's interest through a list of
predicates, where each predicate is a constraint on an attribute domain. The
matching problem is to find all subscriptions whose constraints are
satisfied by an incoming publication.
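The crisp matching problem described above can be sketched as follows. The data layout is an assumption made for illustration: a publication is a dictionary of attribute-value pairs, and a subscription is a list of (attribute, operator, value) predicates that must all hold. Real systems index subscriptions rather than scanning them; the linear scan here only shows the semantics.

```python
import operator

# Comparison operators allowed in a predicate (an illustrative choice).
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def satisfies(predicates, publication):
    """True if every predicate of the subscription holds for the publication."""
    for attribute, op, value in predicates:
        if attribute not in publication:
            return False
        if not OPERATORS[op](publication[attribute], value):
            return False
    return True


def match(subscriptions, publication):
    """Return the identifiers of all subscriptions satisfied by the publication."""
    return [sid for sid, predicates in subscriptions.items()
            if satisfies(predicates, publication)]


subscriptions = {
    "s1": [("type", "=", "apartment"), ("rent", "<=", 800)],
    "s2": [("type", "=", "apartment"), ("rent", "<=", 500)],
    "s3": [("rooms", ">=", 2)],
}
publication = {"type": "apartment", "rent": 650}

print(match(subscriptions, publication))  # ['s1']
```

In this crisp setting each subscription evaluates to true or false; there is no way to express that a rent of 650 is "somewhat cheap."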
Overview of Publish/Subscribe Systems
Publish/subscribe has been well studied, and many systems were developed to
support this paradigm. The current publish/subscribe models can be classified
according to three main categories: information categorization, expressiveness
of the system, and treatment of data persistence.
Information Categorization
Publications and subscriptions are information in publish/subscribe systems.
There are three common approaches to grouping the information to help query
and search: channel-based, hierarchical, and type-based.
In the channel-based approach, information is grouped together under different
channels. A channel is a medium that carries information of related meaning. To
publish a message to a channel implies that this message will be broadcast to
all subscribers who have subscribed to this channel, and vice versa. Newsgroups
are an example of the use of a channel-based publish/subscribe system. CORBA
event service, CORBA notification service, and Java Message Service (JMS)
also use the channel-based data model.
The hierarchical approach uses a tree structure to classify information. This
approach is also referred to as topic-based from the expressiveness aspect. Each
node of the tree is a topic. The matching between publications and subscriptions
depends on the associated topic with the right content and the appropriate level
of granularity. The subject-based addressing technology of TIBCO Rendezvous
allows publications and subscriptions to be categorized in a hierarchical
fashion.
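The tree-structured matching described above can be sketched as follows. This is only a minimal illustration: the dot-separated topic paths, the rule that a subscription to a node also covers its whole subtree, and the names topic_matches and subscribers_for are assumptions made for the example, not the syntax or semantics of TIBCO Rendezvous or any other product.

```python
from typing import Dict, List


def topic_matches(subscribed: str, published: str) -> bool:
    """Return True if `published` lies at or below `subscribed` in the topic tree."""
    sub_path = subscribed.split(".")
    pub_path = published.split(".")
    # Assumed rule: a subscription covers its own node and every descendant.
    return pub_path[:len(sub_path)] == sub_path


# Hypothetical subscribers and the topics they subscribed to.
subscriptions: Dict[str, str] = {
    "alice": "stocks.tech",
    "bob": "stocks.tech.ibm",
    "carol": "sports",
}


def subscribers_for(published: str) -> List[str]:
    """Names of all subscribers whose topic covers the published topic."""
    return sorted(name for name, topic in subscriptions.items()
                  if topic_matches(topic, published))
```

Comparing whole path components rather than raw string prefixes keeps a subscription to stocks.tech from accidentally matching a sibling topic such as stocks.technology.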
Expressiveness
Expressiveness refers to the ability of publishers and subscribers to express their
interests and events in the form of publications and subscriptions. A higher level
of expressiveness usually requires more computation power and a more ad-
vanced algorithm design.
The content-based data model provides more expressive power than other
models to filter publications and is more easily customized for individual
subscribers. The match between subscriptions and publications involves only the
content of the information without any other concerns. JMS lets subscribers
define message selectors, which are based on a subset of the SQL-92 conditional
expression syntax used in the WHERE clauses of SQL statements. CORBA
Notification Service takes a similar approach. Content-based routing, which
concerns how information is transmitted between publishers and subscribers
through brokers, is also an active research topic aimed at improving the
efficiency of information delivery.
[Diagram labels: Publishers, publications, Subscribers, subscriptions, Matching/Filtering, Matched subscriptions, Notification Engine, notifications]
Figure 1. Publish/subscribe paradigm
Another publish/subscribe model, which concerns event correlation, uses a rule-
based approach (Chakravarthy, 1994; Samani, 1997) in which subscriptions
and publications are expressed as a composition of events. An event is a
happening of interest. It is a state transition within the system, triggered
internally or externally. Brokers that can process composite events can make
publisher and subscriber processes easier to implement, because the event
correlation logic no longer needs to be handled programmatically.
Persistence
Persistence refers to the storage of data and states of publish/subscribe systems.
The ability of data and state persistence affects the behavior and efficiency of
systems. Most publish/subscribe systems are designed as memory-less messag-
ing systems that do not save the contents or states of publications. The limitation
of a memory-less model can be overcome by an event history persistence model,
where all messages received by the broker are persisted, forming an event
history. It is common to use conventional relational databases as offline storage
systems. However, traditional databases are not designed to process data
streams (continuous sequences of messages entering the broker) efficiently. The
STREAM project (Bahu, 2001), led by Stanford University, studies techniques
for special storage management and query processing for data streams.
A state-persistent publish/subscribe system stores the states of publications and
subscriptions. In such a system, a publication represents the state of some
objects of interest, and a subscription specifies a state that constitutes the
interests of the subscriber. The broker should only send notifications of a
publication to those subscribers whose subscriptions undergo state transitions in
their relationship with the publication. In other words, the broker component only
notifies subscribers of publications that enter the states specified by their
subscriptions. Hubert and Jacobsen (2003) proposed a subject space model for
state-persistent publish/subscribe systems. The objectives of this data model are
the introduction of state persistence into publish/subscribe systems and the
symmetrical treatment of data and queries.
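The notify-on-transition behavior can be sketched as follows. This is a deliberately simplified illustration and not the subject space model itself: it assumes a single published object whose state is one number, subscriptions given as Boolean conditions on that number, and a hypothetical class name StatePersistentBroker.

```python
from typing import Callable, Dict, List


class StatePersistentBroker:
    """Remembers, per subscription, whether the published state currently satisfies it."""

    def __init__(self) -> None:
        self._conditions: Dict[str, Callable[[float], bool]] = {}
        self._inside: Dict[str, bool] = {}

    def subscribe(self, sub_id: str, condition: Callable[[float], bool]) -> None:
        self._conditions[sub_id] = condition
        self._inside[sub_id] = False  # no state has been published yet

    def publish(self, value: float) -> List[str]:
        """Update the stored state and return the subscriptions to notify."""
        notified = []
        for sub_id, condition in self._conditions.items():
            now_inside = condition(value)
            # Notify only on the transition "not satisfied" -> "satisfied".
            if now_inside and not self._inside[sub_id]:
                notified.append(sub_id)
            self._inside[sub_id] = now_inside
        return notified
```

A memory-less broker would notify on every publication that satisfies the condition; here, repeated publications inside the subscribed state produce no further notifications until the state has been left and entered again.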
Type-Based Publish/Subscribe
The type-based publish/subscribe model was proposed as an alternative to
express publications, subscriptions, and their interactions. The type-based model
uses features of high-level, strongly typed programming languages, such as
strong typing, scoping, objects, classes, and inheritance to define matching
semantics between subscriptions and publications. In type-based publish/subscribe
(Eugster, 2000), each topic is represented by a type definition. Subtopics
can be formed by inheriting from other topic class definitions. Also, a publication
can conform to more than one topic type by using multiple inheritance or
implementing multiple interfaces. Publications are considered as objects, which
are strongly typed, as known from many object-oriented languages. A subscriber
to publications of type T receives all publication objects that conform to T. This
model is used, at least in part, in several standard implementations, such as the
CORBA Event Service, the CORBA Notification Service, and JMS.
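The rule that a subscriber to type T receives all publication objects conforming to T can be sketched with ordinary class inheritance. The classes Quote, StockQuote, and BondQuote and the TypeBasedBroker interface below are hypothetical names chosen for illustration; they are not taken from Eugster (2000) or from any of the standards mentioned above.

```python
from typing import Callable, List, Tuple


class Quote:
    """Base topic type (hypothetical)."""

    def __init__(self, symbol: str, price: float) -> None:
        self.symbol = symbol
        self.price = price


class StockQuote(Quote):
    """Subtopic formed by inheriting from Quote."""


class BondQuote(Quote):
    """Another subtopic of Quote."""


class TypeBasedBroker:
    def __init__(self) -> None:
        self._subscriptions: List[Tuple[type, Callable[[object], None]]] = []

    def subscribe(self, topic_type: type, callback: Callable[[object], None]) -> None:
        self._subscriptions.append((topic_type, callback))

    def publish(self, publication: object) -> int:
        """Deliver to every subscriber whose type the publication conforms to."""
        delivered = 0
        for topic_type, callback in self._subscriptions:
            if isinstance(publication, topic_type):  # subtype conformance
                callback(publication)
                delivered += 1
        return delivered


broker = TypeBasedBroker()
all_quotes: list = []
stock_quotes: list = []
broker.subscribe(Quote, all_quotes.append)
broker.subscribe(StockQuote, stock_quotes.append)
broker.publish(StockQuote("IBM", 91.5))
broker.publish(BondQuote("T-10Y", 99.2))
```

The subscriber to Quote receives both publications, whereas the subscriber to StockQuote receives only the first.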
Application Domains
Publish/subscribe is a messaging paradigm and an information management
methodology. It is desirable that the technologies developed for publish/sub-
scribe systems be generic and applicable in many application domains. Most
research studies on publish/subscribe systems use the stock-brokering application
as a motivating example for the design of their algorithms.
The stock-brokering application is a typical example, because
the roles of publishers, subscribers, and brokers are well defined.
However, there are many other application domains with information manage-
ment characteristics that satisfy the definition of the publish/subscribe paradigm.
Selective information dissemination is the class of distributed applications that
distributes information according to some restrictions or conditions. Conven-
tional Internet search engines, such as Google, can be modeled as publish/
subscribe systems. The search engine indexes many Web pages, and users can
execute search queries on the indexed pages. A more general form of data
subscription is exemplified by the emerging peer-to-peer file sharing and
publishing systems, such as Napster, Gnutella, Mojo Nation, Free Haven
(Dingledine, 2000), and Freenet (Clarke, 2000). These systems are forms of
publish/subscribe systems, where the broker component is physically distributed.
They attempt to solve the problems of scalable distributed data storage and
retrieval. A geographic information system is an example where an application
can possess the roles of multiple logical components of a publish/subscribe
system. The location information of mobile users is used to provide users with
relevant information based on their positions. There are many other applications
to which the publish/subscribe paradigm is applicable, such as workflow man-
agement (Cugola, 2001), intraenterprise process automation, supply chain man-
agement, enterprise application integration (Barrett, 1996), and network moni-
toring.
Subscription Language and Publication
Data Model
Object-Oriented Publish/Subscribe Model
In this section, we show one possible object-oriented design for the public
interfaces of a publish/subscribe system. Various designs can be found in the
literature (OMG, 2001, 2002, 2004; Sun, 2002). Our design is simple and has
proven itself in the design of the ToPSS system (Liu, 2004).
Our design is based on two class hierarchies: first, the User class hierarchy, which
represents publishers, subscribers, and notifiers; second, the Information class
hierarchy, which represents publications, subscriptions, and notifications. These class
hierarchies are shown in Figure 2 and Figure 3, respectively.
The publisher class serves a publishing entity to submit information as publications
to the system. The subscriber class serves a subscribing entity to submit
interest specifications (i.e., subscriptions) to the system. The notifier class
allows the programmer to design entities that can poll for notification information
or can register callbacks for notifications. These notifier objects can be different
from the actual subscriber objects. In this design, the notifier objects are tied to
a specific subscription by passing it to the system through the method call. In our
design, subscriptions are represented by their subscription objects; an alternative
may be to identify subscriptions, publications, and notifications with identifiers
that are passed back upon successful submission of these objects. The ToPSS
system uses that approach.

Figure 2. Definition of User class and its subclasses

User
string Username
public void login( )

Subscriber
public int subscribe (subscription s)
public int unSubscribe (subscription s)

Publisher
public int publish (publication e)

Notifier
public int register_callback(subscription s, cb_info i )
public notification getNotification (subscription s)

The Information class hierarchy in Figure 3 foresees subscriptions, publications,
and notifications. Subscriptions define user interests through Boolean combinations
of predicates. The subscription type is determined by the predicate types.
We allow in our model the specification of crisp types, approximate types, and
mixed types. In most systems, a subscription is a conjunction of predicates, for which
a simple list suffices for the representation of the subscription. More complex
subscription formulae must be represented as a tree-structured expression.
Publications are defined as sets of attribute-value pairs. Notifications are
essentially publications. However, certain applications may only forward part of
the publication to the interested subscribers and filter out, combine, or suppress
part of a publication. To enable these semantics, we define an additional notification
object.

Figure 3. Definition of Information class

Information
string Id
char type

Subscription
int numOfPred
predicate [ ] preds
float threshold
public Subscription ( )
public addPred(predicate p )
public float getThreshold ( )
public char getSubType ( )
public String getId ( )
public String toString ( )

Publication
int numOfAttr
attr_value [ ] av_pairs
float threshold
public Publication ( )
public addAttrValue (attr_value av )
public char getPubType ( )
public String getId ( )
public String toString ( )

Notification
Publication e
Subscription s
float matching_degree
int notifyType
public Notification ( )
public sendNotification(subscription s)
public getNotifyType( )

Traditional (Crisp) Publish/Subscribe Model
In the crisp publish/subscribe system, users' intents have to be cast into a certain
model with specific requirements. A subscription s is a Boolean formula (often
simply a conjunction) over predicates, each of which is a triple consisting of an
attribute, a value, and a relational operator (<, ≤, =, ≠, ≥, >). A publication (a.k.a.
event) is a set of attribute-value pairs, where each pair consists of an attribute
and a value. No two pairs may have the same attribute. For example, {(car,
"Honda Accord"), (price, $30,000), (age, "new")} is an event.
An attribute-value pair (a', v') matches a subscription predicate (a, v, relop) if
a = a' and v' relop v. For example, (price, $30,000) matches (price, $35,000, ≤)
because they share the same attribute and $30,000 ≤ $35,000. An event e
satisfies a subscription s if every predicate in s is matched by some pair in e. For
example, the event {(car, "Honda Accord"), (price, $30,000), (age, "new")}
satisfies the subscription s = (car, "Honda Accord", =) and (price, $35,000, ≤)
and (price, $20,000, ≥). The matching problem is as follows: Given an event
e and a set of subscriptions S, find all subscriptions in S that are satisfied by e.
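The crisp matching semantics can be sketched in a few lines. The representation below (an event as a dictionary, a conjunctive subscription as a list of triples, operators written as ASCII strings) and the function names satisfies and match are assumptions made for illustration. The linear scan over all subscriptions is the naive approach, not the indexed matching algorithm of any particular system.

```python
import operator
from typing import Dict, List, Tuple

# Relational operators allowed in a predicate, written as ASCII strings.
RELOPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}

Predicate = Tuple[str, object, str]   # (attribute, value, relop)
Event = Dict[str, object]             # a dict enforces one value per attribute


def satisfies(event: Event, subscription: List[Predicate]) -> bool:
    """True if every predicate of the conjunctive subscription is matched by the event."""
    return all(
        attr in event and RELOPS[relop](event[attr], value)
        for attr, value, relop in subscription
    )


def match(event: Event, subscriptions: Dict[str, List[Predicate]]) -> List[str]:
    """Naive matching: test the event against every stored subscription."""
    return [sid for sid, sub in subscriptions.items() if satisfies(event, sub)]


event = {"car": "Honda Accord", "price": 30000, "age": "new"}
subscriptions = {
    "s1": [("car", "Honda Accord", "="), ("price", 35000, "<="), ("price", 20000, ">=")],
    "s2": [("price", 25000, "<=")],
}
matched = match(event, subscriptions)
```

Here s1 is the subscription from the example in the text, and s2 is an additional subscription that the event does not satisfy.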
Publish/Subscribe Model Supporting Imperfect Information-Processing
Subscription Language Model
Subscriptions are Boolean formulae over predicates. Each predicate is a
constraint over a domain of values. A predicate is represented as \((a_i, \mu_i)\),
where \(a_i\) is the attribute of the predicate and \(\mu_i\) is a membership function
(Zadeh, 1989) that represents a fuzzy constraint on the attribute. We use R to
represent the Boolean relation of predicates within one subscription (R can be
intersection, union, or any other relation); a subscription is then formalized as follows:
\[ s = R\big((a_1, \mu_1), (a_2, \mu_2), \ldots, (a_m, \mu_m)\big) \]
For example, a student is looking for an apartment with constraints on price, size,
and age. Her subscription in natural language that specifies these constraints is:
S: (size is medium) AND (price is no more than 1500) AND (age is not very old)
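The first predicate approximates the constraint using an uncertain notion,
"medium." A membership function is used to represent it:
\[
\mu_{medium}(x) =
\begin{cases}
0 & \text{if } x \le 40 \\
\frac{x - 40}{10} & \text{if } 40 < x < 50 \\
1 & \text{if } 50 \le x \le 70 \\
1 - \frac{x - 70}{10} & \text{if } 70 < x < 80 \\
0 & \text{if } x \ge 80
\end{cases}
\]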
The second predicate constrains the attribute price. It is defined in a crisp
manner. It can be represented by a characteristic function:
\[
\mu_{\le 1500}(x) =
\begin{cases}
1 & \text{if } x \le 1500 \\
0 & \text{if } x > 1500
\end{cases}
\]
The third predicate constitutes another approximate predicate. We use the
following membership function to represent the concept of "old":
\[
\mu_{old}(x) =
\begin{cases}
0 & \text{if } x \le 40 \\
\frac{x - 40}{40} & \text{if } 40 < x < 80 \\
1 & \text{if } x \ge 80
\end{cases}
\]
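The three membership functions of this subscription are pictured in Figure 4.
In this subscription, the relation of these three predicates is conjunctive. All
predicates are linked by intersection (i.e., the mathematical symbol is \(\wedge\)). The
formalization of this subscription is:
\[ S = (size, \mu_{medium}) \wedge (price, \mu_{\le 1500}) \wedge (age, 1 - \mu_{old}^2) \]
Here, following the usual fuzzy-set convention, "very" is modeled by squaring the
membership function and "not" by taking its complement.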
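The student's subscription can be written down directly as code. The sketch below assumes the piecewise definitions read from the text (breakpoints 40, 50, 70, and 80 for "medium"; a linear ramp from 40 to 80 for "old"; "not very old" as 1 − μ_old²), and it evaluates the conjunction with min for an apartment whose attributes are all known exactly. The function names are hypothetical.

```python
def mu_medium(x: float) -> float:
    """Trapezoidal membership function for "medium" size."""
    if x <= 40 or x >= 80:
        return 0.0
    if x < 50:
        return (x - 40) / 10
    if x <= 70:
        return 1.0
    return 1 - (x - 70) / 10


def mu_le_1500(x: float) -> float:
    """Crisp predicate "price is no more than 1500" as a characteristic function."""
    return 1.0 if x <= 1500 else 0.0


def mu_old(x: float) -> float:
    """Membership function for "old" (assumed linear ramp between 40 and 80)."""
    if x <= 40:
        return 0.0
    if x < 80:
        return (x - 40) / 40
    return 1.0


def mu_not_very_old(x: float) -> float:
    """"very" squares the membership degree; "not" takes the complement."""
    return 1 - mu_old(x) ** 2


def degree(size: float, price: float, age: float) -> float:
    """Degree to which a crisply described apartment satisfies the conjunction."""
    return min(mu_medium(size), mu_le_1500(price), mu_not_very_old(age))
```

For instance, degree(45, 1400, 60) combines a size that is only partly "medium" (0.5) with an age that is partly "very old" (1 − 0.5² = 0.75), so the conjunction yields 0.5.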

Publication Data Model
Publications describe real-world artifacts or states of interest through a set of
attribute-value pairs. For certain attributes, exact values may not be available.
In these cases, we use a possibility distribution to show the possibility that the
attribute has a given value. A publication is thus defined as a list of attribute-function
pairs as follows:
\[ e = \{(a_1, \pi_1), (a_2, \pi_2), \ldots, (a_n, \pi_n)\} \]
For example, an apartment advertised for rent may be described with a condition
of 60 m² size and cheap rent. The first attribute is crisp; it defines a value for the
attribute size. The second attribute is approximate. It is qualified as cheap, which
is represented by a possibility distribution function \(\pi_{cheap}\). \(\pi_{cheap}\) defines the
possibility of each value in the domain of discourse (i.e., all admissible rent
values) as being cheap. The graphical representation of this event is shown in
Figure 5. Formally, this publication can be represented by a set of attribute-function
pairs as follows:
\[ P = \{(size, \pi_{60}), (rent, \pi_{cheap})\} \]
where
\[
\pi_{60}(x) =
\begin{cases}
1 & \text{if } x = 60 \\
0 & \text{if } (x < 60) \vee (x > 60)
\end{cases}
\]
and
\[
\pi_{cheap}(x) =
\begin{cases}
1 & \text{if } x \le 1200 \\
1 - \frac{x - 1200}{300} & \text{if } 1200 < x < 1500 \\
0 & \text{if } x \ge 1500
\end{cases}
\]
Figure 4. Membership function of predicates
Figure 5. Possibility distributions for publication
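As a small illustration, the apartment publication can be represented as a mapping from attribute names to possibility distributions. The dictionary layout and the function names pi_60 and pi_cheap are assumptions made for this sketch; the breakpoints 1200 and 1500 are those read from the definition of "cheap" in the text.

```python
from typing import Callable, Dict


def pi_60(x: float) -> float:
    """Crisp value 60: only x = 60 is possible."""
    return 1.0 if x == 60 else 0.0


def pi_cheap(x: float) -> float:
    """Possibility that a rent of x counts as "cheap"."""
    if x <= 1200:
        return 1.0
    if x < 1500:
        return 1 - (x - 1200) / 300
    return 0.0


# A publication is a set of (attribute, possibility distribution) pairs.
publication: Dict[str, Callable[[float], float]] = {
    "size": pi_60,
    "rent": pi_cheap,
}
```

A crisp attribute is thus just the special case of a distribution that is 1 at a single point and 0 elsewhere, which lets crisp and approximate attributes be handled uniformly.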
Matching in Publish/Subscribe
In the general approximate model, the subscription, the publication, or both may
refer to imperfect concepts. A truth value of true or false is no longer sufficient
for representing the state of a match between a publication and a subscription.
We need a value between 0 and 1 to represent the degree of the match between
a subscription and each publication processed by the system. An individual
subscription can match a given publication to a greater or lesser extent, depending
on this degree of match.
Recall that subscriptions and publications are represented as follows:
\[ s = R\big((a_1, \mu_1), (a_2, \mu_2), \ldots, (a_m, \mu_m)\big) \]
\[ e = \{(a_1, \pi_1), (a_2, \pi_2), \ldots, (a_n, \pi_n)\} \]
The semantics of matching subscriptions with publications is to measure the
possibility and necessity (Dubois, 1988) with which the publication satisfies the
expectation expressed by a subscription. Based on possibility theory, we use a
pair \((\Pi_i, N_i)\) to denote the evaluation of the possibility and necessity of how the
publication satisfies each predicate i (i.e., the match between \(\pi_i\) and \(\mu_i\) in a
subscription). This measure is done by computing the intersection between \(\pi_i\)
and \(\mu_i\). In the following, we will discuss the match on the basis of predicates, then
introduce the matching problem for the whole subscription. The possibility and
necessity of a match between two functions \(\pi_i\) and \(\mu_i\) are computed by
\[ \Pi_i = \sup_{x \in D} \min\big(\mu_i(x), \pi_i(x)\big) \]
\[ N_i = \inf_{x \in D} \max\big(\mu_i(x), 1 - \pi_i(x)\big) \]
where D is the domain of discourse of the attribute.
A degree of possibility can be viewed as an upper probability bound. \(\Pi\) alone is not
enough for defining the matching degree between a publication and a subscription,
since it is too coarse. We need its dual measure, necessity N, as a
complement to possibility. In Figure 6, we show several cases of the
possibility measure. In Figure 7, we show cases of the necessity measure.
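The two measures can be approximated numerically by replacing the supremum and infimum with a maximum and minimum over a finite grid of domain values. The sketch below makes that discretization assumption explicitly; the grids, the helper point, and the example functions (taken from the apartment example) are illustrative choices only.

```python
from typing import Callable, Iterable

Func = Callable[[float], float]


def possibility(mu: Func, pi: Func, domain: Iterable[float]) -> float:
    """Approximates sup over D of min(mu(x), pi(x)) by a maximum over a grid."""
    return max(min(mu(x), pi(x)) for x in domain)


def necessity(mu: Func, pi: Func, domain: Iterable[float]) -> float:
    """Approximates inf over D of max(mu(x), 1 - pi(x)) by a minimum over a grid."""
    return min(max(mu(x), 1 - pi(x)) for x in domain)


def mu_medium(x: float) -> float:
    if x <= 40 or x >= 80:
        return 0.0
    if x < 50:
        return (x - 40) / 10
    if x <= 70:
        return 1.0
    return 1 - (x - 70) / 10


def mu_le_1500(x: float) -> float:
    return 1.0 if x <= 1500 else 0.0


def pi_cheap(x: float) -> float:
    if x <= 1200:
        return 1.0
    if x < 1500:
        return 1 - (x - 1200) / 300
    return 0.0


def point(v: float) -> Func:
    """Possibility distribution of a crisp value v."""
    return lambda x: 1.0 if x == v else 0.0


SIZE_DOMAIN = range(0, 201)         # assumed grid: whole square meters
PRICE_DOMAIN = range(0, 3001, 50)   # assumed grid: rent in steps of 50

# "price is no more than 1500" against a rent published as "cheap".
price_match = (possibility(mu_le_1500, pi_cheap, PRICE_DOMAIN),
               necessity(mu_le_1500, pi_cheap, PRICE_DOMAIN))

# "size is medium" against a crisp size of 45.
size_match = (possibility(mu_medium, point(45), SIZE_DOMAIN),
              necessity(mu_medium, point(45), SIZE_DOMAIN))
```

In the first case, every rent that is possibly "cheap" is at most 1500, so both measures are 1. In the second case, the publication is a single point, and both measures collapse to the membership degree of that point, 0.5.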
With the possibility and necessity degrees for each predicate, the overall
matching degree for a subscription is evaluated using the t-norm or s-norm
function according to whether the relation of predicates contained in the
subscription is conjunctive or disjunctive. Usually we choose the minimum
operation as the t-norm function and the maximum as the s-norm. We generalize
the computation of the matching between a subscription and a publication into a
formula:
\[ S_i(x_1, x_2, \ldots, x_n) = R\big(\mu_{i1}(x_1), \mu_{i2}(x_2), \ldots, \mu_{in}(x_n)\big) \]
\[ e(x_1, x_2, \ldots, x_n) = \{\pi_1(x_1), \pi_2(x_2), \ldots, \pi_n(x_n)\} \]
\[ \Pi_{e \leftrightarrow S_i}(x_1, x_2, \ldots, x_n) = t\big(\sup \min(\mu_{i1}(x_1), \pi_1(x_1)), \ldots, \sup \min(\mu_{in}(x_n), \pi_n(x_n))\big) \]
\[ N_{e \leftrightarrow S_i}(x_1, x_2, \ldots, x_n) = t\big(\inf \max(\mu_{i1}(x_1), 1 - \pi_1(x_1)), \ldots, \inf \max(\mu_{in}(x_n), 1 - \pi_n(x_n))\big) \]
We take \(x_1, \ldots, x_n\) as the attributes that are concerned by subscriptions and
publications; thus, attribute names are omitted in the representations. The
subscript \(e \leftrightarrow S_i\) stands for "e matches \(S_i\)." The operator t treats
relation R for the overall evaluation.
For example, if the relation R of the predicates is conjunctive and we choose min
as the operation t, then the overall match degree of a subscription is the minimum
of the degrees of the predicates this subscription contains.
Figure 6. Cases of possibility measure
Figure 7. Cases of necessity measure
With these matching semantics, a much larger number of subscriptions will match
than before, as all matches with degrees greater than 0 are prospective matching
candidates. Users' perceptions of what constitutes a "good match" versus a
"bad match" will certainly differ. Furthermore, a large number of slightly
matching subscriptions (i.e., those with a low degree of match) may not be useful,
because users may be overwhelmed by the number of matches returned. For
these reasons, the approximate matching model introduces two parameters to
control the tolerance of a match on a per-predicate basis for each subscription.
They are \(\theta_\Pi\) and \(\theta_N\), and they define users' satisfaction with the possibility and
necessity of how their interests are matched. Users' constraints are matched if
both the possibility and necessity degrees are larger than the thresholds \(\theta_\Pi\)
and \(\theta_N\). The general representation of a subscription is modified to:
\[ sub = R\big((a_1, \mu_1, \theta_\Pi^1, \theta_N^1), \ldots, (a_m, \mu_m, \theta_\Pi^m, \theta_N^m)\big) \]
Now we give the definition of matching between subscriptions and publications.
Given a set of subscriptions S and a publication p, the matching problem in the
approximate publish/subscribe system is to identify all \(s \in S\) such that s and p
match with degrees greater than the thresholds defined on s by its subscriber.
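Putting the pieces together, threshold-based matching of one conjunctive subscription against one publication might look as follows. This is a sketch under several stated assumptions: sup and inf are approximated over finite grids, the t-norm is min, thresholds are compared with "greater than or equal to" (so that a threshold of 1 can still be met), a predicate whose attribute is missing from the publication is treated as unmatched, and both sides use the attribute name price. None of the names below come from A-ToPSS.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Func = Callable[[float], float]
# (attribute, membership function, possibility threshold, necessity threshold)
Predicate = Tuple[str, Func, float, float]


def trapezoid(a: float, b: float, c: float, d: float) -> Func:
    """0 outside (a, d), 1 on [b, c], linear in between."""
    def f(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return f


def point(v: float) -> Func:
    return lambda x: 1.0 if x == v else 0.0


def match_subscription(subscription: List[Predicate],
                       publication: Dict[str, Func],
                       domains: Dict[str, Iterable[float]]) -> Tuple[bool, float, float]:
    """Return (matched, overall possibility, overall necessity) for a conjunction."""
    matched, overall_poss, overall_nec = True, 1.0, 1.0
    for attr, mu, theta_poss, theta_nec in subscription:
        if attr not in publication:        # assumption: missing attribute = no match
            return False, 0.0, 0.0
        pi, dom = publication[attr], domains[attr]
        poss = max(min(mu(x), pi(x)) for x in dom)
        nec = min(max(mu(x), 1 - pi(x)) for x in dom)
        # Per-predicate thresholds; ">=" is an assumption of this sketch.
        matched = matched and poss >= theta_poss and nec >= theta_nec
        overall_poss = min(overall_poss, poss)   # min as the t-norm
        overall_nec = min(overall_nec, nec)
    return matched, overall_poss, overall_nec


mu_medium = trapezoid(40, 50, 70, 80)
mu_le_1500: Func = lambda x: 1.0 if x <= 1500 else 0.0
pi_cheap: Func = lambda x: 1.0 if x <= 1200 else (0.0 if x >= 1500 else 1 - (x - 1200) / 300)

domains = {"size": range(0, 201), "price": range(0, 3001, 50)}
subscription = [("size", mu_medium, 0.5, 0.5), ("price", mu_le_1500, 1.0, 1.0)]

result_a = match_subscription(subscription, {"size": point(60), "price": pi_cheap}, domains)
result_b = match_subscription(subscription, {"size": point(45), "price": pi_cheap}, domains)
strict = [("size", mu_medium, 0.8, 0.8), ("price", mu_le_1500, 1.0, 1.0)]
result_c = match_subscription(strict, {"size": point(45), "price": pi_cheap}, domains)
```

The 45 m² apartment matches the first subscription with overall degrees (0.5, 0.5) but fails the stricter one, which shows how the per-predicate thresholds prune weak matches.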
Core Engine Design
To demonstrate the viability of the approximate publish/subscribe model, the
Approximate Toronto Publish/Subscribe System (A-ToPSS) was implemented.
Next, we will describe the overall system architecture of A-ToPSS and features
supported by its Web interface. The functions of a control panel will be explained
to show how to adjust experimental values and monitor the behavior of the
system.
System Architecture
The main challenge in applying publish/subscribe systems to real-world applica-
tions lies in the design of efficient matching algorithms that exhibit scalability. At
Internet-scale, such a system has to be able to process millions of subscriptions
and react to thousands of publications. The A-ToPSS is implemented based on
this consideration. Figure 8 shows the architecture of A-ToPSS. Publishers and
subscribers send requests through a Web server (e.g., Apache) to the system.
The requests include registering personal information, subscribing to interests,
and publishing data. Subscriptions and publications are processed by
a matching engine. At the same time, all of the users' information passes through
a script engine (e.g., PHP, JavaServer Pages (JSP), or Meta-HTML)
and is stored in a database. The matching engine matches publications against
subscriptions and returns the matched subscriptions to a notification engine. The
pervasive notification engine sends different types of notifications (e.g., e-mail,
ICQ, TCP/UDP) to the subscribers according to their requests.
Web Interface
A-ToPSS provides a Web interface for users to interact with the system. The
interactive user interface is implemented in the Meta-HTML Web programming
language. Meta-HTML is a powerful, extensible server-side programming
language specifically designed for working on the World Wide Web. It resembles
a hybrid of HTML and Lisp and has a large existing function library,
including support for sockets, image creation, Perl, gnuplot, etc. It is
extensible in both Meta-HTML and other languages (C, etc.).
A-ToPSS offers four classes of normal operations: registration, subscribing,
publishing, and notification. The first time a user visits the Web interface,
registration is required to access the information resource. A user needs to
create an ID and set a password. Personal information such as name and address
is optional. However, the contact information relevant to the notification must be
provided in order to successfully receive notifications. For example, e-mail
address must be provided by the user if the user wants to receive notifications
via e-mail. These are administrative operations, which are common to most Web
applications. Next, we will describe features specific to publish/subscribe
systems.
For simplicity, we will explain the operations for subscribing as an illustration.
Operations for publishing are similar, and we will not elaborate here. There are
two types of users in the system: administrators and regular users. Only
administrators have the privilege of creating new subscription types, editing the
existing ones, or deleting them. Subscription types are templates for subscriptions.
These templates specify the number of predicates and whether an attribute
accepts crisp or approximate values. Before the modification or deletion of a
subscription type, the system will check whether any subscription is defined
under this type. Subscription types can only be edited when no subscriptions are
defined under them.
Figure 8. Overall architecture of publish/subscribe system
The user-level operations on subscriptions are designed for typical users.
Subscribers can add new subscriptions, edit them, or delete the subscriptions
they previously defined. When adding a new subscription, the user first chooses
a type, and then our system will ask users to input corresponding information
according to the requirements specified by the subscription type. For crisp subscriptions,
users need to provide attribute names, operators (e.g., >, <, =, ≥, and ≤), and
values (e.g., integers, floats, strings, etc.). Approximate subscriptions are
more complicated. In addition to attribute names, users need to provide the
number of approximate constraints for each attribute. For example, the price
attribute may have three approximate constraints: "expensive,"
"reasonable," and "cheap." For the representation of each constraint, the Web
interface provides a trapezoidal membership function whose default values
are chosen according to common sense. The Web interface also gives users flexibility
in adapting the membership function according to their specifications. A user chooses
among a family of functions to represent the imperfect information and sets the
parameters. Figure 9 shows a screen shot of the subscription entry panel of our
system, where a user can view and adapt the membership function representing
his or her predicate.
Figure 9. Power users interface for defining approximate subscriptions
After users submit subscriptions and publications, their information will be stored
in a database and transmitted to the matching engine at the same time to be
processed. After the matching, matched subscriptions are sent back to the Web
interface and stored in the database. For the moment, A-ToPSS supports
notification only by a pull model. When a user clicks the notification button, the
results of matched publications for subscriptions will be displayed on the Web.
The user can browse the information through a link to the publication that
matches his or her subscription. If any subscription or publication is deleted, the
match related to it will be broken and will not be sent back to the user.
Control and Monitoring Experiments
There are many variables, such as users' satisfaction thresholds and publication
rates, that may affect system behavior. In order to illustrate the effects of these
parameters on the performance of the system, we developed a control panel for
adjusting the values of system parameters and a monitoring panel for displaying
system metrics and observing the system behavior in real time. Both the control
panel and the monitoring panel are written as Java applets.
To demonstrate the differences between the crisp and approximate publish/
subscribe models, for each model we deploy an experiment control panel (a Java
applet) where users can manage the change of parameters, and a monitoring
panel (a Java applet) that observes and displays system metrics. Figure 10
displays a screen shot for part of the control panel.
Figure 10. Control panel
On the control panel, users can adjust the following parameters (for crisp and
approximate models): rate of subscription generation, rate of publication generation,
rate of subscription deletion, rate of publication deletion, and thresholds of
users' satisfaction.
Because the number of predicates and subscriptions in the system is large, it is
difficult to control the thresholds for each predicate or subscription. In the control
panel, we use one pair of thresholds for all subscriptions to check their overall
matching degrees. The control of the representation of membership functions is
implemented in the normal system operations part. Users can choose a form
from a function family and adjust the shape of the function according to their own
specifications. The effect of the representation of functions on the number of
matched subscriptions is still under investigation.
On the monitoring panels, the following metrics are observed and displayed:
subscription loading time, matching time, number of matched predicates, and
number of matched subscriptions. These metrics are taken at monitoring and
control points, as indicated in Figure 10. This part aims at experimenting with the
matching model to demonstrate and explore its degrees of freedom. We can
see that as the subscription thresholds increase, the number of
matched subscriptions decreases, as we expect. Figure 11 shows the monitoring
panel.
Evaluation
The performance is evaluated with respect to time and memory to confirm the
efficiency of the algorithms and to compare a crisp
publish/subscribe model with an approximate model. Experiments are conducted
under various subscription and publication workloads.
Figure 11. Monitoring panel
Performance Evaluation
To evaluate performance, the following metrics are considered: subscription
loading time, overall system throughput, and memory use. Time
measurements are taken in milliseconds and memory measurements in KB.
In Figure 12, we can see that there is a trade-off between the loading time and
the matching time. Spending more time at load time to organize subscriptions well
decreases the time needed to match them against incoming events. In real-world
applications, most subscriptions are static (i.e., they are stored in the system for
a long time), and therefore, the matching time is more important than the loading
time. Moreover, because the publication rate is usually high, it is more important
to have a fast matching algorithm that responds in a very short time. In the
memory comparison, the char-wise algorithm uses less memory than the float-wise
algorithm due to the space saved by using 1-byte chars instead of 4-byte
floats.
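The memory saving from 1-byte values can be illustrated with Python's array module. The sketch assumes, purely for illustration, that a matching degree in [0, 1] is quantized to one of 256 levels; the text does not state how the char-wise algorithm encodes its values, so quantize and dequantize are hypothetical helpers rather than a description of A-ToPSS.

```python
from array import array


def quantize(degree: float) -> int:
    """Map a degree in [0, 1] to an unsigned byte (0..255); an assumed encoding."""
    return round(degree * 255)


def dequantize(code: int) -> float:
    """Recover an approximate degree from its one-byte code."""
    return code / 255


degrees = [0.0, 0.25, 0.3, 0.5, 0.75, 1.0]

as_floats = array("f", degrees)                         # 4-byte floats
as_chars = array("B", (quantize(d) for d in degrees))   # 1-byte unsigned chars

float_bytes = len(as_floats.tobytes())
char_bytes = len(as_chars.tobytes())
```

The one-byte representation needs a quarter of the space, at the cost of a rounding error of at most one quantization step per stored degree.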
Comparison between the Crisp and Approximate Models
There are several properties unique to our publish/subscribe model with uncertainties,
such as the expression of predicates, the truth value, and the possibilities.
Here the differences between crisp and approximate publish/subscribe models
are compared in two scenarios. In one scenario, the type of publication is fixed,
and we vary the types of subscriptions and thresholds to compare crisp matching
and approximate matching. The other scenario is the opposite of the first: the
subscription type is fixed, and the publication type varies.
Figure 12. Performance evaluation
Table 1 shows the different numbers of matched subscriptions when a fixed
publication is published to the system and matched against various types of
subscriptions with different -cuts. ( is used as the thresholds for possibilities
and necessities.) For each subscription type, the number of matches decreased
with the increase of -cut values, which displayed the threshold effect of . With
the same , the pessimistic case resulted in the largest number of matches, and
the optimistic case resulted in the fewest matches. The approximate case and the
middle case had almost the same results, because the less restrictive the
subscription, the higher the probability of being matched.
Table 1. Comparison of the number of matched subscriptions for various
subscription types (Publication type is approximate; the number of
subscriptions is 70,000; and the number of publications is 10.)
Subscription type    α = 0    α = 0.5    α = 1
Approximate           4628        184        7
Pessimistic           4628        804      281
Middle                4438        184       39
Optimistic            3763         47        7
Table 2. Comparison of the number of matched subscriptions for various
publication types (Subscription type is approximate; the number of
subscriptions is 70,000; and the number of publications is 10.)
Publication type     α = 0    α = 0.5    α = 1
Approximate           4628        184        7
Interval              3720        474      170
Point                 2960       1932      868
Table 2 shows the numbers of matched subscriptions for different types of
publications when the subscription type is fixed. The graphical explanation is
shown in Figure 13. When α = 0, the approximate publication returned the largest
number of matches, and the point type returned the smallest number of matches.
This happened because the value of the approximate publication has a wider
domain, and thus, there is a higher possibility that the subscriptions' constraints
are matched. However, with higher values of α, the results reversed: the
approximate publication matched a very small number of subscriptions, while the
point type matched a larger number of subscriptions. This phenomenon can be
explained by the intuitive interpretation of the possibility and necessity
definitions. Compared to the point type publication, the approximate publications
have a wider domain of possible values for each attribute. Though there is a higher
possibility that the publication satisfies the predicate constraint, it is also more
likely for the publication to intersect with the complementary region of
subscriptions, in which case the necessity degree of match will be 0. Therefore,
the necessity threshold cannot be reached.
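The effect described above can be reproduced on a toy example. The following is a minimal sketch, assuming the standard sup-min and inf-max forms of the possibility and necessity measures (Dubois & Prade, 1988); the trapezoidal shapes, the discretized domain, and all names in the code are hypothetical and are not taken from the A-ToPSS implementation.

```python
def trapezoid(a, b, c, d):
    """Membership function with support [a, d] and core [b, c] (a <= b <= c <= d)."""
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return mu


def possibility(pub, sub, domain):
    """sup over x of min(pub(x), sub(x)): do publication and predicate overlap?"""
    return max(min(pub(x), sub(x)) for x in domain)


def necessity(pub, sub, domain):
    """inf over x of max(1 - pub(x), sub(x)): is every plausible value acceptable?"""
    return min(max(1.0 - pub(x), sub(x)) for x in domain)


DOMAIN = range(0, 101)                      # hypothetical discretized price domain
subscription = trapezoid(40, 45, 55, 60)    # predicate "price is around 50"
point_pub = trapezoid(50, 50, 50, 50)       # crisp publication: price = 50
approx_pub = trapezoid(30, 35, 65, 70)      # vague publication: price roughly 35 to 65

for name, pub in [("point", point_pub), ("approximate", approx_pub)]:
    print(name,
          possibility(pub, subscription, DOMAIN),
          necessity(pub, subscription, DOMAIN))
```

In this sketch both publications reach a possibility of 1.0, but only the point publication reaches a necessity of 1.0. The core of the approximate publication extends into the region where the predicate's membership is 0, so its necessity is 0.0 and no positive necessity threshold can be met.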
Figure 13. Number of matches for different publication types
Related Work
Industry Standards
There have been a number of standardization efforts on middleware architectures
and distributed system interfaces to promote interoperability. The Common
Object Request Broker Architecture (CORBA) is a middleware architecture
standardized by the Object Management Group (OMG). The CORBA Event
Service (OMG, 2001) and Notification Services specifications (OMG, 2002)
augment the CORBA middleware platform with event-based messaging
capabilities. The Java Message Service (JMS) is the standard Java API for
message-oriented middleware proposed by Sun Microsystems to add messaging
integration capabilities into the J2EE platform.
The CORBA Event Service specification defines an indirect channel-based
event transport for distributed object frameworks. An event channel decouples
event suppliers and consumers. Suppliers generate events and place them onto
a channel. Consumers obtain events from the channel. Two serious limitations
of the Event Service Specification are that it only supports limited event-filtering
capabilities, and it cannot be configured to support different qualities of service.
Most Event Service implementations deliver all events that are sent to a
particular channel to all consumers connected to that channel on a best-effort
basis.
A primary goal of the Notification Service is to enhance the Event Service by
introducing the concepts of event filtering and quality of service specifications.
Clients of the Notification Service can subscribe to events by associating filter
objects with the proxies through which the clients communicate with event
channels. These filter objects encapsulate specific constraints on the events to
be delivered to the client. Furthermore, the Notification Service enables each
channel, each connection, and each message to be configured to support the
desired quality of service with respect to delivery guarantees, event aging
characteristics, and event priorities.
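The difference between the two services can be illustrated with a small model. This is a sketch only: the classes below are hypothetical stand-ins, not the CORBA interfaces, and the rule that several filters on one proxy are OR-ed together is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]
Filter = Callable[[Event], bool]


@dataclass
class ConsumerProxy:
    """Hypothetical stand-in for the proxy through which a consumer reaches a channel."""
    name: str
    filters: List[Filter] = field(default_factory=list)
    received: List[Event] = field(default_factory=list)

    def accepts(self, event: Event) -> bool:
        # No filter attached: behave like the plain Event Service and accept everything.
        # Assumption: multiple filters on one proxy are combined with OR.
        return not self.filters or any(f(event) for f in self.filters)


class EventChannel:
    """Decouples suppliers from consumers; suppliers only ever call push()."""

    def __init__(self) -> None:
        self.proxies: List[ConsumerProxy] = []

    def connect(self, proxy: ConsumerProxy) -> ConsumerProxy:
        self.proxies.append(proxy)
        return proxy

    def push(self, event: Event) -> None:
        for proxy in self.proxies:
            if proxy.accepts(event):
                proxy.received.append(event)


channel = EventChannel()
unfiltered = channel.connect(ConsumerProxy("unfiltered"))
filtered = channel.connect(
    ConsumerProxy("filtered", filters=[lambda e: e["price"] > 100]))

channel.push({"symbol": "IBM", "price": 80})
channel.push({"symbol": "IBM", "price": 120})
print(len(unfiltered.received), len(filtered.received))
```

The proxy without filters receives both events, which mirrors the Event Service behavior, while the proxy with a filter object receives only the event that satisfies its constraint.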
The JMS is an API for enterprise messaging created by Sun Microsystems. JMS
is not a messaging system. It is an abstraction of the interfaces and classes
needed by messaging clients when communicating with messaging systems.
JMS provides publish/subscribe and point-to-point messaging models. Under the
JMS publish/subscribe model, publishers can send messages to many consumers
through a virtual channel called a topic. All messages addressed to a topic are
delivered to all of the topic's subscribers. The message delivery is push-based, and
no polling is required. The point-to-point messaging model uses queues to store
and forward messages from suppliers to consumers. A given queue may have
multiple receivers, but only one receiver may consume each message. It is a one-
to-one communication model.
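The two delivery semantics can be contrasted in a few lines. The sketch below is an in-memory illustration with hypothetical class names; it does not use or reproduce the JMS API.

```python
from collections import deque


class Topic:
    """Publish/subscribe: every subscriber receives every message (push-based)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class MessageQueue:
    """Point-to-point: each message is consumed by exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whichever receiver calls first removes the message from the queue.
        return self._messages.popleft() if self._messages else None


inbox_a, inbox_b = [], []
topic = Topic()
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("quote")          # both inboxes now hold "quote"

queue = MessageQueue()
queue.send("order-1")
first = queue.receive()         # one receiver gets "order-1"
second = queue.receive()        # a second receiver finds the queue empty
```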
Continuous Queries
Continuous queries are issued once and are logically run continuously over a
database. Sometimes they are referred to as queries for future data, because
data included in the result set may not exist at the time when the query was
created, but will be created in the future. Traditional one-time queries, in
contrast, run only once to completion and return a result based on the current data
sets. The notion of continuous queries is similar to subscriptions in publish/
subscribe systems. A publish/subscribe system will continuously evaluate a
subscription against the new incoming publication stream, until the subscription
is removed from the system.
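The contrast between the two query styles can be sketched as follows. The engine below is a hypothetical toy, not the design of any of the systems cited in this section; it only shows that a continuous query is registered once and then evaluated against every tuple that arrives afterwards, until it is removed.

```python
def one_time_query(rows, predicate):
    """Runs once to completion over the data that exists right now."""
    return [row for row in rows if predicate(row)]


class ContinuousQueryEngine:
    """Hypothetical engine: registered queries see all data inserted later."""

    def __init__(self):
        self._queries = {}      # query id -> (predicate, accumulated results)

    def register(self, query_id, predicate):
        self._queries[query_id] = (predicate, [])

    def remove(self, query_id):
        del self._queries[query_id]

    def insert(self, row):
        # Each new tuple is evaluated against every standing query.
        for predicate, results in self._queries.values():
            if predicate(row):
                results.append(row)

    def results(self, query_id):
        return list(self._queries[query_id][1])


def is_hot(row):
    return row["temp"] > 30


stored = [{"temp": 25}]
snapshot = one_time_query(stored, is_hot)    # nothing matches today's data

engine = ContinuousQueryEngine()
engine.register("hot", is_hot)
engine.insert({"temp": 28})
engine.insert({"temp": 35})                  # future data that satisfies the query
hot_rows = engine.results("hot")
```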
Two research projects, OpenCQ (Liu, 1999) and NiagaraCQ (Chen, 2000),
support continuous queries for monitoring persistent datasets spread over a
wide-area network. OpenCQ uses a query processing algorithm based on
incremental view maintenance. NiagaraCQ addresses scalability in the number of
queries by proposing techniques for grouping continuous queries for efficient
evaluation. STREAM (Stanford stream data management) is a research project
at Stanford that focuses on query processing of continuous queries over data
streams. It provides a general and flexible architecture for query processing in
the presence of data streams.
Database Trigger Technology
Research on active databases and database triggers is relevant to continuous
queries. Triggers are event-condition-action rules that are used to monitor events
and conditions in databases, and to execute actions automatically when specific
situations are detected. Wolski et al. (1998) proposed a fuzzy trigger to
incorporate imprecise reasoning in active databases. The rules that control the
event-condition-action sequence are modeled by fuzzy membership functions. This
work proposes two trigger models: the C-fuzzy trigger and the CA-fuzzy trigger.
The C-fuzzy trigger involves fuzzy inference only in the evaluation of the
condition. If actions are also expressed in fuzzy terms and integrated with the
condition part, the result is the CA-fuzzy trigger.
Other Publish/Subscribe Research
Much work has been devoted to developing publish/subscribe systems and event
notification services such as Gryphon (Aguilera, 1999), LeSubscribe (Fabret,
2001), and ToPSS (Liu, 2002). Industrial-strength systems include various
implementations of JMS, the CORBA Notification Service, and TIB/RV.
Common to all current systems is the crisp matching semantics: neither
subscriptions nor publications can express uncertain information, and a match is
either established or not. These systems differ in the subscription
language and publication data model they offer and in the algorithms that perform
the matching task.
LeSubscribe aims at publish/subscribe support for Web-based applications. It
focuses on the algorithmic efficiency in supporting millions of subscriptions and
high event-processing rates. The language and data models are based on an
LDAP-like semistructured data model for expressing subscriptions and
publications. In this system, a subscription is a conjunction of predicates, each
of which is a triplet (attribute, operator, value). Supported relational operators
include <, ≤, ≠, ≥, >. This system supports both push- and pull-based information
dissemination. The matching engine of LeSubscribe falls within the class of
two-step matching algorithms: a predicate matching step and a subscription
evaluation step. In the first step, all predicates are matched against the
publication. In the second step, subscriptions are evaluated based on the set of
matched predicates.
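The two steps can be sketched as follows. This is an illustrative toy under stated assumptions, not the LeSubscribe algorithm itself: it omits the indexing and clustering that make the real engine fast, and the operator spellings, function names, and sample data are hypothetical.

```python
import operator

# ASCII spellings of the relational operators (an assumption of this sketch).
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def match(publication, subscriptions):
    """Return the ids of subscriptions whose predicates are all satisfied.

    publication:   dict mapping attribute -> value
    subscriptions: dict mapping id -> list of (attribute, operator, value)
    """
    # Step 1: evaluate every distinct predicate once against the publication.
    distinct = {p for predicates in subscriptions.values() for p in predicates}
    satisfied = set()
    for attribute, op, value in distinct:
        if attribute in publication and OPS[op](publication[attribute], value):
            satisfied.add((attribute, op, value))

    # Step 2: a subscription (a conjunction) matches if all its predicates matched.
    return sorted(sid for sid, predicates in subscriptions.items()
                  if all(p in satisfied for p in predicates))


subscriptions = {
    "s1": [("symbol", "=", "IBM"), ("price", "<", 100)],
    "s2": [("symbol", "=", "IBM"), ("price", ">", 100)],
    "s3": [("price", "<", 100)],
}
matched = match({"symbol": "IBM", "price": 80}, subscriptions)
```

Because the predicate ("price", "<", 100) is shared by s1 and s3, it is evaluated only once in the first step; this sharing is the main motivation for separating the two steps.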
Instead of two-step matching algorithms, Gryphon uses a tree-based data
structure to index subscriptions, which leads to another category of matching
algorithms. In Gryphon, all subscriptions are preprocessed into a tree where each
non-leaf node is a test for one attribute, and the edges derived from that node
represent different results. During matching, the incoming publication goes down
through the branch it matches until it arrives at the leaf nodes containing the
matched subscriptions.
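A simplified version of such a tree can be sketched as follows. The sketch assumes equality tests only, a fixed attribute order, and a "*" edge for subscriptions that do not constrain an attribute, so a publication may follow more than one branch; these simplifications and all names are assumptions of the example rather than details of Gryphon.

```python
class Node:
    def __init__(self):
        self.children = {}        # edge label (attribute value or "*") -> child
        self.subscriptions = []   # filled only at leaf nodes


def build_tree(subscriptions, attributes):
    """Preprocess subscriptions into a tree with one level per attribute."""
    root = Node()
    for sid, constraints in subscriptions.items():
        node = root
        for attribute in attributes:
            label = constraints.get(attribute, "*")   # "*" means "don't care"
            node = node.children.setdefault(label, Node())
        node.subscriptions.append(sid)
    return root


def match(root, publication, attributes):
    """Walk the publication down the tree and collect the leaf subscriptions."""
    frontier = [root]
    for attribute in attributes:
        next_frontier = []
        for node in frontier:
            # Follow the edge for the publication's value and the wildcard edge.
            for label in (publication.get(attribute), "*"):
                child = node.children.get(label)
                if child is not None:
                    next_frontier.append(child)
        frontier = next_frontier
    return sorted(sid for node in frontier for sid in node.subscriptions)


ATTRIBUTES = ["symbol", "exchange"]
tree = build_tree({
    "s1": {"symbol": "IBM", "exchange": "NYSE"},
    "s2": {"symbol": "IBM"},
    "s3": {"symbol": "MSFT"},
}, ATTRIBUTES)
matched = match(tree, {"symbol": "IBM", "exchange": "NYSE"}, ATTRIBUTES)
```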
Another approach using a tree-based algorithm is binary decision diagrams
(BDDs) (Compailla, 2001). In this model, each subscription is a Boolean function
represented by a BDD. This approach is distinguished in two respects: first,
it can support any Boolean formula; second, overlapping subscription
expressions are evaluated only once if the variable ordering is chosen properly.
Elvin (Segall, 1997) is a content-based notification/messaging service that
targets application integration environments and monitoring of distributed
systems. Elvin supports a more expressive subscription language in which
subscriptions are written as strings. Subscriptions contain powerful
string-processing functions and operators on built-in data types covering integer,
string, and Boolean relations. In addition to the traditional comparison operators
like <, ≤, =, ≥, >, ≠, Elvin supports operations such as matching extended regular
expressions with strings.
SIENA (Scalable Internet Event Notification Architectures) (Carzaniga, 1998)
is another example of a publish/subscribe event-notification service that
presents a similar publication and subscription language model. This research
project is based on a content-based networking service and focuses on the
routing of subscriptions and publications in a distributed environment so that
both services, notification selection (i.e., determining which publication matches
which subscription) and notification delivery (i.e., distributing matching
notifications from publishers to subscribers), are balanced. The advantage of this
infrastructure is that it maximizes expressiveness in the selection mechanism
without sacrificing scalability in the delivery mechanism.
The last research project we introduce here is READY (Gruber, 2000), led by
the AT&T research lab. READY is an implementation of the CORBA Notifi-
cation Service. The specific features of READY, which are not offered by
existing commercial products, include information consumer specifications that
can be matched over single and compound event patterns, and quality of service
(QoS) that is managed by providing ordering properties for event delivery.
The Toronto Publish/Subscribe System Family
Recently, the publish/subscribe paradigm has gained widespread interest for
modeling applications like selective information dissemination services and
location-based services. The Middleware Systems Research Group at the
University of Toronto is working on the Toronto Publish/Subscribe System
family of research projects in this context, including A-ToPSS (Liu, 2002; 2003;
2004a; 2004b), S-ToPSS (Petrovic, 2003), L-ToPSS (Burcea, 2003;
Xu, 2004), persistent-ToPSS (Leung, 2003), M-ToPSS, and P2P-ToPSS (Tam,
2003).
S-ToPSS (Semantic Toronto Publish/Subscribe System) is a semantic-aware
system where the matching between subscriptions and publications is based on
the semantics of terms rather than on their syntax. For example, publications about
automobiles may be sent to subscribers who are interested in vehicles.
S-ToPSS uses three approaches to support semantic matching capabilities. The
first one is the use of synonyms. The second one uses a concept hierarchy, which
provides the relationships (specialization and generalization) between attributes
and values. The third approach defines a set of mapping functions that allow
arbitrary relationships between schemas and attribute values. The added
semantic capability is realized by passing the incoming publications and
subscriptions through three components that implement the above stages,
respectively. A set of semantically equivalent publications and subscriptions is
generated and then matched by the existing algorithm.
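The three stages can be sketched as a preprocessing pipeline placed in front of a crisp matcher. The following is a minimal illustration under stated assumptions: the synonym table, the concept hierarchy, the mapping function, and the equality-only matcher are all hypothetical and do not reproduce the S-ToPSS implementation.

```python
SYNONYMS = {"car": "automobile", "auto": "automobile"}    # hypothetical table
PARENT = {"automobile": "vehicle", "truck": "vehicle"}    # hypothetical is-a links


def canonical(term):
    """Stage 1: replace a synonym by its root term."""
    return SYNONYMS.get(term, term)


def generalizations(term):
    """Stage 2: the term itself plus all of its ancestors in the hierarchy."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def expand(publication, mappings=()):
    """Generate the set of semantically equivalent publications."""
    base = {canonical(a): canonical(v) if isinstance(v, str) else v
            for a, v in publication.items()}
    expanded = [base]
    for attribute, value in base.items():
        if isinstance(value, str):
            expanded = [dict(p, **{attribute: g})
                        for p in expanded for g in generalizations(value)]
    # Stage 3: mapping functions derive additional attribute-value pairs.
    for p in expanded:
        for mapping in mappings:
            p.update(mapping(p))
    return expanded


def crisp_match(subscription, publication):
    """Stand-in for the existing matcher: equality on every attribute."""
    return all(publication.get(a) == v for a, v in subscription.items())


def semantic_match(subscription, publication, mappings=()):
    subscription = {canonical(a): canonical(v) if isinstance(v, str) else v
                    for a, v in subscription.items()}
    return any(crisp_match(subscription, p) for p in expand(publication, mappings))


def age_from_year(p):
    # Hypothetical mapping function; 2004 is an arbitrary reference year.
    return {"age": 2004 - p["year"]} if "year" in p else {}
```

With these tables, a subscription {"item": "vehicle"} is not matched by the publication {"item": "car"} under crisp_match, but it is matched under semantic_match, because the publication is first rewritten to "automobile" and then generalized to "vehicle".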
L-ToPSS (Location-aware ToPSS) uses the publish/subscribe paradigm to
implement push-oriented location-based services. On top of the filtering engine,
L-ToPSS adds a location staging component to periodically process users'
location updates. A location matching engine is used to match the location
constraints exposed by subscriptions and publications. Considering the limited
power and input capability of mobile devices, this prototype provides services in
a push-oriented style, thus offering an efficient notification mechanism for
mobile users.
Persistent ToPSS develops a new publish/subscribe model that accommodates
subscription and publication state. Traditional publish/subscribe models are
stateless; the state-persistent subject spaces model developed here, however,
tracks publications and subscriptions throughout their lifetime.
M-ToPSS (Mobile ToPSS) develops efficient state transfer protocols to support
disconnected operations in distributed publish/subscribe broker networks. A
subscriber connected to one broker may travel to another part of the network
and connect to a new broker. The publish/subscribe broker network has to store
any matching publications for the subscriber, forward these publications to the
new location, and change the routing information in the network to route future
traffic directly to the new location of the subscriber.
P2P-ToPSS (peer-to-peer ToPSS) develops techniques to layer a content-based
publish/subscribe protocol on top of a peer-to-peer substrate, thus leveraging the
P2P network's benefits (i.e., scalability, fault tolerance, and resource
availability).
Summary
In this chapter, we presented the publish/subscribe paradigm and introduced a
model that allows expression of imperfect information in both subscriptions and
publications. Fuzzy set theory and possibility theory are used to represent notions
of imprecision in predicates and publications. The most important property of this
approximate publish/subscribe model is that the language model is flexible and
powerful in that it allows subscriptions and publications to be either crisp or
approximate. Furthermore, the possibility and necessity measures used to
calculate the degree of match are expressive. The two measures can be used to
model users with different preferences, such as optimistic and pessimistic.
References
Aguilera, M. K., Strom, R. E., Sturman, D. C., Astley, M., & Chandra, T. D.
(1999). Matching events in a content-based subscription system. Presented
at the Symposium on Principles of Distributed Computing.
Babu, S., & Widom, J. (2001). Continuous queries over data streams. ACM
Special Interest Group on Management of Data (SIGMOD) Record,
2001(3), 109-120.
Banavar, G., Chandra, T. D., Mukherjee, B., Nagarajarao, J., Strom, R. E., &
Sturman, D. C. (1999). An efficient multicast protocol for content-based
publish/subscribe systems. Presented at the International Conference on
Distributed Computing Systems.
Barrett, D. J., Clarke, L. A., Tarr, P. L., & Wise, A. E. (1996). A framework
for event-based software integration. ACM Transactions on Software
Engineering and Methodology, 5(4), 378-421.
Burcea, I., & Jacobsen, H. A. (2003). L-ToPSS: Push-oriented location-based
services. Presented at the Fourth VLDB Workshop on Technologies
for E-Services (TES'03). Humboldt University, Berlin, Germany.
Burcea, I., Jacobsen, H. A., DeLara, E., Muthusamy, V., & Petrovic, M. (2004).
Disconnected operations in publish/subscribe. In 2004 IEEE International
Conference on Mobile Data Management (MDM).
Carzaniga, A., Rosenblum, D. S., & Wolf, A. L. (1998). Design of a scalable
event notification service: Interface and architecture. Technical Report
CU-US-863-98, Department of Computer Science, University of Colorado.
Chakravarthy, S., & Mishra, D. (1994). Snoop: An expressive event specifica-
tion language for active databases. Data and Knowledge Engineering,
14(1):1-26, Nov.
Chen, J., Dewitt, D. J., Tian, F., & Wang, Y. (2000). NiagaraCQ: A scalable
continuous query system for internet databases. In Proceedings of the
2000 ACM Special Interest Group on Management of Data (SIGMOD)
International Conference on Knowledge Discovery and Data Mining
(pp. 917).
Clarke, I., Sandberg, O., Wiley, B., & Hong, T. W. (2000). Freenet: A distributed
anonymous information storage and retrieval system. In Proceedings of
ICSI Workshop on Design Issues in Anonymity and Unobservability,
International Computer Science Institute.
Compailla, A., Chaki, S., Jha, S., & Veith, H. (2001). Efficient filtering in publish/
subscribe system using binary decision diagrams. In Proceedings of
the 23rd International Conference on Software Engineering (ICSE).
Cugola, G., Nitto, E. D., & Fuggetta, A. (2001). The JEDI event-based
infrastructure and its application to the development of the OPSS WFMS.
IEEE Transactions on Software Engineering, 27(9), 827-850.
Dingledine, R., Freedman, M. J., & Molnar, D. (2000). The Free Haven project:
Distributed anonymous storage service. In Proceedings of Workshop on
Design Issues in Anonymity and Unobservability.
Dubois, D., & Prade, H. (1988). Possibility theory: An approach to comput-
erized processing of uncertainty. New York: Plenum Press.
Eugster, P. Th., Guerraoui, R., & Sventek, J. (2000). Distributed asynchronous
collections: Abstractions for publish/subscribe interaction. In 14th
AITO-European Conference on Object Oriented Programming (ECOOP
2000), pp. 252-276.
Fabret, F., Jacobsen, H. A., Lirbat, F., Pereira, J., Ross, K. A., & Shasha, D.
(2001). Filtering algorithm and implementation for fast publish/subscribe
systems. Presented at the ACM Special Interest Group on Management
of Data (SIGMOD) Conference, Santa Barbara, CA.
Gruber, R. E., Krishnamurthy, B., & Panagos, E. (2000). READY: A high
performance event notification service. In Proceedings of the 16th
International Conference on Data Engineering. San Diego, California,
USA.
Hapner, M., et al. (2002). Java message service API tutorial and reference:
Messaging for the J2EE platform. Addison-Wesley Pub Co.
Leung, H., & Jacobsen, H. (2003). Efficient matching for state-persistent
publish/subscribe systems. In Proceedings of the 2003 Conference of
the Centre for Advanced Studies Conference on Collaborative Re-
search. Toronto, Canada.
Liu, H., & Jacobsen, H. A. (2002). A-ToPSS: A publish/subscribe system
supporting approximate matching. Presented at the 28th International
Conference on Very Large Data Bases, Hong Kong, China.
Liu, H., & Jacobsen, H. A. (2003). Approximate matching in publish/subscribe.
In Proceedings of the Fifth IEEE International Symposium on Compu-
tational Intelligence in Robotics and Automation (CIRA2003). Kobe,
Japan.
Liu, H., & Jacobsen, H. A. (2004a). Modeling uncertainties in publish/subscribe
system. In Proceedings of 20th International Conference on Data
Engineering, Boston, MA.
Liu, H., & Jacobsen, H. A. (2004b). A-ToPSS: A publish/subscribe system
supporting imperfect information processing. In Proceedings of the 30th
International Conference on Very Large Data Bases, Toronto, Canada.
Liu, L., Pu, C., & Tang, W. (1999). Continuous queries for internet scale event-
driven information delivery. IEEE Transactions on Knowledge and Data
Engineering, 11(4), 583-590.
Monson-Haefel, R., & Chappell, D. (2000). Java message service. O'Reilly.
Object Management Group. (2001). Event Service Specification, Version 1.1.
Object Management Group. (2002). Notification Service Specification, version
1.0.1.
Object Management Group (2004). Data Distribution Service Specification.
Version 1.0 finalization underway.
Petrovic, M., Burcea, I., & Jacobsen, H. A. (2003). S-ToPSS: Semantic Toronto
publish/subscribe system. In Proceedings of the 29th International
Conference on Very Large Data Bases. Humboldt-University, Berlin, Germany.
Samani, M.M., & Sloman, M. (1997). Gem-a generalized event monitoring
language for distributed systems. In Joint International Conference on
Open Distributed Processing (ICODP) and Distributed Platforms
(ICDP) 97, Toronto, Canada.
Segall, B., & Arnold, D. (1997). Elvin has left the building: A publish/subscribe
notification service with quenching. Proceedings of the Australian UNIX
and Open Systems User Group Conference (AUUG97). Brisbane,
Australia.
Smets, P. (1997). Imperfect information: Imprecision-uncertainty. In Uncertainty
management in information systems: From needs to solutions (pp. 225-254).
Dordrecht: Kluwer Academic Publishers.
Sun Microsystems Inc. (2002). Java message service specification. Version 1.1.
Tam, D., Azimi, R., & Jacobsen, H. A. (2003). Building content-based publish/
subscribe systems with distributed hash tables. Presented at the Interna-
tional Workshop on Databases, Information Systems and Peer-to-Peer
Computing. Humboldt University, Berlin, Germany.
Wolski, A., & Bouaziz, T. (1998). Fuzzy triggers: Incorporating imprecise
reasoning into active database. In Proceedings of the 14th International
Conference on Data Engineering.
Xu, Z., & Jacobsen, H. A. (2004). Efficient constraint processing for highly
personalized location based services. In Proceedings of the 30th International
Conference on Very Large Data Bases, Toronto, Canada.
Zadeh, L. A. (1989). Knowledge representation in fuzzy logic. IEEE Transactions
on Knowledge and Data Engineering, 1, 89-100.
About the Authors
Zongmin Ma received his Ph.D. from the City University of Hong Kong (2001).
His current research interests include intelligent database systems, knowledge
management, Web-based data management, e-learning systems, intelligent
planning and scheduling, decision making, robot path/motion planning, engineer-
ing database modeling, and enterprise information systems. He published many
papers in journals, conferences, edited books, and encyclopedias in these areas.
Also, he is currently authoring and editing several upcoming books being
published by Kluwer Academic Publishers and Idea Group Inc., respectively.
* * *
Rafal Angryk received a Ph.D. in computer science from Tulane University
(USA) and also has an M.A. in business management and an M.Sc. in computer
systems. He worked as a research assistant at the Center for Computational
Sciences, a program organized in cooperation between Stennis Space Center
(NASA) and University of Southern Mississippi. Previously, he was on the
faculty at the Institute of Computer Science, University of Szczecin, Poland. His
current research interests are large databases (data mining, spatial databases),
mobile agents technology (distributed processing, Web-mining), and artificial
intelligence (fuzzy modeling, neural networks), and he has over a dozen papers
in these areas.
Fernando Berzal is an assistant professor in the Department of Computer
Science and Artificial Intelligence at the University of Granada, where he is a
member of the Intelligent Databases and Information Systems research group
(IdBIS, for short). His current research interests include knowledge discovery
in databases and data mining, OLAP and data warehousing, intelligent informa-
tion systems, and almost anything related to software development, from model-
driven development to design patterns and software engineering practices.
Tru Hoang Cao is currently vice dean of the Faculty of Information Technology,
Ho Chi Minh City University of Technology. He received his B.Eng. (Gold
Medal) in computer science and engineering from Ho Chi Minh City University
of Technology (1990), M.Eng. (Tim Kendall Memorial Prize) in computer
science from Asian Institute of Technology (1995), and a Ph.D. in computer
science from University of Queensland (1999). He then spent more than two
years doing postdoctoral research in the Artificial Intelligence Group, University
of Bristol, and the Berkeley Initiative in Soft Computing, University of
California at Berkeley. His research interests are uncertain and imprecise
knowledge representation and reasoning, conceptual structures, nonclassical
logics and their applications, object-oriented systems, and intelligent Internet. He
is author and co-author of more than 30 research papers in international journals,
edited books, and conference proceedings.
Rita de Caluwe studied mathematics at Ghent University, earning an M.Sc. in
computer science (1965) from the Université Scientifique et Médicale of
Grenoble (France) and graduating with a Ph.D. in 1973. She has been profession-
ally active as an assistant at the Computing Centre of Ghent University. Her
academic career as a professor in computer science at the same university
started in 1974, and she leads a research group on fuzzy databases. She is (co-
)author of a number of publications in this field, has served as a reviewer of many
conference and journal papers, and has participated actively in the elaboration
of major conferences. She organized a series of Lectures on Fuzziness and
Databases at Ghent University (1992-1997). Furthermore, she has been
involved in IFIP activities for more than 25 years, representing Belgium in the
General Assembly (1998-2002).
Elena García-Barriocanal obtained a university degree in computer science
from the Pontifical University of Salamanca in Madrid (1998) and a Ph.D. from
the Computer Science Department of the University of Alcalá. In 1998, she
joined the Computer Science Department of this university as assistant
professor. Starting from 2000, she has been associate professor with the Computer
Science Department of the University of Alcalá and she is a member of the
Knowledge and Soft Computing group of this university. Her research interests
mainly focus on topics related to human-computer interactions and knowledge
representation; concretely she actively works on ontological aspects in usability
and accessibility areas.
Phil Graniero is an assistant professor in the Earth Sciences Department at the
University of Windsor and a researcher at the Great Lakes Institute for
Environmental Research, with more than 10 years of experience in GIS-related
research and development in academia and industry. His research combines
environmental science with computer science, emphasizing investigations in
spatial sampling strategies, eco-hydrological modeling, and wetland dynamics.
His primary research interest is the integration of GIS, artificial intelligence, data
acquisition technologies, and ecosystem models into innovative tools that maxi-
mize spatial information effectiveness. He teaches undergraduate and graduate
courses in GIS, spatial problem solving, and environmental modeling.
José A. Gutiérrez obtained university degrees in computer science (Polytechnic
University of Madrid), mathematics (Complutense University), and library
science (University of Alcalá), and a Ph.D. from the University of Alcalá. He
has worked in several companies as project manager and he held the position of
head of information systems at the University of Alcalá, vice-dean of the
Polytechnic School of University of Alcalá. He currently works as a full
professor at the Computer Science Department of University of Alcalá, and
supervises several Ph.D. works in the areas of fuzzy sets and software
engineering.
Sven Helmer studied computer science (Informatik) at the University of
Karlsruhe in Germany (1989-1995). Following that, he acquired a Ph.D. doing
research in the area of database performance at the University of Mannheim,
Germany (2000). Currently, he is working on his Habilitation (postdoctoral
lecture qualification, roughly comparable to an assistant professorship) in the
area of native XML database systems. He published more than 25 papers in
various journals, conference proceedings, and books. Furthermore, he served as
a reviewer for different journals and as a member in several program commit-
tees.
Hans Arno Jacobsen holds a faculty position with the Department of Electrical
and Computer Engineering and the Department of Computer Science at the
University of Toronto (Canada), where he leads the Middleware Systems
research group. His principal areas of research include middleware systems,
distributed systems, and information systems. He received a Ph.D. from
Humboldt University, Berlin (1999), and his M.A.Sc. from the University of
Karlsruhe (1994). From 1992-1998, he conducted research at various institutes
around the globe, including LIFIA in Grenoble, France; ICSI in Berkeley; LBNL
in Berkeley; and INRIA in Rocquencourt, France. He served as a program
committee member of numerous international workshops and conferences,
including ICDCS, OOPSLA, Middleware, and VLDB. He is the program chair
of the Fifth International Middleware Conference, in Toronto, Canada. For more
information, please visit http://www.eecg.toronto.edu/~jacobsen.
Roy Ladner received an M.S. in computer science and a Ph.D. in engineering
and applied science from the University of New Orleans. He works as a
research scientist at the Naval Research Laboratory at Stennis Space Center,
Mississippi (USA). His work emphasizes the investigation of spatiotemporal
database issues and advanced methods to improve delivery of spatiotemporal
data over the Internet. His research was published in national and international
conference proceedings and journals.
Haifeng Liu is a Ph.D. student in the Department of Computer Science at the
University of Toronto (Canada). Her research areas include database technol-
ogy, information systems, distributed systems, and Web information retrieval.
Haifeng Liu received her master's degree from the University of Toronto (2003)
and bachelor's degree from the University of Science and Technology of China
(2001). She interned as a visiting student in Microsoft Research Asia in Beijing
(July-September 2003).
Nicolás Marín received a Ph.D. in computer science from the University of
Granada, Spain (2001). He currently works as a full-time assistant professor in
the Department of Computer Science and Artificial Intelligence at the University
of Granada, where he is a member of the Intelligent Databases and Information
Systems Research Group of the Andalusian Government. He is a member of the
team of several financed projects, and his research interest is focused on the
fields of fuzzy databases, knowledge discovery and data mining, fuzzy sets
theory, soft computing, OLAP, and data warehousing.
Hoa Nguyen is a lecturer of the Faculty of Information Technology, Ho Chi
Minh City Open University. He received his B.Sc. in mathematics from Vinh
Pedagogical University (1982), and M.Eng. in computer science and engineering
from Ho Chi Minh City University of Technology (2003). His research interests
are mathematical logic and their applications, fuzzy and probabilistic database
modeling, and technologies for constructing intelligent systems.
Frederick E. Petry received a Ph.D. in computer science from Ohio State
University, was on the faculty of the University of Alabama in Huntsville and
Ohio State and is currently a full professor in Electrical Engineering & Computer
Science at Tulane University (USA). His recent research interests include
representation of imprecision via fuzzy sets and rough sets in databases, GIS, and
other information systems. Dr. Petry has more than 300 scientific publications,
and his monograph on fuzzy databases was widely recognized as a definitive
volume on this topic. He was selected an IEEE Fellow in 1996 for research on
fuzzy sets for modeling imprecision in databases, and in 2003 he was made an
IFSA Fellow.
Olga Pons received a Ph.D. in computer science from the University of
Granada, where she currently works as an associate professor in the Department
of Computer Science and Artificial Intelligence. She participated in several
financed research projects of the Spanish Ministry of Science and Technology.
She wrote several book chapters for important editorial companies and more
than 20 articles that appeared in international journals. She also participates in
well-known congresses on the fields of soft computing and fuzzy sets (EUFIT,
IPMU, FuzzyIEEE, IFSA, ISMIS, FQAS, etc.), where she also chaired sessions
and participated in the program committees. Her research interest is focused on
the fields of fuzzy databases, knowledge discovery and data mining, fuzzy sets
theory, and soft computing.
Vincent B. Robinson is an associate professor in the Department of Geogra-
phy at the University of Toronto, Canada. He held the Alberta Forestry, Lands,
and Wildlife Professorship in Digital Mapping and Spatial Data Management at
The University of Calgary and came to the University of Toronto as director of
the Institute for Land Information Management. He published extensively on
topics relating fuzzy information processing to problems of geographic
information systems. His current research is a strong interdisciplinary blend of
geographical information science, intelligent systems, and landscape biogeogra-
phy. He teaches undergraduate and graduate courses in geographic information
processing and landscape biogeography.
Jonathan Michael Rossiter is a lecturer in artificial intelligence in the
Department of Engineering Mathematics, University of Bristol, UK. He is
currently a JSPS and Royal Society research fellow spending two years in the
Biologically Integrative Sensory Systems Laboratory, Bio-mimetic Control
Systems Laboratory, RIKEN (the Institute of Physical and Chemical Research),
Japan. He received his B.Eng. in electronics (1992), his M.Sc. in computer
science (1996), and his Ph.D. in artificial intelligence (2000), all from the
University of Bristol. His research interests include humanist computing,
uncertain reasoning, uncertain conceptual structures, information fusion, image
processing, and medical information processing. He is the author or co-author of
more than 20 research papers in international journals, edited books, and
conference proceedings.
Miguel Á. Sicilia obtained a university degree in computer science from the
Pontifical University of Salamanca in Madrid, Spain (1996) and a Ph.D. from
Carlos III University in Madrid, Spain (2002). In 1997 he joined an
object-technology consulting firm, after holding a research grant at the Instituto de
Automática Industrial (Spanish Research Council). From 1997 to 1999, he worked
as an assistant professor at the Pontifical University, after which he joined the
Computer Science Department of the Carlos III University in Madrid as a
lecturer, working simultaneously as a software architect in e-commerce
consulting firms and as a member of the development team of a personalization engine.
From 2002 to October 2003, he worked as a full-time lecturer at Carlos III
University, working actively in the area of adaptive hypermedia. Currently, he
works as a full-time professor at the Computer Science Department, University
of Alcalá (Madrid). His research interests are primarily adaptive hypermedia,
learning technology, and human-computer interaction, with special focus on the
role of uncertainty and imprecision handling techniques in those fields.
María-Amparo Vila received her M.S. in mathematics (1973) and her Ph.D. in
mathematics (1978), both from the University of Granada. Since 1992, she has been a
professor in the Department of Computer Science and Artificial Intelligence.
Since 1997, she has also been head of the department and the IdBIS research group. Her
research activity is centered on the application of soft computing techniques
to different areas of computer science and artificial intelligence, such as
theoretical aspects of fuzzy sets; decision and optimization processes in fuzzy
environments; fuzzy databases, including relational, logical, and object-oriented
data models; and information retrieval. She has been responsible for 10 research
projects and has advised seven Ph.D. theses. She has published more than 50
papers in prestigious international journals, more than 60 contributions to
international conferences, and many book chapters.
Index 339
Index
A
A-ToPSS 317
access patterns 209
access support relations (ASRs) 227
access via type hierarchies 209
adjustment belief revision 138
agent 272
application programming interfaces
(APIs) 246
approximate matching 303
approximate publish/subscribe systems
303
artificial intelligence 128
association rules 87
associations 158
atomic fuzzy selection expression 64
atomic type 54
attribute generalization 96
attribute generalization algorithm 85
attribute-oriented induction 86
B
B-trees 215
basic type 11, 15
Bayesian network-based 129
body clauses 123
body phase 123
C
cardinality ratio 184
Cartesian product 69
CG-trees 233
CH-index 233
class 115
class hierarchy 48, 193
class inspector 202
class recognition 119
closeness of mapping 245
clusters 259
collection type 11, 15
complex objects 185
concept hierarchy 85, 96
conceptual data model 153
conceptual data modeling 153
conditional probability 48
consistent fuzzy concept hierarchy 98
constraint 23
constraint system 23, 24
continuous queries 325
core engine design 317
crisp concept hierarchy 97
D
data browser 130
data cube 88
data definition operators 32
data generalization 86
data graph 186
data manipulation operators 32
data mining 85
data warehouse 87
database management systems
(DBMS) 207
database model 1, 31
database query 273, 284
database researchers 178
database scheme 29
database trigger technology 325
db4o 255
decision model 277
dependency 158
difference 69
disjunctive fuzzy set 183
dispersal model 277
E
ECO-COSM 269
ecological models 273
ellipse problem 130
entity-relationship (ER) 154
enumeration type 12, 15
equality constraint 6
existing database system 177
expressiveness 306
extended possibilistic truth value
(EPTV) 8
extendible hashing 216
extendible signature hashing index
(ESH) 225
extent cardinality 251
external hashing 215
F
face problem 130
FILUM 138
FIRMS model 5
flat hierarchy 105
flexible inheritance 247
food model 4
FOODBS architecture 198
FOODM model 5
FPOB instances 60
FRIL++ 113, 123
fuzzily described objects 185
fuzzy aggregation 164
fuzzy algebra 4
fuzzy association 166, 249
fuzzy association algebra 5
fuzzy association design 255
fuzzy atom 116
fuzzy attributes 210, 247
fuzzy class 159, 179, 247
fuzzy class extents 190
fuzzy class hierarchy 91
fuzzy class schema 93
fuzzy clustering algorithm 88
fuzzy collections 183
fuzzy concept 178
fuzzy concept hierarchy 97
fuzzy conceptual modeling 246
fuzzy constructor 192
fuzzy constructs 244
fuzzy data 159
fuzzy data mining 88
fuzzy database model 272
fuzzy databases 246
fuzzy dependency 169
fuzzy extensions 197, 244
fuzzy extents 190
fuzzy generalization 161
fuzzy generalization relation 163
fuzzy generalization-specialization 248
fuzzy graph constraint 7
fuzzy information modeling 153
fuzzy logic 47, 114
fuzzy measure 302
fuzzy modeling 178
fuzzy object oriented database model 1
fuzzy object-oriented capabilities 177
fuzzy object-oriented concepts 197
fuzzy object-oriented databases
(FOODBSs) 206
fuzzy object-oriented model 47, 86,
90
fuzzy property 115
fuzzy region 284
fuzzy relation cardinality 251
fuzzy relations 248
fuzzy selection conditions 64
fuzzy selection expression 63
fuzzy set 88, 156, 210, 276
fuzzy set comparison 183
fuzzy set theory 48, 302
fuzzy spatial relation 273, 284
fuzzy superclass 162
fuzzy type 191
fuzzy values 247
G
G-trees 221
general two-dimensional indexes 222
generalization 158
generalized constraint 6
generalized resemblance operator 184
geographic database 270
geographic information systems (GISs)
274
geographical data 270
global connectivity 270
global depth 215
grid files 216
H
H-trees 232
hierarchical model 146
hierarchical signature organization 224
I
imprecise value 181
imprecision 303
inclusion operator 184
index structures 214
individual-based modeling (IBM) 270
inducer 129
information categorization 305
information dissemination 305
inheritance 208
inheritance relationship 190
instrumentation subsystem 282
interpretation of path expressions 64
intersection 69
interval supports 140
iterated prisoner's dilemma (IPD) 141
J
Java data objects (JDO) 242, 252
join-compatible 74
K
K-d trees 217
KBLIMS 272
knowledge discovery 86
knowledge representation 146
L
label clauses 123
label phase 123
landscape 271
linguistic labels 55, 181
lisp 125
location-aware ToPSS 328
logic programming 114
M
machine learning 86, 128
mass assignment 49
membership degree 47, 163
membership function 285
meta-meta-model layer (M3) 246
modeling subsystem 280
modeling with words 143
monitoring experiments 320
multikey index 234
multivalued attributes 209
multivalued reference type 14
N
navigational access 213
navigational access via paths 209
necessity measure 211
neural network-based 129
nonempty intersection query 219
O
object bases 48
object constraint language (OCL) 248
object data management group (ODMG)
208, 242
object identifier (OID) 208
object persistence sources 242
object scheme 29
object-centered model 3
object-oriented data paradigm 178
object-oriented database 113, 274
object-oriented database management
systems (OODBM) 178
object-oriented database model 2
object-oriented databases (OODBs)
85, 242
object-oriented logic programs 123
object-oriented model 3, 47
object-relational database management
systems 178
ObjectStore 259
orthogonal persistence interfaces 244
orthogonal persistence system 242
P
partition tree 99
partitioned signature organization 225
path expression 63
pattern recognition 86
perceptual range 271
persistent object 21
physical storage models 250
polymorphy 209
possibilistic constraint 6
possibility distribution 156, 181
possibility measure 211
possibility theory 302
preferred default subset 118
probabilistic combination strategies 53
probabilistic constraint 7
probabilistic default reasoning 117
probabilistic extent 60
probabilistic interpretation 48
probabilistic object base 46
probabilistic tuple values 56
probability degree 47
probability distribution 48
probability theory 47
probability-value constraint 7
probe 272
programming language 113
projection 69
PROLOG 123
properties 115
property inheritance 119
prototype 201
publication data model 309, 312
publish/subscribe messaging paradigm
305
publish/subscribe paradigm 301
publish/subscribe systems 305
Q
query language 241
query-directed approach 272
querying 211
R
random set constraint 7
recursion 186
reference instance 21
reference type 14
reflection capability 193
relational interval trees 219
renaming 69
renaming expression 70
resemblance relationship 181
RI-trees 219
role-expressiveness 245
rough object-oriented database 5
S
SC-trees 232
segments 259
selection expression 63
selection operation 63
semantic data model 179, 189
semantic representation 181
semantic structure 273
semantics of a constraint 23
sequential signature file (SSF) 224
set type 54
signature tree (ST) 224
signatures 222
similarity relations 101
similarity relationship 86
similarity-based model 4
simple user recognition 136
simulation models 270
single-valued attributes 209, 214
single-valued reference type 14
soft computing 178
software development 180
sophisticated access method 206
spatial data 270
specialization process 190
standard database architecture 198
standard index structure 206
state-persistent publish 307
storage hierarchy 207
structured type 12
subscription language 309
subscription language model 311
subset query 219
superclasses 59
superimposed coding technique 222
support pairs 116
syntax of a constraint 23
syntax rules 11
T
top-level attributes 55
Toronto publish/subscribe system family
327
transient object 21
translation layer 179
tree hierarchy 145
tuple type 54
type 10
type hierarchies 213, 232
type system 10
type-based publish/subscribe 307
U
UFO model 4
UML 153
uncertainty 188, 303
uncertainty degrees 232
unified modeling language (UML) 243
union 69
user model layer 246
user modeling 134
user object layer 246
user recognition 137
usuality constraint 7
V
veristic constraint 7
virtual memory mapping architecture
259
void type 14
voting model 49
