Data Models

311098918.doc
Chapter 11: Data Models

(Study guide pp. 99-105)
Learning objectives
Describe how a real-life system can be modelled within a data model
Define the relational model and discuss its components
Define basic object-oriented concepts
Evaluate and critique the relational and object-oriented models
Data Models
How is the logical structure of data described, and how is data manipulated at the
logical level? The answer is that data is modelled at the logical (abstract) level by means of
(or within) a modelling theory.
A schema is written using a DDL. Unfortunately, this type of language is too low level to
describe the data requirements of an organization in a way that is readily understandable by
a variety of users. Hence the need for a higher-level description of the schema: a data model.
A data model is an integrated collection of concepts for describing and manipulating data
[about real-world objects and events], relationships between data [associations among those
objects], and constraints on the data in an organization.
The purpose of a data model is to represent data and to make the data understandable. If it
does this, then it can easily be used to design a database.
We can identify three related data models:

1. an external data model, to represent each user's view of the organization, sometimes
called the Universe of Discourse (UoD);
2. a conceptual data model, to represent the logical (or community) view that is
DBMS-independent;
3. an internal data model, to represent the conceptual schema in such a way that it
can be understood by the DBMS.
Data models fall into three broad categories: object-based, record-based, and physical data models. The
first two are used to describe data at the conceptual and external levels; the third is used to describe
data at the internal level.
Object-based Data Models
Some of the more common types of object-based data model are: Entity-Relationship (ER),
Semantic, Functional, and Object-oriented.
Record-based Data Models
There are three principal types: the relational data model, the network data model, and the
hierarchical data model. The majority of modern commercial systems are based on the relational
paradigm, whereas the early database systems were based on either the network or hierarchical
data models. The latter two models require the user to have knowledge of the physical database being
accessed, whereas the former provides a substantial amount of data independence. Hence, relational
systems adopt a declarative approach to database processing (that is, they specify what data is to
be retrieved), but network and hierarchical systems adopt a navigational approach (that is, they
specify how the data is to be retrieved).
Physical Data Models
Physical data models describe how data is stored in the computer, representing information such
as record structures, record orderings, and access paths. The most common ones are the unifying
model and the frame memory.
Conceptual (or logical) Modelling
The conceptual schema is the heart of the database. It supports all the external views and is, in turn,
supported by the internal schema. However, the internal schema is merely the physical implementation of
the conceptual schema. The textbook discusses both conceptual and logical modelling.
Why Study the relational model
To be a good practitioner by understanding the theory
A good understanding of the theoretical model ensures a quick understanding of any of its
implementations
The relational theory is the most commonly used data modelling theory for commercial
database systems.
Object-Oriented Database Management Systems (OODBMSs) were predicted by some to
soon become the primary database technology; the argument was that RDBMSs were not
designed to handle the type of multimedia data frequently found on the internet. However,
so far none of these predictions has come to pass. Moreover, OODBMSs face further competition
from Object-Relational Database Management Systems (ORDBMSs) [an earlier term used
was Extended-Relational DBMSs], which have added object capabilities to relational
databases.
Why Study the Object Oriented model?
To use or develop OO models in the future when the need arises
To understand object-oriented principles found in ORDBMSs and object-oriented
programming languages
Although not as popular as predicted, OODBMSs have cornered some niche markets.
The Relational Model
The relational model is a theory in which all data is modelled as relations: no pointers, no
records and no other data structures; only relations.
A relational DBMS requires only that the database be perceived by the user as tables.
RDBMSs are the dominant data-processing software in use today, with estimated total worldwide
software revenue of US$24 billion in 2011, projected to grow to about US$37 billion by 2016.
Owing to the popularity of the relational model, many nonrelational systems now provide a
relational user interface, irrespective of the underlying model.
The relational theory consists of three components:
a set of concepts (i.e. relational data objects) by means of which data (corresponding to a
real-life system) is modelled (structural part)
a set of operators by means of which the objects of the model are manipulated (manipulative
part)
a set of rules, which specify how the concepts and operators are allowed to be put together
(set of integrity constraints)
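The three components can be sketched in a few lines of Python. This is a toy model, not a DBMS:
- a relation is represented as a set of tuples (the structural part);
- `select` and `project` stand in for the manipulative part;
- a uniqueness check stands in for the integrity part.

The `staff` data and all function names are hypothetical.

```python
from typing import Callable

# Structural part: a relation is a heading (attribute names) plus a *set* of tuples.
heading = ("staffNo", "name", "branchNo")
staff = {
    ("SG37", "Ann", "B003"),
    ("SG14", "David", "B003"),
    ("SL21", "John", "B005"),
}

# Manipulative part: operators take relations and return new relations.
def select(relation: set, pred: Callable[[dict], bool]) -> set:
    """Restriction: keep only the tuples that satisfy the predicate."""
    return {t for t in relation if pred(dict(zip(heading, t)))}

def project(relation: set, attrs: tuple) -> set:
    """Projection: keep only the named attributes; duplicates vanish because the result is a set."""
    idx = [heading.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

# Integrity part: a rule the data must obey, here "staffNo is unique".
def staff_no_is_unique(relation: set) -> bool:
    keys = [t[heading.index("staffNo")] for t in relation]
    return len(keys) == len(set(keys))

b003 = select(staff, lambda row: row["branchNo"] == "B003")
print(sorted(project(b003, ("name",))))  # [('Ann',), ('David',)]
print(project(staff, ("branchNo",)))     # two tuples: ('B003',) and ('B005',)
print(staff_no_is_unique(staff))         # True
```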
How an entity (e.g. person) is modelled (i.e. what data is kept and in what format) depends on the
information requirements of a particular organisation.
It is not sufficient to devise a good theoretical model; eventually, the data model has to be
implemented. A software system that supports the implementation of the relational theory is a
relational DBMS. The implementation of an abstract data model in a DBMS is a database
system.
NOTE: None of the extant commercial DBMSs fully implements the theory. They usually
impose certain restrictions and, accordingly, implement only a part of the theory. Therefore,
certain abstract data models would have to be adjusted before they could be implemented.
Brief History of the Relational Model

In the late 1960s and early 1970s, there were two mainstream approaches to constructing DBMSs:
the hierarchical and network approaches. Together these formed the first generation of DBMSs.
These two models had some fundamental disadvantages:
complex programs had to be written to answer even simple queries based on navigational
record-oriented access;
there was minimal data independence;
there was no widely accepted theoretical foundation.
RDBMS software represents the second generation of DBMSs and is based on the relational data
model proposed by E. F. Codd (1970).
The relational model's objectives were specified as follows:
To allow a high degree of data independence. Application programs must not be affected by
modifications to the internal data representation, particularly by changes to file
organizations, record orderings, or access paths.
To provide substantial grounds for dealing with data semantics, consistency, and
redundancy problems. In particular, Codd's paper introduced the concept of normalized
relations, that is, relations that have no repeating groups.
To enable the expansion of set-oriented data manipulation languages.
Three research projects that generated interest in the relational model:
Late 1970s - IBM's San José Research Laboratory in California, with the prototype relational DBMS
System R
The System R project led to much further research and many prototypes; two major developments were:
the development of a structured query language called SQL
the production of various commercial relational DBMS products during the late
1970s and the 1980s: for example, DB2 and SQL/DS from IBM and Oracle from
Oracle Corporation.
Late 1970s - the INGRES (Interactive Graphics Retrieval System) project at the University of
California at Berkeley
1976 - The third project was the Peterlee Relational Test Vehicle at the IBM UK Scientific
Centre in Peterlee
Terminology
Relation A relation is a table with columns and rows.
Attribute An attribute is a named column of a relation.
Domain A domain is the set of allowable values for one or more attributes.
Domains are an extremely powerful feature of the relational model. Every attribute in a
relation is defined on a domain. Domains may be distinct for each attribute, or two or more
attributes may be defined on the same domain.
The domain concept is important, because it allows the user to define in a central place the
meaning and source of values that attributes can hold.
A complete implementation of domains is not straightforward, and as a result, many
RDBMSs do not support them fully.
Tuple A tuple is a row of a relation.
Degree The degree of a relation is the number of attributes it contains.
A relation with only one attribute would have degree one and be called a unary relation or
one-tuple. A relation with two attributes is called binary, one with three attributes is called
ternary, and after that the term n-ary is usually used. The degree of a relation is a property
of the intension of the relation.
Cardinality The cardinality of a relation is the number of tuples it contains.
Relational database A collection of normalized relations with distinct relation names.
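A minimal Python sketch of these terms, using a hypothetical Branch relation. Each attribute is declared on a domain, modelled here as a simple predicate over allowable values. The degree is the number of attributes, and the cardinality is the number of tuples currently in the relation. The domain names and rules are invented for illustration.

```python
# Domains: the set of allowable values for one or more attributes.
# Here each domain is modelled as a predicate (hypothetical rules).
DOMAINS = {
    "BranchNumbers": lambda v: isinstance(v, str) and v.startswith("B") and len(v) == 4,
    "CityNames": lambda v: isinstance(v, str) and v.isalpha(),
}

# Each attribute of the relation is defined on a domain.
branch_schema = {"branchNo": "BranchNumbers", "city": "CityNames"}

branch = [
    {"branchNo": "B005", "city": "London"},
    {"branchNo": "B007", "city": "Aberdeen"},
    {"branchNo": "B003", "city": "Glasgow"},
]

def degree(schema: dict) -> int:
    """Number of attributes: a property of the intension (the schema)."""
    return len(schema)

def cardinality(rows: list) -> int:
    """Number of tuples: changes as tuples are inserted and deleted."""
    return len(rows)

def violates_domains(schema: dict, rows: list) -> list:
    """Return (row index, attribute) pairs whose value lies outside the attribute's domain."""
    return [
        (i, attr)
        for i, row in enumerate(rows)
        for attr, dom in schema.items()
        if not DOMAINS[dom](row[attr])
    ]

print(degree(branch_schema))  # 2, so Branch is a binary relation
print(cardinality(branch))    # 3
print(violates_domains(branch_schema, branch + [{"branchNo": "X1", "city": "London"}]))
# [(3, 'branchNo')]
```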
Alternative Terminology
Formal terms    Alternative 1    Alternative 2
Relation        Table            File
Tuple           Row              Record
Attribute       Column           Field
Database Relations
Relation schema: A named relation defined by a set of attribute and domain name pairs. In the same
way that a relation has a schema, so too does the relational database.
Relational database schema: A set of relation schemas, each with a distinct name.
Properties of a Relation
A relation has the following properties:
the relation has a name that is distinct from all other relation names in the relational schema;
each cell of the relation contains exactly one atomic (single) value;
each attribute has a distinct name;
the values of an attribute are all from the same domain;
each tuple is distinct; there are no duplicate tuples;
the order of attributes has no significance;
the order of tuples has no significance, theoretically. (However, in practice, the order may
affect the efficiency of accessing tuples.)
Because each cell should contain only one value, it is illegal to store two postcodes for a
single branch office in a single cell. In other words, relations do not contain repeating groups.
A relation that satisfies this property is said to be normalized or in first normal form.
Provided that an attribute name is moved along with the attribute values, we can
interchange columns. (Similarly, tuples can be interchanged.)
Most of the properties specified for relations result from the properties of mathematical
relations.
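Several of these properties follow directly if a relation is represented as a mathematical set. The Python sketch below uses a toy representation chosen for illustration, not how any DBMS stores data. Each tuple is a frozenset of (attribute, value) pairs, so:
- attribute order and tuple order carry no meaning;
- duplicate tuples collapse automatically.

A separate check flags cells holding more than one value, as in the two-postcodes example above. The helper names are hypothetical.

```python
def make_tuple(**values) -> frozenset:
    """A tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(values.items())

def make_relation(*tuples) -> frozenset:
    """A relation as an unordered set of tuples, so duplicates cannot exist."""
    return frozenset(tuples)

r1 = make_relation(
    make_tuple(branchNo="B005", postcode="SW1 4EH"),
    make_tuple(branchNo="B003", postcode="G11 9QX"),
)
# Same data, attributes and tuples listed in a different order, plus a duplicate tuple.
r2 = make_relation(
    make_tuple(postcode="G11 9QX", branchNo="B003"),
    make_tuple(postcode="SW1 4EH", branchNo="B005"),
    make_tuple(branchNo="B005", postcode="SW1 4EH"),
)

print(r1 == r2)  # True: order is insignificant and the duplicate collapsed
print(len(r2))   # 2

def is_atomic(value) -> bool:
    """Treat lists, tuples and sets as repeating groups (not atomic)."""
    return not isinstance(value, (list, tuple, set, frozenset))

def in_first_normal_form(rows) -> bool:
    """True if every cell of every row holds a single atomic value."""
    return all(is_atomic(v) for row in rows for v in row.values())

print(in_first_normal_form([{"branchNo": "B005", "postcode": "SW1 4EH"}]))               # True
print(in_first_normal_form([{"branchNo": "B005", "postcode": ["SW1 4EH", "SW1 4EJ"]}]))  # False
```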
Relational Keys
Superkey An attribute, or set of attributes, that uniquely identifies a tuple within a relation.
Candidate key A superkey such that no proper subset is a superkey within the relation.
A candidate key K for a relation R has two properties: Uniqueness. In each tuple of R, the
values of K uniquely identify that tuple. Irreducibility. No proper subset of K has the
uniqueness property.
Note that an instance of a relation cannot be used to prove that an attribute or combination of
attributes is a candidate key. The fact that there are no duplicates for the values that appear
at a particular moment in time does not guarantee that duplicates are not possible. However,
the presence of duplicates in an instance can be used to show that some attribute
combination is not a candidate key. Identifying a candidate key requires that we know the
real-world meaning of the attribute(s) involved so that we can decide whether duplicates
are possible. Only by using this semantic information can we be certain that an attribute
combination is a candidate key.
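The point about instances can be illustrated with a small Python sketch; the data and function names are hypothetical.
- Scanning the current rows can refute a candidate key: a duplicate proves the attribute set is not unique.
- A clean scan only shows that the attributes are unique in this instance.

Deciding whether an attribute set really is a candidate key still needs its real-world meaning.

```python
from itertools import combinations

def unique_in_instance(rows: list, attrs: tuple) -> bool:
    """True if no two rows in this instance share values for attrs.
    This can disprove a key, but never prove one."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys_in_instance(rows: list, attributes: tuple) -> list:
    """Minimal attribute sets that are unique in this instance (uniqueness + irreducibility).
    Each result is only a *possible* candidate key; semantics must confirm it."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Irreducibility: skip supersets of sets already found.
            if any(set(k) <= set(combo) for k in found):
                continue
            if unique_in_instance(rows, combo):
                found.append(combo)
    return found

staff = [
    {"staffNo": "SG37", "lName": "Beech", "branchNo": "B003"},
    {"staffNo": "SG14", "lName": "Ford", "branchNo": "B003"},
    {"staffNo": "SL21", "lName": "White", "branchNo": "B005"},
]
print(candidate_keys_in_instance(staff, ("staffNo", "lName", "branchNo")))
# [('staffNo',), ('lName',)]: lName is unique here only by coincidence
print(unique_in_instance(staff, ("branchNo",)))  # False: a duplicate disproves it
```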
Composite key: A candidate key consisting of more than one attribute.
Primary key: The candidate key that is selected to identify tuples uniquely within the
relation.
Alternate keys: The candidate keys that are not selected to be the primary key.
Foreign key: An attribute, or set of attributes, within one relation that matches the candidate
key of some (possibly the same) relation.
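In SQL, these keys map onto table constraints. The sketch below uses Python's built-in `sqlite3` module with a hypothetical Branch/Staff/Viewing schema:
- `PRIMARY KEY` marks the chosen candidate key;
- `UNIQUE` declares an alternate key;
- a two-column `PRIMARY KEY` is a composite key;
- `REFERENCES` declares a foreign key.

SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

conn.executescript("""
CREATE TABLE Branch (
    branchNo   TEXT PRIMARY KEY,               -- primary key (chosen candidate key)
    postcode   TEXT NOT NULL UNIQUE            -- alternate key (another candidate key)
);
CREATE TABLE Staff (
    staffNo    TEXT PRIMARY KEY,
    branchNo   TEXT REFERENCES Branch(branchNo) -- foreign key
);
CREATE TABLE Viewing (
    clientNo   TEXT,
    propertyNo TEXT,
    viewDate   TEXT,
    PRIMARY KEY (clientNo, propertyNo)         -- composite key
);
""")

conn.execute("INSERT INTO Branch VALUES ('B003', 'G11 9QX')")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'B003')")  # accepted: B003 exists

def try_insert(sql: str) -> str:
    """Run an INSERT and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as e:
        return f"rejected: {e}"

print(try_insert("INSERT INTO Staff VALUES ('SG99', 'B999')"))      # rejected (no branch B999)
print(try_insert("INSERT INTO Branch VALUES ('B005', 'G11 9QX')"))  # rejected (postcode not unique)
```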
Representing Relational Database Schemas

A relational database consists of any number of normalized relations.
The conceptual model, or conceptual schema, is the set of all relation schemas of the database.
Object-Oriented Data Model

There is no abstract, formally defined object data model. Thus there is some confusion over levels of
abstraction and some definitions are not universally agreed on.
OODB A persistent and sharable collection of objects defined by an OODM.
OODM A (logical) data model that captures the semantics of objects supported in object-oriented programming.
OODBMS The manager of an OODB.
These definitions are very nondescriptive and tend to reflect the fact that there is no one object-oriented data model equivalent to the underlying data model of relational systems.
One candidate is the object model proposed by the Object Data Management Group (ODMG), which many vendors
intend to support. The ODMG object model is important because it specifies a standard model for
the semantics of database objects and supports interoperability between compliant OODBMSs.
The underlying concept behind object technology is that all software should be constructed out of
standard, reusable components wherever possible.
The concepts of object-oriented data models are drawn from different areas.
Basic OO Concepts
Objects
Anything that can be modelled. Objects are instances of some abstraction. Think of
real-life objects, like books, aircraft or even dogs. Real-world objects share two
characteristics: they all have state and behaviour.
In the object-oriented data model, state is defined by the values an object has for a set of
properties. A property may be either an attribute of the object or a relationship
between the object and one or more other objects.
Behaviour is defined by a set of methods or operations performed by or on the object.
An object is an entity that can be uniquely identified and which contains both the
attributes defining its state and the behaviour (operations) associated with it.
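A minimal Python sketch of an object with state, behaviour, and identity, using a hypothetical `Book` class. The attribute values form the object's state and its methods form its behaviour. Two objects with equal state are still distinct objects, which is one answer to the question that follows.

```python
class Book:
    def __init__(self, isbn: str, title: str, copies: int):
        # State: values for a set of properties.
        self.isbn = isbn
        self.title = title
        self.copies = copies

    # Behaviour: operations performed by or on the object.
    def lend(self) -> None:
        if self.copies == 0:
            raise ValueError("no copies left")
        self.copies -= 1

    def give_back(self) -> None:
        self.copies += 1

a = Book("0-000-00000-0", "Database Systems", 2)  # placeholder ISBN
b = Book("0-000-00000-0", "Database Systems", 2)
a.lend()
print(a.copies, b.copies)  # 1 2
# Identity: a and b started with identical state but are different objects,
# whereas a relation cannot hold two identical tuples.
print(a is b)              # False
```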
What's the difference between an object and a relation?
Encapsulation and information hiding

The concept of encapsulation means that an object contains both a data structure and
the set of operations that can be used to manipulate it. The concept of
information hiding means that the external aspects of an object are separated from its
internal details, which are hidden from the outside world.
Encapsulation therefore provides a form of (physical) data independence: the internal
representation of an object can change without affecting the applications that use it,
provided its external interface stays the same.
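Python has no enforced access control. The sketch below therefore relies on the usual convention of a leading underscore, plus a read-only property. It is an illustration of the idea using a hypothetical `Account` class. Callers use only `deposit` and `balance`, so the internal representation could change without affecting them. Here that representation is a list of transactions rather than a stored total.

```python
class Account:
    def __init__(self) -> None:
        # Internal detail (hidden by convention): a log of transactions.
        self._transactions = []

    # External interface: the only operations callers are meant to use.
    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._transactions.append(amount)

    @property
    def balance(self) -> int:
        # Derived on demand; could be replaced by a stored total
        # without changing any caller.
        return sum(self._transactions)

acc = Account()
acc.deposit(50)
acc.deposit(25)
print(acc.balance)  # 75
```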
Classes
A class can be conceptualised as a template/specification for creating objects.
A class describes the rules by which objects behave; these objects are referred to as
instances of that class.
A class specifies the structure of data which each instance contains as well as the
methods (functions) which manipulate the data of the object;
A method is a function with a special property that has access to data stored in an
object.
One of the benefits of programming with classes is that all instances of a particular
class will follow the defined behaviour of the class they instantiate.
Classes are often related in some way. The most popular of these relations is
inheritance.
Inheritance
Inheritance is a way to form new classes using classes that have already been defined.
Inheritance is intended to help reuse of existing code with little or no modification.
Inheritance is also called generalisation.

For instance, a fruit is a generalisation of apple, orange, mango and many others. We
say that fruit is an abstraction of apple, orange, etc. Conversely, we can say that since
apples are fruit (i.e. an apple is a fruit) then they inherit all the properties common to
all fruit, such as being a fleshy container for the seed of a plant.
Subclasses usually make several kinds of modification to the base class, such as the
addition of new instance variables, the addition of new methods, and the overriding of
existing methods to support the new instance variables.
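The fruit example can be sketched in Python; the class and method names are hypothetical. `Apple` inherits everything common to `Fruit` and makes the three kinds of modification described above:
- it adds a new instance variable (`variety`);
- it adds a new method (`core`);
- it overrides an existing method (`describe`).

```python
class Fruit:
    def __init__(self, name: str):
        self.name = name

    def describe(self) -> str:
        # Property common to all fruit.
        return f"{self.name}: a fleshy container for the seed of a plant"

class Apple(Fruit):
    def __init__(self, variety: str):
        super().__init__("apple")
        self.variety = variety           # new instance variable

    def core(self) -> str:               # new method
        return f"removing the core of a {self.variety}"

    def describe(self) -> str:           # overridden method uses the new variable
        return f"{super().describe()} ({self.variety})"

apple = Apple("Granny Smith")
print(isinstance(apple, Fruit))  # True: an apple is a fruit
print(apple.describe())          # apple: a fleshy container for the seed of a plant (Granny Smith)
print(apple.core())              # removing the core of a Granny Smith
```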
Storing objects in a relational database [for example, to persist the objects of an
object-oriented program]
RDBMSs can be used with OOP languages.
They can serve as the underlying storage engine. This requires mapping class instances
(objects) to one or more tuples distributed over one or more relations [which implies
embedding a query language such as SQL into an OO programming language], but
this can be problematic. The problems with using two different language paradigms
have been collectively called the impedance mismatch between the application
programming language and the database query language. It has been claimed that as
much as 30% of programming effort and code space is devoted to converting data from
database or file formats [of the DB] into and out of program-internal formats [of the
OOP]. The integration of persistence into the programming language frees the
programmer from this responsibility.
The research into persistent programming languages [or to use the more encompassing
term, Persistent Application Systems] has had a significant influence on the
development of OODBMSs.
Object-oriented Database Management Systems

Traditionally, software engineering and database management have existed as separate
disciplines. Database management has focused on the static aspects of information storage,
while software engineering has concentrated on modelling the dynamic aspects of
software. These two disciplines have been combined with the arrival of the third generation
of DBMSs, i.e. the Object-Oriented Database Management System (OODBMS) and the
Object-Relational Database Management System (ORDBMS). These new-generation DBMSs
allow concurrent modelling of both data and processes acting upon data.
When you integrate database capabilities with object programming language capabilities, the
result is an OODBMS.
An OODBMS makes database objects appear as programming language objects in one
or more object programming languages.
An OODBMS extends the language with transparently persistent data (hence they are
also called persistent application systems), concurrency control, data recovery,
associative queries, and other capabilities.
Object-oriented database design requires the database schema to include both a
description of the object data structure and constraints, and the object behaviour.
The relational model versus object-oriented Data Model
It is sometimes argued that the relational model is inadequate for certain complex long-duration transactions (mostly involving engineering experiments in CAD systems).
The traditionalists believe that it is sufficient to extend the relational model with
additional (object-oriented) capabilities. Others believe that an underlying relational
model is inadequate to handle complex applications, such as computer-aided design,
computer-aided software engineering, and geographic information systems.
Revolutionary approach: move away from the traditional relational data model and
integrate object-oriented concepts with database systems.
Evolutionary approach: extend the relational model to integrate object-oriented concepts
with database systems.
RDBMSs have their failings, particularly their limited modeling capabilities. Attempts to
address this problem include Chen's Entity-Relationship model, presented in 1976 and
now a widely accepted technique for database design, and E. F. Codd's 1979 reworking of the
relational model (as RM/T).
Semantic data modeling: a classification of data modelling that represents the real world
more closely. Functional data model (FDM) is one of the simplest in the family of semantic
data models.
In response to the increasing complexity of database applications, two new data models have
emerged: the Object-Oriented Data Model (OODM) and the Object-Relational Data Model
(ORDM), previously referred to as the Extended Relational Data Model (ERDM). However,
unlike previous models, the actual composition of these models is not clear.
There is currently considerable debate between the OODBMS proponents and the relational
supporters, which resembles the network/relational debate of the 1970s. Both sides agree
that traditional RDBMSs are inadequate for certain types of application. However, the
two sides differ on the best solution. The OODBMS proponents claim that RDBMSs are
satisfactory for standard business applications but lack the capability to support more
complex applications. The relational supporters claim that relational technology is a
necessary part of any real DBMS and that complex applications can be handled by
extensions to the relational model.
The proponents of the object-oriented model claim that it represents the problem domain
more closely by mapping abstractions of real world entities as objects/ classes.
At present, relational/object-relational DBMSs form the dominant system and object-oriented DBMSs have their own particular niche in the marketplace.
If OODBMSs are to become dominant, they must change their image from being systems
solely for complex applications to being systems that can also accommodate standard
business applications with the same tools and the same ease of use as their relational
counterparts. In particular, they must support a declarative query language compatible with
SQL.
The object-oriented data model is thought of as particularly suitable for handling complex
applications, such as computer-aided design, computer-aided software engineering, network
management systems, multimedia systems, digital publishing and geographic information
systems. This model however suffers from problems too.
Advantages of OO Models
Enriched modeling capabilities
Extensibility
Removal of impedance mismatch
More expressive query language
Support for schema evolution
Support for long-duration transactions
Applicability to advanced database applications
Improved performance
Disadvantages of OO Models
Lack of universal data model
Lack of experience
Lack of standards
Competition
Query optimization compromises encapsulation
Locking at object level may impact performance
Complexity
Lack of support for views
Lack of support for security
Persistent Programming Languages

This is a language that provides its users with the ability to (transparently) preserve data across
successive executions of a program and even allows such data to be used by many different
programs.
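Python is not a persistent programming language, but its standard `shelve` module gives a rough feel for the idea. Objects stored under a key in one run of a program can be read back, unchanged, in a later run. The sketch below simulates two "executions" as two functions that share a temporary file; the file name and the stored project data are hypothetical. Unlike a full persistent language, persistence here is not transparent: the program must still open the shelf and assign objects to it explicitly.

```python
import os
import shelve
import tempfile

def first_run(path: str) -> None:
    # First "execution": store a Python object under a key.
    with shelve.open(path) as db:
        db["p1"] = {"name": "Bridge CAD", "members": ["Ann", "David"]}

def second_run(path: str) -> dict:
    # A later "execution": the object is still there and is read back.
    with shelve.open(path) as db:
        return db["p1"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store")  # hypothetical file name
    first_run(path)
    restored = second_run(path)

print(restored)  # {'name': 'Bridge CAD', 'members': ['Ann', 'David']}
```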
Database programming language: a language that integrates some ideas from the database
programming model with traditional programming language features. E.g. Object SQL.
In contrast, a database programming language is distinguished from a persistent programming
language by its incorporation of features beyond persistence, such as transaction management,
concurrency control, and recovery.
Alternative Strategies for Developing an OODBMS
1. Extend an existing object-oriented programming language with database capabilities.
2. Provide extensible object-oriented DBMS libraries.
3. Embed object-oriented database language constructs in a conventional host language.
4. Extend an existing database language with object-oriented capabilities. Owing to the
widespread acceptance of SQL, vendors are extending it to provide object-oriented
constructs. This approach is being pursued by both RDBMS and OODBMS vendors. The
1999 release of the SQL standard, SQL:1999, supports object-oriented features.
5. Develop a novel database data model/data language. This is a radical approach that starts
from the beginning and develops an entirely new database language and DBMS with object-oriented capabilities.
Issues In OODBMS
Three areas that are problematic for relational DBMSs and that are addressed in OODBMSs:
Long-duration transactions:
To support long-duration transactions, we need to use different protocols from those used for
traditional database applications, in which transactions are typically of a very short duration.
Versions:
In databases that store designs, it is necessary to keep track of the evolution of
design objects and the changes made to a design by various transactions.
The process of maintaining the evolution of objects is known as version management. An
object version represents an identifiable state of an object; a version history represents the
evolution of an object.
Three types of version:
Transient versions
Working versions
Released versions
Owing to the performance and storage overhead in supporting versions, some vendors
(e.g. Itasca) require that the application indicate whether a class is versionable.
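Version management can be sketched as an object that keeps its own version history. The Python below is a hypothetical illustration, not any vendor's API. Each version records a state, a status (transient, working, or released), and the version it was derived from. The lifecycle is deliberately simplified, and this simplification is an assumption of the sketch: only a transient version may be modified, and versions are promoted to working and then to released.

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified lifecycle assumed for this sketch.
STATUSES = ("transient", "working", "released")

@dataclass
class Version:
    number: int
    state: dict
    status: str = "transient"
    derived_from: Optional[int] = None

@dataclass
class DesignObject:
    name: str
    history: list = field(default_factory=list)  # the version history

    def new_version(self, state: dict, derived_from: Optional[int] = None) -> Version:
        # Copy the state so versions never share a mutable dict.
        v = Version(len(self.history) + 1, dict(state), derived_from=derived_from)
        self.history.append(v)
        return v

    def update(self, number: int, **changes) -> None:
        v = self.history[number - 1]
        if v.status != "transient":  # assumption: only transient versions may change
            raise PermissionError(f"version {number} is {v.status}")
        v.state.update(changes)

    def promote(self, number: int) -> None:
        # transient -> working -> released; a released version stays released.
        v = self.history[number - 1]
        i = STATUSES.index(v.status)
        if i + 1 < len(STATUSES):
            v.status = STATUSES[i + 1]

wing = DesignObject("wing")
v1 = wing.new_version({"span_m": 30})
wing.update(1, span_m=32)
wing.promote(1)  # transient -> working
wing.promote(1)  # working -> released
v2 = wing.new_version(v1.state, derived_from=1)
wing.update(2, span_m=34)
print([(v.number, v.status, v.state["span_m"]) for v in wing.history])
# [(1, 'released', 32), (2, 'transient', 34)]
```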
Schema evolution:
Design is an incremental process and evolves with time. To support this process, applications
require considerable flexibility in dynamically defining and modifying the database schema.
Typical changes to the schema include (Banerjee et al., 1987b): (1) Changes to the class
definition: (a) modifying attributes; (b) modifying methods. (2) Changes to the inheritance
hierarchy: (a) making a class S the superclass of a class C; (b) removing a class S from the list
of superclasses of C; (c) modifying the order of the superclasses of C. (3) Changes to the set
of classes, such as creating and deleting classes and modifying class names.
The changes proposed to a schema must not leave the schema in an inconsistent state. Some
vendors (e.g. Itasca and GemStone) define rules for schema consistency, called schema
invariants, which must be complied with as the schema is modified.
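Schema invariants can be pictured as checks that run before a schema change is accepted. The Python sketch below is hypothetical. It checks just one plausible invariant, that the inheritance hierarchy stays acyclic, when applying change (2a): making a class S a superclass of a class C. Real systems such as those named above define their own, larger sets of rules.

```python
def reachable(supers: dict, start: str, target: str) -> bool:
    """True if target is start itself or an (indirect) superclass of start."""
    stack, seen = [start], set()
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c in seen:
            continue
        seen.add(c)
        stack.extend(supers.get(c, []))
    return False

def add_superclass(supers: dict, c: str, s: str) -> dict:
    """Schema change (2a): make S a superclass of C, rejecting it if it would create a cycle."""
    if c not in supers or s not in supers:
        raise KeyError("both classes must exist in the schema")
    if reachable(supers, s, c):  # C is S itself or already an ancestor of S
        raise ValueError(f"making {s} a superclass of {c} would create a cycle")
    new = {k: list(v) for k, v in supers.items()}  # leave the old schema untouched
    new[c].append(s)
    return new

# Each class maps to its list of direct superclasses (hypothetical schema).
schema = {"Part": [], "CompositePart": ["Part"], "AtomicPart": ["Part"], "Document": []}
schema = add_superclass(schema, "AtomicPart", "Document")
print(schema["AtomicPart"])  # ['Part', 'Document']
try:
    add_superclass(schema, "Part", "CompositePart")
except ValueError as e:
    print(e)  # making CompositePart a superclass of Part would create a cycle
```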
Architecture [For OODBMS]

How can the client-server architecture best be applied to the OODBMS environment, and where should
methods be stored?
Object server. This approach attempts to distribute the processing between the two
components. This is the best architecture for cooperative, object-to-object processing in an
open, distributed environment.
Page server. Most of the database processing is performed by the client. The server is
responsible for secondary storage and providing pages at the client's request.
Database server. Most of the database processing is performed by the server. The client
simply passes requests to the server, receives results, and passes them on to the application.
This is the approach taken by many RDBMSs.
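The difference between a page server and a database server lies in where the query work happens. The toy Python sketch below uses hypothetical classes and no networking, and it counts the records that cross the client/server boundary:
- the page server ships whole pages and the client filters them;
- the database server evaluates the request and ships only the result.

```python
PAGE_SIZE = 4
RECORDS = [{"id": i, "city": "London" if i % 3 == 0 else "Glasgow"} for i in range(12)]
PAGES = [RECORDS[i:i + PAGE_SIZE] for i in range(0, len(RECORDS), PAGE_SIZE)]

class PageServer:
    """Server only manages secondary storage and returns pages on request."""
    def __init__(self):
        self.shipped = 0  # records sent to the client

    def get_page(self, n: int) -> list:
        self.shipped += len(PAGES[n])
        return PAGES[n]

class DatabaseServer:
    """Server evaluates the request and returns only the matching records."""
    def __init__(self):
        self.shipped = 0

    def query(self, city: str) -> list:
        result = [r for page in PAGES for r in page if r["city"] == city]
        self.shipped += len(result)
        return result

# Page server: the client fetches every page and does the filtering itself.
ps = PageServer()
client_side = [r for n in range(len(PAGES)) for r in ps.get_page(n) if r["city"] == "London"]

# Database server: the client just sends the request.
ds = DatabaseServer()
server_side = ds.query("London")

print(client_side == server_side)  # True: same answer
print(ps.shipped, ds.shipped)      # 12 4
```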
In each case, the server resides on the same machine as the physical database; the client may reside
on the same or different machine.
Storing and executing methods
There are two approaches to handling methods: store the methods in external files and store the
methods in the database.
The second approach offers several benefits:
It eliminates redundant code.
It simplifies modifications.
Methods are more secure.
Methods can be shared concurrently.
Improved integrity.
Benchmarking
Over the years, various database benchmarks have been developed as a tool for comparing the
performance of DBMSs.
Wisconsin benchmark
Owing to the importance of accurate benchmarking information, a consortium of
manufacturers formed the Transaction Processing Performance Council (TPC) in 1988.
TPC-A and TPC-B benchmarks
TPC-C benchmark
TPC-H, for ad hoc decision support environments
TPC-R
TPC-W
OO1 benchmark: The Object Operations Version 1 (OO1) benchmark is intended as a
generic measure of OODBMS performance.
OO7 benchmark
In 1993, the University of Wisconsin released the OO7 benchmark, based on a more
comprehensive set of tests and a more complex database. OO7 was designed for
detailed comparisons of OODBMS products (Carey et al., 1993).
Object-Oriented Analysis and Design with UML

UML represents a unification and evolution of several object-oriented analysis and design methods
that appeared in the late 1980s and early 1990s.
UML is commonly defined as a standard language for specifying, constructing, visualizing, and
documenting the artifacts of a software system. Analogous to the use of architectural blueprints in
the construction industry, the UML provides a common language for describing software models.
The primary goals in the design of the UML were to:
Provide users with a ready-to-use, expressive visual modeling language so they can develop
and exchange meaningful models.
Provide extensibility and specialization mechanisms to extend the core concepts. For
example, the UML provides stereotypes, which allow new elements to be defined by
extending and refining the semantics of existing elements. A stereotype is enclosed in double
angle brackets (<< ... >>).
Be independent of particular programming languages and development processes.
Provide a formal basis for understanding the modeling language.
Encourage the growth of the object-oriented tools market.
Support higher-level development concepts such as collaborations, frameworks, patterns,
and components.
Integrate best practices.
UML Diagrams
The main ones can be divided into the following two categories:
Structural diagrams, which describe the static relationships between components. These
include:
class diagrams,
object diagrams,
component diagrams,
deployment diagrams.
Behavioral diagrams, which describe the dynamic relationships between components. These
include:
use case diagrams,
sequence diagrams,
collaboration diagrams,
statechart diagrams,
activity diagrams.