Learning objectives
Page 1 of 14
311098918.doc
Data Models
How is the logical structure of data described, and how is data manipulated at the
logical level? The answer is that data is modelled at the logical (abstract) level by means of,
or within, a modelling theory.
A schema is written using a DDL. Unfortunately, this type of language is too low level to
describe the data requirements of an organization in a way that is readily understandable by
a variety of users. Hence the need for a higher-level description of the schema: a data model.
A Data Model is an integrated collection of concepts for describing and manipulating data
[from the real world], relationships [associations] between data [real-world
objects/events], and constraints on the data in an organization.
The purpose of a data model is to represent data and to make the data understandable. If it
does this, then it can be easily used to design a database.
2. a conceptual data model, to represent the logical (or community) view that is
DBMS-independent;
3. an internal data model, to represent the conceptual schema in such a way that it
can be understood by the DBMS.
Data Models fall into three broad categories: object-based, record-based, and physical data models. The
first two are used to describe data at the conceptual and external levels, the third is used to describe
data at the internal level.
Object-based Data Models
Some of the more common types of object-based data model are: Entity-Relationship (ER),
Semantic, Functional, and Object-oriented.
Record-based Data Models
There are three principal types: the relational data model, the network data model, and the
hierarchical data model. The majority of modern commercial systems are based on the relational
paradigm, whereas the early database systems were based on either the network or hierarchical
data models. The latter two models require the user to have knowledge of the physical database being
accessed, whereas the former provides a substantial amount of data independence. Hence, relational
systems adopt a declarative approach to database processing (that is, they specify what data is to
be retrieved), but network and hierarchical systems adopt a navigational approach (that is, they
specify how the data is to be retrieved).
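The contrast between the declarative and navigational approaches can be sketched in Python. This is only an illustration: the staff data, the table layout, and both function names are assumptions made for the example, and SQLite stands in for a relational DBMS.

```python
import sqlite3

# Hypothetical sample data: (staff_no, name, branch_no)
STAFF = [
    ("S1", "Ann", "B1"),
    ("S2", "Bob", "B2"),
    ("S3", "Cy", "B1"),
]

def declarative_names(branch_no):
    """State WHAT is wanted; the DBMS decides how to find it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (staff_no TEXT, name TEXT, branch_no TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", STAFF)
    rows = conn.execute(
        "SELECT name FROM staff WHERE branch_no = ? ORDER BY name", (branch_no,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

def navigational_names(branch_no):
    """Spell out HOW to reach the data: walk the records one at a time."""
    names = []
    position = 0                      # explicit cursor into the record list
    while position < len(STAFF):
        record = STAFF[position]
        if record[2] == branch_no:    # the program must know the field layout
            names.append(record[1])
        position += 1
    return sorted(names)
```

Both functions return the same names, but only the second depends on how the records are laid out and ordered.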
Physical Data Models
Physical data models describe how data is stored in the computer, representing information such
as record structures, record orderings, and access paths. The most common ones are the unifying
model and the frame memory.
Conceptual (or logical) Modelling
The conceptual schema is the heart of the database. It supports all the external views and is, in turn,
supported by the internal schema. However, the internal schema is merely the physical implementation of
the conceptual schema. The book talks about Conceptual and Logical Modelling.
A good understanding of the theoretical model ensures a quick understanding of any of its
implementations.
The relational theory is the most commonly used data modelling theory for commercial
database systems.
Object-Oriented Database Management Systems (OODBMSs) were predicted by some to
soon become the primary database technology. The argument was that RDBMSs were not
designed to handle the type of multimedia data frequently found on the internet. However,
so far none of these predictions has come to pass. OODBMSs also face competition
from Object-Relational Database Management Systems (ORDBMSs) [an earlier term
was Extended-Relational DBMSs], which have added object capabilities to relational
databases.
Although not as popular as predicted, OODBMSs have cornered some niche markets.
The relational model is a theory in which all data is modelled as relations: no pointers, no
records and no other data structures; only relations.
A relational DBMS requires only that the database be perceived by the user as tables.
RDBMS is the dominant data-processing software in use today, with an estimated total
software revenue worldwide of US$24 billion in 2011 and estimated to grow to about US$37
billion by 2016.
Owing to the popularity of the relational model, many nonrelational systems now provide a
relational user interface, irrespective of the underlying model.
The relational model, like any data model, has three parts:
a set of concepts (i.e. relational data objects) by means of which data (corresponding to a
real-life system) is modelled (structural part);
a set of operators by means of which the objects of the model are manipulated (manipulative
part);
a set of rules, which specify how the concepts and operators are allowed to be put together
(set of integrity constraints).
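The structural, manipulative, and integrity parts can be illustrated with a small Python sketch. It is a toy under stated assumptions, not a definition of the relational model: the `STAFF` relation, its heading, and the helper names `select`, `project`, and `satisfies_key` are all invented for the example.

```python
# Structural part: a relation is a set of tuples over a fixed heading.
HEADING = ("staff_no", "name", "branch_no")   # hypothetical attributes
STAFF = {
    ("S1", "Ann", "B1"),
    ("S2", "Bob", "B2"),
    ("S3", "Cy", "B1"),
}

# Manipulative part: operators take relations and return relations.
def select(relation, predicate):
    """Keep the tuples for which predicate(row_as_dict) is true."""
    return {t for t in relation if predicate(dict(zip(HEADING, t)))}

def project(relation, attributes):
    """Keep only the named attributes; duplicates vanish because the result is a set."""
    idx = [HEADING.index(a) for a in attributes]
    return {tuple(t[i] for i in idx) for t in relation}

# Integrity part: a rule that every legal state of the relation must satisfy.
def satisfies_key(relation, attribute):
    """True if no two tuples share a value for the given attribute."""
    i = HEADING.index(attribute)
    return len({t[i] for t in relation}) == len(relation)
```

Because the relation is a Python set, it has no tuple order and no duplicate tuples, which matches the properties of relations listed later in these notes.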
How an entity (e.g. person) is modelled (i.e. what data is kept and in what format) depends on the
information requirements of a particular organisation.
It is not sufficient to devise a good theoretical model; eventually, the data model has to be
implemented. A software system that supports the implementation of the relational theory is a
relational DBMS. The implementation of an abstract data model in a DBMS is a database
system.
NOTE: None of the extant commercial DBMSs fully implement a theory. They usually
impose certain restrictions and, accordingly, implement only a part of the theory. Therefore,
certain abstract data models would have to be adjusted before they could be implemented.
In the earlier, first-generation systems, complex programs had to be written to answer even
simple queries based on navigational record-oriented access.
RDBMS software represents the second generation of DBMSs and is based on the relational data
model proposed by E. F. Codd (1970).
The relational model's objectives were specified as follows:
To allow a high degree of data independence. Application programs must not be affected by
modifications to the internal data representation, particularly by changes to file
organizations, record orderings, or access paths.
To provide substantial grounds for dealing with data semantics, consistency, and
redundancy problems. In particular, Codd's paper introduced the concept of normalized
relations, that is, relations that have no repeating groups.
Late 1970s - the prototype relational DBMS System R was developed at IBM's San José
Research Laboratory in California. The System R project led to much research and many
prototypes. Among its major developments was the production of various commercial
relational DBMS products during the late 1970s and the 1980s: for example, DB2 and
SQL/DS from IBM and Oracle from Oracle Corporation.
Late 1970s - the INGRES (Interactive Graphics Retrieval System) project at the University of
California at Berkeley.
1976 - The third project was the Peterlee Relational Test Vehicle at the IBM UK Scientific
Centre in Peterlee.
Terminology
Domain: A domain is the set of allowable values for one or more attributes.
Domains are an extremely powerful feature of the relational model. Every attribute in a
relation is defined on a domain. Domains may be distinct for each attribute, or two or more
attributes may be defined on the same domain.
The domain concept is important, because it allows the user to define in a central place the
meaning and source of values that attributes can hold.
However, a complete implementation of domains is not straightforward, and as a result, many
RDBMSs do not support them fully.
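The idea of defining allowable values once and sharing them among attributes can be sketched as follows. The domain names, their rules, and the attribute names are hypothetical, chosen only to show two attributes defined on the same domain.

```python
# Hypothetical domains: each has a name and a membership test, defined in one place.
DOMAINS = {
    "BranchNumber": lambda v: (
        isinstance(v, str) and len(v) == 4 and v[0] == "B" and v[1:].isdigit()
    ),
    "Salary": lambda v: isinstance(v, int) and 0 <= v <= 100_000,
}

# Every attribute is defined on a domain; two attributes may share one.
ATTRIBUTE_DOMAIN = {
    "branch_no": "BranchNumber",
    "manager_branch_no": "BranchNumber",
    "salary": "Salary",
}

def violations(row):
    """Return the sorted names of attributes whose values fall outside their domain."""
    return sorted(
        attr for attr, value in row.items()
        if not DOMAINS[ATTRIBUTE_DOMAIN[attr]](value)
    )
```

Changing the rule for `BranchNumber` in `DOMAINS` changes it for every attribute defined on that domain, which is the "central place" benefit described above.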
A relation with only one attribute would have degree one and be called a unary relation or
one-tuple. A relation with two attributes is called binary, one with three attributes is called
ternary, and after that the term n-ary is usually used. The degree of a relation is a property
of the intension of the relation.
Alternate Terminology
FORMAL TERMS    ALTERNATIVE 1    ALTERNATIVE 2
Relation        Table            File
Tuple           Row              Record
Attribute       Column           Field
Database Relations
Relation schema: A named relation defined by a set of attribute and domain name pairs. In
the same way that a relation has a schema, so too does the relational database.
Relational database schema: A set of relation schemas, each with a distinct name.
Properties of A Relation
A relation has the following properties:
the relation has a name that is distinct from all other relation names in the relational schema;
each cell of the relation contains exactly one atomic (single) value;
the order of tuples has no significance, theoretically. (However, in practice, the order may
affect the efficiency of accessing tuples.)
Because each cell should contain only one value, it is illegal to store two postcodes for a
single branch office in a single cell. In other words, relations do not contain repeating groups.
A relation that satisfies this property is said to be normalized or in first normal form.
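As a rough illustration of removing a repeating group, the sketch below flattens a cell that holds several postcodes into one tuple per postcode. The branch numbers and postcodes are made-up sample values, and `to_first_normal_form` is a name chosen for this example.

```python
# Unnormalized: one cell holds a repeating group of postcodes (hypothetical data).
unnormalized = [
    ("B1", ["AB1 2CD", "EF3 4GH"]),
    ("B2", ["IJ5 6KL"]),
]

def to_first_normal_form(rows):
    """Emit one (branch_no, postcode) tuple per postcode so every cell is atomic."""
    return {
        (branch_no, postcode)
        for branch_no, postcodes in rows
        for postcode in postcodes
    }
```

After the conversion, each cell contains exactly one value; the branch number is repeated across tuples instead of the postcodes being repeated within a cell.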
Provided that an attribute name is moved along with the attribute values, we can
interchange columns. (Similarly, tuples can be interchanged.)
Most of the properties specified for relations result from the properties of mathematical
relations.
Relational Keys
Superkey: An attribute, or set of attributes, that uniquely identifies a tuple within a relation.
Candidate key: A superkey such that no proper subset is a superkey within the relation.
A candidate key K for a relation R has two properties: Uniqueness. In each tuple of R, the
values of K uniquely identify that tuple. Irreducibility. No proper subset of K has the
uniqueness property.
Note that an instance of a relation cannot be used to prove that an attribute or combination of
attributes is a candidate key. The fact that there are no duplicates for the values that appear
at a particular moment in time does not guarantee that duplicates are not possible. However,
the presence of duplicates in an instance can be used to show that some attribute
combination is not a candidate key. Identifying a candidate key requires that we know the
real-world meaning of the attribute(s) involved so that we can decide whether duplicates
are possible. Only by using this semantic information can we be certain that an attribute
combination is a candidate key.
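The point that an instance can refute a candidate key but never prove one can be seen in a small sketch. The rows and the function names are assumptions for illustration; the search only reports attribute sets that happen to be unique and irreducible in the rows it is given.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def possible_candidate_keys(rows):
    """Attribute sets that are unique in THIS instance and irreducible.

    The result is only a list of suspects: it cannot confirm a key,
    because a later insert may introduce duplicates.
    """
    attrs = sorted(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(key) <= set(combo) for key in found):
                continue  # contains a smaller unique set, so it is reducible
            if is_unique(rows, combo):
                found.append(combo)
    return found

# Hypothetical instance in which no two staff happen to share a name.
rows = [
    {"staff_no": "S1", "name": "Ann", "branch_no": "B1"},
    {"staff_no": "S2", "name": "Bob", "branch_no": "B1"},
]
```

With these two rows, `name` looks like a candidate key. Adding a second member of staff called Ann removes it from the result, which is why the real-world meaning of the attributes has to decide the matter.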
Primary key: The candidate key that is selected to identify tuples uniquely within the
relation.
Alternate keys: the candidate keys that are not selected to be the primary key.
Foreign key: An attribute, or set of attributes, within one relation that matches the candidate
key of some (possibly the same) relation.
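A foreign key can be tried out with Python's built-in `sqlite3` module. The `branch` and `staff` tables and the helper `try_insert_staff` are hypothetical; the sketch assumes an SQLite build with foreign key support, which must be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT)")
conn.execute(
    "CREATE TABLE staff ("
    " staff_no TEXT PRIMARY KEY,"
    " branch_no TEXT REFERENCES branch(branch_no))"  # foreign key to branch
)
conn.execute("INSERT INTO branch VALUES ('B1', 'London')")
conn.execute("INSERT INTO staff VALUES ('S1', 'B1')")  # matches an existing branch

def try_insert_staff(staff_no, branch_no):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?)", (staff_no, branch_no))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `try_insert_staff("S3", "B9")` is rejected because no branch tuple has the key value `B9`.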
OODM: A (logical) data model that captures the semantics of objects supported in object-oriented programming.
Definitions of this kind are very nondescriptive and tend to reflect the fact that there is no one object-oriented data model equivalent to the underlying data model of relational systems.
One proposal is the object model of the Object Data Management Group (ODMG), which many vendors
intend to support. The ODMG object model is important, because it specifies a standard model for
the semantics of database objects and supports interoperability between compliant OODBMSs.
The underlying concept behind object technology is that all software should be constructed out of
standard, reusable components wherever possible.
The concepts of object-oriented data models are drawn from different areas.
Basic OO Concepts
Objects
Anything that can be modelled. Objects are instances of some abstraction. Think of
real-life objects, like books, aircraft or even dogs. Real-world objects share two
characteristics: they all have state and behaviour.
In the object-oriented data model, state is defined by the values an object has for a set of
properties. A property may be either an attribute of the object or a relationship
between the object and one or more other objects.
An object is an entity that can be uniquely identified and which contains both the
attributes defining its state and the behaviour (operations) associated with it.
The concept of encapsulation means that an object contains both a data structure and
the set of operations that can be used to manipulate it. The concept of
information hiding means that the external aspects of an object are separated from its
internal details, which are hidden from the outside world.
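Encapsulation and information hiding can be sketched with a small Python class. `Account` and its members are hypothetical names; Python hides internals only by convention (the leading underscore), so this is an approximation of the idea and not a strict enforcement of it.

```python
class Account:
    """Hypothetical object: state plus the operations allowed on that state."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance   # internal detail, hidden by convention

    def deposit(self, amount):
        """The only sanctioned way to change the balance."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):
        """Read-only external view of the internal state."""
        return self._balance
```

Callers see `deposit` and `balance` (the external aspects); how the balance is stored could change without affecting them.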
Classes
A class describes the rules by which objects behave; these objects are referred to as
instances of that class.
A class specifies the structure of data which each instance contains as well as the
methods (functions) which manipulate the data of the object;
A method is a function with a special property that has access to data stored in an
object.
One of the benefits of programming with classes is that all instances of a particular
class will follow the defined behaviour of the class they instantiate.
Classes are often related in some way. The most popular of these relations is
inheritance.
Inheritance
Inheritance is a way to form new classes using classes that have already been defined.
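A minimal Python sketch of classes, instances, methods, and inheritance follows. The `Staff` and `Manager` classes and their attributes are invented for the example.

```python
class Staff:
    """A class: specifies the data each instance holds and the methods that use it."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def annual_cost(self):
        return self.salary


class Manager(Staff):
    """A new class formed from an existing one: it inherits name, salary, annual_cost."""

    def __init__(self, name, salary, bonus):
        super().__init__(name, salary)   # reuse the superclass initialisation
        self.bonus = bonus

    def annual_cost(self):
        # Refine the inherited behaviour instead of rewriting it.
        return super().annual_cost() + self.bonus
```

Every `Manager` instance is also a `Staff` instance, so code written against `Staff` keeps working when handed a `Manager`.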
Storing objects in a relational database [to make relational databases object-capable,
possibly to work with OOPs]
Relational databases can serve as the underlying storage engine. This requires mapping
class instances (objects) to one or more tuples distributed over one or more relations
[which implies embedding a database query language such as SQL into an OO programming
language], but this can be problematic. The problems with using two different language
paradigms have been collectively called the impedance mismatch between the application
programming language and the database query language. It has been claimed that as
much as 30% of programming effort and code space is devoted to converting data from
database or file formats [of the DB] into and out of program-internal formats [of the
OOP]. The integration of persistence into the programming language frees the
programmer from this responsibility.
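The conversion work behind the impedance mismatch can be seen in a short sketch that maps one object onto tuples in two relations and back. The `Branch` class, the table layout, and the `open_db`, `save`, and `load` helpers are assumptions for illustration, with SQLite standing in for the relational DBMS.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Branch:
    """Program-internal format: one object holding a nested list."""
    branch_no: str
    city: str
    staff_names: list = field(default_factory=list)

def open_db():
    """Database format: two flat relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT)")
    conn.execute("CREATE TABLE staff (name TEXT, branch_no TEXT)")
    return conn

def save(conn, branch):
    """Take the object apart into tuples spread over two relations."""
    conn.execute("INSERT INTO branch VALUES (?, ?)", (branch.branch_no, branch.city))
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [(name, branch.branch_no) for name in branch.staff_names],
    )

def load(conn, branch_no):
    """Reassemble the object from its tuples."""
    (city,) = conn.execute(
        "SELECT city FROM branch WHERE branch_no = ?", (branch_no,)
    ).fetchone()
    names = [
        name for (name,) in conn.execute(
            "SELECT name FROM staff WHERE branch_no = ? ORDER BY name", (branch_no,)
        )
    ]
    return Branch(branch_no, city, names)
```

Neither `save` nor `load` does any application work; both exist only to translate between the two formats, which is the overhead the passage describes.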
The research into persistent programming languages [or to use the more encompassing
term, Persistent Application Systems] has had a significant influence on the
development of OODBMSs
When you integrate database capabilities with object programming language capabilities, the
result is an OODBMS.
An OODBMS extends the language with transparently persistent data (hence they are
also called persistent application systems), concurrency control, data recovery,
associative queries, and other capabilities.
It is sometimes argued that the relational model is inadequate for certain complex long-duration transactions (mostly involving engineering experiments in CAD systems).
The traditionalists believe that it is sufficient to extend the relational model with
additional (object-oriented) capabilities. Others believe that an underlying relational
model is inadequate to handle complex applications, such as computer-aided design,
computer-aided software engineering, and geographic information systems.
Revolutionary approach: moves away from the traditional relational data model to
integrate object-oriented concepts with database systems.
Evolutionary approach: extends the relational model to integrate object-oriented concepts
with database systems.
Semantic data modeling: a classification of data modelling that represents the real world
more closely. Functional data model (FDM) is one of the simplest in the family of semantic
data models.
In response to the increasing complexity of database applications, two new data models have
emerged: the Object-Oriented Data Model (OODM) and the Object-Relational Data Model
(ORDM), previously referred to as the Extended Relational Data Model (ERDM). However,
unlike previous models, the actual composition of these models is not clear.
There is currently considerable debate between the OODBMS proponents and the relational
supporters, which resembles the network/relational debate of the 1970s. Both sides agree
that traditional RDBMSs are inadequate for certain types of application. However, the
two sides differ on the best solution. The OODBMS proponents claim that RDBMSs are
satisfactory for standard business applications but lack the capability to support more
complex applications. The relational supporters claim that relational technology is a
necessary part of any real DBMS and that complex applications can be handled by
extensions to the relational model.
The proponents of the object-oriented model claim that it represents the problem domain
more closely by mapping abstractions of real world entities as objects/ classes.
At present, relational/object-relational DBMSs form the dominant system and object-oriented DBMSs have their own particular niche in the marketplace.
If OODBMSs are to become dominant, they must change their image from being systems
solely for complex applications to being systems that can also accommodate standard
business applications with the same tools and the same ease of use as their relational
counterparts. In particular, they must support a declarative query language compatible with
SQL.
The object-oriented data model is thought of as particularly suitable for handling complex
applications, such as computer-aided design, computer-aided software engineering, network
management systems, multimedia systems, digital publishing and geographic information
systems. This model, however, has problems of its own.
Advantages of OO Models
Extensibility
Removal of impedance mismatch
Support ...
Support ...
Applicability to ...
Improved performance
Disadvantages of OO Models
Lack of universal data model
Lack of experience
Lack of standards
Competition
Locking at object level may impact performance
Complexity
Lack of ...
Lack of ...
Long-duration transactions:
To support long-duration transactions we need to use different protocols from those used for
traditional database applications, in which transactions are typically of a very short duration.
Versions:
It is therefore necessary in databases that store designs to keep track of the evolution of
design objects and the changes made to a design by various transactions.
The process of maintaining the evolution of objects is known as version management. An
object version represents an identifiable state of an object; a version history represents the
evolution of an object.
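Version management can be sketched as an object that keeps every committed state. `VersionedObject` and its methods are hypothetical and greatly simplified: the sketch stores whole copies of each state and ignores the transient, working, and released distinctions.

```python
import copy

class VersionedObject:
    """Keeps each identifiable state of an object; the list is its version history."""

    def __init__(self, state):
        self._history = [copy.deepcopy(state)]     # version 0

    def commit(self, new_state):
        """Record a new version and return its version number."""
        self._history.append(copy.deepcopy(new_state))
        return len(self._history) - 1

    def version(self, number):
        """Return a copy of the state as it was at the given version."""
        return copy.deepcopy(self._history[number])

    @property
    def current(self):
        return self.version(-1)

    @property
    def version_count(self):
        return len(self._history)
```

Copies are taken on the way in and on the way out so that later changes by the caller cannot alter a recorded version.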
Transient versions
Working versions
Released versions
Owing to the performance and storage overhead in supporting versions, some vendors
(e.g. Itasca) require that the application indicate whether a class is versionable.
Schema evolution:
Design is an incremental process and evolves with time. To support this process, applications
require considerable flexibility in dynamically defining and modifying the database schema.
Typical changes to the schema include (Banerjee et al., 1987b): (1) Changes to the class
definition: (a) modifying attributes; (b) modifying methods. (2) Changes to the inheritance
hierarchy: (a) making a class S the superclass of a class C; (b) removing a class S from the list
of superclasses of C; (c) modifying the order of the superclasses of C. (3) Changes to the set
of classes, such as creating and deleting classes and modifying class names.
The changes proposed to a schema must not leave the schema in an inconsistent state. Some
vendors (e.g. Itasca and GemStone) define rules for schema consistency, called schema
invariants, which must be complied with as the schema is modified.
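A schema invariant check can be sketched for one kind of change, adding a superclass. The invariant used here (the inheritance hierarchy must contain no cycles) is an assumption chosen for illustration and is not taken from the Itasca or GemStone rules; the function names are likewise hypothetical.

```python
def has_cycle(superclasses):
    """superclasses maps a class name to the list of its direct superclasses."""
    visiting, done = set(), set()

    def visit(cls):
        if cls in done:
            return False
        if cls in visiting:
            return True               # reached a class that is its own ancestor
        visiting.add(cls)
        for parent in superclasses.get(cls, []):
            if visit(parent):
                return True
        visiting.discard(cls)
        done.add(cls)
        return False

    return any(visit(cls) for cls in list(superclasses))

def add_superclass(superclasses, cls, new_super):
    """Return a changed copy of the schema, or raise if the invariant would break."""
    candidate = {name: list(parents) for name, parents in superclasses.items()}
    candidate.setdefault(cls, []).append(new_super)
    candidate.setdefault(new_super, [])
    if has_cycle(candidate):
        raise ValueError(f"making {new_super} a superclass of {cls} creates a cycle")
    return candidate
```

The change is applied to a copy first, so a rejected modification never leaves the existing schema in an inconsistent state.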
Object server. This approach attempts to distribute the processing between the two
components. This is the best architecture for cooperative, object-to-object processing in an
open, distributed environment.
Page server. Most of the database processing is performed by the client. The server is
responsible for secondary storage and providing pages at the client's request.
Database server. Most of the database processing is performed by the server. The client
simply passes requests to the server, receives results, and passes them on to the application.
This is the approach taken by many RDBMSs.
In each case, the server resides on the same machine as the physical database; the client may reside
on the same or different machine.
Storing and executing methods
There are two approaches to handling methods: store the methods in external files or store the
methods in the database.
The second approach offers several benefits:
It simplifies modifications.
It improves integrity.
Benchmarking
Over the years, various database benchmarks have been developed as a tool for comparing the
performance of DBMSs.
Wisconsin benchmark
TPC-C benchmark
TPC-R
TPC-W
OO7 benchmark
In 1993, the University of Wisconsin released the OO7 benchmark, based on a more
comprehensive set of tests and a more complex database. OO7 was designed for
detailed comparisons of OODBMS products (Carey et al., 1993).
Provide users with a ready-to-use, expressive visual modeling language so they can develop
and exchange meaningful models.
Provide extensibility and specialization mechanisms to extend the core concepts. For
example, the UML provides stereotypes, which allow new elements to be defined by
extending and refining the semantics of existing elements. A stereotype is enclosed in double
angle brackets (<< ... >>).
UML Diagrams
The main ones can be divided into the following two categories:
Structural diagrams, which describe the static relationships between components. These
include:
class diagrams,
object diagrams,
component diagrams,
deployment diagrams.
Behavioral diagrams, which describe the dynamic relationships between components. These
include:
use case diagrams,
sequence diagrams,
collaboration diagrams,
statechart diagrams,
activity diagrams.