
Relational Data Model

One of the most important applications for computers is storing and managing
information. The manner in which information is organized can have a profound effect on how
easy it is to access and manage. Perhaps the simplest but most versatile way to organize
information is to store it in tables.

The relational model is centered on this idea: the organization of data into collections of
two-dimensional tables called “relations.” We can also think of the relational model as a
generalization of the set data model that we discussed in Chapter 7, extending binary relations to
relations of arbitrary arity. Originally, the relational data model was developed for databases
(that is, information stored over a long period of time in a computer system) and for
database management systems, the software that allows people to store, access, and modify this
information. Databases still provide us with important motivation for understanding the
relational data model. They are found today not only in their original, large-scale applications
such as airline reservation systems or banking systems, but also in desktop computers handling
individual activities such as maintaining expense records, homework grades, and many other
uses. Other kinds of software besides database systems can make good use of tables of
information as well, and the relational data model helps us design these tables and develop the
data structures that we need to access them efficiently. For example, such tables are used by
compilers to store information about the variables used in the program, keeping track of their
data type and of the functions for which they are defined.
Relations
Section 7.7 introduced the notion of a “relation” as a set of tuples. Each tuple of
a relation is a list of components, and each relation has a fixed arity, which is the
number of components each of its tuples has.
The columns of the table are given names, called attributes. In Fig. 8.1, the
attributes are Course, StudentId, and Grade.

The order in which the rows of a table are listed
has no significance, and we can rearrange the rows in any way without changing
the value of the table, just as we can rearrange the order of elements in a set
without changing the value of the set.
The order of the components in each row of a table is significant, since different
columns are named differently, and each component must represent an item of the
kind indicated by the header of its column. In the relational model, however, we
may permute the order of the columns along with the names of their headers and
keep the relation the same.
Each row in the table is called a tuple and represents a basic fact. The first
row, (CS101, 12345, A), represents the fact that the student with ID number 12345
got an A in the course CS101.
A table has two aspects:
1. The set of column names, and
2. The rows containing the information.
The term “relation” refers to the latter, that is, the set of rows. Each row represents
a tuple of the relation, and the order in which the rows appear in the table is
immaterial. No two rows of the same table may have identical values in all columns.
Item (1), the set of column names (attributes), is called the scheme of the
relation. The order in which the attributes appear in the scheme is immaterial, but
we need to know the correspondence between the attributes and the columns of
the table in order to write the tuples properly.
Databases
A collection of relations is called a database. The first thing we need to do when
designing a database for some application is to decide on how the information to
be stored should be arranged into tables. Design of a database, like all design
problems, is a matter of business needs and judgment. In an example to follow, we
shall expand our application of a registrar’s database involving courses, and thereby
expose some of the principles of good database design.
Some of the most powerful operations on a database involve the use of several
relations to represent coordinated types of data. By setting up appropriate data
structures, we can jump from one relation to another efficiently, and thus obtain
information from the database that we could not uncover from a single relation.
Queries on a Database
We saw in Chapter 7 some of the most important operations performed on relations
and functions; they were called insert, delete, and lookup, although their appropriate
meanings differed, depending on whether we were dealing with a dictionary, a
function, or a binary relation. There is a great variety of operations one can perform
on database relations, especially on combinations of two or more relations. The three
basic operations on a single relation generalize insert, delete, and lookup:
1. insert(t,R). We add the tuple t to the relation R, if it is not already there.
This operation is in the same spirit as insert for dictionaries or binary relations.
2. delete(X,R). Here, X is intended to be a specification of some tuples. It
consists of components for each of the attributes of R, and each component
can be either
a) A value, or
b) The symbol ∗, which means that any value is acceptable.
The effect of this operation is to delete all tuples that match the specification
X. For example, if we cancel CS101, we want to delete all tuples of the
Course-Day-Hour relation that have Course = “CS101.” We could express this
condition by
delete((“CS101”, ∗, ∗), Course-Day-Hour)
That operation would delete the first three tuples of the relation in Fig. 8.2(c),
because their first components each are the same value as the first component
of the specification, and their second and third components all match ∗, as any
values do.
3. lookup(X,R). The result of this operation is the set of tuples in R that match
the specification X; the latter is a symbolic tuple as described in the preceding
item (2). For example, if we wanted to know for what courses CS101 is a
prerequisite, we could ask
lookup((∗, “CS101”), Course-Prerequisite)
The result would be the set of two matching tuples
(CS120, CS101)
(CS205, CS101)
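The three operations can be sketched in Python as follows. This is a minimal illustration, assuming a relation is held as a set of tuples and a specification is a tuple in which a hypothetical marker object `ANY` stands for ∗. The days and hours in Course-Day-Hour and the third Course-Prerequisite tuple are invented for the example, not taken from Fig. 8.2.

```python
from typing import Any, Set, Tuple

ANY = object()  # hypothetical stand-in for the wildcard symbol ∗

Row = Tuple[Any, ...]

def matches(row: Row, spec: Row) -> bool:
    """True if every component of spec is ANY or equals the row's component."""
    return len(row) == len(spec) and all(
        s is ANY or s == v for v, s in zip(row, spec)
    )

def insert(t: Row, r: Set[Row]) -> None:
    r.add(t)  # adding a tuple that is already present leaves the set unchanged

def delete(spec: Row, r: Set[Row]) -> None:
    # Collect the matching tuples first, then remove them all.
    r.difference_update({row for row in r if matches(row, spec)})

def lookup(spec: Row, r: Set[Row]) -> Set[Row]:
    return {row for row in r if matches(row, spec)}

# Illustrative Course-Prerequisite tuples (the third one is hypothetical).
course_prereq = {("CS120", "CS101"), ("CS205", "CS101"), ("CS206", "CS121")}
print(sorted(lookup((ANY, "CS101"), course_prereq)))
# [('CS120', 'CS101'), ('CS205', 'CS101')]

# Illustrative Course-Day-Hour tuples (days and hours are hypothetical).
course_day_hour = {("CS101", "M", "9AM"), ("CS101", "W", "9AM"),
                   ("CS101", "F", "9AM"), ("EE200", "Tu", "10AM")}
delete(("CS101", ANY, ANY), course_day_hour)
print(course_day_hour)  # {('EE200', 'Tu', '10AM')}
```

Using a Python set gives the "if it is not already there" behaviour of insert for free.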
Navigation among Relations
Until now, we have considered only operations involving a single relation, such
as finding a tuple given values for one or more of its components. The power of
the relational model can be seen best when we consider operations that require
us to “navigate,” or jump from one relation to another. For example, we could
answer the query “What grade did the student with ID 12345 get in CS101?” by
working entirely within the Course-StudentId-Grade relation. But it would be more
natural to ask, “What grade did C. Brown get in CS101?” That query cannot be
answered within the Course-StudentId-Grade relation alone, because that relation
uses student ID’s, rather than names.
To answer the query, we must first consult the StudentId-Name-Address-Phone
relation and translate the name “C. Brown” into a student ID (or ID’s, since it is
possible that there are two or more students with the same name and different ID’s).
Then, for each such ID, we search the Course-StudentId-Grade relation for tuples
with this ID and with course component equal to “CS101.” From each such tuple
we can read the grade of some student named C. Brown in course CS101. Figure
8.7 suggests how this query connects given values to the relations and to the desired
answers.
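The two-step navigation can be sketched in Python. The relation instances below are hypothetical: only the tuple (CS101, 12345, A) comes from the text, and linking ID 12345 to C. Brown, as well as the second student named C. Brown, are assumptions made to show why the name may translate into several IDs. The function name `grades_of` is illustrative.

```python
# Hypothetical StudentId-Name-Address-Phone tuples.
student = {
    (12345, "C. Brown", "12 Apple St.", "555-1234"),
    (98765, "C. Brown", "7 Elm St.", "555-9876"),
    (67890, "L. Van Pelt", "34 Pear Ave.", "555-5678"),
}
# Hypothetical Course-StudentId-Grade tuples.
csg = {
    ("CS101", 12345, "A"),
    ("CS101", 98765, "B"),
    ("CS101", 67890, "C"),
    ("EE200", 12345, "C"),
}

def grades_of(name, course):
    """Return (StudentId, Grade) pairs for every student with this name."""
    # Step 1: translate the name into every matching student ID.
    ids = {sid for (sid, n, _addr, _phone) in student if n == name}
    # Step 2: for each such ID, find its tuples for the given course.
    return {(sid, g) for (c, sid, g) in csg if c == course and sid in ids}

print(sorted(grades_of("C. Brown", "CS101")))  # [(12345, 'A'), (98765, 'B')]
```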
History:
The relational model was invented by E.F. (Ted) Codd as a general model of data, and
subsequently maintained and developed by Chris Date and Hugh Darwen among others. In The
Third Manifesto (first published in 1995) Date and Darwen show how the relational model can
accommodate certain desired object-oriented features.

Controversies:
Codd himself, some years after publication of his 1970 model, proposed a three-valued
logic (True, False, Missing or NULL) version of it to deal with missing information, and in his
The Relational Model for Database Management Version 2 (1990) he went a step further with a
four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But
these have never been implemented, presumably because of the attendant complexity. SQL's NULL
construct was intended to be part of a three-valued logic system, but fell short of that due to
logical errors in the standard and in its implementations.
Introduction:
The Relation is the basic element in a relational data model.

A relation is subject to the following rules:


• Relation (file, table) is a two-dimensional table.
• Attribute (i.e. field or data item) is a column in the table.
• Each column in the table has a unique name within that table.
• Each column is homogeneous. Thus the entries in any column are all of the same type (e.g.
age, name, employee-number, etc).
• Each column has a domain, the set of possible values that can appear in that column.
• A Tuple (i.e. record) is a row in the table.
• The order of the rows and columns is not important.
• Values of a row all relate to some thing or portion of a thing.
• Repeating groups (collections of logically related attributes that occur multiple times within
one record occurrence) are not allowed.
• Duplicate rows are not allowed (candidate keys are designed to prevent this).
• Cells must be single-valued (but can be variable length). Single-valued means the following:
  ◦ Cannot contain multiple values such as 'A1,B2,C3'.
  ◦ Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and
    'XYZ' another.
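Several of these rules can be checked mechanically. The Python sketch below is a hypothetical helper, `check_table`, that reports non-unique column names, duplicate rows, non-homogeneous columns and cells that look multi-valued. Treating a comma inside a string as a sign of a multi-valued cell is a crude heuristic assumed for illustration, and combined values such as 'ABC-XYZ' are not detected at all.

```python
from typing import Any, List, Sequence, Tuple

def check_table(columns: Sequence[str], rows: List[Tuple[Any, ...]]) -> List[str]:
    """Return descriptions of rule violations found in a table (empty if none)."""
    problems = []
    # Each column must have a unique name within the table.
    if len(set(columns)) != len(columns):
        problems.append("column names are not unique")
    # Duplicate rows are not allowed (compared pairwise, so unhashable cells are fine).
    for i, row in enumerate(rows):
        if row in rows[:i]:
            problems.append(f"duplicate row {row!r}")
    for j, col in enumerate(columns):
        # Each column is homogeneous: all non-NULL entries share one type.
        types = {type(row[j]) for row in rows if row[j] is not None}
        if len(types) > 1:
            problems.append(f"column {col!r} is not homogeneous")
        # Cells must be single-valued (heuristic: collections or comma-separated text).
        for row in rows:
            v = row[j]
            if isinstance(v, (list, set, tuple)) or (isinstance(v, str) and "," in v):
                problems.append(f"column {col!r} has a multi-valued cell: {v!r}")
    return problems

cols = ["EmpNo", "Name", "Skills"]
rows = [(1, "Ann", "SQL,Java"), (2, "Bob", "SQL"), (2, "Bob", "SQL")]
for p in check_table(cols, rows):
    print(p)
```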
Relationships:
One table (relation) may be linked with another in what is known as a relationship.
Relationships may be built into the database structure to facilitate the operation of relational joins
at runtime.
• A relationship is between two tables in what is known as a one-to-many or parent-child or
master-detail relationship where an occurrence on the 'one' or 'parent' or 'master' table may
have any number of associated occurrences on the 'many' or 'child' or 'detail' table. To
achieve this the child table must contain fields which link back to the primary key on the parent
table. These fields on the child table are known as a foreign key, and the parent table is
referred to as the foreign table (from the viewpoint of the child).
• It is possible for a record on the parent table to exist without corresponding records on the
child table, but it should not be possible for an entry on the child table to exist without a
corresponding entry on the parent table.
• A table may be the subject of any number of relationships, and it may be the parent in some
and the child in others.
• Some database engines allow a parent table to be linked via a candidate key, but if this were
changed it could result in the link to the child table being broken.
• Some database engines allow relationships to be managed by rules known as referential
integrity or foreign key constraints. These will prevent entries on child tables from being
created if the foreign key does not exist on the parent table, or will deal with entries on child
tables when the entry on the parent table is updated or deleted.
• A relation may be expressed using the notation R(A1,A2,A3, ...An) where:
R = the name of the relation.
(A1,A2,A3, ...An) = the attributes within the relation.
A1 = the attribute(s) which form the primary key.
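The parent-child rules above can be tried out with Python's built-in `sqlite3` module. In this sketch the tables `customer` and `orders` and their columns are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
con.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute("""CREATE TABLE orders (
                   order_id INTEGER PRIMARY KEY,
                   cust_id  INTEGER NOT NULL REFERENCES customer(cust_id))""")

con.execute("INSERT INTO customer VALUES (1, 'Ann')")
con.execute("INSERT INTO customer VALUES (2, 'Bob')")  # a parent with no children is allowed
con.execute("INSERT INTO orders VALUES (10, 1)")       # a child pointing at an existing parent

orphan_rejected = False
try:
    con.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99: would be an orphan
except sqlite3.IntegrityError as err:
    orphan_rejected = True
    print("rejected:", err)
```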

Relational Data Model:


The relational model has provided the basis for:
• Research on theory of data/relationship/constraint
• Numerous database design methodologies
• The standard database access language SQL
• Almost all modern commercial database management systems use this model
• The relational data model describes the world as “a collection of inter-related relations (or
tables)”
Fundamental concepts in Relational Data Model:
1.Domain: A domain D is a set of atomic values used to model data. By
atomic, we mean that each value in the domain is indivisible as far as the relational model is
concerned. For example:
• The domain of day shift is the set of all possible days: {Mon, Tue, Wed, …}
• The domain of salary is the set of all floating-point numbers greater than 0 and less than
200,000 (say).
• The domain of name is the set of character strings that represent names of persons.
2.Relation (Relation state): A relation is a subset of the Cartesian product of a list of
domains characterised by a name. Given n domains denoted by D1, D2, …, Dn , R is a relation
defined on these domains if R⊆D1×D2×...×Dn. Relation can be viewed as a “table”. In that table,
each row represents a tuple of data values and each column represents an attribute.
3.Attribute: A column of a relation designated by name. The name should be
meaningful. Each attribute is associated with a domain.
4.Relation schema: A relation schema, denoted by R, is a list of attributes (A1, A2, …, An). The degree of
the relation is the number of attributes of its relation schema. The cardinality of the relation is the
number of tuples in the relation.
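The definitions of a relation as a subset of D1×D2×...×Dn, of degree and of cardinality can be checked on a toy example. The domains, schema and relation state below are hypothetical.

```python
from itertools import product

# Hypothetical domains for a relation schema (Name, Day).
D_name = {"Ann", "Bob"}
D_day = {"Mon", "Tue", "Wed"}
schema = ("Name", "Day")

cartesian = set(product(D_name, D_day))  # D1 × D2 contains 2 * 3 = 6 tuples
r = {("Ann", "Mon"), ("Bob", "Wed")}     # one possible relation state

is_relation = r <= cartesian             # R ⊆ D1 × D2
degree = len(schema)                     # number of attributes
cardinality = len(r)                     # number of tuples
print(len(cartesian), is_relation, degree, cardinality)  # 6 True 2 2
```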
Example of relation, relation schema and attribute:
STUDENT is the relation name.
Roll No., Name, Birthday, Semester and Department are called attributes (or columns).
Each row of the table is called a tuple (or row/record).
[Figure: the STUDENT relation with columns Roll No., Name, Birthday, Semester and Department; the whole table is labelled "Relation" and one row is labelled "Tuple".]

Characteristics of relations:
• Ordering of tuples in a relation: A relation is a set of tuples, and since the elements of a set
have no order, there is no ordering on the rows of a relation.
• Ordering of values within a tuple: The order of attributes and their values within a relation is
not important as long as the correspondence between attributes and values is maintained.
Thus the following is a different representation of the STUDENT relation above.
STUDENT
Roll No.  Name     Birthday  Semester  Department
100001    Samavia  14 Feb    5th       Fine Arts
100002    DX       13 Feb    5th       Law
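One way to make "order does not matter" concrete is to represent each tuple as a set of attribute-value pairs, as in the Python sketch below. The helper `as_relation` is hypothetical, and only three of the STUDENT columns are used. Permuting the columns together with their headers, or swapping the rows, then yields an equal relation.

```python
def as_relation(columns, rows):
    """Represent each tuple as a frozenset of (attribute, value) pairs, so that
    neither row order nor column order affects equality."""
    return {frozenset(zip(columns, row)) for row in rows}

a = as_relation(
    ["Roll No.", "Name", "Department"],
    [(100001, "Samavia", "Fine Arts"), (100002, "DX", "Law")],
)
b = as_relation(
    ["Department", "Roll No.", "Name"],                          # columns permuted
    [("Law", 100002, "DX"), ("Fine Arts", 100001, "Samavia")],   # rows swapped
)
print(a == b)  # True
```

Because each value stays paired with its attribute name, swapping values between columns does change the relation, which matches the requirement that the attribute-value correspondence be maintained.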

• Values and NULL values in the tuple: Each value in a tuple is atomic, that is, it cannot be
divided into smaller components. Hence, composite and multivalued attributes are not allowed
in a relation. A NULL value is used for an attribute whose value is unknown or does not apply
to a particular tuple.
Constraints in Relational Data:
Constraints are a very important feature of the relational model. In fact, the relational model
supports a well-defined theory of constraints on attributes and tables. Constraints are useful because they
allow the designer to specify the semantics of the data in the database, and they are the rules that the DBMS
enforces to check that new data satisfies those semantics.
Integrity constraint:
Relations allow us to represent data and associations. A domain restricts the values of
attributes in a relation, and it is a constraint of the relational model. However, there are real-world
semantics of data that cannot be specified using domains alone. We need more specific ways to state
which data values are not allowed and which format is suitable for an attribute. For example, student
numbers must be unique, and students’ ages must lie in a range such as 20-30 years.
Such information is provided in logical statements called integrity constraints. There are
several kinds of integrity constraints:
1. Key constraint:
A relation is a set of tuples. By definition, all elements of a set are distinct; hence all tuples
in a relation must be distinct. In the relational model, tuples have no identity like object
identifiers: tuple identity is entirely value-based. Therefore, we need a key constraint, which is a
way of uniquely identifying a tuple. Let R be a relation schema whose list of attributes is U, and let K
be a subset of U. If, in every relation state r of R, any two distinct tuples t1 and t2 satisfy
t1[K] ≠ t2[K], then K is called a superkey of the relation schema R. A superkey that has no
redundant attributes is called a candidate key.
Since a relation schema may have more than one candidate key, one candidate key is chosen
whose values are used to uniquely identify tuples in the relation. Such a key is called the
primary key. The primary key is usually the simplest candidate key (i.e. a key with a single attribute
or a small number of attributes).
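The superkey definition can be tested on a single relation state, as in the Python sketch below, which uses the ACCOUNT tuples from the bank example in this section. The helper names are hypothetical. An instance can only show that an attribute set is unique so far: here the search reports (branchName, balance) as well as accountNumber, although only accountNumber is a sensible key for the schema, which is why keys are declared by the designer rather than inferred from data.

```python
from itertools import combinations

def is_superkey(rows, columns, K):
    """True if no two distinct tuples in this state agree on every attribute in K."""
    idx = [columns.index(a) for a in K]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, columns):
    """Minimal attribute sets that are unique in this state (smallest first)."""
    keys = []
    for size in range(1, len(columns) + 1):
        for K in combinations(columns, size):
            # Skip supersets of a key already found: they have redundant attributes.
            if any(set(k) <= set(K) for k in keys):
                continue
            if is_superkey(rows, columns, K):
                keys.append(K)
    return keys

columns = ["branchName", "balance", "accountNumber"]
account = {
    ("HaThanh", 20000, "C-12894349"),
    ("DongDo", 20000, "C-12894350"),
    ("DongDo", 3500, "S-141510751"),
    ("HaThanh", 50000, "S-520522620"),
}
print(candidate_keys(account, columns))
# [('accountNumber',), ('branchName', 'balance')]
```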
2. Entity constraint:
No attribute in the primary key can be NULL, because a NULL value in the
primary key would mean that we cannot identify some tuples. For example, in the EMPLOYEE relation
shown above, CellPhone cannot be a key, since we cannot use this attribute to identify
employees 20012322 and 19991323.
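The entity constraint amounts to a scan for NULLs in the primary-key attributes. In the Python sketch below, `None` plays the role of NULL, the helper name is hypothetical, and the EMPLOYEE tuples (apart from the two employee numbers mentioned above) are invented.

```python
def entity_violations(rows, columns, primary_key):
    """Return the tuples that have a NULL (None) in any primary-key attribute."""
    idx = [columns.index(a) for a in primary_key]
    return [row for row in rows if any(row[i] is None for i in idx)]

columns = ["EmpNo", "Name", "CellPhone"]
employee = [
    (20012322, "Lan", None),         # hypothetical: no cell phone recorded
    (19991323, "Minh", None),
    (20050001, "Hoa", "0901234567"),
]
print(entity_violations(employee, columns, ["EmpNo"]))           # []
print(len(entity_violations(employee, columns, ["CellPhone"])))  # 2
```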
3. Referential constraint: A constraint that is specified between two relations to maintain
the correspondence between tuples in these relations. It means that a reference from a tuple in one
relation to another relation must be valid.
Example of a referential integrity constraint:
In the Bank Database (from the Data Modelling lecture): the ACCOUNT relation needs to
record the BRANCH where each account is held, so in the implementation each tuple of the
ACCOUNT relation has an attribute such as branchName that identifies the associated
BRANCH. The referential integrity constraint states that the branchName attribute in the
ACCOUNT relation must refer to a valid (i.e. existing) branch.
Referential constraints in the relational model relate to the notion of a foreign key.
A set of attributes FK in a relation schema R1 is a foreign key if:
1. The attributes in FK correspond to the attributes in the primary key of another relation
schema R2, and
2. The value of FK in each tuple of R1 either occurs as the value of the primary key of some
tuple in R2 or is entirely NULL.
In a database of many relations, there are usually many foreign keys. They provide the “glue”
that links individual relations into a cohesive database structure.
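The two conditions of the foreign-key definition can be checked directly on relation states. The Python sketch below (helper name hypothetical, `None` standing for NULL) checks ACCOUNT.branchName against the primary key of BRANCH, using tuples from the bank example in this section plus one deliberately misspelled branch name.

```python
def fk_violations(r1_rows, fk_idx, r2_rows, pk_idx):
    """Return R1 tuples whose FK value is neither entirely NULL nor the
    primary-key value of some R2 tuple."""
    pk_values = {tuple(row[i] for i in pk_idx) for row in r2_rows}
    bad = []
    for row in r1_rows:
        fk = tuple(row[i] for i in fk_idx)
        if all(v is None for v in fk):
            continue  # an entirely NULL foreign key is permitted
        if fk not in pk_values:
            bad.append(row)
    return bad

branch = [("HaThanh", "Hai Ba Trung", 900000000),
          ("DongDo", "Dong Da", 400000000),
          ("ThangLong", "Hoan Kiem", 500000000)]
account = [("HaThanh", 20000, "C-12894349"),
           ("DongDo", 20000, "C-12894350"),
           ("HaThan", 20000, "C-20072242")]  # misspelled branch name
print(fk_violations(account, [0], branch, [0]))  # [('HaThan', 20000, 'C-20072242')]
```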
Semantic constraints: This is a special kind of constraint that may have to be enforced in a
relational database. Such constraints describe the semantics of the data in the database and are
sometimes called rules on data. For example, in the COMPANY database, we have the rules “An
employee cannot take part in more than 5 projects” and “The salary of an employee cannot exceed
the salary of the employee’s manager”.
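Rules like these go beyond keys and domains, so depending on the DBMS they are often enforced by triggers, assertions or application code. A minimal Python check of the two COMPANY rules is sketched below; the EMPLOYEE and WORKS_ON data and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical EMPLOYEE data: EmpId -> (salary, manager's EmpId or None).
employee = {
    "E1": (9000, None),
    "E2": (5000, "E1"),
    "E3": (9500, "E1"),  # earns more than manager E1
}
# Hypothetical WORKS_ON pairs (EmpId, ProjectId): E2 works on 6 projects.
works_on = [("E2", p) for p in ["P1", "P2", "P3", "P4", "P5", "P6"]]

def semantic_violations(employee, works_on, max_projects=5):
    problems = []
    # Rule 1: an employee cannot take part in more than max_projects projects.
    for emp, n in Counter(e for e, _ in works_on).items():
        if n > max_projects:
            problems.append(f"{emp} works on {n} projects")
    # Rule 2: an employee's salary cannot exceed the manager's salary.
    for emp, (salary, mgr) in employee.items():
        if mgr is not None and salary > employee[mgr][0]:
            problems.append(f"{emp} earns more than manager {mgr}")
    return problems

print(semantic_violations(employee, works_on))
```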
Functional dependency constraints: This constraint establishes a functional relationship
between two sets of attributes: X → Y holds if any two tuples that agree on X also agree on Y.
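The definition of a functional dependency X → Y can be tested on a relation state: the dependency is violated as soon as two tuples agree on X but differ on Y. The Python sketch below (helper name hypothetical) uses the CUSTOMER tuples from the bank example in this section. A single state can only refute a dependency, not prove that it holds for the schema.

```python
def fd_holds(rows, columns, X, Y):
    """True if every pair of tuples that agree on X also agree on Y."""
    xi = [columns.index(a) for a in X]
    yi = [columns.index(a) for a in Y]
    seen = {}  # X-value -> the Y-value first observed with it
    for row in rows:
        x = tuple(row[i] for i in xi)
        y = tuple(row[i] for i in yi)
        if seen.setdefault(x, y) != y:
            return False
    return True

columns = ["customerNumber", "Name", "address", "homeBranch"]
customer = [("111111", "Anh", "Hai Ba Trung", "HaThanh"),
            ("121314", "Van Anh", "Hai Ba Trung", "DongDo"),
            ("515016", "Son", "Hoan Kiem", "HaThanh")]
print(fd_holds(customer, columns, ["customerNumber"], ["Name"]))  # True
print(fd_holds(customer, columns, ["address"], ["homeBranch"]))   # False
```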
Relational Database:
Relations, keys, foreign keys and integrity constraints provide a complete toolkit for
building relational databases. A relational database consists of many relations, and tuples in
different relations are related in various ways. Here, we define the relational database schema
and the relational database instance.
A relational database schema consists of:
• A set of relation schemas S = {R1, R2, … , Rn}, and
• A set of integrity constraints.
A relational database instance is:
• A set of relations (relation states) {r1(R1), r2(R2), … , rn(Rn)} in which all of the integrity
constraints are satisfied.
Constraint Checking:
A relational database instance changes over time. At a given moment we may have an
instance that satisfies all the constraints, but whenever an update operation is performed we must
re-check the constraints. There are three basic update operations on relations: insert a new record,
delete an existing record, and modify an existing record.

ACCOUNT
branchName  balance  accountNumber
HaThanh     20000    C-12894349
DongDo      20000    C-12894350
DongDo      3500     S-141510751
HaThanh     50000    S-520522620

CUSTOMER
customerNumber  Name     address       homeBranch
111111          Anh      Hai Ba Trung  HaThanh
121314          Van Anh  Hai Ba Trung  DongDo
515016          Son      Hoan Kiem     HaThanh

ACCOUNT-HOLDER
customerNumber  accountNumber
111111          C-12894349
121314          C-12894350
121314          S-141510751
515016          S-520522620
111111          C-12894350

BRANCH
branchName  Address       assets
HaThanh     Hai Ba Trung  900000000
DongDo      Dong Da       400000000
ThangLong   Hoan Kiem     500000000

Domain constraint checking:
For an insert operation, we need to check each attribute value for type and other domain
restrictions.
For a delete operation, no domain constraints need to be checked.
For an update operation, we also need to check the new attribute values for type and other
domain restrictions.
The following changes satisfy the domain constraints:
Insert Account(HaThanh, 50000, S-20071280)
Insert Account(HaThan, 20000, C-20072242) (this looks fine, but the data value is actually
incorrect)
Update Account(HaThanh, 50000, S-20071280) to Account(HaThanh, 60000, S-20071280)
The following changes do not satisfy the domain constraints:
Insert Account(HaThanh, 5000USD, S-20071280)
Insert Account(DongDo, -20, C-12894349)
Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525)
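Domain checks can be sketched as a predicate over a single ACCOUNT tuple. The domains used here (a non-empty string for branchName, a non-negative integer for balance, and C- or S- followed by digits for accountNumber) are assumptions inferred from the examples, not definitions given in the text. As the HaThan example shows, such a check cannot catch a value that is well-typed but wrong.

```python
import re

# Assumed domains, inferred from the examples above.
ACCOUNT_NUMBER = re.compile(r"[CS]-\d+")

def satisfies_domains(t):
    """Check an ACCOUNT tuple (branchName, balance, accountNumber)."""
    branch, balance, number = t
    return (isinstance(branch, str) and branch != ""
            and type(balance) is int and balance >= 0
            and isinstance(number, str)
            and ACCOUNT_NUMBER.fullmatch(number) is not None)

ok = [("HaThanh", 50000, "S-20071280"),
      ("HaThan", 20000, "C-20072242"),      # passes, although the branch name is wrong
      ("HaThanh", 60000, "S-20071280")]
bad = [("HaThanh", "5000USD", "S-20071280"),
       ("DongDo", -20, "C-12894349"),
       (60000, "HaThanh", "S-34252525")]    # values in the wrong columns
print([satisfies_domains(t) for t in ok])   # [True, True, True]
print([satisfies_domains(t) for t in bad])  # [False, False, False]
```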
Key constraint checking:
For an insert operation, we need to check that the key value does not occur in any existing tuple
of the relation.
For a delete operation, no key constraints need to be checked.
For an update operation, if the key value is modified, the same check as for insertion is
needed.
Changes that satisfy the key constraints:
Insert Account(DongDo, 20000, C-12894351) (there is no account with that account
number in the current relation)
Insert Account-Holder(12334, C-12894350) (acceptable for the key constraint, although there is
no customer with number 12334)
Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525) (the key
is not modified)
Changes that do not satisfy the key constraints:
Insert Account(DongDo, 50000, C-12894350) (the key is already in the relation)
Update Account(DongDo, 20000, C-12894350) to Account(DongDo, 20000, C-12894349) (the
account C-12894349 is already in the relation)
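A DBMS performs these key checks itself once the key is declared. The Python `sqlite3` sketch below declares accountNumber as the primary key of a hypothetical `account` table loaded with the ACCOUNT tuples above; the helper `try_sql` is illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE account (
                   branchName    TEXT NOT NULL,
                   balance       INTEGER NOT NULL,
                   accountNumber TEXT PRIMARY KEY)""")
con.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("HaThanh", 20000, "C-12894349"),
    ("DongDo", 20000, "C-12894350"),
    ("DongDo", 3500, "S-141510751"),
    ("HaThanh", 50000, "S-520522620"),
])

def try_sql(sql, params=()):
    """Run one statement; return None on success, else the integrity error text."""
    try:
        con.execute(sql, params)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

# Inserting a key value that already exists is rejected.
dup_insert = try_sql("INSERT INTO account VALUES (?, ?, ?)",
                     ("DongDo", 50000, "C-12894350"))
# Changing a key to a value held by another tuple is rejected as well.
dup_update = try_sql("UPDATE account SET accountNumber = ? WHERE accountNumber = ?",
                     ("C-12894349", "C-12894350"))
# Updating a non-key attribute needs no key check.
plain_update = try_sql("UPDATE account SET balance = ? WHERE accountNumber = ?",
                       (60000, "S-520522620"))
print(dup_insert, dup_update, plain_update, sep="\n")
```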
Referential integrity constraint checking:
For an insert operation, we need to check that the foreign keys occur as primary keys in the
referenced relation.
For a delete operation, we must check all relations that have foreign keys referring to this relation.
For referential constraint checking, an update should be treated as a delete followed by an insert.
Changes that satisfy the referential constraints:
Insert Account(ThangLong, 5000, C-12891230)
Insert Account-Holder(111111, C-12891230)
Update Customer(515016, Son, Hoan Kiem, HaThanh) to Customer(515016, Son, Hoan Kiem, ThangLong)
Delete Account-Holder(111111, C-12894350)
Changes that do not satisfy the referential constraints:
Insert Account-Holder(12334, C-12894350) (no such customer)
Insert Customer(222222, Nha, DongDa, An Binh) (no such branch)
Delete Customer with customerNumber = ‘111111’ (not acceptable, since there are
tuples in the Account-Holder relation that refer to this customer)
Deletion can violate a referential constraint when the tuple being deleted is referenced by
foreign keys from other tuples in a different relation. Several approaches are used to
handle this kind of violation. The first approach is simply to disallow the deletion. In the second
approach, the user must find the referring tuples and then either delete them manually or change
their foreign key to an acceptable value or to NULL (not possible if the foreign key also forms
part of the primary key, as in the Account-Holder relation). The third approach is to
remove all referring tuples automatically (cascade).
When the referential constraint is specified in the database during the creation phase, the
DBMS allows the user to specify which of the above approaches applies when a violation occurs.
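In SQL the choice is made with an ON DELETE clause on the foreign key. The Python `sqlite3` sketch below (table and column names are hypothetical renderings of CUSTOMER and ACCOUNT-HOLDER) deletes customer 111111 once under RESTRICT, which corresponds to the first approach, and once under CASCADE, the third approach.

```python
import sqlite3

def demo(on_delete):
    """Delete a referenced customer under the given ON DELETE action and
    report the outcome and how many referencing rows remain."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
    con.execute("CREATE TABLE customer (customerNumber TEXT PRIMARY KEY, name TEXT)")
    con.execute(f"""CREATE TABLE account_holder (
                        customerNumber TEXT REFERENCES customer ON DELETE {on_delete},
                        accountNumber  TEXT,
                        PRIMARY KEY (customerNumber, accountNumber))""")
    con.execute("INSERT INTO customer VALUES ('111111', 'Anh')")
    con.executemany("INSERT INTO account_holder VALUES (?, ?)",
                    [("111111", "C-12894349"), ("111111", "C-12894350")])
    try:
        con.execute("DELETE FROM customer WHERE customerNumber = '111111'")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    remaining = con.execute("SELECT COUNT(*) FROM account_holder").fetchone()[0]
    return outcome, remaining

print(demo("RESTRICT"))  # ('rejected', 2)
print(demo("CASCADE"))   # ('deleted', 0)
```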
Relational operations:
Users (or programs) request data from a relational database by sending it a query that is
written in a special language, usually a dialect of SQL. Although SQL was originally intended
for end-users, it is much more common for SQL queries to be embedded into software that
provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when
generating pages. In response to a query, the database returns a result set, which is just a list of
rows containing the answers. The simplest query is just to return all the rows from a table, but
more often, the rows are filtered in some way to return just the answer wanted.
Often, data from multiple tables are combined into one, by doing a join. Conceptually,
this is done by taking all possible combinations of rows (the Cartesian product), and then
filtering out everything except the answer. In practice, relational database management systems
rewrite ("optimize") queries to perform faster, using a variety of techniques.
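The conceptual definition (form the Cartesian product, then filter) can be written in a few lines of Python. This is a sketch on hypothetical data: a real DBMS would normally use an index, hash or merge strategy instead of materialising the product, and this equijoin keeps both copies of the StudentId column rather than merging them as a natural join would.

```python
from itertools import product

# Hypothetical Course-StudentId-Grade and StudentId-Name relations.
csg = [("CS101", 12345, "A"), ("CS101", 67890, "B")]
names = [(12345, "C. Brown"), (67890, "L. Van Pelt")]

def join(r, s, r_col, s_col):
    """Conceptual join: form every pair of rows (the Cartesian product),
    then keep only the pairs whose join columns are equal."""
    return [a + b for a, b in product(r, s) if a[r_col] == b[s_col]]

print(join(csg, names, 1, 0))
# [('CS101', 12345, 'A', 12345, 'C. Brown'), ('CS101', 67890, 'B', 67890, 'L. Van Pelt')]
```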
There are a number of relational operations in addition to join. These include project (the
process of eliminating some of the columns), restrict (the process of eliminating some of the
rows), union (a way of combining two tables with similar structures), difference (which lists the
rows in one table that are not found in the other), intersect (which lists the rows found in both
tables), and product (mentioned above, which combines each row of one table with each row of
the other). Depending on which other sources you consult, there are a number of other operators
- many of which can be defined in terms of those listed above. These include semi-join, outer
operators such as outer join and outer union, and various forms of division. Then there are
operators to rename columns, and summarizing or aggregating operators, and if you permit
relation values as attributes (RVA - relation-valued attribute), then operators such as group and
ungroup. The SELECT statement in SQL serves to handle all of these except for the group and
ungroup operators.
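Because a relation is a set of tuples, several of these operators map directly onto Python set operations, as in the sketch below on two hypothetical relations with the same heading (StudentId, Course). The projection result shows that duplicates produced by dropping a column disappear automatically, as the relational model requires.

```python
# Two hypothetical relations over the heading (StudentId, Course).
r = {(12345, "CS101"), (67890, "CS101"), (12345, "EE200")}
s = {(12345, "CS101"), (22222, "PH100")}

union        = r | s                                 # rows in either relation
difference   = r - s                                 # rows in r but not in s
intersection = r & s                                 # rows in both
restricted   = {t for t in r if t[1] == "CS101"}     # restrict: keep some rows
projected    = {(t[0],) for t in r}                  # project: keep some columns
product_rs   = {a + b for a in r for b in s}         # product: 3 * 2 = 6 rows

print(sorted(difference))  # [(12345, 'EE200'), (67890, 'CS101')]
print(sorted(projected))   # [(12345,), (67890,)]
```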
The flexibility of relational databases allows programmers to write queries that were not
anticipated by the database designers. As a result, relational databases can be used by multiple
applications in ways the original designers did not foresee, which is especially important for
databases that might be used for a long time (perhaps several decades). This has made the idea
and implementation of relational databases very popular with businesses.
SQL and the relational model:
SQL, initially pushed as the standard language for relational databases, deviates from the
relational model in several places. The current ISO SQL standard doesn't mention the relational
model or use relational terms or concepts. However, it is possible to create a database
conforming to the relational model using SQL if one does not use certain SQL features.
The following deviations from the relational model have been noted in SQL. Note that
few database servers implement the entire SQL standard and in particular do not allow some of
these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names
within a table or anonymous columns is uncommon.
Duplicate rows:
The same row can appear more than once in an SQL table. The same tuple cannot appear
more than once in a relation.
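The difference is easy to observe in SQLite through Python's `sqlite3` module: a table declared without any key accepts the same row twice, and only SELECT DISTINCT restores set semantics. The table name `grade` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grade (course TEXT, studentId INTEGER, grade TEXT)")  # no key declared
con.execute("INSERT INTO grade VALUES ('CS101', 12345, 'A')")
con.execute("INSERT INTO grade VALUES ('CS101', 12345, 'A')")  # accepted: a duplicate row

all_rows = con.execute("SELECT COUNT(*) FROM grade").fetchone()[0]
distinct_rows = con.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM grade)").fetchone()[0]
print(all_rows, distinct_rows)  # 2 1
```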
Anonymous columns:
A column in an SQL table can be unnamed and thus unable to be referenced in
expressions. The relational model requires every attribute to be named and referenceable.
Duplicate column names:
Two or more columns of the same SQL table can have the same name and therefore
cannot be referenced, on account of the obvious ambiguity. The relational model requires every
attribute to be referenceable.
Column order significance:
The order of columns in an SQL table is defined and significant, one consequence being
that SQL's implementations of Cartesian product and union are both noncommutative. The
relational model requires there to be no significance to any ordering of the attributes of a
relation.
Views without CHECK OPTION:
Updates to a view defined without CHECK OPTION can be accepted but the resulting
update to the database does not necessarily have the expressed effect on its target. For example,
an invocation of INSERT can be accepted but the inserted rows might not all appear in the view,
or an invocation of UPDATE can result in rows disappearing from the view. The relational
model requires updates to a view to have the same effect as if the view were a base relvar.
Columnless tables unrecognized:
SQL requires every table to have at least one column, but there are two relations of
degree zero (of cardinality one and zero) and they are needed to represent extensions of
predicates that contain no free variables.
NULL:
This special mark can appear instead of a value wherever a value can appear in SQL, in
particular in place of a column value in some row. The deviation from the relational model arises
from the fact that the implementation of this ad hoc concept in SQL involves the use of three-
valued logic, under which the comparison of NULL with itself does not yield true but instead
yields the third truth value, unknown; similarly the comparison of NULL with something other than
itself does not yield false but instead yields unknown. It is because of this behaviour in
comparisons that NULL is described as a mark rather than a value. The relational model depends
on the law of excluded middle under which anything that is not true is false and anything that is
not false is true; it also requires every tuple in a relation body to have a value for every attribute
of that relation. This particular deviation is disputed by some if only because E.F. Codd himself
eventually advocated the use of special marks and a 4-valued logic, but this was based on his
observation that there are two distinct reasons why one might want to use a special mark in place
of a value, which led opponents of the use of such logics to discover more distinct reasons and at
least as many as 19 have been noted, which would require a 21-valued logic.[citation needed]
SQL itself uses NULL for several purposes other than to represent "value unknown". For
example, the sum of the empty set is NULL, meaning zero, the average of the empty set is
NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no
value because there is no matching row in the right-hand operand".
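These behaviours can be observed in SQLite through Python's `sqlite3` module, where both NULL and the unknown truth value come back as `None`. The tables `t`, `a` and `b` are hypothetical, and other SQL implementations may differ in details.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def scalar(sql):
    """Return the first column of the first row of a query."""
    return con.execute(sql).fetchone()[0]

eq_self = scalar("SELECT NULL = NULL")   # None: unknown, not true
ne_other = scalar("SELECT NULL <> 1")    # None: unknown, not true
is_null = scalar("SELECT NULL IS NULL")  # 1: IS NULL is a two-valued test

con.execute("CREATE TABLE t (x INTEGER)")   # an empty table
empty_sum = scalar("SELECT SUM(x) FROM t")  # None rather than 0
empty_avg = scalar("SELECT AVG(x) FROM t")  # None

con.execute("CREATE TABLE a (id INTEGER)")
con.execute("CREATE TABLE b (id INTEGER, v TEXT)")
con.execute("INSERT INTO a VALUES (1)")
# No matching row in b, so b.v is filled with NULL.
left = con.execute("SELECT a.id, b.v FROM a LEFT JOIN b ON a.id = b.id").fetchall()
print(eq_self, ne_other, is_null, empty_sum, empty_avg, left)
# None None 1 None None [(1, None)]
```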
