One of the most important applications for computers is storing and managing
information. The manner in which information is organized can have a profound effect on how
easy it is to access and manage. Perhaps the simplest but most versatile way to organize
information is to store it in tables.
The relational model is centered on this idea: the organization of data into collections of
two-dimensional tables called “relations.” We can also think of the relational model as a
generalization of the set data model that we discussed in Chapter 7, extending binary relations to
relations of arbitrary arity. Originally, the relational data model was developed for databases
(that is, information stored over a long period of time in a computer system) and for
database management systems, the software that allows people to store, access, and modify this
information. Databases still provide us with important motivation for understanding the
relational data model. They are found today not only in their original, large-scale applications
such as airline reservation systems or banking systems, but also in desktop computers handling
individual activities such as maintaining expense records, homework grades, and many other
uses. Other kinds of software besides database systems can make good use of tables of
information as well, and the relational data model helps us design these tables and develop the
data structures that we need to access them efficiently. For example, such tables are used by
compilers to store information about the variables used in the program, keeping track of their
data type and of the functions for which they are defined.
Relations
Section 7.7 introduced the notion of a “relation” as a set of tuples. Each tuple of
a relation is a list of components, and each relation has a fixed arity, which is the
number of components each of its tuples has.
When a relation is displayed as a table, each row is a tuple, and the columns are given names,
called attributes. In Fig. 8.1, the attributes are Course, StudentId, and Grade.
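The definition above can be sketched in Python as a set of tuples with a fixed arity. This is a minimal illustration, not how a DBMS stores data. The class name Relation and the sample rows are hypothetical, and only the attribute names come from Fig. 8.1. Storing the tuples in a set makes them automatically distinct, and the insert method rejects tuples of the wrong arity.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical sketch: named attributes plus a set of fixed-arity tuples."""
    attributes: tuple
    tuples: set = field(default_factory=set)

    @property
    def arity(self) -> int:
        # Every tuple must have exactly one component per attribute.
        return len(self.attributes)

    def insert(self, t: tuple) -> None:
        if len(t) != self.arity:
            raise ValueError(f"expected {self.arity} components, got {len(t)}")
        self.tuples.add(t)  # re-adding an existing tuple changes nothing


# Hypothetical rows over the Fig. 8.1 attributes.
grades = Relation(("Course", "StudentId", "Grade"))
grades.insert(("CS101", 12345, "A"))
grades.insert(("EE200", 12345, "C"))
grades.insert(("CS101", 12345, "A"))  # duplicate: the set absorbs it
print(grades.arity, len(grades.tuples))  # 3 2
```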
Controversies:
Codd himself, some years after publication of his 1970 model, proposed a three-valued
logic (True, False, Missing or NULL) version of it to deal with missing information, and in his
book The Relational Model for Database Management: Version 2 (1990) he went a step further with a
four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. However,
these have never been implemented, presumably because of the attendant complexity. SQL's NULL
construct was intended to be part of a three-valued logic system, but fell short of that due to
logical errors in the standard and in its implementations.
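The following sketch shows what a three-valued logic looks like in practice. It implements AND, OR, and NOT of Kleene's three-valued logic, which is the logic SQL applies to conditions involving NULL. Using Python's None for the third value (Missing or NULL) is an assumption of this sketch, and Codd's four-valued variant is not shown.

```python
from typing import Optional

TV = Optional[bool]  # True, False, or None standing for "unknown"


def and3(a: TV, b: TV) -> TV:
    if a is False or b is False:
        return False  # False dominates AND, even when the other side is unknown
    if a is None or b is None:
        return None
    return True


def or3(a: TV, b: TV) -> TV:
    if a is True or b is True:
        return True  # True dominates OR
    if a is None or b is None:
        return None
    return False


def not3(a: TV) -> TV:
    return None if a is None else not a


# Print the AND truth table, rows and columns ordered True, None, False.
values = (True, None, False)
for a in values:
    print([and3(a, b) for b in values])
```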
Introduction:
The relation is the basic element of the relational data model.
Characteristics of relations:
• Ordering of tuples in a relation: A tuple is a collection of values, one per attribute, and a
relation is a set of tuples. Since a relation is a set, there is no ordering on its rows.
• Ordering of values within a tuple: The order of attributes and their values within a tuple is
not important, as long as the correspondence between attributes and values is maintained.
Thus the following is a different representation of the above EMPLOYEE relation.
[Figure: STUDENT]
• Values and NULL values in the tuple: Each value in a tuple is atomic, which means it
cannot be divided into smaller components. Hence, composite and multivalued attributes
are not allowed in a relation. A value may also be NULL, which represents a value that is unknown
or does not apply to that tuple.
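The first two characteristics can be illustrated by representing each tuple as a set of (attribute, value) pairs instead of a positional list. In this hypothetical sketch, two tables list the same data with their columns and rows in a different order, and the resulting relations compare equal.

```python
def as_relation(attributes, rows):
    """Convert positional rows into a set of attribute-value mappings."""
    return {frozenset(zip(attributes, row)) for row in rows}


# Hypothetical data.
r1 = as_relation(("Name", "Id", "Age"),
                 [("Anh", 1, 21), ("Son", 2, 23)])
# The same data with the columns reordered and the rows listed in reverse.
r2 = as_relation(("Age", "Name", "Id"),
                 [(23, "Son", 2), (21, "Anh", 1)])

print(r1 == r2)  # True
```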
Constraints in Relational Data:
Constraints are a very important feature of the relational model. In fact, the relational model
supports a well-defined theory of constraints on attributes and tables. Constraints are useful
because they allow the designer to specify the semantics of the data in the database, and they are
the rules that the DBMS enforces to check that new data satisfies those semantics.
Integrity constraints:
A relation allows us to represent data and associations. A domain restricts the values of the
attributes in a relation, and it is a constraint of the relational model. However, some real-world
semantics of the data cannot be specified using domains alone. We need a more specific way to state
which data values are not allowed and what format is suitable for an attribute. For example, a
student number must be unique, and a student's age must lie in a given range, e.g., 20-30 years.
Such information is provided in logical statements called integrity constraints. There are
several kinds of integrity constraints:
1. Key constraint:
A relation is a set of tuples. By definition, all elements of a set are distinct; hence all tuples
in a relation must be distinct. In the relational model, tuples have no identity of their own, such
as an object identifier; tuple identity is entirely value-based. Therefore, we need key constraints,
which provide a way to uniquely identify a tuple. Given a relation schema R whose list of attributes
is U, let K be a subset of U. If, for any two distinct tuples t1 and t2 in any relation state r of
R, we have the constraint that t1[K] ≠ t2[K], then K is called a superkey of the relation schema R.
A superkey that has no redundant attributes (that is, no proper subset of it is also a superkey) is
called a candidate key.
Since a relation schema may have more than one candidate key, one candidate key is chosen whose
values are used to uniquely identify tuples in the relation. This key is called the primary key.
The primary key is usually the simplest candidate key (i.e., a key with a single attribute or a
small number of attributes).
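The superkey and candidate key definitions translate into a short check, sketched below in Python with hypothetical STUDENT data. A key is a property of the schema that must hold in every valid state. Examining one relation state can therefore show that K is not a superkey, but it can never prove that K is one.

```python
from itertools import combinations


def is_superkey(rows, attributes, key):
    """True if no two distinct tuples in rows agree on every attribute in key."""
    idx = [attributes.index(a) for a in key]
    seen = set()
    for row in rows:
        projected = tuple(row[i] for i in idx)  # t[K]
        if projected in seen:
            return False
        seen.add(projected)
    return True


def is_candidate_key(rows, attributes, key):
    """A superkey with no redundant attribute: no proper subset is a superkey."""
    if not is_superkey(rows, attributes, key):
        return False
    return not any(
        is_superkey(rows, attributes, subset)
        for size in range(len(key))
        for subset in combinations(key, size)
    )


attrs = ("StudentId", "Name", "Email", "Age")
rows = {  # a set, so the tuples are distinct; all values are hypothetical
    (1, "Anh", "anh@example.edu", 21),
    (2, "Son", "son@example.edu", 21),
    (3, "Anh", "anh.t@example.edu", 22),
}
print(is_superkey(rows, attrs, ("StudentId", "Name")))       # True
print(is_candidate_key(rows, attrs, ("StudentId", "Name")))  # False: StudentId alone suffices
print(is_candidate_key(rows, attrs, ("StudentId",)))         # True
print(is_superkey(rows, attrs, ("Name",)))                   # False: two students named Anh
```

On this tiny state, (Name, Age) also passes is_candidate_key, even though two students could easily share a name and an age. This is why keys are declared by the designer from the meaning of the data rather than inferred from the current tuples.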
2. Entity constraint:
No attribute in the primary key can be NULL, because NULL values in the primary key would mean
that we cannot identify some tuples. For example, in the EMPLOYEE relation shown above, CellPhone
cannot be a key, since we cannot use this attribute to identify employee 20012322 and employee
19991323.
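Entity integrity can be checked by looking for NULL in any primary-key attribute. In this sketch, None stands for NULL. Only the two employee numbers come from the text. The names, the phone number, and the attribute name EmpId are invented for illustration.

```python
def violates_entity_integrity(rows, attributes, primary_key):
    """Return the tuples that have NULL (None) in any primary-key attribute."""
    idx = [attributes.index(a) for a in primary_key]
    return [row for row in rows if any(row[i] is None for i in idx)]


attrs = ("EmpId", "Name", "CellPhone")  # hypothetical EMPLOYEE attributes
employees = [
    (20012322, "Lan", None),
    (19991323, "Minh", None),
    (20050001, "Hoa", "0912000000"),
]
print(violates_entity_integrity(employees, attrs, ("EmpId",)))           # []
print(len(violates_entity_integrity(employees, attrs, ("CellPhone",))))  # 2
```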
3. Referential constraint: A constraint that is specified between two relations to maintain the
correspondence between tuples in these relations. It means that a reference from a tuple in one
relation to another relation must be valid.
Example of a referential integrity constraint:
In the Bank Database (from the Data Modelling lecture), the ACCOUNT relation needs to record the
BRANCH where each account is held. In the implementation, therefore, each tuple of the ACCOUNT
relation has an attribute such as branchName that identifies the associated BRANCH. The
referential integrity constraint states that the branchName attribute in the ACCOUNT relation must
refer to a valid (i.e., existing) branch.
Referential constraints in the relational model relate to the notion of a foreign key.
A set of attributes FK in a relation schema R1 is a foreign key if:
• The attributes in FK correspond to the attributes in the primary key of another relation schema
R2, and
• The value of FK in each tuple of R1 either occurs as the value of the primary key of some tuple in
R2 or is entirely NULL.
In a database of many relations, there are usually many foreign keys. They provide the “glue”
that links individual relations into a cohesive database structure.
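The foreign-key conditions can be checked directly. The sketch below uses ACCOUNT and BRANCH tuples from the Bank Database example and adds two hypothetical ACCOUNT rows. One names a branch that does not exist, and the other has a NULL (None) foreign key, which the definition allows.

```python
def fk_violations(r1_rows, fk_index, r2_rows, pk_index):
    """Return the tuples of R1 whose foreign key matches no primary key in R2."""
    pk_values = {tuple(row[i] for i in pk_index) for row in r2_rows}
    bad = []
    for row in r1_rows:
        fk = tuple(row[i] for i in fk_index)
        if all(v is None for v in fk):
            continue  # an entirely NULL foreign key is allowed
        if fk not in pk_values:
            bad.append(row)
    return bad


branch = [("HaThanh", "Hai Ba Trung", 900000000),
          ("DongDo", "Dong Da", 400000000)]
account = [("HaThanh", 20000, "C-12894349"),
           ("DongDo", 3500, "S-141510751"),
           ("NoSuchBranch", 100, "C-00000001"),  # hypothetical dangling reference
           (None, 0, "C-00000002")]              # hypothetical NULL foreign key
print(fk_violations(account, [0], branch, [0]))
# [('NoSuchBranch', 100, 'C-00000001')]
```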
Semantic constraints: This is a special kind of constraint that may have to be enforced in a
relational database. Such constraints describe the semantics of the data in the database and are
sometimes called rules on the data. For example, in the COMPANY database, we have the rules “An
employee cannot take part in more than 5 projects” and “The salary of an employee cannot exceed the
salary of the employee’s manager”.
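Semantic constraints usually cannot be written as key or domain constraints, so they are checked by general code. In SQL systems this is commonly done with triggers or in application logic. Below is a sketch of the first rule over a hypothetical WORKS_ON relation of (employee, project) pairs. The relation name and data are assumptions.

```python
from collections import Counter

MAX_PROJECTS = 5  # from the rule "An employee cannot take part in more than 5 projects"


def over_assigned(works_on):
    """Return, sorted, the employees that appear in more than MAX_PROJECTS projects."""
    counts = Counter(emp for emp, _project in works_on)
    return sorted(emp for emp, n in counts.items() if n > MAX_PROJECTS)


# Hypothetical WORKS_ON: E1 works on P1..P6, E2 on two projects.
works_on = {("E1", f"P{i}") for i in range(1, 7)} | {("E2", "P1"), ("E2", "P2")}
print(over_assigned(works_on))  # ['E1']
```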
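Functional dependency constraints: This kind of constraint establishes a functional relationship
between two sets of attributes X and Y, written X → Y: any two tuples that agree on the values of X
must also agree on the values of Y.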
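A functional dependency can be tested against one relation state by remembering, for each X value seen, the Y value that accompanied it. This is a sketch with hypothetical data. As with keys, a single state can refute a dependency but cannot prove it.

```python
def fd_holds(rows, attributes, lhs, rhs):
    """True if, in rows, tuples that agree on lhs also agree on rhs (lhs -> rhs)."""
    li = [attributes.index(a) for a in lhs]
    ri = [attributes.index(a) for a in rhs]
    seen = {}
    for row in rows:
        x = tuple(row[i] for i in li)
        y = tuple(row[i] for i in ri)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


attrs = ("StudentId", "Name", "Course")
rows = [(1, "Anh", "CS101"), (1, "Anh", "EE200"), (2, "Son", "CS101")]
print(fd_holds(rows, attrs, ["StudentId"], ["Name"]))    # True
print(fd_holds(rows, attrs, ["StudentId"], ["Course"]))  # False
```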
Relational Database:
Relations, keys, foreign keys, and integrity constraints provide a complete toolkit for building
relational databases. A relational database consists of many relations, and the tuples in these
relations are related in various ways. Here, we define the relational database schema and the
relational database instance.
A relational database schema is:
• A set of relation schemas S = {R1, R2, … , Rn}, and
• A set of integrity constraints.
A relational database instance is:
• A set of relations (relation states) {r1(R1), r2(R2), … , rn(Rn)} where all of the integrity
constraints are satisfied.
Constraint Checking:
A relational database instance changes over time. At a given moment, we can have an instance that
satisfies all the constraints, but whenever update operations are performed we must re-check the
constraints. There are three basic update operations on relations: insert a new record, delete an
existing record, and modify an existing record.
ACCOUNT
branchName   balance   accountNumber
HaThanh      20000     C-12894349
DongDo       20000     C-12894350
DongDo       3500      S-141510751
HaThanh      50000     S-520522620

CUSTOMER
customerNumber   Name      address        homeBranch
111111           Anh       Hai Ba Trung   HaThanh
121314           Van Anh   Hai Ba Trung   DongDo
515016           Son       Hoan Kiem      HaThanh

ACCOUNT-HOLDER
customerNumber   accountNumber
111111           C-12894349
121314           C-12894350
121314           S-141510751
515016           S-520522620
111111           C-12894350

BRANCH
branchName   Address        assets
HaThanh      Hai Ba Trung   900000000
DongDo       Dong Da        400000000
ThangLong    Hoan Kiem      500000000

Domain constraint checking:
For an insert operation, we need to check each attribute value for its type and any other domain
restrictions.
For a delete operation, there is no need to check domain constraints.
For an update operation, we also need to check each new attribute value for its type and any other
domain restrictions.
The following changes satisfy domain constraints:
Insert Account(HaThanh, 50000, S-20071280)
Insert Account(HaThan, 20000, C-20072242) (this looks acceptable, but the branch name HaThan is
misspelled, so the data value is actually incorrect)
Update Account(HaThanh, 50000, S-20071280) to Account(HaThanh, 60000, S-20071280)
The following changes do not satisfy domain constraints:
Insert Account(HaThanh, 5000USD, S-20071280) (the balance is not a number)
Insert Account(DongDo, -20, C-12894349) (the balance is negative)
Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525) (the branch name
and balance are in each other's positions)
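The domain checks above can be sketched as a predicate on a candidate ACCOUNT tuple. The text does not define the domains, so these are assumptions: branchName is a non-empty string, balance is a non-negative integer, and accountNumber is a "C-" or "S-" prefix followed by digits. For an update, the predicate is applied to the new tuple.

```python
import re

# Assumed domain for accountNumber: "C-" or "S-" followed by digits.
ACCOUNT_NUMBER = re.compile(r"[CS]-\d+")


def satisfies_account_domains(t):
    """Check a tuple (branchName, balance, accountNumber) against the assumed domains."""
    if len(t) != 3:
        return False
    branch, balance, number = t
    return (
        isinstance(branch, str) and branch != ""
        and isinstance(balance, int) and not isinstance(balance, bool)
        and balance >= 0
        and isinstance(number, str)
        and ACCOUNT_NUMBER.fullmatch(number) is not None
    )


print(satisfies_account_domains(("HaThanh", 50000, "S-20071280")))      # True
print(satisfies_account_domains(("HaThan", 20000, "C-20072242")))       # True: typo not caught
print(satisfies_account_domains(("HaThanh", "5000USD", "S-20071280")))  # False
print(satisfies_account_domains(("DongDo", -20, "C-12894349")))         # False
print(satisfies_account_domains((60000, "HaThanh", "S-34252525")))      # False
```

As the second call shows, a domain check alone cannot catch the misspelled branch HaThan. Catching it requires checking the value against the existing branches.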
Key constraint checking:
For an insert operation, we need to check that the key value does not occur in any existing tuple
of the relation.
For a delete operation, there is no need to check key constraints.
For an update operation, if the key value is modified, the same check as for insertion is needed.
Changes that satisfy key constraints:
Insert Account(DongDo, 20000, C-12894350) (there is no account with that account
number in the current relation)
Insert Account-Holder(12334, C-12894350) (acceptable for the key, although there is no customer
with number 12334)
Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525) (the key is not
modified)
Changes that do not satisfy key constraints:
Insert Account(DongDo, 50000, C-12894350) (the key is already in the relation)
Update Account(DongDo, 20000, C-12894350) to Account(DongDo, 20000, C-12894349) (the account
C-12894349 is already in the relation)
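The key checks can be sketched against the ACCOUNT table above, assuming that accountNumber is its primary key. The function names are hypothetical.

```python
ACCOUNT_KEY = 2  # position of accountNumber, assumed to be the primary key

account = {
    ("HaThanh", 20000, "C-12894349"),
    ("DongDo", 20000, "C-12894350"),
    ("DongDo", 3500, "S-141510751"),
    ("HaThanh", 50000, "S-520522620"),
}


def key_ok_for_insert(rel, new, key=ACCOUNT_KEY):
    """The new tuple's key value must not occur in any existing tuple."""
    return all(t[key] != new[key] for t in rel)


def key_ok_for_update(rel, old, new, key=ACCOUNT_KEY):
    """Only a modified key value needs the insertion check."""
    if old[key] == new[key]:
        return True
    return key_ok_for_insert(rel, new, key)


print(key_ok_for_insert(account, ("DongDo", 50000, "C-12894350")))    # False
print(key_ok_for_update(account, ("DongDo", 20000, "C-12894350"),
                        ("DongDo", 20000, "C-12894349")))            # False
print(key_ok_for_insert(account, ("ThangLong", 5000, "C-12891230")))  # True
```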
Referential integrity constraint checking:
For an insert operation, we need to check that the foreign key values occur as primary key values in
the referenced relation.
For a delete operation, we must check all relations that have foreign keys referring to this
relation.
An update must be treated as a delete followed by an insert for referential constraint checking.
Changes that satisfy referential constraints:
Insert Account(ThangLong, 5000, C-12891230)
Insert Account-Holder(111111, C-12891230) (valid once the previous insertion has been made)
Update Customer(515016, Son, Hoan Kiem, HaThanh) to Customer(515016, Son, Hoan Kiem, ThangLong)
Delete Account-Holder(111111, C-12894350)
Changes that do not satisfy referential constraints:
Insert Account-Holder(12334, C-12894350) (no such customer)
Insert Customer(222222, Nha, DongDa, An Binh) (no such branch)
Delete Customer with customerNumber = ‘111111’ (not acceptable, since there are tuples in the
Account-Holder relation that refer to this customer)
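The referential checks can be sketched with plain Python sets that hold only the key columns of the example relations. The foreign keys assumed here are ACCOUNT-HOLDER.customerNumber referencing CUSTOMER, ACCOUNT-HOLDER.accountNumber referencing ACCOUNT, and CUSTOMER.homeBranch referencing BRANCH. The deletion check implements the policy of disallowing the deletion, and all function names are hypothetical.

```python
branch_names = {"HaThanh", "DongDo", "ThangLong"}
customers = {"111111": "HaThanh", "121314": "DongDo", "515016": "HaThanh"}  # number -> homeBranch
account_numbers = {"C-12894349", "C-12894350", "S-141510751", "S-520522620"}
holders = {("111111", "C-12894349"), ("121314", "C-12894350"),
           ("121314", "S-141510751"), ("515016", "S-520522620"),
           ("111111", "C-12894350")}


def can_insert_holder(customer_no, account_no):
    # Both foreign keys must match existing primary keys.
    return customer_no in customers and account_no in account_numbers


def can_insert_customer(home_branch):
    # homeBranch must name an existing branch.
    return home_branch in branch_names


def can_delete_customer(customer_no):
    # Disallowed while any ACCOUNT-HOLDER tuple still refers to the customer.
    return all(c != customer_no for c, _ in holders)


print(can_insert_holder("12334", "C-12894350"))  # False: no such customer
print(can_insert_customer("An Binh"))            # False: no such branch
print(can_delete_customer("111111"))             # False: still referenced
print(can_insert_customer("ThangLong"))          # True
```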
Deletion can violate a referential constraint when the tuple being deleted is referenced by the
foreign keys of other tuples in a different relation. Several approaches can be used to handle this
kind of violation. The first approach is simply to disallow the deletion. In the second approach,
the user must find the referring tuples and then either delete them manually or change their
foreign key to an acceptable value or to NULL (not possible if the foreign key also forms part of
the primary key, as in the Account-Holder relation). The third approach is to remove all referring
tuples automatically (cascade).
When the referential constraint is specified in the database during the creation phase, the DBMS
allows the user to specify which of the above approaches applies when a violation occurs.
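The sketch below shows how a DBMS lets the designer choose a deletion policy. It uses SQLite through Python's standard sqlite3 module. The table names customer, holder_restrict, and holder_cascade are hypothetical simplifications of CUSTOMER and ACCOUNT-HOLDER. By default SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The clauses ON DELETE RESTRICT and ON DELETE CASCADE select the first and third approaches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
CREATE TABLE customer (customerNumber TEXT PRIMARY KEY, name TEXT);
CREATE TABLE holder_restrict (
    customerNumber TEXT REFERENCES customer ON DELETE RESTRICT,
    accountNumber  TEXT,
    PRIMARY KEY (customerNumber, accountNumber));
CREATE TABLE holder_cascade (
    customerNumber TEXT REFERENCES customer ON DELETE CASCADE,
    accountNumber  TEXT,
    PRIMARY KEY (customerNumber, accountNumber));
INSERT INTO customer VALUES ('111111', 'Anh'), ('121314', 'Van Anh');
INSERT INTO holder_restrict VALUES ('111111', 'C-12894349');
INSERT INTO holder_cascade  VALUES ('121314', 'C-12894350');
""")

# First approach: the deletion is disallowed.
try:
    conn.execute("DELETE FROM customer WHERE customerNumber = '111111'")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

# Third approach: the deletion cascades to the referring tuples.
conn.execute("DELETE FROM customer WHERE customerNumber = '121314'")
remaining = conn.execute("SELECT COUNT(*) FROM holder_cascade").fetchone()[0]
print(restricted, remaining)  # True 0
```

SQLite also accepts ON DELETE SET NULL, which automates the NULL variant of the second approach. That variant cannot be used in a table like ACCOUNT-HOLDER, where the foreign key is part of the primary key.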
Relational operations:
Users (or programs) request data from a relational database by sending it a query that is
written in a special language, usually a dialect of SQL. Although SQL was originally intended
for end-users, it is much more common for SQL queries to be embedded into software that
provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when
generating pages. In response to a query, the database returns a result set, which is just a list of
rows containing the answers. The simplest query is just to return all the rows from a table, but
more often, the rows are filtered in some way to return just the answer wanted.
Often, data from multiple tables are combined into one, by doing a join. Conceptually,
this is done by taking all possible combinations of rows (the Cartesian product), and then
filtering out everything except the answer. In practice, relational database management systems
rewrite ("optimize") queries to perform faster, using a variety of techniques.
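The conceptual definition of a join (Cartesian product, then filtering) can be sketched in a few lines of Python using the bank example. To keep the output short, the BRANCH tuples keep only branchName and Address, and the shared branchName column appears once in the result. Real systems reach the same result with faster strategies such as hash joins or index lookups.

```python
from itertools import product

account = [("HaThanh", 20000, "C-12894349"), ("DongDo", 3500, "S-141510751")]
branch = [("HaThanh", "Hai Ba Trung"), ("DongDo", "Dong Da"), ("ThangLong", "Hoan Kiem")]

pairs = list(product(account, branch))  # every combination: 2 * 3 = 6 pairs
# Keep only the pairs that match on branchName, dropping the duplicate column.
joined = [a + b[1:] for a, b in pairs if a[0] == b[0]]
print(len(pairs), joined)
# 6 [('HaThanh', 20000, 'C-12894349', 'Hai Ba Trung'), ('DongDo', 3500, 'S-141510751', 'Dong Da')]
```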
There are a number of relational operations in addition to join. These include project (the
process of eliminating some of the columns), restrict (the process of eliminating some of the
rows), union (a way of combining two tables with similar structures), difference (which lists the
rows in one table that are not found in the other), intersect (which lists the rows found in both
tables), and product (mentioned above, which combines each row of one table with each row of
the other). Depending on which other sources you consult, there are a number of other operators,
many of which can be defined in terms of those listed above. These include semi-join, outer
operators such as outer join and outer union, and various forms of division. Then there are
operators to rename columns, and summarizing or aggregating operators, and if you permit
relation values as attributes (RVA - relation-valued attribute), then operators such as group and
ungroup. The SELECT statement in SQL serves to handle all of these except for the group and
ungroup operators.
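Several of the basic operators listed above map directly onto Python set operations when a relation is stored as a set of tuples. This is a sketch with hypothetical data. Union, difference, and intersection assume that both operands have the same attributes in the same order.

```python
def project(rel, attributes, keep):
    """Keep only the named columns; duplicates collapse because the result is a set."""
    idx = [attributes.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}


def restrict(rel, predicate):
    """Keep only the rows satisfying predicate."""
    return {t for t in rel if predicate(t)}


attrs = ("Name", "City")
r = {("Anh", "Hanoi"), ("Son", "Hue"), ("Lan", "Hanoi")}
s = {("Son", "Hue"), ("Minh", "Hanoi")}

print(sorted(project(r, attrs, ["City"])))               # [('Hanoi',), ('Hue',)]
print(sorted(restrict(r, lambda t: t[1] == "Hanoi")))  # [('Anh', 'Hanoi'), ('Lan', 'Hanoi')]
print(len(r | s), sorted(r - s), sorted(r & s))        # union, difference, intersect
product_rs = {a + b for a in r for b in s}            # Cartesian product: 3 * 2 tuples
```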
The flexibility of relational databases allows programmers to write queries that were not
anticipated by the database designers. As a result, relational databases can be used by multiple
applications in ways the original designers did not foresee, which is especially important for
databases that might be used for a long time (perhaps several decades). This has made the idea
and implementation of relational databases very popular with businesses.
SQL and the relational model:
SQL, initially pushed as the standard language for relational databases, deviates from the
relational model in several places. The current ISO SQL standard doesn't mention the relational
model or use relational terms or concepts. However, it is possible to create a database
conforming to the relational model using SQL if one does not use certain SQL features.
The following deviations from the relational model have been noted in SQL. Note that
few database servers implement the entire SQL standard and in particular do not allow some of
these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names
within a table or anonymous columns is uncommon.
Duplicate rows:
The same row can appear more than once in an SQL table. The same tuple cannot appear
more than once in a relation.
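This difference is easy to observe in SQLite through Python's sqlite3 module. A table declared without a key accepts the same row twice, and SELECT DISTINCT is needed to restore set semantics in a query result. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolment (student TEXT, course TEXT)")  # no key declared
conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                 [("Anh", "CS101"), ("Anh", "CS101")])  # the same row twice
total = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
distinct = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM enrolment)").fetchone()[0]
print(total, distinct)  # 2 1
```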
Anonymous columns:
A column in an SQL table can be unnamed and thus unable to be referenced in
expressions. The relational model requires every attribute to be named and referenceable.
Duplicate column names:
Two or more columns of the same SQL table can have the same name and therefore
cannot be referenced, on account of the obvious ambiguity. The relational model requires every
attribute to be referenceable.
Column order significance:
The order of columns in an SQL table is defined and significant, one consequence being
that SQL's implementations of Cartesian product and union are both noncommutative. The
relational model requires there to be no significance to any ordering of the attributes of a
relation.
Views without CHECK OPTION:
Updates to a view defined without CHECK OPTION can be accepted but the resulting
update to the database does not necessarily have the expressed effect on its target. For example,
an invocation of INSERT can be accepted but the inserted rows might not all appear in the view,
or an invocation of UPDATE can result in rows disappearing from the view. The relational
model requires updates to a view to have the same effect as if the view were a base relvar.
Columnless tables unrecognized:
SQL requires every table to have at least one column, but there are two relations of
degree zero (of cardinality one and zero) and they are needed to represent extensions of
predicates that contain no free variables.
NULL:
This special mark can appear instead of a value wherever a value can appear in SQL, in
particular in place of a column value in some row. The deviation from the relational model arises
from the fact that the implementation of this ad hoc concept in SQL involves the use of three-
valued logic, under which the comparison of NULL with itself does not yield true but instead
yields the third truth value, unknown; similarly, the comparison of NULL with something other than
itself does not yield false but instead yields unknown. It is because of this behaviour in
comparisons that NULL is described as a mark rather than a value. The relational model depends
on the law of excluded middle under which anything that is not true is false and anything that is
not false is true; it also requires every tuple in a relation body to have a value for every attribute
of that relation. This particular deviation is disputed by some if only because E.F. Codd himself
eventually advocated the use of special marks and a 4-valued logic, but this was based on his
observation that there are two distinct reasons why one might want to use a special mark in place
of a value, which led opponents of the use of such logics to discover more distinct reasons; at
least 19 have been noted, which would require a 21-valued logic.[citation needed]
SQL itself uses NULL for several purposes other than to represent "value unknown". For
example, the sum of the empty set is NULL, meaning zero, the average of the empty set is
NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no
value because there is no matching row in the right-hand operand".
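These behaviours can be observed in SQLite through Python's sqlite3 module, which returns SQL NULL as None. The tables t, a, and b are hypothetical. The queries show that comparisons with NULL yield unknown rather than true or false, that SUM and AVG of an empty set are NULL, and that a LEFT JOIN pads unmatched rows with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (x INTEGER);               -- left empty on purpose
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER, label TEXT);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1, 'one');
""")


def q(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()


print(q("SELECT NULL = NULL, NULL <> 1, NULL IS NULL"))  # [(None, None, 1)]
print(q("SELECT SUM(x), AVG(x), COUNT(x) FROM t"))       # [(None, None, 0)]
print(q("SELECT a.id, b.label FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"))
# [(1, 'one'), (2, None)]
```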