
Chapter 5

SQL The Relational Language
5 SQL The Relational Language
5.1 Introduction
5.2 Tabular Variables in SQL
5.2.1 Creation of Tables
5.3 Referential Integrity in SQL
5.4 Basic Data Types
5.4.1 String Domains
5.4.2 Numeric Domains
5.4.3 Special Domains
5.4.4 Basic Domains Supported by ORACLE
5.5 SELECT Phrases
5.6 The WHERE Option
5.7 Union, Intersection, and Difference in SQL
5.8 Table Product in SQL
5.9 Join in SQL
5.10 Sets and subqueries
5.11 Parametrized subqueries
5.12 Subqueries and division
5.13 Relational Completeness of SQL
5.14 Scalar Functions of SQL
5.14.1 Numerical Functions
5.14.2 String Functions
5.14.3 Date functions
5.15 Aggregate Functions in SQL
5.16 Sorting Results
5.17 The Group-by Option
5.17.1 The decode and case Functions

5.17.2 The rollup and cube Extensions of group by
5.18 Analytical Capabilities of SQL Plus
5.18.1 Ranking Functions
5.18.2 Top-n Queries
5.18.3 Windowing functions in SQL Plus
5.19 Statistics in SQL
5.19.1 Variance and Correlation
5.19.2 Linear Regression
5.20 Graphs and SQL in SQL Plus
5.21 Updates
5.22 Access Rights
5.23 Views in SQL
5.24 Accessing metadata in SQLPlus
5.25 Exercises
5.26 Bibliographical Comments

5.1 Introduction

SQL is an acronym for Structured Query Language and is the name of the most
important tool for defining and manipulating relational databases. The development of SQL began in the mid-1970s at the IBM San Jose Research Laboratory.
The success of an experimental IBM database system (known as System R) that
incorporated SQL compelled a number of software manufacturers to join IBM
in developing relational database systems that incorporated SQL. In 1982, the
American National Standards Institute (ANSI) initiated the development of a
standard for a query language for relational database systems and opted for SQL
as its prototype. The resulting ANSI standard, issued in 1986, was adopted as
an International Standard by the International Organization for Standardization
(ISO) in 1987.
In the late 1980s, embedded SQL was standardized by ANSI, and work on
expanding SQL continues. A much extended version of the original standard,
known as SQL92, was adopted by ISO/IEC at the end of 1992. To reflect current trends in the database field towards object-relational technology, a new
standard ISO/IEC 9075-1, known as SQL99, was published in July 1999. As
we shall see, SQL99 is a superset of SQL92. New features incorporated by this
standard include object-relational extensions (user-defined data types, reference
types, collections, large object support, table hierarchies), active database features (triggers), stored procedures and functions, on-line analytic processing
extensions, etc. More recently, in 2003, a new standard was issued. This new
edition of the standard includes a new chapter that deals with the interaction
between SQL and XML (which we discuss in Chapter 10), corrections to SQL99,
and several new features.
Our presentation concentrates initially on common SQL features, applicable
to a wide range of SQL implementations.


SQL is a nonprocedural language. This means that a query formulated in
SQL need not specify how a problem is to be solved nor how data should be
accessed by the computing system; instead, an SQL query states only what
data are sought.
This leaves the user free to focus on the logic of the query. Because the
DBMS makes use of its internal knowledge, in most cases, the DBMS generates
retrieval procedures that are faster than equivalent retrieval procedures built
directly by the user.
The SQL language consists of three components: the data definition language (DDL), the data manipulation language (DML), and the data control
language (DCL). The first component allows the user to define the structure of
the tables of the database. The second contains retrieval and update directives.
The last component allows the database administrator to define the access rights
to the database for various categories of users.
SQL syntax is format-free: tabs, carriage returns, and spaces can be included
anywhere a space occurs in the definition of an SQL construct. Also, case is
insignificant in table names, reserved words and keywords. However, case is
significant in character string literals.
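The case rules can be checked with a small experiment. The sketch below is only an illustration: it assumes SQLite, accessed through Python's standard sqlite3 module, instead of ORACLE, and the table T with its single row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (city varchar(25))")
conn.execute("insert into T values ('Boston')")

# Reserved words, table names and column names may be written in any case.
upper = conn.execute("SELECT city FROM t").fetchall()
lower = conn.execute("select CITY from T").fetchall()

# Case is significant inside character string literals.
exact = conn.execute("select city from T where city = 'Boston'").fetchall()
other = conn.execute("select city from T where city = 'BOSTON'").fetchall()

print(upper == lower)
print(exact, other)
```

The first two queries return the same row, while the literal 'BOSTON' matches nothing because it differs in case from the stored string.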

5.2 Tabular Variables in SQL

When we introduced tables in Chapter 3, we assumed that the content of a table
is a relation, that is, a set of tuples. To conform to the reality of databases,
we need to define the content of a table as a sequence of tuples. Thus, a table
may contain several copies of the same tuple. If a table is allowed to contain
duplicates, then even if we know all components of a tuple, we may be unable
to identify the corresponding row in the table uniquely. As a consequence, not
every table has a key.
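The effect of allowing duplicates can be seen in a minimal sketch. It assumes SQLite through Python's sqlite3 module in place of ORACLE, and the table VISITS and its rows are hypothetical; since no key is declared, the same tuple can be stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key is declared, so nothing prevents repeated rows.
conn.execute("create table VISITS (name varchar(35), city varchar(25))")
for _ in range(2):
    conn.execute("insert into VISITS values ('Ann Richards', 'Natick')")

rows = conn.execute("select name, city from VISITS").fetchall()
print(rows)
```

Both rows have identical components, so knowing all components of a tuple does not single out one row of this table.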
In this section we present a topic that we refer to informally as table creation. In reality, we create an object similar to a variable in a programming
language that we call a tabular variable. The values of a tabular variable are
tables and these values change in time. Tabular variables are created using the
construction create table.
Example 5.2.1 To create a tabular variable called PATRONS having the heading
name addr city zip telno date_of_birth
we write:
create table PATRONS (name varchar(35) not null,
addr varchar(50),
city varchar(25),
zip char(9),
telno char(12),
date_of_birth date);

As we shall see, each attribute is followed by a description of its domain. The
effect of this command is to create a tabular variable whose initial value is a
table whose content is the empty set of tuples:
                              PATRONS
name          addr           city        zip    telno         date_of_birth
------------  -------------  ----------  -----  ------------  -------------

After inserting a first row, the next value of the tabular variable PATRONS
is the table:
                              PATRONS
name          addr           city        zip    telno         date_of_birth
------------  -------------  ----------  -----  ------------  -------------
Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78

A second insertion yields a new table as the value for the tabular variable:
                              PATRONS
name          addr           city        zip    telno         date_of_birth
------------  -------------  ----------  -----  ------------  -------------
Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78
Ron Scott     50 Cider Hill  Framingham  01160  608-663-0211  11/4/80

If the first patron moves to a new address, the first row is modified and the
tabular variable assumes a third value:
                              PATRONS
name          addr           city        zip    telno         date_of_birth
------------  -------------  ----------  -----  ------------  -------------
Ann Richards  77 Lake St.    Milton      02186  617-364-0606  02/15/78
Ron Scott     50 Cider Hill  Framingham  01160  608-663-0211  11/4/80

The values that the tabular variable PATRONS may assume are the actual
tables that have the name and the heading specified at the creation of the
tabular variable. In addition, we can specify several types of constraints that
any value of the tabular variable must satisfy.
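The successive values of the tabular variable PATRONS can be traced with a short sketch. It assumes SQLite through Python's sqlite3 module in place of ORACLE; the helper current_value is hypothetical, and the date is written as an ISO string because SQLite has no separate date domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table PATRONS (name varchar(35) not null,
                                      addr varchar(50),
                                      city varchar(25),
                                      zip char(9),
                                      telno char(12),
                                      date_of_birth date)""")

def current_value():
    """Return part of the table that is the present value of PATRONS."""
    return conn.execute("select name, addr, city from PATRONS").fetchall()

history = [current_value()]            # initial value: the empty table
conn.execute("insert into PATRONS values "
             "('Ann Richards', '56 Green Ln', 'Natick', '02170', "
             "'508-561-0987', '1978-02-15')")
history.append(current_value())        # value after the first insertion
conn.execute("update PATRONS set addr = '77 Lake St.', city = 'Milton' "
             "where name = 'Ann Richards'")
history.append(current_value())        # value after the address change

for value in history:
    print(value)
```

The name and heading of PATRONS stay fixed throughout; only the table that is its value changes.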
Before it is possible to create tabular variables and form queries, it is necessary to create an empty database in which to work. In practice, this is generally
done at the level of the operating system, usually with a command that is provided by the vendor of the DBMS.
To start, we assume that we have created an empty database. In this section
we begin to discuss a part of the data definition component of SQL, namely, the
creation of tabular variables, or informally, the creation of database tables.

5.2.1 Table Creation

We refer to the components of the Data Definition Language (DDL) as directives.
The SQL directive for adding tables to a database is create table.
At a minimum, as we saw in Example 5.2.1, creating a tabular variable
in SQL requires that we specify its name and its attributes along with their
domains. The syntax for this is:
create table table_name
    [(attr_def {, attr_def})],
where the attribute definition attr_def has the syntax:
    attribute_name domain

In a slightly more general form (one that ignores certain details related to the
physical design of databases), the create table directive has the following syntax:
create table [schema.]table_name
    [(⟨attr_def | table_constraint | table_ref_clause⟩
    {, ⟨attr_def | table_constraint | table_ref_clause⟩})],
where the attribute definition attr_def has the syntax:
    attribute_name domain [default expr] [column_ref_clause] {column_constraint}
As a result of the execution of this directive, an initial amount of space
is reserved in secondary memory to accommodate future values of the tabular
variable, and the metadata are modified to reflect the addition of the new tabular
variable. Specialized SQL constructions, discussed later (insert, delete, and
update) can be used to modify the value of this variable.
Creation of tabular variables permits placing restrictions, called constraints
on the contents of any value that the tabular variable may assume. The constraints that follow have a global character (which means that they apply to
the contents of a table in its entirety) and apply to any value that the tabular
variable may assume.
Definition 5.2.2 A primary key constraint has the form
[constraint constraint_name] primary key(list_of_attributes)
when the primary key consists of the attributes of the list.
Alternate keys of tables can be specified using unique constraints. The syntax
of this type of constraint is:
[constraint constraint_name] unique(list_of_attributes)
This indicates that no two rows of a table that is a value of the tabular variable
may have the same values for the attributes specified in the list.
A constraint that involves a condition C, where C is a Boolean
combination of conditions involving only components of tuples and constants, is
denoted by:
[constraint constraint_name] check(C)
When a constraint involves more than one attribute it is considered a table
constraint ; otherwise, it is a column constraint. Referential integrity can be
imposed by using the column constraint references in the definition of an
attribute. To prevent certain components of tuples from assuming a null value
we can impose the column constraint not null.
Example 5.2.3 To create the tabular variable INSTRUCTORS of the college
database we use the following create table directive:
create table INSTRUCTORS(empno varchar(11) not null,
name varchar(35),
rank varchar(25),
roomno integer,
telno varchar(4), primary key(empno));

The domain of empno is defined to be the set of strings of length at most
11. In addition, we have the column constraint not null, which means that
null cannot be used as a value of the attribute empno. The domains of the
other attributes have similar, obvious definitions that are discussed below. Note
that in the definition of INSTRUCTORS we impose a table constraint, namely
primary key(empno).
Similarly, the tabular variables STUDENTS and COURSES are created by:
create table STUDENTS(stno varchar2(10) not null,
name varchar2(35) not null,
addr varchar2(35),
city varchar2(20),
state varchar2(2),
zip varchar2(10), primary key(stno));
create table COURSES(cno varchar2(5) not null,
cname varchar2(30),
cr smallint, primary key(cno));

A script that creates all tabular variables of the college database is contained
in Appendix A.
Example 5.2.4 To express that the primary key of the table GRADES consists
of the attributes stno cno sem year we can say that this table satisfies the primary
key constraint:
constraint pkg primary key (stno, cno, sem, year)

Example 5.2.5 For the table EMPHIST, introduced in Example 3.3.5 we could
introduce the tuple conditions:
constraint pos_sal check(salary > 0)

and
constraint suf_sal check(position != 'Programmer' or salary > 65000),

respectively. They express that the salary must be a positive number and that
somebody who is a programmer must be paid more than 65000 dollars.
Thus, the creation of the table EMPHIST can be achieved by:
create table EMPHIST(empno integer not null references PERSINFO(empno),
position varchar2(30),
dept varchar2(20),
appt_date date,
term_date date,
salary float,
check(position != 'Programmer' or salary > 65000),
constraint pos_sal check(salary > 0));

A script that creates the tables PERSINFO, EMPHIST, and REPORTING is contained in Appendix C.
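How such constraints act on insertions can be sketched as follows. The sketch assumes SQLite through Python's sqlite3 module in place of ORACLE, keeps only three attributes of EMPHIST, omits the reference to PERSINFO, and uses hypothetical rows; the helper try_insert is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table EMPHIST (
                    empno integer not null,
                    position varchar(30),
                    salary float,
                    constraint pos_sal check (salary > 0),
                    constraint suf_sal check
                        (position != 'Programmer' or salary > 65000))""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("insert into EMPHIST values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, "Programmer", 70000)),   # satisfies both checks
    try_insert((2, "Programmer", 50000)),   # violates suf_sal
    try_insert((3, "Clerk", -10)),          # violates pos_sal
    try_insert((None, "Clerk", 30000)),     # violates not null
    try_insert((4, "Clerk", 30000)),        # accepted
]
print(results)
```

Only the first and last rows become part of the value of EMPHIST; the other three are rejected before the table changes.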


Example 5.2.6 In the directives enclosed below we state that stno is both a
foreign key for ADVISING and, also, its primary key. In addition, empno is a
foreign key for this table (being the primary key for the table INSTRUCTORS).
create table ADVISING(stno varchar2(10) not null
references STUDENTS(stno),
empno varchar2(11)
references INSTRUCTORS(empno),
primary key(stno));
create table GRADES(stno varchar2(10)
not null references STUDENTS(stno),
empno varchar2(11)
not null references INSTRUCTORS(empno),
cno varchar2(5)
not null references COURSES(cno),
sem varchar2(6) not null,
year smallint not null,
grade integer,
primary key(stno,cno,sem,year),
check (grade <= 100));

The definition of the tabular variable GRADES specifies referential integrity constraints for each of the attributes stno, empno, cno. In addition, this designates
the set of attributes stno, cno, sem, year as the primary key of GRADES and, also,
imposes the constraint grade <= 100.
To remove the tabular variable T we use the construct
drop table T
Rows can be inserted in a table individually, as we show below, or as they
are produced by a select phrase (as we shall see later). To insert a row in a
table T whose heading is A1 . . . An we write in SQL a directive of the form:
insert into T (A1 , . . . , An )
values (a1 , . . . , an );
For example, to insert the row
('1011','Edwards P. David','10 Red Rd.','Newton','MA','02159')
into the table STUDENTS we write:
insert into STUDENTS(stno,name,addr,city,state,zip)
values ('1011','Edwards P. David','10 Red Rd.','Newton','MA','02159');

It is possible to insert tuples in the database starting from text files by using
a special utility of ORACLE known as the SQL*Loader. Details are provided
in Appendix D.
To delete a row specified by a certain condition we can use the construct
delete. For example, to remove the row of the table STUDENTS that corresponds to the student having student number 1011 we write:
delete from STUDENTS
where stno = '1011';
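The insertion and the deletion above can be tried together in a small sketch. It assumes SQLite through Python's sqlite3 module in place of ORACLE (so varchar replaces varchar2); the row is the one used in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table STUDENTS (stno varchar(10) not null,
                                       name varchar(35) not null,
                                       addr varchar(35),
                                       city varchar(20),
                                       state varchar(2),
                                       zip varchar(10),
                                       primary key (stno))""")
conn.execute("""insert into STUDENTS (stno, name, addr, city, state, zip)
                values ('1011', 'Edwards P. David', '10 Red Rd.',
                        'Newton', 'MA', '02159')""")
after_insert = conn.execute("select stno, zip from STUDENTS").fetchall()

conn.execute("delete from STUDENTS where stno = '1011'")
after_delete = conn.execute("select count(*) from STUDENTS").fetchone()[0]
print(after_insert, after_delete)
```

Writing the zip code as the string literal '02159' keeps its leading zero, which would be lost if the value were entered as a number.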


If you wish to examine the headings of the tables you created you can issue,
for example, the SQL Plus directive
describe INSTRUCTORS;

Then, SQL Plus will print:
Name                       Null?    Type
-------------------------- -------- ------------
EMPNO                      NOT NULL VARCHAR2(11)
NAME                                VARCHAR2(35)
RANK                                VARCHAR2(25)
ROOMNO                              NUMBER(38)
TELNO                               VARCHAR2(4)

The directive alter table is used for modifying the structure of an existing
table. Columns may be added or dropped, the names of the columns or their
data types can be modified, etc. A simplified syntax of this directive is:
alter table table_name modification_specification
In turn, the modification specification depends on the particular change we need
to impose on the table. Examples of such modification specifications include
add column_name column_type,
drop column column_name,
modify column_name column_type,
rename column column_name to new_column_name,
as well as many other choices.
Example 5.2.7 To add a new year column to the table ADVISING we use the
directive:
alter table advising add year varchar2(4);

The entries of the new column year are initially null.
Column types can be modified using the modify option. For instance, to
increase the maximum length of the values of stno to 12 characters we write:
alter table advising modify stno varchar(12);

Column renaming is executed using the option rename column. Below we
rename the column stno to studentno:
alter table advising rename column stno to studentno;

Finally, to drop the column year that we just added we write:
alter table advising drop column year;
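Two of these modifications can be observed in a short sketch. It assumes SQLite through Python's sqlite3 module in place of ORACLE; SQLite has no modify option, and its rename column option requires SQLite 3.25 or later, so only add and rename column are shown. The row and the helper columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ADVISING (stno varchar(10) not null, empno varchar(11))")
conn.execute("insert into ADVISING values ('1011', '019')")

def columns():
    """List the column names of ADVISING from SQLite's metadata."""
    return [row[1] for row in conn.execute("pragma table_info(ADVISING)")]

before = columns()
conn.execute("alter table ADVISING add year varchar(4)")
conn.execute("alter table ADVISING rename column stno to studentno")
after = columns()

# Entries of the newly added column are null (None in Python).
new_entries = conn.execute("select year from ADVISING").fetchall()
print(before, after, new_entries)
```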

5.3 Referential Integrity in SQL

We saw that referential integrity can be imposed in SQL using the column
constraint references. An alternative method is to impose the table constraint
foreign key. Its syntax is:

foreign key(attribute_name {, attribute_name})
references table_name(attribute_name {, attribute_name})
[on delete cascade]
The foreign key construction contains the option on delete cascade. The
role of this option is to define the behavior of the tables when deletions occur
in the table where the primary key occurs. Namely, when a row is removed
from the table containing the primary key and the clause on delete cascade is
specified, then all rows from the table that contains the corresponding foreign
key that match the removed row are also removed.
Example 5.3.1 Suppose that the tabular variable CITIES is created by:
create table CITIES (city varchar(40),
state char(2),
primary key (city,state));

A second tabular variable, STORES, records the stores that a retailer has in
the covered territory, and is created by
create table STORES (storeno integer not null,
addr varchar(40) not null,
city varchar(40),
state char(2),
tel char(12),
primary key(storeno),
foreign key(city,state) references CITIES(city,state)
on delete cascade);

To populate the tables we execute the following directives:
insert into CITIES(city, state) values('Boston','MA');
insert into CITIES(city, state) values('Springfield','MA');
insert into CITIES(city, state) values('Providence','RI');
insert into CITIES(city, state) values('Hartford','CT');
insert into CITIES(city, state) values('Bayonne','NJ');

insert into STORES(storeno, addr, city, state, tel)
values(1,'125 Harvard St.','Boston','MA','617-287-0991');
insert into STORES(storeno, addr, city, state, tel)
values(2,'50 Storrow Drive','Boston','MA','617-566-7629');
insert into STORES(storeno, addr, city, state, tel)
values(3,'85 Manton Av.','Providence','RI','401-453-1234');
insert into STORES(storeno, addr, city, state, tel)
values(4,'40 West Street','Hartford','CT','860-232-4484');
insert into STORES(storeno, addr, city, state, tel)
values(5,'5 Finley Av.','Bayonne','NJ','908-221-0094');
insert into STORES(storeno, addr, city, state, tel)
values(6,'10 Linton Plaza','Hartford','CT','860-660-2220');
insert into STORES(storeno, addr, city, state, tel)
values(7,'30 Stilson Rd.','Providence','RI','401-861-5249');

The values of the tabular variables CITIES and STORES are
CITY            ST
--------------- --
Boston          MA
Springfield     MA
Providence      RI
Hartford        CT
Bayonne         NJ

and
STORENO ADDR             CITY            ST TEL
------- ---------------- --------------- -- ------------
      1 125 Harvard St.  Boston          MA 617-287-0991
      2 50 Storrow Drive Boston          MA 617-566-7629
      3 85 Manton Av.    Providence      RI 401-453-1234
      4 40 West Street   Hartford        CT 860-232-4484
      5 5 Finley Av.     Bayonne         NJ 908-221-0094
      6 10 Linton Plaza  Hartford        CT 860-660-2220
      7 30 Stilson Rd.   Providence      RI 401-861-5249

Since the referential integrity was imposed between the tabular variables
CITIES and STORES we need to insert the tuples of CITIES before we can insert
the tuples of STORES. Otherwise, the cities mentioned in the values of STORES
cannot reference a city in a value of CITIES and the insertion in STORES will
be rejected.
The presence of on delete cascade means that if a row is removed from
the table CITIES, then the rows of STORES corresponding to that city are also removed. For
example, if the company closes its business in Hartford and we execute
delete from CITIES where
city = 'Hartford' and state = 'CT';

then the rows of STORES corresponding to the stores in Hartford, CT will be
deleted automatically.
Removal of the tabular variables is also constrained by the referential integrity. It would be impossible to remove the tabular variable CITIES before we
remove the table STORES because STORES references CITIES. Thus, the correct order of removal is
drop table STORES;
drop table CITIES;

If the clause on delete cascade is absent, then the deletion of a row from
CITIES is impossible unless we delete first the rows of STORES that correspond
to the city that is removed from CITIES.
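The behavior described in this section can be reproduced in a short sketch. It assumes SQLite through Python's sqlite3 module in place of ORACLE; SQLite enforces foreign keys only after the pragma shown below, which is specific to SQLite. Only a few of the rows of the example are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite-specific: enable enforcement
conn.execute("""create table CITIES (city varchar(40),
                                     state char(2),
                                     primary key (city, state))""")
conn.execute("""create table STORES (storeno integer not null,
                                     addr varchar(40) not null,
                                     city varchar(40),
                                     state char(2),
                                     tel char(12),
                                     primary key (storeno),
                                     foreign key (city, state)
                                         references CITIES (city, state)
                                         on delete cascade)""")

# A store cannot reference a city that is not yet in CITIES.
try:
    conn.execute("insert into STORES values "
                 "(4, '40 West Street', 'Hartford', 'CT', '860-232-4484')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

conn.executemany("insert into CITIES values (?, ?)",
                 [("Boston", "MA"), ("Hartford", "CT")])
conn.executemany("insert into STORES values (?, ?, ?, ?, ?)",
                 [(1, "125 Harvard St.", "Boston", "MA", "617-287-0991"),
                  (4, "40 West Street", "Hartford", "CT", "860-232-4484"),
                  (6, "10 Linton Plaza", "Hartford", "CT", "860-660-2220")])

# Deleting the city removes its stores as well.
conn.execute("delete from CITIES where city = 'Hartford' and state = 'CT'")
remaining = conn.execute("select storeno from STORES order by storeno").fetchall()
print(rejected, remaining)
```

The first insertion is rejected because Hartford is not yet in CITIES; after the deletion only the Boston store remains.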

5.4 Basic Data Types

SQL makes use of a collection of domains that, in general, varies from one
implementation to another. Not all domains of the standard exist in every
implementation, and not all domains of implementations exist in the standard.
Basic domains supported by virtually all implementations of SQL can be
classified as string domains, numerical domains, and special domains.

5.4.1 String Domains

String domains represent fixed-length or variable-length sets of sequences of
characters. In this category, we have char(n), which represents the set of strings
of characters (from a given basic set of characters) that have fixed length n. Similarly, varchar(n) represents the set of variable-length strings whose maximal
length is n for n > 0.

5.4.2 Numeric Domains

The SQL standard prescribes two kinds of numeric domains: exact numeric data
types: numeric, decimal, integer and smallint, and approximate numeric
data types: float, double precision, and real. Their respective syntax is:
numeric [(p[, s])]
decimal [(p[, s])]
integer
smallint
float [(p)]
double precision
real
Here, p stands for precision and s stands for scale (both of which are nonnegative integers). The precision parameter refers to the total number of digits,
while the scale indicates the number of digits to the right of the decimal point.
The difference between numeric and decimal is that in the latter case an
implementation may provide a precision larger than p, so p is the minimum
number of digits guaranteed, while in the former case, p is
the exact total number of digits.
The domains smallint and integer have a number of digits dependent on
the implementation; however, the precision of integer is required to be equal
to or larger than the precision of smallint.
The float domain includes approximate representations of real numbers having precision at least p. Also, real and double precision have implementation-dependent precision, where the precision of double precision is never smaller
than that of real.

5.4.3 Special Domains

Specific DBMSs have their own domains. For instance, ORACLE has the long
domain that contains strings of characters of variable length that may be as
large as 65,535 characters.
To allow us to begin working with actual examples as quickly as possible, we
introduce some basic domains for ORACLE. Other databases are quite similar,
and the reader can obtain the relevant details by consulting product-specific
manuals.

5.4.4 Basic Domains Supported by ORACLE

We review briefly a few of the more important domains supported by ORACLE:
In ORACLE, char[(n)] represents variable strings of characters of length
n, where 1 ≤ n ≤ 32767; the default value of n is 1. The domain character is the same as char. The characters and their order are determined
by the system during the installation of the DBMS.
The domain varchar(n) requires n to be specified and also represents
variable-length strings of characters. It is the intention of ORACLE to
separate char(n) from varchar(n) in future releases: char(n) will represent fixed-length strings while varchar(n) will represent variable-length
strings.
The varchar2 data type stores variable-length character strings and is
currently synonymous with the varchar data type. However, in a future
version of Oracle, varchar might store variable-length character strings
compared with different comparison semantics. Currently there are two
types of comparison semantics for strings in Oracle: blank-padded comparison semantics and non-padded comparison semantics.
When blank-padded comparison semantics is used, if the two values have
different lengths, Oracle first adds blanks to the end of the shorter one
so their lengths are equal. Oracle then compares the values character
by character up to the first character that differs. The value with the
greater character in the first differing position is considered greater. If
two values have no differing characters, then they are considered equal.
This rule means that two values are equal if they differ only in the number
of trailing blanks. Oracle uses blank-padded comparison semantics only
when both values in the comparison are either expressions of data type
char, text literals, or values returned by the user function.
In the case of non-padded comparison semantics two values are compared
character by character up to the first character that differs. The value with
the greater character in that position is considered greater. If two values
of different length are identical up to the end of the shorter one, the longer
value is considered greater. If two values of equal length have no differing
characters, then the values are considered equal. Oracle uses non-padded
comparison semantics when one or both values in the comparison have the
data type varchar or varchar2.
In either of the two comparison semantics we have 'ab' > 'aa' and
'ab' > 'a '. However, in the blank-padded comparison semantics we
have 'a ' = 'a', while in the non-padded semantics we have 'a ' > 'a'.
The domain date represents dates in the format dd-mmm-yy.


The domain long (also denoted by long varchar) represents variable-length strings of characters with no more than 65,535 characters. At most
one attribute may have this domain in any table.
The number domain in ORACLE can be used in several forms as specified
by the following syntax:
number [(p[, s])],
where p is the precision and s is the scale.
The maximum precision of number is 38. The scale can vary between
-84 and 127. If the scale is negative, the number is rounded to the
specified number of places to the left of the decimal point.
The following cases may occur when we insert a value in a column whose
domain is number:
Data           Domain         Stored as
1,234,567.89   number         1234567.89
1,234,567.89   number(9)      1234568
1,234,567.89   number(9,2)    1234567.89
1,234,567.89   number(9,1)    1234567.9
1,234,567.8    number(6)      error: exceeds precision
1,234,567.89   number(10,1)   1234567.9
1,234,567.89   number(7,-2)   1234600
1,234,567.89   number(7,2)    error: exceeds precision
If s > p, then s specifies the maximum number of valid digits after the
decimal point. For instance, number(4,5) requires a zero as the first digit after
the decimal point and rounds the digits after the fifth decimal digit. The
number 0.012358 is stored as 0.01236.
Numbers may also be entered in exponential form, that is, including
an exponent preceded by E. For example, 1234567 can be represented as
1.234567E+6, that is, as 1.234567 × 10^6.
Floating point domains are supported as float, float(*), and float(b),
where b is the binary precision, that is, the number of significant binary
digits. The domains float and float(*) are equivalent, and they consist
of floating point numbers that can be represented by 126 binary digits (or,
equivalently, by about 38 decimal digits).
To provide compatibility with other systems, ORACLE supports such
domains as decimal, integer, smallint, real, and double precision.
However, their internal representation is defined by the format of the
number domain.
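The rounding and precision rules of number(p, s) described above can be mimicked with Python's decimal module. The function store_number is a hypothetical model, not ORACLE code: it assumes that values are rounded half away from zero at scale s and that a value is rejected when it needs more than p - s digits in front of that position. The unconstrained form number, without p and s, is not modeled.

```python
from decimal import Decimal, ROUND_HALF_UP

def store_number(text, p, s=0):
    """Mimic storing a value in a number(p, s) column (illustrative model).

    Rounds to s digits after the decimal point (to the left of it when s is
    negative) and raises ValueError when the rounded value needs more than
    p significant digits at that scale.
    """
    value = Decimal(text.replace(",", ""))
    rounded = value.quantize(Decimal(1).scaleb(-s), rounding=ROUND_HALF_UP)
    if abs(rounded) >= Decimal(10) ** (p - s):
        raise ValueError("exceeds precision")
    return rounded

for p, s in [(9, 0), (9, 2), (9, 1), (6, 0), (10, 1), (7, -2), (7, 2)]:
    try:
        stored = format(store_number("1,234,567.89", p, s), "f")
    except ValueError as err:
        stored = f"error: {err}"
    print(f"number({p},{s}) -> {stored}")

print(store_number("0.012358", 4, 5))
```

Under these assumptions number(4,5) accepts 0.012358 but would reject 0.12, because the first digit after the decimal point must be zero.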

5.5 SELECT Phrases

Queries must be written based on the names and headings of the tabular variables and not on the tables that represent their values at any given moment.
This is similar to writing programs. A program should work for all legal inputs
and not just the ones on which it was tested. In both cases, it is important to
focus on the abstract structure and not on specific examples. The way we write
SQL constructs must be directed only by the logic of the query and not by the
content of a particular database instance. Just because the query generated the
right answer for a particular instance of the database does not mean that it is
correct.
The main retrieval construction is the select phrase. Consider a query that
we solved previously using relational algebra. Recall that in Example 4.1.25 we
found the names of all instructors who have taught any student who lives in
Brookline. The solution involved using product, selection, and projection:
T1 := (STUDENTS × GRADES × INSTRUCTORS)
T2 := T1 where STUDENTS.stno = GRADES.stno and
      GRADES.empno = INSTRUCTORS.empno and
      STUDENTS.city = 'Brookline'
ANS := T2[INSTRUCTORS.name].
In SQL the same problem can be resolved using a single select phrase as in:
select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
where STUDENTS.stno = GRADES.stno and
GRADES.empno = INSTRUCTORS.empno and
STUDENTS.city = 'Brookline';

We can conceptualize the execution of this typical select using the operations of relational algebra as follows:
1. The execution begins by performing the product of the tables listed after
the reserved word from. In our case, this involves computing the product
STUDENTS × GRADES × INSTRUCTORS
2. The selection specified after the reserved word where is executed next,
if the where part is present (we shall see that this may or may not be
present in a select.) In our case, this amounts to retaining that part of
the table product that satisfies the condition:
STUDENTS.stno = GRADES.stno and GRADES.empno = INSTRUCTORS.empno
and STUDENTS.city = 'Brookline'

3. Finally, the result of the second phase is projected on the attributes
listed between select and from, that is, in our case, on the attribute
INSTRUCTORS.name.
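The three conceptual steps above can be carried out by hand on a tiny instance and compared with the answer of the select phrase itself. The sketch assumes SQLite through Python's sqlite3 module in place of ORACLE; the rows are hypothetical, and each table is reduced to the attributes the query uses.

```python
import sqlite3
from itertools import product

# Hypothetical sample rows.
students = [("s1", "Student A", "Newton"),        # (stno, name, city)
            ("s2", "Student B", "Brookline")]
grades = [("s1", "e1"), ("s2", "e2")]             # (stno, empno)
instructors = [("e1", "Instructor X"),            # (empno, name)
               ("e2", "Instructor Y")]

# Step 1: product of the tables listed after from.
t1 = list(product(students, grades, instructors))
# Step 2: selection specified after where.
t2 = [(s, g, i) for s, g, i in t1
      if s[0] == g[0] and g[1] == i[0] and s[2] == "Brookline"]
# Step 3: projection on the attributes listed between select and from.
ans = [(i[1],) for s, g, i in t2]

# The same query as a single select phrase.
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (stno, name, city)")
conn.execute("create table GRADES (stno, empno)")
conn.execute("create table INSTRUCTORS (empno, name)")
conn.executemany("insert into STUDENTS values (?, ?, ?)", students)
conn.executemany("insert into GRADES values (?, ?)", grades)
conn.executemany("insert into INSTRUCTORS values (?, ?)", instructors)
sql_ans = conn.execute("""select INSTRUCTORS.name
                          from STUDENTS, GRADES, INSTRUCTORS
                          where STUDENTS.stno = GRADES.stno and
                                GRADES.empno = INSTRUCTORS.empno and
                                STUDENTS.city = 'Brookline'""").fetchall()
print(len(t1), ans, sql_ans)
```

This describes only the meaning of the select phrase; a DBMS is free to compute the same answer without building the full product.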
We use a string constant (also known as a literal) in the above select, namely
'Brookline'. String constants must begin and end with a single quote.
SQL is not case-sensitive. This means that you may or may not use capital
letters in any place in an SQL construction (except for string comparisons)
without any effect on the value returned by the query.
As we mentioned above, the where part of a select (also known as the where
clause) is optional. This allows us to compute table projections in SQL as we
show next.


Example 5.5.1 In Example 4.1.16, we obtained a list of instructors' names and
the room numbers of their offices by projecting the table INSTRUCTORS on
name roomno.
In SQL this can be done by writing
select name, roomno from INSTRUCTORS;

The select construct used above requires the table name for the table involved in the retrieval and the list of attributes that we need to extract.
In general, if we need to compute the projection of a table T on a set of
attributes A1 . . . An of the heading of T , we use the construct:
select A1 , . . . , An from T ;
Example 5.5.2 To find out the states where the students originate we project
the table STUDENTS on the attribute state. This is done by
select state from STUDENTS;

The system returns the result:
ST
--
MA
MA
MA
MA
NH
MA
MA
MA
RI

The value MA is repeated 7 times because there are seven students who live
in Massachusetts.
Duplicate values can be eliminated from a query by using the option distinct
as in
select distinct state from STUDENTS;

This will yield the answer:
ST
--
MA
NH
RI

where duplicate values have been dropped.
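The effect of distinct can be checked on the state column listed above. This is a small sketch, assuming Python's sqlite3 module; the student numbers are stand-in values 0 to 8, since only the state column matters here.

```python
import sqlite3

# The nine state values from the result shown above.
states = ["MA", "MA", "MA", "MA", "NH", "MA", "MA", "MA", "RI"]

con = sqlite3.connect(":memory:")
con.execute("create table STUDENTS(stno integer, state text)")
# enumerate supplies stand-in student numbers 0..8.
con.executemany("insert into STUDENTS values (?, ?)", enumerate(states))

all_states = [r[0] for r in con.execute("select state from STUDENTS")]
distinct_states = sorted(
    r[0] for r in con.execute("select distinct state from STUDENTS"))
print(len(all_states), all_states.count("MA"), distinct_states)
```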

5.6 The WHERE Option

The where clause allows us to extract tuples that satisfy certain conditions; in
other words, using the where clause we can perform selections.
Example 5.6.1 To find students who live in Boston we write:
select stno, name, addr, city, state, zip
from STUDENTS
where city = 'Boston';

This select will return the result:
STNO NAME           ADDR            CITY   ST ZIP
---- -------------- --------------- ------ -- -----
2890 McLane Sandy   30 Cass Rd.     Boston MA 02122
4022 Prior Lorraine 8 Beacon St.    Boston MA 02125
5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115

If we want to extract all columns of a table instance, we can use the wildcard character, *, instead of listing all columns. Thus, we can write the equivalent select:
select * from STUDENTS
where city = 'Boston';

Here the symbol * replaces the full attribute list.


Starting from simple conditions (which we called atomic conditions in Chapter 4) we can write queries involving more complicated conditions built by using
and, or, and not.
Example 5.6.2 In Example 4.1.14 we retrieved the students who live in Boston
or Brookline. In SQL this can be done by:
select * from STUDENTS
where city = 'Boston' or city = 'Brookline';

This yields the result:
STNO NAME           ADDR            CITY      ST ZIP
---- -------------- --------------- --------- -- -----
2661 Mixon Leatha   100 School St.  Brookline MA 02146
2890 McLane Sandy   30 Cass Rd.     Boston    MA 02122
3566 Pierce Richard 70 Park St.     Brookline MA 02146
4022 Prior Lorraine 8 Beacon St.    Boston    MA 02125
5544 Rawlings Jerry 15 Pleasant Dr. Boston    MA 02115

Example 5.6.3 To retrieve the grade records obtained in cs110 during the
Spring of 2000 we can write in SQL:
select * from GRADES
where cno = 'cs110' and sem = 'SPRING'
and year = 2000;

This returns the result:
STNO EMPNO CNO   SEM    YEAR GRADE
---- ----- ----- ------ ---- -----
1011 023   cs110 SPRING 2000 75
4022 023   cs110 SPRING 2000 60

Selections can be combined with projections in a single SQL phrase.


Example 5.6.4 In the select phrase:
select stno, empno from GRADES
where cno = 'cs110';

the selection specified by where cno = 'cs110' is followed by the projection
on the attributes stno, empno that are listed after the word select. The result
is:
STNO EMPNO
---- -----
1011 019
2661 019
3566 019
5544 019
1011 023
4022 023

In SQL we can use conditions that implement limited pattern matching.
Certain patterns can be specified using the symbol % to replace zero or more characters, and the underscore to replace exactly one character. As mentioned earlier, SQL is generally not case-sensitive; however, comparisons involving strings
are case-sensitive. Thus, 'Jerry' and 'JERRY' are distinct strings, and 'Jerry' ≠
'JERRY'. The comparison is realized using the operator like.
Example 5.6.5 If we need to find the names and the addresses of students
whose name includes Jerry, we can use the following select construct:
select name, addr from STUDENTS
where name like '%Jerry%';

This returns the table:
NAME           ADDR
-------------- ---------------
Rawlings Jerry 15 Pleasant Dr.
Lewis Jerry    1 Main Rd.

Example 5.6.6 Suppose the computer science course numbers were carefully
assigned so that all fundamental programming courses have a 1 as their second
digit. Then the following select construct lists all fundamental programming
courses.

select * from COURSES
where cno like 'cs_1%';

The corresponding result is:
CNO   CNAME                     CR CAP
----- ------------------------- -- ---
cs110 Introduction to Computing 4  120
cs210 Computer Programming      4  100
cs310 Data Structures           3  60
cs410 Software Engineering      3  40
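The two wildcards can be tried on a small table. The sketch below assumes Python's sqlite3 module; the COURSES rows are illustrative, and the pairing of cs240 with a course name is an assumption. One caveat about the engine used here: SQLite's like ignores case for ASCII letters by default, unlike the case-sensitive comparison described above, so the patterns in this sketch are written with the same case as the data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table COURSES(cno text, cname text)")
con.executemany("insert into COURSES values (?, ?)", [
    ("cs110", "Introduction to Computing"),
    ("cs210", "Computer Programming"),
    ("cs240", "Computer Architecture"),   # assumed pairing, for illustration
    ("cs310", "Data Structures"),
])

# _ stands for exactly one character, % for zero or more characters.
second_digit_1 = [r[0] for r in con.execute(
    "select cno from COURSES where cno like 'cs_1%' order by cno")]
# % on both sides tests for a substring anywhere in the value.
has_comput = [r[0] for r in con.execute(
    "select cname from COURSES where cname like '%Comput%' order by cname")]
print(second_digit_1, has_comput)
```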

Using the reserved word between, we can ensure that certain values are
limited to prescribed intervals (including the endpoints of these intervals).
Example 5.6.7 To find the students who obtained some grade between 65 and
85 in 2003, we apply the following query:
select distinct stno from GRADES
where year = 2003 and
grade between 65 and 85;

This select construct returns the table:
STNO
----
1011
2661
5571

The previous select is simply a shorthand for
select distinct stno from GRADES
where year = 2003 and
grade >= 65 and
grade <= 85;

Example 5.6.8 A select construct, similar to the one used in Example 5.6.7,
can be used to retrieve the students who have some grade that does not satisfy
the previous condition, that is, the students who have some grade not between
65 and 85:
select distinct stno from GRADES where year = 2003
and grade not between 65 and 85;

This construct generates the answer:
STNO
----
1011
2415
3442
3566
4022
5571
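The inclusive endpoints of between, and its negation, can be checked with grades placed exactly on and just outside the interval limits. This is a sketch assuming Python's sqlite3 module; the GRADES rows are invented for the purpose and are not the data behind Examples 5.6.7 and 5.6.8.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table GRADES(stno integer, year integer, grade integer)")
con.executemany("insert into GRADES values (?, ?, ?)", [
    (1011, 2003, 65),   # lower endpoint, included by between
    (2661, 2003, 85),   # upper endpoint, included by between
    (3442, 2003, 64),   # just below the interval
    (4022, 2003, 86),   # just above the interval
    (5571, 2003, 70),
    (5571, 2002, 90),   # wrong year, never selected
])

def stnos(condition):
    """Run the query of Example 5.6.7 with the given grade condition."""
    sql = ("select distinct stno from GRADES where year = 2003 and "
           + condition + " order by stno")
    return [row[0] for row in con.execute(sql)]

inside = stnos("grade between 65 and 85")
explicit = stnos("grade >= 65 and grade <= 85")
outside = stnos("grade not between 65 and 85")
print(inside, explicit, outside)
```

Note that a student can appear in both answers of the chapter's examples, since the conditions are tested grade by grade rather than student by student.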

We can test if certain components of tuples belong to a certain list of values
by using a condition of the form:
A in (v1, ..., vn)
This condition is satisfied by those tuples t such that t[A] has one of the values
v1, ..., vn.
Example 5.6.9 Let us find the names of students who live in Boston or Brookline, a query that we already discussed in Example 5.6.2. Using the previous
condition we write:
select name from STUDENTS
where city in ('Boston','Brookline');

Then, the desired list is:
NAME
--------------
Mixon Leatha
McLane Sandy
Pierce Richard
Prior Lorraine
Rawlings Jerry

On the other hand, we can test the negation of a condition using not. To
list the names of students who live outside those two cities, we write:
select name from STUDENTS
where not(city in ('Boston','Brookline'));

which has the same effect as:
select name from STUDENTS
where city not in ('Boston','Brookline');
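The equivalences among in, a chain of or conditions, and the two spellings of the negation can be verified directly. A minimal sketch, assuming Python's sqlite3 module and four of the students with the cities listed in Figure 5.1:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table STUDENTS(name text, city text)")
con.executemany("insert into STUDENTS values (?, ?)", [
    ("Mixon Leatha", "Brookline"), ("McLane Sandy", "Boston"),
    ("Novak Roland", "Nashua"), ("Lewis Jerry", "Providence"),
])

def names(condition):
    """Names of the students satisfying the given where condition."""
    sql = "select name from STUDENTS where " + condition + " order by name"
    return [row[0] for row in con.execute(sql)]

with_in = names("city in ('Boston','Brookline')")
with_or = names("city = 'Boston' or city = 'Brookline'")
with_not = names("not(city in ('Boston','Brookline'))")
with_not_in = names("city not in ('Boston','Brookline')")
print(with_in, with_not_in)
```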

We can insert strings of characters in the list of fields of a select phrase to
improve the presentation of the results.
Example 5.6.10 To insert the string 'Student name: ' in front of a student
name we write:
select 'Student name: ', name from STUDENTS;

This yields the result:
STUDENTNAME:   NAME
-------------- ----------------
Student name:  Edwards P. David
Student name:  Grogan A. Mary
Student name:  Mixon Leatha
Student name:  McLane Sandy
Student name:  Novak Roland
Student name:  Pierce Richard
Student name:  Prior Lorraine
Student name:  Rawlings Jerry
Student name:  Lewis Jerry

In SQL Plus concatenation of strings can be achieved with the concatenation
operator ||.
Example 5.6.11 In the next select phrase we concatenate the string 'Student '
with a student's name, then with the string ' lives in ' and the student's state:
select 'Student ' || name || ' lives in ' || state
from STUDENTS;

This returns the result:
STUDENT||NAME||LIVESIN||STATE
------------------------------------
Student Edwards P. David lives in MA
Student Grogan A. Mary lives in MA
Student Mixon Leatha lives in MA
Student McLane Sandy lives in MA
Student Novak Roland lives in NH
Student Pierce Richard lives in MA
Student Prior Lorraine lives in MA
Student Rawlings Jerry lives in MA
Student Lewis Jerry lives in RI

In Microsoft SQL Server concatenation is obtained using the + operator.
Example 5.6.12 The query shown in Example 5.6.11 can be executed in Microsoft SQL Server by
select 'Student ' + name + ' lives in ' + state
from STUDENTS;
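The role of the blanks inside the quoted constants is easy to overlook, so here is a short sketch of the || form, assuming Python's sqlite3 module (SQLite also uses || for concatenation) and two of the students from the result above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table STUDENTS(name text, state text)")
con.executemany("insert into STUDENTS values (?, ?)",
                [("Novak Roland", "NH"), ("Lewis Jerry", "RI")])

# The blanks inside the quotes separate the words in the output;
# || itself inserts nothing between its operands.
sentences = [row[0] for row in con.execute(
    "select 'Student ' || name || ' lives in ' || state "
    "from STUDENTS order by name")]
print(sentences)
```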

5.7 Union, Intersection, and Difference in SQL

Recall that union, intersection, and difference as defined in relational algebra
may occur only between tables that have identical headings. To execute these
operations in SQL, we need to use compound select phrases. Compound selects
are constructed from simple select phrases using the reserved words union,
intersect, and minus. As we shall see, SQL treats union, intersection, and difference as operations between sets of tuples, and therefore, it removes duplicate
values from the results of the queries.

Example 5.7.1 To determine the student numbers of students who took cs210
we write:
select stno from GRADES
where cno = 'cs210';

This returns the result:
STNO
----
1011
2661
3566
5571
4022

Similarly, we find the student numbers of students who took cs240:
select stno from GRADES
where cno = 'cs240';

In turn, this yields:
STNO
----
3566
5571
2415
5544
1011
4022

To find the students who took both cs210 and cs240 we use the reserved word intersect
to link the two previous select phrases into a compound select:
select stno from grades where cno = 'cs210'
intersect
select stno from grades where cno = 'cs240';

This gives:
STNO
----
1011
3566
4022
5571

Older versions of SQL Server and MySQL do not support the intersect operation.


The union of the two sets is computed by the following compound select:
select stno from grades where cno = 'cs210'
union
select stno from grades where cno = 'cs240';

Note that the tuples of the result are sorted.
STNO
----
1011
2415
2661
3566
4022
5544
5571

If we wish to retain all values in the result, then we need to use union all
to link the select phrases as in:
select stno from grades where cno = 'cs210'
union all
select stno from grades where cno = 'cs240';

The result now contains all values retrieved by the individual selects:
STNO
----
1011
2661
3566
5571
4022
3566
5571
2415
5544
1011
4022

The set difference is computed in ORACLE's SQL Plus using minus. To find
the students who took cs210 but did not take cs240 we write:
select stno from grades where cno = 'cs210'
minus
select stno from grades where cno = 'cs240';

which returns the result:
STNO
----
2661

The reverse difference allows us to find students who took cs240 but did not
take cs210:
select stno from grades where cno = 'cs240'
minus
select stno from grades where cno = 'cs210';

Now we obtain:
STNO
----
2415
5544

Neither SQL Server nor MySQL supports the reserved word minus; the SQL standard names this operation except.
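The four compound selects of this section can be replayed on the student numbers listed in the two result tables of Example 5.7.1. The sketch below assumes Python's sqlite3 module; the helper compound is a name introduced here for illustration. SQLite spells the difference operator except rather than minus, so that keyword is used in place of Oracle's.

```python
import sqlite3

# Student numbers taken from the two result tables of Example 5.7.1.
cs210 = [1011, 2661, 3566, 5571, 4022]
cs240 = [3566, 5571, 2415, 5544, 1011, 4022]

con = sqlite3.connect(":memory:")
con.execute("create table GRADES(stno integer, cno text)")
con.executemany("insert into GRADES values (?, ?)",
                [(s, "cs210") for s in cs210] + [(s, "cs240") for s in cs240])

def compound(op):
    """Link the two simple selects with the given set operator."""
    sql = ("select stno from GRADES where cno = 'cs210' " + op +
           " select stno from GRADES where cno = 'cs240'")
    return [row[0] for row in con.execute(sql)]

both = sorted(compound("intersect"))
either = sorted(compound("union"))        # duplicates removed
with_duplicates = compound("union all")   # duplicates retained
only_cs210 = compound("except")           # Oracle: minus
print(both, either, len(with_duplicates), only_cs210)
```

The results are sorted in Python before printing because the order in which an engine returns the rows of a compound select should not be relied upon without an explicit ordering.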

5.8 Table Product in SQL

A select phrase that lists several distinct table names after the reserved word
from computes the product of these tables.
Example 5.8.1 To examine all possible pairs of students/instructors we could
write the following select:
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS, INSTRUCTORS;

Since our database is in a state that contains nine students and five instructors,
this will result in 45 rows retrieved:
NAME             NAME
---------------- ----------------
Edwards P. David Evans Robert
Grogan A. Mary   Evans Robert
Mixon Leatha     Evans Robert
.
.
.
Pierce Richard   Will Samuel
Prior Lorraine   Will Samuel
Rawlings Jerry   Will Samuel
Lewis Jerry      Will Samuel

Observe that the tables are not linked by any where condition; as expected
in the definition of the product, all combinations of rows are considered. After computing the product, a projection eliminates all attributes except STUDENTS.name and INSTRUCTORS.name.
Also, note that we use qualified attributes as required by the definition of
table product (see Definition 4.1.7).
The result produced by the query shown in Example 5.8.1 does not differentiate between the attributes STUDENTS.name and INSTRUCTORS.name and
this may confuse the user. Therefore, it is preferable to rename the columns of
the result using the option as:
select STUDENTS.name as stname, INSTRUCTORS.name as instname
from STUDENTS, INSTRUCTORS;

This will generate:
STNAME           INSTNAME
---------------- ----------------
Edwards P. David Evans Robert
Grogan A. Mary   Evans Robert
Mixon Leatha     Evans Robert
.
.
.
Pierce Richard   Will Samuel
Prior Lorraine   Will Samuel
Rawlings Jerry   Will Samuel
Lewis Jerry      Will Samuel

SQL allows for computations of products of several copies of the same table
through the creation of aliases; the solution proceeds using the logic discussed
in Example 4.1.18. To create an alias S of a table named T we write the name
of the alias after the name of the table in the list of tables, making sure that at
least one space (and no comma) exists between the name of the table and its
alias. For example, in the select phrase of Example 5.8.2 we create the alias I
by writing
INSTRUCTORS I

Table aliases are also known as correlation names of tables.


Example 5.8.2 Let us solve the query shown in Example 4.1.18: finding all
pairs of instructors' names for instructors who share the same office. This can
be done by writing:
select I.name as firstname, INSTRUCTORS.name as secname
from INSTRUCTORS I, INSTRUCTORS
where I.roomno = INSTRUCTORS.roomno and
I.empno < INSTRUCTORS.empno;

The result of this query is:
FIRSTNAME    SECNAME
------------ -----------
Exxon George Will Samuel

Conceptually, we create an alias I of the table INSTRUCTORS, compute the
product between this alias and INSTRUCTORS, and retain those pairs that share
the same room and consist of distinct individuals.
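The role of the condition I.empno < INSTRUCTORS.empno becomes clearer when it is weakened or dropped. The following sketch assumes Python's sqlite3 module and uses the first five instructors and room numbers of Figure 5.1; the helper count_pairs is introduced only for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table INSTRUCTORS(empno text, name text, roomno integer)")
con.executemany("insert into INSTRUCTORS values (?, ?, ?)", [
    ("019", "Evans Robert", 82), ("023", "Exxon George", 90),
    ("056", "Sawyer Kathy", 91), ("126", "Davis William", 72),
    ("234", "Will Samuel", 90),
])

base = ("from INSTRUCTORS I, INSTRUCTORS "
        "where I.roomno = INSTRUCTORS.roomno")

def count_pairs(extra=""):
    """Number of rows of the self-product kept by the where condition."""
    return con.execute("select count(*) " + base + extra).fetchone()[0]

# Room equality alone also pairs every instructor with himself or herself.
room_only = count_pairs()
# != removes the self-pairs but keeps each real pair in both orders.
unequal = count_pairs(" and I.empno != INSTRUCTORS.empno")
# < keeps each pair exactly once, as in Example 5.8.2.
pairs = list(con.execute(
    "select I.name as firstname, INSTRUCTORS.name as secname "
    + base + " and I.empno < INSTRUCTORS.empno"))
print(room_only, unequal, pairs)
```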
Example 5.8.3 Suppose that we need to find all triples of student names for
students who live in the same city and state. Now we need to operate with three
distinct copies of the table STUDENTS. This is accomplished by:
select S1.name as name1, S2.name as name2,
S3.name as name3
from STUDENTS S1, STUDENTS S2,
STUDENTS S3
where S1.state = S2.state and
S2.state = S3.state and
S1.city = S2.city and
S2.city = S3.city and
S1.stno < S2.stno and
S2.stno < S3.stno;

which gives the result:
NAME1        NAME2          NAME3
------------ -------------- --------------
McLane Sandy Prior Lorraine Rawlings Jerry

5.9 Join in SQL

Earlier versions of SQL (at the level of SQL 1) dealt with the join operation
indirectly, using operations like product, selection, and projection, which are
already available in SQL. The blueprint of this treatment of the join operation
was outlined in Section 4.2.
Example 5.9.1 The query considered in Example 4.2.2, in
which we seek the names of instructors who have taught any four-credit
course, is solved in SQL by writing:
select distinct INSTRUCTORS.name
from COURSES, GRADES, INSTRUCTORS
where COURSES.cr = 4
and COURSES.cno = GRADES.cno
and GRADES.empno = INSTRUCTORS.empno;

The steps that we applied in relational algebra can be easily reconstituted in
SQL. The first step that consists of computing the product
T1 = COURSES × GRADES × INSTRUCTORS
corresponds to the list of tables that follows the word from. Then, the selection
specified by
T2 = (T1 where COURSES.cr = 4 and
COURSES.cno = GRADES.cno and
GRADES.empno = INSTRUCTORS.empno)
is executed using the condition of the where clause.
Finally, the projection
T3 (name) = T2 [INSTRUCTORS.name]
corresponds to the list that follows select. In this case, this list consists of one
attribute, INSTRUCTORS.name.
We give one more example that shows a typical query that uses a join.


Example 5.9.2 To list all pairs of student names and course names such that
the student takes the course, the relational algebra solution would require that
we join the tables STUDENTS, GRADES, and COURSES. In SQL we write:
select distinct STUDENTS.name, COURSES.cname
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
GRADES.cno = COURSES.cno

This query will return:
NAME             CNAME
---------------- -------------------------
Edwards P. David Computer Architecture
Edwards P. David Computer Programming
Edwards P. David Introduction to Computing
Grogan A. Mary   Computer Architecture
.
.
.
Prior Lorraine   Data Structures
Prior Lorraine   Introduction to Computing
Rawlings Jerry   Computer Architecture
Rawlings Jerry   Introduction to Computing

SQL dialects that conform to the SQL-2 standard (e.g., SQL Plus of Oracle
9i and 10g, and Microsoft SQL Server) allow the use of the constructions inner join and on. For example, the query discussed in Example 5.9.1 has the
alternate solution:
select distinct INSTRUCTORS.name
from INSTRUCTORS, COURSES INNER JOIN GRADES
on COURSES.cno = GRADES.cno
where INSTRUCTORS.empno = GRADES.empno
and COURSES.cr = 4;

This query should be viewed as computing the natural join of COURSES and
GRADES based on the equality of the attributes they share (as specified by the
on clause). Then, the join of INSTRUCTORS with the result of the previous join is
computed using the simulation by product and selection method.
In SQL Plus queries involving natural joins among tables whose attributes
are identically named can be further simplified by applying the using clause, which
lists the attributes involved in the joining.
Example 5.9.3 To retrieve the names of instructors who taught cs110 we can
execute in SQL Plus the query:
select distinct INSTRUCTORS.name
from INSTRUCTORS inner join GRADES
using(empno)
where cno = 'cs110';

The inner join can be used for joins that involve more than two tables.

Example 5.9.4 An alternative solution to the query of Example 5.9.1 that
makes use of the inner join operation is:
select distinct INSTRUCTORS.name
from
INSTRUCTORS inner join GRADES
using(empno)
inner join COURSES
using(cno)
where COURSES.cr = 4;

It is possible to involve several attributes in an inner join either explicitly,
using the clause on, or implicitly, employing the clause using.
Example 5.9.5 To find the pairs of names of students and instructors such that
the student takes a course with the instructor who is also his or her advisor, we
can write either:
select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
from GRADES inner join ADVISING
on GRADES.stno = ADVISING.stno and
GRADES.empno = ADVISING.empno
inner join STUDENTS
on ADVISING.stno = STUDENTS.stno
inner join INSTRUCTORS
on ADVISING.empno = INSTRUCTORS.empno

or, equivalently,
select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
from GRADES inner join ADVISING
using(stno,empno)
inner join STUDENTS
using(stno)
inner join INSTRUCTORS
using(empno)
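The three ways of writing a join seen so far (product with selection, inner join with on, and inner join with using) can be compared on the query of Example 5.9.1. This is a sketch assuming Python's sqlite3 module, which accepts all three forms; the rows are toy data chosen so that exactly one instructor has taught a four-credit course.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table COURSES(cno text, cr integer);
create table GRADES(stno integer, empno text, cno text);
create table INSTRUCTORS(empno text, name text);
insert into COURSES values ('cs110', 4), ('cs310', 3);
insert into GRADES values (1011, '019', 'cs110'), (2661, '056', 'cs310');
insert into INSTRUCTORS values ('019', 'Evans Robert'), ('056', 'Sawyer Kathy');
""")

def names(sql):
    return [row[0] for row in con.execute(sql)]

# Product of the three tables followed by a selection.
by_product = names("""
    select distinct INSTRUCTORS.name
    from COURSES, GRADES, INSTRUCTORS
    where COURSES.cr = 4 and COURSES.cno = GRADES.cno
      and GRADES.empno = INSTRUCTORS.empno""")
# Join conditions stated explicitly with on.
by_on = names("""
    select distinct INSTRUCTORS.name
    from INSTRUCTORS inner join GRADES
         on INSTRUCTORS.empno = GRADES.empno
         inner join COURSES on GRADES.cno = COURSES.cno
    where COURSES.cr = 4""")
# Join attributes named implicitly with using.
by_using = names("""
    select distinct INSTRUCTORS.name
    from INSTRUCTORS inner join GRADES using(empno)
         inner join COURSES using(cno)
    where COURSES.cr = 4""")
print(by_product, by_on, by_using)
```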

The Cartesian product of two tables can alternatively be computed using the
cross join operation.
Example 5.9.6 The query that we wrote in Example 5.8.1 that generates all
possible pairs of students/instructors can be also written as:
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS cross join INSTRUCTORS;

which is equivalent to
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS, INSTRUCTORS;


We saw that when joining two tables not all tuples are joinable; tuples that
belong to one table and are not joinable with any tuple of the other table leave no
trace in the join, a situation that is often inconvenient. As we saw in Section 4.3,
the outer join operation and its variants, the left outer join and the right outer
join can rectify this situation.
Let us assume that the tabular variables STUDENTS and INSTRUCTORS
contain the tuples shown in Figure 5.1.
The tabular variable ADVISING has the same content as the one shown in
Figure 3.1.
Example 5.9.7 Oracle's own syntax for left outer join is to designate the component that may be null by (+), as in
select STUDENTS.name, ADVISING.empno from STUDENTS, ADVISING
where STUDENTS.stno = ADVISING.stno(+);

This is equivalent to using the operator left outer join as specified by SQL2:
select STUDENTS.name, ADVISING.empno
from STUDENTS left outer join ADVISING
on STUDENTS.stno = ADVISING.stno;

Either phrase will return:
name             empno
---------------- -----
Edwards P. David 019
Grogan A. Mary   019
Mixon Leatha     023
McLane Sandy     023
Novak Roland     056
Pierce Richard   126
Prior Lorraine   234
Rawlings Jerry   023
Lewis Jerry      234
Davis Richard
Chu Martin
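The difference between the inner join and the left outer join shows up as soon as some student has no ADVISING tuple. The sketch below assumes Python's sqlite3 module and uses three students from the result above; SQLite accepts the standard left outer join syntax but not Oracle's (+) notation, and it reports the null component as None.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table STUDENTS(stno integer, name text);
create table ADVISING(stno integer, empno text);
insert into STUDENTS values (5571, 'Lewis Jerry'),
                            (6410, 'Davis Richard'),
                            (7209, 'Chu Martin');
insert into ADVISING values (5571, '234');
""")

# Students without an advisor leave no trace in the inner join.
inner = list(con.execute("""
    select STUDENTS.name, ADVISING.empno
    from STUDENTS inner join ADVISING
    on STUDENTS.stno = ADVISING.stno"""))
# The left outer join keeps them, padding empno with null.
left = list(con.execute("""
    select STUDENTS.name, ADVISING.empno
    from STUDENTS left outer join ADVISING
    on STUDENTS.stno = ADVISING.stno
    order by STUDENTS.stno"""))
print(inner, left)
```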

The computation of the right outer join is similar. We can use either Oracle's
syntax as in
select ADVISING.stno, INSTRUCTORS.name from ADVISING, INSTRUCTORS
where ADVISING.empno(+) = INSTRUCTORS.empno;

or the standard syntax:


select ADVISING.stno, INSTRUCTORS.name
from ADVISING right outer join INSTRUCTORS
on ADVISING.empno = INSTRUCTORS.empno;

In either case we shall obtain:
stno name
---- ----------------
1011 Evans Robert
2415 Evans Robert
2661 Exxon George
2890 Exxon George
5544 Exxon George
3442 Sawyer Kathy
3566 Davis William
4022 Will Samuel
5571 Will Samuel
     Campbell Kenneth

STUDENTS
stno name             addr             city       state zip
1011 Edwards P. David 10 Red Rd.       Newton     MA    02159
2415 Grogan A. Mary   8 Walnut St.     Malden     MA    02148
2661 Mixon Leatha     100 School St.   Brookline  MA    02146
2890 McLane Sandy     30 Cass Rd.      Boston     MA    02122
3442 Novak Roland     42 Beacon St.    Nashua     NH    03060
3566 Pierce Richard   70 Park St.      Brookline  MA    02146
4022 Prior Lorraine   8 Beacon St.     Boston     MA    02125
5544 Rawlings Jerry   15 Pleasant Dr.  Boston     MA    02115
5571 Lewis Jerry      1 Main Rd        Providence RI    02904
6410 Davis Richard    45 Algonquin Rd. Natick     MA    01760
7209 Chu Martin       90 Rye Dr.       Ayer       MA    01290

INSTRUCTORS
empno name             rank         roomno telno
019   Evans Robert     Professor    82     7122
023   Exxon George     Professor    90     9101
056   Sawyer Kathy     Assoc. Prof. 91     5110
126   Davis William    Assoc. Prof. 72     5411
234   Will Samuel      Assist.Prof. 90     7024
323   Campbell Kenneth Professor    102    7077

Figure 5.1: Tables with tuples with null components

Finally, the outer join itself can be computed using the operator full outer join:
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from students full outer join advising
using(stno)
full outer join instructors
using(empno);

This will result in
sname            iname
---------------- ----------------
Grogan A. Mary   Evans Robert
Edwards P. David Evans Robert
Rawlings Jerry   Exxon George
McLane Sandy     Exxon George
Mixon Leatha     Exxon George
Novak Roland     Sawyer Kathy
Pierce Richard   Davis William
Lewis Jerry      Will Samuel
Prior Lorraine   Will Samuel
Chu Martin
Davis Richard
                 Campbell Kenneth

5.10 Sets and subqueries

Subqueries are select phrases that return sets rather than tables. Their main
use is in conditions that involve sets. As we shall see, they are useful in implementing difference and division
in SQL. Syntactically, a subquery is written by placing a select phrase
between a pair of parentheses. For example,
(select empno from INSTRUCTORS where rank = 'Professor')
is a subquery that computes the employee numbers of full professors. To find
the student numbers of students who take a course with a full professor, we
need to select those GRADES tuples whose empno belongs to this set. This can
be accomplished by writing:
select distinct stno from GRADES where
empno in (select empno from INSTRUCTORS
where rank = 'Professor');

This will return the result:
STNO
----
1011
2415
2661
3566
4022
5544
5571

We refer to the first select as the calling select, or the main select or the outer
select; the select of the subquery is the inner select.
As we saw in the introductory example, membership can be tested using in.
Here is another example.
Example 5.10.1 Let us find the names of students who took cs310. We determine the student numbers of those students using a subquery. Then, in the
main select, we retrieve those students whose student number is in this set.
This can be accomplished using the query:
select name from STUDENTS where
stno in (select stno from GRADES
where cno = 'cs310');

which returns the table:
NAME
--------------
Mixon Leatha
Prior Lorraine

It is possible to test membership of a tuple in a set of tuples computed by a
subquery using a condition of the form
(x1, ..., xn) in (select A1, ..., An from ...)
This type of test is included in SQL99, but it is not implemented in many SQL
dialects. However, it is available in ORACLE and DB2.
Example 5.10.2 Let us find the pairs of names of students and instructors such
that the student took some course with the instructor but no four-credit course.
This is computed by the following query:

select STUDENTS.name as sname,
INSTRUCTORS.name as iname
from STUDENTS, INSTRUCTORS where
(STUDENTS.stno, INSTRUCTORS.empno) in
(select stno, empno from grades
minus
select stno, empno from grades
where cno in (select cno
from courses
where cr=4));

This will return the following table:
SNAME            INAME
---------------- ------------
Edwards P. David Sawyer Kathy
Grogan A. Mary   Evans Robert
Mixon Leatha     Will Samuel
Novak Roland     Will Samuel
Prior Lorraine   Sawyer Kathy
Prior Lorraine   Will Samuel
Rawlings Jerry   Sawyer Kathy
Lewis Jerry      Will Samuel

If oper is one of the operators =, !=, <, >, <= or >=, then we can use
conditions of the form
v oper any (select ...)

or
v oper all (select ...)

in comparisons that involve some elements of the set computed by the subquery
(select ...) or all elements of the same set, respectively. Here != stands for
inequality.
Example 5.10.3 To find the names of the courses taken by the student whose
student number is 1011, we can use the following query:
select cname from COURSES where
cno = any (select cno from
GRADES where stno = 1011);

The construct = any is synonymous with in, and the same query could be
written as:
select cname from COURSES
where cno in (select cno from GRADES where stno= 1011);

Also, instead of = any we could use = some, and so, we have a third way of
writing the same query:
select cname from COURSES where
cno = some (select cno from GRADES where stno= 1011);

All three queries result in the table:
CNAME
-------------------------
Introduction to Computing
Computer Programming
Computer Architecture

Example 5.10.4 Let us find the students who obtained the highest grade in
cs110. Although there are methods that we explain later that yield much simpler
solutions for this type of query, for the moment we want to illustrate the oper all
condition. We operate on two copies of GRADES. The copy used in the inner
select is intended for computing the grades obtained in cs110:
select stno from GRADES where cno = 'cs110'
and grade >= all(select grade from GRADES
where cno = 'cs110');

We obtain the table:
STNO
----
5544

Example 5.10.5 Let us find the students who obtained a grade at least as high as every
grade given by a certain instructor, say Prof. Will. Using the oper all condition
we can write:
select stno from GRADES
where grade >= all(select grade from GRADES
where empno in (select empno from INSTRUCTORS
where name like 'Will%'));

If we alter this query and replace the instructor with Prof. Davis, who teaches
no courses, then the set computed by the subquery that follows all in
select stno from GRADES
where grade >= all(select grade from GRADES
where empno in (select empno from INSTRUCTORS
where name like 'Davis%'));

is empty. Therefore, every grade satisfies the inequality, and we obtain all
student numbers for students who took any course!
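The behavior of oper all on an empty set mirrors the way a universally quantified statement over an empty set is vacuously true. The sketch below assumes Python's sqlite3 module. SQLite does not implement the all quantifier, so the condition grade >= all(S) is rewritten here as "no element of S exceeds grade" using not exists, which has the same meaning when no nulls are involved; the instructor is also identified directly by employee number instead of by name. Both simplifications, the helper name, and the GRADES rows are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table GRADES(stno integer, empno text, grade integer)")
con.executemany("insert into GRADES values (?, ?, ?)",
                [(1011, "019", 90), (2661, "019", 70), (4022, "234", 80)])

def at_least_all(empno):
    """Students having a grade >= every grade given by instructor empno."""
    sql = """select distinct G.stno from GRADES G
             where not exists (select * from GRADES H
                               where H.empno = ? and H.grade > G.grade)
             order by G.stno"""
    return [row[0] for row in con.execute(sql, (empno,))]

gave_grades = at_least_all("234")   # this instructor gave the single grade 80
gave_none = at_least_all("126")     # this instructor gave no grades at all
# Python's all() behaves the same way on an empty collection.
vacuous = all(70 >= x for x in [])
print(gave_grades, gave_none, vacuous)
```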

5.11 Parametrized subqueries

Often the retrieval performed in a subquery depends on a value provided by the
calling select. A typical situation is described in the following example.


Example 5.11.1 Suppose that we need to retrieve the course numbers of courses
taken by the student whose student number is STUDENTS.stno. Ignore (for the
moment) the origin of this piece of data. Then, the retrieval is done by the
select construct:
select cno from GRADES
where stno = STUDENTS.stno;

Next, we transform this select into a subquery. The student number STUDENTS.stno is provided by the outer select of the following construct:
select name from STUDENTS where 'cs310' in
(select cno from GRADES
where stno = STUDENTS.stno);

Observe that this provides an alternate solution to the query discussed in Example 5.10.1. Namely, we use a subquery to compute the courses taken by each
student. Then, we test if cs310 is one of these courses. We use the qualified attribute STUDENTS.stno inside the subquery to differentiate between this input
parameter and the attribute stno of the table GRADES.
Sets of tuples produced by subqueries can be tested for emptiness using the
exists condition. Namely, the condition
exists (select ... from ...)
is true if the set returned by the subquery is not empty; similarly,
not exists (select ... from ...)
is true if the set returned by the subquery is empty.
Example 5.11.2 Let us give yet another solution to the query we solved in
Example 5.10.1. This time, to find the names of students who took cs310 we
determine the student numbers of those students for whom their set of grades
in cs310 is not empty. This can be done as follows:
select name from STUDENTS where
exists (select * from GRADES where
stno = STUDENTS.stno and
cno = 'cs310');

As a result, we have the table:
NAME
--------------
Mixon Leatha
Prior Lorraine
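The three formulations of the cs310 query (Examples 5.10.1, 5.11.1, and 5.11.2) can be run side by side. This is a sketch assuming Python's sqlite3 module; the rows are a small illustrative subset in which the same two students took cs310.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table STUDENTS(stno integer, name text);
create table GRADES(stno integer, cno text);
insert into STUDENTS values (1011, 'Edwards P. David'),
                            (2661, 'Mixon Leatha'),
                            (4022, 'Prior Lorraine');
insert into GRADES values (1011, 'cs110'), (2661, 'cs310'),
                          (4022, 'cs110'), (4022, 'cs310');
""")

def names(condition):
    sql = "select name from STUDENTS where " + condition + " order by name"
    return [row[0] for row in con.execute(sql)]

# Independent subquery: evaluated once, then tested for membership.
with_in = names("stno in (select stno from GRADES where cno = 'cs310')")
# Parametrized subquery: STUDENTS.stno comes from the outer select.
with_param = names(
    "'cs310' in (select cno from GRADES where stno = STUDENTS.stno)")
# Emptiness test on a parametrized subquery.
with_exists = names(
    "exists (select * from GRADES "
    "where stno = STUDENTS.stno and cno = 'cs310')")
print(with_in, with_param, with_exists)
```

Inside the subqueries the unqualified stno refers to GRADES, the table of the innermost from, which is why the outer attribute must be written as STUDENTS.stno.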

Example 5.11.3 To find instructors who never taught cs110, we search for
instructors for whom there is no GRADES record involving cs110 and these
instructors. This can be done by
select name from INSTRUCTORS where
not exists(select * from GRADES where
empno = INSTRUCTORS.empno and
cno = 'cs110');

which results in the table:
NAME
-------------
Sawyer Kathy
Davis William
Will Samuel

If both the main query and the subquery deal with the same table and the
subquery requires input parameters from the outer query, then we use an alias
of the table in the outer query.
Example 5.11.4 Let us find the student numbers of students whose advisor
is advising at least one other student. The information is contained in the
ADVISING table, and the following select construct uses both ADVISING (in
the subquery) and its alias A in the main query:
select distinct stno from ADVISING A
where exists (select * from ADVISING where
empno = A.empno and stno != A.stno);

This query returns the table:
STNO
----
1011
2415
2661
2890
4022
5544
5571

Subqueries can be used in the list that follows from in exactly the same
manner that tables are used. This is shown in the next example:
Example 5.11.5 To find the pairs of names of students and instructors such
that the student took some course with the instructor we could write:
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from STUDENTS, INSTRUCTORS,
(select stno, empno from GRADES) PN
where STUDENTS.stno = PN.stno and
INSTRUCTORS.empno = PN.empno;

The difference of the tables T and S can be computed by looking for each
tuple of T for which there is no matching tuple in S. This can be done by:
select * from T where
not exists (select * from S where
A1 = T.A1 and ... and An = T.An)
Example 5.11.6 Courses offered by the continuing education program but not
by the regular program can be found by writing:
select * from CED_COURSES where
not exists (select * from COURSES where
cno = CED_COURSES.cno)

which takes advantage of the fact that cno is a key for both COURSES and
CED_COURSES.
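The not exists formulation of the difference can be compared with the set operator on the key attribute. The sketch assumes Python's sqlite3 module, where the difference operator is spelled except; the course cs150 and its name are hypothetical rows added only so that the difference is not empty.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table COURSES(cno text primary key, cname text);
create table CED_COURSES(cno text primary key, cname text);
insert into COURSES values ('cs110', 'Introduction to Computing'),
                           ('cs210', 'Computer Programming');
-- cs150 is a hypothetical continuing education course
insert into CED_COURSES values ('cs110', 'Introduction to Computing'),
                               ('cs150', 'Web Basics');
""")

# Difference through a parametrized subquery, as in Example 5.11.6.
by_not_exists = [row[0] for row in con.execute("""
    select cno from CED_COURSES where
    not exists (select * from COURSES where
                cno = CED_COURSES.cno)""")]
# The same difference on the key attribute with the set operator.
by_except = [row[0] for row in con.execute(
    "select cno from CED_COURSES except select cno from COURSES")]
print(by_not_exists, by_except)
```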

5.12 Subqueries and division

SQL does not have a division operation. However, as we saw in Examples 4.1.27
and 4.2.3, we can perform division using product, projection, and difference. Of
course, we could apply the prescription offered by relational algebra. This type
of solution is discussed in the next example.
Example 5.12.1 To find the courses taught by every full professor, the solution envisioned here is
select cno from grades
minus
select GI.cno from (select grades.cno,
instructors.empno
from grades, instructors
where rank = 'Professor') GI
where (GI.cno,GI.empno) not in (select cno,empno from grades)

Note that the query
select grades.cno, instructors.empno
from grades, instructors
where rank = 'Professor'

computes all pairs of course numbers and employee numbers of full professors using the product of the
tables GRADES and INSTRUCTORS. Then, the query
select GI.cno from (select grades.cno,
instructors.empno
from grades, instructors
where rank = 'Professor') GI
where (GI.cno,GI.empno) not in (select cno, empno from grades)

extracts the courses that are part of the pairs of the previous table that do not
appear in the GRADES table, that is, the courses for which there exists a full
professor who did not teach these courses. These are the courses that we need
to exclude from the answer. Thus, the query presented at the beginning of this
example yields the solution of the problem:

CNO
-----
cs110

The solution presented in Example 5.12.1 is not applicable in SQL dialects
that do not have all the facilities of SQL Plus. Therefore, we need to examine
an alternate way of solving this problem that is almost universally usable. To
understand the technique used we examine the solution of the query formulated
in the next example.
Example 5.12.2 Again, suppose that we need to determine the courses taught
by every full professor. Let us formulate the same query in a way that is
easier to translate into SQL. Namely, we find the courses for which there are no
full professors who have not taught these courses. The reader should realize
immediately that this is simply a new formulation of the same problem. We
show the solution in steps, moving gradually from plain English to SQL:
Phase I:
select cno from GRADES G where
    not exists (instructors who are full professors and
                have not taught the course G.cno)
Phase II:
select cno from GRADES G where
    not exists (select * from INSTRUCTORS
                where rank = 'Professor' and
                these instructors have not taught
                the course G.cno)
Phase III:
select cno from GRADES G where
    not exists (select * from INSTRUCTORS
                where rank = 'Professor' and
                not exists (select * from GRADES
                            where empno = INSTRUCTORS.empno
                            and cno = G.cno));
In Phase I we begin to express in SQL the retrieval of the course numbers for
which no full professor exists who has not taught these courses.
In Phase II we concentrate on preventing the existence of full professors who
have not taught these courses. Note that Phase II still contains an untranslated
part.
Finally, in Phase III, we translate the part "who have not taught these
courses" using not exists for the second time.
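The doubly nested not exists of Phase III can be checked on a toy instance. The sketch below assumes Python's sqlite3 module instead of ORACLE, and the instructors e1, e2, e3 and their ranks are hypothetical. It also adds distinct and explicit aliases (I, T), which the query of the example does not use, so that each course is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table INSTRUCTORS(empno text primary key, name text, rank text);
    create table GRADES(stno text, empno text, cno text);
    -- e1 and e2 are full professors; e3 is not (made-up data)
    insert into INSTRUCTORS values ('e1', 'Instructor One', 'Professor');
    insert into INSTRUCTORS values ('e2', 'Instructor Two', 'Professor');
    insert into INSTRUCTORS values ('e3', 'Instructor Three', 'Lecturer');
    insert into GRADES values ('s1', 'e1', 'cs110');
    insert into GRADES values ('s2', 'e2', 'cs110');
    insert into GRADES values ('s1', 'e1', 'cs210');
    insert into GRADES values ('s3', 'e3', 'cs310');
""")

# A course qualifies if there is no full professor I for whom
# no GRADES row T links I to that course.
courses = conn.execute("""
    select distinct G.cno from GRADES G
    where not exists (select * from INSTRUCTORS I
                      where I.rank = 'Professor'
                        and not exists (select * from GRADES T
                                        where T.empno = I.empno
                                          and T.cno = G.cno))
    order by G.cno
""").fetchall()
print(courses)
```

Here only cs110 is taught by both full professors; cs210 is excluded because e2 never taught it, and cs310 because neither professor did.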
Example 5.12.3 Another query that requires division in relational algebra is:
"Find names of instructors who have taught every 100-level course, that is,
every course whose first digit of the course number is 1." The formulation that
is better suited to SQL implementation is: "Find names of instructors for whom
there is no 100-level course that they have not taught." This is solved by the
following select construct:
select name from INSTRUCTORS where
    not exists (select * from COURSES
                where cno like 'cs1__' and
                not exists (select * from GRADES where
                            empno = INSTRUCTORS.empno
                            and cno = COURSES.cno));

The answer that results from our usual database instance is:
NAME
------------
Evans Robert
Exxon George

5.13 Relational Completeness of SQL

Between Chapter 4 and the current chapter, we have shown that SQL is capable
of performing all operations of relational algebra. This fact is known as the
relational completeness of SQL. As we shall see in subsequent chapters, the
capabilities of SQL go well beyond the standard definition of relational algebra.

5.14 Scalar Functions of SQL

We now present capabilities of SQL that go beyond relational algebra. We begin
by discussing built-in functions in SQL that act on individual values (scalar
functions), functions that act on sets of values (aggregate functions), and
analytic functions that can be used for various statistical computations. Then,
we continue with the group by option of select, and we discuss several on-line
analytic processing functions of SQL.
Scalar functions are built-in functions of SQL that work on individual values.
They are highly dependent on the particular implementation of SQL, and we
limit our discussion to functions implemented by ORACLE's SQL Plus. There
are several types of scalar functions, depending on the types of their arguments.

5.14.1 Numerical Functions

Among the numerical functions, abs, sin, cos, power, sqrt, etc. have quite obvious
definitions. For example, sqrt computes the square root of its argument, while
power(x, y) computes $x^y$.


Example 5.14.1 To illustrate some of the numerical functions we create a
table POINTS whose rows represent labelled points in the plane:
create table POINTS(ptid varchar2(10), x integer, y integer,
                    primary key(ptid));

and populate this table using the commands:

insert into points(ptid, x, y) values ('a',0,0);
insert into points(ptid, x, y) values ('b',0,1);
insert into points(ptid, x, y) values ('c',0,2);
insert into points(ptid, x, y) values ('d',1,0);
insert into points(ptid, x, y) values ('e',1,1);
insert into points(ptid, x, y) values ('f',1,2);
insert into points(ptid, x, y) values ('g',2,0);
insert into points(ptid, x, y) values ('h',2,1);
insert into points(ptid, x, y) values ('i',2,2);
insert into points(ptid, x, y) values ('j',3,0);
insert into points(ptid, x, y) values ('k',3,1);
insert into points(ptid, x, y) values ('l',3,2);

To determine the distances from a to every point of the table (including a itself) we write

select p.ptid,
       sqrt(power(a.x - p.x,2)+power(a.y - p.y,2))
       as dist
from points a, points p
where a.ptid = 'a';

This returns:
PTID       DIST
---------- ----------
a          0
b          1
c          2
d          1
e          1.41421356
f          2.23606798
g          2
h          2.23606798
i          2.82842712
j          3
k          3.16227766
l          3.60555128

To compute the distance between the point a having the coordinates $(x_a, y_a)$
and a point p with coordinates $(x_p, y_p)$, we use the formula
$d(a, p) = \sqrt{(x_a - x_p)^2 + (y_a - y_p)^2}$.
The formula appears in the target list of the select and is written with the numerical functions sqrt and power.
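The same target-list expression can be exercised outside ORACLE. The sketch below is only an approximation of the setting of the example: it assumes Python's sqlite3 module, registers sqrt and power as user-defined functions (SQLite builds do not always provide them), and loads just three of the twelve points.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register the two numerical functions under the names used in the text.
conn.create_function("sqrt", 1, math.sqrt)
conn.create_function("power", 2, math.pow)

conn.executescript("""
    create table POINTS(ptid text primary key, x integer, y integer);
    insert into POINTS values ('a', 0, 0);
    insert into POINTS values ('e', 1, 1);
    insert into POINTS values ('j', 3, 0);
""")

# Pair the single row for point a with every row p and apply the formula.
rows = conn.execute("""
    select p.ptid,
           sqrt(power(a.x - p.x, 2) + power(a.y - p.y, 2)) as dist
    from POINTS a, POINTS p
    where a.ptid = 'a'
    order by p.ptid
""").fetchall()
for ptid, dist in rows:
    print(ptid, round(dist, 8))
```

For e = (1, 1) the formula gives $\sqrt{1^2 + 1^2} = \sqrt{2} \approx 1.41421356$, which agrees with the corresponding row of the table above.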
In Oracle we can perform computations unrelated to any table by using a
fictitious tabular variable that is named DUAL.

Example 5.14.2 To compute sin(30°), sin(45°), and sin(60°) in Oracle, we
write:
select sin(30*3.14159265359/180) as sin30,
sin(45*3.14159265359/180) as sin45,
sin(60*3.14159265359/180) as sin60
from dual;

We need to convert the angles to radians before sin is applied. This will return:

SIN30      SIN45      SIN60
---------- ---------- ----------
        .5 .707106781 .866025404

Microsoft SQL Server has a simpler way of performing this type of computation in that it does not require the fictitious table.
Example 5.14.3 In SQL Server we can simply write:
select sin(30*3.14159265359/180) as sin30,
sin(45*3.14159265359/180) as sin45,
sin(60*3.14159265359/180) as sin60;

to obtain the same result as the one obtained in ORACLE.

5.14.2 String Functions

String functions can be used to transform strings, extract parts of strings, etc.
The functions upper and lower convert strings to uppercase and lowercase characters, respectively.
Example 5.14.4 To print names of students in capital characters and course
titles in small letters we can write:
select distinct upper(STUDENTS.name) as STNAME,
lower(COURSES.cname) as course
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
GRADES.cno = COURSES.cno;

This generates the following result:

STNAME               COURSE
-------------------- --------------------------
EDWARDS P. DAVID     computer architecture
EDWARDS P. DAVID     computer programming
EDWARDS P. DAVID     introduction to computing
GROGAN A. MARY       computer architecture
.
.
.
PRIOR LORRAINE       data structures
PRIOR LORRAINE       introduction to computing
RAWLINGS JERRY       computer architecture
RAWLINGS JERRY       introduction to computing

These functions are particularly useful for performing string comparisons when
ignoring case. Thus,
upper('stephany') like 'STE%'

is true.
Example 5.14.5 The string function replace substitutes every occurrence of
its second argument in the value(s) specified by its first argument, by its third
argument. In the select written below the string 'Computer' is replaced by the
string 'Comp.':
select replace(cname,'Computer','Comp.') from COURSES;

This yields the following result:

REPLACE(CNAME,'COMPUTER','COMP.')
----------------------------------
Introduction to Computing
Comp. Programming
Comp. Architecture
Data Structures
Higher Level Languages
Software Engineering
Graphics

Example 5.14.6 The function concat computes the concatenation of the two strings
that form its arguments. Its effect is identical to the concatenation operator ||
that we discussed in Example 5.6.11. The phrase below prints the state and zip
code of each student as a single string:
select name, addr, concat(state,zip) as state_zip from STUDENTS;

This returns:
NAME               ADDR             STATE_ZIP
------------------ ---------------- ---------
Edwards P. David   10 Red Rd.       MA02159
Grogan A. Mary     Walnut St.       MA02148
Mixon Leatha       100 School St.   MA02146
McLane Sandy       30 Cass Rd.      MA02122
Novak Roland       42 Beacon St.    NH03060
Pierce Richard     70 Park St.      MA02146
Prior Lorraine     8 Beacon St.     MA02125
Rawlings Jerry     15 Pleasant Dr.  MA02115
Lewis Jerry        1 Main Rd        RI02904

Example 5.14.7 To extract substrings of strings we can use the function
substr. To call this function we need to use the following syntax:
substr(string, integer [, integer])
A typical call such as substr(s, n, m) will return the substring of length m
of the string s that starts with the nth character of s. If m is omitted, as
in substr(s, n), then the function returns all characters of s starting from the
nth character to the end of s. If n is negative, then the starting position is
counted backwards from the end of s.
The select phrase
select substr('Oracle',2,3) from dual;

will return:
SUB
---
rac

The next select, which omits the third argument of substr:

select substr('Oracle',2) from dual;

yields:
SUBST
-----
racle

which is the string that begins with the second character of 'Oracle' and ends
with the last character of this string.
Since the second argument of the function call in
select substr('Oracle',-4,3) from dual;

is negative, the starting position of the substring is the 4th character counted
from the end (that is, the character a) and thus, the query returns:
SUB
---
acl

The functions lpad and rpad can be used to enhance the presentation of results
of queries. The syntax of lpad is:
lpad(s, integer [, string])
The effect is to pad s to the left with spaces to bring the total length of the
string to the length specified by the second argument of the function. If the
third argument is present, then this string is repeated to the left to fill up the
padded string.
The function rpad has a similar syntax; however, the padding is done at the
right of s.
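The behavior of substr and lpad described above can be illustrated with a small sketch. It assumes Python's sqlite3 module, whose built-in substr treats a negative start position in the way described here. SQLite has no lpad, so the Python function lpad below is a hypothetical stand-in written for this illustration, not ORACLE's implementation; in particular, its truncation of strings longer than the requested length reflects our understanding of ORACLE's behavior.

```python
import sqlite3


def lpad(s, n, pad=" "):
    """Hypothetical stand-in for lpad: left-pad s with pad up to length n."""
    s = str(s)
    if len(s) >= n:
        return s[:n]                      # assumed: longer strings are cut to n
    fill = (pad * n)[: n - len(s)]        # repeat the pad string as needed
    return fill + s


conn = sqlite3.connect(":memory:")
conn.create_function("lpad", 3, lpad)

row = conn.execute("""
    select substr('Oracle', 2, 3),
           substr('Oracle', 2),
           substr('Oracle', -4, 3),
           lpad(70000, 7, '$')
""").fetchone()
print(row)
```

The three substr calls reproduce the strings rac, racle, and acl obtained above; the last column shows a five-digit number padded to seven characters with two dollar signs.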
Example 5.14.8 To print a list of all employees and their salaries (using the
tabular variables EMPHIST and PERSINFO) we can use the query:

select name, lpad(salary,7,'$') as ann_salary
from persinfo, emphist
where persinfo.empno = emphist.empno;

This will return the result:

NAME                                ANN_SAL
----------------------------------- -------
Natalia Martins                     $150000
Laura Schwartz                      $120000
John Soriano                        $120000
Kendall MacRae                      $100000
Rachel Anderson                     $$70000
Richard Laughlin                    $$70000
Danielle Craig                      $$90000
Abby Walsh                          $$75000
Bailey Burns                        $$70000

5.14.3 Date functions

SQL Plus contains a class of functions that apply to the DATE type: extract,
months_between, etc.
Example 5.14.9 The function extract computes a part of a date value. Its
first argument gives the desired date part; the second argument is the date
value. For instance, to obtain the year part of the appt_date attribute of the
table EMPHIST we write:
select empno, extract(year from appt_date) as start_y
from emphist;

This returns:
EMPNO      START_Y
---------- ----------
1000       1999
1005       1999
1010       2000
1015       1999
1020       1999
1025       2000
1030       2000
1035       2000
1040       2000

Similarly, we can obtain the month part of a date by writing

select empno, extract(month from appt_date)
       as start_m
from emphist;

This will return the result:

EMPNO      START_M
---------- ----------
1000       10
1005       10
1010       1
1015       10
1020       11
1025       3
1030       1
1035       2
1040       3

Example 5.14.10 To compute the number of months an employee has worked
we can use the function months_between. This will compute the number of
months between the current date (designated by the system-provided constant
SYSDATE) and the date of hire:
select empno, months_between(SYSDATE,appt_date)
       as month_served
from emphist;

The table returned by this query is:

EMPNO      MONTH_SERVED
---------- ------------
1000       35.8877397
1005       35.532901
1010       32.8877397
1015       35.1135461
1020       34.8877397
1025       30.5974171
1030       32.5974171
1035       31.2748365
1040       30.8877397

Arithmetic computations can be performed in the target list of any select.

Example 5.14.11 Suppose that a bonus is to be paid to the current employees,
that is, to those employees whose termination date is null. The bonus is
10% of the weekly salary (salary/52) multiplied by the number
of months employed. This is computed by
select empno, 0.1 * months_between(SYSDATE,appt_date) * salary/52 as bonus
from emphist
where term_date is null;

This query returns:

EMPNO      BONUS
---------- ----------
1000       10430.7253
1005       8262.69438
1010       7652.27254
1015       6804.93348
1020       4733.05642
1025       4155.51299
1030       5688.95627
1035       4550.04006
1040       4194.59488

5.15 Aggregate Functions in SQL

Aggregate functions are those functions that operate on sets of values. Typical
examples include: sum, avg, max, min, and count.
The first four functions operate on columns of tables and ignore null values.
The count function returns the number of elements of the set that is its argument.
Example 5.15.1 The following select construct determines the largest grade
obtained by the student whose student number is 1011. The function max is
applied to the set of grades of the student whose number is 1011 and returns
the largest value in this set:
select max(grade) as highgr from GRADES
where stno = 1011;

This returns the table:

HIGHGR
------
90

For instance, sum(A) returns the sum of all values of the selected nonnull
A-components of the tuples. Similarly, avg(A) returns the average value of the
same sequence. The expressions max(A) and min(A) yield the largest and the
smallest values in the set of A-components of the tuples selected by a query,
respectively.
The functions sum and avg apply to attributes whose domains are numerical
(such as integer or float); max and min apply to every kind of attribute.
If we wish to discard duplicate values from the sequences of values before
applying these functions, we need to use the word distinct. For instance,
sum(distinct A) considers only the distinct nonnull values that occur in the
sequence of components.
Example 5.15.2 We mentioned that the built-in functions max and min apply
to string domains as well as to numerical domains. We use this feature of these
functions to determine the first and the last student in alphabetical order:

select min(name) as first, max(name) as last
from STUDENTS;

This query yields the table:

FIRST            LAST
---------------- --------------
Edwards P. David Rawlings Jerry

Next, we show a select construct where the same functions are applied to
a numerical domain:
select min(grade) as lowgr,
       max(grade) as highgr from GRADES
where stno = 1011;

This generates the answer:

LOWGR      HIGHGR
---------- ----------
40         90

The query
select avg(grade) as avggr from GRADES
where stno = 1011

returns the table

AVGGR
-----
73.75

If we discard duplicate values as in

select avg(distinct grade) as avggr from GRADES
where stno = 1011

then the average grade is lower, indicating a preponderance of the higher grades
for this student:
AVGGR
-----
68.33
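The effect of distinct on avg can be verified by hand: the four grades of student 1011 are 40, 75, 90, and 90, so the plain average is 295/4 = 73.75, while the average of the distinct values 40, 75, 90 is 205/3, about 68.33. The sketch below repeats the computation, assuming Python's sqlite3 module in place of ORACLE and a GRADES table reduced to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES(stno integer, cno text, grade integer);
    insert into GRADES values (1011, 'cs110', 40);
    insert into GRADES values (1011, 'cs110', 75);
    insert into GRADES values (1011, 'cs210', 90);
    insert into GRADES values (1011, 'cs240', 90);
""")

# avg(grade) uses all four values; avg(distinct grade) drops the second 90.
avg_all, avg_distinct = conn.execute("""
    select avg(grade), avg(distinct grade)
    from GRADES
    where stno = 1011
""").fetchone()
print(avg_all, round(avg_distinct, 2))
```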

Built-in functions can be used in subqueries. This is illustrated by the next
example.
Example 5.15.3 To retrieve the students who obtained a grade higher than
the average grade in cs110 we write:

select stno from grades where cno = 'cs110'
and grade > all(select avg(grade) from grades
                where cno = 'cs110');

This returns the table:

STNO
----
2661
3566
5544

The count function can be used in several ways:


count(A) can be used to determine the number of non-null entries under
the attribute A;
count(distinct A) computes the number of distinct non-null values that
occur under A;
count(*) determines how many rows exist in a table.
Note that count(distinct *) cannot be used in SQL.
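The three forms of count differ only in how they treat nulls and duplicates. The following minimal sketch, which assumes Python's sqlite3 module and a made-up one-column table T, shows all three on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table T(A integer);
    insert into T values (1);
    insert into T values (1);
    insert into T values (2);
    insert into T values (null);
""")

# count(A): non-null entries; count(distinct A): distinct non-null values;
# count(*): all rows, including the one whose A-component is null.
n_nonnull, n_distinct, n_rows = conn.execute(
    "select count(A), count(distinct A), count(*) from T"
).fetchone()
print(n_nonnull, n_distinct, n_rows)
```

With the four rows above, count(A) is 3, count(distinct A) is 2, and count(*) is 4.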
Example 5.15.4 Here are several examples of the use of the count function.
To find how many students took cs110 in the fall semester of 2003, we write:
select count(cno) from GRADES
where cno = 'cs110' and
      sem = 'Fall' and
      year = 2003;

Since no records exist for any grades given during that semester in cs110, we
obtain the answer:
COUNT(CNO)
----------
0

Observe that this table has a system-supplied column name COUNT(CNO). This
happens because we did not provide a name using as.
Let us determine how many students have ever registered for any course. We
have to retrieve this result from GRADES, and we must use distinct to avoid
counting the same student several times (if the student took several courses):
select count(distinct stno) as nost
from GRADES;

This query returns the one-entry table:

NOST
----
8

Finally, let us determine the names of instructors who are teaching more
than one subject. For every instructor, we determine in a subquery the number
of courses taught. Then, we retain those instructors who taught more than one
course:
select name from INSTRUCTORS where
    1 < any (select count(distinct cno) from GRADES
             where empno = INSTRUCTORS.empno);

We obtain the table:

NAME
------------
Evans Robert
Will Samuel

5.16 Sorting Results

Data obtained from a select construct may be sorted on one or several columns
using the order by clause. This clause also gives the user the possibility of
opting for an ascending or descending sorting order on each of the columns. By
default, the ascending order is chosen.
Example 5.16.1 Suppose that we need to sort the GRADES tuples on the
student number. For each student, we sort the grades in descending order. This
can be done with the query:
select * from GRADES
order by stno, grade desc;

This results in the output shown next:

STNO       EMPNO       CNO   SEM    YEAR       GRADE
---------- ----------- ----- ------ ---------- ----------
1011       019         cs210 FALL   2003       90
1011       056         cs240 SPRING 2004       90
1011       023         cs110 SPRING 2003       75
1011       019         cs110 FALL   2002       40
2415       019         cs240 SPRING 2003       100
2661       234         cs310 SPRING 2004       100
2661       019         cs110 FALL   2002       80
2661       019         cs210 FALL   2003       70
3442       234         cs410 SPRING 2003       60
3566       019         cs240 SPRING 2003       100
3566       019         cs110 FALL   2002       95
3566       019         cs210 FALL   2003       90
4022       056         cs240 SPRING 2004       80
4022       234         cs310 SPRING 2004       75
4022       019         cs210 SPRING 2004       70
4022       023         cs110 SPRING 2003       60
5544       019         cs110 FALL   2002       100
5544       056         cs240 SPRING 2004       70
5571       019         cs210 SPRING 2004       85
5571       234         cs410 SPRING 2003       80
5571       019         cs240 SPRING 2003       50

Instead of using the name of the columns one could use their ordinal position
in the select phrase.
Example 5.16.2 An equivalent form of the query from Example 5.16.1 is
select stno, empno, cno, sem, year, grade
from GRADES
order by 1, 6 desc;

Ordering of the results can also be achieved by using expressions.


Example 5.16.3 To sort the grades based on the second digit of the course
number and then on the first digit of the course number (which are the fourth
and the third characters of course numbers, respectively) we write:
select * from grades
order by substr(cno,4,1), substr(cno,3,1);

This will return the following result:

STNO       EMPNO       CNO   SEM    YEAR       GRADE
---------- ----------- ----- ------ ---------- ----------
1011       019         cs110 FALL   2002       40
2661       019         cs110 FALL   2002       80
3566       019         cs110 FALL   2002       95
5544       019         cs110 FALL   2002       100
1011       023         cs110 SPRING 2003       75
4022       023         cs110 SPRING 2003       60
1011       019         cs210 FALL   2003       90
3566       019         cs210 FALL   2003       90
4022       019         cs210 SPRING 2004       70
5571       019         cs210 SPRING 2004       85
2661       019         cs210 FALL   2003       70
2661       234         cs310 SPRING 2004       100
4022       234         cs310 SPRING 2004       75
3442       234         cs410 SPRING 2003       60
5571       234         cs410 SPRING 2003       80
3566       019         cs240 SPRING 2003       100
4022       056         cs240 SPRING 2004       80
5571       019         cs240 SPRING 2003       50
5544       056         cs240 SPRING 2004       70
1011       056         cs240 SPRING 2004       90
2415       019         cs240 SPRING 2003       100

stno  empno  cno    sem     year  grade
1011  019    cs110  FALL    2002  40
2661  019    cs110  FALL    2002  80
3566  019    cs110  FALL    2002  95
5544  019    cs110  FALL    2002  100
1011  023    cs110  SPRING  2003  75
4022  023    cs110  SPRING  2003  60
1011  019    cs210  FALL    2003  90
3566  019    cs210  FALL    2003  90
4022  019    cs210  SPRING  2004  70
5571  019    cs210  SPRING  2004  85
2661  019    cs210  FALL    2003  70
3566  019    cs240  SPRING  2003  100
5571  019    cs240  SPRING  2003  50
1011  056    cs240  SPRING  2004  90
4022  056    cs240  SPRING  2004  80
5544  056    cs240  SPRING  2004  70
2415  019    cs240  SPRING  2003  100
2661  234    cs310  SPRING  2004  100
4022  234    cs310  SPRING  2004  75
3442  234    cs410  SPRING  2003  60
5571  234    cs410  SPRING  2003  80

Figure 5.2: Table Partitioned in Groups Based on cno

5.17 The Group-by Option

The group by clause serves to group together tuples of tables based on the
common value of an attribute or of a group of attributes. Suppose, for instance,
that we wish to partition the table GRADES into groups based on the course
number. This can be done by using a construct like
select ... from GRADES group by cno

Conceptually, we operate on the table shown in Figure 5.2. The reader should
imagine that the table has been divided into five groups, each corresponding
to one course. In the previous select, we left open the target list following
select. Once a table has been partitioned into groups (using group by), the
select construct that we use must return one or more atomic pieces of data for
every group. The term atomic, in this context, refers to simple pieces of data
(numbers, strings, etc.). By contrast, a set of values is not an atomic piece of
data. For instance, the number of students enrolled in each course can be listed
by:
select cno, count(stno) as totenr from GRADES
group by cno

This results in the table:

CNO   TOTENR
----- ----------
cs110 6
cs210 5
cs240 6
cs310 2
cs410 2

stno  empno  cno    sem     year  grade
1011  019    cs110  FALL    2002  40
2661  019    cs110  FALL    2002  80
3566  019    cs110  FALL    2002  95
5544  019    cs110  FALL    2002  100
1011  023    cs110  SPRING  2003  75
4022  023    cs110  SPRING  2003  60
1011  019    cs210  FALL    2003  90
3566  019    cs210  FALL    2003  90
2661  019    cs210  FALL    2003  70
5571  019    cs210  SPRING  2004  85
4022  019    cs210  SPRING  2004  70
3566  019    cs240  SPRING  2003  100
5571  019    cs240  SPRING  2003  50
2415  019    cs240  SPRING  2003  100
5544  056    cs240  SPRING  2004  70
1011  056    cs240  SPRING  2004  90
4022  056    cs240  SPRING  2004  80
2661  234    cs310  SPRING  2004  100
4022  234    cs310  SPRING  2004  75
3442  234    cs410  SPRING  2003  60
5571  234    cs410  SPRING  2003  80

Figure 5.3: Table Partitioned in Groups Based on cno, sem, year

It would be an error to write a select like:


select cno, stno from GRADES
group by cno

because more than one student is enrolled in a course, and therefore the entries
of the result under the attribute stno would be sets of values rather than simple
values. SQL enforces the atomicity of the data generated by a select with
group by by demanding that any component of the target list of such a select
must be either one of the grouping attributes or a built-in function.
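The rule on target lists can be illustrated with a short sketch, assuming Python's sqlite3 module and a reduced GRADES table holding five of the rows of our instance. Every item in the target list is either the grouping attribute cno or an aggregate that yields one value per group. Note that SQLite, unlike ORACLE, happens to tolerate a bare stno in such a target list and returns an arbitrary value from the group, so the erroneous query is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES(stno integer, cno text);
    insert into GRADES values (1011, 'cs110');
    insert into GRADES values (2661, 'cs110');
    insert into GRADES values (3566, 'cs110');
    insert into GRADES values (2661, 'cs310');
    insert into GRADES values (4022, 'cs310');
""")

# One row per group: the grouping attribute plus atomic summaries of stno.
per_course = conn.execute("""
    select cno, count(stno) as totenr, min(stno), max(stno)
    from GRADES
    group by cno
    order by cno
""").fetchall()
print(per_course)
```

Each group contributes exactly one row, for instance ('cs110', 3, 1011, 3566) for the three students recorded under cs110.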
Example 5.17.1 Grouping can be done on more than one attribute. Suppose
that now we are interested not in the total enrollment but, rather, in the enrollment numbers for each offering of the courses, that is, in the numbers during
every semester of every year. This can be done using the select construction:
select cno, sem, year, count(stno) as enrol
from GRADES
group by cno, year, sem
order by cno, sem, year;

Conceptually, the grouping results in the groups shown in Figure 5.3.
Then, the query generates the answer:

CNO   SEM    YEAR       ENROL
----- ------ ---------- ----------
cs110 FALL   2002       4
cs110 SPRING 2003       2
cs210 FALL   2003       3
cs210 SPRING 2004       2
cs240 SPRING 2003       3
cs240 SPRING 2004       3
cs310 SPRING 2004       2
cs410 SPRING 2003       2

Example 5.17.2 The next select construct determines the average grade and
the number of courses taken by every student and sorts the results in ascending
order on the student number:
select stno, avg(grade) as average,
count(cno) as ncourses
from GRADES
group by stno
order by stno;

We obtain the result:

STNO       AVERAGE    NCOURSES
---------- ---------- ----------
1011       73.75      4
2415       100        1
2661       83.33      3
3442       60         1
3566       95         3
4022       71.25      4
5544       85         2
5571       71.66      3

Grouping can be applied in combination with selection. In such cases, selection is applied first and the resulting rows are grouped.
Example 5.17.3 The select construct that follows determines the average
grade in cs110 during successive offerings of this course:
select sem, year, avg(grade) from GRADES
where cno = 'cs110'
group by sem, year
order by year, sem;

The result of this query is:

SEM    YEAR       AVG(GRADE)
------ ---------- ----------
FALL   2002       78.75
SPRING 2003       67.5

It is possible to operate a selection on groups rather than on rows using
the clause having. The condition that follows having must be formulated to
include only data that have an atomic character for every group.
Example 5.17.4 Let us determine the average grade obtained in courses that
are taken by more than two students. After grouping the tuples of GRADES on
cno, we retain the groups that include more than two students by applying the
clause having count(grade) > 2:
select cno, avg(grade) from GRADES
group by cno
having count(grade) > 2
order by cno;

This query returns the table:

CNO   AVG(GRADE)
----- ----------
cs110 75
cs210 81
cs240 81.66
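The interplay of group by and having can be reproduced on a fragment of the data. The sketch below assumes Python's sqlite3 module; it loads the six cs110 grades and the two cs310 grades of our instance, so that one group passes the having condition and the other does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES(stno integer, cno text, grade integer);
    insert into GRADES values (1011, 'cs110', 40);
    insert into GRADES values (2661, 'cs110', 80);
    insert into GRADES values (3566, 'cs110', 95);
    insert into GRADES values (5544, 'cs110', 100);
    insert into GRADES values (1011, 'cs110', 75);
    insert into GRADES values (4022, 'cs110', 60);
    insert into GRADES values (2661, 'cs310', 100);
    insert into GRADES values (4022, 'cs310', 75);
""")

# Rows are grouped first; having then discards whole groups.
result = conn.execute("""
    select cno, avg(grade) from GRADES
    group by cno
    having count(grade) > 2
    order by cno
""").fetchall()
print(result)
```

The cs310 group has only two grades and is discarded as a whole, while cs110 is reported with the average 450/6 = 75.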

The group by option offers another approach to solving division. To divide
the tabular variable T, whose heading is $A_1 \cdots A_m B_1 \cdots B_n$, by the tabular
variable S, whose heading is $B_1 \cdots B_n$, we compute the number k of distinct
rows in S. Then, we seek to retrieve those m-tuples $(a_1, \ldots, a_m)$ that occur in
T and are associated in that tabular variable with all k distinct tuples of S.
Example 5.17.5 Recall that in Example 5.12.3 we solved the query "Find
names of instructors who have taught every 100-level course, that is, every
course whose first digit of the course number is 1" by implementing division in
SQL.
Here we determine the number of 100-level courses and then we seek the employee numbers that are associated with all these courses in the GRADES table:
select name
from INSTRUCTORS,
     (select empno from GRADES
      where cno like 'cs1__'
      group by empno
      having count(distinct cno) =
             all(select count(distinct cno)
                 from COURSES
                 where cno like 'cs1__')) E
where INSTRUCTORS.empno = E.empno;

As expected, this will return the same result as the query discussed in Example 5.12.3.
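The counting approach to division can be sketched as follows, assuming Python's sqlite3 module. SQLite does not support the all quantifier, so the sketch compares the count with a scalar subquery instead, which is equivalent here because the subquery returns a single value. The instructors e1, e2, e3 and the course cs120 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table COURSES(cno text primary key);
    create table GRADES(stno text, empno text, cno text);
    insert into COURSES values ('cs110');
    insert into COURSES values ('cs120');
    insert into COURSES values ('cs210');
    -- e1 taught both 100-level courses, e2 only one, e3 none
    insert into GRADES values ('s1', 'e1', 'cs110');
    insert into GRADES values ('s2', 'e1', 'cs120');
    insert into GRADES values ('s3', 'e1', 'cs210');
    insert into GRADES values ('s1', 'e2', 'cs110');
    insert into GRADES values ('s2', 'e3', 'cs210');
""")

# Keep the instructors whose number of distinct 100-level courses taught
# equals the total number k of 100-level courses.
instructors = conn.execute("""
    select empno from GRADES
    where cno like 'cs1__'
    group by empno
    having count(distinct cno) =
           (select count(distinct cno) from COURSES
            where cno like 'cs1__')
    order by empno
""").fetchall()
print(instructors)
```

Here k = 2, and only e1 is associated with two distinct 100-level courses.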

5.17.1 The decode and case Functions

The function decode is typically used with four arguments and has the syntax:
decode(value, search_value, result, default_value)
The value returned by this function is:
$$\mathrm{decode}(x, s, r, d) = \begin{cases} r & \text{if } x = s \\ d & \text{otherwise.} \end{cases}$$
Example 5.17.6 A course is defined as introductory if its first digit is one.
Using the decode function we can print a list of students and the courses they
took followed by an indication of the level of each course using the query:
select stno,cno,
       decode(substr(cno,3,1),'1','Introductory course','Advanced course')
from grades;

Note that the first digit of the course number is the third character of the cno
value; this digit is extracted by the function substr previously discussed. The
query yields the following result:
STNO       CNO   DECODE(SUBSTR(CNO,3
---------- ----- -------------------
1011       cs110 Introductory course
1011       cs110 Introductory course
1011       cs210 Advanced course
1011       cs240 Advanced course
2415       cs240 Advanced course
2661       cs110 Introductory course
2661       cs210 Advanced course
2661       cs310 Advanced course
3442       cs410 Advanced course
3566       cs110 Introductory course
3566       cs210 Advanced course
3566       cs240 Advanced course
4022       cs110 Introductory course
4022       cs210 Advanced course
4022       cs240 Advanced course
4022       cs310 Advanced course
5544       cs110 Introductory course
5544       cs240 Advanced course
5571       cs210 Advanced course
5571       cs240 Advanced course
5571       cs410 Advanced course

The decode function may accept multiple-choice arguments as in
decode(value, search_value, result, [search_value, result,] default_value)
This variant of decode is defined by:
$$\mathrm{decode}(x, s_1, r_1, \ldots, s_n, r_n, d) = \begin{cases} r_i & \text{if } x = s_i \text{ for } 1 \le i \le n \\ d & \text{otherwise.} \end{cases}$$

Example 5.17.7 The following variant of the previous query will print 'First
year course', 'Second year course', etc., depending on the first digit of the course
number:
select stno,cno,
       decode(substr(cno,3,1),'1','First year course',
                              '2','Second year course',
                              '3','Third year course',
                              '4','Fourth year course',
                              'Special course')
from grades;

The result returned by this query is:

STNO       CNO   DECODE(SUBSTR(CNO,
---------- ----- ------------------
1011       cs110 First year course
1011       cs110 First year course
1011       cs210 Second year course
1011       cs240 Second year course
2415       cs240 Second year course
2661       cs110 First year course
2661       cs210 Second year course
2661       cs310 Third year course
3442       cs410 Fourth year course
3566       cs110 First year course
3566       cs210 Second year course
3566       cs240 Second year course
4022       cs110 First year course
4022       cs210 Second year course
4022       cs240 Second year course
4022       cs310 Third year course
5544       cs110 First year course
5544       cs240 Second year course
5571       cs210 Second year course
5571       cs240 Second year course
5571       cs410 Fourth year course

The function case is an ANSI-compliant, more powerful analogue of decode. It
can be used in two formats; either as:
case value
    when search_value then result
    [when search_value then result]
    else default_value
end
or as:
case when condition then result
    [when condition then result]
    else default_value
end
In the first format the function returns the result that corresponds to the search
value that matches the first argument; in the second format, case returns the
result that corresponds to the first condition that is satisfied.
Example 5.17.8 Using case we can give an alternate solution to the query
solved in Example 5.17.7:
select stno,cno,
       case substr(cno,3,1)
            when '1' then 'First year course'
            when '2' then 'Second year course'
            when '3' then 'Third year course'
            when '4' then 'Fourth year course'
            else 'Special course'
       end
from grades;
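Since case is part of standard SQL, the first format can be tried outside ORACLE as well. The following minimal sketch assumes Python's sqlite3 module and passes a few course numbers as query parameters instead of reading them from GRADES; the course number cs900 is made up to exercise the else branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simple case: the third character of the course number selects the label.
query = """
    select case substr(?, 3, 1)
             when '1' then 'First year course'
             when '2' then 'Second year course'
             when '3' then 'Third year course'
             when '4' then 'Fourth year course'
             else 'Special course'
           end
"""
labels = {
    cno: conn.execute(query, (cno,)).fetchone()[0]
    for cno in ("cs110", "cs240", "cs410", "cs900")
}
for cno, label in labels.items():
    print(cno, label)
```

The digit 9 of cs900 matches none of the search values, so the default 'Special course' is returned for it.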

Example 5.17.9 Suppose that the minimal passing grade is 60 for the first
and second year courses and 70 for the third and fourth year courses. We wish
to print a report that prints 'Passed' or 'Failed' depending on the grade and
level of the course. This can be done with the following query:
select stno,cno, grade,
       case when (substr(cno,3,1) in ('1','2') and grade >= 60) or
                 (substr(cno,3,1) in ('3','4') and grade >= 70)
            then 'Passed'
            else 'Failed'
       end
from grades;

The result returned by this query is:

STNO       CNO   GRADE      CASEWH
---------- ----- ---------- ------
1011       cs110 40         Failed
2661       cs110 80         Passed
3566       cs110 95         Passed
5544       cs110 100        Passed
1011       cs110 75         Passed
4022       cs110 60         Passed
3566       cs240 100        Passed
5571       cs240 50         Failed
2415       cs240 100        Passed
3442       cs410 60         Failed
5571       cs410 80         Passed
1011       cs210 90         Passed
2661       cs210 70         Passed
3566       cs210 90         Passed
5571       cs210 85         Passed
4022       cs210 70         Passed
5544       cs240 70         Passed
1011       cs240 90         Passed
4022       cs240 80         Passed
2661       cs310 100        Passed
4022       cs310 75         Passed

5.17.2 The rollup and cube Extensions of group by

For analyzing complex data, we often wish to partition data into blocks and then
calculate subtotals for these blocks. For example, we may wish to analyze sales
data by geographical region, so we want to calculate values for New England, the
Midwest, the South, etc. Such analyses are facilitated by ORACLE's rollup
extension of group by.
Example 5.17.10 Suppose that we need to print a report summarizing the
number of grades given in every course by every instructor. We wish to print
subtotals for every course and then a general total for all courses. This can be
done in SQL using three subqueries (the first two containing a group by clause) as
follows:
select cno,empno,count(grade)
from grades
group by cno,empno
union
select cno,'',count(grade)
from grades
group by cno
union
select '','',count(grade)
from grades;

The result of this query is given below:


CNO   EMPNO       COUNT(GRADE)
----- ----------- ------------
cs110 019                    4
cs110 023                    2
cs110                        6
cs210 019                    5
cs210                        5
cs240 019                    3
cs240 056                    3
cs240                        6
cs310 234                    2
cs310                        2
cs410 234                    2
cs410                        2
                            21

It is clear that the execution of this query entails three scans of the table
GRADES followed by the computation of the unions. The result is sorted because
of the use of the union operation.
In SQL Plus we can replace the cumbersome query used in Example 5.17.10
by:
select cno,empno,count(grade)
from grades
group by rollup(cno,empno);

which produces exactly the same result. Note that after the number of grades
for the first two groups are reported in the first two detail rows a blank is printed
for the empno of the third row; this is the rollup way of indicating that this
row contains the subtotal number of grades for the course cs110. A new detail
row follows for cs210 and, since this course is taught only by the employee 019,
the next row contains a subtotal for this course, etc. Finally, the last row, with
blank for the first two columns is the total number of grades for all courses.
We conclude that the rollup extension of group by generates subtotals in
increasing order of aggregation until all expressions in the group by clause are
rolled up.
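The way rollup builds its subtotals can be mimicked outside SQL. The Python sketch below is only an illustration and makes no claim about how ORACLE evaluates the query: the function rollup_counts and the sample rows are hypothetical, and None plays the role of the blank (null) entries of subtotal rows. It makes a single pass over the rows and updates one counter per aggregation level.

```python
from collections import Counter

def rollup_counts(rows, keys):
    """Count rows per group at every rollup level of `keys`.

    Level k keeps the first k keys and replaces the remaining ones by None,
    which mimics the blank cells printed for subtotal rows.
    """
    counts = Counter()
    for row in rows:
        for level in range(len(keys), -1, -1):
            kept = tuple(row[k] for k in keys[:level])
            group = kept + (None,) * (len(keys) - level)
            counts[group] += 1
    return counts

# Hypothetical sample: one (cno, empno) pair per grade record.
sample = (
    [{"cno": "cs110", "empno": "019"}] * 4
    + [{"cno": "cs110", "empno": "023"}] * 2
    + [{"cno": "cs210", "empno": "019"}] * 5
)

result = rollup_counts(sample, ["cno", "empno"])

# Print detail rows first and subtotals (None) last within each block.
def nulls_last(group):
    return tuple((value is None, value or "") for value in group)

for group in sorted(result, key=nulls_last):
    print(group, result[group])
```

For n grouping attributes the sketch maintains n + 1 levels, from the full detail rows down to the grand total, which is the same set of levels that rollup reports.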
Example 5.17.11 The next example uses three grouping attributes cno, empno,
stno:
select cno,empno,stno,count(grade)
from grades
group by rollup(cno,empno,stno)

This generates the following result:


CNO   EMPNO       STNO       COUNT(GRADE)
----- ----------- ---------- ------------
cs110 019         1011                  1
cs110 019         2661                  1
cs110 019         3566                  1
cs110 019         5544                  1
cs110 019                               4
cs110 023         1011                  1
cs110 023         4022                  1
cs110 023                               2
cs110                                   6
cs210 019         1011                  1
cs210 019         2661                  1
cs210 019         3566                  1
cs210 019         4022                  1
cs210 019         5571                  1
cs210 019                               5
cs210                                   5
cs240 019         2415                  1
cs240 019         3566                  1
cs240 019         5571                  1
cs240 019                               3
cs240 056         1011                  1
cs240 056         4022                  1
cs240 056         5544                  1
cs240 056                               3
cs240                                   6
cs310 234         2661                  1
cs310 234         4022                  1
cs310 234                               2
cs310                                   2
cs410 234         3442                  1
cs410 234         5571                  1
cs410 234                               2
cs410                                   2
                                       21

The order in which attributes are rolled up influences the result of the query
as the next example shows:
Example 5.17.12 Suppose that we invert the grouping attributes cno and
empno as in
select empno,cno, count(grade)
from grades
group by rollup(empno,cno);

This will result in:


EMPNO       CNO   COUNT(GRADE)
----------- ----- ------------
019         cs110            4
019         cs210            5
019         cs240            3
019                         12
023         cs110            2
023                          2
056         cs240            3
056                          3
234         cs310            2
234         cs410            2
234                          4
                            21

Note that this time the subtotals are computed for every employee, and then,
for all employees.
Partial rollups, that is, rollups that involve only a subset of the grouping
attributes, are always possible as shown in the next example.
Example 5.17.13 Suppose that we need to count the number of times a student takes a course and the number of course offerings a student took. This can
be achieved by:


select stno,cno,count(grade) from grades
group by stno,rollup(cno);

which generates the following result:


STNO       CNO   COUNT(GRADE)
---------- ----- ------------
1011       cs110            2
1011       cs210            1
1011       cs240            1
1011                        4
2415       cs240            1
2415                        1
2661       cs110            1
2661       cs210            1
2661       cs310            1
2661                        3
3442       cs410            1
3442                        1
3566       cs110            1
3566       cs210            1
3566       cs240            1
3566                        3
4022       cs110            1
4022       cs210            1
4022       cs240            1
4022       cs310            1
4022                        4
5544       cs110            1
5544       cs240            1
5544                        2
5571       cs210            1
5571       cs240            1
5571       cs410            1
5571                        3

This shows that the student whose number is 1011 took four course offerings
and repeated cs110. Note that for a partial rollup no general total is produced.
The rollup extension is especially useful when there exists a natural order
on the attributes of a table, as in the next example.
Example 5.17.14 Suppose that we have the table SALES that contains records
of sales in a chain of department stores that is present in several regions of the
country: the North East (NE), South East (SE), and Midwest (MW).
REGION     ST CITY               STORENO   SALESVOL
---------- -- --------------- ---------- ----------
NE         NY New York City           55       1000
NE         NY New York City           67        800
NE         NY Syracuse                90        600
NE         MA Worcester               41       1000
NE         MA Boston                  83        750
SE         FL Miami                   62        450
SE         FL Miami                   74        900
SE         GA Atlanta                 60        500
SE         GA Atlanta                 52       1100
SE         GA Augusta                 95        300
MW         OH Athens                  48        590
MW         KS Topeka                  33        860
MW         KS Lawrence                72        300
MW         KS Lawrence                09        700
MW         KS Wichita                 38        900

Clearly, we have a geographical hierarchy of the attributes of this table:
region, st, city, and storeno. To study the total sales in each state we can use
the following rollup query:
select region, state, sum(salesvol)
from sales
group by rollup(region,state);

This yields the following result:


REGION     ST SUM(SALESVOL)
---------- -- -------------
MW         KS          2760
MW         OH           590
MW                     3350
NE         MA          1750
NE         NY          2400
NE                     4150
SE         FL          1350
SE         GA          1900
SE                     3250
                      10750

We may want to drill down in the hierarchy of attributes to analyze the
sales in each city. This is accomplished by:
select region, state, city, sum(salesvol)
from sales
group by rollup(region, state, city)

This yields the following result:


REGION     ST CITY            SUM(SALESVOL)
---------- -- --------------- -------------
MW         KS Lawrence                 1000
MW         KS Topeka                    860
MW         KS Wichita                   900
MW         KS                          2760
MW         OH Athens                    590
MW         OH                           590
MW                                     3350
NE         MA Boston                    750
NE         MA Worcester                1000
NE         MA                          1750
NE         NY New York City            1800
NE         NY Syracuse                  600
NE         NY                          2400
NE                                     4150
SE         FL Miami                    1350
SE         FL                          1350
SE         GA Atlanta                  1600
SE         GA Augusta                   300
SE         GA                          1900
SE                                     3250

Another useful extension of group by is cube. The rollup extension summarizes at increasing levels of aggregation from left to right; in contrast, cube
summarizes at all possible levels of aggregation.
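The difference between the two extensions lies in the set of aggregation levels. The Python sketch below is a hypothetical illustration (the function cube_counts and the sample rows are not part of the text, and None stands for a blank summary cell): it enumerates every subset of the grouping attributes, so that n attributes yield 2^n levels, whereas a rollup yields only n + 1.

```python
from collections import Counter
from itertools import combinations

def cube_counts(rows, keys):
    """Count rows per group for every subset of `keys` (all cube levels)."""
    positions = range(len(keys))
    # Each subset of key positions is one aggregation level of the cube.
    levels = [set(c) for size in range(len(keys) + 1)
              for c in combinations(positions, size)]
    counts = Counter()
    for row in rows:
        for kept in levels:
            group = tuple(row[k] if i in kept else None
                          for i, k in enumerate(keys))
            counts[group] += 1
    return counts

# Hypothetical sample: one (cno, empno) pair per grade record.
sample = (
    [{"cno": "cs110", "empno": "019"}] * 4
    + [{"cno": "cs110", "empno": "023"}] * 2
    + [{"cno": "cs210", "empno": "019"}] * 5
)

cube = cube_counts(sample, ["cno", "empno"])
for group, count in cube.items():
    print(group, count)
```

Compared with a rollup on (cno, empno), the additional groups are those in which cno is blank and empno is not, that is, the totals per employee.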
Example 5.17.15 A full aggregation can be achieved by using cube as in:
select cno,empno,count(grade)
from grades
group by cube(cno,empno);

This will produce the following results:


CNO   EMPNO       COUNT(GRADE)
----- ----------- ------------
cs110 019                    4
cs110 023                    2
cs110                        6
cs210 019                    5
cs210                        5
cs240 019                    3
cs240 056                    3
cs240                        6
cs310 234                    2
cs310                        2
cs410 234                    2
cs410                        2
      019                   12
      023                    2
      056                    3
      234                    4
                            21

The order of aggregation of the attributes influences the presentation of the
result. For example, the query:
select empno,cno,count(grade)
from grades
group by cube(empno,cno);

will result in
EMPNO       CNO   COUNT(GRADE)
----------- ----- ------------
019         cs110            4
019         cs210            5
019         cs240            3
019                         12
023         cs110            2
023                          2
056         cs240            3
056                          3
234         cs310            2
234         cs410            2
234                          4
            cs110            6
            cs210            5
            cs240            6
            cs310            2
            cs410            2
                            21

The totals computed by either of these cubes are shown in Figure 5.4.
Partial cube aggregations include group by clauses of the form
group by $A_1, \ldots, A_k$, cube($B_1, \ldots, B_\ell$)
and compute total values of an aggregate function for all groups that can be
obtained for values of $A_1, \ldots, A_k$ and all combinations of values of $B_1, \ldots, B_\ell$.
Example 5.17.16 The partial cube aggregation:
select cno,empno,stno,count(grade) from grades
group by cno,cube(empno,stno)

yields the following result:


CNO   EMPNO       STNO       COUNT(GRADE)
----- ----------- ---------- ------------
cs110 019         1011                  1
cs110 019         2661                  1
cs110 019         3566                  1
cs110 019         5544                  1
cs110 019                               4
cs110 023         1011                  1
cs110 023         4022                  1
cs110 023                               2
cs110             1011                  2
cs110             2661                  1
cs110             3566                  1
cs110             4022                  1
cs110             5544                  1
cs110                                   6
cs210 019         1011                  1
cs210 019         2661                  1
cs210 019         3566                  1
cs210 019         4022                  1
cs210 019         5571                  1
cs210 019                               5
cs210             1011                  1
cs210             2661                  1
cs210             3566                  1
cs210             4022                  1
cs210             5571                  1
cs210                                   5
cs240 019         2415                  1
cs240 019         3566                  1
cs240 019         5571                  1
cs240 019                               3
cs240 056         1011                  1
cs240 056         4022                  1
cs240 056         5544                  1
cs240 056                               3
cs240             1011                  1
cs240             2415                  1
cs240             3566                  1
cs240             4022                  1
cs240             5544                  1
cs240             5571                  1
cs240                                   6
cs310 234         2661                  1
cs310 234         4022                  1
cs310 234                               2
cs310             2661                  1
cs310             4022                  1
cs310                                   2
cs410 234         3442                  1
cs410 234         5571                  1
cs410 234                               2
cs410             3442                  1
cs410             5571                  1
cs410                                   2

53 rows selected.

[Figure 5.4: Totals computed by the aggregate cube(cno,empno). Axes: cno (cs110 to cs410) and empno (019, 023, 056, 234), with the totals for each course, the totals for each employee, and the overall total.]

The grouping function allows us to identify those rows in a cube or rollup
that serve to summarize other rows and, therefore, contain null components.
Namely, grouping(A) returns 1 for those A-components of rows that contain null
values and 0 otherwise.
Example 5.17.17 Consider again the cube query discussed in Example 5.17.15.
The summarization query supplemented by the use of the function grouping:
select cno, empno, count(grade) as nogr,
grouping(cno) as c, grouping(empno) as e
from grades
group by cube(cno,empno)

returns the following results:


CNO   EMPNO             NOGR          C          E
----- ----------- ---------- ---------- ----------
cs110 019                  4          0          0
cs110 023                  2          0          0
cs110                      6          0          1
cs210 019                  5          0          0
cs210                      5          0          1
cs240 019                  3          0          0
cs240 056                  3          0          0
cs240                      6          0          1
cs310 234                  2          0          0
cs310                      2          0          1
cs410 234                  2          0          0
cs410                      2          0          1
      019                 12          1          0
      023                  2          1          0
      056                  3          1          0
      234                  4          1          0
                          21          1          1

17 rows selected.

In turn, we can use the grouping values and the having clause to retain only
certain summary rows as in
select cno, empno, count(grade) as nogr,
grouping(cno) as c, grouping(empno) as e
from grades
group by cube(cno,empno)
having grouping(cno) = 1 or grouping(empno) = 1

This query returns:


CNO   EMPNO             NOGR          C          E
----- ----------- ---------- ---------- ----------
cs110                      6          0          1
cs210                      5          0          1
cs240                      6          0          1
cs310                      2          0          1
cs410                      2          0          1
      019                 12          1          0
      023                  2          1          0
      056                  3          1          0
      234                  4          1          0
                          21          1          1

5.18 Analytical Capabilities of SQL Plus

ORACLE includes enhancements to SQL called analytic functions that allow it
to produce quite refined reports. These features reduce the need to use external
reporting tools and simplify statistical data analysis.
Analytic functions compute a value for each row of a query. These values
are, in turn, based on a set of rows that is computed for each row and may be
considered to appear in a sliding window. This set of rows is known as a window
and it is specified by the analytical clause, which is the parenthesized expression
that follows the reserved word over.
An example of the use of an analytic function (which we discuss in detail
in Example 5.18.2) is the following query that computes a list of students, the
number of courses they took, and their grade point average.
select STUDENTS.name, GA.noc as no_of_c, GA.gpa as gpa,
rank() over (partition by GA.noc
order by GA.gpa desc) as rank
from (select stno,
count(distinct cno) as noc,
avg(grade) as gpa
from GRADES
group by stno) GA, STUDENTS
where STUDENTS.stno = GA.stno

The function rank that we use in this query computes for each row a numerical rank starting from the content of the window.
The analytical clause used in the previous example indicates that the rows
retrieved by the query are partitioned based on the value of the number of
courses (noc) and then, in each group, the rows are ordered according to the
values of the gpa attribute.
In general, the computation of the analytical clause is done after the computation of the from, where, group by, and having clauses.
Analytic functions are classified as shown in the table below:

FUNCTION CLASS          USAGE
Ranking Functions       Calculating ranks, percentiles and n-tiles
Windowing Functions     Cumulative and moving averages
Reporting Functions     Calculating shares
Lag/Lead Functions      Finding a value in a row located a specified
                        number of rows from the current row
Statistical Functions   Linear regression and other statistics

Processing involving analytic functions involves three phases:


1. computation of products, selections, grouping, and having clauses;
2. application of analytic functions to the resulting sets of rows;
3. processing of the final order by clauses.
The results of the first phase can be partitioned. Partitions are created after
the groups defined by the group by clauses and, thus, may use any aggregate
functions such as sum, count, etc.
For each row in a partition, a sliding window of data may be defined. The
window determines a sequence of rows that is used to perform calculations for
the current row. Window sizes can be specified as numbers of rows or can be
determined by intervals in a domain. Either end of a window or both ends can
move, depending on the definition of the window.
Each computation involving an analytic function is based on a current row.
This row serves as reference for the ends of the window.

5.18.1 Ranking Functions

SQL Plus contains the ranking functions rank() and dense_rank() that can
be used to rank tuples in an order determined by certain attributes or expressions. Both functions generate ranks in either ascending or descending order,
but dense_rank() does not leave gaps in rank numbers when a tie occurs. The
default order is, as usual, ascending order.
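The difference between the two ranking schemes can be stated compactly: with gaps, the rank of a value is one plus the number of values that strictly precede it; without gaps, it is the position of the value among the distinct values. The Python sketch below illustrates this under stated assumptions; the helper names rank and dense_rank and the list of grades are chosen for illustration and this is not the way ORACLE computes ranks.

```python
def rank(values, descending=False):
    """Rank with gaps: position of the first occurrence in sorted order."""
    ordered = sorted(values, reverse=descending)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]

def dense_rank(values, descending=False):
    """Rank without gaps: position among the distinct values."""
    distinct = sorted(set(values), reverse=descending)
    position = {value: i for i, value in enumerate(distinct, start=1)}
    return [position[value] for value in values]

# Illustrative grades with ties at 100 and at 90.
grades = [100, 100, 100, 100, 95, 90, 90, 90, 85]
gap_ranks = rank(grades, descending=True)
dense_ranks = dense_rank(grades, descending=True)
print(gap_ranks)    # [1, 1, 1, 1, 5, 6, 6, 6, 9]
print(dense_ranks)  # [1, 1, 1, 1, 2, 3, 3, 3, 4]
```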
Example 5.18.1 To rank the grade records based on the grade obtained in
any course we may write:
select stno, grade,
rank() over (order by grade)
from grades;

This will return the result:


STNO            GRADE RANK
---------- ---------- ----
1011               40    1
5571               50    2
4022               60    3
3442               60    3
2661               70    5
5544               70    5
4022               70    5
1011               75    8
4022               75    8
2661               80   10
5571               80   10
4022               80   10
5571               85   13
1011               90   14
1011               90   14
3566               90   14
3566               95   17
5544              100   18
2415              100   18
2661              100   18
3566              100   18

where the highest ranking is attributed to the grade record that involves the
lowest grade. To reverse the ranking we write:
select stno, grade,
rank() over (order by grade desc)
from grades;

which yields:
STNO            GRADE RANK
---------- ---------- ----
5544              100    1
3566              100    1
2415              100    1
2661              100    1
3566               95    5
1011               90    6
1011               90    6
3566               90    6
5571               85    9
2661               80   10
5571               80   10
4022               80   10
1011               75   13
4022               75   13
2661               70   15
5544               70   15
4022               70   15
4022               60   18
3442               60   18
5571               50   20
1011               40   21

Note that the first four grade records are tied for the first place; therefore,
the record that follows the tied records has rank 5. With dense_rank(), all
four tied records have rank 1 and the record that follows has rank 2.
This can be achieved by writing:
select stno, grade,
dense_rank() over (order by grade desc) as den_rank
from grades;

This query returns:


STNO            GRADE DEN_RANK
---------- ---------- --------
5544              100        1
3566              100        1
2415              100        1
2661              100        1
3566               95        2
1011               90        3
1011               90        3
3566               90        3
5571               85        4
2661               80        5
5571               80        5
4022               80        5
1011               75        6
4022               75        6
2661               70        7
5544               70        7
4022               70        7
4022               60        8
3442               60        8
5571               50        9
1011               40       10

It is possible to use aggregate functions in computing rankings.


Example 5.18.2 To rank the students in order of the number of courses they
have taken we could write:
select STUDENTS.name, GA.noc as no_of_courses,
dense_rank() over (order by GA.noc desc) as den_rank
from (select stno, count(distinct cno) as noc
from grades
group by stno) GA, STUDENTS
where STUDENTS.stno = GA.stno;

This generates the result:


NAME                 NO_OF_COURSES DEN_RANK
-------------------- ------------- --------
Prior Lorraine                   4        1
Edwards P. David                 3        2
Mixon Leatha                     3        2
Pierce Richard                   3        2
Lewis Jerry                      3        2
Rawlings Jerry                   2        3
Grogan A. Mary                   1        4
Novak Roland                     1        4

If we wish to rank the students based on the number of courses and, then,
at an equal number of courses, to rank them in the order of the grade point
average, we could write the following query:
select STUDENTS.name, GA.noc as no_of_c, GA.gpa as gpa,
rank() over (partition by GA.noc
order by GA.gpa desc) as rank
from (select stno,
count(distinct cno) as noc,
avg(grade) as gpa
from GRADES
group by stno) GA, STUDENTS
where STUDENTS.stno = GA.stno

The partition by option establishes groups of equal GA.noc value, and then it
ranks the record in each such group using the order by clause. The result of
this query is:
NAME                    NO_OF_C        GPA       RANK
-------------------- ---------- ---------- ----------
Grogan A. Mary                1        100          1
Novak Roland                  1         60          2
Rawlings Jerry                2         85          1
Pierce Richard                3         95          1
Mixon Leatha                  3      83.33          2
Edwards P. David              3      73.75          3
Lewis Jerry                   3      71.66          4
Prior Lorraine                4      71.25          1
8 rows selected.

In general, the expression in the partition by clause divides the set of rows
that results from the query in groups and the rank() function operates within
these groups; in other words, rank() is reset when the defining expression of the
group changes. The order by clause attached to the rank specifies the ranking
criterion and the order of the rows in each group.

5.18.2 Top-n Queries

Top-n queries ask for the n largest or smallest values of a column. Such queries
are solved in ORACLE using the pseudo-attribute ROWNUM which assigns a value
starting with 1 to each of the rows returned by a subquery. Thus, a top-n query
in SQL Plus requires the following elements:
1. a subquery containing the order by clause that ensures that the rows
retrieved by the subquery are placed in the proper order;
2. the main query that includes the ROWNUM pseudo-attribute and may include
a where clause to specify the number of returned rows.
Example 5.18.3 To retrieve the top three students in the order of their grade
point averages we write:
select ROWNUM as rank, name, avgg from
(select STUDENTS.stno, STUDENTS.name, avg(grade) as avgg
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
group by STUDENTS.stno, STUDENTS.name
order by avg(grade) desc)
where ROWNUM <= 3

This will return:


RANK   NAME            AVGG
------ --------------- -------
     1 Grogan A. Mary      100
     2 Pierce Richard       95
     3 Rawlings Jerry       85

To retrieve the bottom three students, all we need to do is to invert the ordering
in the subquery. This can be achieved by either replacing desc with asc, or by
omitting desc altogether (since the default is asc). Thus, the phrase:
select ROWNUM as rank, name, avgg from
(select STUDENTS.stno, STUDENTS.name, avg(grade) as avgg
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
group by STUDENTS.stno, STUDENTS.name
order by avg(grade))
where ROWNUM <= 3;

will yield:
RANK NAME            AVGG
---- --------------- -----
   1 Novak Roland       60
   2 Prior Lorraine  71.25
   3 Lewis Jerry     71.67

Example 5.18.4 Ties between rows may eliminate rows that we would expect
to see in results of our queries. The next query

select STUDENTS.stno, STUDENTS.name,
       count(distinct cno) as noc
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
group by STUDENTS.stno, STUDENTS.name
order by count(distinct cno) desc;

lists students in decreasing order of the number of courses they took:


STNO       NAME                 NOC
---------- -------------------- ---
4022       Prior Lorraine         4
1011       Edwards P. David       3
2661       Mixon Leatha           3
3566       Pierce Richard         3
5571       Lewis Jerry            3
5544       Rawlings Jerry         2
2415       Grogan A. Mary         1
3442       Novak Roland           1

To retrieve the first four students among the ones who took the largest
number of courses we write:
select ROWNUM as rank, name, noc
from (select STUDENTS.stno, STUDENTS.name,
count(distinct cno) as noc
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
group by STUDENTS.stno, STUDENTS.name
order by count(distinct cno) desc)
where ROWNUM <= 4

Note that the result returned by this query:


      RANK NAME                 NOC
---------- -------------------- ---
         1 Prior Lorraine         4
         2 Edwards P. David       3
         3 Mixon Leatha           3
         4 Pierce Richard         3

eliminates the student Lewis Jerry.
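The effect of ties on a top-n query can be reproduced with a short Python sketch. It is a hypothetical illustration, not ORACLE code: top_n_rownum cuts the sorted list after n rows, as the ROWNUM filter does, while top_n_with_ties keeps every row whose rank (with gaps) is at most n. The rows are the names and course counts listed above.

```python
def top_n_rownum(rows, n, key):
    """ROWNUM-style cut: the first n rows of the sorted list."""
    return sorted(rows, key=key, reverse=True)[:n]

def top_n_with_ties(rows, n, key):
    """Rank-style cut: every row tied with the n-th row is kept as well."""
    ordered = sorted(rows, key=key, reverse=True)
    if len(ordered) <= n:
        return ordered
    cutoff = key(ordered[n - 1])
    return [row for row in ordered if key(row) >= cutoff]

# (name, number of courses), as listed in the example.
students = [
    ("Prior Lorraine", 4), ("Edwards P. David", 3), ("Mixon Leatha", 3),
    ("Pierce Richard", 3), ("Lewis Jerry", 3), ("Rawlings Jerry", 2),
    ("Grogan A. Mary", 1), ("Novak Roland", 1),
]

def by_courses(row):
    return row[1]

first_four = top_n_rownum(students, 4, by_courses)
with_ties = top_n_with_ties(students, 4, by_courses)
print([name for name, _ in first_four])
print([name for name, _ in with_ties])
```

The second function returns five rows here, because four students are tied at three courses; which of the two behaviors is wanted depends on the report.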


A more complicated example involves using two subquery rankings.
Example 5.18.5 Suppose that we need to find, as above, the top three students; in addition, we need to find for each of these students their ranking from
the point of view of the number of courses they took. This can be done using
the query:
select ROWNUM as gr_rank, name, c_rank from
  (select name, avgg, ROWNUM as c_rank from
     (select name, avg(grade) as avgg, count(distinct cno) as nc
      from STUDENTS S, GRADES G
      where S.stno = G.stno
      group by S.stno,S.name
      order by nc desc)
   order by avgg desc)
where ROWNUM <= 3

which will return:


GR_RANK NAME           C_RANK
------- -------------- ------
      1 Grogan A. Mary      7
      2 Pierce Richard      4
      3 Rawlings Jerry      6

5.18.3 Windowing functions in SQL Plus

Windowing functions are used in SQL Plus to compute cumulative, moving, and
other aggregate functions applied to a set of tuples called a window. The size
and shape of the window are always defined relative to a row in a block; this
reference row is called the current row.
Aggregate functions that can be used include sum, avg, min, max, statistical functions (discussed in Section 5.19), as well as two special functions,
first_value and last_value, that return the first and last values in a window.
Example 5.18.6 To compute the evolution of the grade average for each student as he or she advances towards graduation, we can write a query that returns
the cumulative average for each student for the sequence of semesters when the
student is active:
select stno, year, sem,
avg(grade) over (partition by stno
order by year, sem desc
rows unbounded preceding) as ag
from grades
order by stno, year, sem desc;

This will return:


STNO             YEAR SEM            AG
---------- ---------- ------ ----------
1011             2002 FALL           40
1011             2003 SPRING      57.50
1011             2003 FALL        68.33
1011             2004 SPRING      73.75
2415             2003 SPRING        100
2661             2002 FALL           80
2661             2003 FALL           75
2661             2004 SPRING      83.33
3442             2003 SPRING         60
3566             2002 FALL           95
3566             2003 SPRING       97.5
3566             2003 FALL           95
4022             2003 SPRING         60
4022             2004 SPRING         65
4022             2004 SPRING         70
4022             2004 SPRING      71.25
5544             2002 FALL          100
5544             2004 SPRING         85
5571             2003 SPRING         50
5571             2003 SPRING         65
5571             2004 SPRING      71.67

The words unbounded preceding mean that the window over which we compute
the grade average extends to all rows that involve the same student and precede
the current row, together with the current row itself.
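A window that starts at the first row of a partition and ends at the current row amounts to a running computation inside each partition. The Python sketch below illustrates this idea only; the function cumulative_average is hypothetical, and the integer term index is an assumed stand-in for the (year, sem) ordering used by the query. The grades of student 1011 are taken in the order implied by the first four rows of the result above.

```python
from itertools import groupby

def cumulative_average(rows):
    """rows: (stno, term, grade) tuples.

    Returns (stno, term, running average) tuples, where the average covers
    all rows of the same student up to and including the current row.
    """
    output = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for stno, partition in groupby(ordered, key=lambda r: r[0]):
        total = 0
        for count, (_, term, grade) in enumerate(partition, start=1):
            total += grade
            output.append((stno, term, round(total / count, 2)))
    return output

# The term index (1, 2, ...) is an assumption replacing (year, sem).
rows = [
    ("1011", 1, 40), ("1011", 2, 75), ("1011", 3, 90), ("1011", 4, 90),
    ("2415", 2, 100),
]

running = cumulative_average(rows)
for line in running:
    print(line)
```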
The syntax of the windowing functions is:
aggregate function (value expression | *)
  over ([partition by value expression {, value expression}]
        order by value expression [collate clause]
                 [asc | desc] [nulls first | nulls last]
                 {, value expression [collate clause]
                    [asc | desc] [nulls first | nulls last]}
        [rows | range]
        [[unbounded preceding | value expression preceding] |
         between [unbounded preceding | value expression preceding]
         and [current row | value expression following]])

5.19 Statistics in SQL

SQL Plus is equipped with a large collection of statistical functions which we
discuss in this section. These functions are incorporated in the newest SQL
standard, SQL:2003.

5.19.1 Variance and Correlation

Population and sample variance can be computed using the functions var_pop
and var_samp, respectively. Both functions take an attribute as argument and
are applied to the non-null values of that attribute. If the sequence of values of an attribute
A is $(x_1, \ldots, x_n)$, then the population variance is:

\[
\mathrm{var\_pop}(A) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}
  = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n^2}
\]

and the sample variance is:

\[
\mathrm{var\_samp}(A) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
  = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)},
\]

where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. As it is shown in statistics, the sample variance is an
unbiased estimator of the theoretical variance.
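The two closed forms are easy to check numerically. The Python sketch below is an illustration under stated assumptions: the functions var_pop and var_samp are hypothetical re-implementations of the formulas in this section (not ORACLE's code), and the four grades are those recorded for student 1011 in the GRADES table. The standard library module statistics is used only as a cross-check.

```python
import statistics

def var_pop(xs):
    """Population variance: (n * sum(x^2) - (sum x)^2) / n^2."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / n ** 2

def var_samp(xs):
    """Sample variance: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    if n < 2:
        return None  # mirrors the null result for a single value
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

grades = [40, 75, 90, 90]  # grades of student 1011

print(var_pop(grades))   # 417.1875
print(var_samp(grades))  # 556.25
print(statistics.pvariance(grades), statistics.variance(grades))
```

The values agree, up to the number of displayed decimals, with the entries for student 1011 in the results of Example 5.19.1.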

Example 5.19.1 To determine the population variance for the grade population of each student we group the records of GRADES on the student number
and then compute the population variance for each group. This is done by the
following select phrase:
select stno, var_pop(grade)
from GRADES
group by stno;

which returns:
STNO       VAR_POP(GRADE)
---------- --------------
1011               417.18
2415                    0
2661               155.55
3442                    0
3566                16.66
4022                54.68
5544                  225
5571               238.88

Similarly, the sample variance of the same populations is computed by:


select stno, var_samp(grade)
from GRADES
group by stno;

which yields:
STNO       VAR_SAMP(GRADE)
---------- ---------------
1011                556.25
2415
2661                233.33
3442
3566                    25
4022                 72.91
5544                   450
5571                358.33
8 rows selected.

To compute the population variance of grade over the entire GRADES table we
write:

select var_pop(grade)
from GRADES;

which gives:
VAR_POP(GRADE)
--------------
    275.283447

A similar select
select var_samp(grade)
from GRADES;

produces the sample variance for the entire table:


VAR_SAMP(GRADE)
---------------
     289.047619

If the set of values of the sample contains one value, then the function
var_samp returns a null value. This is the case in the query:
select var_samp(grade)
from GRADES
where stno = 1011 and cno = 'cs110'
and year = 1999;

which yields:
VAR_SAMP(GRADE)
---------------

On the other hand, a similar function called variance returns 0 whenever the
population contains a single value; otherwise, variance returns the sample
variance. For instance, the query
select variance(grade)
from GRADES
where stno = 1011 and cno = 'cs110'
and year = 1999;

returns:
VARIANCE(GRADE)
---------------
              0

The population standard deviation and the sample standard deviation, which are
the square roots of the population and the sample variance, respectively, can be
computed using the functions stddev_pop and stddev_samp, respectively.
Example 5.19.2 To compute the population standard deviation of the set of
values of the grade for each student we write:

select stno, stddev_pop(grade)
from GRADES
group by stno;

This yields the following answer:


STNO       STDDEV_POP(GRADE)
---------- -----------------
1011                   20.42
2415                       0
2661                   12.47
3442                       0
3566                    4.08
4022                    7.39
5544                      15
5571                   15.45
8 rows selected.

Similarly, the sample standard deviation can be obtained by:


select stno, stddev_samp(grade)
from GRADES
group by stno;

which generates:
STNO       STDDEV_SAMP(GRADE)
---------- ------------------
1011                    23.58
2415
2661                    15.27
3442
3566                        5
4022                     8.53
5544                    21.21
5571                    18.92
8 rows selected.

The population and the sample covariances between the values that appear
under the attributes T.A and S.B are computed using the functions covar_pop
and covar_samp, respectively, as in the following select phrases:
select covar_pop(T.A,S.B) from T,S where T.C = S.D;
select covar_samp(T.A,S.B) from T,S where T.C = S.D;

Example 5.19.3 The table SSTUDY, whose creation was described in
Appendix B, records the number of hours slept during three successive nights
by a group of students. To determine the population covariance between the
average number of hours slept and the grade point average of the students we
write:
select covar_pop(g.avggrade, s.avghours)
from (select stno, avg(grade) as avggrade
      from GRADES
      group by stno) g,
     (select stno, avg(no_hours) as avghours
      from SSTUDY
      group by stno) s
where g.stno = s.stno;

This will return the answer:


COVAR_POP(G.AVGGRADE,S.AVGHOURS)
--------------------------------
                      11.2673611

Similarly, to compute the sample covariance we use the query


select covar_samp(g.avggrade, s.avghours)
from (select stno, avg(grade) as avggrade
from GRADES
group by stno) g,
(select stno, avg(no_hours) as avghours
from SSTUDY
group by stno) s
where g.stno = s.stno;

which produces the result:


COVAR_SAMP(G.AVGGRADE,S.AVGHOURS)
---------------------------------
                       12.8769841

Correlations are computed using the function corr.


Example 5.19.4 The correlation coefficient between the grade point average
and the average number of hours slept is computed by:
select corr(g.avggrade, s.avghours)
from (select stno, avg(grade) as avggrade
from GRADES
group by stno) g,
(select stno, avg(no_hours) as avghours
from SSTUDY
group by stno) s
where g.stno = s.stno;

The answer is:


CORR(G.AVGGRADE,S.AVGHOURS)
---------------------------
                 .961293724

5.19.2 Linear Regression

Regression is a supervised learning activity by which we seek to identify the
link that exists between input and output data of an experiment starting from
a sequence of inputs and the corresponding observations of the outputs. If we
attempt to find this link as a linear function, then we apply linear regression.
Suppose that the input data is $x_1, \ldots, x_n$ and the corresponding output
sequence is $y_1, \ldots, y_n$ and we seek to determine the linear function $f(x) = ax + b$
such that the values $y_i$ are as close as possible to $ax_i + b$ for $1 \le i \le n$. This is
achieved by minimizing the total square error given by:

\[
E = \sum_{i=1}^{n} \left(y_i - (ax_i + b)\right)^2 .
\]

It is possible to show that the minimum of E is achieved when:

\[
a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} .
\]

Thus, we obtain the regression line y = ax + b, where a is the slope and b is
the intercept. These numbers are computed by the functions regr_slope and
regr_intercept, respectively. Both take two arguments: the expression that yields the
y-sequence followed by the expression that yields the x-sequence. The quality of the regression line obtained can be
evaluated using the goodness of fit regr_r2, which takes the same arguments as
the functions mentioned above.
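The closed forms for the slope and the intercept translate directly into a few sums. The Python sketch below is a minimal illustration of these formulas, not of ORACLE's implementation; the function name linear_fit and the sample points (which lie exactly on the line y = 2x + 1) are assumptions made for the example.

```python
def linear_fit(xs, ys):
    """Least-squares slope a and intercept b of the line y = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_xx - sum_x * sum_xy) / denominator
    return a, b

# Hypothetical data on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = linear_fit(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Since the sample points are collinear, the total square error E is 0 for the returned pair; for noisy data the same formulas give the line that minimizes E.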
Example 5.19.5 To compute the regression parameters for the sequences of
average grades and the sequence of average hours of nightly sleep for all students
we write:
select regr_count(g.avggrade, s.avghours) as rc,
regr_avgx(g.avggrade, s.avghours) as avgx,
regr_avgy(g.avggrade, s.avghours) as avgy,
regr_slope(g.avggrade, s.avghours) as slope,
regr_intercept(g.avggrade, s.avghours) as interc,
regr_r2(g.avggrade, s.avghours) as gof
from (select stno, avg(grade) as avggrade
from GRADES
group by stno) g,
(select stno, avg(no_hours) as avghours
from SSTUDY
group by stno) s
where g.stno = s.stno;

This query returns the following result:


        RC       AVGX       AVGY      SLOPE     INTERC        GOF
---------- ---------- ---------- ---------- ---------- ----------
         8 7.08333333         80 12.7755906 -10.493766 .924085625

5.20 Graphs in SQL Plus

[Figure 5.5: Drawing of the graph G]

Graphs represent binary relations on sets, in the sense of the following definition.
A graph is defined as a pair of sets G = (V, E), where V is the set of vertices of
G and $E \subseteq V \times V$ is the set of edges of G. Clearly, E is a binary relation on V.
If $(u, v) \in E$, we say that u is the origin of the edge (u, v) and v is the destination
of the same edge. A graph can be drawn by representing the vertices by points
and edges by arrows. Namely, if (u, v) is an edge, we draw an arrow that begins
at u and ends at v.
Example 5.20.1 Consider the graph G = (V, E), where V = {0, 1, 2, 3, 4, 5, 6}
and E = {(0, 1), (0, 3), (1, 2), (2, 5), (2, 6), (3, 4), (3, 6), (4, 5), (5, 6)}. This graph
is drawn in Figure 5.5.
Graphs can be represented by tables that have the heading origin destination. Each edge (u, v) corresponds to a pair in the table. Clearly, for any graph
the corresponding table contains the same information as the graph.
Example 5.20.2 The graph of Example 5.20.1 is represented by the table:

GRAPH
origin  destination
0       1
0       3
1       2
2       5
2       6
3       4
3       6
4       5
5       6

To create this table use the script included in Appendix F.


A path in the graph G = (V, E) that joins $v_0$ to $v_n$ is a sequence of vertices $(v_0, v_1, \ldots, v_n)$
such that $(v_i, v_{i+1})$ is an edge in G for $0 \le i \le n - 1$. We refer to $v_0$ as the
origin of the path and to $v_n$ as the destination of the path. The number n is the
length of the path. A path that begins and ends in the same vertex is a cycle
or a loop. If a graph has no cycles, then we say that the graph is acyclic. Note
that the graph defined in Example 5.20.1 is acyclic.
We write $(u, v) \in E^+$ if there exists a path of length at least 1 that has u as
its origin and v as its destination. The relation $E^+$ is the transitive closure of the
relation E.
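The definition of the transitive closure suggests a simple fixed-point computation: keep adding the pair (u, z) whenever (u, v) and (v, z) are already present, until nothing changes. The Python sketch below is a naive illustration of this idea, assuming the edge set of Example 5.20.1; the function name transitive_closure is hypothetical, and this is not the algorithm discussed in Chapter 8.

```python
def transitive_closure(edges):
    """Return the set of pairs (u, v) joined by a path of length >= 1."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (u, v) in list(closure):
            for (w, z) in list(closure):
                if v == w and (u, z) not in closure:
                    closure.add((u, z))
                    changed = True
    return closure

# Edge set E of the graph of Example 5.20.1.
E = {(0, 1), (0, 3), (1, 2), (2, 5), (2, 6),
     (3, 4), (3, 6), (4, 5), (5, 6)}

E_plus = transitive_closure(E)
for pair in sorted(E_plus):
    print(pair)

# Vertices that can be reached from vertex 4.
reachable_from_4 = {v for (u, v) in E_plus if u == 4}
print(sorted(reachable_from_4))
```

The loop stops because the set of pairs over a finite vertex set is finite, so the sketch also terminates on graphs that contain cycles.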
Example 5.20.3 The transitive closure of the relation E defined by the graph
of Example 5.20.1 consists of the following pairs:
(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6),
(1, 2), (1, 5), (1, 6), (2, 5), (2, 6), (3, 4),
(3, 5), (3, 6), (4, 5), (4, 6)
as can be easily seen by inspecting Figure 5.5.
Of course, the transitive closure E⁺ of a relation E ⊆ V × V is itself a relation
on the set V and, therefore, it can also be represented as a table. Namely, the
tabular representation of E⁺ is:
GRAPHPLUS
origin  destination
0       1
0       2
0       3
0       4
0       5
0       6
1       2
1       5
1       6
2       5
2       6
3       4
3       5
3       6
4       5
4       6
5       6

It is possible to prove that the table GRAPHPLUS cannot be computed
through the operations of relational algebra (see [Maier, 1983], for example).
However, SQL Plus allows us to compute the table GRAPHPLUS when the
underlying graph G is acyclic.
This is accomplished using the clause connect by of select. This clause
establishes links between tuples of a table and can be used to retrieve the vertices
of a graph that can be accessed by paths that start from a certain vertex. Its
syntax is defined by:
[start with condition ] connect by condition
For example, the chaining condition of the nodes in a path of a graph is
described by connect by origin = prior destination. Thus, to obtain the
set of vertices that are accessible from the vertex 4 in the graph shown in
Figure 5.5 we write:


select distinct destination from graph
start with origin = 4
connect by origin = prior destination;

This will yield the result

DESTINATION
-----------
          5
          6

consistent with the structure of the graph.


In the form used above, the connect by clause cannot be applied to graphs that
contain cycles. If this is the case, ORACLE detects the existence of a loop and
returns an error message.
Example 5.20.4 Suppose that we add an edge to the graph shown in Figure 5.5 that creates a loop, for example, the edge (5, 0), which creates the loop
(0, 3, 4, 5, 0). This can be done by
insert into graph(origin, destination)
values(5,0);

Now, if we try the query
select distinct destination from graph
start with origin = 0
connect by origin = prior destination;

we obtain the error message:
ORA-01436: CONNECT BY loop in user data
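
ORACLE releases from 10g onward also accept the keyword nocycle in the connect by clause, which asks the system to return rows even though the data contains a loop instead of raising ORA-01436. The query below is a hedged sketch that assumes such a release and the table graph with the extra edge (5, 0) in place; its output is not shown because it has not been run against the book's database.

```sql
-- Assumes ORACLE 10g or later; nocycle is not available in earlier releases.
select distinct destination from graph
start with origin = 0
connect by nocycle origin = prior destination;
```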

In Chapter 8 we discuss an algorithm that can be used to compute the transitive
closure of arbitrary graphs (with or without loops).
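
Independently of the Chapter 8 algorithm, which is not reproduced here, many systems that implement the recursive queries of SQL:1999 can compute the transitive closure directly. The following is a minimal sketch in the syntax accepted by PostgreSQL and SQLite (not SQL Plus); it assumes the table graph with columns origin and destination, and the name closure is introduced here for illustration.

```sql
-- Recursive common table expression: start from the edges themselves,
-- then repeatedly extend each known path by one more edge.
with recursive closure(origin, destination) as (
    select origin, destination
    from graph
    union  -- union (not union all) discards pairs already found,
           -- so the recursion also terminates when the graph has loops
    select c.origin, g.destination
    from closure c
    join graph g on g.origin = c.destination
)
select origin, destination
from closure
order by origin, destination;
```

Because duplicates are eliminated at every step, no new pairs are produced once every reachable pair has been found, and the evaluation stops.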
If a data set has a hierarchical structure, then it can be described by a rooted
tree, that is, by a special acyclic graph G = (V, E) that has a distinguished
vertex v0 called the root such that for every other vertex v of the graph there is
a unique path that joins v0 to v. It is not difficult to show that for any two
distinct vertices u, v of a rooted tree there exists at most one path that joins u
to v. If such a path exists then we say that v is a descendant of u.
Example 5.20.5 The option connect by of SQL Plus can be used to find the
descendants of a vertex in a rooted tree. Consider, for example, the rooted tree
shown in Figure 5.6. The table that represents this tree is created by the SQL
script included in Appendix F and has the form:



[Figure 5.6: Rooted Tree]

TREE
origin  destination
0       1
0       2
0       3
1       4
1       5
2       6
2       7
2       8
3       9
3       10
7       11
7       12

To retrieve all the descendants of a vertex (in this case, of vertex 2) we write:
select distinct destination as DESCENDANTS from tree
start with origin = 2
connect by origin = prior destination;

This returns:

DESCENDANTS
-----------
          6
          7
          8
         11
         12

On the other hand, to retrieve the ancestors of a vertex, that is, all vertices
that occur on the path between the root of the tree and that vertex, we write:
select distinct origin as ANCESTORS from tree
start with destination = 12
connect by destination = prior origin;

This will retrieve the table:


ANCESTORS
---------
        0
        2
        7

The reserved word prior can be used on either side of the equality sign. For
example, the last query of Example 5.20.5 can be written as:
select distinct origin ANCESTORS from tree
start with destination = 12
connect by prior origin = destination;

The pseudo-attribute LEVEL can be used to indicate the length of the path
that begins at the starting vertex of the query and ends with the vertex currently
retrieved.
Example 5.20.6 The following query adds the pseudo-attribute LEVEL to the
query of Example 5.20.5 that retrieves the descendants of the vertex 2:
select distinct level, destination as DESCENDANTS from tree
start with origin = 2
connect by origin = prior destination;

This yields the result:

     LEVEL DESCENDANTS
---------- -----------
         1           6
         1           7
         1           8
         2          11
         2          12

Observe that the immediate descendants are at level 1 and the next level of
descendants at level 2.
If we retrieve the ancestors of a node as in
select distinct level, origin as ANCESTORS from tree
start with destination = 12
connect by destination = prior origin;

the values of LEVEL reflect the distance (in number of edges) between the vertex
and its various ancestors:
     LEVEL ANCESTORS
---------- ---------
         1         7
         2         2
         3         0


Example 5.20.7 Combining the string function lpad and the pseudo-attribute
LEVEL allows us to display the entire tree using indentations. The query:
select level, lpad('*', 2 * level - 1) || destination as vertex from tree
start with origin = 0
connect by prior destination = origin;

returns the following display of the tree structure:

     LEVEL VERTEX
---------- ---------
         1 *1
         2   *4
         2   *5
         1 *2
         2   *6
         2   *7
         3     *11
         3     *12
         2   *8
         1 *3
         2   *9
         2   *10

An alternate way of obtaining a description of a tree that shows the paths
that can be used to reach vertices relies on CONNECT_BY_ISLEAF and
SYS_CONNECT_BY_PATH. CONNECT_BY_ISLEAF returns 1 if
the current vertex (in our case, the destination of the edge) is a leaf and 0
otherwise. For every edge of the path that joins the starting vertex to the current
node, SYS_CONNECT_BY_PATH computes the string specified by
its first argument; each such entry is preceded by the separator string
specified by its second argument.
Example 5.20.8 The query:
select level, destination,
CONNECT_BY_ISLEAF "IsLeaf?",
SYS_CONNECT_BY_PATH('('||origin||','||destination||')', '+') "Path"
from tree
start with origin = 0
connect by prior destination = origin
order by level;

will return:

LEVEL  DESTINATION  IsLeaf?  Path
    1            1        0  +(0,1)
    1            2        0  +(0,2)
    1            3        0  +(0,3)
    2            4        1  +(0,1)+(1,4)
    2            7        0  +(0,2)+(2,7)
    2            6        1  +(0,2)+(2,6)
    2           10        1  +(0,3)+(3,10)
    2            9        1  +(0,3)+(3,9)
    2            8        1  +(0,2)+(2,8)
    2            5        1  +(0,1)+(1,5)
    3           11        1  +(0,2)+(2,7)+(7,11)
    3           12        1  +(0,2)+(2,7)+(7,12)

5.21 Updates in SQL

There are three constructs in SQL that allow us to update the tables of a
relational database: update, insert, and delete.
The update construct modifies components of tuples. It applies to all tuples
of the specified table unless limited by a where clause.
Example 5.21.1 Recall the table EMPHIST introduced in Example 3.3.5. The
script ced.sql, available in Appendix C, creates and populates the tables
discussed in that example.
To give all current employees a 10% raise, we apply the following update
phrase:
update EMPHIST
set salary = 1.1* salary
where term_date is null;
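
Because update modifies every row that satisfies its where clause, it can be prudent to preview the affected rows before running it. The query below is a sketch that relies only on the columns salary and term_date named in the example; the alias new_salary is introduced here for illustration.

```sql
-- Preview: the rows the update would touch, with current and proposed salary.
select salary, 1.1 * salary as new_salary
from EMPHIST
where term_date is null;
```

The where clause is identical to that of the update, so the rows listed are the ones whose salary the update would change.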

The general syntax of update is:

update table_name [corr_name]
set column = ⟨expression | null⟩ {, column = ⟨expression | null⟩}
[where condition]
The insert construct adds new rows to a table. It inserts a single row
(whose components must be specified by the user) or a set of rows that originate
from a retrieval involving other tables.
The syntax of a single-tuple insert is:
insert into table_name [(column {, column})]
⟨values (expr {, expr}) | subselect⟩
The values of expressions listed in the list of values must belong to the domains
of the attributes specified in the list of columns in order for the insertion to take
effect.
Example 5.21.2 To insert two rows containing registration records for student 2890 for the fall semester of 2004 into GRADES, we execute two insert
statements:
insert into GRADES
values ('2890','023','cs110','Fall',2004,null);
insert into GRADES
values ('2890','056','cs240','Fall',2004,null);
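
The optional column list in the syntax above lets a statement name its target columns explicitly, which protects it against changes in the column order of the table. As a sketch, assuming that the columns of GRADES are named stno, empno, cno, sem, year, and grade and that the first four are character columns, the first insertion could alternatively be written as follows.

```sql
-- Alternative form of the first insert; grade is omitted from the column
-- list, so it receives null (or its declared default, if one exists).
insert into GRADES (stno, empno, cno, sem, year)
values ('2890', '023', 'cs110', 'Fall', 2004);
```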


The syntax of the insertion of a set of tuples obtained by a retrieval operation
is:
insert into table_name [(column {, column})]
select phrase
Here the select phrase must return tuples of values consistent with the
domains of the attributes specified by the list of columns [(column {, column})].
Example 5.21.3 Suppose that we intend to have a separate table indicating
the assignments of instructors. After creating such a table (called ASSIGN and
equipped with the attributes empno, cno, sem, and year) by writing:
create table ASSIGN(empno varchar2(11) not null,
cno varchar2(5) not null,
sem varchar2(6) not null,
year smallint);

we can load this table using data from the existing table GRADES using the
construct:
insert into ASSIGN(empno, cno, sem, year)
select distinct empno, cno, sem, year
from GRADES;

This results in the following table:

empno  cno    sem     year
019    cs110  Fall    2001
019    cs210  Fall    2002
019    cs210  Spring  2003
019    cs240  Spring  2002
023    cs110  Spring  2002
056    cs240  Spring  2003
234    cs310  Spring  2003
234    cs410  Spring  2002

If the components of the tuple to be inserted into a table violate the declaration of the table (e.g., a null value for a not null attribute, or a character
string for a numerical attribute), the DBMS should reject the insertion.
Likewise, the delete construct deletes rows of tables.
Example 5.21.4 To delete the rows of the table ASSIGN that correspond to
courses taught by the instructor whose employee number is 234, we write:
delete from ASSIGN where empno = '234';

The directive:
delete from GRADES where grade is null;

eliminates the rows whose grade component is null.


The where clause of delete is optional; if this clause is not used, then all
rows are deleted. The tabular variable still exists.


Example 5.21.5 The following delete eliminates all rows of the table ASSIGN:
delete from ASSIGN;

The syntax of delete is:

delete from table_name [where condition]

5.22 Access Rights

The grant operation assigns access rights to users. To delegate access rights to
other users, a user must own these rights. The set of access rights includes
select, insert, update, and delete and refers to the right of executing each
of these operations on a table. Further, update can be restricted to specific
columns.
All these access rights are granted to the creator of a table automatically.
The creator, in turn, may grant access rights to other users or to all users
(designated in SQL as public). The SQL standard envisions a mechanism that
can limit the excessive proliferation of access rights. Namely, a user may receive
the select right with or without the right to grant this right to others by his
own action.
Example 5.22.1 Suppose that the user alex owns the table COURSES and
intends to grant the right to query this table to the user whose name is peter.
The user alex can accomplish this by
grant select on COURSES to peter

Now, peter has the right to query the table COURSES but he may not propagate
this right to the user ellie. In order for this to happen, alex would have to
use the directive:
grant select on COURSES to peter
with grant option
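
Once peter holds the select right with the grant option, he can pass it on himself. The statement below is a sketch of what peter might then execute; the schema-qualified name alex.COURSES follows the ORACLE convention for referring to a table owned by another user and is an assumption about how the table is addressed in a given system.

```sql
-- Executed by peter after alex has granted him select with grant option.
grant select on alex.COURSES to ellie;
```

Since this grant does not itself include with grant option, ellie can query the table but cannot propagate the right further.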

Example 5.22.2 If peter owns the table STUDENTS, then he may delegate
the right to query the table and the right to update the columns addr, city and
zip to ellie using the directive:
grant select, update(addr, city, zip) on
STUDENTS to ellie

The standard syntax of grant is:



grant {priv {, priv} | all [privileges]}
on [table] tablename {, tablename}
to ⟨username {, username} | public⟩
[with grant option]

Here priv has the syntax:

⟨select | insert | delete | update [(attribute {, attribute})]⟩
Privileges can be revoked using the revoke construct, which is a feature
of standard SQL. For instance, if peter wishes to revoke ellie's privileges to
update the table STUDENTS, he may write:
revoke update(addr,city,zip) on
STUDENTS from ellie

The standard syntax for this directive is:

revoke {priv {, priv} | all [privileges]}
on [table] tablename {, tablename}
from ⟨username {, username} | public⟩

5.23 Views in SQL

Views are virtual tabular variables. This means that in SQL a view is referenced
for retrieval purposes in exactly the same way a tabular variable is referenced.
The only difference is that a view does not have a physical existence. It exists
only as a definition in the database catalog. We refer to real tabular variables
(that is, the tabular variables that have a physical existence in the database) as
base tabular variables.
Views are supported in both SQL Plus and Transact-SQL; MySQL added them
only in version 5.0, so they are not available in version 4.1.
To illustrate the notion of view, let us consider the following example.
Example 5.23.1 Suppose that we write:
create view STC as
select STUDENTS.name, GRADES.cno
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno;

The select construct contained in this create view retrieves all pairs (s, c) of
student names and course numbers such that the student whose name is s has
registered for the course c.
When this directive is executed by SQL, no data retrieval takes place. The
database system simply stores this definition in its catalog. The definition of the
view STC becomes a persistent object, that is, an object that exists after our
interaction with the DBMS has ceased. From a conceptual point of view, the
user treats STC exactly like any other tabular variable. Suppose, for instance
that we wish to retrieve the names of students who took cs110. In this case it
is sufficient to write the query:


select name from STC where cno = 'cs110';

In reality, SQL combines this select phrase with the definition of the view and
executes the modified query:
select STUDENTS.name from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
and GRADES.cno = 'cs110';

The previous example shows that views in SQL play a role similar to the role
played by macros in programming languages.
Views are important for data security. A user who needs to have access only
to the list of names of students and the courses they are taking needs to be aware
only of the existence of STC. If the user is authorized to use only select constructs, then the user can ignore whether STC is a table or a view. Confidential
data (such as grades obtained in specific courses) can be completely protected
in this manner. Also, the queries that this limited-access user may write are
simpler and easier to understand. No space is wasted with the view STC, and
the view remains current always, reflecting the contents of the tabular variables
STUDENTS and GRADES.
SQL treats views exactly as it treats the tabular variables as far as retrieval
is concerned. We can also delegate the select privilege to a view in exactly
the same way as we did for a tabular variable. For instance, if the user george
created the view STC, then he can give the select right to vanda by writing:
grant select on STC to vanda;

Consider now another example of a view:
Example 5.23.2 The view SNA that contains the student numbers and the
names of students can be created by:
create view SNA as
select stno, name from STUDENTS;

The purpose of this view is to ensure the privacy of students. Any user who has
access only to this view can retrieve the student number and name of a student,
but not the address of the student.
There is a fundamental difference between the views introduced in Examples 5.23.1 and 5.23.2, and this refers to the ways in which these two views
behave with respect to updates.
Suppose that the user wishes to insert the pair (7799, 'Jane Jones') in the
view SNA. The user may ignore entirely the fact that SNA is not a base tabular
variable. Moreover, the effect of this insertion on the base tabular variable
is unequivocally determined: the system inserts in the tabular variable
STUDENTS the tuple (7799, 'Jane Jones', null, null, null, null), with a null for
each of the attributes addr, city, state, and zip. On the other hand,
we cannot insert a tuple in a meaningful way in the view STC introduced in
Example 5.23.1. Indeed, if we attempt to insert a pair (s, c) in STC, then we have
to define the effect of this insertion on the base tabular variables. This is clearly


impossible: we do not know what the student number is, what the identification
of the instructor is, etc. SQL forbids users to update views based on more than
one table (as STC is). Even if such updates would have an unambiguous effect
on the base tabular variable, this rule rejects any such update. Only some views
based on exactly one tabular variable can be updated. It is the responsibility
of the database administrator to grant to the user the right to update a view
only if that view can be updated.
If a view can be updated, then its behavior is somewhat different from the
base tabular variable on which the view is built. An update made to a view
may cause one or several tuples to vanish from the view, whenever we retrieve
the tuples of the view.
Example 5.23.3 Consider the view UPPERGR defined by:
create view UPPERGR as
select * from GRADES where grade > 75;

If we wish to examine the tuples that satisfy the definition of the view we use
the construction:
select * from UPPERGR;

that returns the result:

STNO  EMPNO  CNO    SEM     YEAR  GRADE
----  -----  -----  ------  ----  -----
2661  019    cs110  FALL    1999     80
3566  019    cs110  FALL    1999     95
5544  019    cs110  FALL    1999    100
3566  019    cs240  SPRING  2000    100
2415  019    cs240  SPRING  2000    100
5571  234    cs410  SPRING  2000     80
1011  019    cs210  FALL    2000     90
3566  019    cs210  FALL    2000     90
5571  019    cs210  SPRING  2001     85
1011  056    cs240  SPRING  2001     90
4022  056    cs240  SPRING  2001     80
2661  234    cs310  SPRING  2001    100

The update construction:

update UPPERGR
set grade = 70
where stno = '2661' and empno = '019' and cno = 'cs110'
and sem = 'FALL' and year = 1999;

makes the first row disappear, since it no longer satisfies the definition of the
view. Indeed, if we use again the same query on UPPERGR, we obtain:
STNO  EMPNO  CNO    SEM     YEAR  GRADE
----  -----  -----  ------  ----  -----
3566  019    cs110  FALL    1999     95
5544  019    cs110  FALL    1999    100
3566  019    cs240  SPRING  2000    100
2415  019    cs240  SPRING  2000    100
5571  234    cs410  SPRING  2000     80
1011  019    cs210  FALL    2000     90
3566  019    cs210  FALL    2000     90
5571  019    cs210  SPRING  2001     85
1011  056    cs240  SPRING  2001     90
4022  056    cs240  SPRING  2001     80
2661  234    cs310  SPRING  2001    100

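To reestablish the previous content of GRADES, we apply the update to the base
tabular variable itself; the modified row no longer belongs to UPPERGR, so an
update made through the view would not reach it:

update GRADES
set grade = 80
where stno = '2661' and empno = '019' and cno = 'cs110'
and sem = 'FALL' and year = 1999;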

The standard syntax of create view allows us to use the clause with check
option. When this clause is used, every insertion and update done through the
view is verified to make sure that a tuple inserted through the view actually
appears in the view and an update of a row in the view does not cause the row
to vanish from the view.
The syntax of create view is:
create view view as
subselect
[with check option]
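
To illustrate the clause, the sketch below defines a variant of the view of Example 5.23.3 with the check option. The view name UPPERGR_CHK is hypothetical, and the quoting of the values assumes that stno and cno are character columns.

```sql
-- Same definition as UPPERGR, plus the check option.
create view UPPERGR_CHK as
    select * from GRADES where grade > 75
    with check option;

-- This update is expected to be rejected: after it, the row would have
-- grade = 70 and would no longer satisfy the condition grade > 75.
update UPPERGR_CHK
set grade = 70
where stno = '2661' and cno = 'cs110';
```

Without the check option the same update succeeds and the row silently disappears from the view, which is the behavior the clause is meant to prevent.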
A view V can be dropped from a database by using the construct
drop view V;

If we drop a tabular variable from the database, then all views based on that
table are automatically dropped; if we drop a view, then all other views that
use the view that we drop are also dropped.
Views are useful instruments in implementing generalizations. Suppose that
we began the construction of the college database from the existing tabular
variables UNDERGRADUATES and GRADUATES that modelled sets of entities
having the same name, where
heading (UNDERGRADUATES ) = stno name addr city state zip major
heading (GRADUATES ) = stno name addr city state zip qualdate
Then, the tabular variable STUDENTS could have been obtained as a view
built from the previous two base tabular variables by
create view STUDENTS as
select stno, name, addr, city, state, zip
from UNDERGRADUATES
union
select stno, name, addr, city, state, zip
from GRADUATES;

5.24 Accessing metadata in SQLPlus

The catalog of ORACLE is a very large tabular variable that can be accessed
through several views defined on this table.
In ORACLE, a list of the tables owned by the current user is contained in
the view USER_CATALOG, also accessible through its synonym CAT. A content of this
view is shown in Figure 5.24.

USER_CATALOG
TABLE_NAME   TABLE_TYPE
STUDENTS     TABLE
INSTRUCTORS  TABLE
COURSES      TABLE
GRADES       TABLE
ADVISING     TABLE
Information that describes space allocation and statistical properties can be
found in the view named USER_TABLES, also named TABS. A description of
the attributes of tabular variables and of their domains can be found in the view
USER_TAB_COLUMNS, also accessible as COLS. For example, the query:
select table_name,column_name,data_type from COLS;

results in the following table:

TABLE_NAME   COLUMN_NAME  DATA_TYPE
ADVISING     STNO         CHAR
ADVISING     EMPNO        CHAR
COURSES      CNO          CHAR
COURSES      CNAME        CHAR
COURSES      CR           NUMBER
GRADES       STNO         CHAR
GRADES       EMPNO        CHAR
GRADES       CNO          CHAR
GRADES       SEM          CHAR
GRADES       YEAR         NUMBER
GRADES       GRADE        NUMBER
INSTRUCTORS  EMPNO        CHAR
INSTRUCTORS  NAME         CHAR
INSTRUCTORS  RANK         CHAR
INSTRUCTORS  ROOMNO       NUMBER
INSTRUCTORS  TELNO        CHAR
STUDENTS     STNO         CHAR
STUDENTS     NAME         CHAR
STUDENTS     ADDR         CHAR
STUDENTS     CITY         CHAR
STUDENTS     STATE        CHAR
STUDENTS     ZIP          CHAR
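
The same view can be restricted to a single table. The following sketch lists the columns of GRADES in their declared order; it assumes the standard ORACLE view USER_TAB_COLUMNS (for which COLS is a synonym) and its column COLUMN_ID, and it relies on the fact that unquoted table names are stored in upper case in the catalog.

```sql
-- Columns of one table, in the order in which they were declared.
select column_name, data_type
from user_tab_columns
where table_name = 'GRADES'
order by column_id;
```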

A more complete list of objects that belong to the current user can be found
in the view USER_OBJECTS, which lists all objects created by the user, including
those mentioned in USER_CATALOG, as well as other useful information (such
as the date of creation, the last time when the object was affected by a data
definition statement, the status of the object, etc.).
The definition of views can be accessed through the USER_VIEWS catalog view.
Example 5.24.1 The meta-view (view about views) USER_VIEWS has the
structure described below:

Name              Null?     Type
----------------  --------  --------------
VIEW_NAME         NOT NULL  VARCHAR2(30)
TEXT_LENGTH                 NUMBER
TEXT                        LONG
TYPE_TEXT_LENGTH            NUMBER
TYPE_TEXT                   VARCHAR2(4000)
OID_TEXT_LENGTH             NUMBER
OID_TEXT                    VARCHAR2(4000)
VIEW_TYPE_OWNER             VARCHAR2(30)
VIEW_TYPE                   VARCHAR2(30)
SUPERVIEW_NAME              VARCHAR2(30)

The last six attributes are important for object views discussed in Chapter 7.
To extract the definition of the view UPPERGR defined above we write:
select text from user_views where view_name = 'UPPERGR';

This query returns the result:

TEXT
------------------------------------------------------------
select "STNO","EMPNO","CNO","SEM","YEAR","GRADE" from GRADES
where grade > 75

5.25 Exercises

1. Solve the following queries in SQL:
(a) Find all students who live in Malden or Newton.
(b) Find all students whose name starts with 'F'.
(c) Find all students whose name contains the letter 'f'.
2. A select phrase equivalent to the union-computing select
select stno from grades where cno = 'cs210'
union
select stno from grades where cno = 'cs240';
is
select distinct stno from grades
where cno = 'cs210' or cno = 'cs240';
(a) Write an equivalent query using the in operator.
(b) Can you transform the intersection-computing select:
select stno from grades where cno = 'cs210'
intersect
select stno from grades where cno = 'cs240';
into a single select? Explain your answer.


Solve in SQL the following queries that refer to the college database:
3. Find the cities where students live for all students who do not live in Boston,
Massachusetts.
4. Find all pairs of student names and course names for grades obtained
during Fall of 2001.
5. Find the names of students who took some four-credit courses.
6. Find the names of students who took a course with an instructor who is
also their advisor.
7. Find the names of students who took cs210 or had Prof. Smith as their
advisor.
8. Find all pairs of names of students who live in the same city.
9. Find all triples of instructors' names for instructors who taught the same
course.
10. Find instructors who taught students who are advised by another instructor who shares the same room.
11. Find course numbers of courses taken by students who live in Boston and
which are taught by an associate professor.
12. Find the names of instructors who teach courses attended by students who
took a course with an instructor who is an assistant professor.
13. Find the telephone numbers of instructors who teach a course taken by
any student who lives in Boston.
14. Find all pairs of names of students and instructors such that the student
never took a course with the instructor.
15. Find the names of students who took no four-credit courses.
16. Find the names of students who took only four-credit courses.
17. Find the names of students who took every four-credit course.
18. Find the names of all students for whom no other student lives in the same
city.
19. Find names of students who took every course taken by Richard Pierce.
20. Find the names of instructors who teach no courses.
21. Find course numbers of courses that have never been taught.
22. Find courses that are taught by every assistant professor.
23. Find the names of students whose advisor did not teach them any course.


24. Find the names of students who have failed all their courses (failing is
defined as a grade less than 60).
25. Find the names of students who do not have an advisor.
26. Find the names of instructors who taught every semester when a student
from Rhode Island was enrolled.
27. Find course names of courses taken by every student advised by Prof.
Evans.
28. Find names of students who took every course taught by an instructor
who is advising at least two students.
29. Find names of instructors who teach every student they advise.
30. Find names of students who are taking every course taught by their advisor.
31. Find course numbers of courses taken by every student who lives in Rhode
Island.
32. Find the student numbers of students who took at least two courses.
33. Find the course names of courses in which at least three students were
enrolled.
34. Find the names of instructors who advise at least two students.
35. List all students by name, along with their grade averages.
36. Find student numbers of students for whom the difference between the
highest and the lowest grade is less than 20.
37. Print a report that contains for each course (cno), the number of students
who took the course, the highest, the lowest, and the average grade in the
course.
38. Find the average grade of students who took cs110 at any time. Then,
find students whose grades in cs110 were above the average.
39. Identify those queries that require division among the queries 3 to 34 and
solve those queries using the group by option of SQL.
40. Create views on the college database as specified:
(a) A view that contains the names of the instructors, the courses (cnos)
that they teach, and the average grade in these courses.
(b) A view that shows the names and offices of the instructors.
(c) A view that contains the courses (cnos) , the number of students who
took the courses, the average grade in these courses, and the highest
grade.
(d) A view that contains the names of instructors and the names of the
students that they advise.
(e) A view that shows the data about the students in Massachusetts.
41. Print the contents of the views created in Exercise 40.
42. Determine which of the views created in Exercise 40 can be updated.
43. Using the views created in Exercise 40(a) and 40(c) create a view that
lists the instructors and the total number of students they teach.
44. Solve the following queries:
(a) list names of instructors and the number of courses they taught;
(b) list instructors in the order of the number of courses they taught;


(c) list the top three instructors in the order of the number of courses
they taught.
45. Let GRAPH be the table introduced in Example 5.20.2. The degree of a
vertex is the number of edges incident to that vertex.
(a) write an SQL query that yields a list of vertices of a graph arranged
in the decreasing order of their degrees;
(b) list the top 5 vertices of a graph in increasing order of their degrees.
46. For each instructor, list the sequence of the numbers of courses that the
instructor taught during each of the semesters that he or she was active.
47. List the top three instructors in the order of the number of students that
they advise.

5.26 Bibliographical Comments

The initial standard known as SQL1 is recorded in [X3, ISO7]. SQL2 was
defined in [International Organization for Standardization, 1992]. Extensive
presentations of SQL3 can be found in [Melton and Simon, 1993; Melton and
Simon, 2002] and [Fortier, 1999]. Also, useful references are [Line and Kline,
2000] and [J. Kauffman, 2001].