
A CRITIQUE OF

THE SQL DATABASE LANGUAGE

C.J.Date

P.O. Box 2647, Saratoga
California, USA

December 1983

The ANS Database Committee (X3H2) is currently at work on a
proposed standard relational database language (RDL), and has
adopted as a basis for that activity a definition of the
"structured query language" SQL from IBM [10]. Moreover, numerous
hardware and software vendors (in addition to IBM) have already
released or at least announced products that are based to a
greater or lesser extent on the SQL language as defined by IBM.
There can thus be little doubt that the importance of that
language will increase significantly over the next few years. Yet
the SQL language is very far from perfect. The purpose of this
paper is to present a critical analysis of the language's major
shortcomings, in the hope that it may be possible to remedy some
of the deficiencies before their influence becomes too
all-pervasive. The paper's standpoint is primarily that of formal
computer languages in general, rather than that of database
languages specifically.

sql critique
8
1. INTRODUCTION

The relational language SQL (the acronym is usually pronounced
"sequel"), pioneered in the IBM prototype System R [1] and
subsequently adopted by IBM and others as the basis for numerous
commercial implementations, represents a major advance over older
database languages such as the DL/I language of IMS and the DML
and DDL of the Data Base Task Group (DBTG) of CODASYL.
Specifically, SQL is far easier to use than those older
languages; as a result, users in a SQL system (both end-users
and application programmers) can be far more productive than they
used to be in those older systems (improvements of up to 20 times
have been reported). Among the strong points of SQL that lead to
such improvements we may cite the following:

simple data structure

powerful operators

short initial learning period

improved data independence

integrated data definition and data manipulation

double mode of use

integrated catalog

compilation and optimization

These advantages are elaborated in the appendix to this paper.

The language does have its weak points too, however. In fact, it
cannot be denied that SQL in its present form leaves rather a lot
to be desired -- even that, in some important respects, it fails
to realize the full potential of the relational model. The
purpose of this paper is to describe and examine some of those
weak points, in the hope that such aspects of the language may be
improved before their influence becomes too all-pervasive.

Before getting into details, I should like to make one point
absolutely clear: The criticisms that follow should not be
construed as criticisms of the original designers and
implementers of the SQL language. The paper is intended solely as
a critique of the SQL language as such, and nothing more. Note
also that the paper applies specifically to the dialect of SQL
implemented by IBM in its products SQL/DS, DB2, and QMF. It is
entirely possible that some specific point does not apply to some
other implemented dialect. However, most points of the paper do
apply to most of the dialects currently implemented, so far as I
am aware.

The remainder of the paper is divided into the following sections:

lack of orthogonality: expressions

lack of orthogonality: builtin functions

lack of orthogonality: miscellaneous items

formal definition

mismatch with host languages

missing function

mistakes

aspects of the relational model not supported

summary and conclusions

Reference [3] gives some background material -- specifically, a
set of principles that apply to the design of programming
languages in general and database languages in particular. Many
of the criticisms that follow are expressed in terms of those
principles. Note: Some of the points apply to interactive SQL
only and some to embedded SQL only, but most apply to both. I
have not bothered to spell out the distinctions; the context
makes it clear in every case. Also, the structure of the paper is
a little arbitrary, in the sense that it is not really always
clear which heading a particular point belongs under. There is
also some repetition (I hope not too much), for essentially the
same reason.

2. LACK OF ORTHOGONALITY: EXPRESSIONS

It is convenient to begin by introducing some non-SQL terms.

* A table-expression is a SQL expression that yields a table --
for example, the expression

SELECT *
FROM EMP
WHERE DEPT# = 'D3'

* A column-expression is a SQL expression that yields a single
column -- for example, the expression

SELECT EMP#
FROM EMP
WHERE DEPT# = 'D3'

A column-expression is a special case of a table-expression.

* A row-expression is a SQL expression that yields a single row
-- for example, the expression

SELECT *
FROM EMP
WHERE EMP# = 'E2'

A row-expression is a special case of a table-expression.

* A scalar-expression is a SQL expression that yields a single
scalar value -- for example, the expression

SELECT AVG (SALARY)
FROM EMP

or the expression

SELECT SALARY
FROM EMP
WHERE EMP# = 'E2'

A scalar-expression is a special case of a row-expression and a
special case of a column-expression.

Note that these four kinds of expression correspond to the four
classes of data object (table, column, row, scalar) supported by
SQL -- though incidentally SQL is inconsistent as to whether its
expressions yield values or references, in general. Note too that
(as pointed out in [3]) the four classes of object can be
partially ordered as follows:

           table  (highest)
          V       V
     column       row
          V       V
          scalar  (lowest)

(columns are neither higher nor lower than rows with respect to
this ordering).

As explained in [3] (again), a language should provide, for each
class of object it supports, at least all of the following:

a constructor function, i.e., a means for constructing an
object of the class from literal (constant) values and/or
variables of lower classes;

a means for comparing two objects of the class;

a means for assigning the value of one object in the class
to another;

a selector function, i.e., a means for extracting component
objects of lower classes from an object of the given class;

a general, recursively defined syntax for expressions that
exploits to the full any closure properties the object class
may possess.

The table below shows that SQL does not really measure up to
these requirements.

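obj \ opn | constructor        | compare | assign             | selector | gen expr
----------+--------------------+---------+--------------------+----------+---------
table     | no                 | no      | only via           | yes      | no (see
          |                    |         | INSERT-SELECT      |          | below)
----------+--------------------+---------+--------------------+----------+---------
column    | only as arg to IN  | no      | no                 | yes      | no
          | (host vbles &      |         |                    |          |
          | consts only)       |         |                    |          |
----------+--------------------+---------+--------------------+----------+---------
row       | only in INSERT &   | no      | only to/from set   | (yes)    | no
          | UPDATE (host       |         | of host scalars    |          |
          | vbles & consts     |         |                    |          |
          | only)              |         |                    |          |
----------+--------------------+---------+--------------------+----------+---------
scalar    | N/A                | yes     | only to/from host  | (yes)    | no
          |                    |         | scalar             |          |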

Let us consider table-expressions in more detail. The SELECT
statement, which, since it yields a table, may be regarded as a
table-expression (possibly of a degenerate form, e.g., as a
column-expression), currently has the following structure:

SELECT scalar-expression-commalist
FROM table-name-commalist
WHERE predicate

(ignoring numerous irrelevant details). Notice that it is just
table-names that appear in the FROM clause. Completeness suggests
that it should be table-expressions (as Gray puts it [8],
"anything in computer science that is not recursive is no good").
This is not just an academic consideration, by the way; on the
contrary, there are several practical reasons as to why such
recursiveness is desirable.

First, consider the relational algebra. Relational algebra
possesses the important property of closure -- that is,
relations form a closed system under the operations of the
algebra, in the sense that the result of applying any of those
operations to any relation(s) is itself another relation. As a
consequence, the operands of any given operation are not
constrained to be real ("base") relations only, but rather can
be any algebraic expression. Thus, the relational algebra
allows the user to write nested relational expressions -- and
this feature is useful for precisely the same reasons that
nested expressions are useful in ordinary arithmetic.
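To make the closure property concrete, the following is a minimal Python sketch (not any system's implementation), assuming a relation is modelled as a heading of attribute names plus a frozenset of tuples, with no types and no nulls. Because each operator takes relations and returns a relation, calls nest freely, just as arithmetic expressions do. The `Relation` class, the operator names, and the EMP rows are illustrative assumptions.

```python
from typing import Callable, NamedTuple


class Relation(NamedTuple):
    """A relation: a heading (attribute names) plus a set of tuples."""
    heading: tuple[str, ...]
    body: frozenset[tuple]


def restriction(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    # Keep the tuples satisfying pred; the result is again a Relation.
    return Relation(r.heading, frozenset(
        t for t in r.body if pred(dict(zip(r.heading, t)))))


def projection(r: Relation, attrs: tuple[str, ...]) -> Relation:
    # Keep only the named attributes; set semantics removes duplicates.
    idx = [r.heading.index(a) for a in attrs]
    return Relation(attrs, frozenset(tuple(t[i] for i in idx) for t in r.body))


def union(r: Relation, s: Relation) -> Relation:
    # Both operands must have the same heading (union-compatibility).
    assert r.heading == s.heading, "operands are not union-compatible"
    return Relation(r.heading, r.body | s.body)


# Hypothetical sample data.
EMP = Relation(("EMP#", "DEPT#", "SALARY"),
               frozenset({("E1", "D1", 40000), ("E2", "D3", 42000),
                          ("E3", "D3", 30000)}))

# restriction ( projection ( table ) ): any operator result can be the
# operand of any other operator, because every result is a Relation.
result = restriction(projection(EMP, ("EMP#", "DEPT#")),
                     lambda t: t["DEPT#"] == "D3")
print(sorted(result.body))  # [('E2', 'D3'), ('E3', 'D3')]
```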

Now consider SQL. SQL is a language that supports, directly
or indirectly, all the operations of the relational algebra
(i.e., SQL is relationally complete). However, the
table-expressions of SQL (which are the SQL equivalent of the
expressions of the relational algebra) cannot be arbitrarily
nested. Let us consider the question of exactly which cases
SQL does support. Simplifying matters slightly, the expression
SELECT - FROM - WHERE is the SQL version of the nested
algebraic expression

projection ( restriction ( product ( table1, table2, ... ) ) )

(the product corresponds to the FROM clause, the restriction
to the WHERE clause, and the projection to the SELECT clause;
table1, table2, ... are the tables identified in the FROM
clause -- and note that, as remarked earlier, these are simple
table-names, not more complex expressions). Likewise, the
expression

SELECT ... FROM ... WHERE ...
UNION
SELECT ... FROM ... WHERE ...

is the SQL version of the nested algebraic expression

union ( tabexp1, tabexp2, ... )

where tabexp1, tabexp2, ... are in turn table-expressions of
the form shown earlier (i.e., projections of restrictions of
products of named tables). But it is not possible to formulate
direct equivalents of any other nested algebraic expressions.
Thus, for example, it is not possible to write a direct
equivalent in SQL of the nested expression

restriction ( projection ( table ) )

Instead, the user has to recast the expression into a
semantically equivalent (but syntactically different) form in
which the restriction is applied before the projection. What
this means in practical terms is that the user may have to
expend time and effort transforming the "natural" formulation
of a given query into some different, and arguably less
"natural", representation (see Example below). What is more,
the user is therefore also required to understand exactly when
such transformations are valid. This may not always be
intuitively obvious. For example, is a projection of a union
always equivalent to the union of two projections?
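The question just posed can at least be checked mechanically for particular relations. The Python sketch below, assuming set semantics (no duplicate rows) and plain sets of tuples as relations, tests whether projection distributes over union and over intersection for one small pair of hypothetical relations. For union the two orders agree (under set semantics they always do); for intersection they can differ, which is the kind of subtlety a user forced to rewrite expressions has to know about.

```python
def project(rel: set[tuple], positions: tuple[int, ...]) -> set[tuple]:
    """Projection under set semantics: keep the given column positions."""
    return {tuple(row[i] for i in positions) for row in rel}


# Hypothetical rows of the form (EMP#, DEPT#, SALARY).
nyc = {("E1", "D1", 40000), ("E2", "D2", 35000)}
sfo = {("E2", "D2", 38000), ("E3", "D3", 30000)}
emp_only = (0,)

# Projection of a union vs. union of projections: equal.
union_ok = project(nyc | sfo, emp_only) == (
    project(nyc, emp_only) | project(sfo, emp_only))

# Projection of an intersection vs. intersection of projections: not equal.
# E2 occurs in both tables but with different salaries, so no full row is
# shared and the intersection of the rows is empty.
intersect_ok = project(nyc & sfo, emp_only) == (
    project(nyc, emp_only) & project(sfo, emp_only))

print(union_ok, intersect_ok)  # True False
```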

Example: Given the two tables

NYC ( EMP#, DEPT#, SALARY )
SFO ( EMP#, DEPT#, SALARY )

(representing New York and San Francisco employees,
respectively), list EMP# for all employees.

"Natural" formulation (projection of a union):

SELECT EMP# FROM ( NYC UNION SFO )

SQL formulation (union of two projections):

SELECT EMP# FROM NYC
UNION
SELECT EMP# FROM SFO
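
Many later dialects accept a nested table-expression in the FROM clause. The sketch below uses Python's standard `sqlite3` module (SQLite is a modern dialect, not the IBM dialect discussed here) to run both formulations on hypothetical data; `#` is dropped from the column names (EMPNO, DEPTNO) to avoid quoting identifiers. SQLite still requires `SELECT * FROM` inside the parentheses, and it keeps duplicates in a SELECT without DISTINCT, so the "natural" form needs DISTINCT to match the UNION form when one employee number appears in both tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE NYC (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER);
    CREATE TABLE SFO (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER);
    -- Hypothetical data: E2 appears in both cities with different salaries.
    INSERT INTO NYC VALUES ('E1', 'D1', 40000), ('E2', 'D2', 35000);
    INSERT INTO SFO VALUES ('E2', 'D2', 38000), ('E3', 'D3', 30000);
""")

# "Natural" formulation: projection of a union, via a derived table.
natural = con.execute("""
    SELECT DISTINCT EMPNO
    FROM (SELECT * FROM NYC UNION SELECT * FROM SFO) AS ALL_EMP
    ORDER BY EMPNO
""").fetchall()

# Without DISTINCT the projection keeps E2 twice (two distinct full rows).
with_dups = con.execute("""
    SELECT EMPNO
    FROM (SELECT * FROM NYC UNION SELECT * FROM SFO) AS ALL_EMP
    ORDER BY EMPNO
""").fetchall()

# Formulation the paper's SQL requires: union of two projections.
required = con.execute("""
    SELECT EMPNO FROM NYC
    UNION
    SELECT EMPNO FROM SFO
    ORDER BY EMPNO
""").fetchall()

print(natural)              # [('E1',), ('E2',), ('E3',)]
print(with_dups)            # [('E1',), ('E2',), ('E2',), ('E3',)]
print(natural == required)  # True
```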

We remark in passing that allowing both formulations of the
query would enable different users to perceive and express the
same problem in different ways (ideally, of course, both
formulations would translate to the same internal
representation, for otherwise the choice between the two would
no longer be arbitrary).

The foregoing example tacitly makes use of the fact that a
simple table-reference (i.e., a table-name) ought to be just a
special case of a general table-expression. Thus we wrote

NYC UNION SFO

instead of

SELECT * FROM NYC UNION SELECT * FROM SFO

which current SQL would require. It would be highly desirable
for SQL to allow the expression "SELECT * FROM T" to be
replaced by simply "T" wherever it appears, in the style of
more conventional languages. In other words, SELECT should be
regarded as a statement whose function is to retrieve a table
(represented by a table-expression). Table-expressions per se
-- in particular, nested table-expressions -- should not
require the "SELECT * FROM". Among other things this change
would improve the usability of the EXISTS builtin function
(see later). It would also be clear that INTO and ORDER BY are
clauses of the SELECT statement and not part of a table- (or
column-) expression; the question of whether they can appear
in a nested expression would then simply not arise, thus
avoiding the need for a rule that looks arbitrary but is in
fact not.

A nested table-expression is permitted -- in fact required
-- in current SQL as the argument to EXISTS (but strangely
enough not as the argument to the other builtin functions;
this point is discussed in the next section). Nested
column-expressions ("subqueries") are (a) required with the
"ANY" and "ALL" operators (this includes the IN operator, which
is just a different spelling for =ANY); and (b) permitted with
scalar comparison operators (<, >, =, etc.), if and only if the
column-expression yields a column having at most one row.
Moreover, the nested expression is allowed to include GROUP BY
and HAVING in case (a) but not in case (b). More
arbitrariness.

Elsewhere I have proposed some extensions to SQL to support
the outer join operation [4]. The details of that proposal do
not concern us here; what does concern us is the following. If
the user needs to compute an outer join of three or more
relations, then (a) that outer join is constructed by
performing a sequence of binary outer joins (e.g., join
relations A and B, then join the result and relation C); and
(b) it is essential that the user indicate the sequence in
which those binary joins are performed, because different
sequences will produce different results, in general.
Indicating the required sequence is done, precisely, by
writing a suitable nested expression. Thus, nested expressions
are essential if SQL is to provide direct (i.e., single-statement)
support for general outer joins of more than two relations.
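The dependence on sequence is easy to demonstrate. The sketch below assumes Python's `sqlite3` module and uses only LEFT OUTER JOIN (a right outer join of X and Y is written as Y LEFT JOIN X). Tables A, B, C and their columns are hypothetical. The two statements differ only in which binary outer join is performed first, and that choice can be expressed only by nesting, here via derived tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER, b INTEGER);
    CREATE TABLE C (b INTEGER);
    INSERT INTO A VALUES (1);
    INSERT INTO B VALUES (1, 20);
    INSERT INTO C VALUES (10);
""")

# Sequence 1: outer join A and B preserving A, then outer join that
# result with C preserving C (a right outer join with operands swapped).
first = con.execute("""
    SELECT AB.a_a, AB.b_a, AB.b_b, C.b
    FROM C LEFT JOIN (SELECT A.a AS a_a, B.a AS b_a, B.b AS b_b
                      FROM A LEFT JOIN B ON A.a = B.a) AS AB
           ON AB.b_b = C.b
""").fetchall()

# Sequence 2: outer join B and C preserving C, then outer join A with
# that result preserving A.
second = con.execute("""
    SELECT A.a, BC.b_a, BC.b_b, BC.c_b
    FROM A LEFT JOIN (SELECT B.a AS b_a, B.b AS b_b, C.b AS c_b
                      FROM C LEFT JOIN B ON B.b = C.b) AS BC
           ON A.a = BC.b_a
""").fetchall()

print(first)   # [(None, None, None, 10)]: A's row has been lost
print(second)  # [(1, None, None, None)]: C's row has been lost
```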

Another example (involving outer join again): Part of the
proposal for supporting outer join [4] involves the use of a
new clause, the PRESERVE clause, whose function is to preserve
rows from the indicated table that would not otherwise
participate in the result of the SELECT. Consider the tables

COURSE ( COURSE#, SUBJECT )
OFFERING ( COURSE#, OFF#, LOCATION )

and consider the query "List all algebra courses, with their
offerings if any". The two SELECT statements following
(neither of which is valid in current SQL, of course)
represent two attempts to formulate this query:

SELECT ALGEBRA.COURSE#, OFF#, LOCATION
FROM ( SELECT COURSE#
       FROM COURSE
       WHERE SUBJECT = 'Algebra' ) ALGEBRA, OFFERING
WHERE ALGEBRA.COURSE# = OFFERING.COURSE#
PRESERVE ALGEBRA

SELECT COURSE.COURSE#, OFF#, LOCATION
FROM COURSE, OFFERING
WHERE COURSE.COURSE# = OFFERING.COURSE#
AND SUBJECT = 'Algebra'
PRESERVE COURSE

Each of these statements does list all algebra courses,
together with their offerings, for all such courses that do
have any offerings. The first also lists algebra courses that
do not have any offerings, concatenated with null values in
the OFFERING positions; i.e., it preserves information for
those courses (note the introduced name ALGEBRA, which is used
to refer to the result of evaluating the inner expression).
The second, by contrast, preserves information not only for
algebra courses with no offerings, but also for all courses
for which the subject is not algebra (regardless of whether
those courses have any offerings or not). In other words, the
first preserves information for algebra courses only (as
required); the second produces a lot of unnecessary output.
And note that the first cannot even be formulated (as a single
statement) if nested expressions are not supported.
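PRESERVE is the notation proposed in [4], not a feature of any current system. For comparison only, the sketch below uses the LEFT OUTER JOIN of later SQL dialects, run through Python's `sqlite3`, as a stand-in for PRESERVE; the correspondence is an analogy assumed here, not a claim about [4]. The first query mirrors the nested formulation; the second folds the subject test into the join condition, which, like the second PRESERVE formulation, keeps every course whether or not it is an algebra course. Data and column names (COURSENO, OFFNO) are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE COURSE (COURSENO TEXT, SUBJECT TEXT);
    CREATE TABLE OFFERING (COURSENO TEXT, OFFNO TEXT, LOCATION TEXT);
    INSERT INTO COURSE VALUES ('C1', 'Algebra'), ('C2', 'Algebra'),
                              ('C3', 'History');
    INSERT INTO OFFERING VALUES ('C1', 'O1', 'London'), ('C3', 'O2', 'Paris');
""")

# Analogue of the first formulation: restrict COURSE first (the nested
# expression named ALGEBRA), then preserve ALGEBRA in the outer join.
first = con.execute("""
    SELECT ALGEBRA.COURSENO, OFFNO, LOCATION
    FROM (SELECT COURSENO FROM COURSE WHERE SUBJECT = 'Algebra') AS ALGEBRA
         LEFT JOIN OFFERING ON ALGEBRA.COURSENO = OFFERING.COURSENO
    ORDER BY ALGEBRA.COURSENO
""").fetchall()

# Analogue of the second formulation: preserve all of COURSE, with the
# subject test inside the join condition.
second = con.execute("""
    SELECT COURSE.COURSENO, OFFNO, LOCATION
    FROM COURSE LEFT JOIN OFFERING
         ON COURSE.COURSENO = OFFERING.COURSENO AND SUBJECT = 'Algebra'
    ORDER BY COURSE.COURSENO
""").fetchall()

print(first)   # [('C1', 'O1', 'London'), ('C2', None, None)]
print(second)  # [('C1', 'O1', 'London'), ('C2', None, None), ('C3', None, None)]
```

The extra row ('C3', None, None) is the "unnecessary output" described above: a non-algebra course preserved with nulls, even though it does have an offering.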

* In fact, SQL does already support nested expressions in a
kind of "under the covers" sense. Consider the following
example:

Base table:

S ( S#, SNAME, STATUS, CITY )

View definition:

CREATE VIEW LONDON_SUPPLIERS
AS SELECT S#, SNAME, STATUS
FROM S
WHERE CITY = 'London'

Query (Q):

SELECT *
FROM LONDON_SUPPLIERS
WHERE STATUS > 50

Resulting SELECT statement (Q'):

SELECT S#, SNAME, STATUS
FROM S
WHERE STATUS > 50
AND CITY = 'London'

The SELECT statement Q' is obtained from the original query Q
by a process usually described as "merging": statement Q is
"merged" with the SELECT in the view definition to produce
statement Q'. To the naive user this looks a little bit like
magic. But in fact what is going on is simply that the
reference to LONDON_SUPPLIERS in the FROM clause in Q is being
replaced by the expression that defines LONDON_SUPPLIERS, as
follows:

SELECT *
FROM ( SELECT S#, SNAME, STATUS
       FROM S
       WHERE CITY = 'London' )
WHERE STATUS > 50

This explanation, though both accurate and easy to understand,
cannot conveniently be used in describing or teaching SQL,
precisely because SQL does not support nesting at the external
or user's level.
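In a dialect that does allow nesting at the user's level, the equivalence of Q, the merged statement Q', and the nested form can be checked directly. A sketch using Python's `sqlite3`, with hypothetical supplier rows and SNO in place of S#:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S (SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    INSERT INTO S VALUES ('S1', 'Smith', 20, 'London'),
                         ('S4', 'Clark', 70, 'London'),
                         ('S3', 'Blake', 80, 'Paris');
    CREATE VIEW LONDON_SUPPLIERS AS
        SELECT SNO, SNAME, STATUS FROM S WHERE CITY = 'London';
""")

q = "SELECT * FROM LONDON_SUPPLIERS WHERE STATUS > 50"          # Q
q_merged = ("SELECT SNO, SNAME, STATUS FROM S "                  # Q'
            "WHERE STATUS > 50 AND CITY = 'London'")
q_nested = ("SELECT * FROM (SELECT SNO, SNAME, STATUS FROM S "   # nested form
            "WHERE CITY = 'London') AS L WHERE STATUS > 50")

results = [con.execute(sql).fetchall() for sql in (q, q_merged, q_nested)]
print(results[0])                              # [('S4', 'Clark', 70)]
print(results[0] == results[1] == results[2])  # True
```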

* UNION is not permitted in a subquery, and hence (among other
things) cannot be used in the definition of a view (although
strangely enough it can be used to define the scope for a
cursor in embedded SQL). So a view cannot be "any derivable
relation", and the relational closure property breaks down.
Likewise, INSERT ... SELECT cannot be used to assign the union
of two relations to another relation. Yet another consequence
of the special treatment given to UNION is that it is not
possible to apply a builtin function such as AVG to a union.
See the following section.
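Later dialects lifted these particular restrictions. As a sketch (Python's `sqlite3`, hypothetical NYC and SFO rows, EMPNO and DEPTNO in place of EMP# and DEPT#), a view can be defined over a UNION, INSERT ... SELECT can copy a union into another table, and AVG can be applied to a union either through the view or through a nested table-expression:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE NYC (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER);
    CREATE TABLE SFO (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER);
    INSERT INTO NYC VALUES ('E1', 'D1', 40000), ('E2', 'D2', 30000);
    INSERT INTO SFO VALUES ('E3', 'D3', 50000);

    -- A view defined as a union.
    CREATE VIEW ALL_EMPS AS
        SELECT * FROM NYC UNION SELECT * FROM SFO;

    -- INSERT ... SELECT assigning a union to another table.
    CREATE TABLE EVERYONE (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER);
    INSERT INTO EVERYONE SELECT * FROM NYC UNION SELECT * FROM SFO;
""")

# AVG applied to a union, through the view and through a nested expression.
via_view = con.execute("SELECT AVG(SALARY) FROM ALL_EMPS").fetchone()[0]
via_nesting = con.execute("""
    SELECT AVG(SALARY)
    FROM (SELECT * FROM NYC UNION SELECT * FROM SFO) AS U
""").fetchone()[0]
copied = con.execute("SELECT COUNT(*) FROM EVERYONE").fetchone()[0]

print(via_view, via_nesting, copied)  # 40000.0 40000.0 3
```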

We conclude this discussion of SQL expressions by noting a few
additional (and apparently arbitrary) restrictions.

The predicate C BETWEEN A AND B is equivalent to the
predicate A <= C AND C <= B -- except that B (but not A or C!)
can be a column-expression (subquery) in the second
formulation but not in the first.

The predicate "field comparison (subquery)" must be written
in the order shown and not the other way around; i.e., the
expression "(subquery) comparison field" is illegal.

If we regard SELECT, UPDATE, and INSERT all as special kinds
of assignment statement -- in each case, the value of some
expression is being assigned to some variable (a newly created
variable, in the case of INSERT) -- then source values for
those assignments can be specified as scalar-expressions
(involving database fields, host variables, constants, and
scalar operators) for SELECT and UPDATE, but must be specified
as simple host variables or constants for INSERT. Thus, for
example, the following is valid:

SELECT :X + 1
FROM T

and so is:

UPDATE T
SET F = :X + 1

but the following is not:

INSERT INTO T ( F )
VALUES ( :X + 1 )
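For comparison, SQLite (used here via Python's `sqlite3`; a statement about that dialect, not about the IBM products) accepts all three statements. The named parameter `:X`, bound from a dict, plays the role of the host variable; table T is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (F INTEGER)")
host = {"X": 41}  # stands in for the host variable :X

# The INSERT with a scalar-expression source, rejected by the SQL above.
con.execute("INSERT INTO T ( F ) VALUES ( :X + 1 )", host)
con.execute("UPDATE T SET F = :X + 1", host)
selected = con.execute("SELECT :X + 1 FROM T", host).fetchall()
stored = con.execute("SELECT F FROM T").fetchall()

print(selected, stored)  # [(42,)] [(42,)]
```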

Given the tables:

S ( S#, SNAME, STATUS, CITY )
P ( P#, PNAME, COLOR, WEIGHT, CITY )

the SELECT statement

SELECT COLOR
FROM P
WHERE CITY =
      ( SELECT CITY
        FROM P
        WHERE P# = 'P1' )

is legal, but the UPDATE statement

UPDATE P
SET COLOR = 'Blue'
WHERE CITY =
      ( SELECT CITY
        FROM P
        WHERE P# = 'P1' )

is not. Worse, neither is the UPDATE statement

UPDATE P
SET CITY =
    ( SELECT CITY
      FROM S
      WHERE S# = 'S1' )
WHERE ...

Even worse, given:

EMP ( EMP#, SALARY )
BONUSES ( EMP#, BONUS )

the following (potentially very useful) UPDATE is also
illegal:

UPDATE EMP
SET SALARY = SALARY + ( SELECT BONUS
                        FROM BONUSES
                        WHERE EMP# = EMP.EMP# )

(Actually there is a slight problem in this last example.
Suppose a given employee number, say e, appears in the EMP
table but not in the BONUSES table. Then the parenthesized
expression will evaluate to null for employee e, and the
UPDATE will therefore set e's salary to null as well --
whereas what is wanted is clearly for e's salary to remain
unchanged. To fix this problem, we need to replace the
parenthesized expression by (say)

ROW_MAX ( ( SELECT BONUS ... EMP.EMP# ), 0 )

where ROW_MAX is a function that operates by (a) ignoring any
of its arguments that evaluate to null and then (b) returning
the maximum of those that are left, if any, or null otherwise.
Note that ROW_MAX is different in kind from the builtin
functions currently provided in SQL -- it is in fact a
scalar-valued function, whose arguments are scalar-expressions.)
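The ROW_MAX function just described is easy to state precisely. The sketch below implements it in Python, registers it with SQLite through `sqlite3.Connection.create_function` (ROW_MAX is the paper's hypothetical function, not a builtin of any product), and runs the corrected UPDATE, which SQLite accepts with a correlated subquery in SET. EMPNO stands in for EMP#; the rows are hypothetical. Many later dialects would write COALESCE(..., 0) here instead, which differs from ROW_MAX only when a bonus is negative.

```python
import sqlite3


def row_max(*args):
    """ROW_MAX as described: ignore null (None) arguments, then return the
    maximum of those that are left, or null if none are left."""
    present = [a for a in args if a is not None]
    return max(present) if present else None


con = sqlite3.connect(":memory:")
con.create_function("ROW_MAX", -1, row_max)  # -1: any number of arguments
con.executescript("""
    CREATE TABLE EMP (EMPNO TEXT, SALARY INTEGER);
    CREATE TABLE BONUSES (EMPNO TEXT, BONUS INTEGER);
    INSERT INTO EMP VALUES ('E1', 1000), ('E2', 2000);
    INSERT INTO BONUSES VALUES ('E1', 100);  -- E2 has no bonus row
""")

con.execute("""
    UPDATE EMP
    SET SALARY = SALARY + ROW_MAX((SELECT BONUS FROM BONUSES
                                   WHERE BONUSES.EMPNO = EMP.EMPNO), 0)
""")
salaries = con.execute("SELECT * FROM EMP ORDER BY EMPNO").fetchall()
print(salaries)  # [('E1', 1100), ('E2', 2000)]
```

Without the ROW_MAX wrapper, E2's salary would have become null, which is exactly the problem the text points out.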

3. LACK OF ORTHOGONALITY: BUILTIN FUNCTIONS

Frankly, there is so much confusion in this area that it is
difficult to criticize it coherently. The basic point, however,
is that the argument to a function such as SUM is a column of
scalar values and the result is a single scalar value; hence,
orthogonality dictates that (a) any column-expression should be
permitted as the argument, and (b) the function-reference should
be permitted in any context in which a scalar can appear.
However, (a) the argument is in fact specified in a most
unorthodox manner, which means in turn that (b) function-references
can actually appear only in a very small set of
special-case situations. In particular, function-references
cannot appear nested inside other function-references. In
addition to this fact, functions are subject to a large number of
peculiar and apparently arbitrary restrictions.

Before getting into details, we should point out that SQL in fact
supports two distinct categories of function, not however in any
uniform syntactic style. We refer to the two categories
informally as column and table functions, respectively. We
discuss each in turn.

Column functions are the ones that one usually thinks of whenever
functions are mentioned in connexion with SQL. A column function
is a function that reduces an entire column of scalar values to a
single value. The functions in this category are COUNT (excluding
COUNT(*)), SUM, AVG, MAX, and MIN. A functional notation is used
to represent these functions; however, as suggested above, the
scoping rules for representing the argument are somewhat
unconventional. Consider the following database (suppliers and
parts):

S ( S#, SNAME, STATUS, CITY )
P ( P#, PNAME, COLOR, WEIGHT, CITY )
SP ( S#, P#, QTY )

and consider also the following query:

SELECT SUM (QTY)
FROM SP

The argument to SUM here is in fact the entire column of QTY
values in table SP, and a more conventional representation would
accordingly be:

SUM ( SELECT QTY
      FROM SP )

(though once again the keyword SELECT seems rather obtrusive; QTY
FROM SP, or -- even better -- simply SP.QTY, would be more
orthodox). As another example, the query:

SELECT SUM (QTY)
FROM SP
WHERE P# = 'P2'

would more conventionally be represented as

SUM ( SELECT QTY
      FROM SP
      WHERE P# = 'P2' )

or (better) as:

SUM ( SP.QTY WHERE SP.P# = 'P2' )

As it is, the argument has to be determined by reference to the
context. An immediate consequence of this fact is that a query
such as "Find parts supplied in a total quantity of more than
1000" cannot be expressed in a natural style. First, the syntax:

SELECT P#
FROM SP
WHERE SUM (QTY) > 1000

does not work, either with SQL's rules for argument scope or with
any other rules. The most logical formulation (but retaining a
SQL-like style) is:

SELECT DISTINCT SPX.P#
FROM SP SPX
WHERE SUM ( SELECT QTY
            FROM SP SPY
            WHERE SPY.P# = SPX.P# )
      > 1000

(The DISTINCT is required because of SQL's rules concerning
duplicate elimination.) However, the normal SQL formulation would
be:

SELECT P#
FROM SP
GROUP BY P#
HAVING SUM (QTY) > 1000

Note that the user is not really interested in grouping per se in
this query; by writing GROUP BY, he or she is in effect telling
the system how to execute the query, which is counter to the
general philosophy of the relational model. To put this another
way, the statement begins to look more like a prescription for
solving the problem, rather than a simple description of what the
problem is.

More important, it is necessary to introduce the HAVING clause,
the justification for which is not immediately apparent to the
user ("Why can't I use a WHERE clause?"). The HAVING clause and
the GROUP BY clause are needed in SQL only as a consequence of the
column-function argument scoping rules. As a matter of fact, it is
possible to produce a SQL formulation of this example that does
not use GROUP BY or HAVING at all, and is fairly close to "the
most logical formulation" suggested earlier:

SELECT DISTINCT P#
FROM SP SPX
WHERE 1000 <
      ( SELECT SUM (QTY)
        FROM SP SPY
        WHERE SPY.P# = SPX.P# )

As mentioned earlier, current SQL requires the predicate in the
outer WHERE clause to be written as shown (i.e., in the order
"constant - comparison - (subquery)", instead of the other way
around).
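The two formulations (GROUP BY with HAVING, and the GROUP-BY-free correlated subquery) can be compared on sample data. A sketch using Python's `sqlite3` with hypothetical SP rows (SNO and PNO in place of S# and P#); SQLite also accepts the subquery on the left of the comparison, so the predicate is written here in its natural order:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 800),
                          ('S1', 'P2', 200), ('S3', 'P2', 200),
                          ('S2', 'P3', 1200);
""")

grouped = con.execute("""
    SELECT PNO FROM SP GROUP BY PNO HAVING SUM(QTY) > 1000 ORDER BY PNO
""").fetchall()

correlated = con.execute("""
    SELECT DISTINCT PNO FROM SP SPX
    WHERE (SELECT SUM(QTY) FROM SP SPY WHERE SPY.PNO = SPX.PNO) > 1000
    ORDER BY PNO
""").fetchall()

print(grouped)                # [('P1',), ('P3',)]
print(grouped == correlated)  # True
```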

An important consequence of all of the foregoing is that SQL
cannot support arbitrary retrieval operations on views. Consider
the following example.

View definition:

CREATE VIEW PQ ( P#, TOTQTY )
AS SELECT P#, SUM (QTY)
FROM SP
GROUP BY P#

Attempted query:

SELECT *
FROM PQ
WHERE TOTQTY > 1000

This query fails (it is syntactically invalid), because the
"merging" process described earlier leads to something like the
following:

SELECT P#, SUM (QTY)
FROM SP
WHERE SUM (QTY) > 1000
GROUP BY P#

and this is not a legal SELECT statement. Likewise, the attempted
query:

SELECT AVG (TOTQTY)
FROM PQ

also does not work, for similar reasons.
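A dialect that treats a view reference as a nested table-expression, rather than textually merging it, handles both attempted queries. A sketch with Python's `sqlite3`, hypothetical SP data, and PNO for P#; the view's column names are given by an AS alias instead of a column list, which is an arbitrary choice here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 800),
                          ('S1', 'P2', 200), ('S3', 'P2', 200),
                          ('S2', 'P3', 1200);
    CREATE VIEW PQ AS
        SELECT PNO, SUM(QTY) AS TOTQTY FROM SP GROUP BY PNO;
""")

big = con.execute("SELECT * FROM PQ WHERE TOTQTY > 1000 ORDER BY PNO").fetchall()
avg_totqty = con.execute("SELECT AVG(TOTQTY) FROM PQ").fetchone()[0]

print(big)         # [('P1', 1100), ('P3', 1200)]
print(avg_totqty)  # 900.0
```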

The following is another striking example of the unobviousness of
the scoping rules. Consider the following two queries:

SELECT SUM (QTY)          SELECT SUM (QTY)
FROM SP                   FROM SP
                          GROUP BY P#

In the first case, the query returns a single value; the argument
to the SUM invocation is the entire QTY column. In the second
case, the query returns multiple values; the SUM function is
invoked multiple times, once for each of the groups created by
the GROUP BY clause. Notice how the meaning of the syntactic
construct "SUM(QTY)" is dependent on context. In fact, SQL is
moving out of the strict tabular framework of the relational
model in this second example and introducing a new kind of data
object, viz. a set of tables (which is of course not the same
thing as a table at all). GROUP BY converts a table into a set of
tables. In the example, SUM is then applied to (a column within)
each member of that set. A more logical syntax might look
something like the following:

APPLY ( SUM, SELECT QTY
             FROM ( GROUP SP BY P# ) )

where "GROUP SP BY P#" produces the set of tables, "SELECT QTY
FROM ( ... )" extracts a corresponding set of columns, and APPLY
applies the function specified as its first argument to each
column in the set of columns specified as its second argument,
producing a set of scalars -- i.e., another column. (I am not
suggesting a concrete syntax here, only indicating a possible
direction for a systematic development of such a syntax.)
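The GROUP and APPLY operators above are hypothetical, but their intended meaning can be sketched in Python, assuming a table is a list of dicts. `group_by` turns one table into a set of tables, `select_col` extracts one column from each member, and `apply` maps a function over that set of columns, yielding a column of scalars. As an addition to the text's description, each result is keyed by its P# value so the totals stay identifiable; all function names and rows are illustrative.

```python
from collections import defaultdict
from typing import Callable

Row = dict
Table = list[Row]


def group_by(table: Table, attr: str) -> dict[object, Table]:
    """GROUP table BY attr: one sub-table per distinct value of attr."""
    groups: dict[object, Table] = defaultdict(list)
    for row in table:
        groups[row[attr]].append(row)
    return dict(groups)


def select_col(tables: dict[object, Table], attr: str) -> dict[object, list]:
    """SELECT attr FROM (set of tables): one column per member table."""
    return {key: [row[attr] for row in t] for key, t in tables.items()}


def apply(fn: Callable[[list], object],
          columns: dict[object, list]) -> dict[object, object]:
    """APPLY (fn, columns): fn applied to each column, giving a column of scalars."""
    return {key: fn(col) for key, col in columns.items()}


SP = [  # hypothetical rows
    {"S#": "S1", "P#": "P1", "QTY": 300}, {"S#": "S2", "P#": "P1", "QTY": 800},
    {"S#": "S1", "P#": "P2", "QTY": 200}, {"S#": "S3", "P#": "P2", "QTY": 200},
    {"S#": "S2", "P#": "P3", "QTY": 1200},
]

# APPLY ( SUM, SELECT QTY FROM ( GROUP SP BY P# ) )
totals = apply(sum, select_col(group_by(SP, "P#"), "QTY"))
print(totals)  # {'P1': 1100, 'P2': 400, 'P3': 1200}
```

Because every step returns an ordinary value, the result can itself be an argument to a further function, e.g. `sum(totals.values()) / len(totals)` for the average total.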

As a matter of fact, GROUP BY would be logically unnecessary in
the foregoing example anyway if column function invocations were
more systematic:

SELECT DISTINCT SPX.P#, SUM ( SELECT QTY
                              FROM SP SPY
                              WHERE SPY.P# = SPX.P# )
FROM SP SPX

This formulation also shows, incidentally, that it might be
preferable to declare aliases (range variables) such as SPX and
SPY by means of separate statements before they are used. As it
is, the use of such variables may often precede their definition,
possibly by a considerable amount. Although there is nothing
logically wrong with this, it does make the statements difficult
to read (and write).

Yet another consequence of the scoping rules (already touched on
a couple of times) is that it is not possible to nest column
function references. Extending the earlier example of generating
the total quantity per part (i.e., a column of values, each of
which is a total quantity), suppose we now wanted to find the
average total quantity per part -- i.e., the average of that
column of values. The logical formulation is something like:

AVG ( APPLY ( SUM, SELECT QTY
                   FROM ( GROUP SP BY P# ) ) )

But (as already stated) existing SQL cannot handle this problem
at all in a single expression.
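In later dialects that allow nested table-expressions, the same nesting can be written as a single statement by computing the per-part totals in a derived table and averaging them. A sketch with Python's `sqlite3`, hypothetical data, and PNO for P#:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 800),
                          ('S1', 'P2', 200), ('S3', 'P2', 200),
                          ('S2', 'P3', 1200);
""")

# AVG ( APPLY ( SUM, SELECT QTY FROM ( GROUP SP BY P# ) ) ), expressed with a
# derived table PER_PART holding one total per part.
avg_total = con.execute("""
    SELECT AVG(TOTQTY)
    FROM (SELECT SUM(QTY) AS TOTQTY FROM SP GROUP BY PNO) AS PER_PART
""").fetchone()[0]

print(avg_total)  # 900.0  (totals 1100, 400, 1200)
```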

Let us now leave the scoping rules and consider some additional
points. Each of SUM, AVG, MAX, and MIN can optionally have its
argument qualified by the operator DISTINCT. (COUNT must have its
argument so qualified, though there seems to be no
intrinsic justification for this requirement. For MAX and MIN
such qualification is legal but has no semantic effect.) If (and
only if) DISTINCT is not specified, the column argument can
be a "computed" column, i.e., the result of an arithmetic
expression -- for example:

SELECT AVG ( X + Y )
FROM T

And (again) if and only if DISTINCT is not specified, the
function reference can itself be an operand in an arithmetic
expression -- for example:

SELECT AVG ( X ) * 3
FROM T

In current SQL, null values are always eliminated from the
argument to a column function, regardless of whether DISTINCT is
specified. However, this should be regarded as a property of the
existing functions specifically, rather than as a necessary
property of all column functions. In fact, it would be better
not to ignore nulls but to introduce a new function whose effect
is to reduce a given column to another in which nulls have been
eliminated (and, of course, to allow this new function to be used
completely orthogonally).

Table functions

Table functions are functions that operate on an entire table
(not necessarily just on a single column). There are four
functions in this category, two that return a scalar value and
two that return another table. The two that return a single value
are COUNT(*) and EXISTS.

COUNT(*) is basically very similar to the column functions
discussed above, so most of the comments made above apply here
also. For example, the query:

SELECT COUNT(*)
FROM SP

would more logically be expressed as

COUNT ( SELECT *
FROM SP )

or (better) as:

COUNT ( SP )

COUNT(*) does not ignore nulls (i.e., all-null rows) in its
argument.

EXISTS, interestingly enough, does use a more logical syntax.
For example:

SELECT *
FROM S
WHERE EXISTS
( SELECT *
FROM SP
WHERE SP.S# = S.S# )

-- though the EXISTS argument would look better if the "SELECT *
FROM" could be elided:

SELECT *
FROM S
WHERE EXISTS ( SP WHERE SP.S# = S.S# )

or (better still):

S WHERE EXISTS ( SP WHERE SP.S# = S.S# ) .

EXISTS takes a table as its argument (though that table must be
expressed as a SELECT-expression, not just as a table-name) and
returns the value true if that table is nonempty, false
otherwise. Because there is currently no BOOLEAN or BIT data type
in SQL, EXISTS can be used only in a WHERE clause, not (e.g.) in
a SELECT clause (lack of orthogonality once again).
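As a sketch of what the orthogonal version looks like, SQLite (used here through Python's sqlite3 module, as a modern stand-in) has no BOOLEAN type either. It represents truth values as the integers 1 and 0, however, so EXISTS is accepted anywhere an expression is, including the SELECT list. The table contents and the SNO spelling of S# are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO S  VALUES ('S1', 'London'), ('S2', 'Paris'), ('S5', 'Athens');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 300);
""")

# EXISTS used as a value in the SELECT list: 1 if the supplier has any
# shipments, 0 otherwise.
rows = conn.execute("""
    SELECT S.SNO, EXISTS (SELECT * FROM SP WHERE SP.SNO = S.SNO)
    FROM S
    ORDER BY S.SNO
""").fetchall()
```

Here rows is [('S1', 1), ('S2', 1), ('S5', 0)].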

Now we turn to the functions that return another table, viz.
DISTINCT and UNION.

DISTINCT takes a table and returns another which is a copy of
that first table except that redundant duplicate rows have been
removed (rows that are entirely null are considered as duplicates
of each other in this process -- that is, the result will contain
at most one all-null row). Once again the syntax is
unconventional. For instance:

SELECT DISTINCT S#
FROM SP

instead of:

DISTINCT ( SELECT S#
FROM SP )

or (better):

DISTINCT ( SP.S# )

There is an apparently arbitrary restriction that DISTINCT may
appear at most once in any given SELECT statement.
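The rule that all-null rows count as duplicates of one another survives in later systems. The sketch below uses Python's sqlite3 module, with SQLite as a stand-in, to show DISTINCT collapsing two all-null rows even though a WHERE comparison does not treat a null as equal even to itself. The table T and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(1, "x"), (1, "x"), (None, None), (None, None)])

# DISTINCT keeps one (1, 'x') row and one all-null row.
distinct_rows = conn.execute("SELECT DISTINCT A, B FROM T").fetchall()

# In WHERE, NULL = NULL is not true, so only the two non-null rows
# satisfy A = A.
self_equal = conn.execute("SELECT COUNT(*) FROM T WHERE A = A").fetchone()[0]
```

The result order of DISTINCT is unspecified, so the test below checks membership rather than order.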

UNION takes two tables (each of which must be represented by
means of a SELECT-expression, not just as a simple table-name)
and produces another table that is their union. It is written as
an infix operator. Because of the unorthodox syntax, it is not
possible (as mentioned before) to apply a column function such as
AVG to a union of two columns.

Note: We consider UNION, alone of the operators of the relational
algebra, as a function in SQL merely because of the special
syntactic treatment it is given. SQL is really a hybrid of the
relational algebra and the relational calculus; it is not
precisely the same as either, though it leans somewhat toward the
calculus -- a dialect of the calculus that does not lend itself
very neatly to support of UNION, however, which is precisely why
the special treatment is necessary.

4. LACK OF ORTHOGONALITY: MISCELLANEOUS ITEMS

Let F be a database field that can accept null values, and let HF
be a corresponding host variable, with associated indicator
variable HN. Then:

SELECT F
INTO :HF:HN

is legal, and so are

INSERT ...
VALUES ( :HF:HN ... )

and

UPDATE ...
SET F = :HF:HN

But the following is not:

SELECT ... ( or UPDATE or DELETE )
WHERE F = :HF:HN

Let C be a cursor that currently identifies a record of table T.
Then it is possible to designate "CURRENT OF C" -- i.e., the
record currently identified by C -- as the target of an UPDATE or
DELETE statement, e.g., as follows:

UPDATE T
SET ...
WHERE CURRENT OF C

Incidentally, a more logical formulation would be

UPDATE CURRENT OF C
SET ...

Specifying the table-name T is redundant (this point is
recognized in the syntax of FETCH, see later), and in any case
"CURRENT OF C" is not the same kind of construct as the more
usual WHERE-predicate (e.g., "SALARY > 20000"). Nor is it
permitted to combine "CURRENT OF C" with other predicates and
write (e.g.) "WHERE CURRENT OF C AND SALARY > 20000". But to
return to the main argument: although the (first) UPDATE
statement above is legal, the analogous SELECT statement

SELECT ...
FROM T
WHERE CURRENT OF C

is not. Nor can fields within "CURRENT OF C" be directly
referenced -- e.g., the following is also illegal:

SELECT *
FROM EMP
WHERE DEPT# =
( SELECT DEPT#
FROM DEPT
WHERE CURRENT OF D )

Turning now to the FETCH statement, we have here an example of
bundling. "FETCH C INTO ..." is effectively a shorthand for a
sequence of two distinct operations --

STEP C TO NEXT
SELECT * INTO ... WHERE CURRENT OF C

-- the first of which (STEP) advances C to the next record in T
in accordance with the ordering associated with C, and the second
of which (SELECT) then retrieves that record. As noted above,
that SELECT does not logically require any FROM clause. Replacing
the FETCH statement by two more primitive statements in this way
would have the following advantages:

(a) it is clearer;

(b) it is a more logical structure (incidentally, "FETCH C"
does not really make intuitive sense -- it is not the cursor
that is being fetched);

(c) it would allow SELECTs of individual fields of the current
record (i.e., "SELECT field-name" as well as "SELECT *");

(d) it would allow selective (and repeated) access to that
current record (e.g., "SELECT F" followed by "SELECT G", both
selecting fields of the same record);

(e) it would be extendable to other kinds of STEP operation --
e.g., STEP C TO PREVIOUS (say).
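The separation argued for in (a) through (e) can be modelled in a few lines. The sketch below is a hypothetical in-memory cursor, not any real database API: `step_next` and `step_previous` only move the position, while `current` and `field` only read the record under the cursor, so reads can be repeated and fields selected individually. The bundled FETCH is then just a step followed by a read. The class, its method names, and the EMP records are all invented for illustration.

```python
from typing import Any, Dict, List, Optional


class SteppableCursor:
    """Hypothetical cursor over an already-ordered list of records.

    Positioning (STEP) and retrieval (SELECT ... WHERE CURRENT OF C)
    are separate operations; nothing here models a real DBMS.
    """

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        self._records = records
        self._pos = -1  # -1 means "before the first record"

    def step_next(self) -> bool:
        """STEP C TO NEXT; returns False (position unchanged) at the end."""
        if self._pos + 1 < len(self._records):
            self._pos += 1
            return True
        return False

    def step_previous(self) -> bool:
        """STEP C TO PREVIOUS, the extension mentioned in point (e)."""
        if self._pos > 0:
            self._pos -= 1
            return True
        return False

    def current(self) -> Dict[str, Any]:
        """SELECT * ... WHERE CURRENT OF C; may be repeated freely."""
        if not 0 <= self._pos < len(self._records):
            raise LookupError("cursor is not positioned on a record")
        return dict(self._records[self._pos])

    def field(self, name: str) -> Any:
        """SELECT field-name ... WHERE CURRENT OF C."""
        return self.current()[name]

    def fetch(self) -> Optional[Dict[str, Any]]:
        """The bundled FETCH: a STEP immediately followed by a SELECT *."""
        return self.current() if self.step_next() else None


# Usage with invented EMP records, already in DEPT order.
c = SteppableCursor([{"EMP": "E1", "DEPT": "D1"}, {"EMP": "E2", "DEPT": "D2"}])
c.step_next()
first_emp, first_dept = c.field("EMP"), c.field("DEPT")  # two reads, one record
c.step_next()
c.step_previous()
back_again = c.field("EMP")
```

A design note: `current` returns a copy of the record, so a caller cannot accidentally modify the underlying data through a read.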

In fact I would go further. First, note that "CURRENT OF C" is an
example of a row-expression. Let us therefore introduce a (new)
FETCH statement, whose argument is a row-expression (as opposed
to SELECT, whose argument is a table-expression), and whose
function is to retrieve the row represented by that expression.
Next, outlaw SELECT where FETCH is really intended. Next,
introduce "(row-expression).field-name" -- e.g., (CURRENT OF C).F
-- as a new form of scalar-expression. Finally, support all of
these constructs orthogonally. Thus, for example, all of the
following would be legal:

FETCH CURRENT OF C INTO ...

FETCH (CURRENT OF C).F INTO ...

SELECT *
FROM EMP
WHERE DEPT# = (CURRENT OF C).DEPT#

UPDATE CURRENT OF C
SET ...

DELETE CURRENT OF C

The examples illustrate the point that "CURRENT OF C" is really a
very clumsy notation, incidentally, but an improved syntax is
beyond the scope of this paper. See [5] for a preferable
alternative.

Specifying ORDER BY in the declaration of cursor C means that the
statements UPDATE/DELETE ... CURRENT OF C are illegal (in fact,
the declaration of C cannot include a FOR UPDATE clause if ORDER
BY is specified). The rationale for this restriction is that
ORDER BY may cause the program to operate on a copy instead of on
the actual data, and hence that updates and deletes would be
meaningless; but the restriction is unfortunate, to say the
least. Consider a program that needs to process employees in
department number order and needs to update some of them as it
goes. The user is forced to code along the following lines:

EXEC SQL DECLARE C CURSOR FOR
         SELECT EMP#, DEPT#, ...
         FROM EMP
         ORDER BY DEPT# ;
EXEC SQL OPEN C ;
DO WHILE more-to-come ;
   EXEC SQL FETCH C INTO :EMP#, :DEPT#, ... ;
   if this record needs updating then
      EXEC SQL UPDATE EMP
               SET ...
               WHERE EMP# = :EMP#  /* instead of CURRENT OF C */ ;
END ;
EXEC SQL CLOSE C ;

The UPDATE statement here is an "out-of-the-blue" UPDATE, not the
CURRENT form. Problems:

(a) The update will be visible through cursor C if and only if
C is running through the real data, not a copy.

(b) If cursor C is running through the real data, and if the
UPDATE changes the value of DEPT#, the effect on the position
of cursor C within the table is apparently undefined.

We remark also that the FOR UPDATE clause is a little mysterious
(its real significance is not immediately apparent); it is also
logically unnecessary. The whole of this area smacks of a most
unfortunate loss of physical data independence.

The keyword NULL may be regarded as a "builtin constant",
representing the null value. However, it cannot appear in all
positions in which a scalar constant can appear. For example, the
statement

SELECT F, NULL
FROM T

is illegal. This is unfortunate, since the ability to select NULL
is precisely what is required in order to construct an outer join
(in the absence of direct support for such an operation). See
[4].
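To show why selecting NULL matters, here is a sketch of the classic outer-join construction: the ordinary join supplies matched rows, and unmatched suppliers are padded by selecting the constant NULL in place of the part number. It runs on SQLite through Python's sqlite3 module, a modern stand-in that does allow NULL in the SELECT list and also offers a direct LEFT JOIN for comparison. The tables, data, and SNO/PNO spellings are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT);
    INSERT INTO S  VALUES ('S1'), ('S2'), ('S5');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1');
""")

# Outer join built by hand: matched rows UNION unmatched rows padded with NULL.
manual = conn.execute("""
    SELECT S.SNO, SP.PNO FROM S, SP WHERE S.SNO = SP.SNO
    UNION
    SELECT S.SNO, NULL FROM S
    WHERE NOT EXISTS (SELECT * FROM SP WHERE SP.SNO = S.SNO)
    ORDER BY 1, 2
""").fetchall()

# The direct operator that later SQL provides, for comparison.
direct = conn.execute("""
    SELECT S.SNO, SP.PNO FROM S LEFT JOIN SP ON S.SNO = SP.SNO
    ORDER BY 1, 2
""").fetchall()
```

Both produce [('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1'), ('S5', None)], with Python's None standing for the SQL null.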

Empty sets

Let T be a table-expression. If T happens to evaluate to an empty
set, then what happens depends on the context in which T appears.
For example, consider the expressions

SELECT SALARY
FROM EMP
WHERE DEPT# = 'D3'

and

SELECT AVG (SALARY)
FROM EMP
WHERE DEPT# = 'D3'

and suppose that department D3 currently has no employees. Note
that the second of these expressions represents the application
of the AVG function to the result of the first; as pointed out
earlier, it would more logically be written as

AVG (SELECT SALARY
FROM EMP
WHERE DEPT# = 'D3')

The statement

EXEC SQL SELECT SALARY
INTO :S:SN
FROM EMP
WHERE DEPT# = 'D3' ;

gives "not found" (SQLCODE = +100, host variables S and SN
unchanged).

The statement

EXEC SQL SELECT AVG (SALARY)
INTO :S:SN
FROM EMP
WHERE DEPT# = 'D3' ;

sets host variable SN to an unspecified negative value to
indicate that the value of the expression is null. The effect
on host variable S is unspecified.

The statement

EXEC SQL SELECT ...
INTO :S:SN
FROM ...
WHERE field IN
( SELECT SALARY
FROM EMP
WHERE DEPT# = 'D3' ) ;

gives "not found" (at the outer level).

The statement

EXEC SQL SELECT ...
INTO :S:SN
FROM ...
WHERE field =
( SELECT SALARY
FROM EMP
WHERE DEPT# = 'D3' ) ;

also gives "not found" (at the outer level), though there is a
good argument for treating this case as an error, as follows:
The parenthesized expression "(SELECT SALARY ...)" should
really be regarded as a shorthand for "UNIQUE (SELECT SALARY
...)", where UNIQUE is a quantifier (analogous to EXISTS)
meaning "there exists exactly one" -- or, in other words, a
function whose effect is to return the single element from a
singleton set and to raise an error if that set does not in
fact contain exactly one member. Note that an error would be
raised in the example if the parenthesized expression yielded
a set having more than one member (which in general, of
course, it would).
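The proposed UNIQUE can be stated precisely as a small function. This is a hypothetical Python sketch of the semantics just described: return the sole element of a singleton, and treat both the empty case (which current SQL reports as "not found") and the many-element case as the same kind of error. The names `unique` and `CardinalityError` are invented for illustration.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


class CardinalityError(Exception):
    """Raised when UNIQUE is applied to a set that is not a singleton."""


def unique(values: Iterable[T]) -> T:
    """Hypothetical UNIQUE quantifier: the sole element of a singleton set.

    An empty argument and an argument with several elements are both
    errors, rather than the first being silently "not found".
    """
    items = list(values)
    if len(items) != 1:
        raise CardinalityError(f"expected exactly one element, got {len(items)}")
    return items[0]
```

For example, unique([42]) returns 42, while unique([]) and unique([1, 2]) both raise CardinalityError.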

The statement

EXEC SQL SELECT ...
INTO :S:SN
FROM ...
WHERE field =
( SELECT AVG (SALARY)
FROM EMP
WHERE DEPT# = 'D3' ) ;

also gives "not found" at the outer level.
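The same cases can be tried against a modern system. The sketch below uses SQLite through Python's sqlite3 module as a stand-in. A plain SELECT over the empty set yields no row at all, AVG over it yields one row whose value is null, and a scalar subquery over it compares as null, so the outer query finds nothing. The EMP contents and the EMPNO/DEPTNO spellings are invented. SQLite does not implement the proposed UNIQUE: when a scalar subquery yields several rows, it quietly uses the first one rather than raising an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO TEXT, DEPTNO TEXT, SALARY INTEGER)")
conn.execute("INSERT INTO EMP VALUES ('E1', 'D1', 20000)")  # nobody in D3

# Plain SELECT over the empty set: no row (the "not found" case).
plain = conn.execute(
    "SELECT SALARY FROM EMP WHERE DEPTNO = 'D3'").fetchone()

# AVG over the same empty set: exactly one row, whose value is NULL.
average = conn.execute(
    "SELECT AVG(SALARY) FROM EMP WHERE DEPTNO = 'D3'").fetchone()

# Empty scalar subquery: the comparison is NULL, so no outer row qualifies.
outer = conn.execute("""
    SELECT EMPNO FROM EMP
    WHERE SALARY = (SELECT SALARY FROM EMP WHERE DEPTNO = 'D3')
""").fetchall()
```

Here plain is None, average is (None,), and outer is an empty list.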

Compare the following:

SELECT * FROM T ...

UPDATE T ...

DELETE FROM T ...

INSERT INTO T ...

( FETCH C ... )

A more consistent approach would be to define "table-expressions"
(as suggested earlier), and then to recognize that SELECT,
UPDATE, etc., are each operators, one of whose arguments is such
a table-expression. (A problem that immediately arises is that a
simple table-name is currently not a valid table-expression! --
i.e., instead of being able to write simply T, the user has to
write SELECT * FROM T. This point has been mentioned before, and
is of course easily remedied.)

Note too that the syntax UPDATE T SET F = ... does not extend
very nicely to a form of UPDATE in which an entire record is
replaced en bloc (SET * = ... ?). And this touches on yet another
point, viz.: SQL currently provides whole-record SELECT (and
FETCH) and INSERT operators, but no whole-record UPDATE operator.
(DELETE of course must be "whole-record".)

Long fields (LONG VARCHAR)

Long fields are subject to numerous restrictions. Here are some
of them (this may or may not be an exhaustive list). A long
field:

- cannot be referenced in a predicate

- cannot be indexed

- cannot be referenced in SELECT DISTINCT

- cannot be referenced in GROUP BY

- cannot be referenced in ORDER BY

- cannot be referenced in COUNT, MAX, MIN (note: SUM and AVG
would make no sense)

- cannot be involved in a UNION

- cannot be involved in a "subquery" (column-expression)

- cannot be INSERTed from a constant or SELECT-expression

- cannot be UPDATEd from a constant (UPDATE from NULL is
legal, however)

UNION

UNION is not permitted on long fields or in a subquery (in
particular, in a view definition). Also, the data types of
corresponding items in a UNION must be exactly the same:

- if the data type is DECIMAL(p,q), then p must be the same
for both items and q must be the same for both items

- if the data type is CHAR(n), then n must be the same for
both items

- if the data type is VARCHAR(n), then n must be the same for
both items

- if NOT NULL applies to either item, then it must apply to
both

Given these restrictions, it is particularly unfortunate that a
character string constant such as 'ABC' is treated as a varying
length string -- a varying string, moreover, for which [...]

Note also that UNION always eliminates duplicates. There is no
"DISTINCT/ALL" option as there is with a simple SELECT; and if
there were, the default would have to be DISTINCT (for
compatibility reasons), whereas the default for a simple SELECT
is ALL.
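Later SQL did add the option, as UNION ALL, with duplicate elimination kept as the default for plain UNION exactly as predicted here. The sketch below, run on SQLite through Python's sqlite3 module as a modern stand-in, shows both forms. It also shows the earlier complaint about applying AVG to a union being answered by wrapping the union in a derived table. Tables A and B and their values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (X INTEGER);
    CREATE TABLE B (X INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (2), (3);
""")

# Plain UNION: duplicates eliminated (the value 2 appears once).
union = conn.execute(
    "SELECT X FROM A UNION SELECT X FROM B ORDER BY X").fetchall()

# UNION ALL: duplicates retained.
union_all = conn.execute(
    "SELECT X FROM A UNION ALL SELECT X FROM B ORDER BY X").fetchall()

# A column function applied to a union, via a derived table.
avg_all = conn.execute(
    "SELECT AVG(X) FROM (SELECT X FROM A UNION ALL SELECT X FROM B)"
).fetchone()[0]
```

With this data, union is [(1,), (2,), (3,)], union_all is [(1,), (2,), (2,), (3,)], and avg_all is 2.0. Note that the average changes if plain UNION is used instead, which is one more reason the choice between the two matters.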

GROUP BY:

- only works to one level (it can construct a "set of tables"
but not a "set of sets of tables", etc.)

- can only have simple fields as arguments (unlike ORDER BY)

The fact is, as indicated in the discussion of functions earlier,
an orthogonal treatment of GROUP BY would require a thorough
treatment of an entirely new kind of data object, namely the "set
of tables" -- presumably a major undertaking.

Null values are implemented by hidden fields in the database.
However, it is necessary to expose those fields in the interface
to a host language such as PL/I, because PL/I has no notion of
null. As an example, if F and G are two fields in table T, the
UPDATE statement to set F equal to G is:

EXEC SQL UPDATE T
SET F = G ...

but the UPDATE statement to set F equal to a host variable H is
(for instance):

EXEC SQL UPDATE T
SET F = :H:HN ...

(assuming in both cases that the source of the assignment might
be null).

Indicator variables are not permitted in all contexts where
host variables can appear (as already discussed).

To test (in a WHERE clause) whether a field is null, SQL
provides the special comparison "field IS NULL". It is not
intuitively obvious why the user has to write "field IS NULL" and
not "field = NULL" -- especially as the format "field = NULL" is
used in the SET clause of the UPDATE statement to update a field
to the null value. (In fact, the WHERE clause "WHERE field =
NULL" is illegal syntax.)
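Some later systems went the other way and accept "field = NULL" in WHERE, which arguably makes the trap worse. The sketch below uses SQLite through Python's sqlite3 module as a stand-in. There the comparison parses, but it evaluates to null (unknown) for every row and so selects nothing, while the same "F = NULL" in a SET clause really does assign the null value. The table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (K INTEGER, F INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1, 10), (2, None)])

# Accepted by SQLite, but F = NULL is never true, so nothing is selected.
eq_null = conn.execute("SELECT K FROM T WHERE F = NULL").fetchall()

# The special comparison finds the row whose F is null.
is_null = conn.execute("SELECT K FROM T WHERE F IS NULL").fetchall()

# In a SET clause, "F = NULL" is an assignment and does take effect.
conn.execute("UPDATE T SET F = NULL WHERE K = 1")
nulls_after = conn.execute(
    "SELECT COUNT(*) FROM T WHERE F IS NULL").fetchone()[0]
```

Here eq_null is empty, is_null is [(2,)], and after the UPDATE both rows are null.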

Null values are considered as duplicates of each other for the
purposes of UNIQUE and DISTINCT and ORDER BY but not for the
purposes of WHERE and GROUP BY. Null values are also considered
as greater than all nonnull values for the purposes of ORDER BY
but not for the purposes of WHERE.

Null values are always eliminated from the argument to a
builtin function such as SUM or AVG, regardless of whether
DISTINCT is specified in the function reference -- except for the
case of COUNT(*), which counts all rows, including duplicates and
including all-null rows. Thus, for example, given:

SELECT AVG (STATUS) FROM S -- Result: x

SELECT SUM (STATUS) FROM S -- Result: y

SELECT COUNT(*) FROM S -- Result: z

there is no guarantee that x = y/z.
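A single null is enough to break the expected identity. The sketch below uses SQLite through Python's sqlite3 module as a stand-in; it follows the same null-elimination rule for AVG and SUM and the same all-rows rule for COUNT(*). The three suppliers and their STATUS values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT, STATUS INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?)",
                 [("S1", 20), ("S2", 10), ("S3", None)])

x = conn.execute("SELECT AVG(STATUS) FROM S").fetchone()[0]  # null skipped
y = conn.execute("SELECT SUM(STATUS) FROM S").fetchone()[0]  # null skipped
z = conn.execute("SELECT COUNT(*) FROM S").fetchone()[0]     # all rows counted
```

Here x is 15.0, but y/z is 30/3 = 10.0. The identity holds only if z is replaced by COUNT(STATUS), which also skips the null and gives 2.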

As a consequence of the foregoing, the function reference
SUM(F) (for example) is not semantically equivalent to the
expression

f1 + f2 + ... + fn

where f1, f2, ..., fn are the values appearing in field F at the
time of evaluation (if any fi is null, that expression is null,
whereas SUM simply skips the null). Perhaps even more
counterintuitively, the expression

SUM (F1 + F2)

is not equivalent to the expression

SUM (F1) + SUM (F2) .
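The reason is that a row in which either field is null contributes nothing to the first expression, but does contribute its nonnull field to the second. The sketch below checks both claims on SQLite through Python's sqlite3 module, a modern stand-in that follows the same rules. The table T and its three rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (F1 INTEGER, F2 INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(1, 10), (2, None), (None, 5)])

# Row by row, F1 + F2 is 11, NULL, NULL; SUM then skips the nulls.
sum_of_total = conn.execute("SELECT SUM(F1 + F2) FROM T").fetchone()[0]

# Each SUM skips its own nulls independently: 3 + 15.
total_of_sums = conn.execute("SELECT SUM(F1) + SUM(F2) FROM T").fetchone()[0]

# Written out longhand, one null makes the whole expression null.
longhand = conn.execute("SELECT 1 + 2 + NULL").fetchone()[0]
```

Here sum_of_total is 11, total_of_sums is 18, and longhand is None.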

Host variables

Host variables are permitted in the INTO clause (of SELECT and
FETCH), the SET clause (of UPDATE), and the WHERE clause (of
SELECT, UPDATE, and DELETE), but nowhere else. In particular,
table-names and field-names cannot be represented by host
variables.

Introduced names

The user can introduce names (aliases) for tables (e.g., FROM T
TX) but not for scalars (e.g., SELECT F FX). This latter facility
would be particularly useful when the scalar is in fact
represented as an operational expression -- e.g., SELECT A + B C.
The name C could be used in ORDER BY or in GROUP BY or as an
inherited name in CREATE VIEW (etc., etc.).
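Later SQL adopted essentially this facility, spelled with the keyword AS. The sketch below, on SQLite through Python's sqlite3 module as a modern stand-in, names the computed column C and then reuses that name in ORDER BY and as the inherited column name of a view. Table T, its rows, and the view name V are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1, 5), (4, 0), (2, 2)])

# The computed scalar A + B is named C, and ORDER BY refers to it by name.
ordered = conn.execute("SELECT A + B AS C FROM T ORDER BY C").fetchall()

# A view defined over the expression inherits the name C.
conn.execute("CREATE VIEW V AS SELECT A + B AS C FROM T")
from_view = conn.execute("SELECT C FROM V WHERE C > 4").fetchall()
```

Here ordered is [(4,), (4,), (6,)] and from_view is [(6,)].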

Certain INSERT, UPDATE, and DELETE statements are not allowed.
For example, consider the requirement "Delete all suppliers with
a status less than the average". The statement:

DELETE
FROM S
WHERE STATUS <
( SELECT AVG (STATUS)
FROM S )

is illegal, because the FROM clause in the subquery refers to the
table against which the deletion is to be done. Likewise, the
UPDATE statement

UPDATE S
SET STATUS = 0
WHERE STATUS <
( SELECT AVG (STATUS)
FROM S )

is also illegal, for analogous reasons. Third, the statement

INSERT INTO T
SELECT * FROM T

which might be regarded as a perfectly natural way to "double up"
on the contents of a table T, is also illegal, again for
analogous reasons.
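The restriction is not logically necessary: a system only has to evaluate the subquery (or the source SELECT) against the table as it stood before the statement began. SQLite, used here through Python's sqlite3 module as a modern stand-in, accepts both statements. It evaluates the uncorrelated subquery once, so the average is taken before any row is deleted. The supplier rows and table T are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT, STATUS INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?)",
                 [("S1", 10), ("S2", 20), ("S3", 30), ("S4", 40)])

# The average (25) is computed over the table as it was before deletion.
conn.execute("DELETE FROM S WHERE STATUS < (SELECT AVG(STATUS) FROM S)")
remaining = conn.execute("SELECT SNO FROM S ORDER BY SNO").fetchall()

# "Doubling up" a table in place is accepted too; the source rows are
# read in full before any new rows are added.
conn.execute("CREATE TABLE T (X INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO T SELECT * FROM T")
doubled = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]
```

Here remaining is [('S3',), ('S4',)] and T ends up with 4 rows rather than growing without end.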

5. FORMAL DEFINITION

As indicated earlier in this paper, it would be misleading to
suggest that SQL does not possess a detailed definition. However,
as was also indicated earlier, that definition [10] was produced
"after the fact". In some respects, therefore, it represents a
definition of the way implementations actually work rather than
the way a "pure" language ought to be (although it must be said
that many of the criticisms of the present paper have indeed been
addressed in [10]). At the same time it provides definitive
answers to some questions that are not in agreement with the way
IBM SQL actually works! Furthermore, there still appear to be
some areas where the definition is not yet precise enough. We
give examples of all of these aspects below.

Let C be a cursor that is currently associated with a set of
records of type R. Suppose moreover that the ordering associated
with C is defined by values of field R.F. If C is positioned on a
record r and r is deleted, C goes into the "before" state --
i.e., it is now positioned "before" record r1, where r1 is the
immediate successor of r with respect to the ordering associated
with C -- or, if there is no such successor record, then it goes
into the "after" state -- i.e., it is "after" the last record in
the set (note: the "after" state is possible even if the set is
empty).

Questions:

(a) If C is "before r1" and a new record r is inserted with a
value of R.F such that r logically belongs between r1 and r1's
predecessor (if any), what happens to C? [Answer:
Implementation-defined.]

(b) Does it make a difference if the new record r logically
precedes or follows the old record r that C was positioned on
before that record was deleted? [Answer: Implementation-
defined.]

(c) Does it make a difference if C was actually running
through a copy of the real set of records? [Answer:
Implementation-defined.]

Note for cases (a)-(c) that it is guaranteed that the next
"FETCH C" will retrieve record r1 (provided no other DELETEs
etc. occur in the interim).

(d) What if the new r is not an INSERTed record but an UPDATEd
record? [Answer: Not defined.]

(e) If C is positioned on a record r and the value of field F
in that record is updated (not via cursor C, of course), what
happens to C? [Answer: Not defined.]

Does LOCK SHARED acquire an S lock or an SIX lock [9]? If the
answer is S, are updates permitted? When are locks acquired via
LOCK TABLE released?

First, consider the two statements:

SELECT S#
FROM S
WHERE CITY = 'London'

SELECT P#
FROM P
WHERE CITY = 'London'

The meaning of the unqualified name CITY depends on the context
-- it is taken as S.CITY in the first of these examples and as
P.CITY in the second. But now suppose the columns are renamed
SCITY and PCITY respectively, so that now the names are globally
unique, and consider the query "Find suppliers located in cities
in which no parts are stored". The obvious formulation of this
query is:

SELECT S#
FROM S
WHERE NOT EXISTS
( SELECT *
FROM P
WHERE PCITY = SCITY )

However, this statement is invalid. SQL assumes that "SCITY" is
shorthand for "P.SCITY", and then complains that no such field
exists. The following statement, by contrast, is perfectly valid:

SELECT S#
FROM S
WHERE NOT EXISTS
( SELECT *
FROM P
WHERE PCITY = S.SCITY )

So also is:

SELECT S#
FROM S SX
WHERE NOT EXISTS
( SELECT *
FROM P
WHERE PCITY = SX.SCITY )

Is the following legal?

SELECT *
FROM S
WHERE EXISTS ( SELECT *
FROM SP SPX
WHERE SPX.S# = S.S#
AND SPX.P# = 'P1'
AND EXISTS ( SELECT *
FROM SP SPX
WHERE SPX.S# = S.S#
AND SPX.P# = 'P2' ) )

What if "FROM SP SPX" is replaced by "FROM SP" (twice) and all
other occurrences of "SPX" are replaced by "SP"? And is the
following legal?

SELECT *
FROM S
WHERE EXISTS ( SELECT *
FROM SP SPX
WHERE SPX.S# = S.S#
AND SPX.P# = 'P1' )
AND EXISTS ( SELECT *
FROM SP SPX
WHERE SPX.S# = S.S#
AND SPX.P# = 'P2' )

(etc., etc.). In other words: What are the name scoping rules for
"aliases" (range variables)?

There is another point to be made while on the subject of name
resolution, incidentally. Consider the statement:

SELECT S.S#, P.P#
FROM S, P
WHERE S.CITY = P.CITY

(we now go back to the unqualified name CITY in each of the two
tables). This statement is (conceptually) evaluated as follows:

- form the product of S and P; call the result TEMP1

- restrict TEMP1 according to the predicate S.CITY = P.CITY;
call the result TEMP2

- project TEMP2 over the columns S.S# and P.P#

But how can this be done? The predicate "S.CITY = P.CITY" does
not refer to any columns of TEMP1 (it refers to columns of S and
P, obviously). Similarly, S.S# and P.P# are not columns of TEMP2.
In order for these references to be interpreted appropriately, it
is necessary to introduce certain name inheritance rules,
indicating how result tables inherit column-names from their
source tables (which may of course themselves also be
[intermediate] result tables, with inherited column-names of
their own). Such rules are currently defined only very
informally, if at all. Such rules become even more important if
SQL is to provide support for nested expressions.
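One simple set of such rules is for every column to carry its source-table qualifier, with each operator passing those qualified names through to its result. The hypothetical Python sketch below models tables as lists of dictionaries keyed by names such as "S.CITY". The product inherits every name of both operands, restriction inherits its input's names, and projection inherits only the names it keeps, so the three steps above become well defined. The helper names and the data are invented, and this scheme alone would not cope with a product of a table with itself, which is where range variables become necessary.

```python
from itertools import product as cartesian
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]  # keys are qualified column names such as "S.CITY"


def qualify(table: str, rows: Iterable[Dict[str, object]]) -> List[Row]:
    """Give each column of a base table the name table.column."""
    return [{f"{table}.{col}": val for col, val in row.items()} for row in rows]


def product(left: List[Row], right: List[Row]) -> List[Row]:
    """Cartesian product: each result row inherits all names of both inputs."""
    return [{**a, **b} for a, b in cartesian(left, right)]


def restrict(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Restriction: the result inherits its input's column names unchanged."""
    return [r for r in rows if pred(r)]


def project(rows: List[Row], cols: List[str]) -> List[Row]:
    """Projection: inherits only the named columns (duplicates not removed)."""
    return [{c: r[c] for c in cols} for r in rows]


S = qualify("S", [{"SNO": "S1", "CITY": "London"}, {"SNO": "S2", "CITY": "Paris"}])
P = qualify("P", [{"PNO": "P1", "CITY": "London"}, {"PNO": "P2", "CITY": "Rome"}])

TEMP1 = product(S, P)                                          # 4 rows
TEMP2 = restrict(TEMP1, lambda r: r["S.CITY"] == r["P.CITY"])  # S.CITY exists here
result = project(TEMP2, ["S.SNO", "P.PNO"])
```

Here result is [{'S.SNO': 'S1', 'P.PNO': 'P1'}]: the predicate and the projection can name S.CITY, P.CITY, S.SNO, and P.PNO because TEMP1 and TEMP2 inherited those names.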

When exactly does a cursor iterate over the real "base data" and
when over a copy?

When exactly does "*" become bound to a specific set of field-names?
[Answer: Implementation-defined -- but this seems an
unfortunate aspect to leave to the implementer, especially as the
binding is likely to be different for different uses of the
feature (e.g., it may depend on whether the "*" appears in a
program or in a view definition).]

6. MISMATCH WITH HOST LANGUAGE

The general point here is that there are far too many frivolous
distinctions between SQL and the host language in which it
happens to be embedded, and also that in some cases SQL has failed to
benefit from lessons learned in the design of those host
languages. Generally, orthogonality suggests that what is useful
on one side of the interface (in the way of data structuring and
access for "permanent" [i.e., database] data) is likely to be
useful on the other side also (for "temporary" [i.e., local]
data); thus, a distinct sublanguage is the wrong approach, and a
two-level store is wrong too (fundamentally so!). Some specific
points:

SQL does not exploit the exception-handling capabilities of the
host (e.g., PL/I ON-conditions). This point and (even more so)
the following one mean that SQL does not exactly encourage the
production of well-structured, quality programs, and that in some
respects SQL programming is at a lower level than that of the
host.

SQL does not exploit the control structures of the host (loop
constructs in particular). See the previous point.

SQL objects (tables, cursors, etc.) are not known and cannot be
referenced in the host environment.

Host objects can be referenced in the SQL environment only if:

- they are specially declared (may not apply to all hosts)

- they are scalars or certain limited structures (in
particular, they are not arrays)

- the references are marked with a colon prefix (admittedly
only in some contexts -- but in my opinion "some" is worse
than "all")

- the references are constrained to certain limited contexts
(e.g., they can appear in a SELECT clause but not a FROM
clause)

- the references are constrained to certain limited formats
(e.g., no subscripting, only limited dot qualification, etc.)

SQL object names and host object names are independent and may
clash. SQL names do not follow the scoping rules of the host.

SQL keywords and host keywords are independent and may clash
(e.g., PL/I SELECT vs. SQL SELECT).

SQL and host may have different name qualification rules (e.g.,
T.F in SQL vs. F OF T in COBOL; and note that the SQL form must
be used even for host object references in the SQL environment).

SQL and host may have different data type conversion rules.

SQL and host may have different expression evaluation rules
(e.g., SQL division and varying string comparison differ from
their PL/I analogs [at least in SQL/DS]).

SQL and host may have different Boolean operators (AND, OR, and
NOT in SQL vs. &, |, and ¬ in PL/I).

SQL and host may have different comparison operators (e.g.,
COBOL has IS NUMERIC, SQL has BETWEEN [and many others]).

SQL imposes statement ordering restrictions that are alien to
the host.

SQL DECLARE cannot be abbreviated to DCL, unlike PL/I DECLARE.

Null is handled differently on the two sides of the interface.

Function references have different formats on the two sides of
the interface.

SQL name resolution rules are different from those of the host.

Cursors are a clumsy way of bridging the gap between the
database and the program. A much better method would be to
associate a query with a conventional file in the host
program, and then let the program use conventional READ, REWRITE,
and DELETE statements to access that file (maybe INSERT
statements too).

The "structure declarations" in CREATE TABLE should use the
standard COBOL or PL/I (etc.) syntax. As it is, it is doubtful
whether they can be elegantly extended to deal with minor
structures (composite fields) or arrays, should such extensions
ever prove desirable (they will).

The SQL parameter mechanism is regressive, clumsy, ad hoc,
restrictive, and different from that of the host.

7. MISSING FUNCTION

(Note: It is obviously possible to extend the existing language
to incorporate most if not all of the following features. We
mention them for completeness.)

Ability to override WHENEVER NOT FOUND at the level of an
individual statement.

"Whole-record" UPDATE.

Procedure call instead of GO TO on WHENEVER.

Cursor stepping other than "next".

Cursor comparison.

Cursor assignment.

Cursor constants.

Cursor arrays.

Dynamically created cursors and/or cursor stacks.

Reusable cursors.

Ability to access a unique record and keep a cursor on it
without having to go through separate DECLARE, OPEN, and FETCH:
e.g., "FETCH UNIQUE ( EMP WHERE EMP# = 'E2' ) SET ( C ) ;".

Fine control over locking.

8. MISTAKES

I have argued against null values at length elsewhere [6], and I
will not repeat those arguments here. In my opinion the null
value concept is far more trouble than it is worth. Certainly it
has never been properly thought through in the existing SQL
implementations (see the discussion under "Lack of Orthogonality:
Miscellaneous Items", earlier). For example, the fact that
functions such as AVG simply ignore null values in their argument
violates what should surely be a fundamental principle, viz: The
system should never produce a (spuriously) precise answer to a
query when the data involved in that query is itself imprecise.
At least the system should offer the user the explicit option
either to ignore nulls or to treat their presence as an
exception.
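
To see the AVG behaviour concretely, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption, a modern stand-in rather than any of the 1983 implementations, though its AVG likewise ignores nulls. The EMP data and the avg_strict helper are hypothetical; the helper illustrates the second option suggested above, treating the presence of nulls as an exception.

```python
import sqlite3


def avg_strict(conn, table, column):
    """Hypothetical helper: AVG that treats any null in its argument as an error."""
    # Identifiers are interpolated directly; acceptable only for trusted names.
    total, nonnull, avg = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), AVG({column}) FROM {table}"
    ).fetchone()
    if total != nonnull:
        raise ValueError(f"{total - nonnull} null value(s) in {table}.{column}")
    return avg


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO TEXT PRIMARY KEY, SALARY INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)",
                 [("E1", 10000), ("E2", 20000), ("E3", None)])

# Built-in AVG skips the null and answers 15000.0, as though E3 did not exist.
lenient = conn.execute("SELECT AVG(SALARY) FROM EMP").fetchone()[0]
```

On the same table, avg_strict(conn, "EMP", "SALARY") raises ValueError instead of quietly answering 15000.0.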

Field uniqueness is a logical property of the data, not a
physical property of an access path. It should be specified on
CREATE TABLE, not on CREATE INDEX. Specifying it on CREATE INDEX
is an unfortunate bundling, and may lead to a loss of data
independence (dropping the index puts the integrity of the
database at risk).
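
The data-independence risk can be reproduced in SQLite (again an assumed modern stand-in, driven from Python's sqlite3 module; the tables are hypothetical). In the first table uniqueness exists only as a property of an index, so dropping the index silently lifts the constraint; in the second it is declared on the table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Uniqueness expressed only through an index (the CREATE INDEX approach).
conn.execute("CREATE TABLE EMP_A (EMPNO TEXT, ENAME TEXT)")
conn.execute("CREATE UNIQUE INDEX XEMP_A ON EMP_A (EMPNO)")
conn.execute("INSERT INTO EMP_A VALUES ('E1', 'Smith')")
conn.execute("DROP INDEX XEMP_A")                          # an access-path change ...
conn.execute("INSERT INTO EMP_A VALUES ('E1', 'Jones')")   # ... lets a duplicate in
dup_count = conn.execute(
    "SELECT COUNT(*) FROM EMP_A WHERE EMPNO = 'E1'").fetchone()[0]

# Uniqueness declared as a logical property of the table itself.
conn.execute("CREATE TABLE EMP_B (EMPNO TEXT UNIQUE, ENAME TEXT)")
conn.execute("INSERT INTO EMP_B VALUES ('E1', 'Smith')")
try:
    conn.execute("INSERT INTO EMP_B VALUES ('E1', 'Jones')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here dup_count ends up as 2 while rejected is True. SQLite also refuses to drop the index it builds internally for a UNIQUE constraint, so the second table cannot lose its integrity through a physical change.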

The only function of the FROM clause that is not actually
redundant is to allow the introduction of range variables, and
that function would be better provided in some more elegant
manner. (The normal use, as exemplified by the expression SELECT
F FROM T, could better be handled by the expression SELECT T.F,
especially since this latter expression -- with an accompanying
but redundant FROM clause -- is already legal SQL.)

SQL does not make a clear distinction between tables, record
types, and range variables. Instead, it allows a single symbol to
stand for any one of those objects, and leaves the interpretation
to depend on context. Conceptual clarity would dictate that it at
least be possible always to distinguish among these different
constructs (i.e., syntactically), even if there are rules that
allow such punning games to be played when intuitively
convenient. Otherwise it is possible that -- for example --
extendability may suffer, though I have to admit that I cannot at
the time of writing point to any concrete problems. (But it
shouldn't be necessary to defend the principle of a one-to-one
correspondence between names and objects!)

While on the subject of punning, I might also mention the point
that SQL is ambivalent as to the meaning of the term "table".

Sometimes "table" means, specifically, a base table (as in CREATE
TABLE); at other times it means "base table or view" (as in
COMMENT ON TABLE). Since the critical point about a view is that
it is a table (just as the critical point about a subset is that
it is a set), I would vote for the following changes:

(a) Replace the terms "base table" and "view" by "real table"
and "virtual table" respectively;

(b) Use the term "table" generically to mean "real table or
virtual table";

(c) In concrete syntax, use the expressions [REAL] TABLE and
VIRTUAL TABLE (where it is necessary to distinguish them),
with REAL as the default.

"SELECT *"

This is a good example of a situation in which the needs of the
end-user and those of the application programmer are at odds.
"SELECT *" is fine for the interactive user (it saves
keystrokes). I believe it is rather dangerous for the programmer
(because the meaning of "*" may change at any time in the life of
the program). The use of "ORDER BY n" (where n is an integer
instead of a field-name) in conjunction with "SELECT *" could be
particularly unfortunate. Similar remarks apply to the use of
INSERT without a list of field-names.

Incidentally, I believe that the foregoing are the only
situations in the entire SQL language in which the user is
dependent on the left-to-right ordering of columns within a
table. It would be nice to eliminate that dependence entirely
(except possibly for "SELECT *", for interactive queries only).
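
A small sketch of the hazard, assuming SQLite via Python's sqlite3 module and a hypothetical EMP table. SQLite cannot reorder the columns of an existing table, so the "reorganization" is simulated by dropping and recreating EMP with the same fields in a different left-to-right order; the program text never changes.

```python
import sqlite3

ROWS = [("E1", 30000, 45), ("E2", 40000, 25)]  # (EMPNO, SALARY, AGE)


def rebuild_emp(conn, column_defs):
    """Hypothetical reorganization: recreate EMP with a new left-to-right column order."""
    conn.execute("DROP TABLE IF EXISTS EMP")
    conn.execute(f"CREATE TABLE EMP ({', '.join(column_defs)})")
    # The setup names its fields, so it is unaffected by the column order.
    conn.executemany("INSERT INTO EMP (EMPNO, SALARY, AGE) VALUES (?, ?, ?)", ROWS)


def program(conn):
    """The application code, identical before and after the reorganization."""
    return [row[0] for row in conn.execute("SELECT * FROM EMP ORDER BY 2 DESC")]


conn = sqlite3.connect(":memory:")
rebuild_emp(conn, ["EMPNO TEXT", "SALARY INTEGER", "AGE INTEGER"])
before = program(conn)  # column 2 is SALARY: highest salary first

rebuild_emp(conn, ["EMPNO TEXT", "AGE INTEGER", "SALARY INTEGER"])
after = program(conn)   # same text, but column 2 is now AGE: oldest first

# INSERT without a field list binds by position: the programmer means
# SALARY = 50000 and AGE = 30, but the values land in the other columns.
conn.execute("INSERT INTO EMP VALUES ('E3', 50000, 30)")
misfiled = conn.execute("SELECT SALARY, AGE FROM EMP WHERE EMPNO = 'E3'").fetchone()
```

Nothing fails loudly: the program simply returns a different ordering, and the positional INSERT records a salary of 30.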

=ANY (etc.)

The comparison operators =ANY, >ALL, etc., are totally redundant
and in many cases actively misleading. The following example is
taken from "IBM Database 2 SQL Usage Guide" (IBM Form No.
GG24-1583): "Select employees who are younger than any member of
department E21" (irrelevant details omitted).

    SELECT EMPNO, LASTNAME, WORKDEPT
    FROM   TEMPL
    WHERE  BRTHDATE >ANY ( SELECT BRTHDATE
                           FROM   TEMPL
                           WHERE  WORKDEPT = 'E21' )

This SELECT does not find employees who are younger than any
employee in E21 (at least in the sense that this requirement
would normally be understood in colloquial English) -- it finds
employees who are younger than some employee in E21.
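
The difference can be checked on a toy TEMPL table. SQLite, used here through Python's sqlite3 module as an assumed stand-in for DB2, does not implement the quantified comparison operators, so the sketch spells ">ANY" with EXISTS and the colloquial "younger than every member" with NOT EXISTS. The data are invented, and birth dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEMPL (EMPNO TEXT, BRTHDATE TEXT, WORKDEPT TEXT)")
conn.executemany("INSERT INTO TEMPL VALUES (?, ?, ?)", [
    ("000010", "1950-01-01", "E21"),
    ("000020", "1960-01-01", "E21"),
    ("000030", "1955-06-15", "D11"),  # younger than 000010, older than 000020
])

# What >ANY actually means: born later than SOME member of E21.
younger_than_some = [r[0] for r in conn.execute("""
    SELECT EMPNO FROM TEMPL X
    WHERE EXISTS ( SELECT * FROM TEMPL Y
                   WHERE Y.WORKDEPT = 'E21' AND X.BRTHDATE > Y.BRTHDATE )
    ORDER BY EMPNO""")]

# The colloquial reading: born later than EVERY member of E21.
younger_than_every = [r[0] for r in conn.execute("""
    SELECT EMPNO FROM TEMPL X
    WHERE NOT EXISTS ( SELECT * FROM TEMPL Y
                       WHERE Y.WORKDEPT = 'E21' AND NOT ( X.BRTHDATE > Y.BRTHDATE ) )
    ORDER BY EMPNO""")]
```

Employee 000030 is returned by the first query even though one member of E21 is younger, while the second query, which matches the English request, returns nobody.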

To illustrate the redundancy, consider the query: "Find supplier
names for suppliers who supply part P2". This is a very simple
problem, yet it is not difficult to find no less than seven at
least superficially distinct formulations for it (see below). Of
course, the differences would not be important if all
formulations worked equally well, but that is unlikely.

1.  SELECT SNAME
    FROM   S
    WHERE  S# IN
         ( SELECT S#
           FROM   SP
           WHERE  P# = 'P2' )

2.  SELECT SNAME
    FROM   S
    WHERE  S# =ANY
         ( SELECT S#
           FROM   SP
           WHERE  P# = 'P2' )

3.  SELECT SNAME
    FROM   S
    WHERE  EXISTS
         ( SELECT *
           FROM   SP
           WHERE  S# = S.S# AND P# = 'P2' )

4.  SELECT DISTINCT SNAME
    FROM   S, SP
    WHERE  S.S# = SP.S# AND P# = 'P2'

5.  SELECT SNAME
    FROM   S
    WHERE  0 <
         ( SELECT COUNT(*)
           FROM   SP
           WHERE  S# = S.S# AND P# = 'P2' )

6.  SELECT SNAME
    FROM   S
    WHERE  'P2' IN
         ( SELECT P#
           FROM   SP
           WHERE  S# = S.S# )

7.  SELECT SNAME
    FROM   S
    WHERE  'P2' =ANY
         ( SELECT P#
           FROM   SP
           WHERE  S# = S.S# )
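
As a quick consistency check, the formulations that do not use =ANY (1, 3, 4, 5 and 6) can be run against a small suppliers-and-parts database. This is a sketch under assumptions: it uses SQLite via Python's sqlite3 module, which lacks =ANY (so formulations 2 and 7 are omitted), the sample rows are invented, and S# and P# are renamed SNO and PNO because SQLite would otherwise require the names to be quoted. Equal answers on one database say nothing, of course, about whether an optimizer treats the formulations equally well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
    INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S3','Blake');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200),
                          ('S2','P2',400), ('S3','P3',200);
""")

# Formulation number -> query text, as listed above (SNO/PNO for S#/P#).
FORMULATIONS = {
    1: "SELECT SNAME FROM S WHERE SNO IN (SELECT SNO FROM SP WHERE PNO = 'P2')",
    3: """SELECT SNAME FROM S WHERE EXISTS
          (SELECT * FROM SP WHERE SP.SNO = S.SNO AND PNO = 'P2')""",
    4: """SELECT DISTINCT SNAME FROM S, SP
          WHERE S.SNO = SP.SNO AND PNO = 'P2'""",
    5: """SELECT SNAME FROM S WHERE 0 <
          (SELECT COUNT(*) FROM SP WHERE SP.SNO = S.SNO AND PNO = 'P2')""",
    6: "SELECT SNAME FROM S WHERE 'P2' IN (SELECT PNO FROM SP WHERE SP.SNO = S.SNO)",
}

# Run each formulation and collect its answer as a sorted list of names.
results = {n: sorted(row[0] for row in conn.execute(q))
           for n, q in FORMULATIONS.items()}
```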

In general, the WHERE clause

WHERE x $ANY ( SELECT y FROM T WHERE p )

(where $ is any one of =, >, etc.) is equivalent to the WHERE
clause

WHERE EXISTS ( SELECT * FROM T WHERE (p) AND x $ T.y )

Likewise, the WHERE clause

WHERE x $ALL ( SELECT y FROM T WHERE p )

is equivalent to the WHERE clause

WHERE NOT EXISTS ( SELECT * FROM T WHERE (p) AND NOT ( x $ T.y ) )
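
The two rewrites can be tested mechanically. The sketch below (SQLite through Python's sqlite3 module; the tables, data and predicate p are all illustrative) evaluates each $ANY and $ALL comparison by its definition in Python and compares the result with the EXISTS and NOT EXISTS forms run in SQL, for all six comparison operators. It assumes that T.y contains no nulls: under SQL's three-valued logic a null in T.y can make x $ALL (...) unknown while the NOT EXISTS form is true, so the $ALL equivalence holds only for null-free data.

```python
import itertools
import operator
import sqlite3

# SQL comparison operator -> Python predicate, for the reference evaluation.
OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}
P = "T.K = 'a'"  # the inner predicate p (illustrative)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Y INTEGER NOT NULL, K TEXT)")  # no nulls in T.Y
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(1, "a"), (3, "a"), (5, "b"), (7, "a")])
conn.execute("CREATE TABLE XS (X INTEGER NOT NULL)")
conn.executemany("INSERT INTO XS VALUES (?)", [(v,) for v in range(9)])


def rewritten(op, quantifier):
    """x $ANY -> EXISTS (... (p) AND x $ T.y); x $ALL -> NOT EXISTS (... AND NOT (x $ T.y))."""
    if quantifier == "ANY":
        sql = (f"SELECT X FROM XS WHERE EXISTS "
               f"(SELECT * FROM T WHERE ({P}) AND XS.X {op} T.Y)")
    else:
        sql = (f"SELECT X FROM XS WHERE NOT EXISTS "
               f"(SELECT * FROM T WHERE ({P}) AND NOT (XS.X {op} T.Y))")
    return sorted(x for (x,) in conn.execute(sql))


def by_definition(op, quantifier):
    """Evaluate the quantified comparison directly in Python."""
    ys = [y for (y,) in conn.execute(f"SELECT Y FROM T WHERE {P}")]
    quantify = any if quantifier == "ANY" else all
    return sorted(x for (x,) in conn.execute("SELECT X FROM XS")
                  if quantify(OPS[op](x, y) for y in ys))


mismatches = [(op, q) for op, q in itertools.product(OPS, ("ANY", "ALL"))
              if rewritten(op, q) != by_definition(op, q)]
```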

As a matter of fact, it is not just the comparison operators =ANY
(etc.) that are redundant; the entire subquery construct could be
removed from SQL with effectively no loss of function. (Nested
table- and column-expressions etc. would of course still be
required, as argued earlier.) This is ironic, since it was the
subquery notion that was the justification for the "Structured"
in "Structured Query Language" in the first place.

9. ASPECTS OF THE RELATIONAL MODEL NOT SUPPORTED

There are several aspects of the full relational model (as
defined in, e.g., [2]) that SQL does not currently support. We
list them here in approximate order of importance. Again, of
course, most of these features can be added to SQL at some later
point -- the sooner the better, in most cases. However, their
omission now leads to a number of situations in current SQL that
are extremely ad hoc and may be difficult to remedy later on, for
compatibility reasons.

Primary keys provide the sole record-level addressing mechanism
within the relational model. That is, the only system-guaranteed
method of identifying an individual record is via the combination
(R,k), where R is the name of the containing relation and k is
the primary key value for the record concerned. Every relation
(to be a relation) is required to have a primary key. Primary
keys are (of course) required to be unique; in the case of real
(base) relations, they are also required to be (wholly) nonnull.
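
The (R,k) addressing scheme is simple enough to sketch. The helpers below are hypothetical; they assume SQLite via Python's sqlite3 module, read the primary-key columns of relation R from the catalog, and retrieve the single record whose key value is k.

```python
import sqlite3


def primary_key_columns(conn, relation):
    """Read the primary-key columns of a table from SQLite's catalog."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk),
    # where pk is the column's 1-based position in the primary key, or 0.
    rows = conn.execute(f"PRAGMA table_info({relation})").fetchall()
    return [row[1] for row in sorted(rows, key=lambda row: row[5]) if row[5] > 0]


def fetch_by_key(conn, relation, key):
    """Hypothetical helper: address a single record by the pair (R, k)."""
    columns = primary_key_columns(conn, relation)
    if len(columns) != len(key):
        raise ValueError(f"{relation} has a {len(columns)}-column primary key")
    where = " AND ".join(f"{c} = ?" for c in columns)
    return conn.execute(f"SELECT * FROM {relation} WHERE {where}", key).fetchone()


conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite does not imply it for PRIMARY KEY
# columns of ordinary tables, whereas base-relation keys must be nonnull.
conn.execute("""CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL,
                                 QTY INTEGER, PRIMARY KEY (SNO, PNO))""")
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)",
                 [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P2", 400)])

record = fetch_by_key(conn, "SP", ("S1", "P2"))
```

Here record is ('S1', 'P2', 200); a key value with no matching record yields None.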

SQL currently provides mechanisms that allow users to apply the
primary key discipline for themselves (if they choose), but does
not itself understand the semantics associated with that
discipline. As a result, SQL support for certain other functions
is either deficient or lacking entirely, as we now explain.

1. Consider the query

       SELECT P.P#, P.WEIGHT, AVG (SP.QTY)
       FROM   P, SP
       WHERE  P.P# = SP.P#
       GROUP  BY P.P#, P.WEIGHT

The "P.WEIGHT" in the GROUP BY clause is logically redundant,
but must be included because SQL does not understand that
P.WEIGHT is single-valued per part number (perhaps only a
minor annoyance, but it could be puzzling to the user).

2. Primary key support is prerequisite to foreign key support
(see the following subsection).

3. An understanding of primary keys is required in order to
support the updating of views correctly. SQL's rules for the
updating of views are in fact disgracefully ad hoc. We
consider projection, restriction, and join views in turn
below. Further discussion of this topic can be found in [7].

3(a). A projection is logically updatable if and only if
it preserves the primary key of the underlying relation.
However, SQL supports updates, not on projections per se,
but on what might be called column subsets -- where a
"column subset" is any subset of the columns of the
underlying table for which duplicate elimination is not
requested (via DISTINCT) -- with a "user beware" if that
subset does not in fact include the underlying primary
key. (Actually the situation is even worse than this. Even
a column subset is not updatable if the FROM clause in the
definition of that subset lists multiple tables. Moreover,
updates are prohibited if duplicate elimination is
requested, even if that request can have no effect because
the column subset does include the underlying primary
key.)

3(b). Any restriction is logically updatable. SQL however
does not permit such updates if duplicate elimination is
requested (even though such a request can have no effect
if the underlying table does have a primary key), nor if
the FROM clause lists multiple tables. What is more, even
when it does allow updates, SQL does not always check that
updated records satisfy the restriction predicate; hence,
an updated (or inserted) record may instantaneously vanish
from the view, and moreover there are concomitant security
exposures (e.g., a user who is restricted to accessing
employees with salary less than $40K may nevertheless
create a salary greater than that value via INSERT or
UPDATE). [Note: The CHECK option, which is intended to
prevent such abuses, cannot always be specified.] Also,
the fact that SQL automatically supplies null values for
missing fields in inserted records means that it is
impossible for such records to satisfy the restriction
predicate in some cases (consider, for example, the view
"employees in department D3", if the view does not include
the DEPT# field). However, these latter deficiencies have
nothing to do with SQL's lack of knowledge of primary keys
per se.

3(c). A join of two tables on their primary keys is
logically updatable. So also is a join of one table on its
primary key to another on a matching foreign key (though
the details are not totally straightforward). However, SQL
does not allow any join to be updated.
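
Point 1 above can be tried out in SQLite (an assumed stand-in, via Python's sqlite3 module; the P and SP rows are invented and P# is renamed PNO). Grouping on the primary key alone produces exactly the same groups as grouping on P# and WEIGHT, which is the sense in which WEIGHT is redundant; MAX(P.WEIGHT) is used only to satisfy the rule that every selected column be grouped or aggregated. Some later systems, PostgreSQL among them, accept the shorter GROUP BY directly when the grouping column is the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO TEXT PRIMARY KEY NOT NULL, WEIGHT INTEGER);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO P  VALUES ('P1', 12), ('P2', 17);
    INSERT INTO SP VALUES ('S1','P1',300), ('S2','P1',100), ('S1','P2',200);
""")

# The query as written in point 1: WEIGHT must appear in GROUP BY.
as_written = conn.execute("""
    SELECT P.PNO, P.WEIGHT, AVG(SP.QTY) FROM P, SP
    WHERE P.PNO = SP.PNO
    GROUP BY P.PNO, P.WEIGHT
    ORDER BY P.PNO""").fetchall()

# Grouping on the primary key alone gives the same groups, because WEIGHT
# is single-valued per part number.
by_key_only = conn.execute("""
    SELECT P.PNO, MAX(P.WEIGHT), AVG(SP.QTY) FROM P, SP
    WHERE P.PNO = SP.PNO
    GROUP BY P.PNO
    ORDER BY P.PNO""").fetchall()
```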

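The vanishing-record problem of point 3(b) can be imitated in SQLite, with the caveat that SQLite views are read-only unless given INSTEAD OF triggers and SQLite has no CHECK option at all. In this sketch (Python's sqlite3 module; hypothetical EMP data and view names) one trigger passes inserts straight through, mirroring the unchecked behaviour described in 3(b), and a second trigger re-checks the view predicate, roughly what the CHECK option is meant to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (EMPNO TEXT PRIMARY KEY NOT NULL, SALARY INTEGER);
    INSERT INTO EMP VALUES ('E1', 30000);

    -- Restriction view with no check: inserts pass straight through.
    CREATE VIEW LOWPAID AS SELECT EMPNO, SALARY FROM EMP WHERE SALARY < 40000;
    CREATE TRIGGER LOWPAID_INS INSTEAD OF INSERT ON LOWPAID
    BEGIN
        INSERT INTO EMP (EMPNO, SALARY) VALUES (NEW.EMPNO, NEW.SALARY);
    END;

    -- Same view, but the trigger re-checks the restriction predicate.
    CREATE VIEW LOWPAID_CHK AS SELECT EMPNO, SALARY FROM EMP WHERE SALARY < 40000;
    CREATE TRIGGER LOWPAID_CHK_INS INSTEAD OF INSERT ON LOWPAID_CHK
    BEGIN
        SELECT RAISE(ABORT, 'record does not satisfy the view predicate')
        WHERE NOT (NEW.SALARY < 40000);
        INSERT INTO EMP (EMPNO, SALARY) VALUES (NEW.EMPNO, NEW.SALARY);
    END;
""")

# The unchecked insert "succeeds", yet the new record is not visible in the view.
conn.execute("INSERT INTO LOWPAID VALUES ('E2', 90000)")
visible = [r[0] for r in conn.execute("SELECT EMPNO FROM LOWPAID ORDER BY EMPNO")]
stored = conn.execute("SELECT SALARY FROM EMP WHERE EMPNO = 'E2'").fetchone()[0]

try:
    conn.execute("INSERT INTO LOWPAID_CHK VALUES ('E3', 90000)")
    rejected = False
except sqlite3.DatabaseError:  # RAISE(ABORT) surfaces as a DatabaseError subclass
    rejected = True
```

The low-salary user has stored a salary of 90000 that they can no longer see, which is exactly the security exposure described in 3(b).
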
Foreign keys

Foreign keys provide the principal referencing mechanism within
the relational model. Loosely speaking, a foreign key is a field
in one table whose values are required to match values of the
primary key in another table. For example, field DEPT# of the EMP
table is a foreign key matching the primary key (DEPT#) of the
DEPT table.

SQL does not currently provide any kind of support for the
foreign key concept at all. I regard lack of such support as the
major deficiency in relational systems today (SQL is certainly
not alone in this regard). Proposals for such support are
documented in some detail in [7].
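
For comparison, this is the kind of checking that foreign key support provides. The sketch assumes SQLite via Python's sqlite3 module, a much later system whose FOREIGN KEY clause is enforced only when the foreign_keys pragma is on; it is not a description of the support proposed in [7]. DEPT# is renamed DEPTNO and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript("""
    CREATE TABLE DEPT (DEPTNO TEXT PRIMARY KEY NOT NULL, DNAME TEXT);
    CREATE TABLE EMP  (EMPNO  TEXT PRIMARY KEY NOT NULL,
                       DEPTNO TEXT REFERENCES DEPT (DEPTNO));
    INSERT INTO DEPT VALUES ('D1', 'Marketing');
    INSERT INTO EMP  VALUES ('E1', 'D1');
""")


def attempt(sql):
    """Run one statement; report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


orphan_insert = attempt("INSERT INTO EMP VALUES ('E2', 'D9')")      # no department D9
dangling_delete = attempt("DELETE FROM DEPT WHERE DEPTNO = 'D1'")   # E1 still refers to D1
```

Both attempts are rejected; an insert that names an existing department, such as attempt("INSERT INTO EMP VALUES ('E3', 'D1')"), is accepted.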

SQL currently provides no support for domains at all, except
inasmuch as the fundamental data types (INTEGER, FLOAT, etc.) can
be regarded as a very primitive kind of domain.

A limited form of relation assignment is supported via INSERT ...
SELECT, but that operation does not overwrite the previous
content of the target table, and the source of the assignment
cannot be an arbitrary algebraic expression (or SQL equivalent).
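
A minimal sketch of the difference, assuming SQLite via Python's sqlite3 module and invented tables. INSERT ... SELECT appends to whatever BIG already holds; the hypothetical assign helper emulates true assignment by deleting and reinserting within a single transaction. The source is still restricted to what one SELECT can express.

```python
import sqlite3


def assign(conn, target, source_select):
    """Hypothetical helper: emulate relation assignment  target := (source)."""
    # DELETE then INSERT ... SELECT in one transaction, so the two steps
    # commit together or not at all.
    with conn:  # commits on success, rolls back on exception
        conn.execute(f"DELETE FROM {target}")
        conn.execute(f"INSERT INTO {target} {source_select}")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
    CREATE TABLE BIG (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO SP  VALUES ('S1','P1',300), ('S2','P2',400), ('S3','P3',100);
    INSERT INTO BIG VALUES ('S9','P9',999);   -- stale previous content
""")

# Plain INSERT ... SELECT appends: the stale row survives.
conn.execute("INSERT INTO BIG SELECT * FROM SP WHERE QTY > 200")
appended = conn.execute("SELECT COUNT(*) FROM BIG").fetchone()[0]

assign(conn, "BIG", "SELECT * FROM SP WHERE QTY > 200")
assigned = sorted(conn.execute("SELECT SNO FROM BIG"))
```

After the plain INSERT, BIG holds three rows including the stale one; after assign it holds exactly the two rows selected.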

Explicit JOIN

We mentioned earlier that explicit support for the (natural) join
operation was desirable. At that point we were tacitly discussing
the inner or regular natural join. The observation is still more
applicable to outer join. Reference [4] shows how awkward it is
to extend the circumlocutory SELECT-style join to handle outer
joins. Thus, support for an explicit JOIN operator is likely to
become even more desirable in the future than it is already.
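
Even in the smallest case, a one-sided outer join of two tables, the SELECT-style formulation needs a UNION and a NOT EXISTS, while an explicit operator states the intent directly. The sketch uses SQLite via Python's sqlite3 module (an assumed modern system whose LEFT OUTER JOIN postdates this paper) with invented data, and checks that the two produce the same set of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY NOT NULL, SNAME TEXT);
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, PRIMARY KEY (SNO, PNO));
    INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S5','Adams');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P2');
""")

# SELECT-style circumlocution: the inner join, plus unmatched suppliers padded with null.
circumlocution = set(conn.execute("""
    SELECT S.SNO, SP.PNO FROM S, SP WHERE S.SNO = SP.SNO
    UNION
    SELECT S.SNO, NULL FROM S
    WHERE NOT EXISTS ( SELECT * FROM SP WHERE SP.SNO = S.SNO )"""))

# Explicit outer join operator.
explicit = set(conn.execute(
    "SELECT S.SNO, SP.PNO FROM S LEFT OUTER JOIN SP ON S.SNO = SP.SNO"))
```

Supplier S5, who supplies nothing, appears in both results paired with a null part number.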

Explicit INTERSECT and DIFFERENCE

These omissions are not particularly important (equivalent
SELECT-expressions exist in each case); however, symmetry would
suggest that, since UNION is explicitly supported, INTERSECT and
DIFFERENCE ought to be explicitly supported too. Some problems
are most "naturally" formulated in terms of explicit
intersections and differences. On the other hand, as indicated
earlier, it is usually not a good idea to provide a multiplicity
of equivalent ways of formulating the same problem, unless it can
be guaranteed that the implementation will recognize the
equivalences and will treat all formulations equally, which is
probably unlikely.
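
The trade-off can be seen in a short sketch (SQLite via Python's sqlite3 module; invented data). SQLite supports INTERSECT and, under the name EXCEPT, DIFFERENCE. Each query is written once with the explicit operator and once with EXISTS or NOT EXISTS; the two forms agree here, though nothing guarantees that an optimizer handles them equally well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, PRIMARY KEY (SNO, PNO));
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P2'), ('S3','P1');
""")


def rows(sql):
    """Return the first column of a query result as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))


# "Suppliers who supply both P1 and P2": explicit intersection ...
both_explicit = rows("""
    SELECT SNO FROM SP WHERE PNO = 'P1'
    INTERSECT
    SELECT SNO FROM SP WHERE PNO = 'P2'""")
# ... and an equivalent formulation without INTERSECT.
both_exists = rows("""
    SELECT DISTINCT SNO FROM SP X
    WHERE X.PNO = 'P1'
      AND EXISTS ( SELECT * FROM SP Y WHERE Y.SNO = X.SNO AND Y.PNO = 'P2' )""")

# "Suppliers who supply P1 but not P2": difference, spelled EXCEPT in SQLite.
only_p1_explicit = rows("""
    SELECT SNO FROM SP WHERE PNO = 'P1'
    EXCEPT
    SELECT SNO FROM SP WHERE PNO = 'P2'""")
only_p1_exists = rows("""
    SELECT DISTINCT SNO FROM SP X
    WHERE X.PNO = 'P1'
      AND NOT EXISTS ( SELECT * FROM SP Y WHERE Y.SNO = X.SNO AND Y.PNO = 'P2' )""")
```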

10. SUMMARY AND CONCLUSIONS

This paper has discussed a large number of deficiencies in the
SQL language as currently defined, in the hope that such a
discussion can serve as a step toward remedying those
deficiencies. In fact (as remarked earlier), the ANS Database
Committee (X3H2) has already remedied some of them in its "RDL"
proposal; a secondary objective for the present paper is thus to
serve as a document of justification for the changes X3H2 has
already made.

Of course, I realize that many of the shortcomings identified in
this paper will very likely be dismissed as academic, trivial, or
unimportant by many people, especially as SQL is so clearly
superior to older languages such as the DML of DBTG. However,
experience shows that "academic" considerations have a nasty
habit of becoming horribly practical a few years further down the
road. The mistakes we make now will come back to haunt us in the
future. Indeed, the language in its present form is already
proving difficult to extend in some (desirable) ways because of
limitations in its current structure. A very trivial example is
provided by the problems of adding support for composite fields
(i.e., minor structures).

In conclusion, let me repeat the point that many other database
languages suffer from similar shortcomings; SQL is (as stated
before) certainly not the sole offender. But the fact remains
that, if SQL is adopted on a wide scale in its present form, then
we will to some degree have missed the relational boat, or at
least failed to capitalize to the fullest possible extent on the
potential of the relational model. That would be a pity, because
we had an opportunity to do it right, and with a little effort we
could have done so. The question is whether it is now too late. I
sincerely hope not.

ACKNOWLEDGMENTS

I am grateful to my friends and colleagues Ted Codd, Phil Shaw,
and Sharon Weinberg for their helpful comments and criticism.

REFERENCES

1. M.M.Astrahan et al. "System R: Relational Approach to Database
   Management." ACM TODS 1, No. 2 (June 1976).

2. E.F.Codd. "Extending the Database Relational Model to Capture
   More Meaning." ACM TODS 4, No. 4 (December 1979).

3. C.J.Date. "Some Principles of Good Language Design." Submitted
   to ACM SIGMOD Record.

4. C.J.Date. "The Outer Join." Proc. 2nd International Conference
   on Databases (ICOD-2), Cambridge, England (August-September
   1983).

5. C.J.Date. "An Introduction to the Unified Database Language
   (UDL)." Proc. 6th International Conference on Very Large Data
   Bases, Montreal, Canada (October 1980).

6. C.J.Date. "Null Values in Database Management" (invited
   paper). Proc. 2nd British National Conference on Databases
   (BNCOD-2), Bristol, England (July 1982).

7. C.J.Date. A Guide to DB2. Addison-Wesley (to appear 1984).

8. J.N.Gray. Private communication.

9. J.N.Gray et al. "Granularity of Locks in a Large Shared Data
   Base." Proc. 1st International Conference on Very Large Data
   Bases, Framingham, Mass. (September 1975).

10. X3H2 (American National Standards Database Committee). Draft
    Proposed Relational Database Language. Document X3H2-...
    (August 1983).

APPENDIX: SQL STRONGPOINTS

SQL is based on the relational model, and as such supports the
simple tabular data structure of that model. It does not support
any user-visible links between tables.

Powerful operators

SQL also supports (indirectly) all the operators of the
relational algebra, including in particular the operators SELECT
(i.e., RESTRICT), PROJECT, and (natural) JOIN (these are the ones
required most often in practice). Each of these operators is very
high-level, in the sense that it treats entire sets of records as
single operands.

It is very easy to learn enough of the SQL language to "get on
the air" and start doing real, useful work; thus, the initial
learning period is typically very short indeed -- certainly hours
rather than days or weeks.

Users are insulated, to a greater degree than with earlier
languages, from the physical structure of the database (physical
data independence). This fact means that: (a) users can
concentrate on the logic of their application without having to
concern themselves with irrelevant physical details; (b) the
physical structure of the database can be changed without
necessitating any corresponding reprogramming. Users are also
insulated to some extent from the logical structure of the
database (logical data independence); this means that users can
concentrate on just that portion of the data that is of interest
to them (they may not even be aware of other portions), and it
also means that some limited changes can be made to the logical
structure of the database without very much reprogramming
(probably not without any, however).

SQL imposes comparatively few artificial boundaries between
definition functions and manipulation functions. For example, the
creation of a view (a definition function) involves essentially
the same SELECT operation as does the formulation of a query (a
manipulation function). This uniformity, again, makes the
language easier to learn and use.

SQL can be used both interactively (i.e., as a query language)
and embedded in a program (i.e., as a database programming
language). This property is desirable for several reasons. First,
it improves communication: end-users and application programmers
are "speaking the same language". Second, it makes programmers,
as well as end-users, more productive -- the benefits sketched
above (e.g., the provision of high-level operators) apply to
programmers too. And third, the interactive interface provides a
very convenient programmer debugging facility; that is,
application programmers can take the SQL portions of their
program and debug them interactively at the terminal.

Since the database catalog is represented just like any other
data in the system (i.e., as a collection of tables), it can be
interrogated by means of SQL SELECT statements, just like any
other data in the system. Users do not have to learn two
languages, one for querying the dictionary (for the catalog is in
effect exactly that, a rudimentary, online, active dictionary),
and one for querying the database.
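
In a modern system the same idea looks like this. The sketch assumes SQLite via Python's sqlite3 module, whose catalog table is called sqlite_master; the user tables and view are invented. The catalog is interrogated with an ordinary SELECT, exactly as user data would be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY NOT NULL, SNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER);
    CREATE VIEW LONDON_SUPPLIERS AS SELECT SNO, SNAME FROM S WHERE CITY = 'London';
""")

# The catalog answers an ordinary SELECT, just like any other table.
catalog = conn.execute("""
    SELECT type, name FROM sqlite_master
    WHERE type IN ('table', 'view')
    ORDER BY name""").fetchall()
```

The result lists the view LONDON_SUPPLIERS and the tables S and SP; the internal index SQLite creates for the primary key of S is filtered out by the WHERE clause.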

Good optimization

SQL is capable of efficient implementation, via the by now
well-known compilation/optimization techniques pioneered in the
IBM prototype System R. Moreover, the fact that SQL is compiled,
and hence that systems such as System R are "early binding"
systems, does not compromise the flexibility of those systems. If
a change is made to the database (such as the dropping of an
index) that invalidates an existing compiled program, then that
program -- or, more accurately, the SQL statements within that
program -- will automatically be recompiled and rebound on the
next invocation. Thus the system can provide the flexibility of
late binding without incurring the interpretation overheads
normally associated with such systems.
