Vous êtes sur la page 1sur 58

Guidelines and good practice

Guide for developing SQL


Page 1

GUIDELINES AND GOOD PRACTICE GUIDE

FOR DEVELOPING SQL

Page 1

1.0 Introduction

2.0 SQL Layout


2.1 General layout
2.2 SQL Keywords
2.3 Constants and variables.
2.4 Outer Joins
2.5 Table aliases.
2.6 Ordering of where clauses.

3.0 Database selects.


3.1 Ordering of the from clause.
3.2 Unintentionally disabling indexes.
3.3 Intentionally disabling indexes.
3.4 Investigate Rewriting the Query
3.5 Use of the Exists operator
3.6 Don't perform unnecessary joins.
3.7 Resource intensive operations
3.8 Use realistic test data.
3.9 Use of != operator
3.10 Use Oracle's trace facility.
3.11 Management of oracle cursors.
3.12 The 10,15,20 percent rule

4.0 Insert statements.

5.0 Database updates.

6.0 Optimising oracle queries.


6.1 Table access.
6.2 Indexes and Null.
6.3 Indexes and 'NOT=' predicates.
6.4 Group by and predicate clauses.
6.5 Multiple index queries.
6.6 When indexes cannot be merged
6.7 Suppression of indexes for performance
6.8 Concatenated indexes
6.9 Or optimisation
6.10 Non correlated sub queries
6.11 Correlated sub queries

Appendix A Query Paths Ranked in Order of Speed

Appendix B Tables used in the examples.

Appendix C Correlated updates and PL/SQL.

Appendix D Guidelines for outer join syntax

Page 2

Appendix E The ten commandments for fast queries.

Appendix F Oracle trace facility.


Page 3

1.0 Introduction

The purpose of this guide is to present standards


and guideline for the wms development team, which will
ensure that quality SQL is produced. The rules on SQL
layout are important, because, they improve the clarity
of the statements.

Developers should realise that, as in third


generation programming, the first method of writing a
program is not necessarily the best, the same is true
of fourth generation SQL queries.
Adherence to the SQL layout rules needs to be
enforced now, as this will allow the project to take
advantage of Oracle V7 SQL caching facilities. Version
7 of Oracle has an SQL cache, which will hold parsed
queries, Oracle determines whether a query is in the
cache by doing a case insensitive comparison

All code reviews, which require database access,


will have available, as part of the document set, sql
trace output formatted with tkprof. This output will
be formally reviewed, by the database team, as part of
the review process.

2.0 SQL Layout

2.1 General layout

SQL statements should be laid out so as to aid


readability and maintenance. Each statement should be
in a separate sql file.

1) Statements should be indented so that


individual lines are neatly aligned.

2) Each expression in the SELECT list should be


on a different line.

3) Every table in the FROM list should be on a


fresh line.

2.2 SQL Keywords

1) Oracle keywords should be entered in upper


case

2) The following keywords should be placed on a


new line :

SELECT
INTO
FROM
WHERE
AND/OR
GROUP BY
HAVING
CONNECT BY
FOR UPDATE OF
ORDER BY

3) All keywords within the same select should line


up with the previous keywords. Nested select
statements should be indented and lined up with
the second word on the previous line, e.g. :

SELECT sal,
Job,
ename,
dept
FROM emp
WHERE sal > any
(SELECT sal
FROM emp
WHERE deptno = 30)
ORDER BY sal;

4) Lines which don't start with a listed keyword


should be lined up with the second word on the
previous line, e.g. :

SELECT ename,
dept
FROM emp

2.3 Constants and variables.

1) Constants, program variables etc. should be


on the right hand side of a WHERE or HAVING
clause.

SELECT ename
FROM emp
WHERE empno = '1232'

Or

SELECT ename
FROM emp
WHERE empno = :1

Page 5

2.4 Outer Joins.

Columns requiring to be outer joined should


appear on the right hand side of a WHERE or HAVING
clause.

SELECT ename
FROM emp e,
dept d
WHERE e.empno = d.empno(+)

2.5 Table aliases.

Table aliases should be used in all queries


that have more than one table in the FROM clause.
The use of table aliases speeds up the parse phase
of an oracle query, by reducing the number of
recursive sql queries.

SELECT count(*)
FROM oe o,
oe_link l,
oe_link_name n
WHERE o.oe = l.oe
AND l.name = n.name

Notice the table aliases o,l,n and their


subsequent use in the where clause.

2.6 Ordering of where clauses.

As an aid to understanding, WHERE clauses


should be laid out with the join clauses first,
and the restriction clauses second.

3.0 Database selects.

3.1 Ordering of the FROM clause.

This rule is for when the oracle optimiser is


stuck for a good idea. The Oracle optimiser works
in the following manner. It looks at each of the
WHERE clauses and assigns the tables concerned a
number based on the type of predicate e.g. field =
'const' or field(+) = field. It then chooses the
table with the lowest score as the driving table.
But, and its a big one, if more than one table has
the same lowest score then it chooses the last
table in the FROM list to be the driving table.

Page 6

Page 7
See appendix A for a listing of the predicate
scoring system.

In this example there are indexes on the


tables as follows

Indexes :

unique on oe(id)
unique on oe_link(oe)
unique on oe_link_name(name)
non unique on oe(oe)

SELECT count(*)
FROM oe_link l,
oe_link_name n,
oe o
WHERE o.oe = l.oe
AND l.name = n.name

Time 621.21 secs

SELECT count(*)
FROM oe o,
oe_link l,
oe_link_name n
WHERE o.oe = l.oe
AND l.name = n.name

Time 197.05 secs

Notice the difference in query timings. The


only difference in the two queries is the order of
the tables in the FROM clause. Therefore the table
returning the least number of rows should be last
in the FROM list.

3.2 Unintentional disabling of indexes.

This is a very easy way to inadvertently


disable the use of the indexes.

On table oe, which is described in the


appendix. Column id has a datatype of number.

SELECT id,
oe
FROM oe
WHERE to_char(id) = 1232
Page 7

Time 97 secs.

whereas

SELECT id,
oe
FROM oe
WHERE id = 1232

Time .4 secs.

The reason for this is that oracle cannot use


an index if the column is modified in the where
clause. It is up to the application programmer to
ensure that, WHERE clause columns, aren't modified
in any way.

Note that the following query will also stop


the use of indexes.

SELECT id,
oe
FROM oe
WHERE id+1 = 1233

Dates in the where clause can also cause


problems. To select all the records entered on a
certain day the following three ideas come to
mind.

SELECT *
FROM oe
WHERE trunc(timestamp) = '26-MAR-91'

SELECT *
FROM oe
WHERE timestamp between '26-mar-91'
AND '27-mar-91'

SELECT *
FROM oe
WHERE timestamp >= to_date('26-mar-91:00:00',
'dd-mon-yy:hh24:mi')
AND timestamp < to_date('27-mar-91:00:00',
'dd-mon-yy:hh24:mi')
The first one when run takes 240 seconds to
produce a result.

The second one only takes 1.05 seconds to

Page 8

Page 9

run, however, it has a featurette. It will


include any records for midnight on the 27th of
march.

The last one only takes .5 of a second and


it doesn't retrieve the extra records for
midnight.

Examples along the lines of that shown below


are also considered to be column modifications.
Therefore concatenating the columns, stops the
indexes from being used.

example a) should be rewritten as in example


b)

a)

SELECT *
FROM job
WHERE db_id||job_no = 'AZ0100201'

b)

SELECT *
FROM job
WHERE db_id = 'AZ'
AND job_no = '0100201'

Developers need to be aware of the type


conversions that Oracle performs implicitly.
Oracle may choose to convert either the column or
the constant. If the column is chosen then the
indexes cannot be used. The conversion chosen is
defined in the following table.

Mixed Common Function


datatype Unit chosen
-------------- ----------- --------
Char with number number to_number
Char with rowid rowid to_rowid
Char with date date to_date

The following example shows how the


conversion table given above works in practice.

SELECT deptno
FROM dept

WHERE deptno = 1324

indexes

non unique index on deptno

The select would not use the index because


the column deptno would be to_numbered by the
kernel.

The following queries will use the indexes.

SELECT deptno
FROM dept
WHERE deptno = to_char(1324)

SELECT deptno
FROM dept
WHERE deptno = '1324'

3.3 Intentionally disabling indexes.

When it has been decided, that certain


indexes need to be disabled, for query
optimisation, the following column modifiers will
be used.

Datatype Inhibit expression


-------- ------------------

Char char||''
number number+0
date add_months(date,0)

The use of nvl(column,0) works with all


datatypes, however, it could be confusing in
queries which perform a lot of arithmetic
calculation. See example below.

SELECT deptno
FROM dept
WHERE nvl(deptno,0) = '1234'

3.4 Investigate Rewriting the Query

SQL is a very expressive language and there


are normally several ways of performing the same
query. Developers should investigate different
wordings of the same query, so as to identify the
optimal query.

This can be demonstrated by the following


example, which is about finding all the people who
don't have jobs to do. There are 99 people in the
system and 9900 jobs

There are indexes on name on both tables.

The first attempt

SELECT p.name
FROM people p,
job j
WHERE p.name = j.name(+)
AND j.name is null

fred the 27

1 record selected.

Time 51.40 secs.

Not a very good result this time.

The second attempt

SELECT name
FROM people
WHERE name not in
(SELECT name
FROM job)

fred the 27

1 record selected.

Time 6.11 secs


A much better attempt, there is nearly an
order of magnitude improvement. We might now be
inclined to rest on our laurels, make a cup of tea
and put our feet up, however, if we correlate the
sub query to get.

SELECT name
FROM people p
where not name in
(SELECT name
FROM job j
WHERE p.name = j.name)

fred the 27

1 record selected.

Time 1.08 secs.

3.5 Use of the Exists operator

One operator which seems to be ignored is the


EXISTS operator. This can be particularly useful
in forms work for validating foreign keys. In the
following example we wish to know whether 'fred
the 45' has any jobs.

The first example is

SELECT distinct 'x'


FROM job
WHERE name = 'fred the 45'

1 record selected.

Time 0.45 secs.

The second example uses the exists operator


and is almost twice as fast.

SELECT 'x'
FROM dual
WHERE exists
( SELECT 'x'
FROM job
where name = 'fred the 45')

1 record selected.

Time 0.26 secs.

The reason this is faster is that with the


exists operator the oracle kernel knows that once
it has found one match it can stop. It therefore
doesn't have to continue the FTS (a TLA (Three
Letter Acronym) for Full Table Scan).

Things start becoming interesting, now the


actual value of the data item being searched on
determines which query to use. Consider the people
table with 10,000 entries.

Enquiring about 'fred the 34' and 'fred the


9999' gives the following.

SELECT distinct 'x'


FROM job
WHERE name ='fred the 34'

Time 6.65 secs.

SELECT 'x'
FROM dual
WHERE exists
(SELECT 'x'
FROM job
WHERE name = 'fred the 34')

Time 0.28 secs.

SELECT 'x'
FROM dual
WHERE exists
(SELECT 'x'
FROM job
WHERE name = 'fred the 9999')

Time 8.28 secs.

Ok I cheated somewhat the data goes into the


table in name order. Thus 'fred the 1' goes in
first data block and 'fred the 9999' goes in last
one.
Developers should be aware that, the
efficiency of EXISTS and IN is dependant on the
amount of data in each table. A query with IN in
it drives from the subquery accessing the main
query for each row returned, when, a query uses
EXISTS it drives from the main query accessing the
subquery for each row returned. So if the subquery
returns few rows, but, the main query returns a
lot of rows for each row from the subquery use the
IN operator.

3.6 Don't perform unnecessary joins.

In some parts of our system, developers are


joining onto the sys_param table. This is bad
practice and must be stopped. The sys_params table
was designed to hold static system wide values.

The correct use of this table, is for the


application code to get the values it requires
once at the start of the code, or in the case of
regular update cycles, at the start of each such
cycle.

At the logical level, any runtime


modification of sys_params, probably indicates

Page 13

missing entities.

3.7 Resource intensive operations

Queries which use DISTINCT, UNION, MINUS,


INTERSECT, ORDER BY and GROUP BY call upon the
kernel to perform resource intensive sorts. A
DISTINCT requires one sort, the other set
operators require at least two sorts. Other ways
of writing these queries should be found. Most
queries that use the set operators, UNION,MINUS
and INTERSECT, can be rewritten in other ways.

3.8 Use realistic test data.

Realistic test data, which matches both in


volume and values, that shown in the ER diagrams
will be used.
The following is based on the example used in
3.4 about rewording queries. There are now 999
people in the table and 9990 jobs. Thus we have
increased our people by a factor of ten.

Notice how the query performed with an outer


join is much quicker than that performed by the
subquery. This is a complete reversal of the
result in 3.4.

This demonstrates quite clearly that the sql


developer must have a knowledge of both the
structure of the data and the typical number of
rows in each table.

SELECT p.name
FROM people p,
job j
WHERE p.name = j.name(+)
AND j.name is null

fred the 218

Time 23.20 secs.

SELECT name
FROM people
WHERE not name in
(SELECT name
FROM job)

fred the 218

Time 193.46 secs.

SELECT name
FROM people p
WHERE not name in
(SELECT name
FROM job j
WHERE p.name = j.name)

fred the 218

Time 8.66 secs.

3.9 Use of != operator

Unless it is absolutely necessary avoid using


the != operator. The use of this operator disables
the use of indexes, because the Oracle kernel
assumes that the query will be retrieving most of
the rows in the table.

3.10 Use Oracle's trace facility.

For every query, even the simplest, check the


output from oracle's trace facility. The first
step in optimising queries is to eliminate FTS's,
this can be done using trace. See APPENDIX F for
details on the trace facility.

3.11 Management of oracle cursors.

Where ever possible, oracle cursors should


be declared at the start of the program. It is
also good practice to tell the system the maximum
number of cursors you want, at the start of the
program. This can be done in DB by opening the
cursors you require, in Pro*c set MAXCURSORS.

There are two 'expensive' operations during


the execution of a query, apart from returning the
rows. These are associated with the 'parse' and
'bind' phase of query execution.

What is meant by the parse phase. This is the


time when the sql text that the programmer has
written is translated into an internal
representation, that the Oracle kernel can
understand. The following activities happen during
the parse phase.

a) data dictionary look up and security

checking. This activity causes the oracle kernel


to produce internal sql queries, which are called
recursive queries, to check such things as whether
this user is allowed access to this column etc.

b) query optimisation. Part of the query


optimisation which deals with views and sub
queries is performed.

Performing all of the above can take a


considerable time. Therefore they should be done
as infrequently as possible.

The binding phase takes place after the parse


phase. It is at this stage that the user variables
are decided. For instance in the query select 'x'
from person where name = :1 the host variable 1
would have its value 'bound' into the query during
binding by having the address of :1 associated
with the host variable.

Under normal circumstances, wms programs


should perform the following steps, for a query
that is the be executed more than once.

Step 1

During program start-up set up and parse the


queries with calls to sql_set-up.

Step 2

Execute the query with a call to sql_execute


supplying the necessary bind variables.

Step 3

Fetch and process the returned rows by using


sql_fetch.

To execute the query again, return to step 2


and supply new bind variables. For further
information see the document entitled
DB_user.doc.

3.12 The 10,15,20 percent rule

Only use indexes for retrieving data from


tables, where your want less than 15 percent of
the rows. FTS's are quicker if you want more than
15 percent of a tables data.

The rule is called the 10,15,20 percent rule,


because it depends on which oracle document you
read, as to whether its 10,15 or 20 percent.

4.0 Insert statements.

4.1 Use of * in insert statements

Although the use of the shorthand character


'*' is allowed by the oracle kernel, it is
considered to be bad programming practice. It is
not allowed in the WMS project.

The reason for it being bad practice, is that


if a table is modified during an upgrade to the
database, subsequent inserts could fail out in the
field.

5.0 Update Statements.

5.1 Correlated updates.

One of the slowest commands in SQL is the


UPDATE. This is largely due to the fact that most
correlated updates require a full table scan.
This results in very slow performance when the
table is extremely large.

The following update statement is typical of


correlated updates:

Update Target_Table
Set Target_Field = (Select Source_Information
From Source_Table
Where Source_Table.Key =
Target_Table.Key)
Where exists (Select 'x'
From Source_Table
Where Source_Table.Key =
Target_Table.Key)

Performance problems arise because there is


no method of eliminating rows in the Target_Table
based on information in the Source_Table. If
other conditions in the Update's Where clause do
not disqualify most of the rows in the
Target_Table, this update will require substantial
processing time.

The following PL/SQL code effectively uses an


index on the Key field to access only the
appropriate records in the Target_Table:

Declare
Cursor Source is
Select *
From Source_Table;
Begin
For Row in Source Loop
Update Target_Table
Set Target_Field = Row.Source_Information
Where Key = Row.Key;
End Loop;
Exception
When OTHERS Then
Null;
End;

This PL/SQL script loops through each of the


records in the Source_Table and updates the
appropriate row in the Target_Table, if any.
Essentially, this transfers the full table scan to
the Source_Table and allows the index on the
Target_Table to be used effectively.

Running a typical correlated update on an


8,000 row table to update 9 records required 19.4
CPU seconds. The same update using the PL/SQL
script executed in 1.12 CPU seconds -- a 94%
performance improvement.

The PL/SQL script will outperform the


correlated update whenever the Source_Table is
smaller than the Target_Table. The larger the
Target_Table compared to the Source_Table, the
more substantial the performance gain. With an
effective index, the size of the Target_Table is
no longer a factor in the time required to update
the table; the number of records being updated
determines the performance.

Replacing the EXISTS subquery with IN


subquery will give same improvement in most cases.

6.0 Optimising oracle queries.

6.1 Table access.

There are three ways that Oracle can find a


row in a table, these are :

1) Scan each row in the table checking


whether to select the row for output. This is
commonly known as a full table scan (FTS), it is
also the least efficient method of locating a row.
2) By using an index to locate the row.

3) By using the rowid to directly access the


row on disc. This is the most efficient method of
accessing a row. Do not use this method across
transactions as it is possible for a rowid to
change, use the primary key of the table in these
circumstances.

Unfortunately we do not normally know the


rowid, so the fastest method of access is by the
use of indexes.

The general rules are as follows:

Indexes can be used in the following


circumstances.

1) If the indexed column is mentioned in


the where clause.

2) If the indexed column is not modified


by a function or, an arithmetic operation, or both
of them. The following exception is applicable, a
MIN(column), MIN(column+constant) or MAX(column),
MAX(column+constant).

Indexes cannot be used in the following


circumstances.

1) there is no 'where' clause.

2) the indexed column is modified in any way.

3) the search is for 'null' or 'not null'

6.2 Indexes and Null.

If a record has no value (i.e. NULL) in the


column specified as the index key, then there will
not be an index entry for the record. For this
reason the following query will not use an index

on column COMM

SELECT *
FROM emp
WHERE comm is NULL

Oracle assumes that the majority of records


in a table will contain values for indexed
columns, because of this a FTS will be executed
for the following query.

SELECT *
FROM emp
WHERE comm is not NULL

A FTS may not be appropriate if the column


COMM is sparsely populated. There is an example of
how to alter the query to make use of the index.

SELECT *
FROM emp
WHERE comm > -0.01

6.3 Indexes and 'NOT=' predicates.

When a predicate contains a "not equal"


condition, Oracle will not consider an index,
however, Oracle will interpret other 'not'
predicates so as to use any indexes available e.g.

'not sal > 50' => 'sal <= 50'


'not sal <= 50' => 'sal > 50'

6.4 Group by and predicate clauses.

The performance of group by queries can be


improved by eliminating unwanted rows early in the
selection process. The following two queries
return the same data, however, the second is
potentially quicker, since rows will be eliminated
before the set operators are applied.

SELECT job,
avg(sal)
FROM emp
GROUP BY job
HAVING job = 'president'
OR job = 'manager'

SELECT job,
avg(sal)
FROM emp
WHERE job = 'president'
OR job = 'manager'
GROUP BY job

6.5 Multiple index queries.

When a query has two or more equality


predicate clauses, multiple indexes may be used.
Oracle will merge the indexes at run time,
returning rows that are in both indexes, for
example.

SELECT ename
FROM emp
WHERE deptno=20
AND job='manager'

Will merge the following indexes.

non unique index on job


non unique index on deptno

6.6 When indexes cannot be merged

Where a query can use both range and equality


predicates, as in the following example, oracle
cannot merge the indexes. The index on job will be
used to locate the 'manager' rows, these will then
be checked to ensure deptno is greater than 10.

SELECT *
FROM emp
WHERE job='manager'
AND deptno > 10

indexes:

non unique index on job


non unique index on deptno

When there is no clear preference as to which


index to use, see the following query, Oracle will
use only one index, because it is inefficient to
merge them. Note that a scan of the second table,
will have to be done for each row returned in the
first table.

Since both indexes are non unique, Oracle


will choose the index which, it encounters first
in the DC_INDEXES cache. In practice, as
developers cannot see the cache, the choice is
arbitrary.

SELECT ename
FROM emp
WHERE sal > 1
AND empno > 1

indexes :

non unique index on empno


non unique index on sal

6.7 Suppression of indexes for performance

When there is a choice between merging a


unique index with a non unique one, the kernel
will always use the unique index and avoid
performing the merge. The reasoning behind this is
that the unique index will return only one row.
See the following example.

SELECT ename
FROM emp
WHERE sal = 3000
AND empno = 7902

indexes :

unique index on empno


non unique index on sal

Only the empno index will be used, if a


record is found then the row will be checked to
see if the sal = 3000.

Oracle will only use 5 indexes in a single


sql statement. After 5 indexes have been merged,
the kernel will check those rows returned for the
values in the remaining predicates.

SELECT *
FROM emp
WHERE empno =7844
AND job = 'salesman'
AND deptno = 30
AND sal = 1500
AND comm = 0
AND ename = 'turner'
indexes :

non unique index on empno


non unique index on job
non unique index on deptno
non unique index on sal
non unique index on comm
non unique index on ename

Because all the predicates score equally,


according to APPENDIX A, only five of the above
indexes will be used. Rows that are returned will
be checked by the kernel to see if the last value
is correct.

6.8 Concatenated indexes

Concatenated indexes are indexes that


reference more than one column. The index will be
available for scoring, provided that the first
part of the index is present and usable.

In the following examples assume a


concatenated index on job and deptno.

SELECT *
FROM emp
WHERE job = 'president'
AND deptno = 10

The full index will be used.

SELECT *
FROM emp
WHERE deptno = 10
AND job = 'president'

The full index will be used.

SELECT *
FROM emp
WHERE deptno = 10

The index will not be used, the leading part


of the index is missing from the statement.

SELECT *
FROM emp
WHERE job = 'analyst'

The leading part of the index will be used,


developers should check the selectivity of this

part of the index, it may not be selective, and


needs to be suppressed.

SELECT *
FROM emp
WHERE job != 'clerk'
AND deptno = 10

The index cannot be used because the !=


prevents the use of the leading part of the index.

6.9 Or optimisation

Unlike the 'AND' operator which requires that


a row passes both predicates, the 'OR' operator
requires a row to pass either of the predicates.
See the following example

SELECT ename,
sal,
job
FROM emp
WHERE sal = 3000
OR job = 'clerk'

index :

non unique on job

If the job index is used to identify the rows,


then it would only return those employees who are
'clerks', it would not return those employees
where the sal is 3000 and job is not 'clerk'.

A single concatenated index on job, sal or


sal,job cannot be used because records with values
for the second key, must be found regardless of the
value of the first key.

If we have an index on sal,job, then with the


index on job the kernel can optimise the query. A
single column index on sal could also be used.

If there are indexes available for both the


predicates that are or'd, then the query will be
processed conceptually, as the union of two select
statements. This is shown in the example below.
SELECT ename,
sal,
job
FROM emp

WHERE sal = 3000


OR job = 'clerk'

indexes :

non unique on job


non unique on sal

Becomes :

SELECT ename,
sal,
job
FROM emp
WHERE job = 'clerk'

UNION

SELECT ename,
sal,
job
FROM emp
WHERE sal = 3000
AND job != 'clerk'

Notice that the kernel has used the last


predicate in the or clause for the single
condition query, and it has used the same
predicate to form the "!=" condition in the
double condition query. Notice that if we rewrite
the query as

SELECT ename,
sal,
job
FROM emp
WHERE job = 'clerk'
OR sal = 3000

indexes :

non unique on job


non unique on sal

Becomes :
SELECT ename,
sal,
job

FROM emp
WHERE sal = 3000

UNION

SELECT ename,
sal,
job
FROM emp
WHERE job = 'clerk'
AND sal != 3000

From this, we can see that it is best to


place the predicate associated with the most
selective index first in the where clause, and the
least selective last. This minimises the number of
checks for '!='.

These 'or' optimisations cannot take place


when the sql query contains a connect by, or an
outer join.

6.10 Non correlated sub queries

There are two cases to consider here, firstly


queries which use the IN operator, and secondly,
queries which use the NOT IN operator. We will
deal with the IN operator first.

The following rules apply to optimising the


query.

1) The main and subqueries are optimised


separately.

2) The same rules for optimisation apply to


the main and subqueries, for example, in the query
below there is no optimisation of the subquery
because there is no where clause.

3) The driving table is the table from the


subquery. In the example below this will be job.

4) The subquery is transformed into a join


by the following method. The rows frturned from
the subquery are sorted and duplicates removed. A
full table scan of these sorted rows is used for
access, finally the table in the main query is
joined to the sorted rows from the subquery using
column in the main query's where clause.

SELECT distinct name


FROM men
WHERE pin in
(SELECT pin
FROM job)

indexes :

gives the following explain plan.

SORT(UNIQUE)
MERGE JOIN
TABLE ACCESS (FULL) OF 'MEN'
SORT(JOIN)
TABLE ACCESS (FULL) OF 'JOB'

Execute time 4759

Note that the SORT(UNIQUE) is caused by the


distinct.

If we now put a dummy where clause on the sub


query we get

SELECT distinct name


FROM men
WHERE pin in
(SELECT pin
FROM job )

indexes :

unique on job(jobno)
non unique on job(pin)

gives the following explain plan

SORT(UNIQUE)
NESTED LOOPS
TABLE ACCESS (FULL) OF 'MEN'
INDEX(RANGE SCAN) OF 'JOB_1' (NON_UNIQUE)

Note that an index has been used on job. This


is the one case where an index can be used even
though there is no where clause on the subquery.
The sql is executed conceptually as, SELECT
distinct pin FROM job ORDER BY pin, these sorted
rows are then joined with the rows from men using
the index job(pin).

The use of NOT IN is not recommended,


developers should consider rewording queries using
this construct to use outer joins, however, it
should be noted that outer joins are an oracle
extension, and are not available on other RDBMS's.
See below for example.

SELECT *
FROM dept
WHERE deptno not in
(SELECT deptno
FROM emp)

This can be rewritten as the following outer


join.

SELECT d.*
FROM dept d,emp e
WHERE d.deptno = e.deptno(+)
AND e.rowid is NULL

This relies on the fact that each row has a


unique rowid, and that, rowid is never NULL. Note
also that any NOT IN is equivelent to a NOT
EXISTS. Therefoe any NOT EXISTS can be transformed
into an outer join.

6.11 Correlated sub queries

All Correlated subqueries follow the same


execution path. The main and the subqueries are
separately optimised. The driving table in
selected from the main query. For each row
returned in the main query, the subquery is
executed. Indexes can be used for the where
clauses on both the main and subquery.
APPENDIX A

Query Paths Ranked in Order of Speed

The lower the rank, the faster the path.


Rank Path
====== ==================================

1 ROWID = constant
2 Unique indexed column = constant
3 entire unique concatenated index =
constant
4 entire cluster key = corresponding
cluster key in another table in
the same cluster
5 entire cluster key = constant
6 entire non-unique concatenated
index = constant
7 non-unique single column index
merge
8 most leading concatenated index =
constant
9 indexed column BETWEEN low value
AND high value, or indexed
column LIKE 'C%' (bounded range)
10 sort/merge (joins only)
11 MAX or MIN of single indexed
column
12 ORDER BY entire index
13 full table scans
14 unindexed column = constant, or
column IS NULL, or column LIKE
'%C%' (full table scan)
APPENDIX B

Table Structure for demos

table oe (id number(6,0),


oe number(1,0),
timestamp(date))

The field id holds the unique reference number starting


at 1 the field oe holds a 0 if the id is even and a 1
if it is odd

unique index on id
nonunique index on oe

there are 100,000 rows in oe

table oe_link (oe number(1),name char(10)

with 2 records

table oe_link_name (name char(10),age number(3))

With 1 record

people (name, sex)

job(job_id,name)
APPENDIX C

Update Performance RDBMS


John R. Pack VAX/VMS
July 17, 1990 V6.0.30.4
(Revised 19-Sep-90)

Using PL/SQL to Enhance


Update Performance

One of the slowest commands in SQL is the UPDATE.


Most often, when a client complains about performance,
a large, correlated update (or,worse, an uncorrelated
update) is at the root of the problem.Often, even after
creating the optimal indexes and fine tuning the SQL
statement itself, the update is still hogging enormous
CPU resources and is still the bottleneck in the user's
production scheme.

The Correlated Update

This is largely due to the fact that most


correlated updates require a full table scan. This
results in very slow performance when the table is
extremely large.

The following update statement is typical of


correlated updates:

Update Target_Table
Set Target_Field = (Select Source_Information
From Source_Table
Where Source_Table.Key =
Target_Table.Key)
Where exists (Select 'x'
From Source_Table
Where Source_Table.Key =
Target_Table.Key)

Performance problems arise because there is no


method of eliminating rows in the Target_Table based on
information in the Source_Table. If other conditions
in the Update's Where clause do not disqualify most of
the rows in the Target_Table, this update will require
substantial processing time.
In addition, this type of query suffers because an
index would not help the Target_Table access time. In
fact, the only index which will improve the
performance of the preceding update is an index on the
Source_Table Key field. If the Source_Table is large,
this is very useful.

Nevertheless, the best method of updating the


table would be able to access just the correct rows in
the Target_Table using an effective index. This method
is now available using PL/SQL and an index on the
Target_Table Key field.

The PL/SQL Update

The following PL/SQL code effectively uses an


index on the Key field to access only the appropriate
records in the Target_Table:

Declare
Cursor Source is
Select *
From Source_Table;
Begin
For Row in Source Loop
Update Target_Table
Set Target_Field = Row.Source_Information
Where Key = Row.Key;
End Loop;
Exception
When OTHERS Then
Null;
End;

This PL/SQL script loops through each of the


records in the Source_Table and updates the appropriate
row in the Target_Table, if any. Essentially, this
transfers the full table scan to the Source_Table and
allows the index on the Target_Table to be used
effectively.

Performance Gain

Running a typical correlated update on an 8,000 row


table to update 9 records required 19.4 CPU seconds.
The same update using the PL/SQL script executed in
1.12 CPU seconds -- a 94% performance improvement.

The PL/SQL script will outperform the correlated


update whenever the Source_Table is smaller than the
Target_Table. The larger the Target_Table compared to
the Source_Table, the more substantial the performance
gain. With an effective index, the size of the
Target_Table is no longer a factor in the time required
to update the table; the number of records being
updated determines the performance.

Replacing the EXISTS subquery with IN subquery


will give same improvement in most cases.
APPENDIX D

GUIDELINES FOR USING THE OUTER JOIN SYNTAX

Guidelines for Using the Outer Join Syntax RDBMS


RDBMS Support V6.0
October 1990

1 INTRODUCTION

The purpose of this technical note is to


provide some guidelines on how to use the
outer join facility provided by ORACLE.

1.1 Outer Join Semantics - Definitions

The following terms, used to describe the


operation on outer joins, are defined :-

'outer-join column' - a column reference followed by


the symbol (+), e.g. EMPNO(+)
and DEPT.DEPTNO(+) are outer
join columns

'simple predicate' - a logical expression containing


no AND's, OR's, or NOT's (
usually a simple relation such
as A = B )

'outer join predicate' - a simple predicate containing


one or more outer join columns.

2 OUTER JOIN SYNTAX - RULES

An outer join predicate may only


contain outer join columns from one table ( in other
words, all outer join columns in a single outer join
predicate must belong to the same table). This
means, for example, that the following statement is
illegal :-

EMP.EMPNO(+) = DEPT.DEPTNO(+) - outer join columns from


two tables
Also, if a column in a predicate is
an outer join column, then all columns from
the same table must be outer join columns in that
predicate. This means, for example, that the
following statement is illegal :-

EMP.SAL + EMP.COMM(+) = SALGRADE.HIGH - mixed columns from


one table

In a predicate, the table referenced with a


(+) is directly 'outer joined' to all other tables
in the predicate. It is indirectly 'outer joined' to
any tables to which these other tables are
themselves 'outer joined'. A predicate may not be
directly or indirectly 'outer joined' to itself. This
means, for example, that the following combination
of predicates is illegal :-

EMP.EMPNO(+) = PERS.EMPNO
AND PERS.DEPTNO(+) = DEPT.DEPTNO
AND DEPT.JOB(+) = EMP.JOB - circular outer
join relationship

3 OUTER JOIN EXECUTION

For a given table, T, there may be both outer join and


non-outer join predicates. Execution occurs ( conceptually ) as
follows :-

1. The result of joining all tables mentioned in table T's


outer join predicates is formed ( by recursive application
of this algorithm ).

2. For each row of the result, a set of composite rows is


formed, each consisting of the original row in the
result joined to a row in table T for which the composite
row satisfies all of table T's outer join predicates.

3. If a set of composite rows is the null set, a composite


row is created consisting of the original row in the
result joined to a row similar to those in table T, but
with all values set to null.

4. Rows that do not pass the non-outer join predicates


are removed.

This may be summarised as follows. Outer join


predicates ( those with (+) after a column of table T ), are
evaluated BEFORE table T is augmented with a null row. The null
row is added only if there are NO rows in table T that satisfy
the outer join predicates. Non-outer join predicates are
evaluated AFTER table T is augmented with a null row (if needed)

4 OUTER JOIN - RECOMMENDATIONS

Certain types of outer joins in complicated logical


expressions may not be well formed. In general, outer join
columns in predicates that are branches of an OR should be
avoided. Inconsistencies between the branches of the
OR can
result in an ambiguous query, and this may not be detected. It
is best to confine outer join columns to the top level of the
'where' clause, or to nested AND's only.

5 OUTER JOIN - ILLUSTRATIVE EXAMPLES

5.1 Simple Outer Join

SELECT ENAME, LOC


FROM DEPT, EMP
WHERE DEPT.DEPTNO = EMP.DEPTNO(+)

The predicate is evaluated BEFORE null augmentation. If


there is a DEPT row for which there are no EMP rows, then a null
EMP row is concatenated to the DEPT row.

5.2 Outer Join With Simple Post-Join Predicates

SELECT ENAME, LOC


FROM DEPT, EMP
WHERE DEPT.DEPTNO = EMP.DEPTNO(+)
AND EMP.DEPTNO IS NULL

The second simple predicate is evaluated AFTER


null
augmentation, since there is no (+), removing rows which were
not the result of null augmentation and hence leaving only DEPT
rows for which there was no corresponding EMP row.

5.3 Outer Join With Additional Pre-Join Predicates

SELECT ENAME, LOC


FROM DEPT, EMP
WHERE DEPT.DEPTNO = EMP.DEPTNO(+)
AND 'CLERK' = EMP.JOB(+)
AND EMP.DEPTNO IS NULL
The predicate on EMP.JOB is evaluated at the same time
as the one on EMP.DEPTNO - before null augmentation. As a
result, a null row is augmented to any DEPT row for which there
are no corresponding clerks's in the EMP table. Therefore, this
query displays departments containing no clerks.

Note that it the (+) were omitted from the EMP.JOB


predicate, no rows would be returned. In this case, both the
EMP.JOB and EMP.DEPTNO IS NULL predicates are evaluated
AFTER
the outer join, and there can be no rows for which both are
true.

APPENDIX E
The Ten commandments for fast Queries

THOU SHALL ACCIDENTLY NOT DISABLE THE USE OF INDEXES BY


MODIFYING A COLUMN WITHIN THE WHERE CLAUSE

THOU SHALL PUT THE TABLE THAT RETURNS THE FEWEST


ROWS LAST IN THE FROM LIST OF THE QUERY.

THOU WILL INVESTIGATE REWORDING THY QUERY IN


SOME OTHER WAY.

THOU SHALL USE THE EXISTS OPERATOR WHENEVER POSSIBLE.

THOU SHALL NOT ASK THE KERNEL FOR MORE THAN THOU
WANTEST.

THOU SHALL NOT FORCE REPARSING OF A QUERY WHEN


ONLY CHANGING BIND VARIABLES

THOU SHALL KNOW THE DATA VOLUMETRICS AND USE


REALISTIC TEST DATA.

THOU SHALL USE TABLE ALIASES IN THY QUERIES

THOU SHALL NOT USE THE != UNLESS ABSOLUTELY NECESSARY

THOU SHALL USE THE TRACE FACILITY TO MONITOR


YOUR QUERIES

APPENDIX F

EXPLAIN Facility

MOTIVATION
There is a need for users to be able to determine the steps the system
takes in performing various operations on a database. The EXPLAIN facility
provides users with a convenient mechanism for getting this information.
The facility stores this information in a standard database table that can
be manipulated with standard SQL statements.

SYNTAX

The following syntax is based on the syntax used by DB2 for their
EXPLAIN facility:

EXPLAIN PLAN [SET STATEMENT_ID [=] <string literal>]


[INTO <table_name>]
FOR <sql_statement>

where

STATEMENT_ID - a unique optional identifier for the statement;

INTO - allows user to save the results of the analysis in


the specified table. The table must conform to the format
for the table used to store the analysis ( see TABLE FORMATS
section for a description of the table format ). If this
clause is not specified, the system will then attempt to store
the information in a table named <user_id>.PLAN_TABLE . If
the explicit or implicit table does not exist the EXPLAIN
command will fail.

<sql statement> - an insert, delete, update, or query statement;

TABLE FORMATS

Core Table Format

The core table used to represent the plan information consists of the
following fields:

STATEMENT_ID - An identifier associated with the statement. If not set by


the user, the identifier will be NULL. Note that a user may
identify a statement by the timestamp field.

TIMESTAMP - The date and time when the statement was analysed.

REMARKS - Any comment the user wishes to associate with this step of
the analysis.
OPERATION - the name of the operation being performed. The following table
provides a listing of the operations described by the facility.

Operation Description
---------------------------------------------------------------
And-Equal A retrieval utilising intersection of
rowids from index searches
Connect by A retrieval that is based on a tree walk
Concatenation A retrieval from a group of tables. It is
essentially a UNION ALL operation of the
sources. Used for OR operations.
Counting A node that is used to count the number of
rows returned from a table. Used for queries
that use the ROWNUM meta-column.
Filter A restriction of the rows returned from a table
First Row A retrieval of only the first row
For Update A retrieval that is used for updating
Index A retrieval from an index
Intersection A retrieval of rows common to two tables
Merge Join A join utilising merge scans
Minus A retrieval of rows in Source 1 table but not in
Source 2 table
Nested Loops A join utilising nested loops. Each
value in the
first subnode is looked up in the second subnode.
This is often used when one table in a join is
indexed and the other is not.
Project A retrieval of a subset of columns from a table
Remote A retrieval from a database other than the current
database
Sequence An operation involving a sequence table
Sort A retrieval of rows ordered on some column or group
of columns
Table A retrieval from a base table
Union A retrieval of unique rows from two tables
View A retrieval from a virtual table
-------------------------------------------------------------------

Note that the operation shown when counting the number of rows
returned by a query (i.e. select count(*)) is SORT. This is due to
the way that COUNT is implemented internally. The table will not
really be sorted.

OPTIONS - an option that modifies the operation, e.g.,


OUTER option on join operations, rationale for sorting, type
of index scan, type of filter, etc. The following table

provides a list of the options for the operations that have


options.

OPERATION OPTIONS DESCRIPTION


---------------------------------------------------------------------
Index UNIQUE KEY Unique key lookup on index
RANGE Index range scan
Merge Join OUTER Join is an outer join
Nested Loops OUTER Join is an outer join
Sort DISTINCT Sort is to produce distinct values
GROUP BY Sort is for grouping operation
JOIN Sort is for merge join
ORDER BY Sort is for order by
Table BY ROWID Table scan is by rowid
FULL Sequential table scan
CLUSTER Table scan by cluster key
---------------------------------------------------------------------

OBJECT_NODE - the name of the node that owns the database object.

OBJECT_OWNER - the name of the schema the owns the database object.

OBJECT_NAME - the name of the database object.

OBJECT_TYPE - a modifier that provides descriptive information about


the database object, e.g., NON-UNIQUE for indexes, etc.

OBJECT_INSTANCE - a number corresponding to the ordinal position of the


object as it appears in the original query. The numbering proceeds
from left to right, outer to inner with respect to the original
query text. Note that at this level, view expansion will result in
rather interesting object instance numbers. We will be addressing this
issue fully in future releases.

SEARCH_COLUMNS - the number of leading columns used when searching an


index.

ID - a number assigned to this operation in the tree.


Corresponds to a preorder traversal of the row source tree.

PARENT_ID - the number assigned to the previous operation that


receives information from this operation. This field
combined with the
ID field allows users to do a treewalk of the specified
plan with the CONNECT BY statement.

POSITION - the position this database object occupies for the previous
operation.

OTHER - other information that is specific to the row source that a user
may find useful. For example, the select statement to a remote node, etc.

Sample Table Definition


create table PLAN_TABLE (
statement_id char(30),
timestamp date,
remarks char(80),
operation char(30),
options char(30),
object_node char(30),
object_owner char(30),
object_name char(30),
object_instance numeric,
object_type char(30),
search_columns numeric,
id numeric,
parent_id numeric,
position numeric,
other long);

An SQL script to create this table resides in file xplainpl.sql in the same
directory containing the file catalog.sql. This table must reside in the
current schema unless you use the optional INTO clause of the EXPLAIN
command.

EXAMPLES

Suppose we issue the following statements:

EXPLAIN PLAN
SET STATEMENT_ID = 'query1'
INTO QUERY_PLANS
FOR SELECT * FROM T1,T2,T3 WHERE T1.F1 = T2.F1 AND T2.F2 = T3.F2;

SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,


PARENT_ID, POSITION
FROM QUERY_PLANS
WHERE STATEMENT_ID = 'query1'
ORDER BY ID;

The following output would be created:

OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION


--------------------------------------------------------------------------
MERGE JOIN 1
MERGE JOIN 2 1 1
SORT JOIN 3 2 1
TABLE ACCESS FULL T1 4 3 1
SORT JOIN 5 2 2
TABLE ACCESS FULL T2 6 5 1
SORT JOIN 7 1 1
TABLE ACCESS FULL T3 8 7 1

8 RECORDS selected

Suppose that an index is created on field F1 on table T1 and the


following statements are issued:

EXPLAIN PLAN
SET STATEMENT_ID = 'query2'
INTO QUERY_PLANS
FOR SELECT * FROM T1 WHERE F1 > 1;

SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, SEARCH_COLUMNS,


ID, PREVIOUS_ID
FROM QUERY_PLANS WHERE
STATEMENT_ID='query2'
ORDER BY ID;

The following output is produced:

OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE SEARCH_COLUMNS ID


POSITION
-----------------------------------------------------------------------------
TABLE SCAN BY ROWID T1 1
INDEX SCAN RANGE IT1 NON-UNIQUE 1 2 1

2 RECORDS selected

With the same conditions as before, suppose we issue the following


which demonstrates an index only retrieval:

EXPLAIN PLAN
SET STATEMENT_ID = 'query3'
INTO QUERY_PLANS
FOR SELECT F1 FROM T1 WHERE F1 > 1;

SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, SEARCH_COLUMNS,


ID
FROM QUERY_PLANS WHERE
STATEMENT_ID='query3';

The following output is produced:

OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE SEARCH_COLUMNS ID


--------------------------------------------------------------------------
INDEX SCAN RANGE IT1 NON-UNIQUE 1

1 RECORDS selected

The next example illustrates the output if a grouping operation is


specified in the statement:

EXPLAIN PLAN
SET STATEMENT_ID = 'query4'
INTO QUERY_PLANS
FOR SELECT AVG(F1),F2 FROM T1 GROUP BY F2;

SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,


PARENT_ID
FROM QUERY_PLANS WHERE STATEMENT_ID = 'query4'
ORDER BY ID;

OPERATION OPTIONS OBJECT_NAME ID PARENT_ID


---------------------------------------------------------------------
SORT GROUP BY 1
TABLE SCAN FULL T1 2 1

2 RECORDS selected

The next example illustrates the ouptut if DISTINCT is specified in the


statement:

EXPLAIN PLAN
SET STATEMENT_ID = 'query5'
INTO QUERY_PLANS
FOR SELECT DISTINCT F1 FROM T1;

SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,


PREVIOUS_ID
FROM QUERY_PLANS WHERE STATEMENT_ID = 'query5'
ORDER BY ID;

OPERATION OPTIONS OBJECT_NAME ID PREVIOUS_ID


--------------------------------------------------------------
SORT DISTINCT 1
TABLE SCAN FULL T1 2 1

2 RECORDS selected

The next example illustrates the output if a subquery is specified in


the statement:

EXPLAIN PLAN
SET STATEMENT_ID = 'query6'
INTO QUERY_PLANS
FOR SELECT * FROM T1 WHERE F1 < ( SELECT F2 FROM T2 WHERE F1=3);

SELECT OPERATION, OPTIONS, OBJECT_NAME, ID,


PARENT_ID, POSITION
FROM QUERY_PLANS WHERE STATEMENT_ID = 'query6'
ORDER BY ID;
OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION
-------------------------------------------------------------------------
FILTER OUT 1
TABLE SCAN FULL T1 2 1 1
TABLE SCAN FULL T2 3 1 2

3 RECORDS selected

The final example displays a complex query whose output is sent to the
default plan table. ( It is assumed that this table has been created before
issuing the statement.)

EXPLAIN PLAN
SET STATEMENT_ID = 'query7'
FOR SELECT * FROM T1,T2 WHERE T1.F1 = T2.F1 UNION
SELECT * FROM T2,T3 WHERE T2.F1 = T3.F1;

SELECT OPERATION, OPTIONS, OBJECT_NAME, ID, PARENT_ID, POSITION


FROM PLAN_TABLE WHERE STATEMENT_ID = 'query7'
ORDER BY ID;

The following output is produced:

OPERATION OPTIONS OBJECT_NAME ID PARENT_ID POSITION


--------------------------------------------------------------------------
PROJECTION 1
UNION 2 1 1
SORT DISTINCT 3 2 1
NEST LOOP 4 3 1
TABLE SCAN BY ROWID T1 5 4 1
INDEX SCAN RANGE IT1 6 5 1
TABLE SCAN FULL T2 7 4 2
SORT DISTINCT 8 2 2
MERGE JOIN 9 8 1
SORT JOIN 10 9 1
TABLE SCAN FULL T2 11 10 1
SORT JOIN 12 9 2
TABLE SCAN FULL T3 13 12 1

13 RECORDS selected

The following example is based on the previous query. It illustrates the


use of the treewalking capability in Oracle's version of SQL.

SELECT LPAD(' ',2*LEVEL)||OPERATION,OPTIONS, OBJECT_NAME


FROM PLAN_TABLE WHERE STATEMENT_ID='query7'
CONNECT BY PRIOR ID = PARENT_ID
and STATEMENT_ID = 'query7'
START WITH ID = 1
and STATEMENT_ID = 'query7'
ORDER BY ID;
LPAD(' ',2*LEVEL)||OPERATION
------------------------------------------------------------------------------
OPTIONS OBJECT_NAME
-------------------------------------
PROJECTION

UNION

SORT
DISTINCT
NEST LOOP

TABLE SCAN
BY ROWID T1
INDEX SCAN
RANGE IT1
TABLE SCAN
FULL T2
SORT
DISTINCT
MERGE JOIN

SORT
JOIN
TABLE SCAN
FULL T2
SORT
JOIN
TABLE SCAN
FULL T3

13 RECORDS selected
Oracle SQL FAQ
$Date: 15-Jul-2003 $
$Revision: 1.90 $
$Author: Frank Naudé $

Topics
What is SQL and where does it come from?
What are the difference between DDL, DML and DCL commands?
How does one escape characters when building SQL queries?
How does one eliminate duplicate rows from a table?
How does one generate primary key values for a table?
How does one get the time difference between two date columns?
How does one add a day/hour/minute/second to a date value?
How does one count different data values in a column?
How does one count/sum RANGES of data values in a column?
Can one retrieve only the Nth row from a table?
Can one retrieve only rows X to Y from a table?
How does one select EVERY Nth row from a table?
How does one select the TOP N rows from a table?
How does one code a tree-structured query?
How does one code a matrix report in SQL?
How does one implement IF-THEN-ELSE in a select statement?
How can one dump/ examine the exact content of a database column?
Can one drop a column from a table?
Can one rename a column in a table?
How can I change my Oracle password?
How does one find the next value of a sequence?
Workaround for snapshots on tables with LONG columns
Where can one get more info about SQL?
Back to Oracle FAQ Index

What is SQL and where does it come from?
Structured Query Language (SQL) is a language that provides an interface to relational database systems.
SQL was developed by IBM in the 1970s for use in System R, and is a de facto standard, as well as an ISO
and ANSI standard. SQL is often pronounced SEQUEL.
In common usage SQL also encompasses DML (Data Manipulation Language), for
INSERTs, UPDATEs, DELETEs and DDL (Data Definition Language), used for creating
and modifying tables and other database structures.
The development of SQL is governed by standards. A major revision to the SQL
standard was completed in 1992, called SQL2. SQL3 support object extensions and
are (partially?) implemented in Oracle8 and 9.
Back to top of file

What are the difference between DDL, DML and DCL 
commands?
DDL is Data Definition Language statements. Some examples:

• CREATE - to create objects in the database


• ALTER - alters the structure of the database
• DROP - delete objects from the database
• TRUNCATE - remove all records from a table, including all spaces allocated for
the records are removed
• COMMENT - add comments to the data dictionary
• GRANT - gives user's access privileges to database
• REVOKE - withdraw access privileges given with the GRANT command

DML is Data Manipulation Language statements. Some examples:

• SELECT - retrieve data from the a database


• INSERT - insert data into a table
• UPDATE - updates existing data within a table
• DELETE - deletes all records from a table, the space for the records remain
• CALL - call a PL/SQL or Java subprogram
• EXPLAIN PLAN - explain access path to data
• LOCK TABLE - control concurrency

DCL is Data Control Language statements. Some examples:

• COMMIT - save work done


• SAVEPOINT - identify a point in a transaction to which you can later roll back
• ROLLBACK - restore database to original since the last COMMIT
• SET TRANSACTION - Change transaction options like what rollback segment to use

Back to top of file

How does one escape special characters when building 
SQL queries?
The LIKE keyword allows for string searches. The '_' wild card character is used to match exactly one
character, '%' is used to match zero or more occurrences of any characters. These characters can be escaped
in SQL. Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Example:
SELECT 'Franks''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;
Back to top of file

How does one eliminate duplicates rows from a table?
Choose one of the following queries to identify or remove duplicate rows from a table leaving only unique
records in the table:
Method 1:
SQL> DELETE FROM table_name A WHERE ROWID > (
2 SELECT min(rowid) FROM table_name B
3 WHERE A.key_values = B.key_values);
Method 2:
SQL> create table table_name2 as select distinct * from table_name1;
SQL> drop table_name1;
SQL> rename table_name2 to table_name1;
SQL> -- Remember to recreate all indexes, constraints, triggers, etc on table...
Method 3: (thanks to Dennis Gurnick)
SQL> delete from my_table t1
SQL> where exists (select 'x' from my_table t2
SQL> where t2.key_value1 = t1.key_value1
SQL> and t2.key_value2 = t1.key_value2
SQL> and t2.rowid > t1.rowid);
Note: One can eliminate N^2 unnecessary operations by creating an index on the joined fields in the inner
loop (no need to loop through the entire table on each pass by a record). This will speed-up the deletion
process.
Note 2: If you are comparing NOT-NULL columns, use the NVL function. Remember
that NULL is not equal to NULL. This should not be a problem as all key columns
should be NOT NULL by definition.
Back to top of file

How does one generate primary key values for a table?
Create your table with a NOT NULL column (say SEQNO). This column can now be populated with
unique values:
SQL> UPDATE table_name SET seqno = ROWNUM;
or use a sequences generator:
SQL> CREATE SEQUENCE sequence_name START WITH 1 INCREMENT BY 1;
SQL> UPDATE table_name SET seqno = sequence_name.NEXTVAL;
Finally, create a unique index on this column.
Back to top of file

How does one get the time difference between two date 
columns?
Look at this example query:
select floor(((date1-date2)*24*60*60)/3600)
|| ' HOURS ' ||
floor((((date1-date2)*24*60*60) -
floor(((date1-date2)*24*60*60)/3600)*3600)/60)
|| ' MINUTES ' ||
round((((date1-date2)*24*60*60) -
floor(((date1-date2)*24*60*60)/3600)*3600 -
(floor((((date1-date2)*24*60*60) -
floor(((date1-date2)*24*60*60)/3600)*3600)/60)*60)))
|| ' SECS ' time_difference
from ...
If you don't want to go through the floor and ceiling math, try this method
(contributed by Erik Wile):
select to_char(to_date('00:00:00','HH24:MI:SS') +
(date1 - date2), 'HH24:MI:SS') time_difference
from ...
Note that this query only uses the time portion of the date and ignores the date itself.
It will thus never return a value bigger than 23:59:59.
Back to top of file

How does one add a day/hour/minute/second to a date 
value?
The SYSDATE pseudo-column shows the current system date and time. Adding 1 to SYSDATE will
advance the date by 1 day. Use fractions to add hours, minutes or seconds to the date. Look at these
examples:
SQL> select sysdate, sysdate+1/24, sysdate +1/1440, sysdate + 1/86400 from
dual;

SYSDATE SYSDATE+1/24 SYSDATE+1/1440 SYSDATE+1/86400


-------------------- -------------------- -------------------- --------------------
03-Jul-2002 08:32:12 03-Jul-2002 09:32:12 03-Jul-2002 08:33:12 03-Jul-2002
08:32:13
The following format is frequently used with Oracle Replication:
select sysdate NOW, sysdate+30/(24*60*60) NOW_PLUS_30_SECS from dual;

NOW NOW_PLUS_30_SECS
-------------------- --------------------
03-JUL-2002 16:47:23 03-JUL-2002 16:47:53
Back to top of file

How does one count different data values in a column?
Use this simple query to count the number of data values in a column:
select my_table_column, count(*)
from my_table
group by my_table_column;
A more sophisticated example...
select dept, sum( decode(sex,'M',1,0)) MALE,
sum( decode(sex,'F',1,0)) FEMALE,
count(decode(sex,'M',1,'F',1)) TOTAL
from my_emp_table
group by dept;
Back to top of file

How does one count/sum RANGES of data values in a 
column?
A value x will be between values y and z if GREATEST(x, y) = LEAST(x, z). Look at this example:
select f2,
sum(decode(greatest(f1,59), least(f1,100), 1, 0)) "Range 60-100",
sum(decode(greatest(f1,30), least(f1, 59), 1, 0)) "Range 30-59",
sum(decode(greatest(f1, 0), least(f1, 29), 1, 0)) "Range 00-29"
from my_table
group by f2;
For equal size ranges it might be easier to calculate it with DECODE(TRUNC(value/range), 0, rate_0, 1,
rate_1, ...). Eg.
select ename "Name", sal "Salary",
decode( trunc(f2/1000, 0), 0, 0.0,
1, 0.1,
2, 0.2,
3, 0.31) "Tax rate"
from my_table;
Back to top of file

Can one retrieve only the Nth row from a table?
Rupak Mohan provided this solution to select the Nth row from a table:
SELECT * FROM t1 a
WHERE n = (SELECT COUNT(rowid)
FROM t1 b
WHERE a.rowid >= b.rowid);
Shaik Khaleel provided this solution:
SELECT * FROM (
SELECT ENAME,ROWNUM RN FROM EMP WHERE ROWNUM < 101 )
WHERE RN = 100;
Note: In this first query we select one more than the required row number, then we select the required one.
Its far better than using a MINUS operation.
Ravi Pachalla provided these solutions:
SELECT f1 FROM t1
WHERE rowid = (
SELECT rowid FROM t1
WHERE rownum <= 10
MINUS
SELECT rowid FROM t1
WHERE rownum < 10);
SELECT rownum,empno FROM scott.emp a
GROUP BY rownum,empno HAVING rownum = 4;

Alternatively...
SELECT * FROM emp WHERE rownum=1 AND rowid NOT IN
(SELECT rowid FROM emp WHERE rownum < 10);
Please note, there is no explicit row order in a relational database. However, this query is quite fun and may
even help in the odd situation.
Back to top of file

Can one retrieve only rows X to Y from a table?
Shaik Khaleel provided this solution to the problem:
SELECT * FROM (
SELECT ENAME,ROWNUM RN FROM EMP WHERE ROWNUM < 101
) WHERE RN between 91 and 100 ;
Note: the 101 is just one greater than the maximum row of the required rows (means x= 90, y=100, so the
inner values is y+1).
Ravi Pachalla provided this solution:
SELECT rownum, f1 FROM t1
GROUP BY rownum, f1 HAVING rownum BETWEEN 2 AND 4;

Another solution is to use the MINUS operation. For example, to display rows 5 to 7, construct a query like
this:
SELECT *
FROM tableX
WHERE rowid in (
SELECT rowid FROM tableX
WHERE rownum <= 7
MINUS
SELECT rowid FROM tableX
WHERE rownum < 5);
Please note, there is no explicit row order in a relational database. However, this query is quite fun and may
even help in the odd situation.
Back to top of file

How does one select EVERY Nth row from a table?
One can easily select all even, odd, or Nth rows from a table using SQL queries like this:
Method 1: Using a subquery
SELECT *
FROM emp
WHERE (ROWID,0) IN (SELECT ROWID, MOD(ROWNUM,4)
FROM emp);
Method 2: Use dynamic views (available from Oracle7.2):
SELECT *
FROM ( SELECT rownum rn, empno, ename
FROM emp
) temp
WHERE MOD(temp.ROWNUM,4) = 0;
Method 3: Using GROUP BY and HAVING - provided by Ravi Pachalla
SELECT rownum, f1
FROM t1
GROUP BY rownum, f1 HAVING MOD(rownum,n) = 0 OR rownum = 2-n
Please note, there is no explicit row order in a relational database. However, these queries are quite fun and
may even help in the odd situation.
Back to top of file
How does one select the TOP N rows from a table?
Form Oracle8i one can have an inner-query with an ORDER BY clause. Look at this example:
SELECT *
FROM (SELECT * FROM my_table ORDER BY col_name_1 DESC)
WHERE ROWNUM < 10;
Use this workaround with prior releases:
SELECT *
FROM my_table a
WHERE 10 >= (SELECT COUNT(DISTINCT maxcol)
FROM my_table b
WHERE b.maxcol >= a.maxcol)
ORDER BY maxcol DESC;
Back to top of file

How does one code a tree­structured query?
Tree-structured queries are definitely non-relational (enough to kill Codd and make him roll in his grave).
Also, this feature is not often found in other database offerings.
The SCOTT/TIGER database schema contains a table EMP with a self-referencing
relation (EMPNO and MGR columns). This table is perfect for tesing and
demonstrating tree-structured queries as the MGR column contains the employee
number of the "current" employee's boss.
The LEVEL pseudo-column is an indication of how deep in the tree one is. Oracle can
handle queries with a depth of up to 255 levels. Look at this example:
select LEVEL, EMPNO, ENAME, MGR
from EMP
connect by prior EMPNO = MGR
start with MGR is NULL;
One can produce an indented report by using the level number to substring or lpad() a series of spaces, and
concatenate that to the string. Look at this example:
select lpad(' ', LEVEL * 2) || ENAME ........
One uses the "start with" clause to specify the start of the tree. More than one record can match the starting
condition. One disadvantage of having a "connect by prior" clause is that you cannot perform a join to other
tables. The "connect by prior" clause is rarely implemented in the other database offerings. Trying to do this
programmatically is difficult as one has to do the top level query first, then, for each of the records open a
cursor to look for child nodes.
One way of working around this is to use PL/SQL, open the driving cursor with the
"connect by prior" statement, and the select matching records from other tables on a
row-by-row basis, inserting the results into a temporary table for later retrieval.
Back to top of file

How does one code a matrix report in SQL?
Look at this example query with sample output:
SELECT *
FROM (SELECT job,
sum(decode(deptno,10,sal)) DEPT10,
sum(decode(deptno,20,sal)) DEPT20,
sum(decode(deptno,30,sal)) DEPT30,
sum(decode(deptno,40,sal)) DEPT40
FROM scott.emp
GROUP BY job)
ORDER BY 1;

JOB DEPT10 DEPT20 DEPT30 DEPT40


--------- ---------- ---------- ---------- ----------
ANALYST 6000
CLERK 1300 1900 950
MANAGER 2450 2975 2850
PRESIDENT 5000
SALESMAN 5600
Back to top of file

How does one implement IF­THEN­ELSE in a select 
statement?
The Oracle decode function acts like a procedural statement inside an SQL statement to return different
values or columns based on the values of other columns in the select statement.
Some examples:
select decode(sex, 'M', 'Male', 'F', 'Female', 'Unknown')
from employees;

select a, b, decode( abs(a-b), a-b, 'a > b',


0, 'a = b',
'a < b') from tableX;

select decode( GREATEST(A,B), A, 'A is greater OR EQUAL than B', 'B is greater
than A')...

select decode( GREATEST(A,B),


A, decode(A, B, 'A NOT GREATER THAN B', 'A GREATER THAN B'),
'A NOT GREATER THAN B')...
Note: The decode function is not ANSI SQL and is rarely implemented in other RDBMS offerings. It is one
of the good things about Oracle, but use it sparingly if portability is required.
From Oracle 8i one can also use CASE statements in SQL. Look at this example:
SELECT ename, CASE WHEN sal>1000 THEN 'Over paid' ELSE 'Under paid' END
FROM emp;
Back to top of file

How can one dump/ examine the exact content of a 
database column?
SELECT DUMP(col1)
FROM tab1
WHERE cond1 = val1;

DUMP(COL1)
-------------------------------------
Typ=96 Len=4: 65,66,67,32
For this example the type is 96, indicating CHAR, and the last byte in the column is 32, which is the ASCII
code for a space. This tells us that this column is blank-padded.
Back to top of file

Can one drop a column from a table?
From Oracle8i one can DROP a column from a table. Look at this sample script, demonstrating the ALTER
TABLE table_name DROP COLUMN column_name; command.
Other workarounds:
1. SQL> update t1 set column_to_drop = NULL;
SQL> rename t1 to t1_base;
SQL> create view t1 as select <specific columns> from t1_base;

2. SQL> create table t2 as select <specific columns> from t1;


SQL> drop table t1;
SQL> rename t2 to t1;
Back to top of file

Can one rename a column in a table?
No, this is listed as Enhancement Request 163519. Some workarounds:
1. -- Use a view with correct column names...
rename t1 to t1_base;
create view t1 <column list with new name> as select * from t1_base;

2. -- Recreate the table with correct column names...


create table t2 <column list with new name> as select * from t1;
drop table t1;
rename t2 to t1;

3. -- Add a column with a new name and drop an old column...


alter table t1 add ( newcolame datatype );
update t1 set newcolname=oldcolname;
alter table t1 drop column oldcolname;
Back to top of file

How can I change my Oracle password?
Issue the following SQL command: ALTER USER <username> IDENTIFIED BY
<new_password>
/
From Oracle8 you can just type "password" from SQL*Plus, or if you need to change
another user's password, type "password user_name".
Back to top of file

How does one find the next value of a sequence?
Perform an "ALTER SEQUENCE ... NOCACHE" to unload the unused cached sequence numbers from the
Oracle library cache. This way, no cached numbers will be lost. If you then select from the
USER_SEQUENCES dictionary view, you will see the correct high water mark value that would be
returned for the next NEXTVALL call. Afterwards, perform an "ALTER SEQUENCE ... CACHE" to
restore caching.
You can use the above technique to prevent sequence number loss before a
SHUTDOWN ABORT, or any other operation that would cause gaps in sequence
values.
Back to top of file

Workaround for snapshots on tables with LONG 
columns
You can use the SQL*Plus COPY command instead of snapshots if you need to copy LONG and LONG
RAW variables from one location to another. Eg:
COPY TO SCOTT/TIGER@REMOTE -
CREATE IMAGE_TABLE USING -
SELECT IMAGE_NO, IMAGE -
FROM IMAGES;
Note: If you run Oracle8, convert your LONGs to LOBs, as it can be replicated.
Back to top of file

Where can one get more info about SQL?
• The SQL FAQ
• Introduction to Structured Query Language
• SQL Tutorial
• Tina London's SQL Guide

Vous aimerez peut-être aussi