
The Quality of Your SQL


Gavin Nour
nourg@acslink.net.au

The following article was first published in the November 1995 issue of the Informix User
Group NSW Newsletter.

Some organisations are surprised when I suggest that all SQL should be passed through Quality
Assurance (QA) by analysing the output of the command SET EXPLAIN ON.

This article will discuss why we should QA SQL, explain the role of the Informix optimiser
(including how to read the output from SET EXPLAIN) and offer some tips to improve SQL
performance.

The subject should be pertinent to all development where an Informix database is used,
regardless of the tool. The SQL could be developed via an Informix tool such as 4GL,
I-SQL, ESQL or NewEra, or via a third-party product accessing an Informix database.

Firstly, why QA your SQL? Generally I recommend placing strict controls on ALL SQL
development. This becomes especially important today as we are finding that databases
are growing at an alarming rate. The larger our databases become, the more visible our
mistakes become and the more problems we have. Large databases are less forgiving,
especially in the area of performance. Paying careful attention in the development phase will
of course reward us in the future.

Quality assurance will not only help us avoid performance problems, it will also help us avoid
other serious problems, such as unnecessarily filling the root dbspace with large
temporary tables (possibly causing other INFORMIX applications to fail), filling UNIX
filesystem space with large sort files (possibly causing any UNIX process to fail), or even
filling our logical logs with long transactions and having to restore the database from an
archive.

One of the best ways to avoid such problems is to enforce SQL development standards.
Usually the best approach is to separate the SQL Quality Assurance process from the
Development process. Many large organisations do this and have a team of staff devoted to
general QA; others have just one person. My opinion is that usually the DBA is in the best
position to QA SQL statements, as he/she is most familiar with the database and knows how
the engine goes about its task. The DBA should also be aware of the SQL statements being
executed, as this may highlight the need to alter the database schema (e.g., add an index).
Regardless of who performs the QA, the most important thing is that it gets done and that both
the developer and the person responsible for QAing the SQL understand what to look out for.

Undoubtedly the best tool to use for optimising a query, or for the QA process, is the command
SET EXPLAIN ON.


We should remember first that although we can optimise queries, proper database design
plays a major part in query performance, and often the redesign of tables and an effective
indexing strategy will solve many problems. Perhaps a future article will discuss database
schema optimisation. This article will focus on the SET EXPLAIN ON option and endeavour
to explain (pun intended) how this command can help you optimise queries. First, we turn to:

Understanding the Informix Optimiser

What is the optimiser?

The optimiser is the part of the INFORMIX engine whose job is to anticipate the best route to
take before running a given query. There are often many paths that can be taken to get to the
data, and some paths will take much longer than others; this is especially true when there are
many tables involved in a join. Developers usually write the query, leave the task of finding
the best route to the optimiser and assume that the engine will use the best method available.
Unfortunately, because the optimiser does a good job most of the time, we tend to forget that
the engine does not always make the right decision. Sometimes we know more than the
engine does and can see a better way of doing things. The output from SET EXPLAIN ON
explains how the optimiser has chosen to access the data. With this information available we
may discover the optimiser has chosen a path that will be time consuming, and we can take
steps to restructure the query. Sometimes a small alteration will influence the optimiser to take
an alternative route. At other times we realise that an index can be added to improve
performance, or we may just find that the query is fine the way it is but we need to add more
temporary space to cater for a sort.

How does the optimiser make its decision?

The answer to this is very complex. The main aim of the optimiser is to reduce I/O by limiting
the amount of data to search through in order to obtain the requested data in the most efficient
way. The optimiser makes its decisions based on information in the system catalogues. In
version 5 this information consists of:

• the number of rows in each table used in the query (systables.nrows)
• how many pages are used for data and how many pages are used for indexes
(systables.npused)
• whether column values are unique or not (sysconstraints)
• what indexes exist (sysindexes)
• whether the indexes are ascending or descending (sysindexes); not required in version 7
• whether the data is in the same order as the index, i.e. clustered (sysindexes.clust)
• how many levels there are in an index
• what the second-largest and second-smallest values are for each column, which gives the
optimiser a rough idea of the range of values (syscolumns.colmin and colmax). Version
7 can obtain more detail about the distribution of data (the sysdistrib table).

Using all of this information the optimiser will determine all possible routes and then weight
each method with an estimated cost. The cost will be based on several considerations
including disk access, CPU resources required and network access. In the process the
optimiser will determine in what order to join tables, whether to perform sequential scans,
whether it should create temporary tables, and whether it can use an index for the select list,
for the filter, for the sort or for a GROUP BY. Once the optimiser selects the plan which it
believes is the most cost effective, it passes the query on for processing, and if SET EXPLAIN
ON is in effect the chosen method will be recorded in a file.


To illustrate how important it is for the optimiser to have the right information available, we
can look at a very simple query and a decision that would have to be made by the optimiser to
perform a join.

Let's say we have two tables, one with 500,000 rows (tab2) and the other with 200 rows
(tab1), both with a unique index on the joining column. A simple select to find related rows
from the two tables would be:
SELECT * FROM tab1, tab2
WHERE tab1.col1=tab2.col2

Hopefully the optimiser would choose to select from the smaller table first and then join to the
second using the index. This would result in 1,000 disk reads (assuming one row per page and
that it takes three index reads and one data read per row in table two), i.e. 200 for table one
plus 200 x 4 for table two.

Now, let's say the optimiser did not have accurate information available about the number of
rows and the number of unique values and chose to select from table two first. This would
result in two million reads! (assuming it takes just two index reads and one data read per row
in table one), i.e. 500,000 plus 500,000 x 3.

Of course, if the optimiser had even less accurate information, didn't know about an index
and used a sequential scan of table two for each row returned from table one, we would have
to perform around one hundred million reads! (200 x 500,000).

We can see from this example, using a very simple query, that the wrong information and the
wrong decision can have dramatic effects on performance: 1,000 reads versus 100,000,000.
If there are many tables involved, along with more filter conditions in the WHERE clause, the
decision process becomes much more complex and the importance of accurate statistics is
magnified.

How accurate are your statistics?

It is very important to remember that the information in the system catalogues used by the
optimiser is only updated when the UPDATE STATISTICS command is executed. The
optimiser is only as good as the information it has been provided with, so rule number one is:
execute UPDATE STATISTICS as regularly as possible, and more often for very dynamic
tables than for tables which rarely change. If the system catalogue information is out of date,
the optimiser may make the wrong decision and severe performance problems may result. It
is surprising how many sites fail to do this.

We can execute UPDATE STATISTICS for the whole database, for individual tables, for
columns in a table and for stored procedures.
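
For example (the table, column and procedure names here are purely illustrative):

UPDATE STATISTICS;                             -- the whole database
UPDATE STATISTICS FOR TABLE orders;            -- a single table
UPDATE STATISTICS FOR TABLE orders (cust_id);  -- a single column
UPDATE STATISTICS FOR PROCEDURE get_orders;    -- a stored procedure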

What I usually recommend is that, instead of relying on memory, you automate the execution
of UPDATE STATISTICS every night via a cron job. Be careful when running the
UPDATE STATISTICS command while other applications are running, because the system
tables are momentarily locked when the catalogues are updated; this can result in other
processes receiving errors. Do not update too often, because the command takes time.
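
As a sketch (the schedule, database name and paths are hypothetical; dbaccess simply
executes the SQL script):

-- crontab entry, run nightly at 2 a.m.:
--   0 2 * * * dbaccess stores7 /usr/local/sql/upd_stats.sql > /tmp/upd_stats.out 2>&1
-- where upd_stats.sql contains just:
UPDATE STATISTICS;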

When does the engine perform the optimisation?


The optimisation occurs whenever an SQL statement is prepared, provided there are no host
variables. If there are host variables, the optimiser does not have all the information required
until the variables are passed (when the cursor is opened), and in this case the optimisation
occurs on the OPEN statement. So, with standard SQL (i.e. not prepared), the query is
optimised whenever it is executed. If an SQL statement is used repeatedly within a program it
is best to PREPARE the SQL statement to improve performance. This way the SQL is only
optimised once within a program instead of every time it is used (unless re-optimisation is
specifically requested via the re-optimisation keyword). With stored procedures the SQL is
optimised when the procedure is created or when UPDATE STATISTICS is executed for that
stored procedure.
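
A minimal 4GL sketch of the host-variable case (the names are illustrative only). Because the
engine cannot know the value of the host variable at PREPARE time, the optimisation is
deferred to the OPEN:

DEFINE p_cust LIKE orders.cust_id
PREPARE q1 FROM "SELECT order_date FROM orders WHERE cust_id = ?"
DECLARE c1 CURSOR FOR q1
LET p_cust = 101
OPEN c1 USING p_cust   -- the query is optimised here, once per PREPARE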

Suddenly slow?

Even after running UPDATE STATISTICS regularly in production, we may notice one day
that a query which previously took 10 minutes suddenly takes one hour. This can be the result
of the optimiser choosing a new path based on new information in the system catalogues. It
may mean the query needs restructuring to influence the optimiser to make a better decision.
Because of this, we should have the ability to turn SET EXPLAIN ON for any program in
production, instead of being forced to recompile the program to find out the query plan. I
usually recommend building a simple function into all programs which enables either a
parameter to be passed or an environment variable to be set to turn SET EXPLAIN ON; a
sketch follows.
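
A sketch of such a function in 4GL, assuming a site-chosen environment variable called
SQLEXPLAIN (the variable name is hypothetical):

FUNCTION check_explain()
    IF fgl_getenv("SQLEXPLAIN") = "Y" THEN
        SET EXPLAIN ON
    END IF
END FUNCTION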

Development Versus Production

Remember, not only can the statistics used by the optimiser change, but if you are testing a
query in development on a small database the statistics may also be quite different from those
in production. A query may appear to perform well in development, but once it is in production
the optimiser may choose (rightly or wrongly) to take a totally different route. The only way to
get around this problem in the QA process is to run the query against a test database which is
of a similar size to production with the same data. Alternatively (providing the database
schemas are the same), one might consider manually updating some of the columns in the
system catalogue tables after running UPDATE STATISTICS on the test database, just so that
the optimiser makes the same decisions; however, this is definitely not recommended and
INFORMIX may not support you if corruption results.

Controlling the optimiser

Optimisation level

In version 5.x and above we have the ability to influence how much information the engine
attempts to obtain. We can do this by using the command SET OPTIMIZATION HIGH or
LOW. HIGH (the default) causes the engine to examine all access paths, while LOW
causes the engine to eliminate less likely options at an earlier stage in order to reduce
optimisation time. LOW can be useful if many tables are used in a join (e.g. five or more), as
the optimisation time itself can hinder performance. The downside of OPTIMIZATION LOW
is that an access path may be eliminated too early when in fact it would have been the most
suitable option.
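
For example (note that the engine keyword takes the American spelling; the query is
illustrative):

SET OPTIMIZATION LOW;   -- prune unlikely access paths early
SELECT * FROM orders WHERE cust_id = 101;
SET OPTIMIZATION HIGH;  -- restore the exhaustive default behaviour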

OPTCOMPIND


Beginning with version 7 we have a little more control over the optimiser. Previous versions
assumed that if an index existed it would be the most cost-efficient access path. Now we can
specify that we would like the optimiser to compare the cost of an index read against the cost
of a table scan so that the most cost-efficient path can be chosen. This is done by setting
OPTCOMPIND=2 (the default). We can still emulate the behaviour of previous versions by
setting OPTCOMPIND=0. Setting OPTCOMPIND=1 instructs the optimiser to work as it
would with the value set to 2, unless REPEATABLE READ is in effect, in which case it works
as it would with the value set to 0. The reason for this option is that a sequential scan with
repeatable read will effectively place a shared lock on the whole table (as it scans all rows)
during the read. Usually undesirable!
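
OPTCOMPIND is not an SQL statement in version 7; it is set in the ONCONFIG file or as an
environment variable. A sketch:

OPTCOMPIND 2                      # onconfig entry: weigh index reads against scans (the default)
OPTCOMPIND=0; export OPTCOMPIND   # shell override for one session, emulating pre-7 behaviour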

Obtaining data distribution for the optimiser

Beginning with version 6, INFORMIX introduced data distribution analysis and storage in
order to help the optimiser make more informed decisions. Data distribution analysis gives
the optimiser a better understanding of the values contained in a column, e.g. how many
unique values there are in each area of the table. This is done by sampling the data in a column
and storing information about sections of the table in a series of bins. This information can be
extremely valuable to the optimiser when dealing with large tables. To generate distributions
for a column the UPDATE STATISTICS command is used, and the amount of data sampled is
controlled by the keywords MEDIUM and HIGH, e.g. UPDATE STATISTICS HIGH FOR
TABLE tablename (column). MEDIUM merely samples the data and is very quick, whereas
HIGH evaluates all the rows in the table and is therefore slower but more accurate. LOW
behaves like previous versions of UPDATE STATISTICS and does not obtain any distribution
data at all. The distribution analysis can be influenced further with the RESOLUTION
keyword, by specifying a resolution value (number of bins) and a confidence value (level of
sampling). Refer to the manual for more information on using these parameters.
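
For example (the table and column names are illustrative, and the RESOLUTION values are
just one possible choice):

UPDATE STATISTICS MEDIUM FOR TABLE orders DISTRIBUTIONS ONLY;
UPDATE STATISTICS HIGH FOR TABLE orders (cust_id);
UPDATE STATISTICS MEDIUM FOR TABLE orders (order_date) RESOLUTION 5.0 0.95;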

An UPDATE STATISTICS strategy for ONLINE DSA Version 7

The recommended UPDATE STATISTICS strategy for INFORMIX DSA version 7 is to do
the following (in this order):

1) Run UPDATE STATISTICS MEDIUM on all tables using DISTRIBUTIONS ONLY,
without listing columns and using the default RESOLUTION parameters. Prior to 7.10.UD1 it
is better to actually list the columns that do NOT head an index, for better performance.

2) Run UPDATE STATISTICS HIGH for all columns that head an index or that are definitely
part of a query. Execute a separate command for each column.

3) Run UPDATE STATISTICS LOW for all other columns in composite indexes.

Step one is very fast, as it just samples data, but still gathers useful distribution information;
it is used first to obtain information about all the columns that are not in an index. From
7.10.UD1 it is not worth specifying each column, due to the speed of MEDIUM. Step two gets
as much distribution information as possible for all columns which are important for query
performance (e.g. joining columns). Because the index can be used (in version 7) the statistics
can be gathered relatively quickly; however, note that only one column should be specified at
a time with HIGH so that it can make use of an index. In the final step, the remaining columns
in all the indexes which are not at the beginning of the index can be referenced in the one
statement (per table) using the LOW parameter.
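
The three steps might look like this for a hypothetical orders table, assuming cust_id and
order_date each head an index and item_id and line_no are the trailing columns of a
composite index:

-- Step 1: quick sampled distributions for the whole database
UPDATE STATISTICS MEDIUM DISTRIBUTIONS ONLY;
-- Step 2: exact distributions, one command per leading index column
UPDATE STATISTICS HIGH FOR TABLE orders (cust_id);
UPDATE STATISTICS HIGH FOR TABLE orders (order_date);
-- Step 3: remaining composite-index columns, one statement per table
UPDATE STATISTICS LOW FOR TABLE orders (item_id, line_no);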

Do not re-run UPDATE STATISTICS for large static tables.


Once the UPDATE STATISTICS steps have been completed, you can view the data
distribution profiles by using the -hd option of dbschema, e.g.:

dbschema -d databasename -hd tablename

TIP: This command is a very useful tool to help determine a good fragmentation strategy.

Examining the Optimiser's Choice

Once we are comfortable that we have supplied the optimiser with enough information, we
can see what query plan the INFORMIX optimiser has chosen. To do this, use SET EXPLAIN
ON within the query session or within the NewEra, 4GL or ESQL program. When this is set,
the optimiser will write output for all queries (for the same process) to a file called
"sqexplain.out" in the current directory (usually; the filename and location depend on the
operating system and whether the query is executed on a remote host). Once the query is
submitted it is past the point of no return. The only way to examine the optimiser output
without completing the query is to hit the interrupt key just after the query starts (after the
status line in dbaccess says "explain set").

Typical SET EXPLAIN output:

QUERY:
------
select cust_id, orders.* from orders, customers where order_date
> "01/12/1995" AND order_date < "01/01/1996" AND
customers.cust_id = orders.cust_id order by order_date DESC

Estimated Cost: 10

Estimated # of Rows Returned: 200

Temporary Files Required For: Order By

1) informix.orders: INDEX PATH

    (1) Index keys: order_date
        Lower Index Filter: informix.orders.order_date > "01/12/1995"
        Upper Index Filter: informix.orders.order_date < "01/01/1996"

2) informix.customers: INDEX PATH

    (1) Index keys: cust_id (Key-Only)
        Lower Index Filter: informix.customers.cust_id = informix.orders.cust_id

Understanding the SET EXPLAIN output:

Query: {LOW}

This section of the output shows the actual query which was optimised. LOW is displayed if
SET OPTIMISATION is currently set to LOW. Note that sqexplain.out is appended to if the
file already exists.
Estimated Cost:

This value is simply a number the optimiser has assigned to the chosen access method. The
value is not meaningful except to the optimiser, as it bears no relationship to real time. It
cannot be compared to the estimated cost of other queries and is best ignored. It can, however,
be used to compare changes made to the same query (e.g. an index change).
Estimated # of Rows Returned:

This is the optimisers estimate based on information in the system catalogue tables. Remember
that the catalogue information is fairly limited (especially prior to version 7) so this value will
often be inaccurate (more so if the query involves a join). In ONLINE DSA Version 7
distribution information can be obtained for the data which helps the optimiser estimate the
number of rows more accurately.
Temporary Files Required For: Order By / Group By

When this is displayed there is a GROUP BY or an ORDER BY in the query and the optimiser
has determined that there is no corresponding index available to obtain the data in the required
order. A temporary file will be created to order the result set. This could be very large
(depending on the size of the tables), so check available disk space and be aware of the effect
this sort could have on performance. Indexes cannot be used when the columns to be sorted
come from more than one table. Note that in version 7 the optimiser can choose to traverse an
index in the direction of the ORDER BY regardless of whether the index is in the same order
as the ORDER BY. Prior to version 7, the ability of the optimiser to use the index for an
ORDER BY was dependent on whether the ASCENDING/DESCENDING attributes of the
index and the ORDER BY matched.
1) owner.table: INDEX PATH (Key-Only)

This is the table that the optimiser has chosen to read first (indicated by the 1). Subsequent
table accesses (for a nested-loop join, for example) will be displayed further down in the
explain output and will be indicated by a higher number. For each row returned at this level,
the engine will query the tables at the lower levels. INDEX PATH indicates an index will be
used to access this table.

The (Key-Only) notation (ONLINE only) indicates that only the index will be read and the
actual data row will not be read from this table. Key-only access is generally very efficient
(prior to DSA 7.2) due to the smaller size of the index compared to the row: not only is the
read of the data row eliminated, but more index key values are likely to fit on one page, which
in turn reduces I/O. This type of access is only achieved if no columns are selected from the
table in question. Tip: avoid the use of SELECT * if possible and select only the required
columns; an illustration follows. Note that with ONLINE DSA 7.2, key-only reads are in fact
slower in most cases due to the read-ahead capabilities.
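
As an illustration (assuming customers has an index on cust_id):

SELECT cust_id FROM customers WHERE cust_id > 100;  -- key-only: the index alone can answer this
SELECT * FROM customers WHERE cust_id > 100;        -- must also read every qualifying data row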
(1) Index keys: column_name
    Lower Index Filter: owner.table.column > x
    Upper Index Filter: owner.table.column < y

column_name is the name of the column used in the INDEX PATH read. Lower Index Filter
shows the first key value (x), where the index read will begin. Upper Index Filter shows the
key value (y) where the index read will stop.
1) owner.table: SEQUENTIAL SCAN (Serial, fragments: ALL)

In the above case all rows will be read from this table using a sequential scan.

The section in brackets relates to version 7. If Parallel is displayed instead of Serial, the
engine will perform a parallel scan (this behaviour is influenced by the PDQPRIORITY
setting). The ALL notation indicates all fragments have to be scanned because the optimiser
could not eliminate fragments after examining the WHERE predicates. NONE indicates the
opposite, that is, the optimiser has eliminated all fragments and therefore none need to be
examined. A number (or list of numbers) indicates the engine will only examine the
fragment(s) listed (numbers are relative to the order in the sysfragments table). Pay special
attention if the sequential scan is being performed at a lower level in the query plan (indicated
by a higher number), as this could mean the whole table is being scanned for each row
returned in a previous step. Often this is one of the warning bells when optimising/QAing a
query. Sequential scans are not so bad when they are on small tables, or when they are in the
first step of a query plan, providing the engine does not have to scan a large table in order to
retrieve just a fraction of it.
AUTOINDEX PATH: owner.table.column

Seen more in version 4. In order to avoid sequential access, a temporary index is built on this
column to perform a join. It is used if an index does not exist on the join column and is
generally an indication that you need a permanent index.
SORT SCAN: owner.table.column

Used in combination with a sequential scan when no index is available on the join column.
This column will be sorted for later use with a join (See MERGE JOIN).
MERGE JOIN
Merge Filters: owner.table.column = owner.table.column

A merge join will be used to join the two previous selection sets, which were prepared for a
join. Once the join columns are obtained in the appropriate order (possibly via a SORT SCAN
if an index does not exist), the server sequentially reads both result sets and merges them
before accessing the rows. It is considered faster than a nested-loop join in many cases.
DYNAMIC HASH JOIN (Build Outer)
Dynamic Hash Filters: owner.tab1.col = owner.tab2.col ...

Version 7 only. A hash join will be used to join the two preceding tables in the explain output.
The Build Outer notation tells us which table is used first, and the filter shows how the tables
will be joined. Where some complex queries cannot use an index, the hash join takes over. A
hash join is also used instead of a sort-merge join and is considered more efficient: whereas a
sort-merge join sorts both tables, a hash join typically only sorts one. Hash joins are favoured
with large amounts of data, especially for PDQ with fragmentation. Rows are placed in a hash
table after applying an internal hash algorithm. The cost of a hash join can be lower than using
an index, especially when more than around 15% of the data from a large table needs to be
scanned. When the data is not clustered (in a different order to the index), the cost of
traversing the index in addition to retrieving the actual rows (in a different order) is quite high
compared to a table scan with a hash join. OPTCOMPIND=2 (the default) will cause the
optimiser to consider hash joins instead of an index. Note that OPTCOMPIND ought to be set
to 1 when REPEATABLE READ is being used (and arguably this should be the default).

SQL Query Quality Assurance and Optimisation


(Some Tips, Some Warnings, Some Things To Check and Some Things to Avoid):

1. Avoid sequential scans on a large table if it is not in the first position of the query plan,
as the whole table scan is repeated for every row in the preceding table. This can
severely affect performance. It not only affects the query in action but can also
impact other tasks by changing the recently used pages in shared memory (as it keeps
reading the whole table into shared memory). This increases disk I/O all round, as other
processes are forced to read from disk. Consider adding an index if the query cannot be
restructured and performance is slow.
2. Avoid potentially large temporary sort files. These can consume all available CPU
resources, increase disk I/O and consume all available disk space. Consider adding an
index on the columns being sorted. A hint: if the optimiser is not using an existing index
for the ORDER BY, this may be because the column being ordered is not in the WHERE
clause. In this case the optimiser can be influenced to use the index instead of creating a
temporary file by adding a dummy WHERE condition on the ORDER BY column (e.g.
AND order_num >= 0); see the sketches after this list. Prior to version 7 the index has to
be in the same order as the ORDER BY. When sort files are used, check the
$PSORT_DBTEMP and $DBSPACETEMP settings; these can significantly improve
performance by enabling the engine to use more than one disk for the sort file.
3. Correlated subqueries. Subqueries that refer (in the WHERE clause) to a column
selected in the main query can severely affect performance, because the subquery is
executed repeatedly, once for each main table row. Be very cautious when using
statements like EXISTS with large tables, as the logical logs can fill very quickly in the
process: temporary table space is allocated and logged for each row in the main table
(even if no rows are returned in the final select). The worst effect of this is filling the
logs and having to restore from an archive. Correlated subqueries should be rewritten to
use a join wherever possible (see the sketches after this list). Some more complex
rewrites may involve joining to a temporary table.
4. OR conditions on different columns can prevent the optimiser from being able to use an
existing index. If an index does exist and the optimiser chooses a sequential scan in the
query plan, consider using a UNION statement (one SELECT for each OR condition), as
this gives the optimiser the opportunity to use the index; see the sketches after this list.
5. If the query is slow even when an INDEX PATH has been chosen on all levels of the
query plan, do not assume the optimiser has made the right decision. Check the query
plan to see if the tables are being filtered in the right order. The aim is usually to
eliminate as many rows as possible in the early stages of the query, but unfortunately the
optimiser does not always have enough information to do this correctly (especially in
versions prior to 6). Using UPDATE STATISTICS HIGH in version 6 and above will
give the optimiser more information about the data distribution so that the right tables
are eliminated first. Also note that INDEX reads are not necessarily the best. For
example, a sequential scan of a table can be faster than an index scan if the data pages
have to be retrieved and the pages are not in the same physical order as the index.
6. Converting data types and comparing character columns is very expensive, e.g.
tab1.character_col = tab2.integer_col. If possible, consider changing the column types to
be numeric. Remember that if the join columns are character types the columns have to
be compared byte by byte for each row. Although version 7 handles conversions better,
the cost of the conversion overhead is still not considered by the optimiser.
7. Look out for WHERE predicates which may not be able to use indexes. These
include: OR, LIKE or MATCHES with a wildcard at the beginning (e.g. MATCHES
"*NOUR"), functions (e.g. MONTH, DAY, LENGTH), negative expressions (e.g. !=
"NOUR") and non-initial substring searches (e.g. postcode[4,5] > "10").
8. Except in very old versions of the engine, the order of tables in the FROM clause or the
order of elements in the WHERE clause will not have an effect; however, in some cases
where the optimiser believes the cost is the same for two different paths, the order of the
conditions in the WHERE clause may have an effect. Some tricks have been suggested
in the past (such as repeating predicates in the WHERE clause) to try to force the
optimiser to use a particular index; however, this is no longer recommended and will not
work with future versions. For example, with version 7 the optimiser's query rewrite
feature eliminates duplicates in the WHERE clause.

9. Avoid long transactions in logged databases. We probably all know by now that long
transactions threaten to destroy our databases by filling the logical logs. Warning bells
should ring with the following statements: LOAD statements, INSERT INTO xx
SELECT yy FROM zz, and UPDATE or DELETE statements spanning many rows. As
an aside, consider locking the table in exclusive mode to avoid excessive lock usage
(hindering performance or, even worse, running out of locks).
10. Check the placement of WHERE conditions on joining columns to see if all
combinations of the WHERE predicate have been included. For example, the WHERE
predicate tab1.x = tab2.x AND tab1.x > 1000 would probably cause the optimiser to use
the index on tab1.x; however, the index on tab2.x might be much more appropriate.
Adding the condition AND tab2.x > 1000 would give the optimiser more choices to
evaluate. Another example is tab1.x = tab2.x AND tab2.x = tab3.x: the version 5
optimiser would not consider joining tab1 directly to tab3, so adding tab1.x = tab3.x
would again provide more choices. Note, however, that the optimiser in ONLINE DSA
V7 rewrites queries so that all possible combinations are examined, and these
suggestions are not applicable there.
11. Select only the columns required; this reduces the communication between the front end
and the back end and reduces I/O. Avoid the temptation to SELECT * (all columns)
when they are not required.
12. Use temporary tables when a subset of data is being reselected with different WHERE
predicates. For example, a SELECT with the WHERE clause orderdate > "01/12/1995"
AND x = 1 might be followed by a SELECT with the WHERE clause orderdate >
"01/12/1995" AND x = 2. In this case, if the table is very large, performance could be
improved by first selecting all rows WHERE orderdate > "01/12/1995" into a temporary
table and then performing the subsequent selects on the temporary table (see the
sketches after this list).
13. Use temporary tables to influence the optimiser to take the route that you know is best.
This can be done by first selecting the rows you want from a large table into a
temporary table and then joining the temp table to the rest of the tables. The optimiser
may use a different query plan knowing that the temporary table is much smaller than
the original large table.
14. Consider creating indexes on temporary tables. Often overlooked just because the table
is temporary.
15. When using temporary tables for subsequent selects with ORDER BYs, create the
temporary table using an ORDER BY, e.g. SELECT x FROM y ORDER BY x INTO
TEMP temp_tab. This syntax is not available in earlier versions of the engine (pre 4.1).
16. Consider running UPDATE STATISTICS (within the application) for large temporary
tables. Again this is often overlooked just because the table is temporary.
17. Use the WITH NO LOG clause when explicitly creating temporary tables; this helps
performance by eliminating the overhead of writing to the logical logs. It also avoids the
possibility of creating a long transaction with a large temporary table. In version 7,
temporary tables can be created in a special dbspace which is not logged. Use this
feature whenever possible.
18. Time commands with the UNIX time or timex command, or use SELECT CURRENT
FROM systables WHERE tabid=1 before and after the query being tested/optimised.
19. When testing/timing queries, remember that the second time the query is executed it is
much more likely to be faster, because the pages read the first time round will probably
still be in memory. Be aware that this may distort the test results. Use tbstat -p to
monitor disk/buffer usage. When timing queries, consider restarting the instance between
tests in order to re-initialise shared memory.
20. Fragmentation and parallel queries. It is important to fragment the data in a way that
makes it easy for the optimiser to determine which fragments are active. Making the
WHERE predicates clear enough to enable the optimiser to eliminate fragments will
reduce I/O and enable parallelism. PDQ and fragmentation is a whole new subject in its
own right (perhaps another article). Just remember, if you are using ONLINE DSA
version 7 the trick is to divide and conquer!
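
The sketches referred to in tips 2, 3, 4 and 12 follow. All table and column names (order_num,
ship_state, x and so on) are illustrative only; each sketch is a minimal example of the
technique, not a drop-in solution.

Tip 2, a dummy filter on the ORDER BY column (assumes an ascending index on order_num
and that order numbers are never negative):

SELECT * FROM orders
WHERE order_num >= 0
ORDER BY order_num;

Tip 3, rewriting a correlated EXISTS subquery as a join:

-- correlated form: the subquery runs once per customer row
SELECT cust_id FROM customers
WHERE EXISTS (SELECT * FROM orders
              WHERE orders.cust_id = customers.cust_id);

-- join form: one pass, and the optimiser chooses the join order
SELECT DISTINCT customers.cust_id
FROM customers, orders
WHERE customers.cust_id = orders.cust_id;

Tip 4, splitting an OR across different columns into a UNION (assumes indexes exist on both
cust_id and ship_state):

-- may force a sequential scan:
SELECT * FROM orders WHERE cust_id = 101 OR ship_state = "NSW";

-- each branch can use its own index (UNION removes duplicates, just as the OR form does):
SELECT * FROM orders WHERE cust_id = 101
UNION
SELECT * FROM orders WHERE ship_state = "NSW";

Tip 12, selecting the common subset into an unlogged temporary table and reselecting from it:

SELECT * FROM orders WHERE orderdate > "01/12/1995"
INTO TEMP recent_orders WITH NO LOG;

SELECT * FROM recent_orders WHERE x = 1;
SELECT * FROM recent_orders WHERE x = 2;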

To Summarise

The Informix cost-based optimiser does an excellent job of doing what it is supposed to do,
that is, shielding us from the task of having to think about the best way to retrieve our data.
For this reason we often take it for granted. Just imagine if we had to make the same complex
decisions the optimiser does every time we wanted to access our data. Nevertheless, it is
important that we understand the optimiser and the consequences of the decisions it makes.
More importantly, we need to provide the optimiser with the information it needs to perform
its job effectively.

Informix are constantly refining the optimiser and finding new ways to improve performance,
so some of the comments made in this article may not be valid in future releases. However,
this article was checked for accuracy (in Nov 95) by the brains behind the optimiser (at
INFORMIX, Menlo Park, California) and I wish to publicly thank that person for his valuable
time.

I hope this article has been of value to you. Any criticisms, comments or suggestions would be
welcome.

Gavin Nour is the Director of Select Software Solutions Pty Ltd. He has ten years' Informix
experience covering a variety of industries and applications. He was a DP Manager for a
leading financial organisation in the United Kingdom and a Technical Support Manager for a
leading Informix distributor in Australia, and spent two years at a leading Informix software
house in Australia before forming his own company specialising in Informix services. Gavin is
committed to the Informix solution and to helping the Informix community. He is the Secretary
of the NSW Informix User Group and is on the Board of Directors of the International
Informix User Group.
