
EXPLAIN Explained: Recent Changes That Let Us Use EXPLAIN Better
By Jim Dee

Introduction and Basics


This is the first half of a two-part article aimed at DBAs and application developers who use DB2 for
z/OS. Both articles will explore a use of EXPLAIN which may be a little different than what you're used
to. The idea is to store access path data over time, and use this data to detect trends and to proactively
identify issues for more detailed investigation. This article will review the basics of EXPLAIN and
describe how to store trending data in your explain tables, with some examples of SQL to extract value
from this. Then we will explore recent enhancements to DB2 for z/OS which allow us to more accurately
simulate our production environment in test. The second half of the article will discuss the use of
profiles and get into more enhancements which have given us the ability to improve optimization by
passing more accurate data to the optimizer: virtual indexes, hints and APREUSE, and selectivity profiles.
A basic premise of this article is that the optimizer is accurate most of the time, and that the explain
tables provide information which can be used to quickly narrow down the list of potential SQL problems
associated with application changes, DB2 version upgrades, application of maintenance, and other
environmental changes. I'm assuming you want to strive for continual improvements in SQL
performance but minimize the risks of doing so. I also assume that you're familiar with the explain tables
(PLAN_TABLE and its associated tables) and that you understand the basics of the EXPLAIN command or
BIND with EXPLAIN(YES).
Access path stability
Access path stability (sometimes called plan management) is an important feature of DB2 that was
added in DB2 9 and has been enhanced in DB2 10 and 11. For static SQL, it provides the ability to undo
a BIND which has experienced access path regression (or even avoid the BIND altogether!), but as we
will see, it provides a lot more than that. Note that many of the new BIND options can be applied only
to a package which was bound in DB2 9 or later.
Space does not permit me to go into detail about access path stability, but the rest of this article
assumes you are using at least PLANMGMT BASIC, and much of this material is about exploiting the
access path data stored in the explain tables, as well as the package information now stored in the DB2
catalog.
Get Current with EXPLAIN
You have to cover the basics first. Make sure that you use the most current set of EXPLAIN tables (21 in
DB2 11, at last count) for the version of DB2 you are running. You can find the DDL to create this set
of tables in SDSNSAMP(DSNTESC). If you're planning to migrate to a new DB2 version, IBM allows you to
work with these tables in the previous release, so if you're running on DB2 10, you may want to get
ahead of the game and install the DB2 11 format of the EXPLAIN tables now. You can run the DSNTIJPM
job to identify PLAN_TABLEs in a subsystem which are not at the required level. IBM provides a
PTF to create DSNTIJPM (with a different name) in the previous version of DB2; the PTF to provide
DSNTIJPB in DB2 10 is UK98219.
Note that the tablespaces created by the DSNTESC member are all defined as UNICODE. This is required
in DB2 10 NFM and in DB2 11. If you have existing EBCDIC tables, you will have to DROP the tablespaces
and CREATE new ones.
Another change in DB2 10 is the replacement of the TIMESTAMP column with EXPLAIN_TIME. If you
have SQL that reports on when EXPLAIN occurred, the new column is much more meaningful. It reports
the time of cache entry for cached statements, or the time of BIND or EXPLAIN for those not in the
cache.

Detect Trends and Changes


In addition to regular performance monitoring of your SQL applications, you can monitor your access
paths by saving historical EXPLAIN data. Let's discuss doing this with static SQL first; we'll move on to
dynamic SQL later. Assuming you BIND with EXPLAIN(YES), you might execute something like the
following after every BIND or REBIND to see cost trends at a glance.
-- Caveat: PLAN_TABLE has one row per plan step, so a multi-step statement
-- matches its DSN_STATEMNT_TABLE row more than once. The count and the
-- cost total are exact only when each statement has a single plan step.
SELECT SUBSTR(PL.COLLID,1,10)      AS COLLID,
       SUBSTR(PL.PROGNAME,1,10)    AS PROGNAME,
       DATE(PL.EXPLAIN_TIME)       AS DATE,
       TIME(PL.EXPLAIN_TIME)       AS TIME,
       COUNT(PL.QUERYNO)           AS "STMT COUNT",
       DEC(SUM(ST.TOTAL_COST),8,2) AS "TOTAL COST"
  FROM SJD.PLAN_TABLE PL,
       SJD.DSN_STATEMNT_TABLE ST
 WHERE PL.PROGNAME = ST.PROGNAME
   AND PL.COLLID = ST.COLLID
   AND PL.EXPLAIN_TIME = ST.EXPLAIN_TIME
   AND PL.QUERYNO = ST.QUERYNO
 GROUP BY PL.COLLID, PL.PROGNAME, PL.EXPLAIN_TIME
 ORDER BY PL.PROGNAME;
The output might look as follows for a very simple case.
---------+---------+---------+---------+---------+---------+---------+
COLLID      PROGNAME    DATE        TIME      STMT COUNT  TOTAL COST
---------+---------+---------+---------+---------+---------+---------+
MYCOLL      MYPACK      05/08/2014  11.19.38           5       10.55
MYCOLL      MYPACK      05/08/2014  17.36.17           8       19.03
DSNE610I NUMBER OF ROWS DISPLAYED IS 2

A few words are in order here about the cost column. First, please remember that the absolute value
is meaningless; only the relative values matter. Theoretically, a lower cost is good and a higher cost is
bad, but it's worth analyzing any change in detail. Remember to consider and allow for changes in the
sizes of the DB2 objects being accessed; if your business is growing, it is reasonable to expect that SQL
costs will increase with time. Also, maybe the cost of a SQL statement increased because you changed
your RUNSTATS options and the optimizer now has more detailed and accurate statistics to work with;
in this case, the cost might increase and be more accurate whether the access path has changed or not.
I'm assuming here that the packages are not versioned; if they are, you need to add predicates for
VERSION to the above SQL. The other information that is missing is the number of executions for each
SQL statement, which is what you need to weight the cost of each statement. To accurately assess the
relative impact, you could create a copy of PLAN_TABLE with an additional column like EXECUTION_COUNT
(maybe a FLOAT column) and populate it with your estimates or data from traces or a monitor.
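As a rough sketch of that idea, the query below weights each statement's estimated cost by its execution count. The table name SJD.PLAN_TABLE_HIST is hypothetical; it stands for the copy of PLAN_TABLE with the added EXECUTION_COUNT column described above. The query also assumes that you have stored the same EXECUTION_COUNT value on every row belonging to a statement.

```sql
-- Sketch only. SJD.PLAN_TABLE_HIST is a hypothetical copy of PLAN_TABLE
-- with an added EXECUTION_COUNT FLOAT column that you populate yourself.
-- The nested table expression reduces it to one row per statement,
-- however many plan steps the statement has.
SELECT ST.QUERYNO                                   AS QUERYNO,
       DEC(ST.TOTAL_COST,8,2)                       AS "UNIT COST",
       EX.EXECUTION_COUNT                           AS "EXEC COUNT",
       DEC(ST.TOTAL_COST * EX.EXECUTION_COUNT,15,2) AS "WEIGHTED COST"
  FROM SJD.DSN_STATEMNT_TABLE ST,
       (SELECT DISTINCT COLLID, PROGNAME, EXPLAIN_TIME,
               QUERYNO, EXECUTION_COUNT
          FROM SJD.PLAN_TABLE_HIST) EX
 WHERE ST.COLLID       = EX.COLLID
   AND ST.PROGNAME     = EX.PROGNAME
   AND ST.EXPLAIN_TIME = EX.EXPLAIN_TIME
   AND ST.QUERYNO      = EX.QUERYNO
   AND ST.COLLID       = 'MYCOLL'
   AND ST.PROGNAME     = 'MYPACK'
 ORDER BY "WEIGHTED COST" DESC;
```

Sorting by the weighted value puts the statements that matter most to overall workload at the top, even when their individual cost is small.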
The next logical step is to identify the statements for which cost changed and any new or deleted
statements. In my simple example, we could just list the rows in the PLAN_TABLE, but packages in the
real world typically have many more than eight statements! If you have coded a unique value for
QUERYNO in each SQL statement, or if your program code has not changed, then the SQL is relatively
simple.
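-- The filters are applied inside nested table expressions so that the
-- FULL JOIN keeps statements that exist in only one of the two EXPLAINs;
-- those show a cost of 0 on the side where they are missing.
SELECT COALESCE(ST1.QUERYNO, ST2.QUERYNO)  AS QUERYNO,
       COALESCE(DEC(ST1.TOTAL_COST,8,2),0) AS "OLD COST",
       COALESCE(DEC(ST2.TOTAL_COST,8,2),0) AS "NEW COST"
  FROM (SELECT QUERYNO, TOTAL_COST
          FROM SJD.DSN_STATEMNT_TABLE
         WHERE COLLID = 'MYCOLL'
           AND PROGNAME = 'MYPACK'
           AND DATE(EXPLAIN_TIME) = '2014-05-08'
           AND TIME(EXPLAIN_TIME) = '17.36.17') ST1
       FULL JOIN
       (SELECT QUERYNO, TOTAL_COST
          FROM SJD.DSN_STATEMNT_TABLE
         WHERE COLLID = 'MYCOLL'
           AND PROGNAME = 'MYPACK'
           AND DATE(EXPLAIN_TIME) = '2014-05-15'
           AND TIME(EXPLAIN_TIME) = '13.05.14') ST2
       ON ST1.QUERYNO = ST2.QUERYNO
 WHERE ST1.QUERYNO IS NULL
    OR ST2.QUERYNO IS NULL
    OR ST1.TOTAL_COST <> ST2.TOTAL_COST
 ORDER BY QUERYNO;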
The results might look as follows:
---------+---------+---------+---------+----
    QUERYNO    OLD COST    NEW COST
---------+---------+---------+---------+----
        353         .15         .11
        360        8.07       16.24
DSNE610I NUMBER OF ROWS DISPLAYED IS 2
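
Once a statement such as QUERYNO 360 stands out, the natural follow-up is to put its access path rows from the two EXPLAINs side by side. The following is a minimal sketch of such a query, reusing the collection, package, and timestamps from the example above; the selection of PLAN_TABLE columns is only a suggestion, and the EXPL_DATE and EXPL_TIME labels are arbitrary.

```sql
-- Sketch only: list the plan steps for one statement from both EXPLAINs.
-- Rows are ordered so that the old access path appears before the new one.
SELECT DATE(EXPLAIN_TIME)       AS EXPL_DATE,
       TIME(EXPLAIN_TIME)       AS EXPL_TIME,
       QBLOCKNO,
       PLANNO,
       METHOD,
       SUBSTR(TNAME,1,18)       AS TNAME,
       ACCESSTYPE,
       MATCHCOLS,
       SUBSTR(ACCESSNAME,1,18)  AS ACCESSNAME,
       INDEXONLY,
       PREFETCH
  FROM SJD.PLAN_TABLE
 WHERE COLLID = 'MYCOLL'
   AND PROGNAME = 'MYPACK'
   AND QUERYNO = 360
   AND ((DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '17.36.17')
     OR (DATE(EXPLAIN_TIME) = '2014-05-15'
         AND TIME(EXPLAIN_TIME) = '13.05.14'))
 ORDER BY EXPLAIN_TIME, QBLOCKNO, PLANNO, MIXOPSEQ;
```

A change in ACCESSTYPE, ACCESSNAME, or MATCHCOLS between the two sets of rows is usually the first thing to look for.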

If, like most of us, you have not coded QUERYNO in each of your SQL statements and the QUERYNO
values have changed, this SQL gets a little more difficult. After identifying the appropriate QUERYNO
values, you can then retrieve the PLAN_TABLE rows from each EXPLAIN and compare the access paths to
find out why the calculated costs changed. The SQL to match up the QUERYNO values is as follows.
WITH FIR(MIN_QNO, MAX_QNO) AS
     (SELECT MIN(QUERYNO), MAX(QUERYNO)
        FROM SJD.PLAN_TABLE
       WHERE COLLID = 'MYCOLL'
         AND PROGNAME = 'MYPACK'
         AND DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '11.19.38'),
     SEC(MIN_QNO, MAX_QNO) AS
     (SELECT MIN(QUERYNO), MAX(QUERYNO)
        FROM SJD.PLAN_TABLE
       WHERE COLLID = 'MYCOLL'
         AND PROGNAME = 'MYPACK'
         AND DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '17.36.17')
SELECT ST1.QUERYNO                         AS "OLD QUERYNO",
       ST2.QUERYNO                         AS "NEW QUERYNO",
       COALESCE(DEC(ST1.TOTAL_COST,8,2),0) AS "OLD COST",
       COALESCE(DEC(ST2.TOTAL_COST,8,2),0) AS "NEW COST",
       SUBSTR(PACK.STATEMENT,1,60)         AS STATEMENT
  FROM SJD.DSN_STATEMNT_TABLE ST1,
       SJD.DSN_STATEMNT_TABLE ST2,
       SYSIBM.SYSPACKSTMT PACK,
       FIR, SEC
 WHERE ST1.COLLID = 'MYCOLL' AND ST2.COLLID = 'MYCOLL'
   AND PACK.COLLID = 'MYCOLL'
   AND ST1.PROGNAME = 'MYPACK' AND ST2.PROGNAME = 'MYPACK'
   AND PACK.NAME = 'MYPACK'
   AND DATE(ST1.EXPLAIN_TIME) = '2014-05-08'
   AND TIME(ST1.EXPLAIN_TIME) = '11.19.38'
   AND DATE(ST2.EXPLAIN_TIME) = '2014-05-08'
   AND TIME(ST2.EXPLAIN_TIME) = '17.36.17'
   AND ((ST1.QUERYNO - ST2.QUERYNO) =
        (FIR.MIN_QNO - SEC.MIN_QNO) OR
        (ST1.QUERYNO - ST2.QUERYNO) =
        (FIR.MAX_QNO - SEC.MAX_QNO))
   AND ST2.QUERYNO = PACK.QUERYNO
   AND ST1.TOTAL_COST <> ST2.TOTAL_COST
 ORDER BY "OLD QUERYNO";

This SQL relies on the fact that code is usually added in chunks, and therefore the QUERYNO will change
by a constant amount. The SQL above will work if one or two groups of lines of code are added to your
source. Please note that the extraction from SYSPACKSTMT is added for reference and to aid analysis;
you may want to remove it if you have a large number of packages on your DB2 subsystem. Also, please
note that this extraction is valid only if one of the packages you're analyzing is the current one. The SQL
below will identify new statements, and a very similar statement can be used to extract the statements
that were removed.
WITH FIR(MIN_QNO, MAX_QNO) AS
     (SELECT MIN(QUERYNO), MAX(QUERYNO)
        FROM SJD.PLAN_TABLE
       WHERE COLLID = 'MYCOLL'
         AND PROGNAME = 'MYPACK'
         AND DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '11.19.38'),
     SEC(MIN_QNO, MAX_QNO) AS
     (SELECT MIN(QUERYNO), MAX(QUERYNO)
        FROM SJD.PLAN_TABLE
       WHERE COLLID = 'MYCOLL'
         AND PROGNAME = 'MYPACK'
         AND DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '17.36.17')
SELECT ST.QUERYNO                         AS "NEW QUERYNO",
       COALESCE(DEC(ST.TOTAL_COST,8,2),0) AS COST,
       SUBSTR(PACK.STATEMENT,1,60)        AS STATEMENT
  FROM SJD.DSN_STATEMNT_TABLE ST,
       SYSIBM.SYSPACKSTMT PACK
 WHERE ST.COLLID = 'MYCOLL'
   AND PACK.COLLID = 'MYCOLL'
   AND ST.PROGNAME = 'MYPACK'
   AND PACK.NAME = 'MYPACK'
   AND DATE(ST.EXPLAIN_TIME) = '2014-05-08'
   AND TIME(ST.EXPLAIN_TIME) = '17.36.17'
   AND ST.QUERYNO = PACK.QUERYNO
   AND NOT EXISTS
       (SELECT QUERYNO
          FROM SJD.DSN_STATEMNT_TABLE, FIR, SEC
         WHERE COLLID = 'MYCOLL'
           AND PROGNAME = 'MYPACK'
           AND DATE(EXPLAIN_TIME) = '2014-05-08'
           AND TIME(EXPLAIN_TIME) = '11.19.38'
           AND ((QUERYNO - ST.QUERYNO) =
                (FIR.MIN_QNO - SEC.MIN_QNO) OR
                (QUERYNO - ST.QUERYNO) =
                (FIR.MAX_QNO - SEC.MAX_QNO)) )
 ORDER BY "NEW QUERYNO";
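For the removed statements, one possible variation simply swaps the roles of the two EXPLAINs: the outer query reads the older rows and the NOT EXISTS subquery looks for a counterpart among the newer ones. This is a sketch under the same assumptions as the query above. The correlation name NW is arbitrary, and the join to SYSPACKSTMT is left out because the catalog holds the statement text only for the current package, which no longer contains a removed statement.

```sql
-- Sketch only: statements present in the older EXPLAIN (11.19.38)
-- that have no counterpart in the newer one (17.36.17).
WITH FIR(MIN_QNO, MAX_QNO) AS
     (SELECT MIN(QUERYNO), MAX(QUERYNO)
        FROM SJD.PLAN_TABLE
       WHERE COLLID = 'MYCOLL'
         AND PROGNAME = 'MYPACK'
         AND DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '11.19.38'),
     SEC(MIN_QNO, MAX_QNO) AS
     (SELECT MIN(QUERYNO), MAX(QUERYNO)
        FROM SJD.PLAN_TABLE
       WHERE COLLID = 'MYCOLL'
         AND PROGNAME = 'MYPACK'
         AND DATE(EXPLAIN_TIME) = '2014-05-08'
         AND TIME(EXPLAIN_TIME) = '17.36.17')
SELECT ST.QUERYNO                         AS "OLD QUERYNO",
       COALESCE(DEC(ST.TOTAL_COST,8,2),0) AS COST
  FROM SJD.DSN_STATEMNT_TABLE ST
 WHERE ST.COLLID = 'MYCOLL'
   AND ST.PROGNAME = 'MYPACK'
   AND DATE(ST.EXPLAIN_TIME) = '2014-05-08'
   AND TIME(ST.EXPLAIN_TIME) = '11.19.38'
   AND NOT EXISTS
       (SELECT 1
          FROM SJD.DSN_STATEMNT_TABLE NW, FIR, SEC
         WHERE NW.COLLID = 'MYCOLL'
           AND NW.PROGNAME = 'MYPACK'
           AND DATE(NW.EXPLAIN_TIME) = '2014-05-08'
           AND TIME(NW.EXPLAIN_TIME) = '17.36.17'
           -- old minus new, the same offset test as in the query above
           AND ((ST.QUERYNO - NW.QUERYNO) =
                (FIR.MIN_QNO - SEC.MIN_QNO) OR
                (ST.QUERYNO - NW.QUERYNO) =
                (FIR.MAX_QNO - SEC.MAX_QNO)) )
 ORDER BY "OLD QUERYNO";
```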
Since IBM added access path stability with DB2 9, we have had the ability to undo package changes after
a REBIND which led to unexpected and unwelcome access path regressions. The SYSPACKCOPY table
was introduced in DB2 10, so now it is much easier to find information about previous copies of a
package. It is also possible to get details about what changed. If you bound and
rebound with EXPLAIN, the following SQL will identify statements in your package for which the
estimated cost changed between the previous copy and the current one. We know that QUERYNO values
did not change because the SQL has not changed.
SELECT ST1.QUERYNO                         AS QUERYNO,
       COALESCE(DEC(ST1.TOTAL_COST,8,2),0) AS "PREVIOUS COST",
       COALESCE(DEC(ST2.TOTAL_COST,8,2),0) AS "CURRENT COST"
  FROM SJD.DSN_STATEMNT_TABLE ST1 FULL JOIN
       SJD.DSN_STATEMNT_TABLE ST2
       ON ST1.QUERYNO = ST2.QUERYNO
 WHERE ST1.COLLID = 'MYCOLL' AND ST2.COLLID = 'MYCOLL'
   AND ST1.PROGNAME = 'MYPACK' AND ST2.PROGNAME = 'MYPACK'
   AND ST1.EXPLAIN_TIME =
       (SELECT BINDTIME FROM SYSIBM.SYSPACKCOPY
         WHERE COLLID = 'MYCOLL'
           AND NAME = 'MYPACK'
           AND COPYID = 1)
   AND ST2.EXPLAIN_TIME =
       (SELECT BINDTIME FROM SYSIBM.SYSPACKAGE
         WHERE COLLID = 'MYCOLL'
           AND NAME = 'MYPACK')
   AND ST1.TOTAL_COST <> ST2.TOTAL_COST
 ORDER BY QUERYNO;

In DB2 10, even if you did not remember to EXPLAIN when you bound any of the copies, you can now
execute SQL like the following to populate the PLAN_TABLE. The collection, package, and copy are
specified as string constants (or host variables).
EXPLAIN PACKAGE COLLECTION 'MYCOLL' PACKAGE 'MYPACK' COPY 'PREVIOUS';

To exploit this, you need only have bound the package in DB2 9 or later. Please note that only the
PLAN_TABLE and none of the other explain tables is populated, so it is still a good idea to remember to
BIND and REBIND with EXPLAIN. Otherwise, you will not be able to run the cost comparisons above.
Also, note that you can populate the PLAN_TABLE for all copies at once if you leave off the COPY option
in the EXPLAIN PACKAGE statement.
You can get the most out of your explain history if you use it proactively, to avoid binding (and
promoting your code to go with it) if doing so will cause access path regression. To accomplish this, you
can specify a BIND with EXPLAIN(ONLY), which will populate all the explain tables without creating a
new copy of the package. Then you can run the SQL we've discussed to verify that the optimizer
anticipates the same or lower costs before actually doing the BIND or REBIND. Another option we now
have in DB2 10 and later is to specify a REBIND with APCOMPARE(ERROR) to stop the REBIND if any of
the access paths have changed. The relative advantage of EXPLAIN(ONLY) is that you may find access
path changes that you want to happen, because of lower costs. Last, remember that you can also
specify REBIND with APCOMPARE(WARN) if you like living dangerously; in this case, the new package
will be created but any changes in access path will cause messages to be issued.
Copy prod stats, copy processor info
There is yet another improvement we can make to this set of techniques. Everything discussed so far
has assumed that the explain information is valid; in other words, we've talked about doing our analysis
on the production system. Why not do this analysis on our test DB2, where the changed code is
developed? We've been able to copy RUNSTATS statistics used by the optimizer from production to test
for a long time, but the ability to simulate CPU speeds, buffer pool settings, etc. is relatively new. It was
introduced with a PTF in DB2 9 and is of course available in DB2 10 and DB2 11. You still must allow for
different DSNZPARM settings, different versions of DB2, or different levels of z/OS.
The procedure and SQL to copy production statistics to test are documented in Chapter 50 of the DB2 for
z/OS 11 Managing Performance manual referenced in the bibliography. The procedure to support
modeling the environment is in Chapter 49 of the same book.

Conclusion
I hope this article has given you some ideas to consider. Please send me an email with your experiences,
questions, or comments. I can be reached at jim_dee@bmc.com. In the second half of the article, we
will look into some of the features IBM has added recently, to give you more control over the
optimization process.

Bibliography
Willie Favero's blog at http://it.toolbox.com/blogs/db2zos/this-apar-is-just-way-too-cool-to-just-be-anapar-61153?rss=1 is about the SYSPROC.ADMIN_EXPLAIN_MAINT stored procedure. This is a tool to
automate many aspects of EXPLAIN table maintenance.
DB2 11 for z/OS Performance Topics (Draft), SG24-8222-00, Copyright International Business Machines
Corporation 2014.
DB2 for z/OS 11 Command Reference, SC19-4054-01, Copyright IBM Corporation 1983, 2013.
DB2 for z/OS 11 Managing Performance, SC19-4060-02, Copyright IBM Corporation 1982, 2014.
DB2 for z/OS 11 SQL Reference, SC19-4066-02, Copyright IBM Corporation 1982, 2014.

Biography
Jim Dee is the Chief Architect for DB2 for z/OS at BMC Software. He has worked at BMC since 1990,
mostly in DB2 Backup and Recovery, as a developer, product author, and architect, and has held his
current position as Chief Architect since 2007. Jim can be reached at jim_dee@bmc.com.
