It is important to understand how to identify candidate dimension columns, so that you do not
waste space and so that you obtain the performance improvements you desire.
Critical to the performance of an MDC table is the selection of the columns that will be used as its
dimensions. Because each unique combination of values in the dimension columns (each cell of a
single- or multi-column dimension key) is stored in its own extent, an MDC table can occupy many
times the space of a non-MDC table. Beyond the increased space usage, query performance can
degrade because of the additional I/O required to retrieve the extents (blocks). Use the following
three steps to evaluate the number of cells and the space occupied per cell:
If the MDC table under analysis already exists as a regular table, the following query can be used
to identify the Cells per Block for the candidate dimensions under analysis:
WITH cell_table AS (
    SELECT DISTINCT dimcol1, dimcol2, ..., dimcolN
    FROM table )
SELECT COUNT(*) AS cell_count
FROM cell_table
If a non-MDC version of the table does not exist, estimate the number of cells as the number of
unique combinations of the candidate dimension columns (for example, by multiplying their
individual cardinalities). Note that this estimate will be inaccurate if the dimension columns are
correlated.
The next step is to determine the space occupied per cell as follows:
First determine the number of rows in the table by issuing the following SQL statement:
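The row-count statement itself does not appear above; a minimal version, assuming the same placeholder table name as the earlier cell_count query, would be:

```sql
SELECT COUNT(*) AS row_count
FROM table
```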
Or make an estimate of the number of rows if the table doesn't exist. Then determine the
extentsize in bytes using the following SQL statement:
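The extent-size statement is also not reproduced above. One way to compute it (a sketch against the DB2 LUW SYSCAT catalog; substitute your own tablespace name for 'MYTBSPACE') is:

```sql
-- extent size in bytes = pages per extent * bytes per page
SELECT EXTENTSIZE * PAGESIZE AS extentsize_bytes
FROM SYSCAT.TABLESPACES
WHERE TBSPACE = 'MYTBSPACE'
```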
Next, determine the minimum, average and maximum rows per cell using the following query:
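The rows-per-cell query is missing above; a sketch in the style of the earlier cell_count query, with dimcol1 ... dimcolN again standing in for your candidate dimension columns:

```sql
WITH cell_table AS (
    SELECT dimcol1, dimcol2, ..., dimcolN, COUNT(*) AS rpc
    FROM table
    GROUP BY dimcol1, dimcol2, ..., dimcolN )
SELECT MIN(rpc) AS min_rpc, AVG(rpc) AS avg_rpc, MAX(rpc) AS max_rpc
FROM cell_table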
Using the RpC results from the previous formula, compute the space used by rows in a cell
(Space per cell) using the following formula:
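The formula itself is not shown above; based on the surrounding steps, it multiplies the rows per cell by the average row size (both names are placeholders for the values you computed):

```
Space per cell = RpC * average row size (in bytes)
```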
Next, using the results of the above queries and formulas, compute the space utilization per cell
as follows:
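The utilization formula is likewise missing; it follows from the two previous results. A worked example with hypothetical numbers:

```
Cell utilization = Space per cell / Extent size

Example (hypothetical values):
  RpC (average)    = 500 rows
  Average row size = 120 bytes
  Space per cell   = 500 * 120 = 60,000 bytes
  Extent size      = 32 pages * 4,096 bytes = 131,072 bytes
  Cell utilization = 60,000 / 131,072, approximately 0.46
```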
A cell utilization value of 1.0 is the optimum value: it indicates that one cell will exactly fill one
extent. Cell utilization should be close to 1.0. If it is higher than 1.0, multiple extents will be
required to store the rows in a cell. A cell utilization value of 0.1 or less indicates that a large
amount of the space allocated to the table will be unused; this should be avoided, since such a
table could use many times the amount of space as a non-MDC table. To increase cell utilization,
experiment through trial and error with other dimension columns and extent sizes.
When considering the use of MDC, collect and analyze the queries that will run against the table
being reviewed, and determine whether MDC tables could benefit those queries. Queries with
range or equality predicates on the candidate dimension columns are the ones that typically
benefit from an MDC table.
After you have created the MDC tables, run the query or query workload against the non-MDC
and MDC versions of the tables using db2batch and DB2 Explain to validate your design and
ensure that the MDC table meets your performance objectives. This is an iterative process and
may lead you to select new dimension candidates, new extent sizes, and/or generated columns
that reduce the cardinality of the candidate dimensions.
Generated columns can be used to reduce the granularity of a candidate dimension so that the
rows in each cell fill most of an extent. This prevents extents from holding only a few rows, which
wastes space without necessarily improving performance.
For example, we could use a student identification number and divide it by 10 to create a
generated column to use as a dimension.
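A sketch of that example follows; the table and column names are hypothetical. With integer division, every ten consecutive student IDs map to one dimension value, reducing the dimension's cardinality by a factor of ten:

```sql
CREATE TABLE student_scores (
    student_id INT NOT NULL,
    score      INT,
    -- coarser-grained dimension value derived from the ID
    id_group   INT GENERATED ALWAYS AS (student_id / 10)
)
ORGANIZE BY DIMENSIONS (id_group)
```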
1. Create a new tablespace (and table and associated indexes) with the desired
specifications (for example, DSSIZE 64G).
2. Unload data from the "original" table, and load into the new tablespace.
3. Assuming that the data in the "original" tablespace was updated in the course of the
unload/load procedure, use a DB2 log analysis tool (available from IBM and third-party
software vendors) to extract the changes from the log and apply them to the "new"
tablespace.
4. Do step 3 iteratively to bring the "new" tablespace closer, in terms of content currency, to
the "original" tablespace.
It's strongly recommended that you keep the "old" tablespace and table (what I referred to as the
"original" tablespace and table) around for a while (2-4 weeks is our norm), just in case
something goes wrong. Make sure that when you finally drop the "old" tablespace, you don't
accidentally drop the new one. The RESTRICT ON DROP option of CREATE TABLE can help
here.
We've successfully executed this procedure a number of times. It's similar to what happens under
the covers when you run an online REORG of an actively updated tablespace.
Drop a Specific Stored Procedure When There are Other Procedures with the Same Name
From IBM's Frequently Asked Questions
How can you drop one specific stored procedure when you have several overloaded procedures
with the same name?
Each procedure must have a unique specificname in the DB2 system catalog. This specificname
can be used to delete the one procedure without affecting others with the same
procname/routinename.
Use the following query to find the specificname for a procedure (be sure to modify the where
clause with the correct name). There will be a break in specificname that will help identify the
procedure with the parameters matching the procedure to be dropped.
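The query referred to is not reproduced above. A sketch against the DB2 LUW catalog (substitute your procedure name; the specific name shown in the DROP is a made-up example of the system-generated names you will see):

```sql
SELECT PROCSCHEMA, PROCNAME, SPECIFICNAME, PARM_COUNT
FROM SYSCAT.PROCEDURES
WHERE PROCNAME = 'MYPROC'
ORDER BY SPECIFICNAME

-- after identifying the right entry, drop just that one:
DROP SPECIFIC PROCEDURE MYSCHEMA.SQL030115123456789
```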
"Truncating" a Table
From www.tek-tips.com
1. Using DELETE
   This causes all the rows in the table to be deleted. Log records are written; if the table
   is large (millions of rows), a very large active log space is required. DELETE triggers
   are fired.
2. Using IMPORT or LOAD
   IMPORT FROM /dev/null OF DEL REPLACE INTO <tablename>
   LOAD FROM /dev/null OF DEL REPLACE INTO <tablename> NONRECOVERABLE
   More privileges are required for these tasks: IMPORT .. REPLACE requires CONTROL on the
   table, and LOAD requires LOAD authority on the table. The advantage is that there is
   minimal logging. The table may be left in check pending state. DELETE triggers are not
   fired.
3. Using NLI
   The table must have been created with the NLI (NOT LOGGED INITIALLY) option. No logging
   is done. DELETE triggers are not fired.
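A sketch of the NLI approach (the table name is a placeholder; the table must have been created with NOT LOGGED INITIALLY, and the ALTER must run in a unit of work with autocommit off):

```sql
-- table created earlier with the NLI option:
--   CREATE TABLE mytable (...) NOT LOGGED INITIALLY
ALTER TABLE mytable ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE
COMMIT
```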
The desrulix command (set/get default indexing rules) displays or sets the indexing rules for a
text index. It is executed from the command line with this syntax:
DESRULIX [-h|-H|-?|-copyright]
         [-quiet]
         -s <search service name>
         -x <index name>
         [-dfmt <document format> numerical or: TDS, ASCIISECTION, RTF, HTML]
         [-ccsid <default code page>]
         [-lang <default language> numerical or: ARB, CAT, CHS, CHT, DAN, DEU, DES,
                ENG, ENU, ESP, FIN, FRA, FRC, HBR, ISL, ITA, JAP, KOR,
                NLD, NOB, NON, NOR, PTG, PTB, RUS, SVE]
The search service is TXINS000. The index name is the DB2 text index name, which can be found
under Indexes in the Control Center. For example, to display the indexing rules:
C:\PROGRA~1\SQLLIB\BIN>desrulix -s txins000 -x ix025913_004
The output looks like this:
DESRULIX - indexing rules
--------------------------------------------
Document format . . . . . . : TDS/ASCII
Default CCSID . . . . . . .: 819
Default language . . . . . .: EN_US
--------------------------------------------
The purpose of this tip is to alert you to some mistakes that have been made by many people,
maybe even you. These mistakes tend to occur when someone is rushed or tired; they can result
in poor performance and give joins an undeservedly bad reputation. An awareness of these
common mistakes can help prevent them in the future.
Perhaps the most distressing mistake is a join statement for which the user has forgotten to
include a predicate. The result is called a Cartesian product. In relational terms, a Cartesian
product includes a row in the result table for every combination of rows in the participating tables.
In other words, the number of rows in a Cartesian product equals the number of rows in one table
multiplied by the number of rows in the second table. If S has 1,000 rows and SPJ has 10,000, a
join of the two with no join predicate returns 10,000,000 rows.
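A minimal illustration follows; the SNO join column is an assumption based on the classic supplier-parts-jobs schema this tip uses:

```sql
-- missing predicate: 1,000 x 10,000 = 10,000,000 result rows
SELECT *
FROM S, SPJ

-- intended join:
SELECT *
FROM S, SPJ
WHERE S.SNO = SPJ.SNO
```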
A partial Cartesian product can be less obvious and can occur when one join condition among
several is forgotten. This is especially easy to do when the SQL statements are complicated; for
example, in a select on the catalog tables that estimates the average number of values that
appear on a page, omitting one of the join conditions forces a partial Cartesian product to be
performed.
An even more subtle oversight involves a composite index. If the predicate on the first column of
a composite index on K.IXCREATOR and K.IXNAME is omitted, a matching index scan cannot be
used. This does not produce a Cartesian product, but it significantly degrades the join's
performance.
A seemingly simple SELECT statement required 1.5 hours to run and returned many duplicate
rows. The problem arose in a SELECT against the SPJ and S tables intended to retrieve
information on suppliers who supply parts for J4 or are located in London.
Even the simplest omission can be costly. It most frequently occurs with developers new to the
set processing capability of SQL. Accustomed to one-record-at-a-time processing, they feel
comfortable joining two entire tables and using the host program to test for the desired values.
For example, when looking for suppliers of part P5 and the jobs on which it is used, they might
join the entire S and SPJ tables with this statement:
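A sketch of the mistake and its fix; the column names SNO, SNAME, PNO, and JNO follow the classic supplier-parts-jobs schema and are assumptions here:

```sql
-- inefficient: join everything, then filter for P5 in the host program
SELECT S.SNO, S.SNAME, SPJ.PNO, SPJ.JNO
FROM S, SPJ
WHERE S.SNO = SPJ.SNO

-- better: let SQL do the filtering
SELECT S.SNO, S.SNAME, SPJ.JNO
FROM S, SPJ
WHERE S.SNO = SPJ.SNO
  AND SPJ.PNO = 'P5'
```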
When you prune your replication tables using the asncmd prune command, the process uses the
database logs. If the data in your change data (CD) tables and/or UOW table is large, the logs can
fill up. This causes incomplete pruning of the tables and takes log space away from other
applications, causing those applications to fail.
There are a few different ways to get out of the situation (where pruning requires more log space
than is available):
C) Quiesce the source tables. Allow Capture to catch up with the log so that all data is captured.
Stop Capture. Allow Apply to catch up and apply all data (subs_set synchpoint = register table
synchpoint where global_record ='Y'). Now all data is subject to pruning. Drop and recreate the cd
and uow tables. This causes zero logging and zero guesswork.
D) Start Capture with the NOPRUNE parm so that pruning is NOT done automatically.
Start Apply so that the captured changes are applied to the targets. Changes cannot be
pruned until they have been applied.
Stop Capture so that you can manually delete rows from the CD tables
For each CD table, issue the following:
DELETE FROM <cd_table> CD
WHERE CD.IBMSNAP_UOWID IN
(SELECT UOW.IBMSNAP_UOWID FROM ASN.IBMSNAP_UOW UOW
WHERE UOW.IBMSNAP_COMMITSEQ <=
(SELECT MIN(PC.SYNCHPOINT) FROM ASN.IBMSNAP_PRUNCNTL PC))
Then, for the UOW table, issue:
DELETE FROM ASN.IBMSNAP_UOW UOW
WHERE UOW.IBMSNAP_COMMITSEQ <=
(SELECT MIN(PC.SYNCHPOINT) FROM ASN.IBMSNAP_PRUNCNTL PC)
Restart Capture
E) Option D may still need a lot of log space, so you can break the manual deletes down into
smaller bits:
Stop Capture and Apply. Keep Capture down until you are finished with this manual effort.
Capture will just suffer continual contention while you are pruning these large numbers of
rows, and keeping capture and apply down will keep all of the pruncntl values static while
you perform this task.
Determine the min synchpoint for a CD table based on the pruncntl table.
SELECT HEX(MIN(A.SYNCHPOINT)) FROM ASN.IBMSNAP_PRUNCNTL A,
ASN.IBMSNAP_REGISTER B
WHERE (A.SOURCE_TABLE = B.SOURCE_TABLE
AND A.SOURCE_OWNER = B.SOURCE_OWNER
AND A.SOURCE_VIEW_QUAL = B.SOURCE_VIEW_QUAL
AND B.PHYS_CHANGE_OWNER = <YOUR CD OWNER>
AND B.PHYS_CHANGE_TABLE = <YOUR CD TABLE>)
This is the highest value that can be safely pruned from this cd table. Eventually all the
rows in the CD table with a commitseq less than or equal to that value can be pruned, but
the commitseq value actually comes from a join with the UOW table. Now find the
minimum commitseq value in the uow table. Choose interim values of commitseq in
between the two values just selected and use as prunpoint in the following SQL:
DELETE FROM <CD TABLE> A WHERE A.IBMSNAP_UOWID IN (SELECT DISTINCT
B.IBMSNAP_UOWID FROM ASN.IBMSNAP_UOW B WHERE B.IBMSNAP_COMMITSEQ <=
prunpoint)
Keep on issuing these deletes, with commits, until you have pruned all of the values
through to the highest value determined at the beginning.
The uow table can be pruned safely only after all of the cd tables have been pruned. The
safe value for uow table pruning is the min(SYNCHPOINT) from the pruncntl table,
excluding rows where SYNCHPOINT is null or zero.