
Module 1

Introduction to Database Administration

09-2001. © 2001 International Business Machines Corporation.

Objectives
At the end of this module, you will be able to:
- Define an IBM Informix system
- Explain the role of a database administrator
- Explain the functions of database administration


An IBM Informix System

(Slide: an IBM Informix system is made up of database server processes, the shared memory they use, and the disk space they manage.)

An IBM Informix database management system is made up of database server processes, the shared memory that is used by the system, and the disk space that is used by the system to store the data that is being managed.
- Database server - The database server is the program that manages the contents of the database as it is stored on disk. The database server knows how tables, rows, and columns are actually organized in physical computer storage. The database server also interprets and executes all SQL commands.
- Instance - An IBM Informix instance is a database server process, together with the shared memory and disk space that the server process manages. There may be multiple IBM Informix instances on the same machine. For example, there may be separate development, test, and production instances.


Accessing the IBM Informix Instance


- INFORMIXDIR
- PATH
- INFORMIXSERVER

In order to access your IBM Informix instance, you must have the following environment variables set:
- INFORMIXDIR - must be set to the full path name of the directory where the IBM Informix files reside.
- PATH - must include the full path name of the directory containing the IBM Informix program executables (usually $INFORMIXDIR/bin on a UNIX system).
- INFORMIXSERVER - must be set to the name of the IBM Informix database server (instance) you wish to access.

Environment variables can be set automatically for your session when you log in, or can be set explicitly at the OS prompt.


Database Administration
IBM Informix Dynamic Server database administration involves:
- Creating databases and tables
- Enforcing security
- Assuring data integrity
- Managing concurrency
- Creating indexes
- Optimizing data access

The database administrator plays an integral role in the overall implementation and successful management of IBM Informix Dynamic Server (IDS) databases. Database administration involves:
- Creating databases and tables.
- Enforcing security.
- Assuring data integrity.
- Managing concurrency.
- Creating indexes.
- Optimizing data access.


Creating Databases and Tables


- Creating databases
  - Location of the database
  - Type of database
- Creating tables
  - Selecting data types
  - Fragmented or non-fragmented tables
  - Location of the table
  - Allocating disk space
  - Setting the lock mode

There are many decisions that must be made by the database administrator before databases or tables are created. This course describes the decisions that affect databases and tables. Some of them include:
- Creating databases
  - Selecting the location of the database within the IBM Informix Dynamic Server.
  - Selecting whether the database will log transaction activity.
- Creating tables
  - Selecting the data types to be used in the tables.
  - Creating fragmented or non-fragmented tables.
  - Selecting the location of the tables within the IBM Informix Dynamic Server.
  - Allocating disk space in the form of extents.
  - Selecting the table lock mode to manage concurrency.


Assuring Data Integrity


There are several methods of assuring data integrity:
- Granting/restricting privileges
  - Database
  - Table
  - Column
- Creating views
- Enforcing referential, entity, and semantic integrity

There are several methods of assuring data integrity:


- IBM Informix Dynamic Server allows a database administrator to grant or restrict database, table, or column level privileges.
- A database administrator can create views, which act as a window into the data from the user's perspective.
- Enforcing integrity:
  - Referential integrity is used to enforce relationships between tables.
  - Entity integrity is enforced by the use of primary keys that uniquely identify each row in a table.
  - Semantic integrity enforces data types and default column value constraints.
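A minimal sketch of how these mechanisms appear in Informix SQL follows. The table, column, and user names are illustrative assumptions (not objects from this course's databases), and the REFERENCES clause assumes a department table with a primary key on dept_num already exists:

    CREATE TABLE item
    (
        item_num   SERIAL PRIMARY KEY,                             -- entity integrity: each row uniquely identified
        dept_num   INTEGER REFERENCES department (dept_num),       -- referential integrity: must match a department row
        unit_price DECIMAL(8,2) DEFAULT 0 CHECK (unit_price >= 0)  -- semantic integrity: data type, default, and check constraint
    );

    GRANT SELECT ON item TO carol;              -- table-level privilege for one user

    CREATE VIEW cheap_items AS                  -- a view as a window into the data
        SELECT item_num, unit_price FROM item WHERE unit_price < 10;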


Managing Concurrency
Managing concurrency involves influencing how data is viewed and updated in a multi-user environment. There are two types of concurrency control that will be discussed in this course:
- Read concurrency (SELECT statements)
  - Managed through the use of isolation levels
- Update concurrency (INSERT, DELETE, UPDATE statements)

A database administrator must control concurrency within a multi-user environment. This involves influencing how data is read and accessed by users. There are two classes of concurrency control that will be described in detail in this course.
- Read concurrency involves how data is selected in a database.
- Update concurrency involves how data is inserted, updated, and deleted in a database.

Concurrency control is enforced in the IBM Informix Dynamic Server by the use of isolation levels and locks.


Optimizing Data Access


- Indexes
- Update statistics
- Data distributions

Database administration also includes assuring that the users can access and update the data as quickly as possible. The IBM Informix query optimizer is responsible for choosing the fastest and most efficient way to access the data. It examines indexes and the distribution of data in the tables, and selects the best path based on a cost-based algorithm. It is the role of the database administrator to:
- Ensure that the database table indexes are created for optimal performance.
- Make current statistics available to the query optimizer.
- Create data distributions as needed to assist the optimizer in making decisions.
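For illustration (the table name is an assumption), statistics and data distributions are typically refreshed with the UPDATE STATISTICS statement:

    UPDATE STATISTICS LOW FOR TABLE customer;     -- refresh basic table and index statistics
    UPDATE STATISTICS MEDIUM FOR TABLE customer;  -- also build data distributions for the optimizer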


Exercises


Exercise 1
1.1 Log on to the UNIX system as stu# using the login and password provided by the instructor.

1.2 Check the settings for your environment variables to make sure you can access the IBM Informix system. The following command can be used on a UNIX system:

    env | more

    What settings did you check? What is the name of the IBM Informix database server instance that you will access?

1.3 Change the value of INFORMIXSERVER. At the UNIX operating system command prompt enter the following:

    INFORMIXSERVER=atest;export INFORMIXSERVER

    Try to access the instance by using the dbaccess utility and record the error message. Then set INFORMIXSERVER back to its original setting.

1.4 Create the stores demonstration database using the following command:

    dbaccessdemo7 storesstu# -log

    where stu# represents your student login id. When you are prompted whether you want copies of the SQL examples, answer n. A demonstration database will be created for you.


Solutions


Solution 1
1.2 What settings did you check?
    INFORMIXDIR, PATH, INFORMIXSERVER

    What is the name of the IBM Informix database server instance that you will access?
    The database server instance is class1 or class2. (This may vary according to classroom.)

1.3 Error message: shared memory not initialized for INFORMIXSERVER "atest"


Module 2
IBM Informix Dynamic Server Data Types

09-2001. © 2001 International Business Machines Corporation.

Objectives
At the end of this module, you will be able to:
- Identify the IBM Informix data types
- Choose the appropriate type for each data element in a database


CHAR vs. VARCHAR


- CHAR(32)
  - Use if the content of the column is unpredictable.
  - Storage space required = length
- VARCHAR(150,20)
  - Use if the majority of the rows use a small amount of space and the maximum size of the column is the exception.
  - Storage space required = length + 1

The character data types store any combination of letters, numbers, and symbols. Tabs and spaces can be included. No other non-printable characters are allowed.
- CHAR - CHAR columns are of fixed length. The maximum length of a CHAR column is 32,767 bytes. If a character column is defined with a width of 400 bytes, data for that column will take up that amount of space on disk even if the data is less than 400 bytes. Numbers can be stored in a CHAR column. When stored as characters, numbers cannot be used in some arithmetic operations. Sorting of numbers in CHAR columns will be performed in ASCII code sequence, whereas values in numeric columns are sorted in binary sequence. Numbers which are intended for calculation should be stored in numeric columns.
- VARCHAR - VARCHAR columns store variable-length character data. VARCHAR columns may store between 0 and 255 bytes of character data. Besides the actual contents of the VARCHAR column, a one-byte length indicator is stored at the beginning of the column. The primary benefit of using the VARCHAR data type is that, when used correctly, it can increase the number of rows per page of storage on disk. VARCHAR is most effectively used when the majority of the rows need only a small amount of space, and some rows require significantly more. For example, a comments column may not be used in 80% of the rows in a table. However, when it is populated, the maximum size of the column is often used. Because more rows can be stored on a page, VARCHARs can increase performance on sequential reads of tables and reduce disk storage waste when compared to the same data stored in CHAR data type fields. When specifying a VARCHAR data type, a maximum length is included in the syntax of the column definition. The max-size parameter sets the upper limit on the length of the characters allowed within the data item. The min-size sets a minimum amount of disk space that will always be reserved for data within the data item. When a row is written, IBM Informix sets aside either the number of bytes needed to store the data or the number of bytes specified in min-size for the column (whichever is greater). If the column later grows to a size greater than the space available in the row, the row may have to be moved to another place on a page, or part of the row may be moved to another page. You can see why it is important to specify an accurate min-size, based on the average data length, when the table is created.
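As an illustrative sketch (the table name and column sizes are arbitrary assumptions, not tied to any table in this course), the two definitions might look like this:

    CREATE TABLE notes
    (
        part_code CHAR(32),         -- fixed length: always occupies 32 bytes on disk
        comments  VARCHAR(150,20)   -- at most 150 bytes; at least 20 bytes reserved in each row
    );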

Note
See the appendix chapter on Global Language Support (GLS) for information on data types NCHAR and NVARCHAR.


Numeric Data Types


- INTEGER
  - Whole numbers, -2,147,483,647 to +2,147,483,647; 4 bytes storage
- SMALLINT
  - Whole numbers, range is -32,767 to +32,767; 2 bytes storage
- FLOAT
  - Binary floating point numbers, double precision; 8 bytes storage
- SMALLFLOAT
  - Binary floating point numbers, single precision; 4 bytes storage
- DECIMAL/MONEY
  - Precision and scale designation up to 32 significant digits
  - If the scale is odd: storage is (precision + 4)/2
  - If the scale is even: storage is (precision + 3)/2

The five numeric data types are discussed briefly below:


- INTEGER - INTEGER values hold numbers from -2,147,483,647 to +2,147,483,647. An INTEGER uses 4 bytes of disk space.
- SMALLINT - SMALLINT values hold numbers from -32,767 to +32,767. A SMALLINT uses 2 bytes of disk space. The 2-byte savings is probably not significant in small tables, but it can make a substantial difference in large tables. You can always convert a SMALLINT to an INTEGER without loss of data.
- FLOAT - FLOAT values store binary floating point numbers with up to 16 significant digits (double precision). FLOAT corresponds to the double data type in C. A column with the FLOAT data type typically stores scientific numbers that can be calculated only approximately. A FLOAT uses 8 bytes of disk space.
- SMALLFLOAT - SMALLFLOAT values store binary floating point numbers with up to 8 significant digits (single precision). SMALLFLOAT corresponds to the float data type in C. SMALLFLOAT is also typically used to store numbers that can be calculated only approximately. A SMALLFLOAT uses 4 bytes of disk space. Because floating point numbers retain only their most significant digits, the number that you enter in this type of column and the number the database stores might differ slightly, depending on how your computer stores floating point numbers internally. This difference occurs when a value has more digits than the floating point number can store. The value is stored in its approximate form with the least significant digits treated as zeroes.
- DECIMAL/MONEY - DECIMAL and MONEY values store numbers with the number of digits specified by the user. You can specify up to 32 significant digits. The range of numbers that you can store is 10^-130 to 10^124. Note, however, that only 32 digits are significant. DECIMAL numbers can be formatted with a given precision and scale.
  - Precision is the total number of digits.
  - Scale is the number of digits to the right of the decimal point.

A DECIMAL column with a definition (5,2) stores a five digit number with three digits before the decimal point and two digits after the decimal point. The number of bytes it takes to store a DECIMAL value can be calculated with the following formulas:
  - If the scale is odd: N = (precision + 4) / 2
  - If the scale is even: N = (precision + 3) / 2

The default precision and scale for DECIMAL is (16,0). The MONEY data type is always treated as a fixed-point decimal number. The default precision and scale for MONEY is (16,2). The display format for the MONEY data type can be altered using the DBMONEY environment variable.

FLOAT or DECIMAL?

The advantages of using the DECIMAL data type over the FLOAT data type are:
- The DECIMAL data type allows greater precision (32 digits) than FLOAT (8 or 16).
- DECIMAL values are rounded if necessary instead of truncated.
- The available precision of FLOAT may differ from machine to machine, which may have some ramifications when transferring data across a network.
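A short worked sketch (hypothetical table and columns): a DECIMAL(5,2) column has an even scale, so it uses N = (5 + 3) / 2 = 4 bytes of storage, while an unqualified MONEY column falls back to the default (16,2):

    CREATE TABLE price_list
    (
        item_num   INTEGER,
        unit_price DECIMAL(5,2),   -- even scale: (5 + 3) / 2 = 4 bytes of storage
        list_price MONEY           -- defaults to MONEY(16,2)
    );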


SERIAL

(Slide: a customer_num SERIAL column with starting number 100 generates the unique values 100, 101, 102, 103, ...)

SERIAL columns contain numbers that are assigned to each row of the table in sequential order by the system. They are stored as INTEGERS. When a new row is entered, the serial column is assigned the next number in the sequence. The default starting number is one, and the highest serial number that can be assigned is over 2.1 billion. The SERIAL data type assigns values in sequential order but it does not automatically ensure uniqueness. While the database server itself will not assign a duplicate value to a serial column, a client application could potentially insert a row with a duplicate SERIAL value. To ensure that unique values exist in a serial column, create the column with a UNIQUE constraint, a unique index, or a primary key constraint.

When to Use SERIAL


Serial columns make excellent primary keys because of their small size and potential uniqueness. Only one SERIAL column can be specified in a table.

Starting Number
If the starting number is 100, the first row to be entered will be assigned the serial value 100.
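A minimal sketch (the table and column names are assumptions) of a SERIAL column that starts at 100 and is kept unique with a primary key constraint:

    CREATE TABLE ticket
    (
        ticket_num SERIAL(100) PRIMARY KEY,   -- first inserted row receives 100, the next 101, and so on
        subject    CHAR(40)
    );

    INSERT INTO ticket (ticket_num, subject) VALUES (0, "printer jam");   -- 0 asks the server to assign the next serial value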


Deleting Rows
When a row or rows are deleted, the data is removed but the SERIAL values will continue to increase. That is, when a new row is added, it will be assigned the next number in the sequence. SERIAL numbers cannot be re-used without special programming. SERIAL data types can be used to store identification numbers such as customer numbers and order numbers. A SERIAL data type uses 4 bytes.


DATE, DATETIME, INTERVAL


- DATE - a day (12/24/98)
  - Integer used to store calendar dates (storage space = 4 bytes)
- DATETIME - a particular point in time (1994-4-24 12:00)
  - DATETIME YEAR TO MINUTE
  - DATETIME DAY TO MINUTE
  - DATETIME SECOND TO FRACTION
- INTERVAL - a span of time (5 years, 3 months)
  - INTERVAL YEAR TO MONTH
  - INTERVAL HOUR TO MINUTE
- Storage space for DATETIME and INTERVAL
  - Total number of digits for all fields/2 + 1

Data types related to calendar dates and time are:

DATE
The DATE data type is an integer representing the number of days since December 31, 1899. Jan. 1, 1900 is day one. The data type DATE uses 4 bytes of disk storage. A DATE column is specified in programs, forms, and SQL in the following default format: mm/dd/yyyy
- mm is the month (1-12)
- dd is the day of the month (1-31)
- yyyy is the year (0001-9999)

You can change the default format with the environment variable DBDATE.

DATETIME
DATETIME is an advance over the DATE data type in that the granularity to which a point in time is measured is selectable, that is, data items can be defined that store points of time with granularities from a year to a fraction of a second.

The value ranges for each DATETIME field are:

    YEAR         1 to 9999 (A.D.)
    MONTH        1 to 12
    DAY          1 to 28, 29, 30, or 31
    HOUR         0 (midnight) to 23
    MINUTE       0 to 59
    SECOND       0 to 59
    FRACTION(n)  where n is 1-5 significant digits (default 3)

INTERVAL
INTERVAL is used to store values that represent a span of time. An INTERVAL data item can express spans of time as great as 9,999 years and 11 months, or as small as a fraction of a second. INTERVAL data types cannot contain both months and days. This is because the number of days in a month varies with the month: May has 31 days, while September has 30. The number of days in a month may also vary with the year: the number of days in February changes from 28 to 29 every four years, with special exceptions for particular centuries. Because of these peculiarities in calendars, ANSI divides the INTERVAL type into two classes: year-month intervals and day-time intervals. Year-month interval classes are:
- YEARs
- MONTHs

Day-time interval classes are:


- DAYs
- HOURs
- MINUTEs
- SECONDs
- FRACTIONs of a second

Calculate the amount of space for data types DATETIME and INTERVAL with the following:
total number of digits for all fields/2 + 1
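For example (assumed column definitions), a DATETIME YEAR TO MINUTE value has 4 + 2 + 2 + 2 + 2 = 12 digits, so it needs 12/2 + 1 = 7 bytes, and an INTERVAL HOUR TO MINUTE value has 2 + 2 = 4 digits, so it needs 4/2 + 1 = 3 bytes:

    CREATE TABLE call_log
    (
        call_start    DATETIME YEAR TO MINUTE,   -- 12 digits: 12/2 + 1 = 7 bytes
        call_duration INTERVAL HOUR TO MINUTE    --  4 digits:  4/2 + 1 = 3 bytes
    );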


DBCENTURY
Today's Date: 10/31/1998. Dates stored with different DBCENTURY settings:

    DBCENTURY    Date Entered    Date Entered    Date Entered
    Setting      03/25/01        12/01/98        10/13/98
    P            03/25/1901      12/01/1898      10/13/1998
    F            03/25/2001      12/01/1998      10/13/2098
    C            03/25/2001      12/01/1998      10/13/1998
    R            03/25/1901      12/01/1998      10/13/1998

As of version 7.2 of IBM Informix Dynamic Server (IDS), the environment variable DBCENTURY allows selection of the appropriate century for two-digit year DATE and DATETIME values. Acceptable values for DBCENTURY are: P, F, C, or R.

    P    Past. The year is expanded with both the current and past centuries. The closest date prior to today's date is chosen.
    F    Future. The year is expanded with both the current and future centuries. The closest date after today's date is chosen.
    C    Closest. The past, present, and next centuries are used to expand the year value. The date closest to today is used.
    R    Present. The present century is used to expand the year value.

The system default for DBCENTURY is R.


Warning!
When a DBCENTURY value of P or F is set and today's date is entered, the century will be the past century or the future century respectively. Today's century will be used when the keyword TODAY is substituted for today's date, since TODAY uses the 4-digit year.


Binary Large Object


Binary Large OBjects, BLOBs, are streams of bytes of arbitrary value and length. A BLOB might be a digitized image or sound, a relocatable object module, or a legal contract. A BLOB can be any arbitrary collection of bytes for any purpose. Anything that you can store in a file system of a computer can be stored in a BLOB. IBM Informix allows BLOBs to be stored as columns within a database. The theoretical limit to their size is over 2.1 billion bytes. This size is based on the highest value that can be held in a 4 byte signed integer. 56 bytes of space are reserved in the row for general BLOB information. The BLOB itself is stored in pages separate from the rest of the row.


TEXT vs. BYTE


- TEXT
  - Large amounts of data containing ASCII values, Control-i, Control-j, and Control-l
- BYTE
  - Large amounts of unstructured data with unpredictable contents

There are two types of BLOB data types:


- TEXT - A data object of type TEXT is restricted to a combination of printable ASCII text and the control characters Control-i, Control-j, and Control-l. Examples of the data that might be stored in TEXT data types are:
  - Text notes
  - Engineering specifications
  - Program source code files
- BYTE - The BYTE data type can store any type of binary data such as:
  - Spreadsheets
  - Program modules
  - Digitized images, for example, photographs and drawings
  - Voice patterns

The BYTE data type is an undifferentiated byte stream. IBM Informix knows only the length of the BLOB and its storage location on the disk. Other programs can be called to display the BLOB information.


Exercises


Exercise 1

True or False

    ______  CHAR data types are of variable length.
    ______  Numbers can be stored in character columns.
    ______  There can be more than one SERIAL column in a table.
    ______  FLOAT columns can store numbers with larger precision than DECIMAL columns.
    ______  TEXT and BYTE data types are variable length.
    ______  The maximum length a VARCHAR column can be is 255 characters.


Exercise 2
Your company wishes to track employee and department information in a database. Each employee is assigned a unique employee number. Some of the required reports will be sorted by last name. As well, the information will sometimes need to be grouped by department number. Select the appropriate data types to store the information shown below:

    Data           Examples                  Column Name   Data Type
    Emp. Number    1            2
    First Name     John         Alexander
    Last Name      Smith        Johnson
    Salary         $15,000.00   $24,599.80
    Date Hired     12/15/98     1/15/97
    Dept. Number   001          050

    Data           Examples                  Column Name   Data Type
    Dept. Number   001          050
    Dept. Name     Accounting   Sales
    Dept. Manager  B. Allen     G.T. Weems


Exercise 3
Optional DBCENTURY Exercise
In this exercise, you will experiment with DBCENTURY.

3.1 First, set the environment variable DBCENTURY to P.

3.2 Using the stores demonstration database, create a new table test_date with one DATE column:
CREATE TABLE test_date (thedate DATE, mycentury CHAR(1));

3.3 Insert three rows into test_date: in the column thedate enter today's date, today's date + 1, and today's date - 1. Insert the value of the DBCENTURY environment variable into the column mycentury for all three rows. The purpose of the exercise is to find out how IBM Informix expands a two-digit year to a four-digit century and year, so do not input any century values. When using the insert statement with dates, remember to use quotes. Do not use the keyword TODAY because that will input a century value. Example:
INSERT INTO test_date VALUES ("10/31/98","P");

3.4 View the rows that you have inserted by selecting all rows from test_date. What century was each date expanded to and why?
SELECT * FROM test_date;

3.5 Delete all rows from test_date.


DELETE FROM test_date;

3.6 Repeat steps 3.1 through 3.5 using at least two other settings for DBCENTURY. Simulate what might happen close to the turn of the century by entering a date in the 21st century (for example, 01/01/01).

3.7 Remember that the default setting for DBCENTURY is R. Are there any other settings that you might consider?


Solutions


Solution 1

True or False

    __F__  CHAR data types are of variable length.
    __T__  Numbers can be stored in character columns.
    __F__  There can be more than one SERIAL column in a table.
    __F__  FLOAT columns can store numbers with larger precision than DECIMAL columns.
    __T__  TEXT and BYTE data types are variable length.
    __T__  The maximum length a VARCHAR column can be is 255 characters.


Solution 2
The following is one solution to the group exercise:

    Data           Examples                  Column Name   Data Type
    Emp. Number    1            2            empnum        SERIAL
    First Name     John         Alexander    firstname     CHAR(20)
    Last Name      Smith        Johnson      lastname      CHAR(20)
    Salary         $15,000.00   $24,599.80   salary        DECIMAL(9,2)
    Date Hired     12/15/98     1/15/97      hiredate      DATE
    Dept. Number   001          050          deptnum       CHAR(3)

    Data           Examples                  Column Name   Data Type
    Dept. Number   001          050          deptnum       CHAR(3)
    Dept. Name     Accounting   Sales        deptname      CHAR(15)
    Dept. Manager  B. Allen     G.T. Weems   deptmgr       CHAR(25)



Module 3
Creating Databases and Tables

09-2001. © 2001 International Business Machines Corporation.

Objectives
At the end of this module, you will be able to:
- Create databases and tables
- Calculate row and extent sizes
- Use the sysmaster database to examine your instance


Creating a Database
The database administrator must decide:
- The location (dbspace) of the database
- The logging mode
- ANSI standard compliance

The system catalog is automatically generated.

As database administrator you must make some decisions before creating a database.
- Where in the IBM Informix system will the database be located? Usually this decision is made in conjunction with the IBM Informix system administrator, who creates and manages the dbspaces within an IBM Informix system.
- Will you use a transaction log to record all changes to the data in the database? When will that log be flushed to disk?
- Will this be an ANSI-compliant database? ANSI databases must conform to certain rules laid out by the ANSI standards committee.

When you create a database, system catalog tables that describe the structure of the database are automatically generated. These tables are accessed each time an SQL statement is executed. The system catalog is used to determine system privileges or verify table names, for example.


Location: Dbspaces

(Slide: an instance with the dbspaces rootdbs, dbspace1, and dbspace2; the customer database and the stores database are each located in a dbspace.)

The location of IBM Informix databases, tables, and indexes is a particular pool of disk space, called a dbspace. An IBM Informix Dynamic Server instance has physical disk space assigned to it in units called chunks. Each chunk is a contiguous unit of space. A logical collection of chunks is called a dbspace. Every IBM Informix instance has at least one dbspace, the root dbspace.

Location of a database
You can specify which dbspace to use when creating a database. This means that the system catalog tables will be located in the chunk or chunks assigned to that dbspace. Unless you specify otherwise, all data tables associated with that database, and the corresponding indexes, will also be located in that dbspace. If you do not specify the dbspace in which to create the database, the database will, by default, be created in the root dbspace. This is not recommended. Other dbspaces should be created for holding databases. Separating databases into different dbspaces can have several advantages:
- A database cannot grow beyond the space available in the dbspace. By limiting the size and number of chunks assigned to a dbspace, you also limit the size of the database.


- You can assign databases to different devices by assigning them to separate dbspaces that have chunks on different devices. This will reduce I/O contention.
- Should one of your devices suffer a failure, you will only lose access to the database stored in the dbspace on that device. Other databases will be unaffected.

Dbspaces and growth


Dbspaces may have as many chunks assigned to them as necessary. If you run out of space in a particular dbspace (because all the chunks assigned to it are full), you can add additional chunks to it. Creating and managing dbspaces is covered in the IBM Informix Dynamic Server System Administration courses.


Logging Modes: No Logging

(Slide: inserts, updates, and deletes against an unlogged database leave no record of the changes in the logical log.)

You can create a database in the IBM Informix system that does not log transactions. If a database does not have logging, then any changes made to that database are not recorded in the logical log. From the application's perspective, this means that transactions are not supported.

No Full Recovery
More importantly, a database without logging cannot be fully recovered in the event that you have to restore the system from an archive. Since an archive records the state of the system at the time of the archive, any changes made to the database since the time of the archive must be recovered from the records in the logical log. Since an unlogged database writes no such records to the log, there will be no way to recover those changes.


Logging Modes: Buffered Logging


(Slide: with buffered logging, transaction records are written to the shared memory log buffer and are flushed to the logical log on disk only when the buffer is full.)

If you choose to log the database transactions, the log entries are not recorded to disk immediately. They are first put in a buffer in memory, called the logical log buffer. Eventually the buffer will be flushed and the entries will be placed in the logical log file on disk. If a database is created with buffered logging, then the contents of the logical log buffers will not be written to disk until the buffer becomes full.

Advantage
The amount of physical I/O to disk is greatly reduced. Since physical I/O is often a costly operation, this can improve the performance of your system.

Disadvantage
Should a system crash occur, whatever is contained in the log buffer and not yet written to disk will be lost. Without the data being contained in the log on disk, IDS cannot recover those transactions when the system is brought back up. So while buffered logging can improve performance, it can result in the loss of some transactions in the event of a system failure.


Logging Modes: Unbuffered Logging


(Slide: with unbuffered logging, transaction records written to the shared memory log buffer are flushed to the logical log on disk as soon as a transaction is complete.)

If a database is created with unbuffered logging, the contents of the log buffer are flushed as soon as a transaction is complete. This means that once you have completed a transaction against the database, the log records for that transaction will be written to disk.

Advantage
If there is a system failure, the changes made to the database will be recovered when the system is brought back up.

Disadvantage
More physical I/O will be performed to the disk, which may degrade performance somewhat.


Mode ANSI Databases


- All databases use unbuffered logging.
- All statements are automatically contained in transactions.
- Owner-naming is enforced.
- The default isolation level is repeatable read.
- Users do not receive PUBLIC privileges to tables and synonyms by default.

When you create an ANSI-compliant database (LOG MODE ANSI) the following are the main features that set it apart from databases that are not ANSI-compliant:
- All databases use unbuffered logging.
- All statements are automatically contained in transactions. Programs that access ANSI databases cannot use the BEGIN WORK statement; the program is always in a transaction, and the COMMIT WORK statement will implicitly start a new transaction once the current transaction has been committed.
- Owner naming is enforced. You must use the owner name when you refer to each table, view, synonym, index, or constraint unless you are the owner.
- The default isolation level is repeatable read. Isolation levels are discussed in a later chapter.
- The default privileges of ANSI databases are different from non-ANSI databases. Users do not receive PUBLIC privileges to tables and synonyms by default. Privileges are discussed further in a later chapter.

A detailed discussion of the differences between ANSI-compliant databases and non-ANSI databases is in the IBM Informix Guide to SQL:Reference product documentation manual.


CREATE DATABASE Statement


Examples:

    CREATE DATABASE stores
    CREATE DATABASE stores IN dbspace1 WITH LOG
    CREATE DATABASE stores WITH BUFFERED LOG
    CREATE DATABASE stores IN dbspace1 WITH LOG MODE ANSI

You create a database with the CREATE DATABASE statement. This statement will also automatically create the system catalog. The examples show different ways to create a database with IBM Informix Dynamic Server.
- CREATE DATABASE stores
  This statement will create a database in the default location, which is the root dbspace. The database is created without logging. As mentioned earlier, creating databases in the root dbspace is not recommended.
- CREATE DATABASE stores IN dbspace1 WITH LOG
  This example will create a database in the dbspace called dbspace1, with unbuffered logging.
- CREATE DATABASE stores WITH BUFFERED LOG
  This example will create a database in the default location with buffered logging.
- CREATE DATABASE stores IN dbspace1 WITH LOG MODE ANSI
  This example will create an ANSI database in the dbspace called dbspace1. An ANSI database uses unbuffered logging.


Creating a Table
The database administrator must decide:
- The columns and their data types
- The location of the table (dbspace)
- The contiguous space initially set aside (extent size)
- The lock level of the table

The related system catalog tables are automatically populated.

When you create a table in a database you must decide:


- The names of the columns in the table, and the data type of each column.
- The dbspace where the table will be located. Tables can be created in the same dbspace as the database, or in a different dbspace.
- The amount of contiguous space to be set aside initially in the dbspace for that table. This is determined by the extent size set in the table.
- The lock level of the table. Locks are used to prevent one user from accessing data that is being used by someone else.

When a table is created the table and column information is automatically inserted into the systables and syscolumns system catalog tables.


Tables and Dbspaces

(Slide: the stores database is located in one dbspace while its orders table is created in a different dbspace, separate from rootdbs.)

With IBM Informix Dynamic Server, you have the ability to create a table in a different dbspace than the database. The system catalog information for the table is stored with the rest of the database, but the data for the table itself will be located in the specified dbspace. The advantage of locating some tables in different dbspaces from the database is that those tables can be placed on different physical devices than the rest of the database.
- Large tables - will have no competition for space from other tables.
- Frequently accessed tables - will have reduced contention for the disk head from other processes querying different tables.

Grouping Tables for Archiving


The IBM Informix Dynamic Server archive utilities support parallel archives and restores with a granularity of dbspace. To fully utilize their capabilities, group tables into dbspaces based on which tables need to be archived with the same frequency.


Extents
An extent is a collection of contiguous pages on a disk. Space for tables is allocated in units of extents. The extent size here is 16K (8 pages). A page is the basic unit of I/O in an IBM Informix system.

(Slide: an extent of 8 pages. Page 1: bitmap page; Page 2: index page; Pages 3, 4, and 5: data pages; Page 6: remainder page; Page 7: blob page; Page 8: free page.)

Disk space for a table is allocated in units called extents. An extent is an amount of physically contiguous space on disk; the amount is specified for each table when the table is created. The size of an extent is specified in kilobytes. This number must be an even multiple of the page size of the system. The page is the basic unit of I/O in an IBM Informix system. Page size is determined when a port is made to a particular machine/operating system, and cannot be changed. The most common page size used is 2 Kbytes, although some systems use a 4 Kbyte page size. When an extent is added, it is empty of data at first. When this extent has no more space, another extent will be allocated for the table; when that extent is filled, another extent will be allocated, and so on. Each table has two extent sizes associated with it:

    EXTENT SIZE    The size of the first extent allocated for the table. This first extent is allocated when the table is created. The default is 8 pages.
    NEXT SIZE      The size of each subsequent extent added to the table. The default is 8 pages.

The minimum size that an extent can be is 4 pages. There is no maximum size in practical terms (it would be in the range of 2 gigabytes). IBM Informix recommends calculating the extent size for each of your tables instead of using the default extent size.

tblspace
All the extents allocated for a given table are logically grouped together and are referred to as a tblspace. The space represented by the tblspace may not be contiguous as extents may be spread around a device as space permits. Once an extent has been allocated to a table, that extent will never be freed up for reuse by other tables. If an extent should become empty (due to massive deletes from the table), the extent will remain part of the tblspace. The space will be reused, however, when additional rows are inserted into the same table in the future.

Total number of extents


The total number of extents allowed for a table varies depending on the page size, number of indexes, the number of columns per index, and the type of columns in the table (i.e., VARCHAR, TEXT, or BYTE). For systems with a 2K page size, the maximum number of extents is approximately 200. Systems with a 4K page size can have approximately 450 extents. If a table reaches the maximum number of extents, you will need to unload it, find enough contiguous space to recreate the table with fewer extents, and then reload the data. Having a lot of extents can have a performance impact, particularly in a decision support (DSS) environment where large groups of rows are selected. There will be additional I/O when bringing the data pages into shared memory because of all the extents that need to be accessed.
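One way to watch extent growth, as a hedged sketch (the database and table names are assumptions), is to count a table's rows in the sysmaster sysextents table described later in this chapter:

    SELECT COUNT(*) num_extents
    FROM sysmaster:sysextents
    WHERE dbsname = "stores" AND tabname = "orders";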

Reclaiming space in empty extents


If you want to reclaim the space in empty extents and make it available to the dbspace to use for other tables, you can change one of the indexes on the table to a clustered index, forcing a rewrite of the table. This is discussed in more detail in the chapter on Indexes and Indexing Strategy.
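A minimal sketch of that technique (the index name is an assumption):

    ALTER INDEX ix_ordnum TO CLUSTER;      -- rewrites the table in index order, compacting it into new extents
    ALTER INDEX ix_ordnum TO NOT CLUSTER;  -- drop the cluster attribute afterwards if it is not otherwise needed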


Extent Growth
(Slide: extent growth by concatenation - a new extent that is physically adjacent to an existing extent simply makes the existing extent larger; by doubling - the next-extent size automatically doubles every 16 extents, for example 20K at the 16th extent, 40K at the 17th, and 80K at the 33rd; and by manual modification.)

There are special situations that can alter the next size of subsequent extents from the size specified at the time the table was created.

Concatenation
Extent concatenation occurs when an extent for a table is allocated and that extent is physically contiguous with an existing extent for the same table. The existing extent is simply made larger. This effect is most often seen when performing batch loading of tables one at a time. Generally, each new extent allocated will be contiguous with the previous extent. A large table can be loaded and end up occupying a single large extent.

Doubling
Extent size doubling occurs when the number of extents allocated for a particular table grows to a multiple of 16. The current working size of the extent will be doubled.


Manual Modification
The size of the first extent and subsequent extents are specified when the table is created. It is possible to alter the extent size using the ALTER TABLE command. You can increase or decrease the extent size for any subsequent extents. It will not alter the size of extents currently allocated for the table.
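For example (the table name and size are assumptions):

    ALTER TABLE orders MODIFY NEXT SIZE 64;   -- each subsequent extent for orders will be allocated as 64 KB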

Space Limitation
If the amount of contiguous free space in a dbspace is less than the size of the current extent allocation, the amount of space available will be allocated to the extent even though it is less than the desired amount. The minimum available space will be used for an extent.


Table Lock Modes

- PAGE locking (default) will lock an entire page of data, effectively locking all data on that page. Index locks will also be at PAGE level.
- ROW locking will only lock the row needed. Index locks will be placed only on the key values.

To prevent errors when more than one user is reading or modifying data, IBM Informix uses a system of locks to control access. When you create a table, you may choose the level of locking that IBM Informix uses for the table.

Page Level Locking


Page level locking locks an entire page. This is the default locking level. Page level locking provides the most efficient method for locking several rows of data. But, because you are locking every row on the page regardless of whether you are using it, other users cannot access that data. Page level locking decreases concurrency, or the availability of the data to other users. Since a page lock will actually lock more than one row with only one resource, it is useful if you are processing rows in the same order as they are placed physically on disk. For example, if you are processing a table in its physical order, page level locking allows you to update many rows with fewer locks.


Row Level Locking


Row level locking locks a single row at a time. Row level locking increases concurrency because only one row is being locked. IBM Informix Dynamic Server uses row level locking on the system catalog tables to provide the highest level of concurrency. When the number of locked rows is high, however, not only do you risk exhausting the number of available locks, but the overhead of lock management can become significant.
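The lock level is chosen when the table is created (see the CREATE TABLE statement that follows) and, as a hedged sketch with an assumed table name, can also be changed later:

    ALTER TABLE employee LOCK MODE (ROW);    -- switch an existing table to row-level locking
    ALTER TABLE employee LOCK MODE (PAGE);   -- or back to the page-level default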


CREATE TABLE Statement


CREATE TABLE orders
(
    order_num    SERIAL NOT NULL,
    customer_num INTEGER,
    order_date   DATE
)
IN dbspace1
EXTENT SIZE 64
NEXT SIZE 32
LOCK MODE row;

The example CREATE TABLE statement above creates a table named orders with three columns. The table will be located in the dbspace dbspace1. 64K will be allocated for the first extent. Every extent after that will be allocated 32K. When locks are applied to data in the table, they will be applied at the row level. Tables may also be fragmented across multiple dbspaces to improve performance. This is covered in a later chapter.


Storing BLOB Data

CREATE TABLE pictures
(   ...
    pic1 BYTE IN TABLE,
    pic2 BYTE IN blobdbs,
    ...
) IN datadbs;

(Slide: the pic1 BLOB pages are stored with the table in the dbspace datadbs, while the pic2 BLOB pages are stored in the blobspace blobdbs, each spread across that space's chunks.)

When you create a table containing a column of type TEXT or BYTE (BLOB columns), you have the option of locating the BLOB data in the table or in a separate blobspace. A blobspace is a special dbspace dedicated to storing BLOBs.

Stored in table
If you create the BLOB in TABLE, the BLOB values will be stored in the same dbspace as the rest of the table data. The BLOB will not, however, be stored in the row with the other data. It will use its own pages. A 56-byte pointer to the BLOB will be stored in the row. Generally, small BLOBs (less than 2 pages in size) can be stored in the table with no performance degradation.

Stored in blobspace
If you choose to create the BLOB in blobspace-name, then all the BLOB data for that column will be stored in that blobspace. The blobspace page (blobpage) size is larger than a normal page because BLOB data tends to be very large. This makes storing BLOBs more efficient. BLOBs much larger than 2 pages should be stored in a blobspace.


Space Available on Page


When storing BLOB values in the table, the maximum amount of space available on that page for actual BLOB data is the page size - 40 bytes. So, for example, on a system with a page size of 2K (2048 bytes), you would be able to store up to 2008 bytes of BLOB data on a single page. If the BLOB value was larger than 2008 bytes, several pages would be used to store the value.


Page Structure
The basic structure of a page:
- A 24-byte header
- A second 4-byte timestamp
- A slot table

(Slide: page layout. The page header (24 bytes) contains page_id (4 bytes), timestamp (4 bytes), num_slots (2 bytes), pg_type (2 bytes), free_ptr (2 bytes), free_cnt (2 bytes), next (4 bytes), and prev (4 bytes). Data is stored in the body of the page. The slot table at the end of the page holds a RowOffset (2 bytes) and RowSize (2 bytes) entry per row, followed by the trailing timestamp (4 bytes). Pagesize - 28 bytes is available for data on each page.)

When storing information for a table, the type of data stored on a particular page is homogeneous; that is, if data rows are stored on the page, then the page will contain only data rows. If index data is stored on a page, then that page will contain only index data. Some pages are set aside for administrative purposes. All pages used in the server have the following data structure:

    Page header    24 bytes of information used to keep track of the data on the page.
    Timestamp      The last 4 bytes on the page, used to ensure the validity of writes.
    Slot table     Contains one 4-byte entry for every row inserted into the page. The 4 bytes store the offset of the row in the page and the length of the row. The slot table is used to locate rows on the page.

The remainder of space on the page is available to store data or index information.


Estimating Row and Extent Size for Tables


Some guidelines for estimating the storage space required for a table are:

1. Determine the number of rows in the table that you wish to store initially.
2. Determine the row length by adding the widths of all the columns in the table. Columns for BLOB data that is not stored in the table will use 56 bytes for the pointer. Tables containing VARCHAR or BLOB columns located in the table are impossible to size accurately. You can use the maximum possible size or an average size, but the resulting row size will be an estimate.
3. Add 4 bytes for the slot table entry. The result is rowsize.
4. Determine page length in bytes by subtracting 28 (for the header) to obtain the usable page space (pageuse).
5. Determine how many whole rows can fit on a page (round down): # of rows on a page = pageuse/rowsize. The maximum number of rows per page is 255, regardless of the size of the row.
6. Determine how many pages are needed (round up): # of pages = total rows / # of rows on a page
7. Determine the total space needed in kilobytes, using a page size of 2K or 4K, depending on your system: disk space = # of pages * size of page

The result in step 7 is the FIRST EXTENT size. Repeat the steps, based on anticipated growth, to determine the NEXT EXTENT size.

Alternative "rough estimate" method: # of rows * row size * 125%

Note
If the row size is larger than the usable space on a page, Dynamic Server will divide the row between home pages and remainder pages. The number of data pages is calculated as # of home pages + # of remainder pages.


Calculating Extent Size: example using customer table


Assume that the current customer base is 10,000 and expected growth is 5,000 annually.

1. Determine the number of rows in the table that you wish to store initially (current + growth = total).
   FIRST EXTENT based on current customer base + rest of year = 10,000 + 5,000 = 15,000 rows
   NEXT EXTENT based on annual growth = 5,000 rows
2. Determine the row length by adding the widths of all the columns.

       Column name    Data type    Bytes
       customer_num   serial       4
       fname          char(15)     15
       lname          char(15)     15
       address1       char(20)     20
       address2       char(20)     20
       city           char(15)     15
       state          char(2)      2
       zipcode        char(5)      5
       phone          char(18)     18
       TOTAL                       114

3. Add 4 bytes for the slot table entry. rowsize = 114 + 4 = 118
4. Determine page length by subtracting 28 to obtain the usable page space. For a 2 Kb page: 2048 - 28 = 2020
5. Determine how many whole rows can fit on a page (round down): 2020/118 = 17.1 = 17 rows per page
6. Determine how many pages are needed (round up).
   FIRST EXTENT = 15,000 rows / 17 = 882.4 = 883 data pages
   NEXT EXTENT = 5,000 rows / 17 = 294.1 = 295 data pages
7. Determine the total space needed in kilobytes.
   FIRST EXTENT = 883 * 2 Kb = 1766 Kb
   NEXT EXTENT = 295 * 2 Kb = 590 Kb


The Sysmaster Database


- Contains diagnostics about the entire instance
- Is created automatically when IBM Informix Dynamic Server is initialized
- Accesses real-time data in shared memory

The System Monitoring Interface (SMI) offers a snapshot of performance and system status information for an entire IBM Informix instance. The sysmaster database holds the tables used by SMI to retrieve this information. There is one sysmaster database for each IDS system. The database is created automatically the first time the IDS system is initialized. Most of the sysmaster tables do not hold any data. Instead, the database structures for the tables point to structures in memory. When you query the sysmaster tables using SQL, the SELECT statement accesses real-time data in memory. For this reason, the data retrieved from one table may not be synchronized with the data retrieved from a different table. The structure of the most commonly used tables in the sysmaster database is included in the System Monitoring Interface Appendix.

Note
An additional database, the sysutils database, is also automatically created when the IDS system is initialized. This database contains information used by the archiving utilities.


Some sample SQL statements to obtain information about your IBM Informix instance from the sysmaster database are shown below:

- sysdbspaces: all the dbspaces in the instance, the number of chunks, whether the dbspace is a blobspace or temporary dbspace, status of the dbspace

    SELECT dbsnum, name, owner, nchunks, is_temp, is_blobspace, flags
    FROM sysdbspaces;

- syschunks: # of free pages, and # of pages used in chunks for a particular dbspace

    SELECT syschunks.dbsnum, chknum, nxchknum, sysdbspaces.name, chksize, nfree
    FROM syschunks, sysdbspaces
    WHERE syschunks.dbsnum = sysdbspaces.dbsnum
      AND sysdbspaces.name = name of dbspace;

- sysdatabases: all databases in instance and their logging status

    SELECT name, owner, created, is_logging, is_buff_log, is_ansi
    FROM sysdatabases;

- systabnames: each table in database and its owner

    SELECT partnum, dbsname, owner, tabname
    FROM systabnames
    WHERE dbsname = name of database;

- syssessions: user for session, name of host, time connected

    SELECT sid, username, uid, hostname, connected
    FROM syssessions;

- sysextents: each extent in the database, and the table associated with it

    SELECT dbsname, tabname, start, size
    FROM sysextents;


Exercises


Exercise 1
1.1 Answer the following questions about your instance by accessing the sysmaster database, using the tool indicated by your instructor:

    How many dbspaces are there in the instance? Are any of them temporary dbspaces or blobspaces?

    How many databases are there?

    What is the logging status of the database that you created?

    The sample SQL statements on the previous page may be used to obtain the answers.

1.2 Write the CREATE TABLE statements for separate employee and department tables using the data types that you chose in the previous chapter.

    a. Create each table in the same dbspace as your database.
    b. Use the default extent size for the department table. For the employee table, calculate the extent size, assuming that there are 5,000 employees currently, but the company expects to increase its employee count by 2,000 each year.
    c. Use row level locking for both tables.

    Create the tables in the stores demonstration database that you created earlier.

1.3 Use the SQL INSERT statement to insert data into both of the tables that you created. The sample data from the previous chapter may be used.


Solutions


Solution 1
1.1 a. Six dbspaces (note that this answer may vary according to classroom setup)
    b. One database was created for each student, plus a sysmaster database and a sysutils database.
    c. Unbuffered logging

1.2 The following is one solution to the exercise. To calculate extent size for the employee table:

    FIRST EXTENT based on 5,000 + 2,000 employees ==> 7,000 rows
    NEXT EXTENT based on 2,000 employees ==> 2,000 rows

        Column in employee Table    Column Length in Bytes
        empnum SERIAL               4
        firstname CHAR(20)          20
        lastname CHAR(20)           20
        salary DECIMAL(9,2)         6
        hiredate DATE               4
        deptnum SMALLINT            2
        TOTAL length                56

    Rowsize = 56 + 4 (for slot table entry) = 60
    Pageuse = 2048 - 28 = 2020
    # of rows per page = 2020/60 = 33.6 ==> 33 rows
    # of data pages
        FIRST EXTENT: 7,000/33 = 212.1 ==> 213 pages
        NEXT EXTENT: 2,000/33 = 60.6 ==> 61 pages
    Disk space in Kb for extents
        FIRST EXTENT: 213 * 2 Kb = 426 Kb
        NEXT EXTENT: 61 * 2 Kb = 122 Kb


CREATE TABLE employee
(
    empnum    SERIAL,
    firstname CHAR(20),
    lastname  CHAR(20),
    salary    MONEY(9,2),
    hiredate  DATE,
    deptnum   SMALLINT
)
EXTENT SIZE 426
NEXT SIZE 122
LOCK MODE ROW;

CREATE TABLE department
(
    deptnum  SMALLINT,
    deptname CHAR(15),
    deptmgr  CHAR(25)
)
LOCK MODE ROW;

1.3 INSERT INTO employee VALUES (0, "John", "Smith", 15000, "12/15/98", 1);
    INSERT INTO department VALUES (1, "Accounting", "B. Allen");



Module 4
Table Maintenance

09-2001. © 2001 International Business Machines Corporation.

Objectives
At the end of this module, you will be able to:
- Examine the structure of tables
- Manage temporary tables
- Rename databases and tables
- Alter and delete tables
- Make tables memory-resident
- Use unlogged tables for faster loading


System Catalog Tables


- systables - describes each table in the database

    SELECT * FROM systables WHERE tabname = "customer";

- syscolumns - describes each column in the tables

    SELECT * FROM syscolumns
    WHERE tabid = (SELECT tabid FROM systables WHERE tabname = "customer");

Whenever a database is created, system catalog tables are automatically created that store information about the database. Since the system catalog is stored in normal database tables, it can be queried like any other database table. You can use these tables to become familiar with the structure of the database.

The system catalog table systables describes each table in the database. It contains one row for each table, view, and synonym defined in the database, including the system catalog tables themselves. The information stored in the table includes the tablename, owner, row size, number of rows, number of columns, lock mode, size of extents, and number of indexes. The link between the information in systables and many of the other system catalog tables is the column tabid, the table identifier. The tabid for user tables is 100 or higher. The system catalog tables themselves have tabids less than 100.
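For instance, a minimal sketch of a query that lists the user tables in the current database along with the row size, number of rows, and lock mode recorded for each (all standard systables columns described in the appendix):

SELECT tabname, owner, rowsize, nrows, locklevel
FROM systables
WHERE tabid >= 100
ORDER BY tabname;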

The system catalog table syscolumns describes each column in the tables in the database. Each row contains a column name, the tabid of the table, the sequence number of the column within the table, the type of column, and the physical length.

A complete outline of the structure and contents of the columns in the system catalog tables is included in an appendix to this manual.

Table Maintenance 4-3

Temporary Tables

CREATE TEMP TABLE temp_order (order_num INTEGER) WITH NO LOG;

SELECT fname, lname, city FROM customer
    INTO TEMP tempcust WITH NO LOG;
Temporary tables can be created in dbspaces specifically created for temporary objects designated by the DBSPACETEMP environment variable or IBM Informix configuration parameter.

You can create a temporary table that is similar to a permanent table except that it is only valid for the duration of the session. There are no entries for a temporary table in the systables or syscolumns system catalog tables. You cannot alter temporary tables. You may create indexes on them. If you close the current database, the temporary table will be deleted. Alternatively, you can drop a temporary table by using the DROP TABLE statement.

Temporary dbspaces
Temporary tables should be created in a dbspace that is specifically designated for temporary tables within the IBM Informix system. A temporary dbspace does not accommodate logging. Therefore, temporary tables created in a temporary dbspace must be from an unlogged database or be created using the syntax WITH NO LOG. A dbspace is designated as temporary at the time the dbspace is created.

DBSPACETEMP
The DBSPACETEMP environment variable can be set to one or more of the specifically designated dbspaces. If the DBSPACETEMP environment variable is not set, IBM Informix uses the value of the DBSPACETEMP configuration parameter.
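For example, assuming two temporary dbspaces named tmpdbs1 and tmpdbs2 have been created, the environment variable could be set from a UNIX shell before starting the session:

DBSPACETEMP=tmpdbs1:tmpdbs2; export DBSPACETEMP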
4-4 Table Maintenance

Synonyms
CREATE SYNONYM ord FOR stores101:orders; CREATE SYNONYM cust FOR stores7@infmxchi:customer;

You can create a synonym for any table or view in any database on your database server or any networked database server. The following example shows a synonym for a table outside the current database:

CREATE SYNONYM ord FOR stores101:orders;

If the table is on a different database server on your network, that database server must be online when you create the synonym:

CREATE SYNONYM cust FOR stores7@infmxchi:customer;

You can create a synonym for a table that is being dropped or moved to a different location. If the synonym has the same name as the dropped or moved table, the change will be transparent to users. Synonyms are described in the system catalog tables systables, with a table type (tabtype) of "S", and syssyntable.

Table Maintenance 4-5

Privileges
Users have the same privileges for a synonym as for the table to which the synonym applies. If your database is MODE ANSI, or if you use the PRIVATE keyword when creating the synonym, the user must know the name of the owner of the synonym.
CREATE PRIVATE SYNONYM mystock FOR stores101:stock; SELECT * FROM joe.mystock; -- joe is the owner

4-6 Table Maintenance

Altering a Table

(Diagram: the test_tab table before the ALTER TABLE has columns col1 and col2; afterward it has columns col1, col3, and col4.)

ALTER TABLE test_tab
    DROP col2,
    MODIFY col1 INTEGER NOT NULL,
    ADD col4 INTEGER,
    ADD col3 CHAR(20) BEFORE col4;

Using the ALTER TABLE statement, you can:


n Add a column. Using the ADD clause, a column may be added to the table. The contents of the column will be NULL for rows already existing before the ALTER TABLE statement is run. You may specify where to put the column by adding the BEFORE clause. The following example puts the order_op column before the order_date column:

  ALTER TABLE test_tab ADD order_op INTEGER BEFORE order_date;

n Modify a column. You can use the MODIFY clause to change the data type, change the length, or allow/disallow nulls in a column. You must specify all existing attributes of a column (e.g., the UNIQUE and NOT NULL constraints) or they will be dropped. You may alter a table to change the data type if the new data type is compatible with the data already in the column.

n Drop a column. You may drop a column using the DROP clause. Dropping a column means that all data in that column will be lost.

Table Maintenance 4-7

In-Place ALTER TABLE


n Adding/dropping a column anywhere in the table
n Changing the length of a column
n Changing the type of a column

For many modifications that can be made to a table, In-Place Alter Table logic is used. As a result, the table is unavailable to users only while the table definition is updated. A copy of the table is not created. Instead, existing data rows are updated to the new definition only when they are modified. New rows are inserted with the new definition. This feature is used automatically.

Table Definition Versions


The in-place ALTER TABLE algorithm creates a new version of the table definition. Up to 255 versions of a table definition are allowed by the database server. Information about versioning is available using the oncheck utility:

oncheck -pT database:table

Each data page is associated with a version. After an in-place ALTER TABLE, new rows are inserted only into data pages with the new version. Therefore, the same extent may have version 0, version 1, and version 2 pages, for example. When rows on old pages are updated, all the rows on the data page are updated to the new version, if there is enough room. If there is not enough room, the row is deleted from the old page and inserted into a page with the new version.

4-8 Table Maintenance

When a table with multiple versions is queried, IBM Informix returns appropriate values for any columns that do not physically exist. Default values are returned for columns created with a default value, otherwise null values are returned. Each subsequent in-place ALTER TABLE statement on the same table takes more time to execute. IBM Informix recommends no more than 50 to 60 outstanding Alters on a table. If you wish to eliminate multiple versions of a table, force an immediate change to all rows. For example, use a dummy UPDATE statement that sets the value of a column to itself.
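As a minimal sketch (using the customer table and its lname column purely as an illustration), such a dummy UPDATE could look like the following; it touches every row so that each data page is rewritten with the newest version:

UPDATE customer SET lname = lname;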

Version 7.2 and earlier


In IDS version 7.2, all ALTER TABLE statements executing operations other than adding a column to the end of a table create a complete copy of the table. The table is exclusively locked for the duration of the operation. The old copy of the table is deleted when the statement has completed. For successful completion, there must be enough space in the dbspace to hold two copies of the table being altered. In all IBM Informix Dynamic Server versions earlier than 7.2, an ALTER TABLE statement that modifies the structure creates a complete copy of the table and locks the table exclusively for the duration of the operation. The old copy of the table is deleted when the statement has completed.

Logging and ALTER TABLE


The ALTER TABLE statement creates log entries even when the database is not logged. With large tables and an inadequate amount of log space, it may be difficult to avoid running out of log space or a long transaction. Using an in-place ALTER TABLE statement, each data page is logged at the time that the change physically takes place (that is, when a row is inserted or updated). The initial log entries from the in-place ALTER TABLE statement are very brief.

Table Maintenance 4-9

In-Place Alter Will Not Be Used If:


n Adding or Dropping a Column
  w The column is part of a fragmentation expression of a table
n Modifying a Column
  w Any data type conversions cannot be done without errors
  w The distribution of rows across fragments would change
  w A VARCHAR column is being modified

The In-Place Alter will not be used:


n Adding or Dropping a Column
  If the column being added or dropped is part of a fragmentation expression used to fragment a table, the In-Place Alter logic will not be used. If the dropped column is referenced by an index fragmentation expression, an error will be returned.

n Modifying a Column
  The modify operation is subject to more restrictions. In general, the operation will not be performed in place if all conversions between old and new data types cannot be done without errors, or if the distribution of rows across fragments would change as a result of that conversion. The modified column may not be a VARCHAR.

4-10 Table Maintenance

Next Extent Size and Lock Mode


ALTER TABLE orders MODIFY NEXT SIZE 300; ALTER TABLE orders LOCK MODE (ROW);


ALTER TABLE may be used to change the next extent size or lock mode for a table:
n You may change the next extent size. This will not alter any current extents that have been allocated, only the future ones. The example above changes the next extent size to 300 kilobytes.
n You may change the lock mode of the table to either PAGE or ROW.

Table Maintenance 4-11

7.31 Feature: Unlogged Tables


n Created in logged database
n Fast loading of very large tables
n No constraints or indexes

CREATE RAW TABLE sales (...); ALTER TABLE customer TYPE (RAW);


Beginning in version 7.31 of IBM Informix Dynamic Server an additional type of table, a raw permanent table, is available. Raw tables are not logged even though the database has logging. In prior versions only temporary tables could be non-logging. Tables with the normal behavior in a logged database are referred to as standard permanent tables. This feature supports fast loading of very large tables, such as those in a data warehouse. You can use any loading utility, High Performance Loader for example, to load raw tables. After the data is loaded, a level-0 backup should be performed. This provides a starting point from which to restore data if it should become necessary. To avoid concurrency problems and inconsistent data, you must make a raw table into a standard table before using the table in a transaction. Existing standard tables can be temporarily converted to raw tables for fast loading of new data. You will be required to drop any indexes or constraints, since those are not permitted in a raw table.

4-12 Table Maintenance

To quickly load a new table


1. Create a non-logging table:
   CREATE RAW TABLE tabname (columnname ...);
2. Load the table using a load utility.
3. Perform a level-0 backup of the table.
4. Alter the table to standard:
   ALTER TABLE tabname TYPE (STANDARD);
5. Create any indexes or constraints desired.

To quickly load an existing standard table


1. Drop indexes and constraints.
2. Alter the table to non-logging:
   ALTER TABLE tabname TYPE (RAW);
3. Load the table using a load utility.
4. Perform a level-0 backup of the table.
5. Alter the table to standard:
   ALTER TABLE tabname TYPE (STANDARD);
6. Re-create indexes and constraints.

Table Maintenance 4-13

Renaming Columns, Tables and Databases


RENAME COLUMN invoice.paid_date TO date_paid;
RENAME TABLE stock TO inventory;
RENAME DATABASE stores6 TO stores7;


If you rename a column that is referenced by a view in the database, the text of the view in the sysviews system catalog table is updated with the new name. If the column is referenced in a check constraint, the text of the check constraint is updated in the syschecks system catalog table. When a table is renamed, references to the table within any views are changed. The RENAME TABLE command operates on synonyms as well as tables. The table name is replaced if it appears in a trigger definition. It is not replaced if it is inside any triggered actions. The RENAME DATABASE command is available as of the 7.10.UD1 release of IBM Informix Dynamic Server. You must have DBA privilege or be the creator to rename a database. Renaming does not change the owner name.

In Stored Procedures
Column and table names within the text of a stored procedure are not changed by RENAME COLUMN or RENAME TABLE. The procedure will return an error when it references a nonexistent column or table.

4-14 Table Maintenance

Dropping Tables and Databases


DROP TABLE tabname;
n All references to the table in system catalog tables are deleted.
n The space occupied by the table is freed.

DROP DATABASE databasename;


n The system catalog tables are dropped.
n The space occupied by all tables is freed.


When you drop a table or database, the space occupied by the tables is freed and the data is no longer accessible. The DROP TABLE command will free up space that was allocated for the table to be used for other purposes. All references to the tables in the system catalog tables are deleted. Example: DROP TABLE orders; The DROP DATABASE command frees up space that was allocated for all the tables in the database. The system catalog tables are also dropped. Example: DROP DATABASE stores;

Table Maintenance 4-15

Memory Residency
n Table
n Fragment
n Index


SET TABLE state MEMORY_RESIDENT; SET TABLE state NON_RESIDENT;


The performance of a query may be enhanced if the object of the query does not have to be retrieved from disk. This new feature of version 7.3 allows one or more fragments of a table, the entire table, or a specific index to be given preferential treatment in the shared memory buffer pool. Its pages will be considered last for page replacement when a free buffer is requested by the database server. A user must have DBA permission on the table in order to set it memory-resident. To turn this feature off, use the syntax: SET TABLE tablename NON_RESIDENT; Recycling the database server will also turn this feature off. What type of tables might this be most useful on? What problems could overuse cause?

4-16 Table Maintenance

The DBSCHEMA Utility


dbschema -d stores7
dbschema -d stores7 -ss
dbschema -d stores7 -t orders schema.out


The DBSCHEMA utility is used to produce an SQL command file that contains the CREATE TABLE, CREATE INDEX, GRANT, CREATE SYNONYM, and CREATE VIEW statements required to replicate an entire database or a selected table. You must specify the database with the -d option. Additional options are shown below:

-t tabname     Only the table or view will be included. Specify all in place of tabname for all tables.
-s synname     CREATE SYNONYM statements for the user (synname) specified are included. Specify all in place of synname for all synonyms.
-p pname       Print only GRANT statements for the user listed. Specify all in place of pname for all users.
-f stproc      Print the stored procedure listed. Specify all in place of stproc for all stored procedures.
-hd tabname    Displays distribution information. Specify all in place of tabname for all tables.
-ss            Generates server-specific information for the specified table including the lock mode, extent sizes, and dbspace name.

Table Maintenance 4-17

-r rolename       Generate CREATE and GRANT statements for the role specified, or enter all for all roles.
outputfilename    Sends the output to the named file.

4-18 Table Maintenance

Exercises

Table Maintenance 4-19

Exercise 1
1.1 One way to determine the structure of a database or tables is to examine the SQL commands that would be needed to replicate it. Execute the dbschema utility in order to obtain a file of all the SQL commands for the database you created earlier. Place the output of the dbschema utility in schema.sql, in your home directory.

1.2 The structure of the database and tables can also be examined by querying the system catalog tables. Execute an SQL command to list the user tables in the database that you created, as well as some additional information about the tables. An explanation of the columns in systables is in the system catalog appendix to this manual.

4-20 Table Maintenance

Exercise 2
To understand how IN-PLACE ALTER and table versioning work:

2.1 Execute oncheck against the employee table and notice the version information at the end of the output (the count column refers to the number of data pages):
    oncheck -pT <database name>:<table name>
2.2 Alter the employee table to add an ssn column before the department number.
2.3 Add another row to the table using the following SQL command:
    INSERT INTO employee (empnum) VALUES (0);
2.4 Execute oncheck again and notice any differences in the version information.

Table Maintenance 4-21

4-22 Table Maintenance

Solutions

Table Maintenance 4-23

Solution 1
1.1 At the operating system prompt:
    dbschema -d login_name -ss schema.sql

1.2 SELECT * FROM systables WHERE tabid > 99;

4-24 Table Maintenance

Solution 2
2.1 Version        Count
    0 (current)    1

2.2 ALTER TABLE employee ADD ssn CHAR(11) BEFORE deptnum;

2.3 INSERT INTO employee (empnum) VALUES (0);

2.4 Version        Count
    0 (oldest)     1
    1 (current)    1

This demonstrates an in-place ALTER TABLE. The table definition has been changed but physically the rows have not been altered. One data page exists with the old definition and one with the new definition.

Table Maintenance 4-25

4-26 Table Maintenance

Module 5
Indexes and Indexing Strategy

Indexes and Indexing Strategy 09-2001 2001 International Business Machines Corporation

5-1

Objectives
At the end of this module, you will be able to:
n Explain how an index is built and maintained
n List four types of indexes
n Recognize the costs and benefits of indexing
n Index appropriate columns
n Estimate the storage space required for an index

5-2 Indexes and Indexing Strategy

Index Structure
(Diagram: a three-level B+ tree. The root node at level 2 and the branch nodes at level 1 hold keys and pointers to lower-level nodes; the leaf nodes at level 0 hold the key values (for example 56, 57, 59, 89, 95, 97, 292, 293, 294, 387, 393, 394, 401) together with pointers to the data rows.)

An index is used to find a row quickly, similar to the way an index is used in a book. Indexes are organized in B+ trees. A B+ tree is a set of nodes that contain keys and pointers that are arranged in a hierarchy. The size of a node is the size of one page. The B+ tree is organized into levels . Level 0 contains a pointer, or address, to the actual data. The other levels contain pointers to nodes on different levels that contain keys that are less than or equal to the key in the higher level. In the example above, the 292 key has a pointer to the level 0 node with keys less than or equal to 292 and greater than 89. When you access a row through an index, you read the B+ tree starting at the root node and follow the nodes down to level 0, which contains the pointer to the data. In the example above, three read operations will be required to find the pointer to the data.

Indexes and Indexing Strategy 5-3

B+ Tree Splits

(Diagram: before the split, a full level-1 node holds keys 150, 292, 378, and 414, and key 88 must be added; after the split, keys 88, 150, and 292 are in one node, keys 378 and 414 in another, and key 292 is promoted to the next level.)

When a node gets full, it must be split into two nodes. B+ trees grow toward the root. Attempting to add a key into a full node forces a split into two nodes and promotion of the middle key value into a node at a higher level. If the key value that causes the split is greater than the other keys in the node, it is put into a node by itself during the split. The promotion of a key to the next higher level can also cause a split in the higher level node. If the full node at this higher level is the root, it also splits. When the root splits, the tree grows by one level and a new root node is created. In the example above, key 88 needs to be added but the node is full. A split forces half the keys (378 and 414) into one node and half the keys (292, 150, and 88) into the other on the same level. Key 292 will be promoted to the next highest level. Using this method, it is impossible for a B+ tree to be unbalanced (having different levels in different parts of the tree).

5-4 Indexes and Indexing Strategy

Indexes: Unique and Duplicate


(Diagram: a unique index on customer_num (105, 106, 113, 114, 115) and a duplicate index on lname (Albertson, Beatty, Currie, Higgins, Higgins), each entry pointing to its row in the customer data.)

There are four characteristics associated with indexes: unique, duplicate, composite, and cluster. An index must be either unique or duplicate . In addition, it may or may not be composite, and it may or may not be clustered. A unique index allows no more than one occurrence of a value in the indexed column. Therefore, a unique index prohibits users from entering duplicate data into the indexed column. For column(s) serving as a table's primary key, a unique index ensures the uniqueness of every row. A duplicate index allows identical values in different rows of an indexed column.

Indexes and Indexing Strategy 5-5

Composite Index
(Diagram: a composite index on the customer_num, lname, and fname columns of the customer table.)

An index on two or more columns is a composite index. IBM Informix Dynamic Server allows up to sixteen columns in a composite index, with a maximum key size of 255 bytes. When you create a composite index to improve query performance, some queries on the component columns can also take advantage of the index. The composite index above can be used for the following queries:

n Joins on customer_num, or customer_num and lname, or customer_num, lname, and fname
n Filters on customer_num, or customer_num and lname, or customer_num, lname, and fname
n ORDER BY or GROUP BY on customer_num, or customer_num and lname, or customer_num, lname, and fname
n Joins on customer_num and filters on lname and fname
n Joins on customer_num and lname, and a filter on fname
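For illustration, a composite index like the one in the diagram, and a query that can use it for both the filter and the ordering, might look like the following (the index name is illustrative):

CREATE INDEX ix_cust_name ON customer(customer_num, lname, fname);

SELECT customer_num, lname, fname
FROM customer
WHERE customer_num BETWEEN 110 AND 120
ORDER BY customer_num, lname;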

5-6 Indexes and Indexing Strategy

Cluster Indexes

customer table                       after clustering by lname

customer_num  lname    ...           customer_num  lname
101           Pauli    ...           103           Currie
102           Sadler   ...           104           Higgins
103           Currie   ...           101           Pauli
104           Higgins  ...           102           Sadler
...           ...      ...

Information stored in a database is extracted from the disk in blocks (sections of disk space). Through clustering , you can cause the physical placement of data on disk to be in indexed order. Only one cluster index can exist per table. Clustered indexes can increase the efficiency of data retrievals when the retrievals are in similar order as the index. By placing rows that are frequently used together in close physical proximity you can substantially reduce disk access time. In the example above, the data row is ordered physically by lname . A SELECT statement retrieving many rows in the customer table in order by lname will be more efficient, especially if the table is large. The rows in a table with a cluster index are not re-ordered when rows are added or removed from the table. However, you can re-order the table by re-clustering the index. Cluster indexes are used most effectively on static tables and are less effective on dynamic tables. To maintain the effectiveness of a cluster index, it is a good idea to re-cluster the index used on a dynamic table frequently. When you cluster an existing index, a complete copy of the table is made on disk while the clustering is being done. You must have sufficient disk space to contain two copies of the table. Clustering and re-clustering takes a lot of space and time. The table is locked in EXCLUSIVE mode while the clustering is being done. You can avoid some clustering by loading data into the table in the desired order in the first place.
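A minimal sketch (the index name is illustrative): creating a cluster index on lname orders the customer table physically by last name, and the same index can be re-clustered later, after many rows have been added, to restore that order.

CREATE CLUSTER INDEX ix_cust_lname ON customer(lname);
ALTER INDEX ix_cust_lname TO CLUSTER;  -- re-cluster later to restore the physical order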

Indexes and Indexing Strategy 5-7

The CREATE INDEX Statement


CREATE INDEX ix_orders ON orders(customer_num) IN dbs1;

CREATE UNIQUE INDEX ix_stock ON stock (manu_code, stock_num);

CREATE UNIQUE CLUSTER INDEX ix_manufact ON manufact(manu_code);

CREATE INDEX ix_man_stk ON items(manu_code DESC, stock_num);

Use the CREATE INDEX statement to create a unique or duplicate index, and optionally, to cluster the physical table in the order of the index. IBM Informix Dynamic Server indexes can be referenced in ascending and descending sequence. It is not necessary to create two separate indexes. The table is locked in exclusive mode while the index is being created. The examples above show different ways to create an index.
n The first example creates a duplicate index called ix_orders on the customer_num column in the dbspace dbs1. If you do not specify the dbspace for an index, it will be stored in the same dbspace as the table.
n The second example creates a unique composite index on two columns: manu_code and stock_num.
n The third example creates a unique cluster index on the manu_code column.
n The fourth example creates a duplicate composite index with manu_code in descending order (the default is ascending order). The keywords ASC or DESC are still needed in cases where a composite index is created and the component columns are accessed in different orders.

5-8 Indexes and Indexing Strategy

Index Fill Factor

CREATE INDEX state_code_idx ON state(code) FILLFACTOR 80 IN dbs1;

Percentage of each index page that will be filled during index build.

The index fill factor is the percentage of each index page that will be filled during the index build. This percentage can be set with the CREATE INDEX statement. If it is not specified with CREATE INDEX, the default becomes the value specified in the IDS configuration parameter FILLFACTOR. If the fill factor is not specified in either location, the default fill factor is 90 percent. If you do not anticipate many new inserts into the table after the index is built, you can set the FILLFACTOR higher when you create the index. If you are expecting a large number of inserts into the table, you can leave the FILLFACTOR at the default value or set it lower. If the FILLFACTOR is set too low, you risk a decrease in the cache rate and an unnecessary increase in the amount of disk space needed for the index. The fill factor is not kept during the life of the index. It is only applied once as the index is built. It does not take effect unless the table is fragmented, or there are 5,000 rows in the table occupying at least 100 pages of disk space.

Note
The DBSCHEMA utility will not list the fill factor if it is specified in an index.

Indexes and Indexing Strategy 5-9

Managing Indexes

ALTER INDEX ix_man_cd TO CLUSTER;
ALTER INDEX ix_man_cd TO NOT CLUSTER;
DROP INDEX ix_stock;
RENAME INDEX ix_stock TO newix_stock;
7.31 FEATURE


You can change the cluster attribute of an index with the ALTER INDEX statement, as in the first two examples above. The syntax of the statement is somewhat misleading. The real effect is to re-order the actual rows in the table. When you use the NOT CLUSTER option, the cluster attribute on the index name is dropped, but the physical table is not affected. Whatever clustering of rows existed will remain. Because only one clustered index per table can exist, you must use the NOT option to release the cluster attribute from one index before you assign it to another. You cannot use ALTER INDEX to change the components of an index. You must delete and recreate the index. To delete an index, use the DROP INDEX statement. You cannot use the DROP INDEX statement to drop an index that is created as a unique constraint by the CREATE TABLE or ALTER TABLE statement. You must use the ALTER TABLE statement to remove these indexes. Beginning with version 7.31, you can rename an existing index. To determine the existing indexes on a table, you may use utilities such as DB-Access, DBSCHEMA, or IECC, or you may query the system catalog table sysindexes.

5-10 Indexes and Indexing Strategy

Benefits of Indexing
n Improve performance for data retrieval by replacing sequential reads with non-sequential (indexed) reads
n Improve performance during data sorting
n Ensure uniqueness of key values
n Avoid reading row data at all when processing queries that retrieve only indexed columns (key-only selects)


Without an index, tables are accessed sequentially (that is, every row in the table is read in the physical order of the data file). Placing an index will allow the optimizer to choose to replace the sequential read of the table with an indexed read when it will improve performance. An index on a column or columns can be used to retrieve data in a sorted order. By performing an indexed read (a read of the table via an index), rows returned will automatically be in sorted order. This prevents the database server from having to sort the output data. By creating an index on a column with the UNIQUE keyword, only one row in the table can have a column with that value. This prevents the need to perform any uniqueness checking through the application program. When all columns listed in the SELECT clause are part of the same index, IBM Informix can read the index instead of reading the data rows. This can greatly reduce the amount of I/O needed to process such a query.
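As an illustrative sketch, with the composite index on the stock table's manu_code and stock_num columns created earlier, a query such as the following can be satisfied from the index alone, without reading the data rows (the filter value is illustrative):

SELECT manu_code, stock_num
FROM stock
WHERE manu_code = "HRO";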

Indexes and Indexing Strategy 5-11

Index Join Columns


(Diagram: the orders and customer tables joined on customer_num, with a duplicate index on orders.customer_num and a unique index on customer.customer_num.)

In order to replace sequential reads with indexed reads, there should be an index on at least one column named in any join expression. If there is no index, the database server will either:
n Build a temporary index before the join and perform a nested loop join
n Sequentially scan the table and perform a hash join

When there is an index on both columns in a join expression, the optimizer has more options when it constructs the query plan.

OLTP environments
As a general rule in OLTP environments, place an index on any column that is frequently used in a join expression. Primary and foreign keys are automatically indexed by the system. If you decide to index only one of the tables in a join, index the table with unique values for the key corresponding to the join column(s). A unique index is preferable to a duplicate index for implementing joins.

DSS environments
As a general rule, in Decision Support (DSS) environments large amounts of data are read and sequential table scans are performed. Indexes may not play an optimal role in implementing joins, since hash joins may be preferred.
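To illustrate the OLTP guideline above: with the duplicate index on orders.customer_num created earlier (customer.customer_num is already indexed as the primary key in the demonstration database), a join such as the following can be implemented with indexed reads rather than sequential scans:

SELECT c.lname, o.order_num
FROM customer c, orders o
WHERE c.customer_num = o.customer_num;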

5-12 Indexes and Indexing Strategy

Index Filter Columns


(Diagram: an index on the zipcode column of a large mail table.)

If a column is often used to filter the rows of a large table, consider placing an index on it. The optimizer can use the index to pick out the desired rows, avoiding a sequential scan of the entire table. An example is a table containing a large mailing list. If you find that a zipcode column is often used to filter out a subset of rows, you should consider putting an index on it even though it is not used in joins. This strategy will yield a net savings of time only when the selectivity of the column is high, that is, only when there are not a lot of duplicate values in that column. Non-sequential access through an index takes more disk I/O operations to retrieve many rows than sequential access, so if a filter expression will cause a large percentage of the table to be returned, the database server might as well read the table sequentially. Generally, indexing a filter column will save time when:
n The column is used in filter expressions in many queries or in queries of a large table
n There are relatively few duplicate values
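A minimal sketch, using the mail table from the diagram above (table and index names are illustrative): an index on zipcode lets the optimizer pick out the matching subset of rows directly.

CREATE INDEX ix_mail_zip ON mail(zipcode);

SELECT * FROM mail WHERE zipcode = "94086";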

Indexes and Indexing Strategy 5-13

Index Columns Involved in Sorting

(Diagram: an index on order_date, whose keys are in sorted order, pointing into the orders table, where the order_date values are stored in unsorted physical order.)

When a large quantity of rows has to be ordered or grouped, the database server will sort the selected rows via a sort package before returning them to the front-end application. If, however, there is an index on the ordering column(s), the optimizer will sometimes plan to read the rows in sorted order through the index, avoiding the final sort. Whether the index is used depends upon the complexity of the query. Since the keys in an index are in sorted sequence, the index really represents the result of sorting the table. By placing an index on the ordering column(s), you can eliminate many sorts during queries. The example above shows a table whose data is not sorted. Without an index on order_date, the server would have to sort the data. With an index on order_date, the server only needs to read the index (which is in order) to retrieve the data by order date.

5-14 Indexes and Indexing Strategy

Avoid Highly Duplicative Indexes

Avoid indexing columns with many duplicate values

(Diagram: a table column named sex containing only the values m and f.)

When duplicate keys are permitted in an index, the entries that have any single value are grouped in a list. When the selectivity of the column is high, these lists will be short. But when there are only a few unique values, the lists become quite long. For example, in an index on a column whose only values are m for male and f for female, all the index entries are contained in just two lists of duplicates. Such an index is not very useful.

Updating an Index with Many Duplicates


When an entry has to be deleted from a list of duplicates, the server has to read the whole list and rewrite some part of it. When adding an entry, the database server puts the new row at the end of the list. Neither operation is a problem until the number of duplicate values becomes very high. The server is forced to perform many I/O operations to read all the entries, in order to find the end of the list. When it deletes an entry, it will typically have to update and rewrite half of the entries in the list. When such an index is used for querying, performance can also degrade because the rows addressed by a key value may be spread out over the disk. Imagine an index addressing rows whose location alternates from one part of the disk to the other.

Indexes and Indexing Strategy 5-15

As the database server tries to access each row via the index, it must perform one I/O for every row read. It will probably be better off reading the table sequentially and applying the filter to each row in turn. If it is important to index a highly duplicate column, you may consider forming a composite key with another column that has few duplicate values.

5-16 Indexes and Indexing Strategy

Avoid Heavy Indexing of Volatile Tables

(Diagram: a volatile table receiving frequent inserts, deletes, and updates, each of which must also be applied to every index on the table.)

Because of the extra reads that must occur when indexes are updated, some degradation will occur when there are many indexes on a table that is being updated frequently. An extremely volatile table should probably not be heavily indexed unless you feel that the amount of querying on the table outweighs the overhead of maintaining the index file. Indexes can be dropped and recreated. During periods of heavy querying (for example, reports) you can improve performance by creating an index on the appropriate column. Creating indexes for a large table, however, can be a time-consuming process. Also, while the index is being created, the table will be exclusively locked, preventing other users from accessing it.

Indexes and Indexing Strategy 5-17

Create Composite Indexes


w To join tables on multiple columns
w To filter on multiple columns in a table
w To increase uniqueness
w To sort on multiple columns in a table

Composite indexes can improve performance in several ways.


n Composite indexes facilitate joining tables on multiple columns. If several columns of one table join with several columns in another table, create a composite index on the columns of the table with the larger number of rows.

n If queries frequently filter on multiple columns in a table, create a composite index corresponding to the filter columns used in the query.

n Use a composite index to speed up an INSERT into an indexed column with many duplicate values. Adding a unique (or more unique) column to a column that has many duplicate values will increase the uniqueness of the keys and reduce the length of the duplicate lists (previously mentioned). The query will be able to perform a partial key search using the first (highly duplicate) field, which will be faster than searching the duplicate lists.

n When a table is commonly sorted on several columns, a composite index corresponding to those columns can sometimes be used to implement the ordering.

5-18 Indexes and Indexing Strategy

Keep Key Size Small

Key size should be small relative to the row size.

More key values can be stored in a node of a B+ tree if the values are small.


Because an index can require a substantial amount of disk space to maintain, it is best to keep the size of the index small relative to the row size. It is important to keep key size to a minimum for two reasons:
n One page in memory will hold more key values, potentially reducing the number of read operations needed to look up several rows.
n A smaller key size may result in fewer B+ tree levels. This is very important from a performance standpoint. An index with a 4-level tree requires one more read per row than an index with a 3-level tree. If 100,000 rows are read in an hour, that means 100,000 fewer reads to get the same data.

When the rows are short or the key values are long, it may be more efficient to just read the table sequentially. There is, of course, a certain break-even point between the size of a key and the efficiency of using that index, though this will vary according to the number of rows in the table. An exception to this is key-only selects. If all the columns selected in the query are in the index, the table data will not be read, thus increasing the efficiency of using such an index.

Indexes and Indexing Strategy 5-19

ESTIMATING INDEX SPACE REQUIREMENTS


1.  Add the widths of all the columns in the index.
2.  Add 4 bytes for the slot table entry. The result is keysize.
3.  a. Unique indexes - calculate the index entry size (entrysize).
       Non-fragmented tables: entrysize = keysize + 4-byte row pointer + 1-byte delete flag
       Fragmented tables: entrysize = keysize + 4-byte row pointer + 1-byte delete flag + 4-byte fragment id
    b. Duplicate indexes - calculate the index entry size (entrysize). First, calculate the proportion of unique entries in the index to the total number of rows in the table (propunique).
       propunique = # of unique entries / # of rows in table
       Use this value to estimate the index entry size.
       Non-fragmented tables: entrysize = (keysize * propunique) + 5
       Fragmented tables: entrysize = (keysize * propunique) + 9
4.  Determine the page length in bytes by subtracting 28 (for the header) from the page size (2k or 4k depending on your system) to obtain the usable page space (pageuse).
       pageuse = page size - 28
5.  Determine the number of entries per index page (round down):
       # of entries per page = pageuse / entrysize
6.  Determine the number of leaf pages (level 0) that are needed (round up):
       # of leaves = # of rows in the table / # of entries per page
7.  Adjust for the fill factor if necessary.
8.  Determine the number of branch pages at the next level of the index (round up):
       # of branches = # of leaves / # of entries per page
    If # of branches is greater than one, additional branch levels are needed. Determine the number of branches needed at the next level:
       # of new branches = previous number of branches / # of entries per page
9.  Continue this calculation until the number of new branches needed equals 1. Add the number of leaves plus the total number of branches for all levels to get the total number of pages needed.
       # of leaves + # of branches = total pages
10. Determine the total space needed in kilobytes.
       disk space = page size * total pages needed

This method will yield a conservative (high) estimate of the space needed.

5-20 Indexes and Indexing Strategy

Calculating disk space for an index: example using manu_code from items table (duplicate index)
1.  Add the widths of all the columns in the index.

       Column name    Data type    Bytes
       manu_code      char(3)      3
       TOTAL                       3

2.  Add 4 bytes for the slot table entry.
       keysize = 3 + 4 = 7
3.  Duplicate indexes - calculate the index entry size (entrysize).
       SELECT COUNT(DISTINCT manu_code) FROM items;
       -- propunique = 9/67 = .13
    Use this value to estimate the index entry size.
       entrysize = (7 * .13) + 5 = 5.91 = 6 bytes
4.  Determine the usable page length in bytes (for a 2k page).
       pageuse = 2048 - 28 = 2020
5.  Determine the number of entries per index page (round down):
       # of entries per page = 2020/6 = 336.6 = 336
6.  Determine the number of leaf pages (level 0) that are needed (round up):
       # of leaves = 67/336 = .19 = 1
7.  Determine the number of branch pages at the next level of the index (round up):
       # of branches = 1/336 = .001 = 1
8.  No additional branch levels are needed.
9.  Add the number of leaves plus the total number of branches.
       1 + 1 = 2
10. Determine the total space needed in kilobytes.
       disk space = 2k * 2 = 4k

Alternative method of estimating space needed:


Use a pointer of 5 or 9 bytes as appropriate, depending on whether the table is fragmented.

space needed = ((key size + 4) * (# of unique values)) + ((5 or 9) * number of rows)

When calculating the number of leaf and twig pages, a fudge factor (perhaps 10%) should be added for the repetition on each page.

Indexes and Indexing Strategy 5-21

INDEXES AND EXTENT SIZE


n If table is not fragmented and the index is created in the same dbspace as the table:
  w Data and index pages are interleaved in the extent.
  w Table extent size must be increased to hold the index.
n If table is fragmented or index is created in a separate dbspace:
  w Index pages are stored in a separate extent.
  w Index extent size is automatically calculated based on table extent size.

When indexes are created in the same dbspace as the table and the table is not fragmented, the data and index pages are interleaved in the extent. The extent size for the table must be increased to allow for storage of the index.

size of table extent = disk space needed for table + disk space needed for index

If the index is created in a separate dbspace, or if the table is fragmented, the index pages are not stored in the same extent as the data pages. It does not matter if the index is fragmented or non-fragmented. The index extent size is automatically determined, based on the extent size for the table. In this case, the index extent size can be calculated as follows:

size of index extent = (index row size/table row size) * table extent size
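A worked sketch with assumed numbers: if the index entry size is 10 bytes, the table row size is 100 bytes, and the table extent size is 1000 KB, the index extent would be sized at roughly (10/100) * 1000 KB = 100 KB.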

5-22 Indexes and Indexing Strategy

Costs of Indexing

Disk Space Costs

Processing Time Costs


(Diagram: the index occupies disk space alongside the table data, and every insert, delete, or update to the table must also be applied to the index.)

The first cost associated with an index is one of disk space. An index contains a copy of every unique data value in the indexed column(s) and an associated 4-byte slot table entry. It also contains a 4-byte pointer for every row in the table and a 1-byte delete flag. For indexes on fragmented tables, the 4-byte pointer is expanded to 8 bytes to accommodate a fragment ID. This can add many blocks or pages to the space requirements of the table. It is not unusual to have as much disk space dedicated to index data as to row data.

The second cost is one of processing time while the table is modified. Before a row is inserted, updated, or deleted, the index key must be looked up in the B+ tree first. Assume an average of two I/Os are needed to locate an index entry. Some index nodes may be in shared memory, though other indexes that need modification may have to be read from disk. Under these assumptions, index maintenance adds time to different kinds of modifications as follows:

Deleting rows - the related entries are deleted from all indexes. Null values are entered in the row in the data file.

Inserting rows - the related entries are inserted in all indexes. The node for the inserted row's entry is found and rewritten for each index. Many insert and delete operations can also cause a major restructuring of the index (as they are implemented using B+ trees), requiring more I/O activity.

Indexes and Indexing Strategy 5-23

Updating rows - the related entries are looked up in each index that applies to a column that was altered. The index entry is rewritten to eliminate the old entry, then the new column value is located in the same index and a new entry made.

5-24 Indexes and Indexing Strategy

Mass Updates to a Table


(Diagram: a table receiving massive updates or large loads.)

1. Disable or drop indexes
2. Do update
3. Enable or re-create the indexes

In some applications, the majority of table updates can be confined to a single time period such as overnight or at the end of the month. When this is the case, consider dropping or disabling all non-unique indexes while the updates are being performed and re-creating them afterward. The presence of indexes also slows down the population of tables. Loading a table that has no indexes at all is a very quick process (little more than a disk-to-disk sequential copy), but updating indexes adds a great deal of overhead. Dropping or disabling the indexes can have two positive effects.
n Since there are fewer indexes to update, the updating program is likely to run faster. It is often the case that the total time to drop the indexes, update without them, and re-create them afterward is less than the time to update with the indexes in place.

n Newly-made indexes are the most efficient ones. Frequent updates tend to dilute the index structure, causing it to contain many partly-filled index nodes. This reduces the effectiveness of an index, as well as wasting disk space.

Another time-saving measure is making sure that a batch updating program calls for rows in the sequence defined by the primary key index. That will cause the pages of the primary key index to be read in order and only one time each.
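A minimal sketch of the drop/re-create approach described above (the index and table names are illustrative):

DROP INDEX ix_zip;
-- ... perform the mass load or batch update here ...
CREATE INDEX ix_zip ON mail(zipcode);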

Indexes and Indexing Strategy 5-25

Indexes and Empty Extents

ALTER INDEX item_idx TO CLUSTER;

ALTER INDEX . . . TO CLUSTER causes the table to be re-written, freeing unused space for other tables.

The ALTER INDEX TO CLUSTER statement will physically re-structure the table, packing it and freeing up space used by any extents. The new table may still have many extents even after having compressed all the rows, particularly if there are many tables in the same dbspace. To restructure your table so that there are fewer, larger, extents, you must unload the data and rebuild the table specifying an appropriate EXTENT SIZE and NEXT SIZE.

5-26 Indexes and Indexing Strategy

Fast Indexing
n Indexes built in parallel
n Parallel sort package used
n PDQPRIORITY can increase memory available


In addition to the CREATE INDEX statement, adding a unique, primary-key, or referential constraint, or enabling one of these constraints, may result in a parallelized index build. The IBM Informix parallel sort package will be used for parallel index builds, to sort the keys prior to their insertion into the index. If the table is larger than the available memory, some disk I/O may be done. The PDQPRIORITY environment variable can be used to determine the maximum amount of memory that can be used, as well as the maximum number of scan threads.

Location of temporary sort files


Temporary files created by sorting can be stored in the directory that you specify by setting the environment variable PSORT_DBTEMP. If this is set to more than one directory, the temporary files are stored in a round robin fashion in the directories listed. If the PSORT_DBTEMP environment variable is not set, then the dbspaces included in the DBSPACETEMP environment variable or the DBSPACETEMP configuration parameter are used (temporary files are fragmented across the dbspaces). If both the environment variables and configuration parameter are not set, then the temporary files are stored in the /tmp directory.
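For example (the directory path is an assumption for illustration), the sort files could be pointed at a dedicated file system from a UNIX shell:

PSORT_DBTEMP=/work/tempsort; export PSORT_DBTEMP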

Indexes and Indexing Strategy 5-27

SYSINDEXES
SELECT sysindexes.*
FROM sysindexes, systables
WHERE tabname = "items"
AND systables.tabid = sysindexes.tabid;


The system catalog table sysindexes describes each index on a column in the database. Each row contains an index name, the owner, the tabid of the table, the index type, and other information such as whether the index is clustered and the column numbers of the columns of the index. The sysindexes table can be queried to get index information for a particular table. The SELECT statement above returns index information for the items table in the stores database. The structure and contents of the sysindexes table is explained in detail in the system catalog appendix to this manual.

5-28 Indexes and Indexing Strategy

Exercises

Indexes and Indexing Strategy 5-29

Exercise 1
Creating and Using Indexes
Complete this exercise using the tool specified by your instructor to enter the SQL statements. This exercise requires the employee and department tables that you created in an earlier exercise.

1.1 The HR department often must look up an employee by the last name. Add an index to make this operation more efficient.
1.2 The HR department also frequently runs an employee report that lists all the employees in order by last name. Change the index you created in step 1.1 to make this operation more efficient.
1.3 Create an index for the department table that guarantees that the department number is unique.

The following illustrates the use of the oncheck utility to monitor index growth.

1.4 Execute the following command against the employee table and pipe the output to more (this oncheck command will lock the table in share mode):
    oncheck -pT <database name>:employee | more
1.5 The items table in the stores demonstration database has more rows than your employee table. Execute oncheck against the items table and answer the following questions:
    a. How many keys are there in the items table?
    b. How many pages are allocated to the table? What do they contain?
    c. How many levels are there in each index? How many free bytes?

5-30 Indexes and Indexing Strategy

Solutions

Indexes and Indexing Strategy 5-31

Solution 1
1.1 The HR department often must look up an employee by the last name. Add an index to make this operation more efficient.
    CREATE INDEX lname_idx ON employee(lastname);
1.2 Change the index you created in step 1.1 to make this operation more efficient.
    ALTER INDEX lname_idx TO CLUSTER;
1.3 Create a unique index on department.
    CREATE UNIQUE INDEX dep_idx ON department(deptnum);
1.4 No solution required.
1.5 Execute an oncheck command against the items table.
    oncheck -pT <database name>:items | more
    a. How many keys are there in the items table? 3
    b. How many pages are allocated to the table? What do they contain? 8 pages are allocated; 1 data page, 3 index pages, 1 bit map page
    c. How many levels are there in each index? How many free bytes? 1 level; 1015 free bytes, 1501 free bytes, 1334 free bytes

5-32 Indexes and Indexing Strategy

Module 6
Fragmentation

Fragmentation 09-2001 2001 International Business Machines Corporation

6-1

Objectives
At the end of this module, you will be able to:
n List the ways to fragment a table and index
n Create a fragmented table and index
n Alter a fragmented table and index

6-2 Fragmentation

Fragmentation

Fragmentation is the distribution of data from one table across separate dbspaces.

(Diagram: rows from a_table distributed across dbspace 1, dbspace 2, and dbspace 3.)

Using IBM Informix Dynamic Server, tables and indexes can be fragmented within the IBM Informix system. Fragmentation provides the ability to distribute data from a table on separate disks. IBM Informix implements fragmentation (sometimes known as horizontal fragmentation or partitioning) by placing each fragment in a separate dbspace. Presumably, each fragment is located on a separate disk and may have one or more chunks.

The Goal of Fragmentation


A goal in fragmentation is to balance I/O and maximize throughput across multiple disks. This is a very difficult task to accomplish without fragmentation. IBM Informix's fragmentation provides an intelligent method for grouping and distributing data in a table across many disk drives to help achieve this goal.

Local Fragmentation Only


IBM Informix Dynamic Server provides for local fragmentation. It does not support distributed or remote tables where a fragment of the table is located in another database or IBM Informix system.

Fragmentation 6-3

Advantages of Fragmentation
Advantages of fragmentation include:
n Parallel scans and other parallel operations
n Balanced I/O
n Finer granularity of archives and restores
n Higher availability

The primary advantages of fragmentation include:

Parallel scans - If you are in a decision support (DSS) environment and using the Parallel Database Queries (PDQ) features in IBM Informix Dynamic Server, the server can read multiple fragments in parallel. This is advantageous to DSS queries where large amounts of data are read.

Other operations - Other parallelized operations include: joins, sorts, aggregates, groups, and inserts.

Balanced I/O - By balancing I/O across disks, you can reduce disk contention and eliminate bottlenecks. This is advantageous in OLTP systems where a high degree of throughput is critical.

Archive and restore - Fragmentation provides for a finer granularity of archives and restores. An archive and restore can be performed at the dbspace level. Since a fragment resides in a dbspace, this means that an archive and restore can now be performed at the fragment level.

Higher availability - You can specify whether to skip unavailable fragments in a table. This is advantageous in DSS, where large amounts of data are read and processing should not be interrupted if a particular fragment is unavailable.

6-4 Fragmentation

Parallel Scans and Fragmentation

(Diagram: three scan threads, each reading a different fragment of the same table in parallel.)

One of the benefits of fragmentation is that it enables parallel scans. A parallel scan is the simultaneous access of multiple fragments from the same table. In IBM Informix Dynamic Server, a single query may have multiple threads of execution and each thread can potentially access a different fragment. A query may have multiple threads on a single processor machine but only one thread will execute at a time. The optimum situation is to execute parallel queries on a multiprocessor machine where many threads can execute simultaneously. This is especially useful with DSS queries.

DSS Queries
n Many rows are read and result in no (or very little) transaction activity
n Data is read sequentially
n Complex SQL operations execute
n Large temporary files are created
n Response times are measured in hours and minutes
n There are relatively few concurrent queries

Fragmentation 6-5

Parallel Scans (PDQ Queries)


n Environment variable: PDQPRIORITY=40
n SQL statement: SET PDQPRIORITY 40

Parallel Database Query, or PDQ, is the feature of IBM Informix Dynamic Server that permits queries to be parallelized. A query is parallelized only if it is designated as a PDQ Query (except for sorting). This can be done at the session level by setting an environment variable, PDQPRIORITY. The SQL statement SET PDQPRIORITY can be used to enable PDQ for individual queries. The valid values for PDQPRIORITY are shown on the next page.

Important!
Since PDQ queries can take up more resources than non-PDQ queries, this feature should generally be reserved for decision support (DSS) queries.

6-6 Fragmentation

The valid values for PDQPRIORITY are:

percent-of-resources - This indicates the percent of resources that a database server uses in order to answer the query. Resources include the amount of memory and the number of processors. The higher the number, the more resources the database server uses. The System Administrator can limit the resources that are available for PDQ. Range = -1, 0, 1 to 100.

LOW - Only parallel scans are enabled. No other forms of parallelism are used. This is the equivalent of a setting of 1.

HIGH - The database server determines an appropriate value to use for PDQPRIORITY, based on the number of available processors, fragmentation of the tables, and so on.

DEFAULT - Uses the value specified in the PDQPRIORITY environment variable; equivalent to -1.

OFF - No parallelism is used; equivalent to 0.

Fragmentation 6-7

Balanced I/O and Fragmentation

(Diagram: concurrent users accessing fragment 1, fragment 2, and fragment 3 of the same table on separate disks.)

Fragmentation can be used to balance the I/O across disk drives. Individual users can access different fragments of the same table and not be in contention. Balanced I/O is more important than parallelism in On Line Transaction Processing ( OLTP ) environments because maximum throughput of many concurrent queries is critical.

OLTP queries
n Relatively few rows and tables read
n Transaction activity (inserts, updates, and deletes)
n Data accessed via indexes
n Simple SQL operations
n Response times measured in seconds and fractions of seconds
n Many concurrent queries

Important!
OLTP queries should not be flagged as PDQ queries. Doing so exposes them to the concurrent queries limitation defined for PDQ in the configuration parameters and may cause severe bottlenecks.
6-8 Fragmentation

Types of Distribution Schemes


n Round robin:
    insert into t1 values(...)
    insert into t1 values(...)
    insert into t1 values(...)

n Expression based (fragments defined as col1 <= 100, col1 > 100 and col1 < 500, and remainder):
    insert into t1 (col1) values(800)
    insert into t1 (col1) values(220)
    insert into t1 (col1) values(240)

There are two types of distribution schemes:

n Round robin - This type of fragmentation creates even data distributions by randomly placing rows in fragments.
    w For insert statements, the server uses a hash function on a random number to determine in which fragment to place the row.
    w For insert cursors, the server places the first row in a random fragment, the second in the next fragment, and so on in a true round robin fashion.

n Expression based - This type of fragmentation puts related rows in the same fragment. You can use this type of fragmentation to create uneven distributions of data. You specify an SQL expression for each fragment that identifies a set of rows. If a row matches the criteria in the expression, it will be placed in that fragment. You can specify a remainder fragment, which holds all rows that do not match the criteria specified for any other fragment. In the above example, the row where col1 = 800 is put in the remainder fragment because it does not match the criteria for the first (col1 <= 100) or second (col1 > 100 and col1 < 500) fragment.

Fragmentation 6-9

Fragments and Extents

(Diagram: a table fragment stored as tblspace1 within dbspace1, containing extent 1 and extent 2.)

Table fragments and index fragments are placed in designated dbspaces. Each fragment has a separate tblspace id. The tblspace id is also known as the fragment id. Each tblspace contains separate extents.

Extent Sizes
You will need to recalculate extent sizes for a fragmented table. When creating fragmented tables, the extent size is specified for the fragment. Remember that in expression-based fragmentation, the number of rows in each fragment will not be uniform, unlike round-robin fragmentation. For a fragmented table, the extent size of the index is determined by IBM Informix based on the extent size of the table (See the previous chapter on Indexes and Indexing strategy).
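As a rough, purely illustrative sizing sketch: assume a table is expected to hold about 1,000,000 rows of roughly 100 bytes each and is fragmented round robin across four dbspaces (EXTENT SIZE and NEXT SIZE are expressed in kilobytes):

    rows per fragment   = 1,000,000 / 4  = 250,000
    bytes per fragment  = 250,000 x 100  = 25,000,000 bytes (about 24,400 KB)

So an initial EXTENT SIZE of roughly 25000 per fragment, with a smaller NEXT SIZE, would be a reasonable starting point under these assumptions.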

6-10 Fragmentation

Fragmenting a Table: Round Robin


CREATE TABLE table1(
    col_1 SERIAL,
    col_2 CHAR(20),
    ...)
FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2
EXTENT SIZE 10000 NEXT SIZE 3000;

A FRAGMENT BY option has been added to the CREATE TABLE statement. The option is placed before the EXTENT or LOCK MODE options. The FRAGMENT BY ROUND ROBIN option must specify at least two dbspaces where the fragments will be placed. In the example above, the rows will be placed alternately in dbspace1 and dbspace2. When a table is created, one extent of EXTENT SIZE will be reserved in each dbspace listed. You should calculate EXTENT SIZE and NEXT SIZE for an average-size fragment.

Advantages and Disadvantages


The major advantage of the round robin strategy is that no knowledge of the data is needed to achieve an even distribution among the fragments. Also, when column values are updated, rows are not moved to other fragments because the distribution does not depend on column values. A disadvantage of the round robin strategy is that the query optimizer is not able to eliminate fragments when evaluating a query.

Fragmentation 6-11

When to Use Round Robin


Use the round robin distribution strategy when your queries perform sequential scans and you have little information about the data being stored. For example, consider using round robin when the data access method or the data distribution is unknown. Round robin may also be useful when your application is update-intensive or when fast data loading is important.

6-12 Fragmentation

Fragmenting a Table: Expression


CREATE TABLE table1(
    col_1 SERIAL,
    col_2 CHAR(20),
    ...)
FRAGMENT BY EXPRESSION
    col_1 <= 10000 AND col_1 >= 1 IN dbspace1,
    col_1 <= 20000 AND col_1 > 10000 IN dbspace2,
    REMAINDER IN dbspace3;

The FRAGMENT BY EXPRESSION option provides control in placing rows in fragments. You specify a series of SQL expressions, each with a designated dbspace. If an expression evaluates to true, the row will be placed in the corresponding dbspace. The REMAINDER IN clause specifies a dbspace that will hold rows that do not evaluate to true for any of the expressions. A row should evaluate to true for at most one expression. If a row evaluates to true for more than one expression, it will be placed in the dbspace for the first expression. You can use any column in the table as part of the expression. Columns in other local or remote tables are disallowed. No subqueries or stored procedures are allowed as part of the expression.

Advantages and Disadvantages


Distributing data by expression has many potential advantages:
n Fragments may be eliminated from query scans.
n Data can be segregated to support a particular archiving strategy.
n Users can be granted privileges at the fragment level.
n Unequal data distributions can be created to offset an unequal frequency of access.

Fragmentation 6-13

A disadvantage is that CPU resources are required for rule evaluation. As the rule becomes more complex, more CPU time is consumed. Also, there is more administrative work with expression based fragmentation than with round robin. Finding the optimum rule may be an iterative process, and, once found, may need to be monitored.

When to Use Expression-Based Fragmentation


The goal of expression-based fragmentation is increased I/O throughput and fragment elimination during query optimization. The optimum situation for fragment elimination is when expression conditions involve a single column and do not overlap. Consider using an expression strategy when:
n non-overlapping fragments on a single column can be created.
n the table is accessed with a high degree of selectivity.
n the data access is not evenly distributed.
n overlapping fragments on single or multiple columns can be created.

6-14 Fragmentation

Logical and Relational Operators


CUSTOMER_NUM IN (101,7924,9324,3288)
CUSTOMER_NUM = 4983 OR zipcode = "01803"
CUSTOMER_NUM < 10000
CUSTOMER_NUM BETWEEN 10000 AND 20000

An expression-based distribution scheme uses an expression or rule to define which rows will be inserted into specific fragments. Each condition in the rule determines the contents of one fragment. There can be up to 2048 fragments and their associated conditions in one table. The following relational and logical operators can be used in a rule:
n >, <, >=, <=, IN, BETWEEN
n AND, OR

Rules which use these operators are sometimes called range or arbitrary rules. A single condition may use multiple operators and may reference multiple columns. It is recommended, however, that you keep conditions as simple as possible to minimize CPU usage and to promote fragment elimination from query plans.

Fragmentation 6-15

Using Hash Functions


CREATE TABLE table1(
    customer_num SERIAL,
    lname CHAR(20),
    ...)
FRAGMENT BY EXPRESSION
    MOD(customer_num, 3) = 0 IN dbspace1,
    MOD(customer_num, 3) = 1 IN dbspace2,
    MOD(customer_num, 3) = 2 IN dbspace3;

A hash function can be used to evenly distribute data across fragments, especially when the column value may not divide commonly accessed data evenly across fragments. The example above shows one way that a hash function can be created. The SQL algebraic function MOD returns the modulus or remainder value for two numeric expressions. You provide integer expressions for the dividend and the divisor. The value returned is an integer. An expression-based distribution scheme which uses a hash function is also referred to as a hash rule.

Advantages and Disadvantages


A hash expression yields an even distribution of data. It also permits fragment elimination during query optimization when there is an equality search (including inserts and deletes). Fragment elimination does not occur during a range search.
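For example, with the MOD-based scheme shown above, an equality search lets the server confine the scan to a single fragment, while a range search cannot (illustrative queries against the hypothetical table1):

-- MOD(1234, 3) = 1, so only the fragment in dbspace2 needs to be scanned
SELECT * FROM table1 WHERE customer_num = 1234;

-- the hash value does not follow the range, so all three fragments are scanned
SELECT * FROM table1 WHERE customer_num BETWEEN 1000 AND 2000;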

When to use Hash Expressions


Use a hash expression if data access is via a particular column but the distribution of values within the column is unknown or unpredictable.

6-16 Fragmentation

Fragmenting by Expression
Guidelines:
n Avoid REMAINDER IN clauses.
n Attempt to balance I/O across disks.
n Keep fragmentation expressions simple.
n Arrange the conditions so the most restrictive part comes first.
n Avoid any expression that must perform a conversion.
n Optimize data loads by placing the most frequently accessed fragment first in your fragmentation statement.
n If a significant benefit is not expected, do not fragment the table.

Once you have determined that fragmenting by expression is the optimal fragmentation strategy for you, there are additional guidelines that can help you maximize your strategy.
n Avoid REMAINDER IN clauses when possible. It is best to have a specific expression for each fragment.
n Distribute data so that I/O is balanced across disks. This does not necessarily mean an even distribution of data.
n Keep fragmentation expressions simple. Fragmentation expressions can be as complex as you wish. However, very complex expressions take more CPU time to evaluate and may prevent the database server from eliminating fragments.
n Arrange the condition so the most restrictive part is first, to reduce the number of expression evaluations that must be performed in many cases. In a logical AND operation, if the first clause is false, then the rest of the condition for that dbspace is not evaluated. For example, to insert the value 25 with the following rule, six evaluations are performed:

    x >= 2 and x <= 10 in dbspace1,
    x > 12 and x <= 19 in dbspace2,
    x > 21 and x <= 29 in dbspace3,
    remainder in dbspace4

Fragmentation 6-17

In the re-arranged condition, four evaluations are performed:


    x <= 10 and x >= 2 in dbspace1,
    x <= 19 and x > 12 in dbspace2,
    x <= 29 and x > 21 in dbspace3,
    remainder in dbspace4

n Avoid any condition that must perform a data type conversion. A data type conversion will cause an increase in the time it takes to evaluate the condition. For example, a date data type is converted to an integer internally.
n If data loads are one of your primary performance objectives, you may be able to optimize your data loads by placing the most commonly accessed fragment first in your fragmentation statement.
n If a significant benefit is not expected, do not fragment the table.

6-18 Fragmentation

Fragmenting Indexes
(Diagram: two cases. In the first, no fragmentation is specified and the entire index is placed in one separate dbspace, dbspace1. In the second, an expression-based fragmentation scheme is specified and each index fragment, Fragment 1 and Fragment 2, occupies a different dbspace, dbspace1 and dbspace2.)

You can decide whether or not to fragment indexes. If you fragment your indexes, you must use an expression fragmentation scheme. You cannot use round robin fragmentation for indexes.

Non-Fragmented Indexes
If you do not fragment the index, you can put the entire index in a separate dbspace. In this strategy the resulting index and data pages are separate.

When to Use Fragmented Indexes


Since OLTP applications frequently use indexed access instead of sequential access, it can be beneficial to fragment indexes in an OLTP environment. DSS applications generally access data sequentially. Therefore, it is generally not recommended to fragment indexes in a DSS environment. System indexes, created to support constraints, remain un-fragmented and are created in the dbspace where the database is created.

Fragmentation 6-19

Warning!
If you do not specify a dbspace (non-fragmented) or list of dbspaces (fragmented), then the index will default to the same fragmentation strategy as the table. This scenario is not desirable if your table is fragmented round robin. It is highly recommended that you specify a dbspace for your indexes in this scenario.

6-20 Fragmentation

CREATE INDEX Statement


n By expression:

    CREATE INDEX idx1 ON table1(col_1)
    FRAGMENT BY EXPRESSION
        col_1 < 10000 IN dbspace1,
        col_1 >= 10000 IN dbspace2;

n No fragmentation scheme is specified:

    CREATE INDEX idx1 ON table1(col_1) IN dbspace1;

A FRAGMENT BY EXPRESSION option has been added to the CREATE INDEX statement. If you do not want to fragment your indexes, simply specify the dbspace you want the entire index located in. The index fragments are created in a separate tblspace with their own extents.

Fragmentation 6-21

ROWIDS
n To access a fragmented table by rowid, a rowid column must be explicitly created:

    CREATE TABLE orders(
        order_num SERIAL,
        customer_num INTEGER,
        part_num CHAR(20))
    WITH ROWIDS
    FRAGMENT BY ROUND ROBIN IN dbs1, dbs2;

    ALTER TABLE items ADD ROWIDS;
    ALTER TABLE items DROP ROWIDS;

Rowid in a non-fragmented table is an implicit column which may be used to uniquely identify a row in the table. In a fragmented table, rowids are no longer unique because they may be duplicated in different fragments. To use rowids with fragmented tables, you must explicitly add them to the table as in the examples above. A four byte rowid column is added to each row. When you add rowids to a fragmented table, the database server creates an index which maps the internal unique row address to the new rowid. Access to the table using rowid is always through the index. If your application uses rowids, performance may be affected when it accesses fragmented tables because of the index used for rowid mapping. IBM Informix recommends that you use primary keys instead of rowids for unique row access.
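If an application does use rowids on a table created WITH ROWIDS, the access looks the same as for a non-fragmented table; a brief sketch (the rowid value shown is made up):

SELECT rowid, order_num FROM orders WHERE customer_num = 104;
SELECT * FROM orders WHERE rowid = 257;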

6-22 Fragmentation

Guidelines for a Fragmentation Strategy


n Identify the tables being accessed
n Analyze how the tables are being accessed (selectivity, filters)
n Determine whether the environment is DSS or OLTP
n Answer the questions:
    w How many CPUs and disks are available?
    w Is data loading an important factor?
    w Are fragment permissions an important factor?
n Evaluate I/O and adjust the distribution strategy

n Identify the tables being accessed.
    w Examine your critical SELECT statements and identify the tables.
n Analyze how the tables are being accessed.
    w Identify whether the tables are accessed sequentially or via index. Determine what filters and join columns are used in the SELECT statements. Attempt to utilize one or more of these columns in an expression strategy. If there is no suitable column for an expression strategy, or if the table is always read sequentially, use a round robin distribution scheme.
n Determine whether the environment is DSS or OLTP.
    w In a DSS environment (sequential reads), indexes should generally not be fragmented, whereas an OLTP environment may benefit from fragmented indexes. If the table distribution strategy is round robin, indexes should not be fragmented.
Fragmentation 6-23

n Answer the questions:
    w How many CPUs and disks are available? DSS performance increases linearly with the addition of fragments up to the number of CPUs. OLTP performance may not improve after a certain point because the chances improve that range searches will need to scan multiple fragments.
    w Is data loading an important factor? If data loading is a persistent issue, then a round robin distribution scheme may provide the best performance.
    w Are fragment permissions an important factor? Permissions may be granted on a fragment basis. If this is a desired feature, then distribution by expression must be used.
n Evaluate I/O and adjust the distribution strategy.
    w Creating the optimum fragmentation strategy is an iterative process. After creating your fragments, evaluate the I/O pattern and attempt to achieve balanced I/O by adjusting the fragmentation rule. You may wish to switch from an expression strategy to a round robin strategy or vice versa. Monitoring tools that are available include SET EXPLAIN, onstat -d, onstat -g ppf, and onstat -g iof.
6-24 Fragmentation

The ALTER FRAGMENT Statement


n ALTER FRAGMENT ... INIT
    w Initialize a new fragmentation scheme.
n ALTER FRAGMENT ... ADD
    w Add an additional fragment.
n ALTER FRAGMENT ... DROP
    w Drop a fragment.
n ALTER FRAGMENT ... MODIFY
    w Modify a fragmentation expression or dbspace.
n ALTER FRAGMENT ... ATTACH or DETACH
    w Combine tables with identical structures into a single fragmented table, or move a fragment into a separate table.

Use the ALTER FRAGMENT statement if you want to change your fragmentation strategy. For example, if by monitoring the I/O on your table fragments you determine that there is a bottleneck caused by unbalanced I/O, you would want to modify your original fragmentation strategy.
n ALTER FRAGMENT ... INIT
    w Make a fragmented table non-fragmented.

        ALTER FRAGMENT ON TABLE table1 INIT IN dbspace2;

    w Make a non-fragmented table fragmented.

        ALTER FRAGMENT ON TABLE table1 INIT
            FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2;

    w Completely change the fragmentation strategy.

        ALTER FRAGMENT ON TABLE table1 INIT
            FRAGMENT BY EXPRESSION
                col_1 <= 10000 AND col_1 >= 1 IN dbspace1,
                col_1 <= 20000 AND col_1 > 10000 IN dbspace2,
                REMAINDER IN dbspace3;

Fragmentation 6-25

n ALTER FRAGMENT ... ADD
    During the execution of the ADD command, the rows are shuffled to comply with the new distribution scheme.
    w Add an additional fragment for expression-based fragmentation:

        ALTER FRAGMENT ON TABLE orders
            ADD note_code <= 3000 OR note_code = 3500 IN dbspace3
            BEFORE dbspace4;

    The BEFORE or AFTER clause is used to insert the new condition either before or after existing conditions. This can be important because conditions within an expression are evaluated sequentially. If BEFORE or AFTER is not specified, the dbspace is added at the end of the expression but before any remainder clause.
    w Add an additional fragment for round robin fragmentation:

        ALTER FRAGMENT ON TABLE customer ADD dbspace3;

n ALTER FRAGMENT ... DROP
    The DROP clause moves all rows (or index keys) in the specified fragment to another fragment and drops the fragment. Make sure the other fragments have enough space to hold the rows that will be moved there. Dropping the number of fragments below two is not allowed. In an expression-based scheme, the rows in the dropped fragment will most likely go to the remainder fragment.

        ALTER FRAGMENT ON TABLE table1 DROP dbspace1;

n ALTER FRAGMENT ... MODIFY
    If you change the expression, rows in the existing fragment not matching the expression will be moved to the appropriate fragment. If no fragment exists for that row, an error will be returned and the ALTER FRAGMENT will fail.

        ALTER FRAGMENT ON TABLE table1 MODIFY dbspace1 TO col_1 > 30000 IN dbspace1;

n ALTER FRAGMENT ... ATTACH or DETACH
    w Use the ATTACH clause to combine two non-fragmented tables with identical schemas into one table. Both tables must have identical schemas and must be in different dbspaces. No referential, primary key, unique, or NOT NULL constraints are allowed in either table. The consumed table cannot have serial columns and the surviving table may not have check constraints. Index builds can be avoided if the newly added fragment is symmetric to the table's fragmentation.

        ALTER FRAGMENT ON TABLE table1 ATTACH table1, table2;

6-26 Fragmentation

Use the DETACH clause to separate a table into two tables. Once a fragment is detached, the table that is created may be dropped. This is particularly useful in situations where a rolling set of fragments is being maintained over time with new fragments being added and old fragments being removed. Index rebuilds on the original table will not be necessary if the index fragmentation strategy of the detached fragment is identical to or highly parallel with the table fragmentation. In that case, the index fragments corresponding to the detached fragment will simply be dropped. The DETACH command will not work on tables with rowids.
ALTER FRAGMENT ON TABLE table1 DETACH dbspace2 table2

Fragmentation 6-27

How is ALTER FRAGMENT Executed?


Databases with transaction logging:
n Executes as a single transaction, creating potential for a long transaction.
n Entire table is locked during the statement.
n If a row is moved to a fragment, it is deleted in the old location and added in the fragment.

Databases without transaction logging:
n The entire table is locked during the statement.
n The old fragments are kept intact until the ALTER FRAGMENT operation completes.
n Need enough disk space for old and new fragments.

For databases with logging, ALTER FRAGMENT is executed as follows:


n The statement executes as a single transaction, with each row move added as an entry in the logical log. Because of the potentially large number of log entries, you may run into a long transaction. For very large tables, consider turning off logging during this statement or separating the statement into smaller ALTER FRAGMENT statements.
n The entire table is locked exclusively during execution of the statement.
n If a row is moved to a fragment, it is deleted in the old location and added to the new fragment. The disk space for the location of the old row is freed as soon as the row is moved, but the extent is still allocated and will remain allocated until it is entirely emptied. Make sure you have enough disk space to accommodate the fragment that is being deleted as well as the fragment that is being added.

For databases without logging, ALTER FRAGMENT is executed as follows:

n The fragment is kept intact until the ALTER FRAGMENT statement completes. Make sure you have enough disk space to accommodate both the old and new fragments.
n The entire table is locked during execution of the statement.

6-28 Fragmentation

Skipping Inaccessible Fragments


SET DATASKIP SQL statement or IDS configuration parameter:
n To turn on dataskip: SET DATASKIP ON
n To turn off dataskip: SET DATASKIP OFF
n To skip specific fragments: SET DATASKIP ON dbspace1
n To follow the skip strategy set by the configuration parameter: SET DATASKIP DEFAULT

You can use the SQL statement SET DATASKIP or the IBM Informix configuration parameter to choose whether to skip unavailable fragments during a SELECT operation. Whenever a fragment is skipped, the sqlca.sqlwarn.sqlwarn7 flag is set to W (ESQL/C and ESQL/COBOL). An unavailable fragment cannot be skipped under the following circumstances:

n Referential integrity - In order to delete a parent row, the child rows must also be available for deletion. In order to insert a child row, the parent row must be available.
n Updates - An update that must move a row from one fragment to another requires that both fragments be available.
n Inserts or deletes - A row that must be put in a specific fragment (because of expression-based fragmentation) requires that the fragment be available.
n Indexes - An index key must be available if an INSERT, UPDATE, or DELETE affects that key.
n Serial keys - The first fragment stores the current serial key value. An INSERT that requires the next serial value requires the first fragment.

Fragmentation 6-29

Sysfragments
n Each fragment is represented as a separate row in the table.
    w Fragment type
    w Table id
    w Index name
    w Partnum
    w Distribution type (round robin, expression)
    w Position of fragment in fragment list
    w Expression text
    w dbspace for the fragment
    w Number of data pages (indexes => # of leaf pages)
    w Number of rows in the fragment (indexes => # unique keys)

The system catalog table sysfragments stores fragment information. There is a row for each fragment in the sysfragments table. Some of the columns are:

fragtype    Table or index
tabid       Table id in systables
indexname   Index name
partn       Unique number for each fragment
strategy    Expression or round robin
evalpos     Fragment number (indicates execution order of expression conditions)
exprtext    Expression text
dbspace     Dbspace name the fragment is located in
npused      Number of pages used or number of leaf pages
nrows       Number of rows in the fragment or the number of unique keys
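For example, to check how evenly rows are spread across the fragments of a table (shown here for a hypothetical employee table; run UPDATE STATISTICS first so the counts are reasonably current):

SELECT f.dbspace, f.nrows, f.npused
FROM sysfragments f, systables t
WHERE f.tabid = t.tabid
  AND f.fragtype = "T"
  AND t.tabname = "employee";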

6-30 Fragmentation

Exercises

Fragmentation 6-31

Exercise 1
Drop the employee table you created in chapter 4 and use the same columns when you recreate the table in the following exercise. Ask your instructor for the three dbspaces you should use for this exercise.

1.1 Using the CREATE TABLE statement, create an employee table, fragmenting by round robin. (Use the three dbspaces provided by the instructor.)

1.2 Query the sysfragments table to verify that the table has been fragmented. The following SQL statement may be used:

    SELECT a.fragtype, a.tabid, b.tabname, a.strategy, a.dbspace
    FROM sysfragments a, systables b
    WHERE a.tabid = b.tabid
      AND b.tabname = "employee";

1.3 Drop the table from the database. Re-create the table, fragmenting by expression on the employee_num column, assuming the employee numbers range from 1 to 7500. Verify that the table has been fragmented.

6-32 Fragmentation

Solutions

Fragmentation 6-33

Solution 1
1.1
CREATE TABLE employee(
    employee_num SERIAL,
    hire_date DATE,
    lname CHAR(20),
    fname CHAR(20),
    salary MONEY(9,2))
FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2, dbspace3
EXTENT SIZE 100 NEXT SIZE 60
LOCK MODE ROW;

1.2
SELECT a.fragtype, a.tabid, b.tabname, a.strategy, a.dbspace
FROM sysfragments a, systables b
WHERE a.tabid = b.tabid
  AND b.tabname = "employee";

Results:
fragtype  tabid  tabname   strategy  dbspace
T         112    employee  R         dbspace1
T         112    employee  R         dbspace2
T         112    employee  R         dbspace3

1.3
CREATE TABLE employee(
    employee_num SERIAL,
    hire_date DATE,
    lname CHAR(20),
    fname CHAR(20),
    salary MONEY(9,2))
FRAGMENT BY EXPRESSION
    employee_num <= 2500 AND employee_num >= 1 IN dbspace1,
    employee_num <= 5000 AND employee_num > 2500 IN dbspace2,
    employee_num <= 7500 AND employee_num > 5000 IN dbspace3
EXTENT SIZE 100 NEXT SIZE 60
LOCK MODE ROW;

6-34 Fragmentation

Module 7
Concurrency Control

Concurrency Control 09-2001 2001 International Business Machines Corporation

7-1

Objectives
At the end of this module, you will be able to:
n Discuss the types of concurrency control in IBM Informix databases
n Describe the four isolation levels for reading data in IBM Informix databases
n Describe the five levels of locking granularity

7-2 Concurrency Control

Types of Concurrency
n Read concurrency (SELECT statements)
n Update concurrency (INSERT, DELETE, and UPDATE statements)

Concurrency control deals with influencing how data can be viewed and updated by users accessing the same information at one time. For example, do you want one user to view an order that is being changed by another user? Do you want one user to change an order that is being viewed by another user? There are two classes of concurrency control:
n The first class is concurrency that applies to read-only database access, that is, SELECTs. This is referred to as isolation level. There are four levels of isolation.
n The second class is concurrency that applies to updating database records (that is, INSERTs, DELETEs, and UPDATEs).

IBM Informix enforces concurrency control by using locks. There are three kinds of locks that can be used:
n Exclusive lock - No other locks can be placed on the data that holds an exclusive lock.
n Shared lock - Shared locks are placed by processes reading data. A shared lock cannot be put on data that already holds an exclusive lock. More than one shared lock on data is allowed.
n Update lock - This lock is similar to a shared lock except that it can be promoted to an exclusive lock later.

Concurrency Control 7-3

Read Concurrency
Levels of isolation for reading:
n Dirty read
n Committed read
n Cursor stability
n Repeatable read

There are four levels of isolation for reading, which are listed above. To supply these levels, IBM Informix Dynamic Server uses shared locks. Shared locks let other processes read rows but not update them.

7-4 Concurrency Control

Dirty Reads

(Diagram: the database server process reads rows from the database table without checking for locks.)

At the isolation level of DIRTY READ, your process is not isolated at all. You get no locks whatsoever, and the process does not check for the existence of any locks before reading a row. During retrieval, you can look at any row, even those containing uncommitted changes. Such rows are referred to as dirty data. Rows containing dirty data may be phantom. A phantom row is a row that has been inserted within a transaction, which is later rolled back before the transaction completes. Although the phantom row never existed in a permanent sense, it would have been visible to a process using an isolation level of dirty read. DIRTY READs can be useful, though, when:
n The table is static.
n 100% accuracy is not as important as speed and freedom from contention.
n You cannot wait for locks to be released.

Concurrency Control 7-5

Committed Reads

(Diagram: before reading a row, the database server process checks whether a shared lock could be acquired on it; the row is read only after seeing that the lock could be acquired.)

A COMMITTED READ attempts to acquire a shared lock on a row before trying to read it. It does not actually try to place the lock; rather, it sees if it could acquire the lock. If it can, it is guaranteed that the row exists and is not being updated by another process while it is being read. Remember, a shared lock cannot be acquired on a row that is locked exclusively, which is always the case when a row is being updated. With COMMITTED READ, you have low-level isolation. During retrieval, you will not be looking at any phantoms or dirty data. You know that the current row was committed (at least when your process read it). After your process has read the row, though, other processes can change it. COMMITTED READs can be useful for:
n lookups
n queries
n reports yielding general information

For example, COMMITTED READs are useful for summary-type reports such as month-ending sales analyses.

7-6 Concurrency Control

Cursor Stability

(Diagram: the database server process reads rows from the database table and places a shared lock on each row as it is read; the lock is held until the next row is fetched.)

With CURSOR STABILITY, a shared lock is acquired on each row as it is read via a cursor. This shared lock is held until the next row is retrieved. If data is retrieved using a cursor, the shared lock is held until the next FETCH is executed. At this level, not only can you look at committed rows, but you are assured the row will continue to exist while you are looking at it. No other process can change (UPDATE or DELETE) that row while you are looking at it. SELECTs using an isolation level of CURSOR STABILITY can be used for:
n lookups
n queries
n reports yielding operational data

For example, SELECTs using CURSOR STABILITY are useful for detail-type reports like price quotation or job tracking systems. If the isolation level of CURSOR STABILITY is set and a cursor is not used, CURSOR STABILITY behaves in the same manner as COMMITTED READ (the shared lock is never actually placed).

Concurrency Control 7-7

Repeatable Reads

(Diagram: the database server process puts shared locks on all rows examined to satisfy the query.)

The REPEATABLE READ isolation level places a shared lock on all the rows examined by the database server; all these locks are held until the transaction is committed. With REPEATABLE READ, you have high-level isolation. In explicit transactions, you are assured the row will continue to exist not only while you are looking at it, but also when you reread it later. No other process can change (UPDATE or DELETE) that row until you COMMIT your transaction. REPEATABLE READs are useful when you must treat all rows read as a unit or to guarantee that a value will not change. For example:
n Critical, aggregate arithmetic (for example, account balancing)
n Coordinated lookups from several tables (for example, reservation systems)

It is important to note that with REPEATABLE READs, all the rows examined are locked; this includes rows that do not meet the select criteria but had to be read in order to determine their ineligibility. For example, if you use REPEATABLE READ isolation on a query that requires a table to be read sequentially (if no indexes are available, for example), all the rows in the table are locked, and those locks are held for the duration of the transaction. In order to ensure the integrity of the data set, the corresponding index keys are also locked. Tip: Only use REPEATABLE READ on queries that can do indexed reads.

7-8 Concurrency Control

Setting the Level of Isolation


Examples:
SET ISOLATION TO DIRTY READ;
SET ISOLATION TO COMMITTED READ;
SET ISOLATION TO CURSOR STABILITY;
SET ISOLATION TO REPEATABLE READ;

To make use of process isolation, your database must use logging. To pick an isolation level, use the SET ISOLATION statement. The syntax for this statement is shown above. If logging is not turned on, all reads are DIRTY READs and the isolation level cannot be set. A non-MODE ANSI database which uses logging defaults the isolation level to COMMITTED READ. MODE ANSI databases default the isolation level to REPEATABLE READ. Once set, the isolation level remains in effect for the duration of your session. It may be reset to a different value during the same session, and may be changed even within a transaction.

Concurrency Control 7-9

SET TRANSACTION Statement


n Dirty Read: SET TRANSACTION READ UNCOMMITTED;
n Committed Read: SET TRANSACTION READ COMMITTED;
n Repeatable Read: SET TRANSACTION SERIALIZABLE;

The isolation levels that you can set with the ANSI-compliant SET TRANSACTION statement are comparable to the isolation levels that you can set with the IBM Informix SET ISOLATION statement. The major difference between the SET TRANSACTION and SET ISOLATION statements is the behavior of the isolation levels within transactions. The SET TRANSACTION statement can be issued only once for a transaction. With the SET ISOLATION statement, after a transaction is started, you can change the isolation level more than once within the transaction. There is no comparable SET TRANSACTION statement for Cursor Stability.
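For example, the following sequence is legal with SET ISOLATION because the level is changed in the middle of a transaction; the same change attempted with SET TRANSACTION would fail. This is only a sketch, assuming a logged database and the stores demonstration tables:

BEGIN WORK;
SET ISOLATION TO DIRTY READ;
SELECT * FROM manufact;
SET ISOLATION TO CURSOR STABILITY;
SELECT * FROM customer WHERE customer_num = 101;
COMMIT WORK;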

7-10 Concurrency Control

Degree of Tolerable Interference

Isolation Level    Degree of Tolerable Interference
Dirty Read         Let this process look at dirty data.
Committed Read     Do not let this process look at dirty data.
Cursor Stability   Do not let other processes change my current row.
Repeatable Read    Do not let other processes change any of the rows I have looked at until I am done.

To summarize, the chart on the slide above relates isolation levels to degrees of tolerable interference.

Concurrency Control 7-11

RETAIN UPDATE LOCKS - 7.31 Feature


n Lock is placed during a SELECT ... FOR UPDATE
n Retains update locks until end of transaction
    w Dirty Read
    w Committed Read
    w Cursor Stability
SET ISOLATION TO COMMITTED READ RETAIN UPDATE LOCKS;

12

For isolation levels less than REPEATABLE READ the database server releases update locks placed on rows as soon as the next row is fetched by a cursor during a SELECT ... FOR UPDATE operation. This new feature of IDS 7.31 allows you to hold the lock until the end of the transaction for DIRTY READ, COMMITTED READ, and CURSOR STABILITY isolation levels. This allows you to avoid the overhead of REPEATABLE READ - only update locks are held, while REPEATABLE READ holds both update and shared locks. To turn off this feature, use the SET ISOLATION statement without the RETAIN UPDATE LOCKS syntax. From that point on, a subsequent fetch with the cursor releases the update lock of the immediately preceding fetch.
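A minimal sketch of how this might be used; the table and column names are from the stores demonstration database, and in an application the SELECT ... FOR UPDATE would be opened through an update cursor:

SET ISOLATION TO COMMITTED READ RETAIN UPDATE LOCKS;
BEGIN WORK;
-- each row fetched through this statement keeps its update lock until
-- COMMIT WORK or ROLLBACK WORK, instead of releasing it at the next fetch
SELECT order_num, ship_date FROM orders WHERE ship_date IS NULL FOR UPDATE;
COMMIT WORK;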

7-12 Concurrency Control

Update Concurrency:
Levels of locking granularity:
n Database level
n Table level
n Page level
n Row level
n Key level

Locking granularity refers to the size of the object being locked. Granularity ranges through five levels, from coarse to fine. This range allows you to make trade-offs between concurrency and locking overhead. IBM Informix Dynamic Server provides five different levels of locking granularity. The coarsest level is database level locking, the finest level is row level locking. Key level locking is performed on index entries.

Concurrency Control 7-13

Database Level Locking

DATABASE stores EXCLUSIVE;

(Diagram: the stores database is opened in exclusive mode; other users cannot access the database.)

It is occasionally necessary or advantageous to prevent other users from accessing any part of the database for some period of time. This may be the case if you are:
n Executing a large number of updates involving many tables
n Archiving the database files for backups
n Altering the structure of the database

The entire database can be locked by using the DATABASE statement with the EXCLUSIVE option. An example is shown above. The EXCLUSIVE option opens the database in an exclusive mode and allows only the current user access to the database. To allow other users access to the database, you must execute the CLOSE DATABASE statement and then reopen the database. Users with any level of database permission can open the database in exclusive mode. Doing so will not give them any greater level of access than they normally have.
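For example, to take exclusive access to the stores7 demonstration database for a batch job and then return it to normal shared access:

DATABASE stores7 EXCLUSIVE;
-- perform the batch updates, archive, or structural changes here
CLOSE DATABASE;
DATABASE stores7;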

7-14 Concurrency Control

Table-Level Locking

(Diagram: a table in the database is locked; other users cannot modify the table.)

Table-level locking can be used to prevent other users from modifying the table. Use table-level locking to:
n Avoid conflict with other users during batch operations that affect most or all of the rows of a table.
n Avoid running out of locks when running an operation as a transaction. (This is covered in detail later in this module.)
n Prevent users from updating a table for a period of time.
n Prevent access to a table while altering its structure or creating indexes.

You should use table-level locking only when making major changes to a table in a multi-user environment and when simultaneous interaction by another user would interfere. Only one exclusive lock can apply to a table at any given time. That is, if a user locks a table in exclusive mode, no other user can lock that table until the first user has unlocked it. You cannot lock the system catalog tables. If your database has transactions, tables can only be locked within transactions. Therefore, be sure that you have executed BEGIN WORK (unless you are using a MODE ANSI database) before attempting to lock a table. The table will be unlocked when the transaction is completed.

Concurrency Control 7-15

Locking a table in share mode


If you want to give other users read access to the table but prevent them from modifying any of the data that it contains, then you should use the LOCK TABLE statement with the IN SHARE MODE option. LOCK TABLE table-name IN SHARE MODE; When a table is locked in SHARE mode, other users are able to SELECT data from the table but they are not able to INSERT, DELETE, or UPDATE rows in the table or ALTER the table. It should be noted that locking a table in SHARE MODE does not prevent row locks from being placed for updates by your process. If you wish to avoid exclusive row locks in addition to the share lock on the table, you must lock the table in EXCLUSIVE MODE.

Locking a table in exclusive mode


If you want to prevent other users from having any access to the table, then you should lock it in EXCLUSIVE mode. In EXCLUSIVE mode, other users will be unable to SELECT (unless dirty read isolation is used), INSERT, DELETE, or UPDATE rows in the table until you unlock the table. LOCK TABLE table-name IN EXCLUSIVE MODE; Only one lock is used to lock the table, regardless of the number of rows that are updated within a transaction. In the case of a table that contains BLOBs located in a blobspace: If the table is locked in EXCLUSIVE MODE and changes are made to the associated BLOB values, each BLOB accessed obtains its own exclusive locks. These locks are placed and released automatically. Two locks are used per blobpage. Tables containing BLOBs located in the table do not obtain additional locks.

Unlocking a table
The UNLOCK TABLE statement restores access to a previously locked database table. Use this statement when you no longer need to prevent other users from accessing and modifying the table. UNLOCK TABLE table-name; If the table was locked in a transaction, UNLOCK TABLE is disallowed and generates an error. Finishing the transaction (via COMMIT or ROLLBACK) will unlock the table.

7-16 Concurrency Control

Setting the Lock Mode

SET LOCK MODE TO WAIT;

Wait forever for lock to be released

SET LOCK MODE TO NOT WAIT;

Do not wait for lock to be released

SET LOCK MODE TO WAIT 20;

Wait up to 20 seconds for lock to be released

17

The SET LOCK MODE statement is used to determine whether calls that alter or delete a locked row wait for the row to become unlocked. The TO NOT WAIT option causes an error to be returned if a statement attempts to alter or delete a row (or to SELECT a row FOR UPDATE) that another process has locked. This is the default mode. The TO WAIT option will cause a statement to wait on an attempt to alter or delete a row that has been locked by another process until the locked row becomes unlocked. If you specify the number of seconds to wait, the server will try to obtain the lock repeatedly until either the lock is obtained or the time has expired.

Concurrency Control 7-17

Page and Row Level Locking


n Determined at table-creation time
n Page locking locks an entire data page
n Row locking locks only the row
n Concurrency/resource trade-offs

When you create a table, you choose the lock mode used when accessing any rows from that table. Page level locking causes an entire data page to be locked whenever a single row located on that page needs to be locked. Row level locking causes only the row in question to be locked. The default lock mode when creating a table is page level. Page locks are useful when, in a transaction, you process rows in the same order as the table's cluster index or process rows in physically sequential order. Row locks are useful when, in a transaction, you process rows in an arbitrary order. When the number of locked rows becomes large, you run these risks:
n Number of available locks becomes exhausted.
n Overhead for lock management becomes significant.

There is a trade-off between these two levels of locking. Page level locking requires fewer resources than does row level locking but also reduces concurrency. If a page lock is placed on a page containing many rows, other processes needing other data from that same page could be denied access to that data.
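For example, the lock mode is chosen with the LOCK MODE clause when the table is created and can be changed afterwards with ALTER TABLE (the table name here is made up for illustration):

CREATE TABLE order_log(
    log_id   SERIAL,
    log_note CHAR(40))
LOCK MODE ROW;

ALTER TABLE order_log LOCK MODE (PAGE);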

7-18 Concurrency Control

Lock Access: Row/Page Level

                       A (holds locks)
B (requested)     X       U       S       none
X                 n       n       n       y
U                 n       n       y       y
S                 n       y       y       y

x = exclusive, u = update, s = shared

The chart above shows the interaction between locks held and locks requested by two different processes on the same row/page level resource. On the horizontal axis are the locks that might be held by process A. On the vertical axis are the locks requested by process B. The matrix shows the result of the lock request ( y = lock granted, n = lock denied).

Concurrency Control 7-19

Deadlock Detection

(Diagram: Process A holds a lock on row x and wants a lock on row y; Process B holds a lock on row y and wants a lock on row x.)

When multiple processes are accessing the same rows in a table, it is possible for a deadlock to occur. In the example above, process A holds a lock on row x. It then wants to obtain a lock on a second row, row y. Row y is currently locked by another user, process B. If process A is waiting for locks (using set lock mode to wait), it will wait for process B to release the lock on row y. Process B, in the meantime, is holding the lock on row y, and wants to obtain a lock on row x. It is locked by another user, and if process B is waiting for locks, it will wait for row x to become free. In other words, A is waiting for B, and B is waiting for A. This is a deadlock situation, as both processes will wait forever. This situation can also arise with more than two users. Deadlocks are serious problems, as they can halt a major portion of the activity in a database system. IBM Informix has a built-in mechanism that detects deadlocks and prevents them from happening. This is how: The database server maintains a list of locks for every user on the system. Before a lock is granted, the lock list for each user is examined. If a lock is currently held on the resource that the process wishes to lock, the owner of that lock is identified, and their lock list is traversed to see if there are waits on any locks held by the user wanting the new lock. If there are, the deadlock is detected at that point, and an error message is returned to the user who wanted the lock. The ISAM error code returned is: -143 ISAM error: deadlock detected

7-20 Concurrency Control

Key Value Locking


Key Value Locking - a method of B+ tree locking where the key being updated, inserted, or deleted is locked.

(Diagram: an index item consists of a key value, a rowid, and a delete flag; the delete flag is 0 for a key that has not been deleted and 1 for a key that has been deleted.)

IBM Informix Dynamic Server uses a method of B+ tree locking called key value locking. For a DELETE statement, this means that IBM Informix actually locks the key value that is being deleted for the duration of the transaction. In order to do this, IBM Informix does not actually delete the key value but marks it as deleted by setting a delete flag, which is physically located in each item. The delete flag is part of every key in every index. It is one byte that is marked as 0 for keys that have not been deleted and is 1 for keys that are deleted. In addition, there is a flag in the page header that indicates that there is a key in the page that has been deleted and should be cleaned up.

Exceptions
Key value locking occurs unless the following is true:
n The database is not logged. In this case, no transactions are allowed so the key is deleted immediately.
n The table is locked IN EXCLUSIVE MODE. Because no other user will check or obtain locks, there is no need to place a lock on individual keys. The key is deleted immediately.

Concurrency Control 7-21

REPEATABLE READ
IBM Informix Dynamic Server uses special handling for the repeatable read isolation level. The adjacent key is checked for a lock when a row is inserted. If the lock on the adjacent key indicates that the process holding the lock is using repeatable read isolation level, the insert will fail. This special lock testing for repeatable read isolation level is necessary because this level must protect the set of rows it has read.

7-22 Concurrency Control

What Happens After a DELETE?

(Diagram: the btree cleaner pool in shared memory, the btcleaner thread, and an index key.)

1. When a deleted item is committed, its page number is put in the btree cleaner pool.
2. The btcleaner thread reads the pool occasionally and removes all committed deleted items on each page it finds.

Since a DELETE does not delete the associated keys in the index, there must be another mechanism that eventually performs a key delete. That mechanism is known as the btcleaner thread. When an item is deleted, the delete flag is set. When the transaction is committed, a request to delete the item is placed in a pool in shared memory called the btree cleaner pool. The request is a 20 byte structure that consists of the tblspace number, the page number, and the key number for the key to be deleted. Only one request is placed in the pool for each page. The btree cleaner pool starts out at 1k, but if this space becomes full, another 1k is allocated to the pool for more requests. At one minute intervals, or if the number of requests in the btree cleaner pool exceeds 100, the btcleaner thread wakes up and reads requests in the btree cleaner pool. For each request, the btcleaner finds the page and deletes the key that is marked as deleted. Before deleting the key, however, the btcleaner thread makes sure the row has been committed by test-locking it.

Concurrency Control 7-23

How Other Sessions See Deleted Keys


If another session encounters a deleted key while reading an index, the session will check to see if the key value is still locked. If it is, the session assumes that row still exists. If, however, the row is marked as deleted but is not locked (the btree cleaner has not deleted the key yet), the session skips over the key entry as if it was not there. The UPDATE STATISTICS statement normally reads through the index btree leaf pages to compute statistics for the query optimizer. In addition, the UPDATE STATISTICS statement also looks for pages that have the delete flag marked on in the page header. If any keys are found, the page is put in the btree cleaner pool to be cleaned. UPDATE STATISTICS acts as a backup for the normal mechanisms that remove items that have been deleted. For example, if a system crash causes the btree cleaner pool to be lost (because it is in shared memory) the keys will not be removed. When the system comes up, everything will run normally; the deleted keys will just occupy space that cannot be re-used immediately. You can run the UPDATE STATISTICS statement to remove the items.
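A sketch of forcing that cleanup for a single table after such a crash (the items table is used only as an example):

-- rereads the index pages for items and queues any committed deleted keys
UPDATE STATISTICS FOR TABLE items;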

7-24 Concurrency Control

syslocks and syssessions


SELECT username, sid, waiter, dbsname, tabname
FROM sysmaster:syslocks, sysmaster:syssessions
WHERE sysmaster:syssessions.sid = sysmaster:syslocks.owner
  AND sysmaster:syssessions.username = "your login";

The tables syslocks and syssessions in the sysmaster database can give you information about locks that are currently being held by users of your system. You can join the syslocks and syssessions tables to list the current locks, who owns them, and if any session is waiting for a lock. Results of the query in the slide above:
username   sid   waiter   dbsname   tabname
informix   202   206      stores7   customer
informix   202            stores7   manufact
informix   202            stores7   manufact
informix   202            stores7   customer
informix   202            stores7   customer

The column sid gives the session id assigned to the user's session. In the example output above, session 206 is waiting for a lock that is being held by the user informix.

Concurrency Control 7-25

7-26 Concurrency Control

Exercises

Concurrency Control 7-27

Exercise 1
The following exercise requires that you have two terminal sessions both executing SQL statements against your database. 1.1 Session A - enter:
begin work;
update manufact set lead_time = 2 where manu_code = "ANZ";

1.2 Session B - enter:


select * from manufact where manu_code = "HRO";

n What happens and why?

1.3 Session A - enter:


rollback work;
alter table manufact lock mode(row);
begin work;
update manufact set lead_time = 2 where manu_code = "ANZ";

1.4 Session B - enter:


select * from manufact where manu_code = "HRO";

n How is data accessed by this statement? Why is the outcome of step #4 different than step #2?

1.5 Session B - enter:
select * from manufact;

n What happens and why?

1.6 Session B - enter:


set isolation to dirty read;
select * from manufact;

n This is the same select statement as #5. What happens differently and why?

1.7 Session B - enter:

set isolation to committed read;
set lock mode to wait;
select * from manufact;

n What happens and why?

7-28 Concurrency Control

1.8 Session A - enter:


commit work;

n What happens to Session B after this statement is executed? Why?

1.9 Session A - enter:

set isolation to repeatable read;
begin work;
update manufact set lead_time = 2 where manu_name = "Husky";

n How is data accessed by the update statement - sequentially or via index? What types of locks are created when the update statement is executed?

1.10 Session B - enter:

set lock mode to not wait;
update manufact set lead_time = 2 where manu_code = "ANZ";

n How does this statement access data? What happens and why?

1.11 Session B - enter:

select * from manufact where manu_code = "ANZ";

n What isolation level does this statement have? Do you think any locks are created? What happens and why?

Concurrency Control 7-29

7-30 Concurrency Control

Solutions

Concurrency Control 7-31

Solution 1
1.1 Session A - enter:
begin work;
update manufact set lead_time = 2 where manu_code = "ANZ";

1.2 Session B - enter:


select * from manufact where manu_code = "HRO";

n What happens and why?

The SELECT statement fails because step #1 has created exclusive locks and the locking granularity of the manufact table is page. Step #2 attempts to read a row which is located on an exclusively locked page. Notice that the duration of an exclusive lock is a transaction. 1.3 Session A - enter:
rollback work;
alter table manufact lock mode(row);
begin work;
update manufact set lead_time = 2 where manu_code = "ANZ";

Step #3 releases the exclusive locks generated by step #1 ( ROLLBACK WORK ) and changes the locking granularity of the manufact table to row. The UPDATE statement creates exclusive locks at the row level. 1.4 Session B - enter:
select * from manufact where manu_code = "HRO";

n How is data accessed by this statement? Why is the outcome of step #4 different than step #2?

Step #4 succeeds because it accesses rows which are not locked by step #3. It is using indexed access and not sequential access.

1.5 Session B - enter:

select * from manufact;

n What happens and why?

This statement fails because it attempts to read the manufact table sequentially and encounters rows which have been locked by step #3. It is using the default isolation level, which is committed read.

7-32 Concurrency Control

1.6 Session B - enter:


set isolation to dirty read;
select * from manufact;

n This is the same select statement as #5. What happens differently and why?

This statement succeeds whereas step #5 fails because the dirty read isolation level ignores all locks.

1.7 Session B - enter:

set isolation to committed read;
set lock mode to wait;
select * from manufact;

What happens and why?

The statement waits to complete execution in this step. When it encounters the exclusive locks created by step #3, it waits until the locks are released.

1.8 Session A - enter:
commit work; n

What happens to Session B after this statement is executed? Why?

Step #8 commits the work from step #3 and releases the locks. At this point session B continues to read the manufact table. 1.9 Session A - enter:
set isolation to repeatable read;
begin work;
update manufact set lead_time = 2 where manu_name = "Husky";

n How is data accessed by the update statement - sequentially or via index? What types of locks are created when the update statement is executed?

There is no index on manufact.manu_name, so the UPDATE statement accesses data sequentially. Since the isolation level is repeatable read, a shared lock is held on each row that is read.

1.10 Session B - enter:

set lock mode to not wait;
update manufact set lead_time = 2 where manu_code = "ANZ";

n How does this statement access data? What happens and why?
Step #10 accesses data via index but it fails because step #9 has created shared locks on all rows in the manufact table. The attempt to exclusively lock a row conflicts with the pre-existing shared lock.

1.11 Session B - enter:


select * from manufact where manu_code = "ANZ";

Concurrency Control 7-33

What isolation level does this statement have? Do you think any locks are created? What happens and why? This statement has a committed read isolation level. It succeeds because committed read is compatible with the pre-existing shared locks.

7-34 Concurrency Control

Module 8
Referential Integrity

Referential Integrity 09-2001 2001 International Business Machines Corporation

8-1

Objectives
At the end of this module, you will be able to:
n Explain the benefits of having referential constraints at the database server level (versus the application level)
n Specify referential constraints
n Specify cascading deletes

8-2 Referential Integrity

What is Referential Integrity?

(Diagram: a parent table with a primary key and a child table with a foreign key that references it.)

n Child must have a parent
n Parent must have a unique primary key

Referential integrity is used to enforce the relationships between tables. For example, a customer record should exist before an order is placed.

Referential constraints
Referential constraints allow users to specify primary and foreign keys to enforce parent-child (master-detail) relationships. To define a referential constraint, a user must have REFERENCES privilege or be the table owner. Here are some rules that are enforced with referential constraints:

1. If a user deletes a PRIMARY KEY and there are corresponding FOREIGN KEYS, the delete fails. You can circumvent this rule with cascading deletes.
2. There are no restrictions associated with deleting FOREIGN keys.
3. If a user updates a PRIMARY KEY and there are FOREIGN KEYS corresponding to the original values of the PRIMARY KEY, the update fails.
4. If a user updates a FOREIGN KEY and there is no PRIMARY KEY corresponding to the new, non-NULL value of the FOREIGN KEYS, the update fails.
5. All values within a PRIMARY KEY must be unique. An attempt to insert a duplicate value into a PRIMARY KEY results in an error.

Referential Integrity 8-3

6.

When a user inserts a row into a child table, if all FOREIGN KEYS are non-NULL and there is no corresponding PRIMARY KEY, the insert fails.

Note
If you want to enforce referential integrity, NO NULLS should be allowed in the primary and foreign key columns.

Integrity at the server level vs. application level


By placing constraints at the database server level, consistency throughout all applications that reference the tables involved is ensured. If this checking is placed at the application level, all application code will have to be monitored to prevent inconsistencies. A change in a constraint may require changing the code of all affected applications.

Types of referential constraints


n Cyclic Referential Constraints enforce parent-child relationships between tables.
n Self-referencing Constraints enforce a parent-child relationship within a table.
n Multiple-path Constraints refer to a primary key that may have several foreign keys.
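
For illustration, here is a minimal sketch of a multiple-path constraint: one primary key referenced by foreign keys in two different child tables. The constraint names are hypothetical; the tables follow the stores demonstration database.

CREATE TABLE customer(
    customer_num SERIAL,
    PRIMARY KEY (customer_num) CONSTRAINT pk_cust);

CREATE TABLE orders(
    order_num    SERIAL,
    customer_num INTEGER,
    FOREIGN KEY (customer_num) REFERENCES customer CONSTRAINT fk_ord_cust);

CREATE TABLE cust_calls(
    call_dtime   DATETIME YEAR TO MINUTE,
    customer_num INTEGER,
    FOREIGN KEY (customer_num) REFERENCES customer CONSTRAINT fk_call_cust);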

8-4 Referential Integrity

Referential Constraints: Example


INSERT INTO customer VALUES (1, "Smith");
INSERT INTO orders VALUES (0, 1);
INSERT INTO orders VALUES (0, 2);
#
#691: Missing key in referenced table for
#referential constraint (karen.fk_cnum).
#111: ISAM error: no record found.

DELETE FROM customer WHERE customer_num = 1;
#        ^
#692: Key value for constraint (karen.pk_cnum)
#is still being referenced.

In the example above, an order cannot be added to the orders table for customer number 2 because customer number 2 does not exist in the customer table. Customer number 1 cannot be deleted from the customer table because there are orders in the orders table for customer number 1. If the customer record is missing, who would you bill for the order?

Deferring constraints
The checking of referential constraints can be deferred until the end of a transaction. This is covered in the chapter Other Constraints and Maintenance.

Referential Integrity 8-5

Creating Referential Constraints


CREATE TABLE customer(
    customer_num SERIAL,
    fname        CHAR(20),
    PRIMARY KEY(customer_num) CONSTRAINT pk_cnum);

CREATE TABLE orders(
    order_num    SERIAL,
    customer_num INTEGER,
    FOREIGN KEY (customer_num) REFERENCES customer CONSTRAINT fk_num);

To enforce a referential constraint, you must specify a primary key in the parent table and a corresponding foreign key in the child table. There can only be one primary key per table. There are two ways to add referential constraints in a CREATE TABLE or ALTER TABLE statement. Both methods accomplish the same thing.
n

At the table level - this is the only way to define a constraint that consists of more than one column (a composite key).

CREATE TABLE customer(
    customer_num SERIAL,
    lname        CHAR(20),
    PRIMARY KEY(customer_num, lname) CONSTRAINT pk_cust);

ALTER TABLE customer ADD CONSTRAINT
    PRIMARY KEY(customer_num, lname) CONSTRAINT pk_cust;

When defining the foreign key constraint, it is not necessary to list the column name after the REFERENCES keyword since there is only one primary key allowed in the parent table.
n

At the column level - this modifies everything about the column, so you must be careful to include all constraints (any constraints not listed for that column will be dropped).

ALTER TABLE customer MODIFY customer_num SERIAL
    PRIMARY KEY CONSTRAINT pk_cust;

8-6 Referential Integrity

Constraint Names
n Are assigned to all constraints
  w by person creating constraint
  w by default by system
n Must be unique within database
n Are stored in the sysconstraints system catalog table

A constraint is defined by its name. You may assign a name to all constraints. Although the NOT NULL constraint existed in earlier releases, it can be named beginning with IDS 7.10. You may assign a name or use the default name assigned by the database server. System default names are a composite of a constraint ID code, a table ID, and a unique constraint ID. Name your constraints instead of taking the system default, and use a naming convention; this will make identifying a constraint and its purpose easier. The names must be unique within the database where they exist. Constraint names are stored in the sysconstraints system catalog table. This table is defined in the system catalog appendix to this manual.
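
For example, a sketch of looking up a named constraint in sysconstraints (the constraint name pk_cnum is assumed to exist from the earlier example):

SELECT constrname, constrtype, tabid
    FROM sysconstraints
    WHERE constrname = "pk_cnum";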

Referential Integrity 8-7

Cascading Deletes
Cascading deletes provide for the automatic deletion of child rows when a parent row is deleted.
CREATE TABLE customer (
    customer_num INT,
    PRIMARY KEY(customer_num));

CREATE TABLE orders (
    order_num INT,
    customer_num INT,
    PRIMARY KEY(order_num),
    FOREIGN KEY(customer_num) REFERENCES customer ON DELETE CASCADE);
--------------------------------------------------------------------------------
DELETE FROM customer WHERE customer_num = 101;
/* all rows in orders table for customer 101 are automatically deleted */

Cascading deletes make it possible to define a referential constraint, so the database server automatically deletes child rows when the corresponding parent row is deleted. This feature is useful in simplifying application code and logic.

Performance enhancement
By automatically deleting rows in the database server rather than requiring the application to delete children first, fewer SQL statements are processed. The database server can process deletes more efficiently because the overhead of an SQL statement is not incurred. If for any reason the original DELETE statement fails or the resulting DELETE statements on the child rows fail, the entire DELETE statement is rolled back.

Invoking cascading deletes


To invoke cascading deletes, add the ON DELETE CASCADE clause after the REFERENCES clause in the CREATE TABLE statement for the child table. See the slide above for an example of the ON DELETE CASCADE clause.

8-8 Referential Integrity

Restrictions on cascading deletes


n The database must have logging for cascading deletes to be activated. Referential integrity with cascading deletes can be created with logging off; however, the cascading deletes are not activated. If logging is turned off, cascading deletes will be de-activated (you will get a referential integrity error). Once you turn logging on, cascading deletes will be automatically reactivated; no action is necessary by the administrator.

n A correlated subquery using the child table in a DELETE statement for the parent table will not use cascading deletes. Instead, you will receive the error:
735: Cannot reference table that participates in a cascaded delete.

Altering a table to add a CASCADING DELETE


If the column has a foreign key constraint and you want to add a cascading delete, drop the constraint and re-add it with the ON DELETE CASCADE:
ALTER TABLE orders
    DROP CONSTRAINT orders_fk1,
    ADD CONSTRAINT (FOREIGN KEY (customer_num) REFERENCES customer
        ON DELETE CASCADE CONSTRAINT orders_fk1);

When both operations are done in the same ALTER TABLE statement, the index is not dropped, so the overhead involved with dropping and re-adding the constraint is minimal.

Referential Integrity 8-9

Self-Referencing Referential Constraints


CREATE TABLE emp(
    enum SERIAL,
    mnum INTEGER,
    PRIMARY KEY (enum) CONSTRAINT pk_enum,
    FOREIGN KEY (mnum) REFERENCES emp (enum) CONSTRAINT fk_enum);

INSERT INTO emp VALUES (1, 1);
INSERT INTO emp VALUES (2, 1);
INSERT INTO emp VALUES (3, 10);
#691: Missing key in referenced table for
#referential constraint (karen.fk_enum).
#111: ISAM error: no record found

Self-referencing referential constraints enforce parent-child (master-detail) relationships within a table. An example of a self-referencing referential constraint is shown above. This example assumes the scenario where an employee table is used to track all employees and the manager to which they are assigned. A self-referencing constraint is used to ensure that the manager assigned to each employee exists in the employee table. In other words, you cannot have a manager who is not an employee. The enum (employee number) is a primary key that must exist for the set of values stored in the mnum column (manager number). In the example, a value must already exist in the enum (employee number) column before it can be entered in the mnum (manager number) column. A manager number of 1 is allowed, but a manager number of 10 fails.

8-10 Referential Integrity

Delete/Update of a Parent Row

DELETE FROM orders WHERE order_num = (1004);


DELETE of a row in the orders table causes shared locks to be placed on the items keys in the index.

orders index keys:  1001  1002  1003  1004  1005
items index keys:   1001  1001  ...   1004  1004

Since rows exist in the child table, the DELETE fails.

Indexes will be used to support referential integrity when deleting or updating a row in a parent table. Before deleting or updating a row in a parent (master) table, the database server will look up any foreign keys that correspond to the primary key of the row being updated or deleted. When a corresponding foreign key is found, a shared lock is placed on the foreign keys in the index. The lock is required to test for the existence of a key that is in the process of being removed or a newly inserted foreign key that has not been committed yet.

Referential Integrity 8-11

Insert/Update of a Child Row

INSERT INTO items (order_num) VALUES (1004);


INSERT of a row in the items table causes a lock to be placed on the orders index key.

orders index keys:  1001  1002  1003  1004  1005
items index keys:   1001  1001  ...   1004  1004  1004

Because rows exist in the parent table, the INSERT is OK!

Indexes will be used in the following ways to support referential integrity when inserting or updating a row into a child (detail) table. Before inserting or updating the row, the database server will look through all foreign keys on this table that will be set to non-NULL values by the update. For each of these foreign keys, the database server will use the unique index corresponding to the primary key and do a lookup on the parent table. If rows are found, the database server will put a shared lock on the index key to ensure that the row is not deleted before the child row is inserted or updated. The lock is held until the referencing row has been inserted or updated.

8-12 Referential Integrity

Exercises

Referential Integrity 8-13

Exercise 1
Complete this exercise using the tool specified by your instructor. Make the following changes to the tables that you created earlier:

1.1 Create a primary key constraint for the department table on the column deptnum.

1.2 Create a foreign key constraint for the employee table on the column deptnum. Would this be a good candidate for a cascading delete?

1.3 Insert a row into the employee table with a deptnum of 999 to check that referential integrity is being enforced.
INSERT INTO employee (empnum, deptnum) VALUES (0, 999);

8-14 Referential Integrity

Solutions

Referential Integrity 8-15

Solution 1
1.1
ALTER TABLE department ADD CONSTRAINT PRIMARY KEY (deptnum) CONSTRAINT pk_dept;

1.2
ALTER TABLE employee ADD CONSTRAINT FOREIGN KEY (deptnum) REFERENCES department CONSTRAINT fk_dept;

This would probably not be a good candidate for a cascading delete; if a department was eliminated, you might want to keep the employee information stored in this table.

1.3 The insert fails because the department number does not exist in the department table.

8-16 Referential Integrity

Module 9
Other Constraints and Maintenance

Other Constraints and Maintenance 09-2001 2001 International Business Machines Corporation

9-1

Objectives
At the end of this module, you will be able to:
n Specify default values
n Create check constraints and not null constraints
n Determine when constraint checking occurs

9-2 Other Constraints and Maintenance

Enforcing Integrity
n Entity Integrity: Does each row in the table have a unique identifier?
n Semantic Integrity: Does the data in the columns properly reflect the types of information the column was designed to hold?

In addition to referential integrity and constraints, there are other constraints that can be created to enforce the accuracy of the data in the database. Entity and semantic integrity are enforced by using the constraints listed on the following page.

Other Constraints and Maintenance 9-3

Types of Constraints
n Data type - defines the type of value that can be stored.
n Default value - automatically provides values for columns omitted in an INSERT statement.
n NOT NULL constraint - requires that a value be provided for a column during an insert (if there is no default value) or an update.
n Check constraint - all inserted and updated rows must meet this constraint.
n Unique constraint - every row inserted or updated must have a unique value for the key specified.

Data Types: The data type defines the type of values that you can store in a column. For example, the data type smallint allows you to enter values from -32,767 to 32,767.

Default Values: The default value is the value inserted in a column when an explicit value is not specified. For example, the user_id column of a table may default to the login name of the user if no name is entered.

NOT NULL Constraints: The NOT NULL constraint ensures that a column contains a value during insert and update operations.

Check Constraints: Check constraints specify conditions on data inserted or updated in a column. Each row inserted into a table must meet those conditions. For example, the quantity column of a table may check for quantities greater than or equal to 1. Check constraints can also be used to enforce relationships within a table. For example, in an order table, the ship_date must be greater than the order_date.

Unique Constraints: Every row that is inserted or updated must have a unique value for the column specified.

Check, Unique, and NOT NULL constraints apply integrity checks within a single row, while referential constraints apply integrity checks between rows.
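
As a quick illustration, the following sketch combines several of these constraint types in one hypothetical table definition (the order_audit table and its columns are not part of the demonstration database):

CREATE TABLE order_audit(
    audit_id   SERIAL PRIMARY KEY CONSTRAINT pk_audit,   -- entity integrity
    user_id    CHAR(8) DEFAULT USER NOT NULL,            -- default value and NOT NULL
    quantity   SMALLINT CHECK (quantity >= 1),           -- check constraint
    order_code CHAR(4) UNIQUE                            -- unique constraint
);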

9-4 Other Constraints and Maintenance

Default Values
CREATE TABLE test (
    test_num  INTEGER DEFAULT 1 NOT NULL,
    state     CHAR(2) DEFAULT "CA",
    test_date DATE DEFAULT TODAY,
    user_id   CHAR(10) DEFAULT USER,
    test_time DATETIME HOUR TO MINUTE DEFAULT CURRENT HOUR TO MINUTE
);

Default values allow users to specify what value is inserted when no explicit value is supplied for a column in an INSERT statement.
n The default value is applied during INSERT only (not UPDATE). It applies to columns not listed in the INSERT statement.
n A default value can be a literal value or one of the following SQL functions: USER, CURRENT, NULL, TODAY, or SITENAME.

A default value can be added, modified, or removed by using the CREATE TABLE or ALTER TABLE statements. If a default value is changed, the new default value applies only to rows inserted after the change has been made. BLOB columns accept only NULL as the default. Serial columns cannot have a default value. An example of the CREATE TABLE statement that allows you to specify default values is shown above. The INSERT statement uses the default values when you do not specify a value. For example:
INSERT INTO test (test_num) VALUES (4);

test_num   state   test_date   user_id   test_time
4          CA      9-9-95      karen     11:01

Other Constraints and Maintenance 9-5

NOT NULL Constraint


The NOT NULL constraint requires that a column contain a value during an insert or update operation.
CREATE TABLE test (
    test_num INTEGER DEFAULT 1 NOT NULL,
    state    CHAR(2) DEFAULT "CA",
    ... );

ALTER TABLE orders MODIFY order_num INTEGER NOT NULL;

The NOT NULL constraint is used to ensure that null values are not inserted into a column. It can be used in the CREATE TABLE or ALTER TABLE statements. If you do not indicate a default value for a column, the default is NULL unless you place a NOT NULL constraint on the column. In this case, no default value exists for the column. If you specify a NULL value to be inserted in a column that has a default value, a NULL value will be inserted. Using the example on the preceding page:
INSERT INTO test (state) VALUES (NULL);

test_num   state   test_date   user_id   test_time
1                  9-9-95      karen     11:01

If you try to insert a NULL value in a column that does not allow nulls, the INSERT statement will fail.
INSERT INTO test (test_num) VALUES (NULL);
391: Cannot insert a null into column (karen.col01).

9-6 Other Constraints and Maintenance

Check Constraint
n A check constraint is an expression that yields a boolean value of TRUE or FALSE.
n Check constraints are applied to each row that is inserted or updated.
n All referenced columns must be from the current table.
n All existing rows must pass a new constraint added to a table.

Check constraints allow users to specify conditions or integrity constraints on tables. An inserted or updated row must pass all check constraints added for the table or specific columns. The expression specified in the check constraint can only include columns from the table being altered or created.

Other Constraints and Maintenance 9-7

Example: Check Constraint


CREATE TABLE customer(
    customer_num SERIAL,
    state CHAR(2) CHECK (state IN ("CA", "AZ"))
);

INSERT INTO customer VALUES (0, "CA");
INSERT INTO customer VALUES (0, "WA");
#                               ^
#530: Check constraint (karen.c117_11) failed.

Constraint name

An example of the CREATE TABLE statement that allows you to specify a check constraint is shown above. Check conditions cannot contain subqueries, aggregates, host variables, rowids, or the USER, SITENAME, TODAY, and CURRENT functions. Because the values for TODAY and CURRENT change, these functions cannot be used because constraints would be inconsistent over time. When you add a constraint, you may specify the name of the constraint. If you do not, IBM Informix creates one for you. In the above example, c117_11 is the name of the constraint (with karen as the owner). If your application receives a constraint violation error, the constraint name may be found in the SQLCA structure. The SQLCA structure holds information about the status of the SQL statement and is loaded upon completion of every SQL statement. This structure can be accessed by IBM Informix 4GL or IBM Informix-ESQL application programs. The name of the field where the violation is kept is sqlca.sqlerrm.

9-8 Other Constraints and Maintenance

When adding a check constraint to a table that already contains data, the data in the table must satisfy the constraint conditions. For example, if one of the current values in the state column is "CA", the following statement will fail with an error message:
ALTER TABLE customer MODIFY state CHAR(2) CHECK(state IN ("WA"));
#530: Check constraint () failed.

Other Constraints and Maintenance 9-9

Adding Constraints
n At the table level

ALTER TABLE items ADD CONSTRAINT
    CHECK (quantity >= 1 AND quantity <= 10);
ALTER TABLE orders ADD CONSTRAINT
    CHECK (paid_date > ship_date);

n At the column level

ALTER TABLE items MODIFY quantity SMALLINT
    CHECK (quantity >= 1 AND quantity <= 10);

Constraints may be added at the table level or the column level. If you reference more than one column, the constraint must be at the table level. The columns must be from the same table. When you modify a column, you modify everything about that column (this is why the MODIFY clause must include the data type). If you do not list all constraints with the MODIFY clause, any currently existing constraints not listed will be dropped.
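
For instance, a sketch of restating everything you want to keep when modifying the quantity column, assuming it already had a default of 1 and a range check:

ALTER TABLE items MODIFY quantity SMALLINT
    DEFAULT 1
    CHECK (quantity >= 1 AND quantity <= 10);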

9-10 Other Constraints and Maintenance

Unique Constraints
CREATE TABLE call_type(
    call_type CHAR(1) UNIQUE,
    call_desc CHAR(30)
);

INSERT INTO call_type VALUES ("T", "Test");
INSERT INTO call_type VALUES ("T", "Trial");
#                              ^
#239: Could not insert new row - duplicate value in UNIQUE INDEX column.
#100: ISAM error: duplicate values for a record with unique index

Unique constraints are a way to assure that any rows inserted or updated for a table will have a unique value for the column(s) listed in the unique constraint. There are several other ways to assure uniqueness:
n Unique indexes will assure that values in a column or set of columns will be unique.
n A column added with a primary key constraint is guaranteed to be unique.

The advantage of a unique constraint over a unique index is that constraint checking can be deferred. To create a unique constraint for a combination of columns, add the constraint at the table level:
CREATE TABLE newtest (
    new_num   INTEGER,
    new_state CHAR(2),
    new_user  CHAR(8),
    UNIQUE (new_num, new_state) CONSTRAINT uq_newtest);

Other Constraints and Maintenance 9-11

Constraint Transaction Modes


Constraint transaction modes determine when checking for referential violations will occur.

n Immediate constraint checking (default)
  w Check for violations at the end of a statement.
n Deferred constraint checking
  w Check for violations at COMMIT time.
n Detached constraint checking (databases without logging, automatic non-specifiable)
  w Check for violations in the middle of the statement.

Constraint transaction modes allow you to specify when the checking of constraints occurs. The IMMEDIATE keyword sets the transaction mode of constraints to statement-level checking. The DEFERRED keyword sets the transaction mode to transaction-level checking. Change the transaction mode using the SET CONSTRAINTS statement. For example:
SET CONSTRAINTS pk_orders,fk_orders DEFERRED

If the database does not have logging, the only constraint mode available is DETACHED, which is automatically in effect. You cannot execute the SET CONSTRAINTS statement outside of a transaction. The transaction mode set by the SET CONSTRAINTS statement lasts only for the transaction in which it is executed: once a COMMIT WORK or ROLLBACK WORK statement is executed, the transaction mode reverts back to IMMEDIATE. The SET CONSTRAINTS command is also used to change the object-mode of a constraint. See the Modes and Violation Detection chapter for more detail.
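
A short sketch of the statement's scope, assuming the pk_orders and fk_orders constraints from the example above exist:

BEGIN WORK;
SET CONSTRAINTS pk_orders, fk_orders DEFERRED;
-- statements that may temporarily violate these constraints go here
COMMIT WORK;   -- checking occurs here; the mode then reverts to IMMEDIATE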

9-12 Other Constraints and Maintenance

Immediate Constraint Checking

CREATE TABLE test (current_no INTEGER UNIQUE);
UPDATE test SET current_no = current_no + 1;

Initial values:               1  2  3  4  5
After the first row updated:  2  2  3  4  5   (index violation occurs)
At the end of the update:     2  3  4  5  6   (no index violation)

Immediate checking, also called effective checking, specifies that the checking of constraints appears to occur at the end of each statement. If some constraint is not satisfied, then the statement will appear to have not been executed. Immediate checking is the default mode. In the example above, after the first row is updated but before the second row is updated, a duplicate value exists which violates the unique index constraint. However, after all the rows are updated, all the values are unique, so the statement executes successfully.

How it is Implemented
A change that violates a constraint is allowed to succeed but is recorded as a violation. Later, at the end of the statement, checks are made to see if the violation still exists. If violations still exist, an error is returned and the statement is undone. Savepoints are used to allow the database server to undo the effects of a single statement without undoing earlier changes made within the same transaction. By establishing a savepoint at the beginning of a statement, the database server can roll back to that savepoint if a constraint violation occurs during effective checking. For referential constraints, a memory buffer/temp table records the violations. There is one temp table that records violations for each referential pair. The temp table contains key values that were violated. As rows are inserted, deleted, and updated, the temp tables are updated to

Other Constraints and Maintenance 9-13

reflect new violations and removal of old ones. Later, when checking is done, the temp files are scanned and for those keys that are still valid, the violations are revalidated. As violations are resolved, records are removed from the temp table. For check constraints, a memory buffer/temp table is used again. However, this time the temp table records only the rowids of the violating rows. As rows are updated, rows that now pass the check constraints are removed. When checking is done, the temp table should now be empty.

Note
For unique indexes the checking is done on a row by row basis instead of at the end of the statement. If a user wants to be able to do effective checking, unique constraints should be used rather than creating unique indexes.
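
To illustrate the note, a sketch contrasting the two approaches (tables t1 and t2 are hypothetical, and each is assumed to hold the values 1 through 5):

CREATE TABLE t1 (current_no INTEGER UNIQUE);      -- unique constraint: effective checking
CREATE TABLE t2 (current_no INTEGER);
CREATE UNIQUE INDEX t2_ix ON t2 (current_no);     -- unique index: checked row by row

UPDATE t1 SET current_no = current_no + 1;   -- succeeds: checked at end of statement
UPDATE t2 SET current_no = current_no + 1;   -- may fail on the first duplicate value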

9-14 Other Constraints and Maintenance

Deferred Constraint Checking


BEGIN WORK;
SET CONSTRAINTS ALL DEFERRED;
UPDATE department SET deptnum = 50 WHERE deptnum = 1;
UPDATE employee SET deptnum = 50 WHERE deptnum = 1;
COMMIT WORK;

Deferred checking turns statement-level checking off, and all specified constraints are not checked until the transaction is committed. Deferred checking can be used when the primary and foreign key values in a parent and child pair are changed to a new value. It can also be used when you need to switch the primary key values for two or more rows in a table. If you defer checking for a primary key constraint, checking the NOT NULL constraint for that column or set of columns is also deferred.
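
For example, a sketch of switching the key values of two department rows; this assumes the department table also has a dept_name column, which is hypothetical here. The first UPDATE creates a temporary duplicate that is only acceptable because checking is deferred until COMMIT WORK:

BEGIN WORK;
SET CONSTRAINTS ALL DEFERRED;
UPDATE department SET deptnum = 2 WHERE dept_name = "Sales";      -- was deptnum 1
UPDATE department SET deptnum = 1 WHERE dept_name = "Marketing";  -- was deptnum 2
COMMIT WORK;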

How it is Implemented
Deferred checking specifies that the checking of constraints does not occur until immediately before the transaction is committed or the user changes the mode to immediate. If a constraint error occurs at commit time, then the transaction is rolled back. In the example above, if constraint mode is not set to deferred, the statement will fail. The failure would occur at the first update statement because there would be employees who did not have a department (department 1 no longer exists in the department table). Deferred checking is implemented in a similar fashion to immediate checking. However, the checks for violations are made at the end of the transaction, as opposed to the end of the statement.

Other Constraints and Maintenance 9-15

You must put the SET CONSTRAINTS ALL DEFERRED statement within a transaction. It is valid from the time that it is set until the end of the transaction. You also have the option to replace the keyword ALL with the constraint name to defer only a specific constraint, for example:
SET CONSTRAINTS uniq_ord DEFERRED

9-16 Other Constraints and Maintenance

Detached Constraint Checking


CREATE TABLE test (current_no INTEGER UNIQUE);
UPDATE test SET current_no = current_no + 1;

Initial values:               1  2  3  4  5
Attempted first row update:   2  2  3  4  5   (index violation occurs and the statement fails)
Final values:                 1  2  3  4  5

Detached checking is the only mode available in databases created without logging and for temp tables that have been created WITH NO LOG. If logging is not on, it is not possible to do the rollbacks required by effective checking.

How it is Implemented
The checking of constraints is done on a row by row basis. Once a constraint error occurs, an error is returned immediately to the user and the rest of the statement is not executed.

Other Constraints and Maintenance 9-17

Performance Impact
n Referential constraints are implemented using indexes on both primary and foreign keys.
n Unique constraints are implemented using unique indexes on the appropriate columns.
n Indexes must be updated on UPDATE, INSERT, and DELETE.
n Index lookups occur on each UPDATE, INSERT, and DELETE.
n Numerous locks are required. Shared locks are held on all indexes being used.

The overhead involved in maintaining constraints is outlined in the slide above. When a constraint is created, if an index already exists, then that index will be used. Otherwise, the index is created by the database server. It is possible for a column to have both a referential and unique constraint. It is also possible for a column to have two different referential constraints. In these situations, a single index will be used to enforce the multiple constraints.

9-18 Other Constraints and Maintenance

Dropping a Constraint
ALTER TABLE orders DROP CONSTRAINT pk_orders;

The ALTER TABLE drops the primary key constraint and any corresponding foreign key constraints


To drop a constraint without altering the table in any other way use the DROP CONSTRAINT clause in the ALTER TABLE command. When a primary key constraint is dropped, the foreign key constraints in other tables that reference it will also be dropped. When a foreign key constraint is dropped, the corresponding primary key constraint is not affected. The indexes that were used to implement the constraints are dropped only if the indexes were built implicitly by the creation of the constraint.

Dropping a column
When a column that has a constraint is dropped, the action may affect more than just the table that is mentioned in the ALTER TABLE statement. Any constraints that reference the dropped column will also be dropped. For example:
ALTER TABLE orders DROP order_num;

Dropping the primary key column order_num in the orders table will drop the primary key constraint as well as the foreign key constraint in the items table that references order_num. The items table will be locked while the foreign key constraint is dropped.

Other Constraints and Maintenance 9-19

System Catalog Tables

sysdefaults

sysconstraints

syscoldepend

syschecks

sysreferences


The following system catalog tables are used for enforcing referential and entity integrity and may be queried to obtain information:

sysconstraints   Stores constraint names and information about the constraints
syschecks        Contains the text of the check constraint
syscoldepend     Keeps track of the table columns specified in each check constraint
sysdefaults      Keeps track of every column that has a user-specified default value
sysreferences    Lists the referential constraints placed on the columns in the database

Example queries:
SELECT sysconstraints.*, systables.tabname
    FROM sysconstraints, systables
    WHERE sysconstraints.tabid = systables.tabid;

SELECT sysconstraints.tabid, systables.tabname, syschecks.*
    FROM sysconstraints, systables, syschecks
    WHERE sysconstraints.constrid = syschecks.constrid
        AND sysconstraints.tabid = systables.tabid
        AND syschecks.type = "T";

The complete definition and structure of the tables is included in the system catalog appendix to this manual.
9-20 Other Constraints and Maintenance

Exercises

Other Constraints and Maintenance 9-21

Exercise 1
Complete these exercises using the tool specified by your instructor and the stores demonstration database you created earlier. Create the constraints indicated in the exercises below. Test each of your constraints.

1.1 Alter the items table so that the quantity column only accepts values that are greater than zero.

1.2 Modify the items table so that the quantity value defaults to 1.

West Coast Distributors has decided it needs to change the manufacturer code for Hero. The current value of HRO is obsolete and needs to be changed to HER.

1.3 Update all the HRO values to HER in the manufact, catalog, stock, and items tables. There are referential constraints on all of the tables.

9-22 Other Constraints and Maintenance

Solutions

Other Constraints and Maintenance 9-23

Solution 1
1.1 Alter the items table so that the quantity column only accepts values that are greater than zero.
alter table items modify quantity smallint check(quantity >0);

1.2 Modify the items table so that the quantity value defaults to 1.
alter table items modify quantity smallint default 1 check (quantity > 0);

1.3 Update all the HRO values to HER in the manufact, catalog, stock, and items tables.

begin work;
set constraints all deferred;
update manufact set manu_code = "HER" where manu_code = "HRO";
update stock set manu_code = "HER" where manu_code = "HRO";
update items set manu_code = "HER" where manu_code = "HRO";
update catalog set manu_code = "HER" where manu_code = "HRO";
commit work;

9-24 Other Constraints and Maintenance

Module 10
Creating and Using Triggers

Creating and Using Triggers 09-2001 2001 International Business Machines Corporation

10-1

Objectives
At the end of this module, you will be able to:
n List the possible uses for triggers in a database system
n Use the CREATE TRIGGER statement to create a trigger
n Understand the implications of using triggers with transactions, security, and constraints

10-2 Creating and Using Triggers

What is a Trigger?

Event (INSERT, UPDATE, or DELETE on the triggering table)  -->  Trigger  -->  Action (INSERT, UPDATE, DELETE, EXECUTE PROCEDURE)

A trigger is a database object that will execute an SQL statement automatically when a certain event occurs. It is available starting in the 5.01 release of IBM Informix-SE and IBM Informix-OnLine, and in IBM Informix Dynamic Server. The event that can trigger an action can be an INSERT, UPDATE or DELETE statement on a specific table. The UPDATE statement that triggers an action can specify either a table, or one or more columns within the table. The table that the trigger event operates on is called the triggering table. When the trigger event occurs, the trigger action will be executed. The action can be any combination of one or more INSERT, UPDATE, DELETE, or EXECUTE PROCEDURE statements. Triggers are a feature of the database server, so the type of application tool used to access the database is irrelevant in the execution of a trigger.

Creating and Using Triggers 10-3

Why Use Triggers?


n Business rules
n Derived column values
n Audit log
n Security authorization
n Cascading deletes
n Table replication

Triggers can restrict how data is manipulated in a database. By invoking triggers a DBA can ensure that data is treated consistently across application tools and programs.
n Business rules - The term business rules refers to the way a business uses data. Triggers and/or stored procedures can be used to enforce business rules for data within a database. An example of a business rule may be: if inventory for an item reaches a certain level, automatically place an order to re-stock the item.

n Derived values - In some cases, it may be necessary to store a derived value, such as an account balance, in a database. Using triggers to do this will force the derived value to be synchronized with the values from which it is derived.

n Audit trails - An organization may have a need to record certain transactions in an audit table. Triggers will assure that all of the specified transactions will be recorded. For example, if an employee's salary gets changed, an audit record can be added to the audit table, specifying the change made and the login of the person who made the change.

n Security authorization - Triggers can be used to augment database security that already exists. For example, triggers can be used to check for a date before authorizing a change, or to allow only certain people to create orders greater than $1,000.

10-4 Creating and Using Triggers

n Cascading deletes - Although cascading deletes are part of referential integrity starting with version 6.0 of the database servers, earlier versions of the database server can use triggers to perform the same function.

n Table replication - Triggers can be used to replicate changes to a table automatically.

Stored Procedures are frequently used in conjunction with Triggers to implement some of the above.

Creating and Using Triggers 10-5

CREATE TRIGGER Components

CREATE TRIGGER statement components: trigger name, trigger event, correlation names, and trigger action.

The CREATE TRIGGER statement is used to store the trigger event and trigger action in the database server. It is an SQL statement that can be executed by applications compiled with tools that support the statement. You must be either the owner of the table or the database administrator to create a trigger on a table. Each trigger must be given a trigger name, which is unique within the database for which it is created. Each trigger also has a trigger event and trigger action. The correlation names can be used to reference values of the row before and after it is changed.

10-6 Creating and Using Triggers

Trigger Events

INSERT ON tab_name
or DELETE ON tab_name
or UPDATE ON tab_name
or UPDATE OF col_name ON tab_name

One INSERT trigger and one DELETE trigger are allowed per table.

Multiple UPDATE triggers are allowed for a table, but column lists must be mutually exclusive. If columns are not listed, all columns are assumed, and only one UPDATE trigger is allowed.

The trigger event can be an INSERT, UPDATE, or DELETE SQL statement. Only one trigger is allowed for the operation and table combination except for the UPDATE statement. This means that there can be only one INSERT trigger event for a table and only one DELETE trigger event for a table. The UPDATE trigger event can include one or more columns within a table, but columns in UPDATE triggers for a table must be mutually exclusive. This means that if there are five columns in a table, there can be at most five UPDATE triggers for a table (one for each column). If you have an UPDATE trigger for the entire table (not specifying columns), only one UPDATE trigger is allowed.

Only Local Tables Allowed


The table specified by the trigger event must be a table in the current database. You cannot specify a remote table.

Creating and Using Triggers 10-7

Trigger Action
The trigger action specifies the action that should occur and when it should occur:

BEFORE        (EXECUTE PROCEDURE xyz())   {Executed before rows are processed}
FOR EACH ROW  (DELETE FROM items)         {Executed after each row is processed}
AFTER         (EXECUTE PROCEDURE abc())   {Executed after all rows are processed}

Trigger actions are executed at the following times:


n Before the trigger event occurs - The BEFORE triggered action list executes once before the trigger event executes. Even if no rows are processed by the trigger event, the BEFORE trigger actions are still executed.

n After each row is processed by the trigger event - The FOR EACH ROW trigger action occurs once after each row is processed by the trigger event.

n After the trigger event completes - The AFTER triggered action list executes once after the trigger event executes. If no rows are processed by the triggering statement, the AFTER triggered action list is still executed.

Triggers are reentrant; both the triggered event and the triggered action can operate on the same table. However, a triggered action cannot be an UPDATE statement that references a column that was updated by the triggering event.

Remote Tables Allowed


The statements that are part of the trigger action can reference remote tables.

10-8 Creating and Using Triggers

Trigger Example
CREATE TRIGGER test1 UPDATE ON orders
    BEFORE (EXECUTE PROCEDURE check_permission())
    FOR EACH ROW (EXECUTE PROCEDURE log_chg())
    AFTER (EXECUTE PROCEDURE log_total());
_____________________________________________________
UPDATE orders SET ship_instruct = "express"
    WHERE customer_num = 106;
_____________________________________________________
Order of execution (2 rows updated):
    check_permission
    UPDATE row
    log_chg
    UPDATE row
    log_chg
    log_total

In the trigger shown above, the triggering event (update of a row in the orders table) results in several triggering actions that call a stored procedure. This example shows the order of execution of the trigger event and the trigger action lists. For the INSERT statement, there is not much distinction between the AFTER clause and the FOR EACH ROW clause, as only one row is affected. Since the DELETE and UPDATE statements can affect multiple rows, how you use the BEFORE, FOR EACH ROW and AFTER clauses can be quite significant.
n The BEFORE action always executes first.
n The AFTER action always executes last.
n The FOR EACH ROW action executes 0 or more times, depending upon the number of rows processed by the trigger event.

If the trigger event fails for some reason, the trigger action statements will NOT execute.

Comments
Comments can be placed on a line within a trigger by prefixing them with two dashes (--). You may also include a comment by enclosing it between two braces ({}). The use of two dashes is the ANSI-compliant method of introducing a comment.
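
A small sketch showing both comment styles inside a trigger definition (the trigger name upd_qty is hypothetical; log_chg is the procedure used in the example above):

CREATE TRIGGER upd_qty UPDATE OF quantity ON items
    -- ANSI-style comment introduced with two dashes
    {a comment enclosed in braces}
    FOR EACH ROW (EXECUTE PROCEDURE log_chg());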
Creating and Using Triggers 10-9

REFERENCING Clause
Use the REFERENCING clause to refer to column values from the triggering event:
REFERENCING NEW AS post OLD AS pre

Columns can then be referred to as pre.column_name and post.column_name, where column_name refers to a column in the triggering table.

The REFERENCING clause can be used to reference columns from the triggering table in the trigger action. The REFERENCING clause allows you to give a correlation name to columns of the triggering table both before the trigger event changes the row and after the row is changed. The values can be referenced in the trigger action SQL statements with the correlation name and a period prefixing the column name.
n NEW and OLD are reserved words within the REFERENCING clause.
n The correlation name can only be used in the FOR EACH ROW clause, not in the BEFORE or AFTER clause of the trigger action.
n The NEW correlation name should not be used for a DELETE trigger event. Likewise, the OLD correlation name should not be used for an INSERT trigger event.
n Avoid using table or synonym names as correlation names.

10-10 Creating and Using Triggers

REFERENCING Example
CREATE TRIGGER item_upd UPDATE OF total_price ON items
    REFERENCING NEW AS post OLD AS pre
    FOR EACH ROW
        (UPDATE orders
            SET order_price = order_price + post.total_price - pre.total_price
            WHERE order_num = post.order_num);

The trigger above shows how to create a trigger to update a derivative value. The NEW and OLD correlation value is needed to update the order_price column for the corresponding row in the orders table.

Note
This example is for illustration purposes only. The column order_price does not exist in the demonstration database.

Creating and Using Triggers 10-11

The WHEN Condition


The WHEN condition allows you to base the triggered action on the outcome of a test:

CREATE TRIGGER ins_cust_calls INSERT ON cust_calls
    {Flag billing problems for billing dept. review}
    REFERENCING NEW AS post
    FOR EACH ROW
    WHEN (post.call_code = "B")
        (INSERT INTO warn_billing VALUES(post.customer_num));

You can specify the trigger action to occur only if a certain condition is true by including the WHEN clause. When the WHEN condition evaluates to true, the accompanying trigger action statements are executed. When the WHEN condition evaluates to false or unknown, the trigger action statements are not executed. The slide shows an example of a conditional trigger action. Billing complaints (call_code = "B") are flagged by putting the customer number in the warn_billing table when they occur. You can include one or more WHEN conditions after the BEFORE, FOR EACH ROW, and AFTER keywords. Each WHEN condition is evaluated separately; for example:
FOR EACH ROW
    WHEN (post.call_code = "B")
        INSERT INTO warn_billing VALUES(post.customer_num),
    WHEN (post.call_code = "C")
        INSERT INTO complaints VALUES(post.customer_num)

The condition can contain boolean expressions such as BETWEEN, IN, IS NULL, LIKE, and MATCHES. You can use a subquery as a part of the condition. The condition can also contain keywords such as TODAY, USER, CURRENT, and SITENAME.
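
For instance, a sketch of a WHEN condition that uses a subquery, flagging new orders placed by customers who already have call records. The warn_orders table and the trigger name are hypothetical:

CREATE TRIGGER ins_order_warn INSERT ON orders
    REFERENCING NEW AS post
    FOR EACH ROW
    WHEN (post.customer_num IN (SELECT customer_num FROM cust_calls))
        (INSERT INTO warn_orders VALUES(post.order_num, post.customer_num));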

10-12 Creating and Using Triggers

Multiple Update Triggers on One Table


What is the value of a.col3 after these statements execute?
CREATE TABLE a(col1 INTEGER, col2 INTEGER, col3 INTEGER);
INSERT INTO a VALUES(1,2,3);

CREATE TRIGGER test1 UPDATE OF col1 ON a
    FOR EACH ROW (UPDATE a SET col3 = 5);
CREATE TRIGGER test2 UPDATE OF col2 ON a
    FOR EACH ROW (UPDATE a SET col3 = 1);

UPDATE a SET (col1,col2) = (2,2);


You can have more than one trigger that executes when a table is updated, as long as the column lists of the triggers are mutually exclusive. The order in which the triggers are executed will depend upon the order of the columns in the syscolumns system catalog table. The order of the columns in the system catalog table depends on the order in which they appear in the CREATE TABLE statement. The triggers shown in the example above are legal in the sense that the column lists are exclusive for the triggers (test1 is for col1, test2 is for col2). Both triggers would be executed when an UPDATE statement updates both columns as shown. Trigger test2 would be executed last, so the value of col3 in table a would be 1 after the trigger statement completes.

Creating and Using Triggers 10-13

Cascading Triggers
CREATE TRIGGER del_cust   --cascading delete example
    DELETE ON customer
    REFERENCING OLD AS pre_del
    FOR EACH ROW
        (DELETE FROM orders WHERE customer_num = pre_del.customer_num,
         DELETE FROM cust_calls WHERE customer_num = pre_del.customer_num);

CREATE TRIGGER del_orders
    DELETE ON orders
    REFERENCING OLD AS pre_del
    FOR EACH ROW
        (DELETE FROM items WHERE order_num = pre_del.order_num);

Executing one trigger may cause another trigger to be executed, as shown in the example above. Deleting a customer row causes the del_cust trigger to execute. The del_cust trigger deletes a row from the orders table, which in turn triggers the del_orders trigger. When these triggers complete, the DELETE statements would have been executed in this order:
DELETE customer
DELETE orders
DELETE items
DELETE cust_calls

This technique was frequently used before cascading deletes became a feature of the CREATE TABLE statement. Cascading deletes makes it possible to define a referential constraint in which the database server will automatically delete child rows when a parent row is deleted.

10-14 Creating and Using Triggers

If a Trigger Fails?
n Databases with no logging - no rollback occurs.
n Dynamic Server databases with logging - automatically roll back the trigger event and trigger action if either fail.

Databases with no logging have no rollback capabilities. Therefore, if a trigger event or trigger action statement fails, no rollback occurs. You could be left with an inconsistent database. Databases without logging will not be able to safely enforce a business rule with triggers. IBM Informix Dynamic Server databases with logging will automatically roll back both the trigger event and trigger action if either fail. Any other SQL statements within the transaction will not roll back unless the ROLLBACK WORK statement is executed, or the program stops without completing the transaction.

Creating and Using Triggers 10-15

Discontinuing an Operation
Use a stored procedure to roll back a triggering event:

CREATE PROCEDURE stop_processing()
    RAISE EXCEPTION -745;
END PROCEDURE;

CREATE TRIGGER trig1 INSERT ON tab1
    REFERENCING NEW AS new_val
    FOR EACH ROW
    WHEN (new_val.col2 > 20)
        (EXECUTE PROCEDURE stop_processing());

Stored Procedure Language has a statement called RAISE EXCEPTION that will discontinue the stored procedure with an error (if the error is not trapped in a stored procedure with the ON EXCEPTION statement) and return control to the application. The RAISE EXCEPTION statement can be used to discontinue both the trigger event and the trigger action. If the database has been created with logging, the application may then roll back the transaction. Error number -745 is reserved for use with triggers. The error message that the users will receive is:
745: Trigger execution has failed.

The application code is responsible for checking for errors after the triggering SQL statement and issuing a ROLLBACK WORK. Note that any error code could be used in the RAISE EXCEPTION statement that is called from the trigger. You do not have to use error 745.

10-16 Creating and Using Triggers

Trigger to Pass Values Into an SP


CREATE PROCEDURE ship_charge_check(order_num INT, ship_charge MONEY(12,2))
    IF ship_charge > 100 THEN
        INSERT INTO warn_tab VALUES(order_num, ship_charge);
    END IF;
END PROCEDURE;
------------------------------------------------------
CREATE TRIGGER trig1 INSERT ON orders
    REFERENCING NEW AS new_val
    FOR EACH ROW
        (EXECUTE PROCEDURE ship_charge_check
            (new_val.order_num, new_val.ship_charge));

You can pass values into a stored procedure from the EXECUTE PROCEDURE statement that is part of the trigger action. If the stored procedure is part of the FOR EACH ROW clause, you can pass columns by using the OLD and NEW correlation values. If the stored procedure is part of the BEFORE or AFTER clause, you are limited to passing constants to the stored procedure.
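
As a contrast, a sketch of a BEFORE action, which can only pass constants to the procedure. The log_event procedure and the tab1 trigger are hypothetical:

CREATE TRIGGER upd_tab1_note UPDATE ON tab1
    BEFORE (EXECUTE PROCEDURE log_event("tab1 update started"));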

Creating and Using Triggers 10-17

Returning Values From a Procedure


n The stored procedure called from a trigger can return values if the trigger event is an UPDATE statement.
n The returned values will update the table specified in the trigger event.

The stored procedure included in a CREATE TRIGGER statement can return one row (or set of values) if the trigger event is an UPDATE statement and the stored procedure occurs in the FOR EACH clause. The returned values will update the triggering table once they are returned (this means the table is updated twice for each row, once by the trigger event and once by the trigger action). Use the INTO clause of the EXECUTE PROCEDURE statement to specify the column names that will be updated upon return of the stored procedure. Prior to Dynamic Server version 7.3 you could not specify any columns already specified in the trigger event. An example of this is shown on the following page.

10-18 Creating and Using Triggers

Code Sample: Returning Values from a Stored Procedure


The stored procedure is allowed to return one row (or set of values). The returned values will update the triggering table once they are returned.
{upd_price re-calculates the total price given the new quantity}
CREATE PROCEDURE upd_price (p_stock_num INTEGER, p_manu_code CHAR(3),
                            p_quantity INTEGER)
    RETURNING MONEY(12,2);
    DEFINE p_total_price MONEY(12,2);
    LET p_total_price = p_quantity *
        (SELECT unit_price FROM stock
            WHERE stock_num = p_stock_num AND manu_code = p_manu_code);
    RETURN p_total_price;
END PROCEDURE;

{upd_proc trigger action updates the total_price column which is
 passed back from the stored procedure}
CREATE TRIGGER upd_proc UPDATE OF quantity ON items
    REFERENCING OLD AS pre NEW AS post
    FOR EACH ROW
        (EXECUTE PROCEDURE upd_price(post.stock_num, post.manu_code,
            post.quantity) INTO total_price);

Creating and Using Triggers 10-19

Triggers and Stored Procedures


When stored procedures are called from triggers, the restrictions are:

n The stored procedure called from a trigger cannot contain a BEGIN WORK, COMMIT WORK, ROLLBACK WORK, or SET CONSTRAINTS statement.
n The stored procedure as the trigger action cannot be a cursory procedure (returning more than one row).

A common method to perform complex processing within a trigger is to have it call one or more stored procedures. However, stored procedures have some restrictions if they are used as part of a trigger action:
n The stored procedure cannot contain the BEGIN WORK, COMMIT WORK, ROLLBACK WORK, or SET CONSTRAINTS statement.
n The stored procedure included in a CREATE TRIGGER statement cannot return more than one row (i.e., with the RETURN WITH RESUME statement). The following error message will appear if you do:
686: Procedure (xxx) has returned more than one row.

10-20 Creating and Using Triggers

Cursors and Triggers


INSERT cursors:
n When the row is flushed to the database server, the complete trigger is executed for each INSERT statement.

UPDATE cursors:
n Each UPDATE WHERE CURRENT OF statement executes the complete trigger.

An INSERT cursor is used to increase performance because it buffers the contents of several INSERTs in application memory before they are sent to the database server. The rows are flushed to the database server when the FLUSH statement is executed or when the buffer gets full. When the data is flushed to the database server, each row causes the trigger to be executed in full as if it were a singleton INSERT statement. UPDATE or DELETE statements within cursors act differently than a singleton UPDATE or DELETE statement. The entire trigger will be executed with each UPDATE or DELETE with the WHERE CURRENT OF clause. For example, if five rows are changed with a cursor, the BEFORE, FOR EACH, and AFTER trigger actions will be executed five times, once for each row.

Creating and Using Triggers 10-21

Triggers and Constraint Checking


n Constraint checking is deferred during the execution of the trigger action.
n After the trigger is executed, all constraints are checked for violations.

Using IBM Informix Dynamic Server with logging, the database server will defer checking of all constraints until the trigger action has completed to prevent a violation of constraints when the trigger action is executed. All constraints will be checked after the trigger action. This is equivalent to running SET CONSTRAINTS ALL DEFERRED before the statements and SET CONSTRAINTS <constraint list> IMMEDIATE after the statements. For databases without logging and IBM Informix-SE databases with logging, an error is generated if a constraint violation occurs.
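
Conceptually, the behavior is as if the server wrapped the trigger action like the following sketch (not literal statements you would issue yourself):

SET CONSTRAINTS ALL DEFERRED;
-- trigger action statements execute here
SET CONSTRAINTS ALL IMMEDIATE;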

10-22 Creating and Using Triggers

Dropping a Trigger
The DROP TRIGGER statement deletes the trigger from the database.
DROP TRIGGER trig_name


To delete the trigger from the database, use the DROP TRIGGER statement and specify the trigger name of the trigger you wish to delete. Deleting a table will cause triggers that reference that table in the trigger event clause to be deleted.

Important!
When you ALTER a table and drop a column, the column will be dropped from trigger column lists in the trigger event. Triggers that reference the table in the trigger action will not be deleted; you must find and drop those triggers yourself.

Creating and Using Triggers 10-23

How a Trigger is Executed


Application issues the trigger event --> Database Server --> System Catalog Tables

System catalog entries for the trigger event table are retrieved. The trigger action is then retrieved, optimized, and executed.

The trigger is stored in two system catalog tables, systriggers and systrigbody. The first time any table is referenced, the information from systriggers , systrigbody, and other dictionary tables are put in the database server memory. Before an INSERT, UPDATE, or DELETE statement is executed by the database server, the dictionary in the database server memory is scanned for a trigger that exists for the table and type of SQL statement that is being executed. If a trigger exists, it is retrieved, optimized and executed by the database server at the appropriate time.

10-24 Creating and Using Triggers

Getting Information About Triggers


n system catalog tables
  w systriggers
  w systrigbody
n the dbschema utility

In the system catalog, the systriggers table holds general information about a trigger, and systrigbody stores both the text of the trigger and the code used to execute the trigger. Some example queries against these tables are shown below:
SELECT systriggers.*, systables.tabname
    FROM systriggers, systables
    WHERE systriggers.tabid = systables.tabid
        AND systables.tabname = "employee";

SELECT systriggers.trigname, systriggers.tabid,
       systrigbody.trigid, systrigbody.data
    FROM systriggers, systrigbody
    WHERE systriggers.trigid = systrigbody.trigid
        AND systrigbody.datakey IN ("D", "A");

A complete description of these tables is included in the system catalog appendix to this manual. The dbschema utility can also be used to obtain information about triggers. The output of the following command will contain all the SQL statements to re-create the table, including any indexes, constraints, and triggers on the table:
dbschema -d database_name -t table_name

Creating and Using Triggers 10-25

Using Triggers for Table Replication


Triggers and stored procedures can be used to engineer a customized table replication application. By putting an INSERT, UPDATE, and DELETE trigger on the table, all changes to the table can be replicated to a table in another IBM Informix system. There are some issues in creating a table replication application that also are applicable in other situations using triggers and stored procedures.
n Since COMMIT WORK and ROLLBACK WORK are not valid statements in a stored procedure called by a trigger, all SQL statements inside the triggered stored procedure are part of the same transaction. This means that if errors are logged to a database table, any transactions that must be rolled back will also cause the INSERT into the error log to roll back! There are several possible ways around this situation:
  w Log the errors to a file, instead of a database table.
  w Handle errors where the triggering event is. If the triggering event is in a stored procedure, the error procedure can be called from there. If the triggering event is in the application, the application code will have to handle the error logging.

n If the replicated table is unavailable (the remote system may be down), you still want the primary table to be updated. To do this, you have to catch the error in the stored procedure with the ON EXCEPTION statement.

The example on the following page shows the general logic needed for table replication.

10-26 Creating and Using Triggers

Code Sample: Table Replication Example (INSERT only)


CREATE PROCEDURE "informix".i_users( p_user_id CHAR(8), p_last_name CHAR(30), p_first_name CHAR(15), p_dept_id INT) DEFINE DEFINE DEFINE DEFINE DEFINE p_sql_ret_code p_isam_ret_code p_sql_ret_msg p_table_name p_sql_cmd INTEGER; INTEGER; CHAR(72); CHAR(37); CHAR(1000);

    -- Log unanticipated errors with default exception handler.
    ON EXCEPTION SET p_sql_ret_code, p_isam_ret_code, p_sql_ret_msg
        -- Be careful to not use any DML statements in error_log
        -- if application uses transaction logging. They will get
        -- rolled back along with the rest of the transaction!
        CALL error_log(p_sql_ret_code, p_isam_ret_code,
                       p_sql_ret_msg, "i_users");
        RAISE EXCEPTION p_sql_ret_code, p_isam_ret_code, p_sql_ret_msg;
    END EXCEPTION;

    BEGIN
        ON EXCEPTION IN (-28,-50,-51,-52,-53,-54,-113,-668,-908)
            -- If replicated table is unavailable, log
            -- row into a file for later insertion.
            LET p_sql_cmd = "INSERT INTO " || p_table_name ||
                            " VALUES ( " ||
                            "'" || p_user_id    || "'" || "," ||
                            "'" || p_last_name  || "'" || "," ||
                            "'" || p_first_name || "'" || "," ||
                            p_dept_id || " );";
            -- Log soft replication errors here.
            INSERT INTO repl_err_log
                VALUES (0, p_table_name, CURRENT YEAR TO SECOND, p_sql_cmd);
        END EXCEPTION WITH RESUME;

        -- Seed variable with replicated tabname in case an error occurs.
        LET p_table_name = "db@east:users";

Creating and Using Triggers 10-27

        -- Replicate inserted row to two different tables, so
        -- there will be two replicated tables.
        INSERT INTO db@east:users
            VALUES (p_user_id, p_last_name, p_first_name, p_dept_id);

        -- Seed table name in case an error occurs.
        LET p_table_name = "db@west:users";
        INSERT INTO db@west:users
            VALUES (p_user_id, p_last_name, p_first_name, p_dept_id);
    END
END PROCEDURE;

CREATE TRIGGER "informix".i_users INSERT ON users
    REFERENCING NEW AS NEW
    FOR EACH ROW (
        EXECUTE PROCEDURE i_users(new.user_id, new.last_name,
                                  new.first_name, new.dept_id));

10-28 Creating and Using Triggers

Exercises

Creating and Using Triggers 10-29

Exercise 1
Create a trigger that keeps a history table for deleted customers.
1.1 Create a history table with the same columns as the customer table.
1.2 Create a DELETE trigger that will take a deleted row from the customer table and INSERT it into the history table.
1.3 Execute the following DELETE statement:
delete from customer where customer_num = 102

1.4 Verify that a row was inserted in the history table.
An easy way to create a history table with the same columns as the customer table is to output the customer table schema using dbschema by running the following command:
dbschema -d dbname -t customer > history.sql

Then edit the history.sql file to change the name of the table to history. Change the customer_num column to an integer and remove the primary key constraint. Execute your history.sql script to create the history table.

10-30 Creating and Using Triggers

Exercise 2
The following exercises illustrate the use of BEFORE, AFTER, and FOR EACH ROW, and what happens when your trigger encounters an error.
2.1 Create the following trigger for an AFTER event of the orders table. The trigger will insert a row into the manufact table, which has a unique index on the manufacturer's code.
CREATE TRIGGER trig1 UPDATE ON orders
    AFTER (INSERT INTO manufact VALUES ("TXS", "Texas", "0"));

2.2 Enter the following SQL statements. There is no order number 222 in the orders table.
UPDATE orders SET order_date = "01/01/98" WHERE order_num = 222;
SELECT * FROM manufact;

What is in the manufact table?
2.3 Enter these SQL statements. Order number 1006 does exist in the orders table.
UPDATE orders SET order_date = "01/01/98" WHERE order_num = 1006;
SELECT order_date FROM orders WHERE order_num = 1006;

What is the order date set to?
2.4 Create a new trigger for the EACH ROW event of the orders table to insert a different row in the manufact table.
DROP TRIGGER trig1;
CREATE TRIGGER trig1 UPDATE ON orders
    FOR EACH ROW (INSERT INTO manufact VALUES ("ALA", "Bama", "0"));

2.5 Enter the following SQL statement. There is no order number 222 in orders.
UPDATE orders SET order_date = "01/01/98" WHERE order_num = 222;
SELECT * FROM manufact;

What is in the manufact table?

Creating and Using Triggers 10-31

10-32 Creating and Using Triggers

Solutions

Creating and Using Triggers 10-33

Solution 1
1.1 Create a history table with the same columns as the customer table.
create table history (
    customer_num integer not null,
    fname        char(15),
    lname        char(15),
    company      char(20),
    address1     char(20),
    address2     char(20),
    city         char(15),
    state        char(2),
    zipcode      char(5),
    phone        char(18)
);

1.2 Create a DELETE trigger that will take a deleted row from the customer table and INSERT it into the history table.
CREATE TRIGGER del_cust DELETE ON customer
    REFERENCING old AS pre
    FOR EACH ROW (
        INSERT INTO history VALUES (pre.customer_num, pre.fname, pre.lname,
            pre.company, pre.address1, pre.address2, pre.city, pre.state,
            pre.zipcode, pre.phone));

1.3 Execute the following DELETE statement:


DELETE FROM customer WHERE customer_num = 102;

1.4 Verify that a row was inserted in the history table


SELECT * FROM history;

10-34 Creating and Using Triggers

Solution 2
2.1 Create the trigger as given in the exercise.
2.2 Enter the following SQL statements:
UPDATE orders SET order_date = "01/01/98" WHERE order_num = 222;

0 rows updated.

What is in the manufact table?


SELECT * FROM manufact;

The row for TXS is in the table even though no rows were updated, since the trigger was an AFTER trigger.
2.3 Enter this SQL statement:
UPDATE orders SET order_date = "01/01/98" WHERE order_num = 1006;

Unique constraint u102_7 violated.

The row for TXS cannot be inserted into the manufact table again. What is the order date set to?
SELECT order_date from orders where order_num = 1006;

The order date has not been changed. Since the trigger action failed, the trigger event has been rolled back.
2.4 Drop and re-create the trigger as given in the exercise.
2.5 Enter the following SQL statement:
UPDATE orders SET order_date = "01/01/98" WHERE order_num = 222;

0 rows updated.

What is in the manufact table?


SELECT * FROM manufact;

The row for ALA is not in the table since the trigger was FOR EACH ROW and no rows were updated.

Creating and Using Triggers 10-35

10-36 Creating and Using Triggers

Module 11
Modes and Violation Detection

Modes and Violation Detection 09-2001 2001 International Business Machines Corporation

11-1

Objectives
At the end of this module, you will be able to: n Enable and disable constraints, triggers, and indexes n Use the filtering mode for constraints and indexes n Record violations in a database table

11-2 Modes and Violation Detection

Types of Database Objects


n A constraint
  w Unique constraint
  w Referential constraint
  w Check constraint
  w NOT NULL constraint
n An index
n A trigger

A database object is defined as including the following:


n Any constraint. A constraint can be one of the following types:
  w Unique constraint: every row inserted or updated must have a unique value for the key specified.
  w Referential constraint: enforces parent/child (master/detail) relationships between the primary key and foreign key.
  w Check constraint: every row must pass the condition specified for one or more columns.
  w NOT NULL constraint: a column cannot have a null value.
n Any index
n Any trigger

Modes and Violation Detection 11-3

Database Object Modes


n Enabled - normal state
n Disabled - not enforced (constraints) or used (triggers or indexes)
n Filtering (except triggers) - enabled, but errors are logged. Transaction is not rolled back.

Database objects can have one of the following modes:


n Enabled: this is the normal and default state or mode of an object. An enabled constraint is enforced. An enabled index is active and contains all entries. An enabled trigger is fired when the trigger event occurs.
n Disabled: a disabled constraint is not checked or enforced. With a disabled index, contents are not updated when a row is inserted, deleted, or updated. A disabled trigger is not fired and is ignored by the database server. Even though the constraint, trigger, or index entry is disabled, it remains in the system catalog tables.
n Filtering: a filtering constraint is checked and enforced, just like enabled constraints. However, any errors are placed in an error log table. One important effect of a filtering object is that, if a constraint is violated, then the rest of the transaction is not rolled back. Triggers cannot be in filtering mode. The only type of index that can be in filtering mode is a unique index.

Database object modes are supported in IBM Informix Dynamic Server and IBM Informix-SE.

11-4 Modes and Violation Detection

Why Use Object Modes?


n Database loads are faster without constraints, triggers, or indexes.
n Re-enabling an object is easier and more accurate than re-creating it.
n Finding a constraint violation is easy when the object is in filtering mode.
n You can skip statements in a transaction that violate constraints instead of causing the statement to roll back (in filtering mode).

Object modes are convenient for the following reasons:


n Inserting or updating a large number of rows will be dramatically slower when the database server must check constraints or insert keys into indexes. For fastest performance, disable the objects, load the data, and re-enable the objects (see the sketch after this list). However, if you are unsure about the integrity of the data you are loading, set constraints to filtering mode and disable indexes.
n In prior database releases, you had to delete an object if you did not want it enabled. After a data load, you had to re-create constraints and indexes. By disabling an object instead of removing it, you simply re-enable it when needed. One SQL statement can re-enable all objects in a table.
n Sometimes it may be difficult to find which row violated a constraint when a single UPDATE statement, DELETE statement, or INSERT cursor affects many rows. By changing the object to filtering mode, the rows that contained the error will be placed in a violations table.
n Without object modes, any SQL error in a transaction would cause the statement to automatically roll back (databases with logging only). In filtering mode, you can skip any statements that cause a violation error, while still allowing the transaction to continue.
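A minimal sketch of that load sequence, assuming a hypothetical stock table and unload file (the LOAD statement shown is the DB-Access LOAD statement):

    -- Disable all objects on the table before the load.
    SET CONSTRAINTS, INDEXES, TRIGGERS FOR stock DISABLED;

    -- Load the data; the file name is illustrative.
    LOAD FROM "stock_data.unl" INSERT INTO stock;

    -- Re-enable everything on the table with one statement.
    SET CONSTRAINTS, INDEXES, TRIGGERS FOR stock ENABLED;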

Modes and Violation Detection 11-5

Disabling an Object
n Disabling individual objects:
    SET CONSTRAINTS c117_11, c117_12 DISABLED;
    SET INDEXES idx_x1 DISABLED;
    SET TRIGGERS upd_cust DISABLED;
n Disabling all objects for a table:
    SET CONSTRAINTS, INDEXES, TRIGGERS FOR customer DISABLED;
n Results:
  w Constraints are not checked
  w Triggers do not fire
  w Indexes are not updated or used for queries

There are two methods to disable objects that already exist in a database.
n Individually - To disable a constraint, index, or trigger, specify the object name. The constraint and index names can be found with the dbschema or DB-Access utility.
n By table - All constraints, triggers, and indexes for a table can be disabled with one SQL statement, as shown above. Any trigger which names the table in the trigger event is disabled.

A disabled constraint is not checked and a disabled trigger does not execute. A disabled index is neither updated nor referenced by the optimizer when choosing query paths. If an index is created as a result of adding a referential or unique constraint, the index is always enabled as long as the constraint is enabled.

Note
The SET CONSTRAINTS statement places an exclusive table lock on the target table for the duration of the statement.

11-6 Modes and Violation Detection

Creating a Disabled Object


Index:
    CREATE UNIQUE INDEX idx1 ON employee(emp_no) DISABLED;
Constraint:
    CREATE TABLE customer(
        customer_num SERIAL,
        state CHAR(2) CHECK (state IN ("CA","AZ")) DISABLED);
Trigger:
    CREATE TRIGGER t1 UPDATE ON orders
        BEFORE (EXECUTE PROCEDURE x1()) DISABLED;

You can specify that an object is disabled when you create it. The DISABLED keyword is added to the end of the CREATE UNIQUE INDEX statement, CREATE TRIGGER statement, or the column or table level constraint definition within the CREATE TABLE statement. The ALTER TABLE statement can also be used to create a filtered or disabled constraint. For example:
ALTER TABLE employee ADD CONSTRAINT CHECK (age<100) CONSTRAINT agelimit FILTERING;

Modes and Violation Detection 11-7

Enabling a Constraint
n Enabling individual objects:
    SET CONSTRAINTS c117_11, c117_12 ENABLED;
    SET INDEXES idx_x1 ENABLED;
    SET TRIGGERS upd_cust ENABLED;
n Enabling all objects for a table:
    SET CONSTRAINTS, INDEXES, TRIGGERS FOR customer ENABLED;
If a constraint that is set to disabled is enabled, all existing rows must satisfy the constraint. If some rows violate the constraint, an error will be returned.

When an object is created, its default mode is enabled. There are two methods to enable objects that already exist in a database.
n Individually - To enable a constraint, index, or trigger, you must specify the object name. The constraint and index names can be found by dbschema or DB-Access.
n By table - All constraints, triggers, and indexes for a table can be enabled with one SQL statement, shown above. Any trigger which names the table in the trigger event will be enabled.

When a constraint is enabled from disabled, all existing rows are checked to see if they satisfy the constraint. If any rows do not satisfy the constraint, an error is returned and the constraint remains disabled. When a constraint is enabled from filtering, the existing rows are not re-checked because they already satisfy the constraint. When an index is enabled from disabled mode, the entire index is effectively re-built.

11-8 Modes and Violation Detection

Recording Violations

(Diagram: when constraints are set to enabled or filtering, the row causing a violation is placed in the violations table, and the constraint(s) that were violated are recorded in the diagnostics table. One row may have multiple violations.)

When the constraint mode is changed to "filtering" or "enabled" you can record any subsequent violations in two tables: the violations table and the diagnostics table. All violations for constraints or indexes on a table are placed in its corresponding violations table and diagnostics table. There can only be one pair of these tables for each database table. The violations table holds information about the row where the violation occurred. The diagnostics table contains one row for every violation that occurred. In some cases, one row may have multiple violations. For example, an inserted row may have violated the NOT NULL constraint, a referential constraint, and a primary key constraint. In this case, three rows would be placed in the diagnostics table, and only one row would be placed in the violations table. In addition to the violation being recorded in the violations table, the user may or may not get an error. If the constraint is enabled, the user will receive an error when a violation occurs. Error handling for violations in filtering mode is discussed later in this chapter.

Modes and Violation Detection 11-9

Violations Tables Setup


n To create the violations and diagnostics tables:
    START VIOLATIONS TABLE FOR tab_name;
n To explicitly name the violations tables:
    START VIOLATIONS TABLE FOR tab_name USING vio_tabname, diag_tabname;
n To cap the number of rows in the diagnostics table:
    START VIOLATIONS TABLE FOR tab_name MAX ROWS x;


To start violation logging, run the START VIOLATIONS TABLE statement. This statement does two things:
n It creates the violations and diagnostics tables.
n It causes the database server to log violations if they occur.

You can restrict the number of rows inserted in the diagnostics table as a result of a single data row with the MAX ROWS clause (up to the limit of 2,147,483,647 rows). Remember that MAX ROWS only restricts the number of rows that are created per data row, not the total number of rows inserted into the diagnostics table. Only the owner of the table (with resource privileges for the database) or the DBA can execute the START VIOLATIONS TABLE statement for a table.

Table Names
If you do not explicitly name the tables with the USING clause, they will be named tabname_vio and tabname_dia, where tabname is the name of the table that violations are being recorded for. If the table name is greater than 14 characters, you should specify the table names explicitly (because of the limit of 18 characters for a table name).

11-10 Modes and Violation Detection

Table Permissions
In general, if a user has INSERT, UPDATE, or DELETE permissions on the target table, the user also has permissions to INSERT into the violations tables. For more information about permissions on the violations tables, consult the IBM Informix Guide to SQL: Reference manual.

Table Extents
The extent sizes for the violations and diagnostics table are set at the default value. The violations table and the diagnostics table will be placed in the same dbspace as the target table. If the target table is fragmented, the violations table will be fragmented in the same manner. The diagnostics table will be fragmented in a round robin fashion over the same dbspaces on which the target table is fragmented.

Modes and Violation Detection 11-11

Filtering Modes
n Set individual objects to filtering:
    SET CONSTRAINTS c117_11, c117_12 FILTERING;
    SET INDEXES idx_x1 FILTERING;
n Set all objects for a table to filtering:
    SET CONSTRAINTS, INDEXES FOR customer FILTERING;
n Cause an error to be returned to the application if a violation occurs:
    SET CONSTRAINTS, INDEXES FOR customer FILTERING WITH ERROR;


Constraints and unique indexes can be set to filtering mode. Triggers and indexes other than unique indexes cannot be set to filtering mode. In filtering mode, any constraint or unique index violations will be recorded in the violations table as they occur. Before setting an object to filtering mode, make sure violation logging is enabled for the table with the START VIOLATIONS TABLE statement. In filtering mode, a violation will not cause the statement to roll back. In the default filtering mode (WITHOUT ERROR) the application tool will not be informed that a violation occurred. Be careful with this mode, as a user could incorrectly assume the transaction was completed in full when in fact it may not have been. If the WITH ERROR clause is included in the SET statement, an error is returned to the user.
971: Integrity violations detected.

However, unlike enabled mode, the error does not automatically roll back the statement. To roll back the transaction, the user must explicitly execute ROLLBACK WORK.
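A minimal sketch of this behavior, using the customer table from the example later in this chapter (the key value is illustrative):

    SET CONSTRAINTS FOR customer FILTERING WITH ERROR;

    BEGIN WORK;
    -- Assume a customer row with customer_num 101 already exists, so this
    -- INSERT violates the primary key constraint and returns error -971.
    INSERT INTO customer (customer_num, name) VALUES (101, "DUPLICATE");
    -- The offending row goes to the violations tables; the statement is
    -- not rolled back automatically, so the application chooses whether
    -- to COMMIT WORK or:
    ROLLBACK WORK;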

11-12 Modes and Violation Detection

Note
The INSERT statements that add the errors to the diagnostics and violations table are a part of the current transaction. If you roll back the transaction, the rows in the violations and diagnostics tables get rolled back also.

Modes and Violation Detection 11-13

Turning Off Violation Logging


To turn off logging:
    STOP VIOLATIONS TABLE FOR tab_name;


You can turn off violation logging for a table with the STOP VIOLATIONS TABLE statement. This statement does not remove the violations tables; after violation logging is stopped, the administrator should remove them with the DROP TABLE statement. Violation logging should always be on when the target table has any constraints in filtering mode. If logging is off, any filtering constraint violations will produce an error because they cannot be logged.
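For example, assuming the default table names described earlier:

    STOP VIOLATIONS TABLE FOR customer;
    DROP TABLE customer_vio;
    DROP TABLE customer_dia;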

11-14 Modes and Violation Detection

Example
CREATE TABLE customer (
    customer_num SERIAL NOT NULL CONSTRAINT n_cust,
    name         CHAR(15),
    PRIMARY KEY (customer_num) CONSTRAINT pk_cust);

CREATE TABLE orders (
    order_num     SERIAL NOT NULL CONSTRAINT n_ord,
    customer_num  INTEGER NOT NULL CONSTRAINT n_ordcnum,
    ship_instruct CHAR(40));

ALTER TABLE orders ADD CONSTRAINT
    (FOREIGN KEY (customer_num) REFERENCES customer CONSTRAINT fk_ord);

The next few pages illustrate an example of how object modes might be used. The example shows two tables, customer and orders . The customer table is the parent table with a primary key, and the orders table is the child table, with a constraint referencing the customer table. This means that there must be a corresponding customer row for every orders row.

Modes and Violation Detection 11-15

Example (cont.)
The following statement will produce an error:
    SET CONSTRAINTS, TRIGGERS, INDEXES FOR customer DISABLED;
Instead:
    SET CONSTRAINTS, TRIGGERS, INDEXES FOR orders DISABLED;
    SET CONSTRAINTS, TRIGGERS, INDEXES FOR customer DISABLED;
Now start violation logging:
    START VIOLATIONS TABLE FOR customer;
    START VIOLATIONS TABLE FOR orders;

Suppose we wish to perform a large load, where some data may cause some temporary referential integrity errors. We decide to disable the constraints for the customer and orders table. It is important to execute the SET CONSTRAINTS statement in the proper order. You cannot disable an object when other enabled objects refer to it. Because the referential constraint for orders refers to the customer table, disabling customer constraints first produces an error. Instead, you should disable the orders constraints first. In order to get violations logged, we must inform the database server that violations for a table should be logged with the START VIOLATIONS TABLE statement. This statement will create the two violations tables for the table listed. In our example, four additional tables will be created: customer_vio, customer_dia, orders_vio, and orders_dia.

11-16 Modes and Violation Detection

Example (cont.)
This statement is successful:
    INSERT INTO orders (order_num, customer_num, ship_instruct)
        VALUES (0, 2, "ship tomorrow");
However, an error will occur when constraints are enabled:
    SET CONSTRAINTS, TRIGGERS, INDEXES FOR customer ENABLED;
    SET CONSTRAINTS, TRIGGERS, INDEXES FOR orders ENABLED;

971: Integrity violations detected.


Once constraints are disabled, they are not checked. That is why the INSERT statement above is successful even though there is no customer number 2 in the customer table. If constraints were enabled, the statement would fail because you cannot add an order row without a corresponding customer row. But when you try to enable the constraints, the database server must check all rows to make sure that there are no violations. The SET CONSTRAINTS... ENABLED statement above will fail because of the violation introduced with the INSERT statement.

Modes and Violation Detection 11-17

Example (cont.)
Errors placed in the violations tables:
    SELECT * FROM orders_vio, orders_dia
        WHERE orders_vio.informix_tupleid = orders_dia.informix_tupleid;
order_num          2
customer_num       2
ship_instruct      ship tomorrow
informix_tupleid   1
informix_optype    S
informix_recowner  informix

informix_tupleid   1
objtype            C
objowner           informix
objname            fk_ord


When the SET CONSTRAINTS...ENABLED statement is executed, any violations are placed in the violations tables (if the START VIOLATIONS TABLE statement has been executed for the table). The administrator can browse through the violations table and determine (by the objname column) which constraints were violated. In the above example, the fk_ord constraint has been violated. The administrator can run dbschema and look for the fk_ord constraint:
alter table "informix".orders add constraint( foreign key (customer_num) references "informix".customer constraint "informix".fk_ord disabled);

The constraint violated is a referential integrity constraint. By adding a customer row with customer_num = 2, this constraint will no longer be violated.

11-18 Modes and Violation Detection

Example 2
Now suppose the administrator wants to know what violations are occurring on a table during normal SQL activity:
    SET CONSTRAINTS, INDEXES FOR customer FILTERING;
    SET CONSTRAINTS, INDEXES FOR orders FILTERING;
This row will not be inserted, but no error is returned!
    INSERT INTO orders (order_num, customer_num, ship_instruct)
        VALUES (0, 4, "ship tomorrow");


Suppose, for some reason, the administrator wants to know what violations are occurring during a load process, without causing the load process to fail. To do this, the administrator sets constraints and indexes (unique) to filtering. Since the WITH ERROR clause was not included in the SET statement, the application is not notified when a violation occurs. The INSERT statement above fails and an entry is put in the violations table. However, the application receives no error.

Serial Values
Even though the row in the example above is not inserted, the serial counter for the table is incremented. The violations table shows the serial value of order_num as it would have been had the row been inserted.

Modes and Violation Detection 11-19

Example 2 (cont.)
Enable constraints:
    SET CONSTRAINTS, INDEXES FOR customer ENABLED;
    SET CONSTRAINTS, INDEXES FOR orders ENABLED;
Turn off violations logging:
    STOP VIOLATIONS TABLE FOR customer;
    STOP VIOLATIONS TABLE FOR orders;
Fix errors that caused violations to occur:
    INSERT INTO customer(customer_num, name) VALUES (4, "SCHMIDT");
Insert rows that caused violations into target table:
    INSERT INTO orders
        SELECT order_num, customer_num, ship_instruct FROM orders_vio;

Now the administrator wants to reconcile any violations. First, he enables the constraints and indexes. Then he turns off violations logging. These two steps are required in order to insert the violations back into the target table, thus avoiding any endless cycles (violations being added to the violations table and later being inserted back into the target table). Next, the administrator fixes any errors that caused the violations to occur. In the above example, the parent row was missing for a referential constraint. Finally, the administrator can copy any rows in the violations table into the target table with the INSERT INTO ... SELECT FROM statement.

11-20 Modes and Violation Detection

Violations Table Schema


Name                    Data Type  Description
target_table columns..  ...        Columns from the target table
informix_tupleid        serial     Unique serial identifier
informix_optype         char(1)    I=Insert, D=Delete, O=Update (original values),
                                   N=Update (new values), S=Created by the SET command
informix_recowner       char(8)    The user submitting the SQL statement causing the violation

The violations table contains the same columns as the target table (the table the violations are being recorded for). In addition, there are three more columns that store a unique serial id, the operation type that caused the error, and the login of the user that submitted the SQL statement that caused the violation to occur. If the target table (the table the violations are being recorded for) contains a serial value, it is stored as an integer in the violations table. The violations table receives one row every time an SQL statement causes one or more violations to occur.

Modes and Violation Detection 11-21

Diagnostic Table Schema

Name              Type      Description
informix_tupleid  integer   References the violations table
objtype           char(1)   C=Constraint violation, I=Unique index violation
objowner          char(8)   Owner of the constraint or index
objname           char(18)  Name of the constraint or index as stored in sysconstraints or sysindexes

The diagnostics table contains one row for every constraint or unique index violation occurrence. The table stores information about the constraint that was violated. The informix_tupleid column can be joined with the column of the same name in the violations table to associate the violation entries with their corresponding diagnostics. To determine what constraint was violated, run the dbschema utility and look for the constraint name that matches the objname in the diagnostics table.

11-22 Modes and Violation Detection

System Catalog Tables


sysobjstate holds the state of constraints, triggers, indexes.
Name     Type      Description
objtype  char(1)   C = Constraint, I = Index, T = Trigger
owner    char(8)   Owner of the object
name     char(18)  Name of the object
tabid    int       Table id; join with systables to find the table name
state    char(1)   D = Disabled, E = Enabled, F = Filtering with no error, G = Filtering with error

The sysobjstate system catalog table contains one row for every trigger, constraint, and index.
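For example, the following query (a sketch; the table name is illustrative) lists the mode of every object on the customer table:

    SELECT name, objtype, state
        FROM sysobjstate
        WHERE tabid = (SELECT tabid FROM systables WHERE tabname = "customer");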

Modes and Violation Detection 11-23

System Catalog Tables (cont.)


sysviolations holds information about the violations and diagnostics tables.

Name       Type  Description
targettid  int   Table id; join with systables to find the target table name
viotid     int   Violations table id
diatid     int   Diagnostics table id
maxrows    int   Maximum number of rows allowed in the diagnostics table (null if no maximum)

The sysviolations table stores information about the violations table and the diagnostics table for the target database table. There is one row in this table for every table that has associated violations and diagnostics tables.
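For example, to see which database tables currently have violation logging started (a sketch):

    SELECT t.tabname, v.viotid, v.diatid, v.maxrows
        FROM sysviolations v, systables t
        WHERE v.targettid = t.tabid;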

11-24 Modes and Violation Detection

Exercises

Modes and Violation Detection 11-25

Exercise 1
1.1 Disable all constraints for the items table.
1.2 Insert the following row:
insert into items(item_num,order_num,stock_num, manu_code,quantity,total_price) values(3,1001,1,"JKL",1,250);

1.3 Start violation logging for the items table.
1.4 Attempt to enable the constraints. What errors occurred in the violations tables? How do you know which constraint has been violated?
1.5 Fix the errors and try enabling the constraints again.

11-26 Modes and Violation Detection

Solutions

Modes and Violation Detection 11-27

Solution 1
1.1 Disable all constraints for the items table.
set constraints, triggers, indexes for items disabled;

1.2 Insert the following row:


insert into items(item_num,order_num,stock_num, manu_code,quantity,total_price) values(3,1001,1,"JKL",1,250);

1.3 Start violation logging for the items table.


start violations table for items;

1.4 Attempt to enable the constraints. What errors occurred in the violations tables? How do you know which constraint has been violated?
set constraints, triggers, indexes for items enabled;
971: Integrity violations detected.

select * from items_vio, items_dia where items_vio.informix_tupleid = items_dia.informix_tupleid;


item_num           3
order_num          1001
stock_num          1
manu_code          JKL
quantity           1
total_price        $250.00
informix_tupleid   1
informix_optype    S
informix_recowner  informix

informix_tupleid   1
objtype            C
objowner           informix
objname            r104_12

Determine that r104_12 refers to the foreign key constraint for stock_num, manu_code by looking at the constraints table or running dbschema.

1.5 Fix the errors and try enabling the constraints again.
update items set manu_code = "HSK"
    where item_num = 3 and order_num = 1001;

set constraints, triggers, indexes for items enabled;

11-28 Modes and Violation Detection

Module 12
The IBM Informix Cost-Based Optimizer

The IBM Informix Cost-Based Optimizer 09-2001 2001 International Business Machines Corporation

12-1

Objectives
At the end of this module, you will be able to: n Discuss the features of the cost-based optimizer n Describe how the optimizer finds the best method to process the query n Use the SET EXPLAIN output to analyze a query plan n Use optimizer directives to influence the optimizer

12-2 The IBM Informix Cost-Based Optimizer

Definitions
n Join
  w Combines information from two tables based on the relationship between one or more columns in each table
n Tuple
  w Contains the results of a join between two tables

The terms join and tuple (rhymes with couple) will be used in this module to explain query optimization. Every join has two tables. One table is chosen as the first table, the table that will be scanned initially. The other table in the join will be referred to as the second table, which will be accessed to find the corresponding data to complete the join. When data is joined together between two tables, a tuple is created. It is important to understand how tuples are generated in order to fully appreciate the importance of an optimal query path.

The IBM Informix Cost-Based Optimizer 12-3

Primary Join Strategies


n Nested loop join
  w Scan the first table in any order and match the corresponding columns in the second table to form tuples.
n Hash join
  w Scan the first table sequentially to build a hash table, and look up the corresponding columns from the second table to form tuples.

The primary join strategies that are used to join tables together are:
n Nested loop join
  w This is the traditional join technique. The table chosen by the optimizer to be accessed first in the join is scanned in any order and matched with the corresponding column found in the second table in the join. A variation of this, the semi-join, may be used by version 7.3 if a subquery is converted to a nested loop join for better performance. In this variation the optimizer halts the inner table scan when the first match is found, if appropriate.
n Hash join
  w A sequential scan of the first table is performed to build a hash table; then rows from the second table are looked up in the hash table to perform the join.

In previous versions of IBM Informix it was more common to see a third join method, a sort-merge join, used when no indexes were available. This method sorts each table by the join column and then merges the rows from each table.

12-4 The IBM Informix Cost-Based Optimizer

Nested Loop Join


select * from B, C where C.x = B.w and C.z = 17

Table C (first table)    Table B (second table)    Resulting tuples
  z   x                    w   y                     z   x   w   y
  17  cc                   aa  2                     17  cc  cc  1
  17  aa                   cc  1                     17  aa  aa  2
  18  ee                   ff  1                     17  ff  ff  1
  17  ff

The nested loop join scans the outer table and then joins the rows found with corresponding rows in the inner table. First, the first row from the first table is fetched. The column(s) in this row that will be used to join the first table with the second table are formed into a key. Next, using the key generated from the columns in the first table, the second table will be searched for all rows with that given key. If the key in the second table is declared as unique, we need only perform one read on the second table. If the key is non-unique, we may have to read several rows. If no index is present or usable, either an index will be created on the second table or the table will be searched sequentially. If a third table was to be accessed in the query, then for each row fetched from the second table, the third table would be searched (again, using an index whenever possible) for matching rows. This process would continue for each table in the query. For every row fetched from the first table, at least one row is fetched from the second table unless there is no matching row, and so on for all the tables in the query. Each row fetched or examined may not wind up in a returned tuple, as additional filter conditions may be applied that will eliminate that row from further consideration. Typically, more rows will be examined than will actually be returned.

The IBM Informix Cost-Based Optimizer 12-5

Hash Join

(Diagram:
1. table2 is scanned and placed in a hash table. What can't fit in memory is partitioned out to disk, in the temporary dbspaces named by DBSPACETEMP.
2. Values in table1 are looked up in the hash table.)

Hash joins can provide significant performance advantages over the other join methods, especially if the join tables are very large. Hash joins are faster than sort merge joins, in which both tables must be sorted. Typically in a hash join, the hash table is created on the smaller table, and the larger of the two tables does not have to be sorted. In the example above, table2 is chosen to create a hash table. Using a hash function, IBM Informix reads each row in the table, executes the hash function on the row, and determines the hash bucket that will hold the row. The rows in each of the buckets are not sorted. Once table2 has been read and placed in a hash table, rows in table1 are read, and the hash key value is calculated and looked up in the hash table. The hash table is created in the virtual portions of your shared memory segments. If there is not adequate memory available, the hash table is partitioned out to disk. The temporary file space is allocated in temporary dbspace(s), as indicated by DBSPACETEMP. Optimally, you want the hash table to be created entirely in shared memory. You can estimate memory requirements for hash joins using the following calculation:
(32 bytes + row_size) * # of rows in the smallest table = # of bytes in hash table
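For example, if the smaller table held roughly 100,000 rows of about 120 bytes each (hypothetical figures), the hash table would need approximately (32 + 120) * 100,000 = 15,200,000 bytes, or about 15 MB of virtual shared memory.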

12-6 The IBM Informix Cost-Based Optimizer

Query Paths

Path 1: table 1, then table 3, then table 2
Path 2: table 2, then table 3, then table 1
Path 3: table 2, then table 1, then table 3

The optimizer exhaustively examines all possible ways of scanning each table in a query.

A query path is the method the optimizer will use to form the tuples. If the optimizer chooses to scan table A first and join it to table B, the query path is AB. Query paths are important for several reasons:
n A good query path can minimize the amount of data that will be examined. The more you can narrow down the number of possible rows that satisfy the query in the earlier stages, the less time is spent reading rows in other tables that may not match the query criteria. This is why very small tables, or tables with very restrictive filters (e.g. order_num = 1001), are usually put early in the query path.
n A good query path can prevent extra sorting for ORDER BY or GROUP BY statements. For example, to prevent an ORDER BY in a SELECT statement from requiring a sort, the optimizer may choose to put the table containing the sort columns first (assuming there is an index on the sort columns and the nested loop join is used for subsequent joins). Because the table will be scanned in the order of the column specified in the ORDER BY, no extra sort process is needed.

The IBM Informix Cost-Based Optimizer 12-7

Calculating Costs
Calculate the cost for each path:
    Cost = # disk accesses + W * (# tuples processed)
    Cost = (I/O cost) + W * (CPU cost)
Choose the best path:
    Optimal path = cheapest path

The optimizer uses cost estimates to determine the optimal query path to use. Each path is assigned a cost estimate based on a formula; the cheapest path is considered the optimal path. The formula used to calculate the cost is based on the cost of the I/O that will be performed and the amount of CPU that will be needed to process the data. I/O cost is calculated from the estimated number of disk accesses that will be needed to process all the necessary data. CPU cost is based upon the number of tuples processed. This cost is converted to an equivalent I/O cost using a weighting factor. The weighting factor is an adjustment to the CPU cost because processing a row (CPU cost) is relatively less expensive than a disk access (I/O cost). The weighting factor is a hard-coded value that cannot be changed. The full formula used is as follows:
cost = (I/O cost) + W * ( CPU cost)

where
w I/O cost = number of disk accesses
w CPU cost = number of tuples touched
w W = weighting factor converting CPU cost to relative I/O cost
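As a purely illustrative example (the actual hard-coded weighting factor is not published), a path estimated at 1,000 disk accesses and 50,000 tuples processed with a weight of 0.01 would cost 1,000 + 0.01 * 50,000 = 1,500 units; that figure is meaningful only in comparison with the costs of the alternative paths.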

12-8 The IBM Informix Cost-Based Optimizer

Optimization Process
Summary of optimization process:
n Examine all tables, filters, and indexes
n Estimate costs for every join pair
n Repeat estimation for each additional table in the join

The above steps summarize how the optimization process finds the best way to execute a query. Each step will be examined in detail on the following pages.

The IBM Informix Cost-Based Optimizer 12-9

Step 1: Examine All Tables


n Examine selectivity of every filter
  w Data distributions
  w Indexes
  w Equality operators
n Determine if indexes can be used for
  w Filters
  w ORDER BY or GROUP BY
n Find the best way to scan a table
  w Sequentially
  w By an index

The optimizer examines all filters in the query to see how they might cut down on the number of rows read in a table. All filters are assigned a selectivity, a number between 0 and 1 that indicates the fraction of rows the optimizer thinks the filter will pass. A very selective filter will have a selectivity near 0.
n Data distributions, if available, will be used by the optimizer in determining the selectivity. Otherwise, indexes are used.
n Filters on columns containing indexes are considered more selective than filters on columns without indexes. For example:

Filter Expression                              Selectivity Calculation
indexed-column = literal value                 1/(number of distinct keys in index)
indexed-column = host-variable                 1/(number of distinct keys in index)
indexed-column IS NULL                         1/(number of distinct keys in index)
table1.indexed-column = table2.indexed-column  1/(number of distinct keys in larger index)
any-column = any-expression                    1/10
any-column IS NULL                             1/10

12-10 The IBM Informix Cost-Based Optimizer

The operators used with the filter are also examined. The filter equality operator (=) is more selective than any of the other operators. Compare the selectivity of the filters below with the equality expressions on the previous page:

Filter Expression                         Selectivity Calculation
any-column > any-expression               1/3
any-column < any-expression               1/3
any-column MATCHES (LIKE) any-expression  1/5
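As an illustrative example (the column names and key counts are hypothetical), a filter such as state = "CA" on an indexed column with 50 distinct key values would be assigned a selectivity of 1/50 = 0.02, while a filter such as quantity > 1 would be assigned 1/3; the optimizer therefore considers the first filter far more likely to rule out rows.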

Scanning the Table


Each table will also be examined to find the best way to scan it (sequentially or with an index). Remember that only the first table is scanned for a nested loop join. In version 7.3, if the optimizer chooses to convert a subquery to a join, the following variations of a table scan may be used:
n First Row scan - as soon as the optimizer finds one match the scan halts.
n Skip-Duplicate Index scan - the optimizer does not scan duplicates.

The IBM Informix Cost-Based Optimizer 12-11

Step 2: Estimate Cost for Joined Pair


select * from a,b,c,d where a.a = b.a and b.b = c.b and c.c = d.c

Possible join pairs: ab ac ad ba bc bd ca cd cb da db dc

For every join pair, the optimizer will:
n Find the best way to join two tables
n Decide which indexes would be best for the join
n Calculate the cost of the join
n Eliminate redundant join pairs

The second step involves generating all possible table pairs for every table in the query. For each join pair, the optimizer will:
n Find the best way to join two tables (generally nested loop or hash join).
n Decide whether indexes should be used in the join between the two tables, and which indexes would be best.
n Calculate the cost of the join of every pair.
n Eliminate redundant pairs. For example, there are two ways to join table a and table b: ab and ba (remember that the order is important). Once the costs are calculated, the optimizer chooses the least costly way to join the two tables and drops the higher cost pair. If, however, the redundant pair may potentially be used to avoid a sort for an ORDER BY, we may keep the redundant path, even though the cost at this point is higher.

The optimizer calculates the cost for every pair, even pairs that do not possess join filters (such as ac and bd in this query). There are a few cases where the optimizer will choose a Cartesian product between tables if the cost of the path is less than if another path was used.

12-12 The IBM Informix Cost-Based Optimizer

Step 3: Repeat for Each Extra Table

select * from a,b,c,d where a.a = b.a and b.b = c.b and c.c = d.c
(Diagram: each 2-way join such as ab, ac, ad, bd, ... is extended into 3-way joins such as abc, abd, acb, acd, adc, ... and then into 4-way joins such as abcd, abdc, acbd, acdb, ... with a cost calculated at each level.)

Once all possible join pairs are generated and assigned costs, the optimizer generates three-way joins. A three-way join is examined in the same way as a two-way join. Think of a three-way join as a result table of a join between the first two tables, joined with a third table. All possible ways to join two tables to a third (3-way join) are examined and a cost is generated for each join. This process repeats for every four-way join, five-way join, and so on, until all tables are joined. Finally, if there are any ORDER BY or GROUP BY clauses that will require a sort, the cost is added on to the total cost of the path. After all the tables are joined in all different ways, the path with the least cost is chosen by the optimizer.

How the Optimizer Usually Works


The optimizer usually uses the following guidelines when determining the query plan:
n Do not use an index if the database server must read a large portion of the table. Reading the table sequentially is more efficient than traversing an index when the database server must read most of the rows.

The IBM Informix Cost-Based Optimizer 12-13

n When choosing between indexes, choose the index that can rule out the most rows.
n Place small tables, or tables with more restrictive filters, early in the query plan. By ruling out rows in a table, the database server will not have to read as many rows from the next tables that join to it.
n Choose a hash join when neither column in a join filter has an index.
n Choose a nested loop join if:
  w the number of rows in the outer table after applying filters is small, and an index on the inner table can be used to retrieve rows, or
  w an index on the outer table can be used to return rows in the order of an ORDER BY clause.

In some cases the optimizer does not choose the best path because it does not have enough information about the nature of the data. Running the UPDATE STATISTICS command and creating data distributions can provide the optimizer with more information. This is covered in detail in the chapter Update Statistics and Data Distributions.
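A minimal example of that command (the table name is illustrative; the available resolution options are covered in that chapter):

    UPDATE STATISTICS MEDIUM FOR TABLE customer;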

12-14 The IBM Informix Cost-Based Optimizer

OPTCOMPIND
n OPTCOMPIND = 0: Only consider the index paths in a join pair.
n OPTCOMPIND = 1: For repeatable read isolation level, only consider the index paths in a join pair. Otherwise, choose the lowest cost access method (index path, hash join).
n OPTCOMPIND = 2: Always choose the lowest cost access method (index path, hash join). This is the default.

The OPTCOMPIND configuration parameter is one way to influence the optimizer choice in a query path. It can be set as a configuration parameter or environment variable. OPTCOMPIND simply determines when the optimizer is free to compare a hash join with a nested loop or sort merge join for a specific join pair only. Setting OPTCOMPIND to 0 forces the optimizer to behave as in earlier releases; namely, when examining a specific join pair, only consider a hash join if there is no existing index which can be used to accomplish the join. Suppose you have two tables, A and B. You also have an index on B.x. You execute the following query:
select * from A, B where A.x = B.x;

The optimizer will consider TWO join orders: (A, B) and (B, A). When it considers (A, B), it will examine OPTCOMPIND and do one of the following:
n OPTCOMPIND is 0: only consider the index join using the index on B.x. Dynamic hash join may be used if no index exists.
n OPTCOMPIND is 1:
  w If the transaction is in Repeatable Read mode, then only consider the index join using the index on B.x.
  w Otherwise choose the lowest cost join method.
The IBM Informix Cost-Based Optimizer 12-15

n OPTCOMPIND is 2: choose the lowest cost between the index, hash, and sort merge joins. This is the default value.

When OPTCOMPIND is 2 and the optimizer considers (B, A), because there are no useful indexes on A, it will choose between a dynamically constructed index, hash, and sort merge join. Finally, the optimizer compares the cost of the path (A, B) with (B, A) and chooses the lower cost path. If the cost of path (B, A), which will be a dynamic join method, is less than the cost of (A, B), an index path, then the dynamic join path will be chosen.

12-16 The IBM Informix Cost-Based Optimizer

Optimizer Enhancement
SET OPTIMIZATION [HIGH | LOW][FIRST_ROWS];


The optimizer enhancements HIGH and LOW allow the application to choose whether the optimizer will examine all paths or only the most likely ones. The default strategy is HIGH. The optimizer enhancement FIRST_ROWS instructs the optimizer to choose a plan that minimizes the time to retrieve the first screen of data, as opposed to minimizing total query time.

The IBM Informix Cost-Based Optimizer 12-17

Optimization LOW

select * from a,b,c,d where a.a = b.a and b.b = c.b and c.c = d.c
(Diagram: only the lowest cost path at each level is examined further. Among the 2-way joins ab, ac, ad, bc, ... only ac is kept; among its 3-way extensions acb, acd, adc, ... only acd is kept; and so on, e.g. acdb.)

Setting the optimization strategy to LOW will eliminate some of the work the optimizer must do before the statement is executed. You risk, however, not having the optimal path chosen because it may have been eliminated early in the optimization process. The example above shows how the optimizer might optimize a query when optimization is set to LOW. At each level (2-way join, 3-way join, 4-way join), the lowest cost join is chosen and the other paths are not examined further. In the optimization LOW example above, ac is chosen as the least cost 2-way join. The other 2-way joins are not examined any further. Next, the three-way joins possible from the ac join are examined. Again, only the least cost join is followed down to the next level. As you can see, the number of joins that must be examined is drastically reduced.

12-18 The IBM Informix Cost-Based Optimizer

Optimization of Stored Procedures


The SQL statements in a stored procedure will be optimized automatically when:
n The procedure is created or at the first execution.
n The structure of involved tables, columns, or indexes changes.
If optimization is set to LOW, the procedure may not be aware of changes in structure.


When a stored procedure is created, all optimization will be attempted at that time. If the tables cannot be examined at compile time (they may not exist or may not be available), the SQL will be optimized the first time the stored procedure is executed. An SQL statement in a stored procedure will also be optimized at execution time if any DDL statement (ALTER TABLE, CREATE INDEX, DROP INDEX) has been run that might alter the query plan. Altering a table which is linked to another table via a referential constraint (in either direction) will cause re-optimization of procedures which operate on the referenced table. The stored procedure determines whether re-optimization is necessary by checking its dependency list of involved tables and indexes against the version numbers stored in the data dictionary. The risk of running a procedure with low optimization is that the procedure may be unaware of any changes made by other users to the tables.

The IBM Informix Cost-Based Optimizer 12-19

When to Try OPTIMIZATION LOW


Try SET OPTIMIZATION LOW:
n If the query time is unacceptable and five or more tables are involved in the query
This is more likely to be successful if:
n One of the tables is joined to all the other tables
n The tables are of varying sizes
n Indexes are on the join columns


You should only try SET OPTIMIZATION LOW when your query time is unacceptable. There is no direct way to measure how long query optimization itself takes. If a query is unacceptably slow, you may try setting optimization to LOW and compare the query time to the HIGH query time. The goal is to have the optimizer choose the same query plan as it would if optimization was set to HIGH, even though all the possible paths were not examined. Because SET OPTIMIZATION LOW eliminates some paths early, you may actually make the query time worse by setting optimization to LOW. The optimization time may be faster, but if a less optimal path is chosen, the actual query may take longer. The optimizer is more likely to choose the optimal path, even if optimization is set to LOW, when your query involves a large number of tables (usually more than 5). A query using 4 or fewer tables will almost never show a decrease in the optimization time. The number of possible join combinations is very high when there are many tables in the SELECT statement; the optimizer set to HIGH will have to examine all these combinations. If one table is joined to many tables, the number of possible join combinations is reduced and the correct query path is more likely to be chosen. If the tables are of varying sizes the optimizer is more likely to choose the optimal path by placing the smaller table early on in the query plan.

12-20 The IBM Informix Cost-Based Optimizer

If the tables have indexes on the join columns, the optimizer set to LOW may have a better chance of choosing the appropriate path. There is no sure formula for knowing when SET OPTIMIZATION LOW will give you better results, other than testing the query with both settings.

The IBM Informix Cost-Based Optimizer 12-21

FIRST_ROWS Optimization
Return first screen of data quickly:
n SQL statement: SET OPTIMIZATION FIRST_ROWS
n Environment variable/configuration parameter: OPT_GOAL


The enhancement FIRST_ROWS instructs the optimizer to develop a query plan to return the first screen of data to the user as quickly as possible, even if the overall query time will be increased. All rows will still be returned. The optimization will be in effect for the entire session, or until the default is re-set:
SET OPTIMIZATION ALL_ROWS

This can also be accomplished by setting the configuration parameter OPT_GOAL, or the environment variable OPT_GOAL, to -1 for ALL_ROWS or 0 for FIRST_ROWS. The default is ALL_ROWS. This command is sometimes confused with the SQL statement SELECT FIRST n. The FIRST n statement will return only the specified number of rows. For example,
SELECT FIRST 6 fname, lname FROM customer ORDER BY lname

will return only the first six rows. The ORDER BY clause is necessary if you want to return a specific set of rows. Otherwise, the optimizer will simply return the first six rows encountered.

12-22 The IBM Informix Cost-Based Optimizer

Using SET EXPLAIN

SET EXPLAIN ON;
SELECT statement 1;
SELECT statement 2;
SELECT statement 3;
SET EXPLAIN OFF;

(Diagram: the query plans for statements 1, 2, and 3 are written to an output file.)

You have the ability to see what path the optimizer chose for a query by executing the SET EXPLAIN ON statement before the query is run. When you issue a SET EXPLAIN ON command, text that describes the query plan will be written to a file for every SELECT statement. This will continue until the program ends or until you issue a SET EXPLAIN OFF command. If the file already exists, subsequent output is appended, which allows you to turn it on and off as required, to collect a log of selected queries. On UNIX systems the file is named sqexplain.out, and is stored in the current directory. On NT, the file is written to %INFORMIXDIR%\sqexpln\username.out. Each query plan is documented in a summary that contains the following information:
n An estimate of cost in combined units
n The tables that will be used, in the order they will be used
n If temporary tables are needed to process the query

The type of access to a table will be one of the following:


n SEQUENTIAL SCAN - all rows are read sequentially
n INDEX PATH - one or more indexes will be scanned

The IBM Informix Cost-Based Optimizer 12-23

n AUTOINDEX PATH - a temporary index will be created
n REMOTE PATH - a remote server will decide the access

The type of join between two tables, usually nested loop join or hash join, will be listed. You can also retrieve the SET EXPLAIN output of the last SQL statement for every connected session from the sysmaster database. Simply connect to the sysmaster database and execute the following SQL statement:
SELECT * FROM syssqexplain;

An optimizer directive may be used to turn on SET EXPLAIN for a specific query. This is explained later in this chapter.

12-24 The IBM Informix Cost-Based Optimizer

SET EXPLAIN Example 1


QUERY: select manu_code, stock_num, description from stock order by description;
Estimated Cost: 20
Estimated # of Rows Returned: 74
Temporary Files Required For: Order By

1) client.stock: SEQUENTIAL SCAN

Even the simplest query is optimized in order to find the best access strategy. When a query path is chosen before a query is run, the statistics kept by the optimizer are put in the output file.

Estimated Cost
The estimated cost of each query is printed out. In the example above, the estimated cost is 20. The units are not important as this value is only used in comparison to other possible paths. It is important to understand this: the estimated cost is in no way useful for determining either how long the query will take or what the cost in resources will be. Its sole value is for comparison with alternative paths.

Estimated Rows Returned


The estimated number of rows to be returned is also printed out. Again, this is only an estimate, but does often come reasonably close to the actual number of rows returned. This estimate is most accurate when all filter and join conditions are associated with indexes and when the statistics for the tables involved in the query are up to date.

The IBM Informix Cost-Based Optimizer 12-25

Temporary File
It will also be reported when a temporary table or file is created for the query; the reason for the temporary file or table is given in this report. In the example on the previous page, we see that a sort was required to process the ORDER BY clause. The sort requires space to hold intermediate files. No temporary file is created if an index can be used to order the tuples. Only the selected path is reported via the SET EXPLAIN command; you cannot find out what alternate paths were considered.

Table Access Strategy


The access strategy for each table in the query is shown. The table in the example will be accessed via a SEQUENTIAL SCAN, wherein the entire table is read from beginning to end.

12-26 The IBM Informix Cost-Based Optimizer

SET EXPLAIN Example 2


QUERY:
------
select max(order_num) from orders;

Estimated Cost: 1
Estimated # of Rows Returned: 1

1) client.orders: INDEX PATH

    (1) Index Keys: order_num   (Key-Only) (Aggregate)


When a query can take advantage of an index on one or more of the tables, the optimizer will choose to use one or more of these indexes to retrieve rows from the table.

Index Path
This type of access is known as an INDEX PATH. Generally, an INDEX PATH is the fastest access method, as it means you only have to look at rows that satisfy one or more of your filter conditions. The fewer rows that have to be read, the faster the query will run. When the optimizer chooses to use an index to access a table, it will print out the keys used for each index. This information is printed out as Index Keys: for the table.

Key-Only Select
In some cases, all the data you want to retrieve from a table is already contained in the index itself. It then makes no sense to read the data rows, because the index holds everything you need, so the optimizer performs a Key-Only select. The fact that a Key-Only select is performed is indicated by the (Key-Only) note in the explain output.

The IBM Informix Cost-Based Optimizer 12-27

In the example on the previous page, the column selected is part of an index. Because a maximum value has been selected, it makes sense to use that index to find this value. The index will contain all the information we need, so there is no need to read the data from the rows in the table. A variation of this, the Key-First scan, may be used by the optimizer in version 7.3. The optimizer can apply all key filters prior to retrieving the data pages, to rule out rows that do not qualify.

12-28 The IBM Informix Cost-Based Optimizer

SET EXPLAIN Example 3


QUERY:
------
select stock.stock_num, stock.description, items.quantity
from stock, items
where stock.stock_num = items.stock_num
and items.quantity > 1;

Estimated Cost: 14
Estimated # of Rows Returned: 51

1) client.items: SEQUENTIAL SCAN

        Filters: client.items.quantity > 1

2) client.stock: INDEX PATH

    (1) Index Keys: stock_num, manu_code
        Lower Index Filter: (client.stock.stock_num = client.items.stock_num)


When a query accesses several tables, the explain output will list the tables in the order in which they will be accessed.

Index Read Start and Stop Points


When an indexed search of a table is performed, there are generally one or two conditions that define the search: a start point and a stop point.

Lower Index Filter


When performing an indexed read, it is first necessary to position within the index to find the first key value. Once this position is found, the index can be read sequentially until the key value no longer meets the condition set. The condition that defines where to initially position in the index is called a Lower Index Filter. The explain output will include the Lower Index Filter for each index used when appropriate.

The IBM Informix Cost-Based Optimizer 12-29

Nested Loop Join


In the example above, the condition stock.stock_num = items.stock_num is used to start the index search on the stock table. For each items.stock_num value retrieved, an index search of the stock table will be performed using that as a key value (nested loop join).

12-30 The IBM Informix Cost-Based Optimizer

SET EXPLAIN Example 4


QUERY:
------
select items.quantity, stock.stock_num, stock.unit_price
from items, stock
where items.total_price = stock.unit_price;

Estimated Cost: 11
Estimated # of Rows Returned: 75

1) informix.items: SEQUENTIAL SCAN (Parallel, fragments: ALL)

2) informix.stock: SEQUENTIAL SCAN

DYNAMIC HASH JOIN (Build Outer)
    Dynamic Hash Filters: informix.items.total_price = informix.stock.unit_price


The example above shows a partial SET EXPLAIN output. Typically in DSS environments, large amounts of data are read and full table scans are required. Hash joins can provide significant performance advantages over the other join methods, especially when the join tables are very large. The DYNAMIC HASH JOIN keywords indicate that a hash join will be used: a hash table will be built on one table and a hash join will be performed. The output includes the filter that will be used for the join. By default, the hash table is built on the second table listed in the SET EXPLAIN output. If the term Build Outer is listed, the hash table is built on the first table listed.

Fragmented Tables
The SET EXPLAIN output will indicate if a sequential scan of a fragmented table will be performed in parallel, and the number of fragments that will be read. The ability to read tables in parallel greatly increases query performance. For more information on Parallel Data Queries, refer to the IBM Informix Dynamic Server System Administration training courses.

The IBM Informix Cost-Based Optimizer 12-31

Current SQL Information


onstat -g sql
Sess   SQL Stmt   Current    Iso   Lock      SQL   ISAM   F.E.
ID     Type       Database   Lvl   Mode      ERR   ERR    Vers
830    SELECT     nc         CR    Not Wait  0     0      7.30
825    UPDATE     nc         CR    Not Wait  0     0      7.30
821    SELECT     nc         CR    Not Wait  0     0      7.30


The onstat -g sql command includes summary information about the last SQL statement executed by each session. The fields included in onstat -g sql are:

Session Id - The session id of the user executing the SQL statement. To find the user name, execute the onstat -g ses command and find the corresponding session id.
Statement type - The type of SQL statement, such as SELECT, UPDATE, DELETE, or INSERT.
Current Database - The name of the current database for the session.
Isolation level - The current isolation level (CR = committed read, RR = repeatable read, CS = cursor stability, DR = dirty read, NL = no logging). Version 7.31 will include U if update locks are being retained.
Lock mode - The current lock mode (either Not Wait or Wait n).
SQL ERR - The last SQL error.
ISAM ERR - The last ISAM error.
F.E. Vers - The IBM Informix version of the client application.

12-32 The IBM Informix Cost-Based Optimizer

Adding the session ID to the command will give you more information about the SQL statements of a particular session:
onstat -g sql <session-id>

You can also retrieve the information by running the following SQL statement:
select * from sysmaster:syssqlcurses

The IBM Informix Cost-Based Optimizer 12-33

Optimizer Directives
n Access methods
n Join methods
n Join order
n Optimization Goal
n EXPLAIN


Optimizer directives, a new feature of IBM Informix Dynamic Server, allow you to influence the optimizer in the creation of a query plan for an SQL statement. This is best used in the special circumstances where the optimizer does not choose the optimal path. Before adding directives to a query, the query should first be tested with up-to-date statistics and data distributions. If the query time is still not satisfactory, optimizer directives provide a quick way to alter a plan and test it. Directives can be written to tell the optimizer what to AVOID or what to choose. Directives are written as a comment whose first character is a "+" sign. To allow directives to be used in IBM Informix-ESQL products, comments containing directives are passed to the server instead of being stripped out. The output from SET EXPLAIN indicates directives followed and not followed. Directives support control in the following areas of the optimization process:
n Access methods - index versus scans. The directives are INDEX, AVOID_INDEX, FULL, AVOID_FULL. For example,

SELECT --+ INDEX (e salary_indx) name, salary
FROM employee e

12-34 The IBM Informix Cost-Based Optimizer

WHERE e.deptnum = 1 AND e.salary > 50000;

n Join methods - forcing hash joins or nested loop joins. The directives are USE_NL, AVOID_NL, USE_HASH, AVOID_HASH. For example,

SELECT --+USE_NL (department) lastname, salary, deptname
FROM employee e, department d
WHERE e.deptnum = d.deptnum;

This will cause the optimizer to use a nested loop join to join the department table with the employee table. The department table will be the inner table of the join.
n Join order - specify the order in which tables are joined. The ORDERED directive forces the optimizer to join tables in the order in which they appear in the FROM clause. For example,

SELECT --+ORDERED lastname, title, deptnum
FROM department d, job j, employee e
WHERE e.deptnum = d.deptnum AND e.jobname = j.jobname;

n Optimization Goal - for faster response time versus throughput. This duplicates the functionality of SET OPTIMIZATION FIRST_ROWS for the specific query. For example,

SELECT --+FIRST_ROWS lastname, deptname
FROM employee e, department d
WHERE e.deptnum = d.deptnum;

n EXPLAIN - generates query plan output, such as sqexplain.out.

The IBM Informix Cost-Based Optimizer 12-35

EXPLAIN Directive
SELECT --+EXPLAIN AVOID_FULL(e) lastname, title, jobname
FROM employee e, jobs j
WHERE j.title = "Clerk" AND j.job = e.job;


The EXPLAIN directive turns on EXPLAIN plan output for the given query, regardless of the session setting.

12-36 The IBM Informix Cost-Based Optimizer

Using Directives
n Directives can be used
w In SELECT, UPDATE, and DELETE statements
w In SELECT statements embedded in INSERT statements
w In stored procedures and triggers
n Directives cannot be used
w In distributed queries that access remote tables
w For UPDATE/DELETE WHERE CURRENT OF statements
n Directives can be controlled by
w The DIRECTIVES configuration parameter
w The IFX_DIRECTIVES environment variable


Support for directives is provided for all SELECT, UPDATE, and DELETE statements. They may appear in stored procedures, triggers, and views. They may also appear in SELECT statements embedded in INSERT statements. Directives are not valid for distributed queries or for UPDATE/DELETE statements using WHERE CURRENT OF a cursor. Directives are processed by default. The configuration parameter DIRECTIVES can be set to 0 (OFF) to disable the processing of directives. The environment variable IFX_DIRECTIVES can also be set to ON or OFF to control whether directives are processed. Optimizer directives should be used with caution. Over time, your strategy may become invalid: the access plan that you selected may not be the optimal plan after extensive updates have been made to the tables in the query.
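For example, directive processing could be disabled server-wide or for a single session as follows (a sketch; the ONCONFIG entry and the C shell setenv syntax follow the conventions used elsewhere in this course):

DIRECTIVES 0
(DIRECTIVES configuration parameter in the ONCONFIG file: 0 disables directive processing, 1 enables it)

setenv IFX_DIRECTIVES OFF
(environment variable, set before starting the client session)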

The IBM Informix Cost-Based Optimizer 12-37

12-38 The IBM Informix Cost-Based Optimizer

Exercises

The IBM Informix Cost-Based Optimizer 12-39

Exercise 1
1.1 This exercise demonstrates that indexes may be accessed in ascending or descending sequence. Generate EXPLAIN output for the following SQL command:
SELECT customer_num, fname, lname FROM customer ORDER BY customer_num DESC;

Display the EXPLAIN output file.

The following exercises illustrate the importance of accurate information to the query optimizer.

1.2 Generate EXPLAIN output for the query listed below. Save your query in a file so it can be executed again.
SELECT lname, city, order_num FROM customer, orders WHERE customer.customer_num = orders.customer_num;

1.3 Load additional rows into the orders table, using the file orders.unl.
LOAD FROM "orders.unl" INSERT INTO orders;

What information is available to the optimizer? Query the system catalog table systables to see the value of nrows (number of rows in the table).
SELECT * FROM systables WHERE tabname = "orders";

Execute the query in step 1.2 again and examine the EXPLAIN output; did the query plan change?

1.4 Execute the following SQL statement to update the statistics for the table, and query systables again.
UPDATE STATISTICS FOR TABLE orders;

What is the value of nrows now? The UPDATE STATISTICS command will be discussed in detail in the next chapter, Update Statistics and Data Distributions.

1.5 Run the query in step 1.2 again and examine the EXPLAIN output. Which table is accessed first?

12-40 The IBM Informix Cost-Based Optimizer

Solutions

The IBM Informix Cost-Based Optimizer 12-41

Solution 1
1.1 SET EXPLAIN output
QUERY:
------
select customer_num, fname, lname
from customer
order by customer_num desc

Estimated Cost: 4
Estimated # of Rows Returned: 28

1) informix.customer: INDEX PATH

    (1) Index Keys: customer_num

The index on customer_num can be used in descending as well as ascending order. No temporary files are required for the ORDER BY clause.

1.2 SET EXPLAIN output
QUERY:
------
select lname, city, order_num
from customer, orders
where customer.customer_num = orders.customer_num

Estimated Cost: 11
Estimated # of Rows Returned: 23

1) stu101.orders: SEQUENTIAL SCAN

2) stu101.customer: INDEX PATH

    (1) Index Keys: customer_num
        Lower Index Filter: stu101.customer.customer_num = stu101.orders.customer_num

NESTED LOOP JOIN

In the stores demonstration database the orders table has fewer rows than the customer table, so the optimizer chose to access it first. The estimated number of rows returned by the query is 23.

1.3 Even though rows have been added to the orders table, the value in the column nrows of systables has not been updated. Results of the query against systables:
tabname      orders
owner        stu101
partnum      1048830
tabid        101
rowsize      80
ncols        10
nindexes     2
nrows        23
created      03/08/1999
version      6684674
tabtype      T
locklevel    P
npused       1
fextsize     16
nextsize     16
flags        0
site
dbname

12-42 The IBM Informix Cost-Based Optimizer

The query plan did not change when the SQL statement was executed again.

1.4 After executing the UPDATE STATISTICS command, the number of rows for the orders table has been updated:

tabname    owner     ...   nrows   ...
orders     stu101          1015

1.5 SET EXPLAIN output


QUERY:
------
select lname, city, order_num
from customer, orders
where customer.customer_num = orders.customer_num

Estimated Cost: 94
Estimated # of Rows Returned: 1015

1) stu101.customer: SEQUENTIAL SCAN

2) stu101.orders: INDEX PATH

    (1) Index Keys: customer_num
        Lower Index Filter: stu101.orders.customer_num = stu101.customer.customer_num

NESTED LOOP JOIN

The SET EXPLAIN output indicates that the smaller customer table is accessed first now.

The IBM Informix Cost-Based Optimizer 12-43

12-44 The IBM Informix Cost-Based Optimizer

Module 13
Update Statistics and Data Distributions

Update Statistics and Data Distributions 09-2001 2001 International Business Machines Corporation

13-1

Objectives
At the end of this module, you will be able to: n Use UPDATE STATISTICS n Improve the accuracy of the statistics available to the optimizer n Use the dbschema utility to display distribution information n Determine when to run UPDATE STATISTICS in medium or high mode

13-2 Update Statistics and Data Distributions

Improving Query Performance


n The UPDATE STATISTICS command
n Data Distributions

The query optimizer is influenced by the information stored in the system catalog tables, so it is very important that the statistics in those tables are up to date. These statistics are not updated automatically; the only way to refresh them is to run the UPDATE STATISTICS command. This causes the database server to read through the data tables and indexes, compile the statistics, and store that information in the appropriate system catalog tables. The UPDATE STATISTICS command is also used to create data distributions. These are additional system catalog entries that contain information about the distribution of values within a column. This information is used by the optimizer to make more informed decisions regarding the selectivity of filters, the access method for tables, and the best join techniques. Using distributions may significantly improve the execution time of your queries.

Update Statistics and Data Distributions 13-3

UPDATE STATISTICS
UPDATE STATISTICS [LOW|MEDIUM|HIGH];
UPDATE STATISTICS [LOW|MEDIUM|HIGH] FOR TABLE;
UPDATE STATISTICS [LOW|MEDIUM|HIGH] FOR TABLE tabname;
UPDATE STATISTICS [LOW|MEDIUM|HIGH] FOR TABLE tabname (colname);
UPDATE STATISTICS FOR PROCEDURE [procedure_name];

You have the following options for updating table statistics:


n UPDATE STATISTICS (LOW|MEDIUM|HIGH)
w Using LOW mode, no data distribution is created. LOW mode updates only the statistics in the systables, syscolumns, and sysindexes system catalog tables. If a mode is not specified, then the default LOW mode is used.
w HIGH and MEDIUM modes create data distributions. This is discussed in the following pages.

n UPDATE STATISTICS
w This statement will update the statistics for all tables, and optimize all stored procedure SQL.

n UPDATE STATISTICS FOR TABLE
w This statement will update statistics for all tables.

13-4 Update Statistics and Data Distributions

n UPDATE STATISTICS FOR TABLE tabname
w This statement will update the statistics for only the table specified. The SQL statements in stored procedures that reference this table will not be re-optimized. However, because the version number for the table will change, the SQL will be re-optimized the next time the stored procedure is run.

n UPDATE STATISTICS FOR TABLE tabname (colname)
w This statement will update the statistics for only the column specified.

n UPDATE STATISTICS FOR PROCEDURE [procedure_name]
w This statement re-optimizes the SQL in all procedures, or only in the procedure listed.
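As a quick illustration of these forms, using the customer table from the stores demonstration database (the column chosen is arbitrary):

UPDATE STATISTICS;
UPDATE STATISTICS FOR TABLE customer;
UPDATE STATISTICS MEDIUM FOR TABLE customer;
UPDATE STATISTICS HIGH FOR TABLE customer (zipcode);
UPDATE STATISTICS FOR PROCEDURE;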

Update Statistics and Data Distributions 13-5

Statistics Available
systables
n nrows - number of rows in the table
n npused - number of pages on disk used for the table
sysindexes
n leaves - number of pages on the 0 level of the B+ tree
n levels - number of B-tree levels
n nunique - number of unique key values
n clust - degree of clustering
syscolumns
n colmin - second minimum value of column
n colmax - second maximum value of column

The following information is stored in the system catalog tables to aid the optimization process:

The systables table - contains column values used to estimate the cost of a physical read of the table.
n nrows: Number of rows in the table
n npused: Number of data pages (on disk) occupied by a table

The sysindexes table - contains column values used to estimate the cost of doing an indexed read of a table.
n leaves: Number of level 0 nodes in the B+ tree
n levels: Number of B-tree levels
n nunique: Number of unique keys. This column is also used to estimate the selectivity of equality filters. Note that nunique only applies to the first key of a composite index, so the nunique value for an index created on (a, b, c) would only reflect the number of unique values of the column a.
n clust: Degree of clustering - the extent to which the rows in the table are in the same order as the index. Smaller numbers correspond to greater clustering.

13-6 Update Statistics and Data Distributions

The syscolumns table - contains column values used to estimate the selectivity of < and > filters (assumes a uniform distribution of data).
n colmin: The second smallest ordinal value in the column
n colmax: The second largest ordinal value in the column

Update Statistics and Data Distributions 13-7

MEDIUM and HIGH Mode


n Sampling rows in the table to create data distributions:
  UPDATE STATISTICS MEDIUM;
  UPDATE STATISTICS MEDIUM FOR TABLE tabname(colname);
n Reading all rows in the table to create distributions:
  UPDATE STATISTICS HIGH;
  UPDATE STATISTICS HIGH FOR TABLE tabname(colname);

The UPDATE STATISTICS statements MEDIUM and HIGH create data distributions for columns.
n MEDIUM mode randomly selects rows in the table to build a distribution. However, all rows are read to obtain the sample.
n HIGH mode reads and orders all rows to build a distribution.

For large tables, high mode will use more resources and take more time during UPDATE STATISTICS than the sampling method of medium mode. However, medium mode can be less accurate than high mode. You can run UPDATE STATISTICS for all columns in all tables, for all columns in a specific table, or for a specific column in a table. Distributions will not be created on TEXT or BYTE columns. You must have DBA permissions or be the owner of the affected tables to use MEDIUM or HIGH mode.

13-8 Update Statistics and Data Distributions

Distributions Only
The UPDATE STATISTICS statement allows you to produce data distributions only.
UPDATE STATISTICS MEDIUM DISTRIBUTIONS ONLY;
UPDATE STATISTICS HIGH DISTRIBUTIONS ONLY;

Stored Procedures
If UPDATE STATISTICS is executed without the FOR TABLE clause, regardless of the LOW, MEDIUM, or HIGH designation, all stored procedures are re-optimized. If the statistics for a table are updated in HIGH mode, the version number of the table is changed. This causes re-optimization of a stored procedure the next time that it is executed.

Update Statistics and Data Distributions 13-9

How Distributions are Created

Read rows from the table, sort them, and divide them into bins - for example Bin 1: 1-50, Bin 2: 51-90, Bin 3: 91-150, plus an overflow bin for highly duplicated values.

To create distributions, IBM Informix follows these steps during execution of UPDATE STATISTICS:

1. The value of the column for each row is read from the table. The key value can be read using only the index if the following is true (otherwise the data pages are read):
   w The column is the first key of an ascending index.
   w Only one column is specified in the UPDATE STATISTICS statement.
   If the statistics mode is MEDIUM, all the rows are read but only a random sample is sent to the sort routine. If the mode is HIGH, all rows are read and fed to the sort routine.

2. Next, this set of rows is sorted. The sort routine sorts the values. A sort pool located in the virtual portion of shared memory holds the current rows being sorted. Temporary sort space on disk holds the intermediate runs for large tables or samples. Note that if an index was used to read the values, no sort is needed.

3. Once the values have been sorted, the database server scans each sorted value, retrieving the first value, the last value, and every Nth value, where N is (resolution / 100 * number of values). The first and last values are obtained from true data, not the sample. This information is used to divide the data into bins, with each bin containing an equal number of values. If 10 bins are created, each bin will hold 1/10 of the rows in the set. If the database server finds a large number of duplicates of a particular value, that value goes in an overflow bin.

13-10 Update Statistics and Data Distributions

What Information is Kept


The following information is kept for each bin in the sysdistrib system catalog table:
n The maximum value
n The number of distinct values
For the distribution, the following information is kept:
n The number of rows represented by each bin
n The minimum and maximum value for the column
n The last bin size


Once the columns are divided into bins, the optimizer will analyze the data and keep the following information about the values in each bin:
n The maximum value in the bin. The minimum value for each bin is not kept, but is derived from the maximum value for the previous bin.
n The number of distinct values in the bin.

For the column distribution, the following information is kept:


n The number of rows represented by each bin (except for the last bin)
n The actual minimum and maximum value for the column in the table
n The last bin size. The last bin may represent fewer rows than the other bins in the distribution.

To prevent skewing the number of distinct values in the bins, statistics for any highly duplicate values are kept separately. A highly duplicate value is defined as a value that contains more instances (rows) than 25% of the number of rows in a bin. The data kept for these values include:
n The column value
n The number of rows containing the column value

This information is stored in a system catalog table called sysdistrib.


Update Statistics and Data Distributions 13-11

Distribution Output
dbschema -d bank -hd account

Distribution for branch_nbr
(sample output showing, for each bin, the count of rows, the number of distinct values, and the high value, followed by an overflow section listing the frequency and value of highly duplicated entries)

The -hd option of dbschema displays the information kept for each bin, as well as the overflow values and their frequency. The -hd option requires a table name or all for all tables. Only the owner of the table, users that have SELECT permission on the column, or the DBA can list distributions for a column with dbschema . The sample dbschema output shown above has two sections, the distribution and the overflow section. The distribution section shows the values in each bin. For example, bin 1 represents 5638 instances of values between 0 and 16. Within this interval, there are 16 unique values. The overflow section shows each value that has a large number of duplicates. For example, value 80 has 6112 duplicates.

13-12 Update Statistics and Data Distributions

Resolution

Resolution: The percentage of the data that is put in each bin.

UPDATE STATISTICS HIGH FOR TABLE tabname RESOLUTION 10

100/resolution = # bins


The resolution can be used to specify how many bins the data will be divided into. The formula for calculating the number of bins is:
100/resolution = #bins

A resolution of 1 means that one percent of the data will go into each bin (100/1 = 100 bins). A resolution of 10 means that ten percent of the data will go into each bin (10 bins will be created). The resolution can be a number between .005 and 10. However, you cannot specify a number less than 1/(rows in the table). The lower the resolution value, the more bins will be created. The more bins you have, the more accurate the optimizer can be regarding the number of rows that satisfy the SELECT filter. However, if there are too many bins allocated, the optimization time may increase slightly because the system catalog pages that hold the distribution must be read (from memory if they are in the cache or from disk if they are not). Consider this example. We have the following statistics for a bin that is for column x:
count: 1000    distinct: 100    high value: 10000

Update Statistics and Data Distributions 13-13

Suppose this is the first bin and you know the bin contains a count for the number of rows between 1 and 10,000. There are 1000 rows in this range but only 100 rows have unique values. The optimizer assumes that the duplicates are evenly spread among the distinct values. This means that each column value in this bin has 1000/100 = 10 rows. A SELECT statement with an equality filter, such as:
select * from tab where x = 250

will return 10 rows, according to the optimizer's estimate. Now suppose you decrease the resolution value so that there are more bins and each bin represents less data. As an example, suppose the first bin contained the following statistics:
count = 300 distinct = 60 high value = 5000

Now the optimizer can estimate that the SELECT statement will return (300/60) = 5 rows.

Default Resolution
The default resolution for HIGH mode is .5. The default resolution for MEDIUM mode is 2.5. You specify the resolution for a column, table, or database using the UPDATE STATISTICS statement.

Are More Bins Better?


Even though the optimizer can estimate the number of rows to be returned more accurately, increasing the number of bins may not obtain a better or different path. This means that the SELECT statement may not run any faster with a better estimate. It depends entirely on the distribution and the SELECT statement that is retrieving the data.

13-14 Update Statistics and Data Distributions

Confidence

The resolution and confidence determine the sample size.

UPDATE STATISTICS MEDIUM FOR TABLE tabname RESOLUTION 1 .99


Confidence is a statistical measure of the reliability of the sample (if UPDATE STATISTICS MEDIUM is used).


The term confidence is a statistical measure of the reliability of the sample. It represents the estimate of the probability that you will stay within the resolution you choose. For example, with a confidence value of 99% (.99), your confidence should be high that the results (that is, the number of rows per bin) of a sample taken to create the distribution will be roughly equivalent to what you would get if all the rows were examined.

Default Confidence
The confidence is expressed as a value between .80 and .99. The default value is .95. Confidence is only used when sampling data for a medium distribution (UPDATE STATISTICS MEDIUM). The resolution and confidence are used to determine the sample size for medium distributions.

Sample Size
The size of the sample that is taken for UPDATE STATISTICS MEDIUM is dependent upon the resolution and confidence. By increasing the confidence or decreasing the resolution value, the sample size increases. The sample size does not depend upon the size (population) of the table.
Update Statistics and Data Distributions 13-15

Space Utilization
To limit the amount of disk space used for sorts during UPDATE STATISTICS:

setenv DBUPSPACE space-in-kbytes
Maximum amount of disk space that can be used to sort values


UPDATE STATISTICS attempts to construct distributions for as many columns as possible simultaneously. This minimizes the number of scans needed for a table and makes UPDATE STATISTICS run more efficiently. However, with more distributions being created at once, the need for temporary disk space for sort runs increases. You may run out of disk space for these temporary files. You can set the environment variable DBUPSPACE before running UPDATE STATISTICS to constrain the amount of temporary disk space used for sorts. This also may reduce contention in 24 X 7 operations. The database server calculates how much disk space is needed for each sort and will start as many distributions at once as can fit in the space allocated. At least one distribution will be created at one time, even if DBUPSPACE is set too low to accommodate it. If DBUPSPACE is set to any value less than 1000 Kbytes, it will be ignored and the value of 5000 Kbytes will be used. In addition to limiting the amount of disk space used for sorts during UPDATE STATISTICS, the database server will limit the amount of memory used to 4 megabytes. However, at least one distribution will be created at one time, even if more than 4 megabytes is needed. A sort may occur for every column for which you are building a distribution.
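For example, to cap the temporary sort space at roughly 10 megabytes before running UPDATE STATISTICS (C shell syntax, matching the setenv example above):

setenv DBUPSPACE 10000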

13-16 Update Statistics and Data Distributions

Guidelines for Creating Distributions


n Run UPDATE STATISTICS MEDIUM with the DISTRIBUTIONS ONLY option for all columns that do not have an index.
n Run UPDATE STATISTICS HIGH for columns that are the first key in an index.
n Run UPDATE STATISTICS HIGH for the first column that differs when indexes begin with the same subset of columns.
n Run UPDATE STATISTICS LOW for all columns in multicolumn indexes.
n For small tables, run UPDATE STATISTICS HIGH.


For all columns that do not have an index, run UPDATE STATISTICS MEDIUM with the DISTRIBUTIONS ONLY option. The index information that is not created by using the DISTRIBUTIONS ONLY option is created in the following steps. It may be easier (and not that much more expensive) to run UPDATE STATISTICS MEDIUM DISTRIBUTIONS ONLY so that distributions are created on all columns in the database. This command scans the entire table and then samples 2960 rows. This should be run prior to executing any UPDATE STATISTICS HIGH statements to avoid overwriting the distributions created by HIGH. This should be executed as a single statement for each table. For example, if columns b and c in table t1 do not head an index:

UPDATE STATISTICS MEDIUM FOR TABLE t1(b,c) DISTRIBUTIONS ONLY;

n Next, run UPDATE STATISTICS HIGH for each column that heads an index (that is, all non-composite indexes and the first column in composite indexes). This UPDATE STATISTICS statement should execute quickly because the index (as long as it is ascending) will be used to read the data; no sort will be executed. Execute a single statement for each column. If there are indexes that begin with the same subset of columns, run UPDATE STATISTICS HIGH for the first column in each index that differs.

Update Statistics and Data Distributions 13-17

Finally, run UPDATE STATISTICS LOW for all columns in composite indexes. This updates the sysindexes system catalog table with regard to these indexes. For single column indexes UPDATE STATISTICS LOW is implicitly executed when you execute UPDATE STATISTICS HIGH.

In most cases, this strategy should yield a good enough sample size for the optimizer to pick the correct path for most queries. If there is a problem query (one which you perceive to be running slower than it should), you should take the following steps:
n First, run the query with SET EXPLAIN ON to record the query plan.
n Then run UPDATE STATISTICS HIGH for the columns listed in the WHERE clause.
n Run the query with SET EXPLAIN ON. Was the estimated cost less? Compare the query plans.
n If UPDATE STATISTICS HIGH produced a different query plan and the estimated cost is less, the optimizer made a better choice and the SELECT statement benefited from having more data available to the optimizer.

In addition to comparing before and after query plans, you may wish to compare the corresponding runtime statistics. To do so, set the environment variable SQLSTATS to 2, run the query, and then use your session id to query the syssqlcurses table in the sysmaster database.
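A sketch of that sequence, using a hypothetical table t1 and column col1:

SET EXPLAIN ON;
SELECT * FROM t1 WHERE col1 = 10;            -- record the original query plan
UPDATE STATISTICS HIGH FOR TABLE t1 (col1);
SELECT * FROM t1 WHERE col1 = 10;            -- compare the new plan and its estimated cost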

13-18 Update Statistics and Data Distributions

Guidelines (cont.)
If UPDATE STATISTICS HIGH produced improved query performance:
n Run UPDATE STATISTICS MEDIUM with a CONFIDENCE of .99 and a decreased RESOLUTION.
n Re-run the query with SET EXPLAIN ON.
n Check the query plan to see if it produced the same results as with UPDATE STATISTICS HIGH.


You may have received better results with UPDATE STATISTICS HIGH. However, it might not be feasible for you to take the extra time each day to run HIGH mode on these columns. Instead, you can move back to UPDATE STATISTICS MEDIUM for the columns involved in the query, but this time set the confidence to .99 and adjust the resolution value slightly lower so that the sample size is higher. Then re-run the query and check the query plan to see if it returned the same results as HIGH mode. You can repeat this adjustment-and-test process until the query plan matches the query plan of HIGH mode.

Update Statistics and Data Distributions 13-19

The DROP DISTRIBUTIONS Clause


To drop distributions while updating other statistics:

UPDATE STATISTICS LOW DROP DISTRIBUTIONS;
UPDATE STATISTICS LOW FOR TABLE orders(order_num) DROP DISTRIBUTIONS;


The DROP DISTRIBUTIONS clause can be used in an UPDATE STATISTICS LOW statement to drop the existing distributions, while updating other statistics such as the number of levels of the btree, the number of pages used by the index, and so on. When you run UPDATE STATISTICS LOW without the DROP DISTRIBUTIONS clause, only the statistics in systables, sysindexes and syscolumns are updated. The distributions are not dropped or altered in any way. When you run UPDATE STATISTICS LOW on a table or specific column with the DROP DISTRIBUTIONS clause, the statistics in systables, sysindexes, and syscolumns for that table or specific column will be updated, and any distributions for the table or specific column listed will be dropped. Only a DBA-privileged user or the owner of a table can remove distribution information.

13-20 Update Statistics and Data Distributions

The sysdistrib System Catalog Table

Column Name    Type        Description
tabid          integer     Table id found in systables
colno          smallint    Column nbr. found in syscolumns
seqno          integer     Sequence nbr. for multiple entries
constructed    date        Date the distribution was created
mode           char(1)     L=Low, M=Medium, H=High
resolution     float       Resolution used to create distribution
confidence     float       Confidence used to create distribution
encdat         char(256)   Encoded histogram information


The sysdistrib system catalog table is used to store distributions for the database. The columns for the sysdistrib table are shown above.
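For example, a quick way to see which distributions exist in the current database and how they were built (a sketch; join tabid and colno back to systables and syscolumns to translate the ids into table and column names):

SELECT tabid, colno, mode, resolution, confidence
FROM sysdistrib;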

Update Statistics and Data Distributions 13-21

When Table Changes Affect Distribution


n If the column data type or size is altered, the distribution will be dropped.
n If the column is dropped, the distribution will also be dropped.


If the ALTER TABLE statement alters the column data type or size, the distribution will be dropped and must be re-created with UPDATE STATISTICS. If the ALTER TABLE drops the column, the distribution will also be dropped. The dbexport and dbschema utilities automatically generate the UPDATE STATISTICS statements to re-create the currently existing distributions.

13-22 Update Statistics and Data Distributions

Exercises

Update Statistics and Data Distributions 13-23

Exercise 1
1.1 Create distributions for order_num in the orders table of the stores demonstration database. This is the table which you loaded with extra rows in a previous chapter. Use a resolution of 5 and the default confidence.

1.2 Display the distribution information that you have created using the dbschema command.

13-24 Update Statistics and Data Distributions

Solutions

Update Statistics and Data Distributions 13-25

Solution 1
1.1 Create distributions for order_num in the orders table.
UPDATE STATISTICS HIGH FOR TABLE orders(order_num) RESOLUTION 5;

1.2 Display the distributions that you created.


dbschema -d dbname -hd orders

The output of the dbschema command:


Distribution for stu101.orders.order_num

Constructed on 03/16/1999

High Mode, 5.000000 Resolution

 1: (  50,  50, 2032)
 2: (  50,  50, 2086)
 3: (  50,  50, 2136)
 4: (  50,  50, 2186)
 5: (  50,  50, 2236)
 6: (  50,  50, 2286)
 7: (  50,  50, 2336)
 8: (  50,  50, 2386)
 9: (  50,  50, 2436)
10: (  50,  50, 2486)
11: (  50,  50, 2536)
12: (  50,  50, 2586)
13: (  50,  50, 2636)
14: (  50,  50, 2686)
15: (  50,  50, 2736)
16: (  50,  50, 2786)
17: (  50,  50, 2836)
18: (  50,  50, 2886)
19: (  50,  50, 2936)
20: (  50,  50, 2986)
21: (  15,  15, 3001)

}

13-26 Update Statistics and Data Distributions

Module 14
Data Security

Data Security 09-2001 2001 International Business Machines Corporation

14-1

Objectives
At the end of this module, you will be able to: n Use GRANT and REVOKE commands to control user access n Describe how system catalog tables store database access privileges

14-2 Data Security

Levels of Data Security

Database
    Table
        Column

Data security is concerned with issues of protecting data from unauthorized users. A secure database allows users to only access or modify data for which they are authorized. There are several levels of privileges in a database:
n Database level privileges
n Table level privileges
n Column level privileges

Data Security 14-3

Database Level Privileges

                                                   CONNECT   RESOURCE   DBA
Access database tables                                X          X       X
Create and drop tables and indexes; alter tables                 X       X
Grant and revoke privileges                                               X
Drop the database                                                         X

In order to access a database, a user must have the CONNECT privilege or a higher privilege.

The CONNECT Privilege
The CONNECT privilege allows a user to specify the database in a DATABASE statement, but the user cannot create or drop tables and indexes. Although a user with the CONNECT privilege may not create permanent tables, the user may create views and temporary tables.

The RESOURCE Privilege
The RESOURCE privilege gives users the CONNECT privilege as well as the ability to create and drop tables and indexes in the database.

The DBA Privilege
A user with DBA privilege has all the RESOURCE privileges as well as the ability to grant and revoke CONNECT, RESOURCE, and DBA privileges. The only restriction placed on users with DBA status is the inability to revoke the DBA privilege from themselves. However, a user with DBA status can grant the privilege to another user, who can then revoke it from the grantor.

14-4 Data Security

Granting Database Level Privileges


GRANT CONNECT TO PUBLIC;
GRANT RESOURCE TO maria, joe;
GRANT DBA TO janet;

You can use the GRANT statement to grant database access privileges to users. The components of the GRANT statement are:

privilege - Is one of the database-level access types: CONNECT, RESOURCE, or DBA.
PUBLIC - Is the keyword that you use to specify access privileges for all users.
user-list - Is a list of login names for the users to whom you are granting access privileges. You can enter one or more names, separated by commas.
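The general form implied by these components is:

GRANT {CONNECT | RESOURCE | DBA} TO {PUBLIC | user-list};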

Examples
In the first example shown above, the CONNECT privilege is granted to all users (PUBLIC). In the second example, the RESOURCE privilege is granted only to the user maria and the user joe. In the third example, janet is given DBA privilege.

Data Security 14-5

Table/Column Level Privileges


ALTER       - Add, delete or modify columns.
DELETE      - Remove rows from a table.
INDEX       - Create indexes for a table.
SELECT      - Retrieve information from the columns in a table.
UPDATE      - Modify information in the columns of a table.
INSERT      - Insert rows into a table.
REFERENCES  - Reference columns in referential constraints.
ALL         - Perform any or all of the preceding operations.

You can specify the operations a user can perform on a table or on columns within a table that you have created. The privileges that you may grant and revoke are shown above. None of these privileges take effect until the user has at least the CONNECT privilege at the database level.

14-6 Data Security

Granting Table Level Privileges


GRANT ALL ON customer TO PUBLIC;
GRANT UPDATE ON orders TO liz WITH GRANT OPTION;
GRANT INSERT, DELETE ON items TO mike AS maria;

You can use the GRANT statement to specify the operations that a user can perform on a table that you have created. The components of a table level GRANT are:

privilege - Is one or more of the table access types: ALTER, DELETE, INDEX, INSERT, SELECT, UPDATE, REFERENCES, ALL.
table or view - Is the name of the table or view for which you are granting access privileges.
PUBLIC - Is the keyword that you use to specify access privileges for all users.
user list - Is a list of login names for the users to whom you are granting access privileges. You can enter one or more names, separated by commas.

WITH GRANT OPTION - Allows the user or users listed in the GRANT statement the ability to grant the same privileges to other users.
AS [user] - Makes the grantor of the permission another user. Adding this option relinquishes your ability to later revoke the granted privilege.

In the first example shown above, all privileges are granted to all users (PUBLIC) on the customer table. In the second example, liz is given update permissions on the orders table with the ability to give that permission to other users. In the third example the grantor becomes maria to grant INSERT and DELETE privileges to the user mike.
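The general form implied by these components is:

GRANT privilege-list ON table-or-view TO {PUBLIC | user-list} [WITH GRANT OPTION] [AS user];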

Data Security 14-7

Granting Column Level Privileges


n n

Only SELECT, UPDATE, and REFERENCES privileges may be granted to individual columns. Column level privileges are granted in the same way that table level privileges are granted, except that a column list must follow the privilege in the GRANT statement.

Examples:

GRANT SELECT (company, fname, lname) ON customer TO PUBLIC;


GRANT INSERT, UPDATE (quantity), SELECT ON items TO maria;

When granting privileges for a table, you may specify the SELECT, UPDATE, and REFERENCES privileges to apply to only certain columns in the table. In the first example shown above, the SELECT privilege is granted to all users for columns company, fname, and lname of the customer table. In the second example, the UPDATE privilege is granted only on the quantity column, but the INSERT and SELECT privileges are granted on all columns of the table.
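REFERENCES can be limited to columns in the same way; for example (a sketch using the customer table from the stores demonstration database):

GRANT REFERENCES (customer_num) ON customer TO maria;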

14-8 Data Security

Default Privileges
Database Level
n When you create a database, you automatically have DBA privilege.
Table Level
n Non-ANSI databases: All table-level privileges except ALTER and REFERENCES are granted to all users.
n ANSI databases: No default privileges are granted.

Default database level privileges
When you create a database, you are automatically the DBA of that database and are the only one who has access to the database. If you want to allow other users to access the database, you must grant them CONNECT, RESOURCE or DBA privileges.

Default table level privileges
In a database that is not ANSI-compliant, the default is to grant all table-level privileges (except ALTER) to all users (PUBLIC). In an ANSI-compliant database, no default table-level privileges are granted. You must explicitly grant these privileges.

Data Security 14-9

Stored Procedure Privileges


n Granting permission
  REVOKE DELETE ON orders FROM PUBLIC;
  GRANT EXECUTE ON delete_proc TO PUBLIC;
n Privileges of user
  w DBA Stored Procedure
  w Non-DBA Stored Procedure


A stored procedure consists of code (SQL and Stored Procedure Language statements) that is stored in the database. Permissions to a procedure are granted and revoked the same as permissions to a table. You grant and revoke EXECUTE permission for a specific stored procedure. The above example disallows any user (except the owner of the table) from deleting any rows from the orders table. The next GRANT statement allows users to execute the delete_proc stored procedure, which may contain specialized code to delete orders.

Privileges of user
n DBA Stored Procedure
A user running a DBA procedure (created with the DBA keyword) will have DBA permissions for the duration of the procedure. A DBA procedure can only be created by someone who has DBA permissions on the current database. A DBA procedure keeps DBA permissions even if the owner's DBA permissions are revoked after the procedure has been created. All tables in the current database are accessible.

14-10 Data Security

n Non-DBA Stored Procedure
For all objects in the procedure that are owned by the owner of the procedure, the user inherits all current permissions of the owner.

For all objects in the procedure that are not owned by the owner of the procedure, the user inherits only those permissions that the owner has been granted using the WITH GRANT option. If the procedure owner has DBA permissions, anyone running the procedure will inherit DBA permissions for the duration of the procedure.

Data Security 14-11

Revoking Database Level Privileges


Examples:
REVOKE CONNECT FROM mike;
REVOKE RESOURCE FROM maria;


You can use the REVOKE statement to revoke database access privileges from users. The components of a REVOKE statement are:

privilege - Is one of the database-level access types: CONNECT, RESOURCE, or DBA.
PUBLIC - Is the keyword that you use to specify access privileges for all users.
user list - Is a list of login names for the users whose access privilege you are revoking. You can enter one or more names, separated by commas.

If you revoke the DBA or RESOURCE privilege from one or more users, they are left with the CONNECT privilege. To revoke all database privileges from users with DBA or RESOURCE status, you must revoke CONNECT as well as DBA or RESOURCE. In the first example shown above, the CONNECT privilege is revoked from mike. In the second example, the RESOURCE privilege is revoked from the user maria. Maria now has the CONNECT privilege.
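For example, to strip janet (who was granted DBA earlier) of all database-level privileges, both statements are needed:

REVOKE DBA FROM janet;
REVOKE CONNECT FROM janet;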

14-12 Data Security

Revoking Table Level Privileges


Examples:
REVOKE ALL ON orders FROM PUBLIC;
REVOKE DELETE, UPDATE ON customer FROM mike, maria;


You can use the REVOKE statement to prevent specific operations that a user can perform on a table that you have created. The components of a REVOKE statement are:

privilege - Is one or more of the table access types: ALTER, REFERENCES, DELETE, INDEX, INSERT, SELECT, UPDATE, ALL.
table or view - Is the name of the table or view for which you are revoking access privileges.
PUBLIC - Is the keyword that you use to specify access privileges for all users.
user list - Is a list of login names for the users from whom you are revoking access privileges. You can enter one or more names, separated by commas.

Although you can grant UPDATE and SELECT privileges for specific columns, you cannot revoke these privileges column by column. If you revoke UPDATE or SELECT privileges from a user, all UPDATE and SELECT privileges that you have granted to that user are revoked. In the first example shown above, all privileges are revoked from all users (PUBLIC) on the orders table. In the second example, the DELETE and UPDATE privileges are revoked from the users mike and maria.

Data Security 14-13

Role-Based Authorization
Role - a group of users
n You must have DBA privileges to create a role.
n Table and column privileges can be assigned.


A role is a group of users that can be granted security privileges. Roles make the job of administering security easier. Once a user is assigned to a role, the system administrator need only GRANT and REVOKE table and column privileges to a role. For example, within an application users can be assigned a role. The role can allow them to execute SQL statements for which they do not have permission outside of that role. Table and column level privileges can be assigned to roles. However, database level privileges cannot be assigned to roles. Roles can be nested within other roles. For example, a sales role may have some users who are also part of a salesadmin role. Roles are available beginning with IBM Informix Dynamic Server 7.10.UD1.

14-14 Data Security

Roles and Permissions


CREATE ROLE slsadmin;
GRANT slsadmin TO andy, liz, sam;

REVOKE ALL ON orders FROM public;
GRANT SELECT ON orders TO public;
GRANT INSERT, UPDATE, DELETE ON orders TO slsadmin;


The CREATE ROLE statement creates a role. The statement effectively puts an entry in the sysusers table, where the user type is G. The role name must be less than or equal to eight characters and cannot be the same name as any user that is granted privileges, or that connects to IDS as a session. In order to enforce this rule, the following checks are in place:
n The CREATE ROLE statement checks NIS to make sure the role name is not present in the password file.
n A user will not be able to connect if the user name is created as a role name.

The CREATE ROLE statement can only be executed by a user that has DBA permissions on the database. A ROLE is a database object, meaning that it is only applicable for the database in which it was created. Once the role is created, the next step is to assign users to roles. The GRANT statement assigns one or more users to the role specified. A successful GRANT statement puts an entry in the sysroleauth system catalog table. Next, privileges are assigned to the role.

Data Security 14-15

Using Roles
n Before permissions for a role can be used, the session must execute the SET ROLE statement:

  SET ROLE slsadmin;

n This statement, run by user liz, will now execute:

  INSERT INTO ORDERS(order_num, customer_num) VALUES (0,104);


The user must execute SET ROLE before she receives the permissions granted to a particular role. In the SET ROLE example shown above, the slsadmin permissions are activated for the current session. The user gains the privileges of the role and retains her own privileges as well as PUBLIC's. The role's permissions are active until the session is discontinued, or the current database is changed. To disable the role execute:
SET ROLE {NONE|NULL}

If a user attempts to SET ROLE, and the role has not been granted to the user, the following message is generated:
19805: No privilege to set to the role.

14-16 Data Security

Discussion
n Assume the following statements are executed by the DBA.

  CREATE ROLE mkting;
  CREATE ROLE sales;
  GRANT mkting TO jim, mary, ram;
  GRANT sales TO mkting;
  REVOKE ALL ON orders FROM PUBLIC;
  GRANT SELECT ON orders TO sales;

n The following statements are run by user mary. Which statements will fail? Why?

  SELECT * FROM orders;
  SET ROLE mkting;
  SELECT * FROM orders;

The first SELECT statement will fail because the user mary does not have SELECT permission on the orders table. After setting her role to mkting, the SELECT will succeed: the role mkting was granted the role sales, and the role sales has SELECT privilege on the orders table.

Data Security 14-17

GRANT and REVOKE FRAGMENT


Examples:

REVOKE ALL ON orders FROM PUBLIC;
GRANT SELECT ON orders TO PUBLIC;

REVOKE FRAGMENT ALL ON orders FROM user1;
GRANT FRAGMENT INSERT, UPDATE, DELETE ON orders (dbspace1) TO user1;


Two examples of the GRANT FRAGMENT and REVOKE FRAGMENT statement are shown above. These examples show how you can grant read-only privileges to all fragments but the one in dbspace1. If user1 tries to INSERT a row into any fragment but the one in dbspace1, the following error will occur:
977: No permission on fragment (dbspace1).
271: Could not insert new row into the table.

The fragment level privileges which may be granted are INSERT, UPDATE, DELETE and ALL. They may be granted whether or not a user has table level privileges. Table level privileges take precedence over fragment level privileges. For example, if a user has table level insert capability, fragment level insert privileges are not checked. REVOKE FRAGMENT and GRANT FRAGMENT are only valid when executed on tables fragmented by expression. These commands are available in IBM Informix Dynamic Server versions 7.10.UD1 and above.

14-18 Data Security

Discussion
The orders table is fragmented so that orders for customer numbers 1-10,000 are in dbspace1, and orders for customer numbers 10,001-20,000 are in dbspace2. Given the GRANT and REVOKE FRAGMENT statements on the previous page, which of these statements (if executed by user1) would fail?
INSERT INTO orders (cust_nbr) VALUES (100);
SELECT * FROM orders;
UPDATE orders SET cust_nbr = 12200 WHERE cust_nbr = 220;


The INSERT statement shown above would succeed because user1 has INSERT permissions into the fragment in dbspace1. The SELECT statement shown above would succeed because user1 has SELECT permissions on the table (fragment permissions are only for INSERT, UPDATE, and DELETE statements). The UPDATE statement shown above would fail because user1 does not have UPDATE permissions for the fragment in dbspace2. The user requires UPDATE permissions for the fragment from where the row is moving and the fragment to where the row is moving.

Data Security 14-19

System Catalog Tables


n sysusers - database level privileges granted to users
n systabauth - table-level privileges granted
n syscolauth - column-level privileges granted
n sysfragauth - privileges granted on table fragments
n sysprocauth - privileges granted on stored procedures
n sysroleauth - roles that are granted to users


The system catalog tables listed in the slide above contain information about the privileges that have been granted to users of the database. A complete description of the tables is in the System Catalog appendix to this manual.

14-20 Data Security

Exercises

Data Security 14-21

Exercise 1
Complete this exercise by executing the appropriate GRANT statements. Begin with disallowing all table level privileges for all users on the items table by running the following statement:
REVOKE ALL ON items FROM PUBLIC;

1.1 Using the GRANT statement, give another member of the class the ability to create and drop tables, but not the ability to drop the database.

1.2 Using the GRANT statement, give another member of the class the ability to select and insert rows in the items table.

1.3 Using the GRANT statement, give another member of the class the ability to update only the manu_code column in the items table.

14-22 Data Security

Exercise 2
(7.10.UD1 or Greater) The purchasing manager at West Coast Wholesalers has just requested that his staff have sole responsibility for adding new stock items and for updating the unit price of existing items in the stores database. He realizes that the rest of the company still needs to read the stock table. Use the following steps to create an environment which will satisfy the purchasing manager's request.

2.1 Revoke all privileges except SELECT from public.

2.2 Create a role for the purchasing department and grant it the appropriate privileges.

2.3 Grant the purchasing role to a student id and run a test by adding a row to the stock table. Try to insert a row using a student id that has not been granted the purchasing privileges.

Data Security 14-23

14-24 Data Security

Solutions

Data Security 14-25

Solution 1
Complete this exercise by executing the appropriate GRANT statements. Begin with disallowing all table level privileges for all users on the items table by running the following statement:
REVOKE ALL ON items FROM PUBLIC;

1.1 Using the GRANT statement, give another member of the class the ability to create and drop tables, but not the ability to drop the database.
GRANT RESOURCE TO stu101;

1.2 Using the GRANT statement, give another member of the class the ability to select and insert rows in the items table.
GRANT SELECT,INSERT ON items TO stu101;

1.3 Using the GRANT statement, give another member of the class the ability to update only the manu_code column in the items table.
GRANT UPDATE(manu_code) ON items TO stu101;

14-26 Data Security

Solution 2
2.1 Revoke all privileges except SELECT from public.
REVOKE ALL ON stock FROM public;
GRANT SELECT ON stock TO public;

2.2 Create a role for the purchasing department and grant it the appropriate privileges.
CREATE ROLE purchase;
GRANT INSERT, UPDATE(unit_price) ON stock TO purchase;

2.3 Grant the purchasing role to a student id and run a test by adding a row to the stock table. Try to insert a row using a student id that has not been granted the purchasing privileges.
GRANT purchase TO studentxx;

Studentxx attempts to insert a row into the stock table:

INSERT INTO stock (stock_num, manu_code) VALUES (1, "ANZ");
275: No INSERT permission

SET ROLE purchase;
INSERT INTO stock (stock_num, manu_code) VALUES (1, "ANZ");

Data Security 14-27

14-28 Data Security

Module 15
Views

Views 09-2001 2001 International Business Machines Corporation

15-1

Objectives
At the end of this module, you will be able to:
n Ensure data security and integrity
n Present derived and aggregate data
n Hide joins from users

15-2 Views

What is a View?

A Virtual Table

A view is often called a virtual table. As far as the user is concerned, it acts like an ordinary table. In fact, however, a view has no existence in its own right; it is derived from columns in real tables. A view can also be called a dynamic window on your database. For example, it can present the result of a computation such as sum(total_price), and as individual prices change, the value returned by the view is always up to date.

Views 15-3

Creating a View
CREATE VIEW ordsummary AS
    SELECT order_num, customer_num, ship_date
    FROM orders;

CREATE VIEW they_owe (ordno, orddate, cnum) AS
    SELECT order_num, order_date, customer_num
    FROM orders
    WHERE paid_date IS NULL;

The CREATE VIEW statement consists of a CREATE VIEW clause and a SELECT statement. You can additionally give names to the columns in a view by listing them in parentheses after the view name. If you do not assign names, the view will use the names of the columns in the underlying table. Follow normal rules for writing the SELECT statement, EXCEPT that the following syntax is prohibited:
n FIRST
n ORDER BY
n INTO TEMP

Restricting Access to Columns


Views can restrict access to certain columns within a table or tables. This may be useful for two reasons:
n Information in some columns may be sensitive and should be restricted from general access. For example, a salary column in an employee table should not be accessible to all users.
n Some columns may contain irrelevant data for some users. By leaving those columns out of a view, the database looks simpler and uncluttered.

15-4 Views
In the first example, the view ordsummary will have three columns. They will be given the same names as the columns in the orders table.
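As a sketch of the sensitive-column case mentioned above, a view over a hypothetical employee table (the table and its columns are illustrative only and are not part of the stores7 demonstration database) could expose everything except the salary column:

CREATE VIEW emp_public AS
    SELECT emp_num, fname, lname, dept
    FROM employee;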

Restricting Access to Rows


Views can also restrict access to certain rows within a table or tables. There are two reasons this may be valuable:
n Some rows may contain sensitive data or data which should be restricted to certain users.
n Some rows may be unimportant to certain users. For example, the accounts receivable department may only be interested in orders that have not been paid.

In the second example, the view they_owe will only show certain rows of the orders table where paid_date is null. The view they_owe will also have three columns. However, the view's column names will differ from the column names in the orders table. They will be called ordno, orddate, and cnum instead of order_num, order_date, and customer_num.

A View Cannot Be Altered


You cannot ALTER a view. To make changes to a view, you must first remove the view using DROP VIEW, then recreate it with CREATE VIEW. When you drop a view, no data is actually deleted. The underlying table(s) remain intact. Example:
DROP VIEW ordsummary;
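To change the definition, you then recreate the view. A minimal sketch, assuming you want ordsummary to also show the paid_date column:

CREATE VIEW ordsummary AS
    SELECT order_num, customer_num, ship_date, paid_date
    FROM orders;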

Views 15-5

Creating Views: Examples


n A Virtual Column

CREATE VIEW ship_cost (ordno, cnum, s_wt, s_chg, chg_per_lb) AS
    SELECT order_num, customer_num, ship_weight, ship_charge,
           ship_charge / ship_weight
    FROM orders;

n An Aggregate Function

CREATE VIEW manu_total (m_code, total_sold) AS
    SELECT manu_code, SUM(total_price)
    FROM items
    GROUP BY manu_code;

A view can be created with a SELECT statement that includes an expression. The result of the expression can be called a virtual column. In the example above, the view includes a column which is the computed ship charge per pound. The ship charge per pound can be computed using the formula chg_per_lb = ship_charge / ship_weight. When a user queries the view, any virtual columns in the view will look just like real columns. The result of the computation:
ship_charge / ship_weight

is displayed in the virtual column chg_per_lb. Aggregate functions (e.g., SUM, MIN, MAX, AVG, COUNT) can also be included in a SELECT statement for a view. The example above shows a view which selects the sum of the total price for each group of items with a different manu_code. The aggregate function will be placed in a virtual column called total_sold.
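A query against such a view treats the virtual column like any other column. For example (a sketch, assuming the ship_cost view above has been created):

SELECT ordno, chg_per_lb
FROM ship_cost
WHERE chg_per_lb > 2.00;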

15-6 Views

A View that Joins Two Tables


Example:

CREATE VIEW stock_info AS
    SELECT stock.*, manu_name
    FROM stock, manufact
    WHERE stock.manu_code = manufact.manu_code;

You can use a view to hide joins from a user. This makes a complicated join invisible to a user. In the example above, the result of a SELECT on the view would be a combination of data from the stock and manufact tables. The view creates a useful illusion that the data is located in one place, called stock_info. To the user, stock_info will look like a single table. In reality, stock_info is not a single table, but a view based on two underlying tables: stock and manufact .
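A query against the view then reads like a query against a single table; for example (a sketch):

SELECT stock_num, description, manu_name
FROM stock_info
WHERE unit_price < 20.00;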

Views 15-7

A View on Another View

Example:
CREATE VIEW manu_total (m_code, total_sold) AS
    SELECT manu_code, SUM(total_price)
    FROM items
    GROUP BY manu_code;

CREATE VIEW manu_new AS
    SELECT manu_name, total_sold
    FROM manufact, manu_total
    WHERE manufact.manu_code = manu_total.m_code;

A view can be based wholly or partially on another view. The example above first creates a view called manu_total which selects the total price for each manu_code group. The view manu_new takes the data selected from the manu_total view and joins it with the manu_name column in the manufact table. The view manu_new will display two pieces of data, the column manu_name and the virtual column total_sold.

15-8 Views

Restrictions on Views
n You cannot create indexes on a view.
n A view depends on its underlying table(s).
n Some views restrict inserts, updates, and deletes.
n You must have full SELECT privileges on all columns in the view in order to create it.

There are several restrictions imposed on views:
n You cannot create indexes on a view. However, when querying, you do receive the benefit of existing indexes on columns in the underlying table(s).
n A view depends on its underlying tables (and views). If you drop a table, all views derived from that table are automatically dropped. If you drop a view, any views derived from that view are automatically dropped.
n Some views restrict inserts, updates and deletes. These restrictions are described on the next page.
n You must have full SELECT privileges on all columns in the view in order to create it.

Views 15-9

Views: INSERT, UPDATE, DELETE


n You cannot INSERT, UPDATE or DELETE from a view if it has:
  w A join
  w An aggregate
n You cannot UPDATE a view with a virtual column.
n You cannot INSERT into a view with a virtual column.
n You can DELETE from a view with a virtual column.


Some restrictions for inserting, updating and deleting rows of views are shown above.

15-10 Views

The WITH CHECK OPTION Clause


Compare:

CREATE VIEW no_check AS
    SELECT * FROM stock
    WHERE manu_code = "HRO";

CREATE VIEW yes_check AS
    SELECT * FROM stock
    WHERE manu_code = "HRO"
    WITH CHECK OPTION;


The views we have created thus far will let you insert rows into the database even if those rows are outside the scope of the view . For example, the view no_check will let you INSERT rows whose manu_code has values other than HRO . Ironically, every such row you INSERT immediately becomes inaccessible through the view. We can fix the situation by using the WITH CHECK OPTION clause at the end of our CREATE VIEW statement. The view yes_check will allow the user to insert only data that satisfies the view's own selection criteria. A view with the CHECK OPTION gives the database administrator the ability to add an extra level of security. The database administrator can require use of a view to update, delete or insert into a table. That view may enforce special restrictions against certain columns in a table, as in the example above.

Views 15-11

More on WITH CHECK


Which of the following will succeed -- and why?

INSERT INTO no_check VALUES (1, "ANZ", "soccer ball", 30, "each", "each");

INSERT INTO yes_check VALUES (1, "ANZ", "soccer ball", 30, "each", "each");


Here is an example of what could happen when the WITH CHECK OPTION clause is not used:
n A user inserts a row through the view no_check.
n A moment later, the user runs the following:
    select * from no_check;
n The newly added row doesn't show up in the output.

How can we determine whether the soccer ball was successfully entered into the database? If the user had been using yes_check instead of no_check, then his INSERT would have been rejected with an error message, (e.g., Data value out of range).

15-12 Views

Views and Access Privileges


REVOKE ALL ON stock FROM PUBLIC;
REVOKE ALL ON stock_info FROM PUBLIC;
GRANT SELECT ON stock_info TO dennis, karen, mari;


You can GRANT and REVOKE table level privileges on views as if they were tables. However, INSERT, UPDATE, and DELETE privileges cannot be granted if such privileges would violate the rules discussed under Restrictions on Views. Also, the ALTER privilege is not available for views. You may revoke privileges on a table, and then grant privileges on a view that accesses that table, forcing users to use a view to access the table. In the example above, after the statements are executed, no user can access the stock table unless the stock_info view is used.

Views 15-13

System Catalog Tables for Views


n sysviews
  w Stores the CREATE VIEW statement.
n sysdepend
  w Pairs each view with its underlying table(s) and/or view(s).


Two system catalog tables contain information about views: sysviews and sysdepend. The sysviews table stores the CREATE VIEW statement. The sysdepend table stores the tables or views that are involved in the view.
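For example, to see which views depend on a particular table, sysdepend can be joined to systables using the standard catalog columns btabid and dtabid (a sketch):

SELECT v.tabname
FROM systables t, sysdepend d, systables v
WHERE t.tabname = "stock"
  AND d.btabid = t.tabid
  AND v.tabid = d.dtabid;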

15-14 Views

Exercises

Views 15-15

Exercise 1
Create a view that will match up each customer with the orders he or she has placed. Include only the orders that have not been shipped yet ( ship_date is null.) Display the following information:
n Customer number
n Company name
n Order number
n Order date
n Date paid

15-16 Views

Exercise 2
Create a view that will give the total value of each order. Display the following:
n Order number
n Total value of the order

Views 15-17

Exercise 3
Create a view that only allows users to insert an order that has no shipping information and whose PO number begins with the letter B (ship_date, ship_weight, ship_charge and ship_instruct should not be included in the INSERT statement). Use the WITH CHECK OPTION. Test the view with an INSERT statement.

15-18 Views

Solutions

Views 15-19

Solution 1
Note: the solutions are only one of several correct answers.
Create a view that will match up each customer with the orders he or she has placed. Include only the orders that have not been shipped yet ( ship_date is null.) Display the following information:
n Customer number
n Company name
n Order number
n Order date
n Date paid

CREATE VIEW view_ord (cnum, company, ordno, orddate, paiddate) AS
    SELECT customer.customer_num, company, order_num, order_date, paid_date
    FROM customer, orders
    WHERE customer.customer_num = orders.customer_num
      AND ship_date IS NULL;

15-20 Views

Solution 2
Create a view that will give the total value of each order. Display the following:
n Order number
n Total value of the order

CREATE VIEW sum_view (ordno, sumprice) AS
    SELECT order_num, sum(total_price)
    FROM items
    GROUP BY order_num;

Views 15-21

Solution 3
Create a view to only allow users to insert an order without any shipping information, and the PO number begins with the letter B (ship_date, ship_weight , ship_charge and ship_instruct should not be included in the INSERT statement). Use the WITH CHECK OPTION. Test the view with an INSERT statement.
CREATE VIEW ins_view AS
    SELECT order_num, order_date, customer_num, backlog, po_num, paid_date
    FROM orders
    WHERE po_num MATCHES "B*"
    WITH CHECK OPTION;

15-22 Views

Module 16
IBM Informix Dynamic Server Data Movement Utilities

IBM Informix Dynamic Server Data Movement Utilities 09-2001 2001 International Business Machines Corporation

16-1

Objectives
At the end of this module, you will be able to use the following utilities:
n dbimport
n dbexport
n dbload
n onload
n onunload

16-2 IBM Informix Dynamic Server Data Movement Utilities

Loading and Unloading Data


n UNLOAD
  w SQL UNLOAD command
  w dbexport
  w onunload
  w High Performance Loader
n LOAD
  w SQL LOAD command
  w dbimport
  w onload
  w dbload
  w High Performance Loader

In addition to the SQL commands LOAD and UNLOAD, the above utilities are used to either unload data from a database, or to load data into a database. The High Performance Loader will be examined in later chapters.

IBM Informix Dynamic Server Data Movement Utilities 16-3

Dbexport/Dbimport Highlights
n Dbexport
  w Unloads data for an entire database into ascii files
  w Schema file of SQL commands is created
  w Unloads to disk or tape
  w dbexport.out message file is created
n Dbimport
  w Database is created
  w Imports ascii data into the database
  w Data and schema can be loaded from disk or tape
  w User is granted DBA permission on the database

Dbexport unloads data from an entire database into ascii files. A schema file of SQL commands, named databasename.sql, is also created. The ascii files and schema can be retained on disk or on tape. You have the option of unloading the database data to tape and the schema to disk. Error messages and warnings are written to a file named dbexport.out. Dbimport will create the database for you, using the schema file generated by dbexport. The ascii files generated by dbexport will be used to load the data into the specified database. The ascii files and schema can be loaded from disk or tape. A message file, dbimport.out, is created. It contains error messages and warnings related to running the program. The user who runs dbimport is granted DBA permission on the database. The original owner of the database will also have DBA privileges, and will be the owner of all the tables, indexes, and views.

16-4 IBM Informix Dynamic Server Data Movement Utilities

Directory/File Structure Created

stores7.exp/
    stores7.sql
    custome100.unl
    orders_101.unl
    items_102.unl
    stock_103.unl
    manufact104.unl
    sysmenu105.unl
    sysmenu106.unl

When the output of dbexport is placed on disk, it is structured in a directory named after the database with the command file and ascii data files in that directory. For example, the resulting directory/file structure from a dbexport of the database stores7 (also known as the Stores Demonstration database) would look like the example above.

IBM Informix Dynamic Server Data Movement Utilities 16-5

Using Dbimport and Dbexport

[Diagram: dbexport writes an SE or IDS database out to ascii files; dbimport reads those ascii files to create a new SE or IDS database.]

Dbimport and dbexport were originally designed to allow migration from IBM Informix SE databases to IBM Informix Dynamic Server databases, but they can also be used to move from one SE database to another, from one IDS database to another, or from IDS to SE.

16-6 IBM Informix Dynamic Server Data Movement Utilities

Dbexport Syntax

Syntax:
    dbexport [-c] [-d] [-q] [-ss] [-V] database [destination options]

    destination options:
        -o directory
        -t device -b blocksize -s tapesize -f pathname

Dbexport will take a database and create a special directory containing files of ascii dumps of each of the tables in the database specified, and a schema SQL file containing DDL commands and some additional accounting information. These files will be used by the companion dbimport utility to recreate the database.
n The -c option instructs the program to continue even if errors occur, until a fatal error occurs. The fatal errors are:
  w Unable to open the tape device specified
  w Bad writes to the tape or disk
  w Invalid command parameters
  w Cannot open database or no system permission
n The -d option exports blob descriptors only and not blob data.
n The -q option suppresses the echoing of SQL statements, error messages and warnings.

IBM Informix Dynamic Server Data Movement Utilities 16-7

The destination options are:
  w -o directory-path specifies the directory where the ascii files are to be stored. The directory specified must already exist. A sub-directory within the specified directory named database.exp will be created for you and will hold the data files. The default is the current working directory.
  w -t device directs the output to a tape device. You must specify the blocksize and the amount of data on each tape.
  w -b blksize specifies the tape block size in kilobytes.
  w -s tapesize specifies the number of kilobytes to be written to each tape. The maximum tape size is 2,097,151 kilobytes.
  w -f file-path directs the schema SQL command file to disk in the file indicated by the full path name.

n database is the name of the database to be exported.
n The -ss option generates server-specific information for all tables in the specified database. When the database is unloaded, the schema will contain the following information:
  w Logging mode of the database
  w Initial extent size of the table
  w Lock mode of the table
  w Dbspace the table is located in
n The -V option displays product version information.

16-8 IBM Informix Dynamic Server Data Movement Utilities

Dbexport Example
n Export stores7 database to tape
n Block size 16 kilobytes
n Tape size 24,000 kilobytes
n Continue if errors occur

dbexport -c -t /dev/rmt0 -b 16 -s 24000 stores7

The above command exports the stores7 database to tape. The block size is 16 kilobytes, and 24,000 kilobytes are written to each tape. If errors occur, the program continues.

IBM Informix Dynamic Server Data Movement Utilities 16-9

Using Additional Options


n Write schema file stores.sql to disk
n Generate server-specific information

dbexport -c -t /dev/rmt0 -b 16 -s 2400 -f /usr/port/stores.sql stores7 -ss


The above command exports the stores7 database to tape and puts the schema file stores.sql on disk in the /usr/port directory. The server-specific information, such as logging mode and extent sizes, will be included.

16-10 IBM Informix Dynamic Server Data Movement Utilities

Dbimport Syntax
Syntax:
    dbimport [-c] [-q] [input file location] [create options] database

    input file location:
        -i directory
        -t device -b blocksize -s tapesize -f pathname

Dbimport, the companion to dbexport, will create and load a database.


n The -c option instructs the program to continue even if errors occur, until a fatal error occurs. The fatal errors are:
  w Unable to open the tape device specified
  w Bad reads from the tape or disk
  w Invalid command parameters
  w Cannot create database or no system permission

IBM Informix Dynamic Server Data Movement Utilities 16-11

The input file location options are:
  w -i directory-path specifies the directory where the data files are located. You can use either the full directory pathname or a directory path relative to your current directory.
  w -t device directs the input from a tape device. Do not run the program in the background when you use the tape option.
  w -b blocksize specifies the tape block size in kilobytes. You must use the same blocksize used when the database was exported.
  w -s tapesize specifies the number of kilobytes to be read from each tape. You must use the same tapesize used when the database was exported.
  w -f pathname specifies the input path for the SQL command file. This must be the same command file specified when the database was exported.

n The -q option suppresses the echoing of SQL statements.
n database is the name of the database.

16-12 IBM Informix Dynamic Server Data Movement Utilities

Dbimport Syntax Create Options

create options:
    -d dbspace
    -l [buffered]
    -ansi

The create options are:
  w -d dbspace is the destination dbspace for the database. If you do not specify this option, the database is created in the root dbspace.
  w -l specifies that the imported database is to use transaction logging. By default, the imported database is created without logging.
  w buffered specifies that logging is to be buffered. Otherwise, the logging will be unbuffered.
  w -ansi tells the program to create the new database as MODE ANSI.

IBM Informix Dynamic Server Data Movement Utilities 16-13

Dbimport Example
n Load stores7 database from tape into dbspace dbspace2
n Suppress the echo of SQL statements
n Continue if errors occur

dbimport -cq -d dbspace2 -t /dev/rmt0 -b 16 -s 24000 stores7


The above command loads the stores7 database from tape into the dbspace dbspace2. The block size is 16 kilobytes, and 24,000 kilobytes are read from each tape. The echo of SQL statements is suppressed, and the program continues even if errors are found.

16-14 IBM Informix Dynamic Server Data Movement Utilities

Additional Dbimport Options


n Import data from directory /usr/informix/port
n Use default location (root dbspace)
n Create the database as MODE ANSI

dbimport -c -i /usr/informix/port -ansi stores7


The above command imports the stores7 database from the directory /usr/informix/port/stores7.exp using data definition statements and commands from the file stores7.sql in that directory. Because a location is not specified, the new database is put in the root dbspace. The new database is MODE ANSI, with implied unbuffered logging.

Schema: SQL Command File


The code below shows a sample command file created from running dbexport. The command file will be used by dbimport to create the new tables.
{ DATABASE stores7  delimiter | }

grant dba to davek;
grant resource to public;

{ TABLE customer row size = 134 number of columns = 10 }
{ unload file name = customer100.unl number of rows = 18 }

create table davek.customer
  (
    customer_num serial not null,
    fname        char(15),
    lname        char(15),
    company      char(20),
    address1     char(20),
    address2     char(20),
    city         char(15),
    state        char(2),
    zipcode      char(5),
    phone        char(18)
  );

IBM Informix Dynamic Server Data Movement Utilities 16-15

revoke all on customer from public;
create unique index davek.c_num_ix on customer (customer_num);
create index davek.zip_ix on customer (zipcode);
grant select on customer to public as davek;
grant update on customer to public as davek;
grant insert on customer to public as davek;
grant delete on customer to public as davek;
grant index on customer to public as davek;

{ VIEW custview }
create view davek.custview (firstname, lastname, company, city) as
    select fname, lname, company, city from customer
    where city = "Redwood City"
    with check option;

grant select on custview to public as davek;
grant update on custview to public as davek;
grant insert on custview to public as davek;
grant delete on customer to public as davek;
grant index on customer to public as davek;

16-16 IBM Informix Dynamic Server Data Movement Utilities

Dbload Highlights
n Loads data from one or more ascii files on disk into one or more existing tables.
n The ascii file can be created by the SQL UNLOAD statement or by other means (a C program, a text editor, or any other utility that unloads data to ascii).
n Dbload offers more flexibility than LOAD.


Dbload is a flexible utility that loads data into database tables from ascii files on disk. Dbload differs from dbimport in that the data in the ascii files can be added to data that already exists in a table. The table must be created before dbload can add data to it. The load file is an ascii file with each column separated by a specific delimiter. You may alter the delimiter from the default (|) by setting the DBDELIMITER environment variable. Dbload has several features that offer more flexibility than using the SQL LOAD statement:
n You can specify a starting point in the load file.
n You can add transaction logic (commit after every x rows).
n You can limit the number of bad rows read, at which time dbload terminates.

Also, dbload can be used to load fixed length files into a database.
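As an illustration of these options, the command line below (the command file stock.cmd and error log load.err are hypothetical names) skips the first 100 lines of the load file, commits every 500 rows, and stops after 10 bad rows; the dbload syntax itself is covered on the next pages:

dbload -d stores7 -c stock.cmd -l load.err -i 100 -n 500 -e 10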

IBM Informix Dynamic Server Data Movement Utilities 16-17

Dbload Syntax

Syntax:
    dbload -d database -c commandfile -l errorfile [-r] [-i num-rows ignored]
           [-n num-rows] [bad-row limits]

    bad-row limits:
        -e num-errors [-p]

Options to the dbload command:
  -c commandfile       Specifies the file name of a dbload command file.
  -d database          Specifies the name of the database to receive the data.
  -l errorfile         Specifies the file name or path name of an error log file.
  -r                   Instructs dbload not to lock the table(s) during loading, enabling other users to update data in the table during the load.
  -i num-rows ignored  Instructs dbload to ignore the specified number of NEWLINE characters in the input file.
  -n num-rows          Instructs dbload to execute a COMMIT after the specified number of new rows are inserted. This option only works if your database has logging. If this option is not set, a COMMIT occurs every 100 rows.
  -e num-errors        Specifies the number of bad rows that dbload will read before terminating.
  -p                   Prompts for instructions if the number of bad rows exceeds the limit.

16-18 IBM Informix Dynamic Server Data Movement Utilities

Dbload Command File: Delimited

(The string in quotes is the delimiter between columns; the trailing number is the number of columns.)

FILE stock.unl DELIMITER "|" 6;
INSERT INTO stock;

FILE customer.unl DELIMITER "|" 10;
INSERT INTO customer;


You must first create a command file before you run dbload . Command files may use delimiters or character positions. The above example uses the delimiter form.
n The FILE statement specifies the location and description of the data files. The fields in the data file will be separated by a delimiter that you specify in your FILE statement within quotes. You must also specify how many fields each data row has. When dbload executes, the columns will be assigned the internal names f01, f02, f03, etc.
n The INSERT statement will specify the table in which the rows are to be inserted, and the order of the columns. In the example above, the fields will be loaded into the stock columns in the order the columns appear in the syscolumns system catalog table. You may also specify only certain columns in any order, for example:
INSERT INTO stock(stock_num,manu_code,unit_price)

If a value in a field is longer than its associated character column, it will be truncated. If it is shorter, it will be padded with spaces.

IBM Informix Dynamic Server Data Movement Utilities 16-19

Tip
If you only want to load certain fields in the data file, you may refer to those data fields by their internal names, for example:
INSERT INTO stock (stock_num, manu_code, unit_price) VALUES (f01, f02, f04);

16-20 IBM Informix Dynamic Server Data Movement Utilities

Dbload Command File: Character Position


FILE cust_date (city 1-15, state 16-17, zip 18-22 NULL="?????");
INSERT INTO valid_addr VALUES (city, state, zip);


The character-position file statement assumes that the load file is fixed length, and each field starts at a specific character position within each line. The example above describes three values, city, state and zip. Each value is given a start position and a stop position. The zip value has an optional NULL clause. If dbload encounters the value specified in the NULL clause in the zip position, it will substitute a NULL for the zip value in the INSERT statement.

Tip
It is possible to specify more than one INSERT statement for each FILE statement. You may, for example, want to insert a row in the valid_addr table and a row into the zip table for each line in the load file.
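A sketch of such a command file, assuming the zip table has columns named code and city (hypothetical names used only for this illustration):

FILE cust_date (city 1-15, state 16-17, zip 18-22 NULL="?????");
INSERT INTO valid_addr VALUES (city, state, zip);
INSERT INTO zip (code, city) VALUES (zip, city);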

IBM Informix Dynamic Server Data Movement Utilities 16-21

Onunload/Onload Highlights
Onunload:
n Transfers data to tape in binary pages
Onload:
n Creates database or tables in specified dbspaces
n Loads data created by onunload only
n Page size must be compatible


Onunload and onload are very efficient ways to transfer data because the data is read and written in binary pages. The utilities are not available for SE databases and cannot be run for databases across a network. The onunload utility writes an entire database or table in binary disk-page units to tape. It cannot be used to write to a file. You can only read the tape using the onload utility. You can specify a table or entire database to be created when onload is run. You must have resource privilege on the database to run onload. The owner defaults to the user running onload unless otherwise specified. Page size varies from machine to machine . You cannot use the utilities to transfer data from one IBM Informix system to another with a different page size. Most IBM Informix systems have either 4k or 2k pages.

16-22 IBM Informix Dynamic Server Data Movement Utilities

Onunload Syntax
Syntax:
    onunload [tape parameters] { database | owner.table }

    tape parameters:
        -l
        -t device -b blocksize -s tapesize

You can unload either the entire contents of a database, or the entire contents of a table, with onunload.
n If you do not specify any tape parameter options, onunload uses the archive tape parameters by default.
n The tape parameter options are:
  w -l directs onunload to read the values for tape device, block size and tape size from the logical log backup device parameters in the IDS configuration file.
  w -t device specifies the path name of the tape device. You may specify a remote tape device using the following syntax:
    host_machine_name:tape_device_pathname
  w -b blksize specifies the block size in kilobytes of the tape device.
  w -s tapesize specifies the number of kilobytes to be stored on each tape.
n database specifies the name of the database.
n owner.table specifies the owner and name of the table.

The logging mode and ANSI compliance are not preserved when a database is unloaded. You must change these options after the database is loaded to the destination. You must have DBA privileges to unload a database and either DBA privileges or ownership of the table to unload a single table.
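As a sketch (the device name and sizes are illustrative only), the following command unloads the stores7 database to tape using explicit tape parameters:

onunload -t /dev/rmt0 -b 16 -s 24000 stores7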

IBM Informix Dynamic Server Data Movement Utilities 16-23

Onload Syntax
Syntax:
    onload [tape parameters] { database | owner.table } [create options]

    create options:
        -d dbspace
        -i o-ind n-ind

    tape parameters:
        -l
        -t device -b blocksize -s tapesize

Onload creates an entire database or table in a specified dbspace and loads the data from a tape created by the onunload utility.
n The tape parameter options are:
  w -l directs onload to read the values for the tape device from the IDS configuration file parameters.
  w -t device specifies the path name of the tape device. You may specify a remote tape device using the following syntax:
    host_machine_name:tape_device_pathname
  w -b blksize specifies the block size in kilobytes of the tape device.
  w -s tapesize specifies the number of kilobytes to be stored on each tape.
n The create options are:
  w -d dbspace specifies the name of the dbspace the table or database will reside in. If you do not specify the dbspace, the table or database being loaded will be placed in the root dbspace.
  w -i o-ind n-ind can be used to rename indexes during the load to avoid conflict with existing index names. A table name must be specified in the command line for the -i option to take effect.
16-24 IBM Informix Dynamic Server Data Movement Utilities

Data loaded as a result of onload will be logged if the database is created with logging. It is recommended you turn off logging before loading a large amount of data, and then perform an archive after your data is successfully loaded. When a new database is loaded, the user who runs onload becomes the owner. Ownership within the database (tables, views and indexes) remains the same as when the database was unloaded to tape with onunload. Synonyms or access privileges are not carried over if you are loading an individual table.
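A sketch of a matching onload command (again, device name and sizes are illustrative), loading the tape produced above and placing the database in dbspace2:

onload -t /dev/rmt0 -b 16 -s 24000 -d dbspace2 stores7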

IBM Informix Dynamic Server Data Movement Utilities 16-25

16-26 IBM Informix Dynamic Server Data Movement Utilities

Exercises

IBM Informix Dynamic Server Data Movement Utilities 16-27

Exercise 1
1.1 Use the UNLOAD statement to create an ascii file with data from the stock table:
UNLOAD TO "stock.unl" SELECT * FROM stock;

1.2 Create a table in your database named table1 with the following columns:
stock_num (SMALLINT)
description (CHAR(15))

Create a table in your database named table2 with the following columns:
stock_num (SMALLINT)
unit_price (MONEY)
unit (CHAR(4))

1.3 Use the dbload utility to transfer the data from the ascii file to the appropriate columns of the new tables. Insert the value "each" in the unit column of every row of table2 .

16-28 IBM Informix Dynamic Server Data Movement Utilities

Exercise 2
Use dbexport to unload your database and examine the directory structure that is created, and the SQL command file.

IBM Informix Dynamic Server Data Movement Utilities 16-29

16-30 IBM Informix Dynamic Server Data Movement Utilities

Solutions

IBM Informix Dynamic Server Data Movement Utilities 16-31

Solution 1
1.1
UNLOAD TO "stock.unl" SELECT * FROM stock;

1.2
CREATE TABLE table1
  (
    stock_num   SMALLINT,
    description CHAR(15)
  );

CREATE TABLE table2
  (
    stock_num  SMALLINT,
    unit_price MONEY,
    unit       CHAR(4)
  );

1.3 Contents of the dbload command file:


FILE "stock.unl" DELIMITER "|" 6; INSERT INTO table1 (stock_num, description) VALUES (f01, f03); INSERT INTO table2 (stock_num, unit_price, unit) VALUES (f01, f04, "each");

Then dbload is executed from the command line:


dbload -d stores -c commandfilename -l errs

Output:
DBLOAD Load Utility INFORMIX-SQL Version 7.30.UC2
Copyright (C) Informix Software, Inc., 1984-1998
Software Serial Number AAC#J334212

Table table1 had 74 row(s) loaded into it.
Table table2 had 74 row(s) loaded into it.

16-32 IBM Informix Dynamic Server Data Movement Utilities

Solution 2
Executed from the command line:
dbexport -q databasename -ss

IBM Informix Dynamic Server Data Movement Utilities 16-33

16-34 IBM Informix Dynamic Server Data Movement Utilities

Module 17
Introduction to the High Performance Loader

Introduction to the High Performance Loader 09-2001 2001 International Business Machines Corporation

17-1

Objectives
At the end of this module, you will be able to:
n List the main features of the High Performance Loader
n Describe the components of the load process
n Recognize the IECC interface

17-2 Introduction to the High Performance Loader

High Performance Loader Features


n Fast loading/unloading of large amounts of data
n Data conversion capabilities
n Support for a variety of storage devices
n Mapping of input fields to output fields
n Storage of load/unload job definitions

The High Performance Loader (HPL) is a utility that enables rapid loading and unloading of tables. Though the HPL has been available on the Unix ports of the engine since version 7.2, the 7.3 version of Dynamic Server brings it to the NT port of the engine for the first time.

High Performance Loader Features


n Fast loading/unloading of large amounts of data is achieved through a number of means:
  w Multiple devices can be used.
  w I/O to the devices and to the server is performed in parallel.
  w Light scans and light appends may be used on the server to process a buffer at a time instead of a record at a time.
  w Data conversion is performed in parallel by multiple HPL processes.
  w An IBM Informix streams interface is used to transfer data between the HPL and the IBM Informix Dynamic Server.

Introduction to the High Performance Loader 17-3

n Data conversion capabilities
  w Data is converted from the format and data types in the load file to the format and data types in the database table. For example, ASCII numeric data in the load file can be converted to a float data type. Case conversion, default values, or null value replacement are also available. Conversion activities are performed by the HPL, not the database server.
n Support for a variety of storage devices
  w Files
  w Tapes
  w Pipes
n Mapping of input fields to output fields
  w You determine where input fields are placed in the output record.
n Storage of load/unload job definitions
  w The HPL database stores all the information about the load and unload jobs.

Note
HPL can be used for single table loads only.

Why use HPL?


n Performance is the big plus: rapid loading and unloading of tables.
n To load data from other sources. The data conversion and manipulation facilities of HPL make this job easier to accomplish.
n To replace the ALTER TABLE statement. In IDS versions prior to 7.2, ALTER TABLE often makes a complete copy of the table in the same dbspace as the table. If you do not have enough disk space for this copy, you can unload the data to tape using the High Performance Loader, alter the table, then re-load the data into the table. The ALTER FRAGMENT statement in versions prior to 7.2 also uses extensive disk space, making it impractical to use when substantially altering the fragmentation scheme of a table. The High Performance Loader can be used to unload the table before dropping it. The table can be re-created with the new fragmentation scheme and loaded.

17-4 Introduction to the High Performance Loader

Parallel Loading and Unloading


[Diagram: data flows from the devices into onpload (aio VPs, data-convert and session threads), across the IBM Informix streams interface and exchange, to insert threads inside IBM Informix Dynamic Server, which write to the table fragments.]

The diagram above illustrates a load job. Parallelism is similar for both load and unload jobs, and is implemented in the HPL with the following features:
n Device array
  w A device array may be composed of multiple tapes, files, or pipes and is used during loading or unloading.
n Multi-process, multi-threaded client
  w HPL is implemented as a multi-process, multi-threaded client. Device I/O, conversion, filtering, and I/O to the server can all be performed in parallel.
n Multi-threaded server
  w IDS establishes multiple threads to service the HPL client. Scans, inserts, and I/O can be performed in parallel. Fragmenting the database table enables greater parallelism.

Introduction to the High Performance Loader 17-5

HPL Constituents

[Diagram: the user interfaces and data files feed the onpload utility, which loads the target database and stores job definitions in the onpload database.]

The High Performance Loader consists of the following:


n The onpload utility is the workhorse of the HPL. It moves, filters, and converts data, and keeps a log of all its activity. Records which are rejected by onpload during a load/unload are written to a reject file. Onpload must reside on the same machine as the target database, the database that will contain the data you are loading.
n The onpload database stores information about the components of the load or unload job. It is created automatically when the user interface connects to the IDS system (if it does not exist already). The database will be created in the IDS instance declared by the environment variable INFORMIXSERVER. In contrast to the onpload utility, the database can reside in any IDS server on your network.
n User interfaces to the HPL

17-6 Introduction to the High Performance Loader

User Interfaces
n ipload - graphical interface on UNIX
n winpload - graphical interface on Windows NT/95
n UNIX command-line option

The ipload utility is the HPL's graphical user interface on UNIX, running under X Windows. You use it to define the components of your load and unload jobs (i.e., format, map, devices, and filters). The job component information is stored by ipload in the onpload database where it can be accessed at a later time by the onpload utility. To execute ipload, the IDS database server must be on-line. It is important to remember that the user interface, ipload, is used to define jobs. It does not process data. Only the onpload utility filters, converts, and moves data. For further details on the ipload user interface, reference the Appendix on "The IPLoad Interface" and the Guide to the High-Performance Loader.

IBM Informix Dynamic Server 7.30.TC1 introduced an HPL interface for the Windows NT/95 platform. This interface is named winpload and is part of the IBM Informix Enterprise Command Center (IECC). Winpload is a subset of the ipload interface and provides only a portion of the ipload functionality. While the ipload/onpload combination is provided only on Unix, the winpload/onpload product can have the following configurations:
w Winpload client on NT/95 with onpload on Unix
w Winpload client on NT/95 with onpload on remote NT machine
w Winpload client on NT/95 with onpload on local NT machine

Introduction to the High Performance Loader 17-7

The onpload utility can be run from the UNIX command line . See the Guide to the High Performance Loader for additional details.

Important!
The architecture of the onpload process and database remain the same regardless of the interface being used.

17-8 Introduction to the High Performance Loader

Jobs and Components


[Diagram: the onpload database stores an Unload Job (query, map, format, and file components) and a Load Job (table, filter, map, format, and file components).]

A job is a single load or a single unload task. During an unload job, the result of a single SQL statement is output to a device. A load job takes input data from a device and loads a single table. The IDS server must be on-line when a job is run. Job components are the individual elements that define the job:
n Device arrays - the devices that the data is read from, or written to
n Record format - the format of the load or unload file
n Map - the relationship between columns in a table and fields in a file
n Query - defines the SQL statement to be used for unloading table data. The SQL statement may contain WHERE clause filters.
n Filter - defines the criteria to accept or reject source file records for the load
n Table - database table
n File - operating system data file

The job and component definitions reside in the onpload database.

Introduction to the High Performance Loader 17-9

Exporting and Importing With Winpload


n From IECC, winpload has the ability to:
  w Export databases to disk or tape
  w Import databases from disk or tape
n Only entire databases can be exported/imported
n Similar functionality as provided by the dbimport and dbexport server utilities


The winpload interface has the ability to import and export entire databases. This feature is similar to the dbexport and dbimport server utilities. The database schema and data files can be exported to a disk directory or to tape devices.

17-10 Introduction to the High Performance Loader

Using the IECC Interface


Winpload is launched from the IECC. IECC is a graphical interface that enables you to manage your IBM Informix database servers and data, and to create and modify database objects. IECC displays your IBM Informix database servers and administration tools in a tree view. You navigate through the IECC window as you would navigate through the usual Windows program interface.

Introduction to the High Performance Loader 17-11

17-12 Introduction to the High Performance Loader

Exercises

Introduction to the High Performance Loader 17-13

Exercise 1
The purpose of this workshop is to make sure your IECC client is properly connected to the database server for future exercises. You will create a database, and connect IECC to the server instance for that database. The instructor will advise you what instance to use.

Tasks
1.1 A server agent needs to be started for your IDS instance. These instructions may differ based upon your platform. Your instructor will start the agent, or provide details on starting the server agent.
1.2 Settings in the SETNET32 utility should be verified to allow IECC to connect to your instance. The correct settings for your classroom will be provided by the instructor.
1.3 With the server agent running, launch the IECC Console from the Start Menu by selecting Programs > Informix > IECC Console. The console may take a few seconds to launch. Your DBSERVERNAME should appear in the IBM Informix Neighborhood with a blue circle next to it. Other servers may appear in the list along with yours.
1.4 To connect to your server, right-click the server name and select Connect. If you are online and the connection is successful, the blue circle will appear with the letters DS within the circle.
1.5 Right-click the server name again. Select Task. Make sure a black bullet is next to the On Line keyword. This indicates your server is in online mode.
1.6 Once your server is online, use dbaccess or SQL Editor to create an unlogged database:
create database test;

1.7 In the IBM Informix Neighborhood, double-click the database server. The server should expand and display six folders.
1.8 Double-click the database server's Databases folder. The databases appear. Make sure your test database can be seen.
1.9 From the console menu, select File > Save. Your IECC display settings are saved as an .imc file.

You have successfully connected to your database server from IECC. Your configuration should be set for remaining exercises.

17-14 Introduction to the High Performance Loader

Module 18
Using the Winpload Interface

Using the Winpload Interface 09-2001 2001 International Business Machines Corporation

18-1

Objectives
At the end of this module, you will be able to:
n List the steps in defining a load and an unload job
n Understand express mode loading
n Access existing jobs
n Know where to look for error messages, violations, and diagnostic records

18-2 Using the Winpload Interface

Loading a Table
n Create a job to specify how data is to be loaded
n The Load Table Wizard assists with these tasks:
  w Naming the job
  w Choosing input devices
  w Defining input field names
  w Defining data format
  w Defining mappings between fields and columns

IECC enables you to load single tables residing in an IBM Informix database server. To specify how data is to be loaded, you create a job. While creating a job, the Load Table Wizard assists in defining the job components. These components are discussed on the following pages.

Using the Winpload Interface 18-3

Naming the Job

[Diagram: the named job is stored in the onpload database; when the job is executed, onpload retrieves it and runs the load.]

When a job is named and saved, the job name and all components that make up the job are stored in the onpload database. The tables in the onpload database hold information that HPL uses to perform data loads and unloads. When a load or unload job is executed, IECC issues a request to the onpload process to run the job. When onpload receives this request, it retrieves the job components from the onpload database and performs the load or unload job.

18-4 Using the Winpload Interface

Choosing Input Devices


Input devices can be:
n Tapes
n Disk files

Input and output devices are the media from which data is read or written. A device array is a logical grouping of physical devices for use by a job. The High Performance Loader achieves some of its parallelism by performing I/O for multiple devices simultaneously. In an unload operation, I/O is spread evenly across the devices. The user cannot specify which device to use for a particular piece of data. Similarly, in a load operation, the data is read simultaneously from devices and is fed to the database server in no particular order. The device needs to reside on the box that holds the database server instance. When using winpload, the file location is specified on the UNIX device, not the local PC. The types of devices in a device array can be:
n Tapes - When the load or unload job starts, the tape should be mounted. The operator is prompted when a new tape is needed. One or more tapes can be used for a load or unload operation.
n Disk files - One or more disk files can be used for a load or unload operation. Files must exist on the same system as the target database. The file will be created for you during an unload if it does not exist. Make sure you have enough disk space.

The number of external devices used can vary from job to job. This offers the option of using as many external devices as necessary to match the speed of parallel I/O to the disks on which a fragmented table resides.
Using the Winpload Interface 18-5

Defining Input Field Names


Each data field from the input device needs a field name:

106 | George | Watson | Watson & Son | . . . 107 | Charles | Ream | Athletic Supplies | . . .

Field1 Field2 Field3 Field4

An input field name must be defined for each data field from the input device. These field names are later used for mapping against database table columns. The Load Table Wizard uses the underlying database table definition to determine the number of columns in the source table. Default input field names are generated using the naming convention Field1, Field2 , Field3 , etc. until all columns have a field name. The default field names can be changed by the user if a more meaningful description is needed.

18-6 Using the Winpload Interface

Defining Data Format


There are two types of data formatting:
n Fixed - each field begins and ends in the same place in every record

(col 1)  (col 7)    (col 21)   (col 36)
106      George     Watson     Watson & Son
107      Charles    Ream       Athletic Supplies

n Delimited - fields are separated by a delimiter


106 | George | Watson | Watson & Son | . . . 107 | Charles | Ream | Athletic Supplies | . . .


The formats component defines a format for the load or unload file. There are two types of formats that can be specified:
n Fixed - Each field starts and ends at the same place for every record. You specify the length of each field. The winpload utility calculates the offset for each field and the total length of the record from the field lengths that you supply.
n Delimited - Each field is separated by a delimiter. You can specify the delimiter in the Delimiter Options window, in terms of hexadecimal, octal, control character, or ASCII name.

Using the Winpload Interface 18-7

Mapping Input Fields to Columns

[Diagram: input field names Field1, Field2, Field3, Field4 mapped to the customer table columns customer_num, lname, fname, company.]

A map is the relationship between columns in a table and fields in a file. With winpload, each field from the input device must directly map to a database column. This is accomplished by mapping the input field names previously defined, against a database table column. Mapping is especially useful when the fields in the load file do not correspond in order to the database columns. Example: The first field in the customer table is customer_num. The load file contains customer_num as the third input field:
George|Watson|106|Watson & Son| . . . Charles|Ream|107|Athletic Supplies|. . .

Assign input field names to each field in the load file.


Field1|Field2|Field3|Field4

We can now perform the following mapping to the database columns:


customer_num   Field3
fname          Field1
lname          Field2
company        Field4

18-8 Using the Winpload Interface

Steps to Create a Load Table Job


1. Launch the IECC Console. Connect to the desired database server.
2. In the IBM Informix Neighborhood, double-click the database server where the target database is located. Expand the folder list until you see the Databases folder (click on the plus sign (+) located next to your database server).
3. Double-click the database server's Databases folder. The databases appear.
4. Double-click the database that contains the source table to display the folders for that database.
5. Double-click the Tables folder. The tables appear.
6. Right-click the source table and select Task > Load Table Wizard from the popup menu.
7. Enter information in the Load Table wizard pages to define a new load table job. Use the help button on the Load Table wizard to display detailed information about each wizard item.
8. The last page of the Load Table wizard is a summary page that displays the specifications you entered. The wizard asks: "Do you want this job to run when you finish this wizard?" Specify when the job is to be run. To run the job immediately, select Yes. The job is saved and is run immediately, and the Running Job window displays the job's progress. To save the job to run at a later time, select No. The job is saved and is available to run at a later time.
9. Click Finish when you have completed defining your load table job. The load table job is saved in the onpload database, and the job name is displayed in the Available Jobs dialog boxes.

Warning!
If you click Close in the Running Job window while a job is running, the job is terminated, and the Running Job window closes.

Using the Winpload Interface 18-9

Express Mode Loading


n No logging occurs.
n Constraints, indexes, and triggers are set to DISABLED until after the job completes.
n Table is exclusively locked for the duration of the load.
n Load proceeds as a single transaction and is rolled back in the event of a system crash.
n Blobs are not supported.
n Row length must be less than a page.
n Level 0 archive must be performed after the load to enable write access (logged databases only).


There are two modes which onpload uses to load data: express mode and deluxe mode . The winpload interface operates solely using express mode. Express mode is designed to be more efficient for loading larger amounts of data.
n No logging occurs for data inserted during an express mode load. This improves load performance and helps prevent a long transaction from occurring (the load is considered one transaction). Only new extent allocations are written to the log. Hint: Don't stick with the 8-page default extent size. You could still get a long transaction.
n Indexes, constraints, and triggers are automatically disabled during the load. After the load completes, the indexes, constraints, and triggers are automatically re-enabled. This means that all indexes will be re-built (parallel index builds will happen automatically). A constraint cannot be disabled when other enabled constraints refer to it. For example, a primary key constraint cannot be disabled by the HPL if a foreign key constraint in another table refers to it. The HPL will return an error in such a case. You must disable any foreign key constraints which refer to the target table prior to loading in express mode. Re-enabling a trigger does not cause a trigger to be fired for the rows loaded. You are responsible for ensuring that processing associated with triggers is carried out for the new rows.

18-10 Using the Winpload Interface

n Blobs are not handled because of the extra processing overhead required to handle blobs larger than a buffer size.
n If the database is logged, then the loaded table is not made available for write access until after a database server backup is performed. If the database is not logged, then no backup is necessary. To minimize backup time, perform express loads on all necessary tables and then perform the database server backup.
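For instance, a level-0 backup with the ontape utility is one way to satisfy this requirement (your site may use a different backup tool, such as onbar):

ontape -s -L 0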

If Violations Occur
Violating rows will be copied to the violations and diagnostics tables ( tabname_vio and tabname_dia where tabname is replaced by the actual table name). The violating rows are not deleted from the table and constraints and indexes are not re-enabled.
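For example, after an express load of the customer table you could inspect the violating rows and their diagnostic records like this (a sketch; the column layouts of the _vio and _dia tables are described in the IDS documentation):

SELECT * FROM customer_vio;
SELECT * FROM customer_dia;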

Using the Winpload Interface 18-11

Creating an Unload Table Job


n Data can be unloaded to one or more files or tape devices
n The Unload Table Wizard assists with these tasks:
  w Naming the job
  w Choosing output devices
  w Defining output field names
  w Defining data format
  w Defining mappings between fields and columns
  w Defining a query


An unload table job is created to extract data from the specified table to an output device array. The unload job is stored in the onpload database and run by the onpload process when requested. The components for an unload job are similar to the components defined for a load job. One difference is that the file and tape devices now become output media for writing data. The second difference is that a query must be defined for an unload job.

18-12 Using the Winpload Interface

Defining a Query
n Select all columns from the source table?
n Select specified columns from the source table?
n Select all rows or use a where clause?


The query component defines the SQL statement to be used for unloading table data. There are two types of queries available:
n Default query - automatically generated by selecting all columns from the source table with no WHERE clause
n Custom query - statement defined by the user to override the default query

An unload job is initially created with the user specifying a source table. For example, if the customer table is chosen, the following default query would automatically be generated and used by the unload job:
SELECT * FROM customer

If a more selective SQL statement is needed, the custom query option can be selected. At this time, the user is able to type in the desired SQL statement. The custom option can be used to select specific columns from the table or add a WHERE clause to limit rows returned.
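For instance, a custom query might unload only a few columns for a subset of rows; the column and table names below assume the demonstration customer table:

SELECT customer_num, lname, fname
FROM customer
WHERE state = "CA";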

Note
While the ipload interface allows join queries to select data from multiple tables, the winpload interface only allows single table select statements.

Using the Winpload Interface 18-13

Steps to Create an Unload Table Job


1. In the IBM Informix Neighborhood, double-click the database server where the source table is located.
2. Double-click the database server's Databases folder. The databases appear.
3. Double-click the database that contains the source table.
4. Double-click the Tables folder. The tables appear.
5. Right-click the source table and select Task > Unload Table Wizard from the popup menu. The Unload Table wizard appears.
6. Type information in the Unload Table wizard pages to define a new Unload Table job. Use the question mark button on the Unload Table wizard to display detailed information about each wizard item. The last page of the Unload Table wizard is a summary page that displays the specifications you entered.
7. The wizard asks: "Do you want this job to run when you finish this wizard?" Specify when the job is to be run. To run the job immediately, select Yes. The job is saved and is run immediately, and the Running Job window displays the job's progress. To save the job to run at a later time, select No. The job is saved and is available to run at a later time.
8. Click Finish when you have completed defining your unload table job. The unload table job is saved in the onpload database. The job name is displayed in the Available Jobs dialog boxes.

Warning!
If you click Close in the Running Job window while a job is running, the job is terminated, and the Running Job window closes.

18-14 Using the Winpload Interface

Unloading Blobs
n A blob may be unloaded to multiple volumes
n When loaded, the volumes containing the blob must be loaded in order
n A blob which spans volumes is unloaded to a single device and must be loaded by a single device

15

The HPL will always unload a row onto a single media volume (tape). Volumes containing data from an unload job can be loaded back via onpload in any sequence and from any device. This is not true for data containing blobs. The following table illustrates an unload job using three tape devices:

device1    device2    device3
tape1      tape2      tape3
tape4      tape5      tape6
tape7      tape8

When a record is unloaded containing blobs, the blobs could possibly span multiple tapes (all data is unloaded to the same device, however). For example, in the table above, a record with blobs could span tapes 1, 4, and 7. It would not span tapes 3, 5, and 7. This means that the tapes 1, 4, and 7 will have to be loaded back in consecutive order, without interruption, through a single device. During a load job, the HPL checks for sequence information in header records to assure that the tapes are mounted in the correct order.

Using the Winpload Interface 18-15

Accessing Existing Jobs


The following tasks can be performed on a previously created job:
n Run a job and view the output
n View and edit a job's definition
n Delete a job

16

To access existing jobs using IECC:
1. In the IBM Informix Neighborhood, right-click the database server that contains jobs you want to access.
2. Select Task > Access Load Jobs or Task > Access Unload Jobs from the popup menu. The Access Load Jobs dialog box or the Access Unload Jobs dialog box appears.

18-16 Using the Winpload Interface

Running the Job


Running Job
Name: cust_unload1 Type: Unload

Connecting to server @srv_tcp1 Starting job cust_unload1 Mon Dec 14 14:25:30 1998

Stop Job

Close

17

After a job has been created it can be run at any time. As the job is run, a Running Job window appears to display the job's progress. From this window, you can tell when the job has completed. Messages regarding any errors that occur during the job will also appear in the Running Job window. If the Stop Job button is clicked while the job is running, the window will remain open in case any further output is generated.

Using the Winpload Interface 18-17

Reject and Log Files


cust_load load job

Status Message

Bad Record

cust_load.log

cust_load.rej

18

If onpload cannot unload or load a record for some reason, it will place the record in a reject file whose name is specified by the syntax jobname.rej. Any corresponding status messages and error messages will be stored in a log file. The log file resides in the same directory as the reject file and is named jobname.log. Winpload automatically assigns the filename and location for the log and reject files. On NT/95, the message files will appear in %INFORMIXDIR%\bin or in the directory specified by the TEMP environment variable. On UNIX, the message files will be placed in the directory specified by the devices array.
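On a UNIX server, a quick way to review the outcome is to look at these files directly; the job name cust_load and the directory /work/hpl below are examples only:

tail /work/hpl/cust_load.log     # status and error messages for the job
wc -l /work/hpl/cust_load.rej    # number of records rejected by onpload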

Violation and Diagnostic Records


If violations occur during a load job, onpload creates two tables to store additional status information.
n The Violations table is named tabname_vio, where tabname is replaced by the actual table name being loaded. This table holds constraint violations encountered by the server.

18-18 Using the Winpload Interface

n The Diagnostics table is named tabname_dia, where tabname is replaced by the actual table name being loaded. This table shows diagnostic or informational messages about a load or unload job.

When the constraints and indexes are enabled after an express load, violating rows will be copied to the violations and diagnostics tables. These records can be viewed by querying the tables.

Using the Winpload Interface 18-19

Importing and Exporting Databases


n Two stored procedures must be added to the sysmaster database:
w dbexp() - accomplishes the export process
w dbimp() - accomplishes the import process
n $INFORMIXDIR/bin/winpload.sql contains the procedures
n The procedures can be run through:
w dbaccess
w SQL Editor (NT/95)

20

To use the database export or import features of the IECC client, two stored procedures must be added to the sysmaster database on the IBM Informix database server on which the IECC Server Agent is running. To add the dbexp() and dbimp() stored procedures to the sysmaster database, run the following command:
UNIX:  dbaccess - $INFORMIXDIR/bin/winpload.sql
NT/95: dbaccess - %INFORMIXDIR%\bin\winpload.sql

If this procedure for adding dbexp() and dbimp() is not done, any users who attempt to use the database export or import feature of IECC will receive an error notifying them of these missing stored procedures.

Note
If you already have the two stored procedures dbexp() and dbimp() installed in the sysmaster database, remove the comment signs around the two "drop..." lines in this script file. The winpload.sql script is also documented in the IECC release notes.
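One way to confirm that the procedures are in place is to query the sysprocedures catalog of the sysmaster database, for example from dbaccess (a simple check, not part of the product documentation):

SELECT procname FROM sysmaster:sysprocedures
WHERE procname IN ("dbexp", "dbimp");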
18-20 Using the Winpload Interface

Steps to Export a Database


1. In the IBM Informix Neighborhood, double-click the database server where your source database is located to display the folders for the server.
2. Double-click the server's Databases folder. A list of databases appears.
3. Right-click the source database and select Task > Export Database from the popup menu. The Export Database dialog box appears, displaying the name of the source database.
4. Enter the fully-qualified path of your destination location in the Path text box.
5. Click the Directory or Tape drive button in the Destination Medium box. If you select the Tape drive option, type the following information:
n Block size as expressed in kilobytes (KB). The default block size is 1 KB.
n Tape length as expressed in either megabytes (MB) or gigabytes (GB). The default tape length is 2 GB.
6. Click Export.
7. To review the status of the export process, read the dbexport.out log file located in the same directory as the destination path. If the exported database does not appear in the specified destination location, check the agent_servername.log file in the $INFORMIXDIR directory for the database server (servername is the value specified by the DBSERVERNAME parameter of the server configuration file).

Using the Winpload Interface 18-21

Steps to Import a Database


1. In the IBM Informix Neighborhood, right-click the database server where you want to locate your target database, and select Task > Import Database from the popup menu. The Import Database dialog box appears.
2. Enter the name of the database that you want to import in the Database to Import text box.
3. Enter the path for the source files in the Path text box.
4. Select a source location and click the Directory button or the Tape drive button in the Location Medium box. If you select the Tape drive option, type the following information:
n Block size as expressed in kilobytes (KB). The default block size is 1 KB.
n Tape length as expressed in either megabytes (MB) or gigabytes (GB). The default tape length is 2 GB. Click the MB button or the GB button to indicate megabytes or gigabytes.
5. Click Import. To review the status of the import process, read the dbimport.out log file located in the same directory as the target path.

If the imported database does not appear in the server's Databases folder, check the agent_servername.log file in the $INFORMIXDIR directory for the database server, where servername is the value specified by the DBSERVERNAME parameter of the ONCONFIG file.

Note
The imported database will not appear in the IBM Informix Neighborhood until you refresh the Databases folder that contains the new database. To refresh the folder, right-click the server's Databases folder and select Task > Refresh.

18-22 Using the Winpload Interface

Additional Courses Available


You may also be interested in taking the following course offered by IBM Informix: n IBM Informix Dynamic Server 7.x Administration

23

Now that you have successfully completed the Managing and Optimizing IBM Informix Dynamic Server 7.x Databases course, you might be interested in taking additional IBM Informix courses. In addition to the above course, a complete list of IBM Informix courses is provided in the course catalog. Your instructor will be happy to answer any questions you have about additional training.

Using the Winpload Interface 18-23

18-24 Using the Winpload Interface

Exercises

Using the Winpload Interface 18-25

Exercise 1
In this exercise you will load data into a table using the Load Table Wizard. Before beginning this exercise, obtain the hpl_tab.sql and hpl_tab.unl files from your instructor. The .sql file can remain on your NT machine. The .unl file must be placed on your server machine in your device directory.
1.1 With dbaccess or SQL Editor, use the file hpl_tab.sql to create table hpl_tab in your test database.

1.2 If your IECC console is not still running, launch it from the Windows Start Menu by selecting Programs > Informix > IECC Console. Your previously saved .imc file will automatically be loaded. From the IBM Informix Neighborhood, connect to your database server. Make sure the server is still online.
1.3 Double-click the database server to display the Databases folder.
1.4 If your session from the previous exercise is still running, you need to refresh your database server to display the newly created table. Right-click the Databases folder and select Task > Refresh.
1.5 Use the Load Table Wizard to create a job named hpl_load to load the hpl_tab table.
w Double-click the Databases folder.
w Double-click the test database.
w Double-click the Tables folder. The tables appear.
w Right-click the hpl_tab table and select Task > Load Table Wizard from the popup menu. The Load Table wizard appears.
w Name the job hpl_load. Press Next.
1.6 Select the File(s) radio button. In the File Names text box, type the full pathname to the hpl_tab.unl file. Press Add. Press Next.
1.7 Use the default number and naming conventions for the input fields. Press Next.
1.8 Select the Delimited Format radio button. Use the default field and record delimiters. Press Next.
1.9 Use the default mapping. Press Next.
1.10 Select the Yes radio button to run the job when the wizard is finished. The job will also be saved at this time. Press Finish.
1.11 Observe the output in the Running Job window. Either a successful job or errors will be reported. If you want to check the success of your job, use dbaccess or SQL Editor to select data from hpl_tab.
1.12 On your server, see the log and reject files that are created.

18-26 Using the Winpload Interface

Exercise 2
In this exercise you will create an unload job.
2.1 Right-click the hpl_tab table and select Task > Unload Table Wizard from the popup menu. The Unload Table wizard appears.
2.2 Name the job hpl_unload. Press Next.
2.3 Select the File(s) radio button. Add two output devices named hpl_tab1.unl and hpl_tab2.unl. Press Next.
2.4 Use the default number and naming conventions for the output fields. Press Next.
2.5 Select the Delimited Format radio button. Use the default field and record delimiters. Press Next.
2.6 Use the default mapping. Press Next.
2.7 Select the Default Query radio button. This will unload all records from the table. Press Next.
2.8 Select the No radio button. This will save but not run the job. Press Finish.
2.9 To run the unload job, right-click the database server and select Task > Access Unload Jobs.
2.10 Highlight the hpl_unload job and select Run.
2.11 Observe the output in the Running Job window. Either a successful job or errors will be reported.
2.12 From your server, perform an ls -l on hpl_tab1.unl and hpl_tab2.unl. Notice that onpload buffered the output data to both devices.
2.13 Close the Running Job window. Close the Access Unload Jobs window.

Using the Winpload Interface 18-27

18-28 Using the Winpload Interface

Appendixes

Appendix A
Using the IPLoad Interface

Using the IPLoad Interface 09-2001 2001 International Business Machines Corporation

A-1

Objectives
At the end of this module, you will be able to:
n List the steps in defining a load and an unload job
n Explain the benefits of the Generate option
n Perform a load using the Generate option

A-2 Using the IPLoad Interface

Starting the User Interface


The user interface is a UNIX graphical interface running under XWindows:
n ipload

Starting the User Interface


The graphical user interface to the High Performance Loader is started with the ipload command. The ipload executable will start and connect to the IDS system that is specified with the INFORMIXSERVER environment variable. If the onpload database does not exist on this IDS system, it will be automatically created. If you wish to use the onpload database on a different IDS system, you can do so by choosing the Configure:Server menu option.
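A typical invocation from a UNIX shell might look like the following; the server name train1_shm and the display setting are examples only:

INFORMIXSERVER=train1_shm; export INFORMIXSERVER
DISPLAY=myworkstation:0; export DISPLAY     # ipload is an X Windows client
ipload &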

Using the IPLoad Interface A-3

GUI Tips

Saves values and exits Does not save and exits Saves but does not exit.

Brings up the help window

GUI Tips
The ipload GUI is intuitive once you understand some basics about its operation. Most ipload windows have one or more of the following icons:
n The OK icon saves the values entered in the window and exits the operation.
n The Apply button saves the values but does not exit.
n The Cancel button does not save any values and exits the operation.
n The Help button displays a help window.

A-4 Using the IPLoad Interface

GUI Tips (cont.)

Prints parameters Copies to another project Deletes a component Brings up a Notes Window

GUI Tips (cont.)


Standard buttons appear at the top of many windows in ipload. They may include:
n The Copy button copies the selected component (format, map, query, etc.) from one project to another.
n The Delete button deletes the selected component (format, map, etc.).
n The Print button prints the parameters for the selected item. Set the PRINTER environment variable to choose the printer destination.
n The Note button allows you to enter descriptive text for an item.

Using the IPLoad Interface A-5

Selecting a Project

Selecting a Project
The ipload utility groups mapping instructions, formats, filters, and queries into projects. The project window is the first window to appear when you start ipload. You should select a project before beginning any load or unload activity. The default project is always listed, and can be used instead of creating a new project. To create a new project, select the Configure:Project menu item. In the project window that appears, fill in the new project name. Then choose OK.

Ready to Go
Now you can start the unload or load configuration. Select Jobs and then select Unload or Load.

A-6 Using the IPLoad Interface

Selecting and Creating a Job

Enter load job name.

Select an existing job.

Selecting and Creating a Job


To create a load job, choose Jobs:Load. To create an unload job, choose Jobs:Unload. The window shown above appears. To create a new job, select the Create option. Then enter a job name. Select the OK button at the bottom of the window.

Using the IPLoad Interface A-7

The Load Job Window

The Load Job Window


The Load Job Window shows the components of a load. You can also see these components in the Components menu in the main window. They are:
n The Device component specifies the device array that holds the data to be loaded.
n The Table component lists the table that the data will be loaded into. A job may only load data into one table.
n The Format component specifies the format of the data being loaded.
n The Filter component specifies which records should be loaded and which should be ignored.
n The Map component specifies how fields should be loaded into table columns.
n The Options component specifies load options.

To create a component for a job, select the icon for the component, or choose the component from the menu in the main window. These components will be discussed in detail on the following pages.

A-8 Using the IPLoad Interface

The Device Array

Select Open for an existing device array, or Create for a new one.

The Device Array


The device array is a list of one or more tape devices, files, or pipes that hold the data. Using multiple devices will improve performance for large load or unload processes. In most cases, you will use the same device array for different load jobs. The Device Array window allows you to choose an existing device array, or to create a new one. To create a device array, select Create. Then enter the device array name and choose OK. To select an existing device array, choose Open, then choose a device array to open.

Using the IPLoad Interface A-9

Device Arrays
Devices can be:
n Tapes
n Disk
n UNIX pipes

10

Device Arrays
Device arrays are the devices that the data is read from, or written to. The High Performance Loader achieves some of its parallelism by performing I/O for multiple devices simultaneously. In an unload operation, I/O is spread evenly across the devices in a device array. The user cannot specify which device to use for a particular piece of data. Similarly, in a load operation, the data is read simultaneously from devices and is fed to the database server in no particular order. The types of devices in a device array can be:
n Tapes - Any tape device that uses a standard UNIX driver can be used. When the load or unload job starts, the tape should be mounted. The operator is prompted when a new tape is needed. One or more tapes can be used for a load or unload operation.
n Disk - One or more disk files can be used for a load or unload operation.
n UNIX pipes - For load operations, you can capture data sent to standard-out from any UNIX process. Unload operations can send data to standard-in of any UNIX process. For operations that require faster performance, you can use custom drivers instead of UNIX pipes.

A-10 Using the IPLoad Interface

The number of external devices used can be varied. This offers the option of using as many external devices as necessary to match the speed of parallel I/O from the disks on which a fragmented table resides.

Using the IPLoad Interface A-11

Device Rules
n Tape devices
w Tapes must be mounted when the onpload job starts
w Any tape devices that use standard UNIX device drivers are allowed (DAT, Cartridge, etc.)
n Files
w Files must exist on the same system as the target database

12

Device Rules
Some rules about the devices that are used for a load or unload:
n Tape devices - Tapes should be mounted (or inserted) and ready when the load or unload process starts. If there is no tape in the drive, a prompt will be issued. If multiple tapes are needed, you will receive a load request for subsequent tapes. Any tape devices that use standard UNIX device drivers are allowed.
n Files - Files must exist on the same system as the target database. You may NFS mount the file from another system, however. The file will be created for you during an unload if it does not exist. Make sure you have enough disk space.

A-12 Using the IPLoad Interface

Creating a New Device Array

13

Creating a New Device Array


To create a new device array, choose an array item type (Tape, File, or Pipe). If you choose the File type, enter the file name in the File Name field. If you choose the Tape item type, enter tape parameters (Block Size and Tape Size and the scale of the tape size - megabytes or gigabytes). If you choose the Pipe option, enter the executable name in the File Name field. Then select Perform. When you have completed entering devices, choose OK.

Pipe Devices
If you are using the pipe device for an unload, onpload pipes the rows being unloaded to standard-in of the executable. If you use the pipe device for a load, onpload receives the records from standard-out of the executable.
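For instance, a pipe device could be used to load records straight from a compressed file; the file name and the use of zcat below are illustrative:

zcat /work/hpl/customer.unl.Z

Entered in the File Name field for a Pipe item, this command is started by onpload, which then reads the decompressed records from its standard output during the load.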

Using the IPLoad Interface A-13

Defining Formats

14

Defining Formats
The Formats component defines a format for the load or unload file. There are several types of formats that can be specified, and the type you choose will determine what information you enter for each field.
n Fixed - Each field starts and ends at the same place for every record.
n Delimited - Each field is separated by a delimiter. You can specify the delimiter in the Delimiter Options window, in terms of hexadecimal, octal, control character, or ASCII name.
n COBOL - COBOL formats have their own specific field types. These field types are listed completely in the Guide to the High Performance Loader. A COBOL format is fixed.

After specifying the Create mode, the format type, and the format name, choose OK and the format entry window will appear.

A-14 Using the IPLoad Interface

Format Entries
Fixed

Delimited

COBOL

15

Format Entries
Depending upon the format type you choose, the format entry window will allow:
n Fixed - For a fixed type, enter the field name, the data type, and the number of bytes the field occupies in the file. Refer to the Guide to High Performance Loader for allowable data types. If you are converting floating point types to ASCII, you can also specify the number of places to the right of the decimal.
n Delimited - For a delimited file type, enter the field name and the type (Chars, Blob Hex ASCII, or Blob file).
n COBOL - For a COBOL file type, fill in the name, picture, and usage. For allowable picture entries, consult the Guide to the High Performance Loader.

Using the IPLoad Interface A-15

Filters (for Loads)

16

Filters (for Loads)


Filters can be used to pre-screen data being loaded into a table. For example, you may want to load only item rows where the quantity is greater than 100. The filter consists of a condition whose syntax is very much like conditions in an SQL statement. For information on what expressions are allowed, consult the Guide to the High Performance Loader. Generally, relational and logical operators are allowed (=, >, <, or, and), the BETWEEN condition, NULL, asterisk (*), and question mark (?). To define a filter, choose the Filter icon. The Filter Views window appears. Then choose Create. The Filters window appears. Choose Create again and then the window above appears. Enter a filter name and the filter conditions you wish.
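As an illustration, a filter condition for the scenario above might be written as follows (the field names are assumed to match those defined in the input format):

quantity > 100 AND manu_code = "HRO"

Only records that satisfy the condition are passed to the database server; all other records are ignored.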

A-16 Using the IPLoad Interface

Map View

order_num order_date customer_num Customer Table customer_num lname fname company

ship_date Orders Table

Output File customer_num lname fname company order_num order_date

17

Map View
A map is the relationship between columns in a table and fields in a file. An unload map associates the columns in one or more tables with the fields in a file that the data would be unloaded to. A load map associates the fields in a file to the columns in a table.

Using the IPLoad Interface A-17

Creating a Map

18

Creating a Map
The map window has two sides: column to field mapping, and field to column mapping. There are two views of the map for cases where there are a large number of columns and fields. Without double mapping, lines could be drawn between fields that could not be shown in the window. The ipload utility will automatically map columns and fields with identical names together. To connect a column to a field, select a column, and holding the left mouse button, drag to the field on the right window. You can connect a field to a column by selecting a field and dragging to the column on the left window.

A-18 Using the IPLoad Interface

Other Mapping Options

19

Other Mapping Options


Select a field or column in the mapping window, and then click on the Option icon to get special mapping options. Some mapping options are:
Justification - Justify text to the left, center, or right of a field.
Case Conversion - Convert the case to upper, lower, and proper name (the first letter in each word is capitalized).
Transfer Bytes - Transfer only a certain number of bytes. For example, you may want to truncate the last name if it is over 20 characters.
Field Offset - The starting byte to transfer. For example, to skip the area code in a phone number, you may choose a field offset of 3 (offset starts at 0).
Picture - Used for reformatting data using masks: packed decimal, alphanumeric, decimal, and dates. Refer to the Guide to the High Performance Loader for specifics about picture strings.
Function - A user-supplied function that will be called for every record processed in the file.

Using the IPLoad Interface A-19

Discard Records and Logfile

20

Discard Records and Logfile


If onpload cannot unload or load a record for some reason, it will place the record in a file whose name is specified in the Discard Records field. Enter the path and file name. Note that these records were discarded by the High Performance Loader, not the database server. If the database server cannot insert rows (during a load) due to violations detected, it will place these rows in the violation table (see discussion on violation detection in a subsequent chapter). The file specified in the Logfile field will store status messages for this particular load or unload operation.

A-20 Using the IPLoad Interface

Generating Formats and Maps

21

Generating Formats and Maps Automatically for Load Jobs


You can automatically generate a format and map for a load. You must specify an existing device array, or simply enter a file name. You must also specify a database and table where the data will be loaded. If you wish to filter out records from the load file, you must choose the Filter icon from the load window. If no filter is added, all records are loaded.

Using the IPLoad Interface A-21

The Unload Job Window

22

The Unload Job Window


The Unload Job Window shows the components in an unload job. You must supply each of the required components before an unload can be executed:
n The Query component is the SQL query that will extract data from the database.
n The Device component specifies the device array where the data will be stored.
n The Format component specifies the format the unloaded data will be stored as.
n The Map component specifies how columns will be unloaded into fields.
n The Options component specifies unload options.

To define information for each of these steps, select the icon for the step. You can tell which steps have been defined by seeing a name in the box to the right of the icon in the window above.

A-22 Using the IPLoad Interface

Generating Formats and Maps

23

Generating Formats and Maps Automatically for Unload Jobs


The automatic generation feature will automatically generate a format and map for a given table to be unloaded. From the Unload Job window, you can choose the Generate icon, or you can select Components:Generate from the main menu. The window shown above appears. To generate a format and map for an unload, you must supply either a table or a query. You must also specify a device array or a file for the destination of the unloaded data.

Using the IPLoad Interface A-23

A-24 Using the IPLoad Interface

Appendix B
Monitoring Load Operations

Monitoring Load Operations 09-2001 2001 International Business Machines Corporation

B-1

Objectives
At the end of this module, you will be able to:
n Recognize which configuration parameters affect the High Performance Loader
n List the operational components of the High Performance Loader
n Monitor the onpload process

B-2 Monitoring Load Operations

HPL Operations Overview


N Tape Devices

aio VP onpload data convert session thread IBM Informix Dynamic Server

aio VP data convert session thread

aio VP data convert session thread Exchange

IBM Informix Streams Interface

Scan thread

Scan thread M Fragments

HPL Operations Overview


For best performance with a large amount of data, use an array of tape devices and a fragmented table. For example, a simple unload of an M fragment table to an N tape device array is depicted above. There is a scan thread for each fragment in the table to read data in parallel. The scan threads use the light scan feature to retrieve rows in blocks. Multiple session, or sqlexec, threads are created to send data to onpload. The exchange partitions the rows to the session threads, which send the rows through a streams interface to the front-end. Streams are used to pass data from the server to the client to allow the use of large buffers. The exchange maps rows from M fragments to N tape devices. This allows the number of tape devices to be different than the number of fragments. The exchange partitions the scanned buffers according to the length of the stream queues so as to balance the load across the devices in the device array. The onpload client initiates a variable number of converter VPs and AIO VPs, depending on the HPL configuration parameters and job requirements. The converter VPs take buffered records off the streams, convert the data, and queue up buffers to be written out by the AIO VPs.

Monitoring Load Operations B-3

In order to keep up with the rate of data scanned in parallel from multiple fragments, a device array is used to write out the data in parallel. The device array consists of one or more devices; each of the devices can be a tape, disk, or pipe. The number of devices used in the device array does not have to be the same as the number of fragments in the table. On the client, processing for a load job is essentially the same as an unload job except in reverse. Deluxe and express load jobs are processed the same on the client. Server processing during a load job varies depending on whether the mode is deluxe or express. A deluxe load job processes in the same manner as an insert cursor. An express load job bypasses the SQL layer and the optimizer, and processes at the RSAM layer. Light appends occur during an express load job (see the following slides).

Note
IDS and onpload communicate via shared memory.

B-4 Monitoring Load Operations

The Onpload Client

Device Array

AIO VPs

AIO Buffers Conversion threads Stream Buffers DB Connect threads

The Onpload Client


The onpload client is a set of processes that handles the I/O to the device, converts the data, and passes the data to and from the database server. Onpload is a multi-threaded client. The threads running in onpload are:
I/O threads - These threads handle I/O for the devices for the load or unload job. There is one I/O thread for each device in the device array.
Conversion threads - These threads handle any conversion needed on the data. For example, a COBOL field may need to be converted to an integer column, or a packed decimal value may need to be converted to a floating decimal value. The number of conversion threads can be configured.
Connect threads - These threads handle communication to the database server. There is one connect thread for each device in the device array.

Onpload uses shared memory to store data as it is being converted and moved to and from the device array. The Onpload shared memory is separate from IDS shared memory. Onpload shared memory holds the following information:

Monitoring Load Operations B-5

n AIO buffers - The AIO buffers are an intermediate storage area between the I/O threads and the conversion threads. The number and size of these buffers are configurable.
n Stream buffers - The stream buffers are the intermediate storage area between the data conversion threads and the database connect threads.

As you can see from the diagram, the threads operate on a stream of data for each device in the device array. One stream may perform slower than another stream, depending upon the speed of the tape or disk device. Onpload processing is the same for deluxe and express load jobs. You can use onstat -j logfile to monitor memory usage and the execution of threads within the onpload client.

B-6 Monitoring Load Operations

Threads in the Database Server

Onpload client

sqlexec threads

Exchange

Insert threads

Threads in the IBM Informix Dynamic Server


The database server has parallel operations as well. There is one session started for every device in the load (use onstat -g ses to monitor sessions). During a load, the data passes from the onpload client to the sqlexec threads and then to the exchange, which determines which fragment the data belongs to. Then the data is passed to threads that create write requests. There is one insert thread for every fragment in the table up to the number of CPU VPs configured for the database server.

Light Scan
The light scan is used to increase the speed of reading a table during an unload. This operation reads a set of rows instead of a row at a time. The light scan reads data directly from disk bypassing the buffer cache. However, it does check the buffer cache for disk pages already in the cache. Usage of light scans is determined by the query optimizer. Light scans are not used for tables that contain blobs or rows that are greater than a page size. Also, light scans are not used when the isolation level is committed read and the table is not locked.

Monitoring Load Operations B-7

Light Append
The express mode data load bypasses the IDS buffer cache with a scheme called light append. Light append is faster than using the buffer cache, but the type of pages the server can write to are restricted. The database server will only use uninitialized pages; that is, pages that have never been used to store data for the table. For example, a new extent contains uninitialized pages until they are written to for the first time. Bit maps for the new pages are not updated until the end of the job. If the job terminates abnormally, the bit maps are not updated. Use onstat -g lap to monitor light append activity.

B-8 Monitoring Load Operations

Monitoring Onpload
Use onstat -j logfile to monitor onpload operations.
n Displays monitoring information about a single instance of the client, onpload
n Information displayed includes VPs, threads, queues, memory segments, and I/O statistics.
n Displayed information is independent of the database server.
n Used in interactive mode in the same way as onstat -i; use the -r option for repeat displays.

Monitoring Onpload
Use onstat -j logfile to monitor the operations of the onpload client. The logfile parameter is the logfile for the HPL job. Since multiple instances of onpload can execute simultaneously, onstat uses the log file to identify a particular instance. All information displayed refers to the onpload instance and not to the database server. For example:
# onstat -j lineitem_unload.log
onstat> ath
Threads:
 tid  tcb      rstcb  prty  status                vp-class  name
 1    e606ee0  0      2     ready                 1cpu      main_thread
 2    e627e70  0      2     sleeping(Forever)     4aio      aio vp 0
 3    e642770  0      2     sleeping(Forever)     5msc      msc vp 0
 4    e643238  0      4     running               6cpu      master
 6    e6850b8  0      4     cond wait(ses Q 1)    10cpu     sesnmgr
 7    e6855d8  0      4     cond wait(error Q 2)  8msc      errmgr
 8    e6afdc0  0      4     cond wait(idle:4)     9cpu      ulworker
 9    e693498  0      4     cond wait(rec Q 3)    1cpu      recmgr
 10   e693e78  0      4     cond wait(rej WR -6)  1cpu      driver
 11   e6b7298  0      4     cond wait(drv RD -9)  10cpu     sdriver
 12   e6b7be8  0      4     running               1cpu      convert

onstat>

Monitoring Load Operations B-9

onstat -j Options
onstat -j lineitem_unload.log
onstat> ?
Interactive Mode: One command per line, and - are optional.
-rz          repeat option every n seconds (default: 5) and zero profile counts
MT COMMANDS:
all          Print all MT information
ath          Print all threads
wai          Print waiting threads
act          Print active threads
rea          Print ready threads
sle          Print all sleeping threads
spi          print spin locks with long spins
sch          print VP scheduler statistics
lmx          Print all locked mutexes
wmx          Print all mutexes with waiters
con          Print conditions with waiters
stk <tid>    Dump the stack of a specified thread
glo          Print MT global information
mem <pool name|session id>  print pool statistics.
seg          Print memory segment statistics.
rbm          print block map for resident segment
nbm          print block map for non-resident segments
afr <pool name|session id>  Print allocated pool fragments.
ffr <pool name|session id>  Print free pool fragments.
ufr <pool name|session id>  Print pool usage breakdown
iov          Print disk IO statistics by vp
iof          Print disk IO statistics by chunk/file
ioq          Print disk IO statistics by queue
iog          Print AIO global information
iob          Print big buffer usage by IO VP class
sts          Print max and current stack sizes
opn [<tid>]  Print open tables
qst          print queue statistics
wst          print thread wait statistics
jal          Print all Pload information
jct          Print Pload control table
jpa          Print Pload program arguments
jta          Print Pload thread array
jmq          Print Pload message queues, jms for summary only
onstat>

B-10 Monitoring Load Operations

Configuration Parameters

CONVERTTHREADS 1    Number of conversion threads per device
CONVERTVPS     6    Maximum number of conversion VPs

11

Configuration Parameters
The configuration parameters are in $INFORMIXDIR/etc/plconfig.std. You can use a custom configuration by setting the environment variable PLCONFIG.
CONVERTTHREADS - Number of converter threads per I/O device. Having more than one converter per device will, in general, allow the conversion phase to run faster, if CPU resources are available. Conversion can be a CPU-intensive phase for complex conversion activities. Default: 1.
CONVERTVPS - Sets the number of convert virtual processors that will be started. The convert virtual processor runs the convert threads. CONVERTVPS should generally not be set greater than the number of processors on the system. Too many convert VPs consume an excessive amount of system resources (CPU and shared memory), causing performance degradation. Default: 1 on a single-processor system; 50% of physical processors on a multi-processor system.
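For example, to run onpload with site-specific settings you might copy the standard file and point PLCONFIG at the copy; the file name plconfig.hpl is an example, and the copy is assumed to stay in $INFORMIXDIR/etc:

cp $INFORMIXDIR/etc/plconfig.std $INFORMIXDIR/etc/plconfig.hpl
# edit CONVERTTHREADS, CONVERTVPS, and the buffer parameters in the copy, then:
PLCONFIG=plconfig.hpl; export PLCONFIG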

Monitoring Load Operations B-11

Configuration Parameters (cont.)

STRMBUFFSIZE 64    Buffer size for server stream buffer (kbytes)
STRMBUFFERS   4    Number of server stream buffers per device
AIOBUFSIZE   64    Buffer size for tape/file I/O (kbytes)
AIOBUFFERS    4    Number of buffers for I/O per device

12

Configuration Parameters (cont.)


More HPL configuration parameters are listed below:
STRMBUFFSIZE - Stream buffers are the buffers used to pass data between the server and the onpload client. Larger buffers are more efficient because there is less overhead in moving buffers around. Range: depends on system resources. Min: 2 * system page size.
STRMBUFFERS - Number of server stream buffers per device. The onpload utility sends data to the database server through a server stream. The server stream is a set of shared memory buffers. Memory for the buffers is allocated from the server. Recommended: 3 * CONVERTTHREADS. Default: max of (4, 2 * CONVERTTHREADS).
AIOBUFSIZE - The AIO buffers are buffers used to pass data between the converters and the I/O drivers. This is different from the device block size, if any, which is set in the device arrays.

B-12 Monitoring Load Operations

AIOBUFFERS - Number of buffers for I/O per device. Range: > 4. Default: max of (4, 2 * CONVERTTHREADS). Recommended: 3 * CONVERTTHREADS.

Monitoring Load Operations B-13

Improving Performance
n Increase hardware resources: CPU, memory, devices
n Increase number of table fragments
n Reduce conversion processing
w No conversion: reorganize computer configuration
w Fixed internal format: alter the schema of a table
w External format: import/export data
n Tune configuration parameters
n Increase EXTENT SIZE
n Increase the commit interval

14

Improving Performance
Some of the actions you can perform to improve High Performance Loader performance are:
n Increase hardware resources - In general, the performance of the HPL depends on the underlying hardware resources: CPU, memory, disks, tapes, controllers, and so on. Any of these resources could be a bottleneck, depending on the speed of the resource, the load on the resource, and the particular nature of the load or unload. During an unload, the same amount of data is directed toward each device in the device array. This means that the duration of a load may be limited by the slowest device. If the devices in your device array vary greatly in speed, you may wish to remove the slowest devices.
n Increase the number of table fragments - Fragmented tables increase parallelism in the database server. Without fragmented tables, the I/O to the target table could be a bottleneck.

B-14 Monitoring Load Operations

n Reduce conversion - Data conversion is CPU intensive, so use the least amount of conversion needed to accomplish your task. If you are not changing the table schema, use a no conversion job. No conversion jobs can also be used when you are moving IBM Informix data between different computers (even if the internal byte representation is different). If you are changing the schema of the table, use fixed internal format (see the Generate menu on the main window). Fixed internal format loads/unloads data a column at a time in the IBM Informix internal format. Finally, if you are importing/exporting data from an external source, use the fixed, delimited, or COBOL external formats.
n Tune the configuration parameters in the plconfig.std file - Loads and unloads other than raw and fast-format ones are likely to be CPU intensive due to conversion overhead. In such cases, it may be beneficial to increase the number of conversion VPs and conversion threads. Ensure that there are at least two AIO and stream buffers per converter thread. The size of AIO buffers should be at least as large as the device block size, and the size of the stream buffers should be large (32 or 64 kilobytes).
n Increase the extent size - If your extent size is small during a load, repeated extent allocation may limit the speed of your load. Use the ALTER TABLE statement with the EXTENT clause to increase the size of your extents (see the sketch following this list).
n Increase the commit interval - Performance improves with larger commit intervals. As the commit interval increases, more logical log space will be required.
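As a sketch of the extent-size suggestion above, the next-extent size of an existing table can be raised before a large load; the table name and the size (in kilobytes) are examples only:

ALTER TABLE cust_stage MODIFY NEXT SIZE 4000;

The first-extent size is fixed when the table is created, so for a brand-new target table it is usually better to size the first extent generously in the CREATE TABLE statement instead.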

Monitoring Load Operations B-15

B-16 Monitoring Load Operations

Appendix C
Advanced Features Available Through IPLoad

Advanced Features Available Through IPLoad 09-2001 2001 International Business Machines Corporation

C-1

Objectives
At the end of this module, you will be able to:
n Describe the difference between an Express and a Deluxe Load job
n Describe the purpose of violation records
n Create and execute a deluxe load job
n Explain the function of a no conversion job

C-2 Advanced Features Available Through IPLoad

Load Options
n Load mode
w Deluxe: constraint checking, no table locking
w Express: disables constraints and indexes, table lock
n Generate Violations Records
n Number of tapes
n Number of records
n Start record
n Max errors
n Commit Interval (deluxe mode only)

Load Options
Some load options can be changed by selecting the Options icon. The load options are:
n Load mode - Deluxe mode does complete constraint checking and does not lock the table. Express mode disables constraints and indexes, and locks the table. Express mode is generally faster in a large load.
n Generate Violations Records - populate the violations and diagnostics tables with records that violate constraints or unique indexes.
n Number of tapes - Enter the number of tapes that contain the source data. The onpload utility will prompt for more tapes, depending upon this value.
n Number of records - Enter the number of records to load.
n Start record - Record number in the source file to begin with.
n Max errors - The maximum number of errors that can occur before the load stops. Note that these errors do not include constraint errors.
n Commit Interval - The number of rows loaded between COMMIT statements. This option is only valid for the deluxe load mode.

Advanced Features Available Through IPLoad C-3

Express Load Mode


n Faster performance than deluxe mode
n No logging
n Archive must be performed after the load to enable write access (logged databases only)
n Constraints, indexes, and triggers are set to DISABLED until after the job completes
n Blobs are not supported
n Row length must be less than a page
n Table is exclusively locked for the duration of the load
n Load proceeds as a single transaction and is rolled back in the event of a system crash

Express Load Mode


The express mode is designed to be more efficient for loading larger amounts of data.
n Indexes, constraints, and triggers are automatically disabled during the load. After the load completes, the indexes, constraints, and triggers are automatically re-enabled. This means that all indexes will be re-built (parallel index builds will happen automatically). The equivalent SQL statements for disabling and enabling database objects are:
SET CONSTRAINTS, INDEXES, TRIGGERS FOR table_name TO DISABLED; SET CONSTRAINTS, INDEXES, TRIGGERS FOR table_name TO ENABLED;

n No logging occurs for data inserted during an express mode load. This increases performance and helps prevent a long transaction from occurring (the load is considered one transaction). Only new page allocations are written to the log.
n If the database is logged, then the loaded table is not made available for write access until after an archive is performed. If the database is not logged, then no archive is necessary. To minimize archive time, perform express loads on all necessary tables and then perform an archive.

C-4 Advanced Features Available Through IPLoad

n Blobs are not handled because of the extra processing overhead required to handle blobs larger than a buffer size.

Warning!
When the constraints and indexes are enabled after an express load, violating rows will be copied to the violations and diagnostics tables (tabname_vio and tabname_dia, where tabname is replaced by the actual table name). The violating rows are not deleted from the table and, if violated, constraints or indexes will not be re-enabled. For more information on database objects and object modes, reference the Managing and Optimizing IBM Informix Dynamic Server Databases training manual and the IBM Informix Guide to SQL. You may choose not to generate violations records by selecting No for the Generate Violations Records option in the Load Options window. Re-enabling a trigger does not cause a trigger to be fired for the rows loaded. You are responsible for ensuring that processing associated with triggers is carried out for the new rows. A constraint cannot be disabled when other enabled constraints refer to it. For example, a primary key constraint cannot be disabled by the HPL if a foreign key constraint in another table refers to it. The HPL will return an error in such a case. You must disable any foreign key constraints that refer to the target table prior to loading in express mode.

Advanced Features Available Through IPLoad C-5

Deluxe Load Mode


n Fully functional load mode
n Data is logged
n Constraints, indexes, and triggers are set to FILTERING WITHOUT ERROR
n Blobs are supported
n Commit interval is chosen by the user
n Table is not locked

Deluxe Load Mode


The deluxe load mode is more efficient for loading smaller amounts of data because indexes are not disabled. In the deluxe load mode:
n Constraints, indexes, and triggers are set to FILTERING WITHOUT ERROR. Any rows not inserted are written to the violations and diagnostics tables.
n Data is logged during the load. The number of rows inserted between COMMIT WORK statements can be specified in the Commit Interval option in the Load Options window.

The onpload client operates the same for both express and deluxe load modes. Object modes (enabled, disabled, filtering) are handled by the IDS server.

C-6 Advanced Features Available Through IPLoad

Note
There is no definitive recommendation for when to use deluxe mode versus express mode. The express mode will probably be faster when a large amount of data is being loaded. However, at the end of an express mode load, time is required to enable constraints and to perform an archive. In a situation where a large amount of data is to be loaded into several tables and there are few constraints, express mode would probably be faster. Little time would be needed for enabling constraints, and the archive could be postponed until after the last load.

Advanced Features Available Through IPLoad C-7

Browsing Records

Browsing Records
The browsing feature can be used to review source records in the format you specify. Use this option prior to a load to verify that the format has been defined properly. You can also view rejected records or violations that occurred in the database server. To show the browse window, choose the Browsers menu option. Then choose Records, Violation, or Logfile. The window shown above is a sample browse window for a load file (Records option).

C-8 Advanced Features Available Through IPLoad

Violations Table Browser

Violations Table Browser


The Violations Table Browser lists the contents of the tablename_vio table. This table holds constraint violations encountered by the server.

Advanced Features Available Through IPLoad C-9

Log File Browser

10

Log File Browser


The log file browser lists the contents of the log file whose name is specified in the load or unload window. The log file shows diagnostic or informational messages about a load or unload job.

C-10 Advanced Features Available Through IPLoad

Connecting to an Active Job

11

Connecting to an Active Job


A load or unload job may run for hours when there is a large amount of data or complicated mapping or lookups involved. You may want to reconnect to a running job after a period of time. To do this, choose the Connect icon on the Load Job Select window or the Unload Job Select window.

Advanced Features Available Through IPLoad C-11

No Conversion Jobs
n No formats, maps, or filters
n Defined from the Components/Generate/No Conversion menu options on the main window
n Uses IBM Informix internal format
n Referred to as fast job, raw load, or no conversion job
n Used for unloading and loading tables with no schema changes
n Can be used across heterogeneous environments

12

No Conversion Jobs
No conversion jobs, or raw loads, yield the fastest performance because there is no formatting, mapping, or filtering. The entire row is unloaded in the internal IBM Informix format. A log file and a reject file are not generated. Define the job using the Components/Generate/No Conversion menu options, and then run the job using the normal select job windows. You should use raw loads when unloading and reloading table data without any schema changes. Raw loads can be used across different platforms.

C-12 Advanced Features Available Through IPLoad

Appendix D
The System Catalog

The System Catalog 09-2001 2001 International Business Machines Corporation

D-1

Objectives
At the end of this module, you will be able to:
n Use the system catalog as a source of information about the database environment

D-2 The System Catalog

The System Catalog: Data About Your Data

System Catalog:
systables syscolumns sysindexes systabauth syscolauth sysdepend syssynonyms sysusers sysviews sysconstraints syssyntables ...

database

create database rivers

The system catalog is a collection, or catalog, of system tables that describe the structure of a database. Each table in the system catalog contains specific information about structural elements in the database. The system catalog tables contain information about the tables, columns, and indexes of the database. In addition, they track views, authorized users, levels of access, synonyms, and constraints. The system catalog is automatically generated every time a new database is created. System catalog tables will be located in the dbspace in which your database is created.

The System Catalog D-3

Automatic Maintenance
Rivers Database

Add new table

New Table Allocation


river_length

System Catalog:
database server systables syscolumns sysindexes systabauth syscolauth sysdepend syssynonyms sysusers sysviews sysconstraints syssyntables

Update system catalog tables

create table river_length

The system catalog is maintained automatically and transparently by the database server. For example, as you create tables for your database, two changes occur in the database directory:
n New space for each table is created.
n The system catalog tables are updated to reflect the addition of each new table.

Every addition and modification of tables, columns, and indexes causes the system catalog to be updated. This is true of other changes as well, such as changes to views, synonyms, access privileges, and constraints.

D-4 The System Catalog

Querying the System Catalog


tabname      river_length
owner        laura
partnum      16777241
tabid        100
rowsize      24
ncols        2
nindexes     2
nrows        0
created      07/09/1991
version      5
tabtype      T
locklevel    P
npused       2
fextsize     16
nextsize     16
flags        0
site
dbname

select * from systables where tabname="river_length"

One very important characteristic of the system catalog is that it is stored in normal database tables. Therefore, system catalog tables can be queried like any other database table. The ability to use standard query methods to access the system catalog allows the Database Administrator or user to become familiar with the structure and characteristics of a database quickly.

The System Catalog D-5

SYSTABLES

Column Name   Type        Explanation
tabname       char (18)   name of table
owner         char (8)    owner of table
partnum       integer     tblspace identifier
tabid         serial      internal table identifier
rowsize       smallint    row size
ncols         smallint    number of columns
nindexes      smallint    number of indexes
nrows         integer     number of rows
created       date        date created
version       integer     table version number
tabtype       char (1)    table type
locklevel     char (1)    lock mode for table (B, P=Page, R=Row)
npused        smallint    number of pages in use
fextsize      integer     size of initial extent in kbytes
nextsize      integer     size of all subsequent extents in kbytes
flags         smallint    reserved for future use
site          char (18)   reserved for future use
dbname        char (18)   reserved for future use

The systables table describes each table in the database. It contains one row for each table, view, and synonym defined in the database. This includes the system catalog tables themselves. Whenever a new table, view, or synonym is created, a new entry is automatically added to the systables table. Table names are stored only in the systables table. Elsewhere, tables are referred to by their tabid, or table identifier. Therefore, one important function of the systables table is to provide a link between table names and tabids. Tabids are assigned sequentially and are unique. System catalog tables are numbered between 1 and 100, and user tables are numbered from 100. The partnum field is used in IDS shared memory to identify a table; it is the tblspace number (tblsnum) referenced in the onstat output. If you want to know the correspondence between tblsnum and its table, you can select this information from the systables table. Because the tblsnum is always printed in hexadecimal, you will have to select the partnum value in hex format. This can be done simply via the query:
SELECT tabname, hex(partnum) FROM SYSTABLES;

D-6 The System Catalog

Note
To increase speed, IDS does not update the nrows field after an addition to or deletion from a table. This field is only updated by the UPDATE STATISTICS command. The information in the nrows field is used to optimize queries, and it is recommended that the UPDATE STATISTICS command be run periodically, especially after mass additions to or deletions from a table.
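As an illustrative sketch (the table name river_length is taken from the example earlier in this appendix), you might refresh the statistics after a bulk load and then check the counters:

UPDATE STATISTICS FOR TABLE river_length;
SELECT tabname, nrows, npused FROM systables WHERE tabname = "river_length";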

The System Catalog D-7

SYSCOLUMNS

Column Name  Type       Explanation
colname      char(18)   column name
tabid        integer    table identifier
colno        smallint   column number
coltype      smallint   column type
collength    smallint   column length
colmin       integer    second-minimum value
colmax       integer    second-maximum value

The syscolumns table describes each column in the database. Each row contains a column name, the tabid of the table, the sequence number of the column within the table, the type of column, and the physical length. Column numbers are sequentially assigned by the system from left to right within each table. Column types, represented in the coltype field, can be the following:
0 = char
1 = smallint
2 = integer
3 = float
4 = smallfloat
5 = decimal
6 = serial
7 = date
8 = money
10 = datetime
11 = byte (OnLine only)
12 = text (OnLine only)
13 = varchar (OnLine only)
14 = interval
15 = nchar
16 = nvarchar

D-8 The System Catalog

If the coltype field contains a value that is greater than 256, that column does not allow null values. To determine the data type for this kind of column, subtract 256 from the value. For example, if a column had a coltype value of 258 you would subtract 256 to get 2, which indicates that the column is an integer column with no nulls.
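A quick way to see which columns of a table disallow nulls is to decode coltype with the MOD function. The query below is only a sketch; the table name customer is a placeholder:

SELECT c.colname, c.coltype, MOD(c.coltype, 256) basetype
FROM systables t, syscolumns c
WHERE t.tabname = "customer"
AND t.tabid = c.tabid
AND c.coltype >= 256;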

Note
The colmin and colmax column values hold the second-smallest and second-largest data values in the column, respectively. These columns contain values only if the column is indexed and you have run UPDATE STATISTICS. The values are used by the optimizer to develop efficient query plans. For non-numeric columns, colmin and colmax contain the initial 4 bytes of the minimum and maximum values, respectively.

The System Catalog D-9

Calculating the Column Size


money or decimal:     (precision x 256) + scale
varchar or nvarchar:  (min-space x 256) + max-size
datetime or interval: (length x 256) + (largest qualifier x 16) + smallest qualifier

You can find the size of any column created for a table by querying the collength column of syscolumns. Some column sizes, however, need to be translated by applying a formula. The formulas needed are listed in the slide above. For example, here is how to interpret a length of 4098:
4098 / 256 = 16, remainder 2

The result of dividing 4098 by 256 is 16 with a remainder of 2. This translates to:
Total number of digits (precision): 16
Number of decimal places (scale): 2
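The same arithmetic can be applied directly in SQL. The query below is a sketch that decodes precision and scale for decimal and money columns (coltype 5 and 8), assuming the MOD and TRUNC functions are available in your server version:

SELECT colname, collength,
       TRUNC(collength / 256) precision_digits,
       MOD(collength, 256) scale_digits
FROM syscolumns
WHERE MOD(coltype, 256) IN (5, 8);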

D-10 The System Catalog

SYSINDEXES

Column Name  Type       Explanation
idxname      char(18)   index name
owner        char(8)    owner of index
tabid        integer    table identifier
idxtype      char(1)    index type (U=unique, D=dups allowed)
clustered    char(1)    clustered or non-clustered (C=clustered)
part1        smallint   column number of a single-column index, or 1st component of a composite index
part2        smallint   2nd component of a composite index
part3        smallint   3rd component of a composite index
...
part16       smallint   16th component of a composite index
levels       smallint   number of B+ tree levels
leaves       integer    number of leaves
nunique      integer    number of unique keys in the first column
clust        integer    degree of clustering (smaller numbers mean greater clustering)

The sysindexes table describes each index on a column in the database. Each row contains an index name, the owner, the tabid of the table, the index type, information about whether or not the index is clustered, and the column numbers of the columns of the index. Fields part1 through part16 identify the columns upon which each index is based. Because only 16 columns can be stored for each index in IDS databases, the maximum number of columns that can be used in a composite index is 16. The sysindexes table can be queried to get index information for a particular table. The SELECT statement below returns index information for the items table in the stores database:
SELECT sysindexes.* FROM sysindexes,systables WHERE tabname = "items" AND systables.tabid = sysindexes.tabid

Note
The columns levels, nunique, and clust are only updated by the UPDATE STATISTICS command. The information in these fields is used to optimize queries, and it is recommended that the UPDATE STATISTICS command be run periodically, especially after mass additions to, or deletions from, a table.
The System Catalog D-11

SYSFRAGMENTS

Column Name  Type       Explanation
fragtype     char(1)    fragment type (T=table, I=index)
tabid        integer    table identifier
indexname    char(18)   index identifier
colno        smallint   blob column identifier
partn        integer    physical location identifier (partnum)
strategy     char(1)    distribution type: R=round robin, E=expression, T=table-based (attached index)
location     char(1)    reserved for future use (=L)
servername   char(18)   reserved for future use
evalpos      integer    position of fragment in the fragmentation list
exprtext     text       text of expression
exprbin      byte       binary version of expression
exprarr      byte       range partitioning data
flags        integer    internally used
dbspace      char(18)   dbspace name for fragment
levels       smallint   number of B+ tree index levels
npused       integer    number of data pages (table) or leaf pages (index)
nrows        integer    number of rows (table) or unique keys (index)
clust        integer    degree of index clustering (smaller numbers mean greater clustering)

The system catalog table sysfragments stores fragment information for fragmented tables and fragmented indexes. There is a row for each fragment in the sysfragments table. The strategy type T is used for attached indexes (that is, the index fragmentation is the same as the table fragmentation).
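To see how a particular table is fragmented, you can join sysfragments to systables. The query below is a sketch; the table name rcp_cust_account is taken from the example on the next page:

SELECT f.fragtype, f.strategy, f.evalpos, f.dbspace
FROM sysfragments f, systables t
WHERE t.tabname = "rcp_cust_account"
AND f.tabid = t.tabid
ORDER BY f.evalpos;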

D-12 The System Catalog

Sysfragments and Systables


A row from sysfragments for one fragment of the table (the partnum of the fragment is stored in sysfragments; note the tabid of the fragment):

fragtype    T
tabid       131
indexname
colno       0
partn       3145730
strategy    E
location    L
servername
evalpos     1
exprtext    ((rcp_st_cd >= ... ) AND (rcp_st_cd <= GA ))
exprbin     <BYTE value>
exprarr     <BYTE value>
flags       0
dbspace     acf2dbs
levels      0
npused      485796
nrows       1943183
clust       0

The corresponding systables row gives the table name for the fragment (note that partnum is always 0 for fragmented tables):

SELECT * FROM systables WHERE tabid = 131

tabname     rcp_cust_account
owner       informix
partnum     0
tabid       131
rowsize     402
ncols       53
nindexes    1
nrows       9951369
created     08/12/1994
version     8650757
tabtype     T
locklevel   R
npused      2487844
fextsize    1800000
nextsize    180000
flags       0

The System Catalog D-13

SYSFRAGAUTH

Column Name  Type       Explanation
grantor      char(8)    grantor of privilege
grantee      char(8)    grantee (receiver) of privilege
tabid        integer    table ID of the table that contains the fragment
fragment     char(18)   name of the dbspace where the fragment is stored
fragauth     char(6)    6-byte pattern specifying fragment privileges (u=update, i=insert, d=delete)

Users may be granted insert, update and delete privileges on a fragment level basis. The sysfragauth table stores information about the privileges that are granted on table fragments. If a code in the fragauth column is lowercase, the grantee cannot grant the privilege to other users. If a code in the fragauth column is uppercase, the grantee can grant the privilege to other users. The following is an example of the GRANT FRAGMENT command:
GRANT FRAGMENT INSERT,UPDATE,DELETE ON items(dbspace2) TO joe
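A query along the following lines (the table name items is only an example) shows who holds which fragment-level privileges:

SELECT a.grantee, a.fragment, a.fragauth
FROM sysfragauth a, systables t
WHERE t.tabname = "items"
AND a.tabid = t.tabid;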

D-14 The System Catalog

SYSDISTRIB

Column Name  Type       Explanation
tabid        integer    table ID
colno        smallint   column number
seqno        integer    sequence number for multiple entries
constructed  date       date distribution was created
mode         char(1)    optimization level: L=low, M=medium, H=high
resolution   float      resolution from UPDATE STATISTICS
confidence   float      confidence from UPDATE STATISTICS
encdat       char(256)  ASCII-encoded histogram in a fixed-character field; accessible to user

The sysdistrib table stores data distribution information created during execution of the UPDATE STATISTICS command with mode MEDIUM or HIGH. Distributions are not created when the LOW mode (default setting) is used. Data distributions provide detailed column information to the optimizer to improve the choice of execution paths for SQL SELECT statements.
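For example, a sequence such as the following (the table name items is a placeholder) creates distributions and then lets you confirm that rows were written to sysdistrib:

UPDATE STATISTICS MEDIUM FOR TABLE items;
SELECT tabid, colno, mode, resolution, confidence FROM sysdistrib;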

Note
Sysdistrib entries which exist prior to the execution of UPDATE STATISTICS with the LOW mode setting are not deleted and will continue to be used by the optimizer. You may wish to manually delete sysdistrib rows when reverting to UPDATE STATISTICS LOW.

The System Catalog D-15

SYSUSERS

Column Name  Type       Explanation
username     char(8)    name of database user (or role)
usertype     char(1)    C=CONNECT, R=RESOURCE, D=DBA, G=ROLE
priority     smallint   reserved for future use
password     char(8)    reserved for future use

The sysusers table identifies the database-level privileges. Each row contains the name of a user and that user's database privileges. The privileges are defined by the type of user as follows:

Privilege                          CONNECT  RESOURCE  DBA
Access the database                yes      yes       yes
Create/Alter/Drop own tables       no       yes       yes
Create/Alter/Drop own indexes      no       yes       yes
Alter system catalog tables        no       no        no
Grant/revoke database privileges   no       no        yes
Drop/start/roll forward database   no       no        yes

If you wish to check the privileges of a user named joe , you could use the following query:
SELECT usertype FROM sysusers WHERE username="joe"

Database privileges can be changed with the GRANT and REVOKE statements, but only by a user with DBA privileges.
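For example, statements along these lines (the user names are placeholders) change database-level privileges; the corresponding rows in sysusers are maintained automatically by the database server:

GRANT RESOURCE TO joe;
GRANT DBA TO maria;
REVOKE RESOURCE FROM joe;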

D-16 The System Catalog

A row in sysusers may refer to a role. When a role is created, a row is added to the sysusers table and the role name is stored in the username column:
CREATE ROLE sales

The System Catalog D-17

SYSTABAUTH

Column Name  Type       Explanation
grantor      char(8)    grantor of permission
grantee      char(8)    grantee (receiver) of permission
tabid        integer    table identifier
tabauth      char(8)    authorization type

The systabauth table identifies the table-level privileges. Each row contains the grantor and grantee (receiver) of the privileges, the tabid of the table, and the type of authorization. The authorization type is represented by the following eight-character code in the tabauth field:
s = select
u = update
* = column-level authorization
i = insert
d = delete
x = index
a = alter
r = references

If a privilege has been granted, the appropriate letter appears in the tabauth field. Otherwise, a hyphen appears, indicating that the privilege has not been granted. If privileges have been granted or revoked on the column level, an asterisk appears in the third character position. If the tabauth code is in uppercase, the user granted this privilege can also grant it to others. If the tabauth code is in lowercase, the user granted this privilege cannot grant it to others. For example, if a user has select, update, and insert privileges for a table, but cannot grant these privileges to others, the corresponding tabauth code will be su-i----.
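To see the table-level privileges held by a particular user, a query such as the following can be used (the user name joe is only an example):

SELECT t.tabname, a.grantor, a.tabauth
FROM systabauth a, systables t
WHERE a.tabid = t.tabid
AND a.grantee = "joe";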
D-18 The System Catalog

SYSCOLAUTH

Column Name  Type       Explanation
grantor      char(8)    grantor of permission
grantee      char(8)    grantee (receiver) of permission
tabid        integer    table identifier
colno        smallint   column number
colauth      char(3)    authorization type

The syscolauth table identifies the column-level privileges. Each row contains the grantor and grantee (receiver) of the privileges, the tabid of the table, the number of the column within the table, and the authorization type. The authorization type is represented by the following three-character code in the colauth field:
s = select
u = update
r = references

If a privilege has been granted, then the appropriate letter will appear in the colauth field. Otherwise, a hyphen will appear, indicating that the privilege has not been granted. If the colauth code is in uppercase, the user granted this privilege can also grant it to others. If the colauth code is in lowercase, the user granted this privilege cannot grant it to others. Column privileges can be changed using the GRANT and REVOKE commands. As an example, if you wished to grant update privileges to emily on column fname of table customer , you would use the command:
GRANT UPDATE(fname) ON customer TO emily;

The System Catalog D-19

SYSVIEWS

Column Name  Type       Explanation
tabid        integer    table identifier
seqno        smallint   line number of the SELECT statement
viewtext     char(64)   portion of the SELECT statement that defines the view

The sysviews table describes each view defined in the database. Each row contains the tabid of a view and a portion of the SQL statement that created that view. Each view is determined by a SELECT statement that returns the table that defines the view. The SELECT statement is stored in the viewtext field of the sysviews table. If the SELECT statement is longer than 64 characters, the SELECT statement will be broken into sections. Each section will have its own entry in the table. The seqno field designates the order of the sections of the SELECT statement. If you wanted to see the SELECT statement that was used to create a view whose tabid was 108, you could use the following query:
SELECT * FROM sysviews WHERE tabid=108;

which might return:

D-20 The System Catalog

tabid     108
seqno     0
viewtext  create view "laura".custview (firstname,lastname,company,city) a

tabid     108
seqno     1
viewtext  s select x0.fname ,x0.lname , x0.company ,x0.city from "laura".cu

tabid     108
seqno     2
viewtext  stomer x0 where (x0.city = 'Redwood City' ) with check option;

The System Catalog D-21

SYSDEPEND

Column Name  Type       Explanation
btabid       integer    tabid of base table or view
btype        char(1)    base object type (T=table, V=view)
dtabid       integer    tabid of dependent table or view
dtype        char(1)    dependent object type (V=view is currently the only type implemented)

This table describes how views depend on other views or tables. Each row contains the tabid of a view and the tabid of a table or view upon which it depends. For example, one entry might be:
btabid  btype  dtabid  dtype
101     T      103     V

This means that the view whose tabid is 103 depends on the base table whose tabid is 101. In other words, view 103 contains one or more columns from table 101.

D-22 The System Catalog

SYSSYNTABLE

Column Name  Type       Explanation
tabid        integer    table identifier of the synonym
servername   char(18)   server where the base table resides
dbname       char(18)   database where the base table resides
owner        char(8)    owner of base table
tabname      char(18)   name of base table
btabid       integer    tabid of base table or view

The syssyntable table maps each synonym with the object it represents. Each row contains the tabid of a synonym and the tabid of the table to which the synonym refers. If you wanted to find all of the synonyms for a table called stock, you could use the following queries: First, find the tabid of the table using:
SELECT tabid FROM systables WHERE tabname="stock"

Then, list the tabid(s) of the synonym(s) using:


SELECT tabid FROM syssyntable WHERE btabid=table_tabid

Finally, match each synonym tabid with its name using:


SELECT tabname FROM systables WHERE tabid=synonym_tabid

These three SQL statements may be combined into the following nested SELECT statement:
SELECT tabname FROM systables
 WHERE tabid = (SELECT tabid FROM syssyntable
                 WHERE btabid = (SELECT tabid FROM systables
                                  WHERE tabname = "stock"))

The System Catalog D-23


A synonym for a remote table will list the location and name of that remote table in the servername, dbname, and tabname columns.

D-24 The System Catalog

SYSCONSTRAINTS
Column Name  Type       Explanation
constrid     serial     sequentially assigned constraint identifier
constrname   char(18)   constraint name
owner        char(8)    user name of the constraint owner
tabid        integer    table identifier
constrtype   char(1)    C=check, P=primary key, R=referential, U=unique, N=not null
idxname      char(18)   index name

The sysconstraints table records constraints put on columns in database tables. The constraint types are: check, primary key, referential (foreign key), unique, and not null. Each row of the sysconstraints table contains a constraint name, the owner, the tabid of the table upon which the constraint applies, and the name of the index that defines the constraint. More than one constraint can be associated with an index. A constraint may be added using the CREATE TABLE or ALTER TABLE commands.
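To list the constraints defined on a table, you can join sysconstraints to systables. The query below is a sketch; the table name customer is a placeholder:

SELECT c.constrname, c.constrtype, c.idxname
FROM sysconstraints c, systables t
WHERE t.tabname = "customer"
AND t.tabid = c.tabid;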

The System Catalog D-25

SYSREFERENCES

Column Name  Type       Explanation
constrid     integer    constraint identifier
primary      integer    constrid of the corresponding primary key constraint
ptabid       integer    tabid of the primary key table
updrule      char(1)    reserved for future use (=R)
delrule      char(1)    C=cascade delete, R=restrict
matchtype    char(1)    reserved for future use (=N)
pendant      char(1)    reserved for future use (=N)

The sysreferences table lists the referential constraints placed on columns in the database. It contains a row for each referential constraint in the database. Referential constraints will also have a row placed in the sysconstraints table.

D-26 The System Catalog

SYSCHECKS

Column Name  Type       Explanation
constrid     integer    constraint identifier
type         char(1)    how the constraint is stored (B=binary, T=ASCII text)
seqno        smallint   line number
checktext    char(32)   text of the check constraint

The syschecks table holds the text of the check constraint. If the text is longer than 32 characters it will be stored in multiple rows. Each check constraint also has an entry in the sysconstraints system catalog table. The executable version of the check constraint is stored in the syschecks table.
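To retrieve the full text of a check constraint, you can join syschecks to sysconstraints and order by seqno. The constraint name chk_quantity below is purely illustrative:

SELECT ch.seqno, ch.checktext
FROM syschecks ch, sysconstraints c
WHERE c.constrname = "chk_quantity"
AND ch.constrid = c.constrid
AND ch.type = "T"
ORDER BY ch.seqno;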

The System Catalog D-27

SYSCOLDEPEND

Column Name  Type       Explanation
constrid     integer    constraint identifier
tabid        integer    table that the constraint belongs to
colno        smallint   column that is specified in the constraint

The syscoldepend table tracks all columns that are involved in a check constraint. One row is created in the syscoldepend table for each column included in the constraint. Since a check constraint can include more than one column in a table, syscoldepend can contain multiple rows for one check constraint.

D-28 The System Catalog

SYSDEFAULTS

Column Name  Type       Explanation
tabid        integer    table identifier
colno        smallint   column that has a default value
type         char(1)    default type: L=literal, U=USER, C=CURRENT, N=NULL, T=TODAY, S=SITENAME (dbservername)
default      char(256)  the literal default value

The sysdefaults table holds default values for columns. In order for a column to have a default, you must assign it one using the CREATE TABLE or ALTER TABLE statements. For example,
ALTER TABLE orders MODIFY order_date DATE DEFAULT TODAY

The System Catalog D-29

SYSPROCEDURES

Column Name  Type       Explanation
procname     char(18)   procedure name
owner        char(8)    owner of procedure
procid       serial     procedure identifier
mode         char(1)    D=DBA, O=OWNER, P=protected (internal)
retsize      integer    compiled size (in bytes) of returned values
symsize      integer    compiled size (in bytes) of symbols
datasize     integer    compiled size (in bytes) of constant data
codesize     integer    compiled size (in bytes) of instruction code
numargs      integer    number of procedure arguments

The sysprocedures table lists the characteristics for each stored procedure in the database. It contains one row for each procedure. A database server can create special-purpose stored procedures for internal use. The sysprocedures table identifies these protected procedures with the letter P in the mode column. You cannot modify or drop protected stored procedures, or display them through dbschema .

D-30 The System Catalog

SYSPROCBODY

Column Name  Type       Explanation
procid       integer    procedure identifier
datakey      char(1)    data descriptor type:
                        D = user document text
                        T = procedure text
                        R = return type list
                        S = procedure symbol table
                        L = constant procedure data (literals or strings)
                        P = interpreter instruction code
seqno        integer    line number
data         char(256)  actual text of the procedure

The sysprocbody table holds both the compiled version of the stored procedure and the text version of the stored procedure. The text version (datakey = T) is only used by dbschema to list the stored procedure. The compiled version (datakey = P ) is the code that gets executed by the procedure.
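To reconstruct the source of a stored procedure yourself, you can select the text rows in seqno order. This is only a sketch; the procedure name read_address is an example:

SELECT b.seqno, b.data
FROM sysprocbody b, sysprocedures p
WHERE p.procname = "read_address"
AND b.procid = p.procid
AND b.datakey = "T"
ORDER BY b.seqno;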

The System Catalog D-31

SYSPROCPLAN

Column Name  Type       Explanation
procid       integer    procedure identifier
planid       integer    plan identifier
datakey      char(1)    either D=dependency list or Q=query plan
seqno        integer    line number of the plan
created      date       date the plan was created
datasize     integer    size (in bytes) of list or plan
data         char(256)  encoded list or plan

The sysprocplan table holds two things that are needed for the execution of a procedure:
n Execution plan (or query plan)
w An execution plan is the way an SQL statement will be executed. It is the best path chosen by the optimizer. The query plan will be updated when UPDATE STATISTICS is run, or when an entry in the dependency list is changed.
n Dependency list
w The dependency list holds items that, if changed, signal that the execution plan should be updated. It is reviewed before every procedure is executed. For example, if an index that is used in the execution of an SQL statement inside a procedure is dropped, the execution plan should be updated. The first time the procedure is executed after an item in the dependency list changes, the execution plan will be updated.

There may be more than one row in sysprocplan for every procedure plan. The seqno is used to order the rows.
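If you suspect that a stored plan is stale, one option (shown here as a sketch; read_address is a placeholder procedure name) is to force re-optimization of the procedure, which rewrites its rows in sysprocplan:

UPDATE STATISTICS FOR PROCEDURE read_address;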

D-32 The System Catalog

SYSPROCAUTH

Column Name  Type       Explanation
grantor      char(8)    grantor of procedure privilege
grantee      char(8)    grantee (receiver) of procedure privilege
procid       serial     procedure identifier
procauth     char(1)    type of permission granted: e=execute, E=execute plus the ability to grant execute to others

The sysprocauth table describes the privileges granted on a stored procedure. It contains one row for each set of privileges granted.

The System Catalog D-33

SYSTRIGGERS

Column Name  Type       Explanation
trigid       SERIAL     trigger ID
trigname     CHAR(18)   trigger name
owner        CHAR(8)    owner of trigger
tabid        INT        ID of triggering table
event        CHAR(1)    triggering event
old          CHAR(18)   name of value before update
new          CHAR(18)   name of value after update
mode         CHAR       reserved for future use

The systriggers table holds information about the trigger.
trigid    Holds the unique id of the trigger. It is the unique key of the table.
trigname  Is the trigger name given in the CREATE TRIGGER statement.
owner     Is the login of the trigger owner.
tabid     Is the id of the table that is part of the trigger event. It corresponds to tabid in systables.
event     Is the triggering event: I = INSERT, U = UPDATE, D = DELETE.
old       Is the correlation name specified in the CREATE TRIGGER statement after the OLD keyword.
new       Is the correlation name specified in the CREATE TRIGGER statement after the NEW keyword.
mode      Is not used at this time.

D-34 The System Catalog

Indexes
Indexes are on the following columns:
n trigid
n trigname and owner

The System Catalog D-35

SYSTRIGBODY

Column Name  Type       Explanation
trigid       INT        trigger ID
datakey      CHAR       type of data
seqno        INT        sequence number
data         CHAR(256)  English text or code

The systrigbody system catalog table holds the actual code that is used by the database server to execute the trigger.
trigid   Is the trigger id that corresponds to trigid in the systriggers table.
datakey  Is the type of data contained in the data column:
         D - English text for the first half of the CREATE TRIGGER statement
         A - English text for the second half of the CREATE TRIGGER statement
         H - linearized code for the header
         S - linearized code for the symbol table
         B - linearized code for the body
seqno    Is the sequence number used to order the data columns. It is unique relative to the trigid and datakey columns.
data     Is the actual English text or code.
The index for systrigbody is on trigid, datakey, and seqno.

D-36 The System Catalog

SYSBLOBS

Column Name  Type       Explanation
spacename    char(18)   name of the blobspace, dbspace, or optical family
type         char(1)    media type (M=magnetic, O=optical)
tabid        integer    table identifier
colno        smallint   column number

The sysblobs table specifies the storage location of a BLOB column. It contains one row for every BLOB column in the table. A BLOB can be stored in one of the following locations:
n dbspace: BLOBs that are stored in the table are in the same dbspace as the table is assigned to.
n blobspace: BLOBs can be assigned their own blobspace.
n family: BLOBs stored on optical media are assigned to a family. A family is a group of volumes.

The System Catalog D-37

SYSOPCLSTR

Column Name  Type       Explanation
owner        char(8)    owner of the cluster
clstrname    char(18)   name of the cluster
clstrsize    integer    size of the cluster
tabid        integer    table identifier
blobcol1     smallint   blob column number 1
...
blobcol16    smallint   blob column number 16
clstrkey1    smallint   cluster key number 1
...
clstrkey16   smallint   cluster key number 16

The sysopclstr table identifies the optical clusters that have been created using the CREATE OPTICAL CLUSTER statement. An optical cluster consists of BLOB columns and a cluster key.
n BLOB columns are the columns that will be clustered on the same volume. The BLOB columns must be on the same table. Up to 16 columns are allowed.
n A cluster key is the key that will separate BLOBs into clusters. It can be a composite key of up to 16 columns. The cluster key must be in the same table as the BLOB columns in the cluster.

D-38 The System Catalog

SYSROLEAUTH

Column Name   Type             Explanation
rolename      char(user size)  name of role
grantee       char(user size)  grantee (receiver) of role
is_grantable  char(1)          Y=grantable, N=not grantable

The sysroleauth table describes the roles that are granted to users. It contains one row for each role that is granted to a user. The is_grantable column indicates whether the role was granted with the WITH GRANT OPTION on the GRANT statement. Roles are available beginning with IBM Informix-OnLine Dynamic Server version 7.10.UD1. When a role is created, a row is added to the sysusers table. When a role is granted to a user, a row is added to the sysroleauth table. The appropriate row is deleted from the sysroleauth table when a role is revoked from a user.
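For example, a sequence such as the following (the role name sales and user name mary are placeholders) creates a role, grants it, and then inspects sysroleauth:

CREATE ROLE sales;
GRANT sales TO mary WITH GRANT OPTION;
SELECT rolename, grantee, is_grantable FROM sysroleauth;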

The System Catalog D-39

SYSOBJSTATE

Column Name  Type       Explanation
objtype      char(1)    object type: C=constraint, I=index, T=trigger
owner        char(8)    owner of the database object
name         char(18)   name of the database object
tabid        integer    table identifier upon which the object is defined
state        char(1)    object mode: D=disabled, E=enabled, F=filtering (without integrity-violation error), G=filtering (with integrity-violation error)

The sysobjstate system catalog table stores information about the state (object mode) of database objects. The database objects which have multiple modes are constraints, indexes, and triggers. These are the only database objects tracked within the sysobjstate table. When a user creates an object, a row is added to the sysobjstate table. Objects created by the database server, such as indexes to support referential integrity, are not listed in the sysobjstate table because their object mode cannot be changed. An object may be set to disabled, enabled, or filtering mode (filtering is not available for triggers, or for indexes that allow duplicates). If an object is set to filtering, violations are stored in the violations and diagnostics tables.
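As a sketch of how object modes are changed (the table name items is a placeholder, and the exact SET syntax should be confirmed for your server version), you might disable the indexes on a table, set its constraints to filtering, and then check sysobjstate:

SET INDEXES FOR items DISABLED;
SET CONSTRAINTS FOR items FILTERING;
SELECT o.objtype, o.name, o.state
FROM sysobjstate o, systables t
WHERE t.tabname = "items"
AND o.tabid = t.tabid;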

D-40 The System Catalog

SYSVIOLATIONS

Column Name  Type       Explanation
targettid    integer    table ID of the base table
viotid       integer    table ID of the violations table
diatid       integer    table ID of the diagnostics table
maxrow       integer    maximum number of rows that can be inserted into the diagnostics table in a single operation

When a START VIOLATIONS FOR TABLE tablename command is executed, a row is inserted into the sysviolations system catalog table. The row is deleted when the STOP VIOLATIONS FOR TABLE tablename command is issued. A START VIOLATIONS command will begin error logging against database objects which are in enabled or filtering mode. Errors are recorded in two tables: a violation table and a diagnostic table. The entry which is written to the sysviolations table keeps track of the association between the base table and the violation and diagnostic tables. It also records any limits which are specified for the number of entries to be stored per operation in the violations table. For example:
START VIOLATIONS FOR TABLE items MAX ROWS 1000;

The System Catalog D-41

System Catalog Summary


System Catalog tables:
n Store data about the database
n Serve as a good source of information about the database environment

The system catalog tables are used to store data about the database. They are useful in finding out about the characteristics of a particular database.

D-42 The System Catalog

Appendix E
System Monitoring Interface

System Monitoring Interface 09-2001 2001 International Business Machines Corporation

E-1

The Sysmaster Database


n sysdatabases - Databases in the IDS system
n systabnames - Tables within a database
n syslogs - Logical log information
n sysdbspaces - Dbspace information
n syschunks - Chunk information
n syslocks - Lock information
n sysvpprof - VP information
n syssessions - Session information
n syssesprof - Session level profile information
n sysextents - Extent information
n syschkio - I/O statistics by chunk
n sysptprof - Tblspace profile information
n sysprofile - System profile information

The sysmaster database consists of over 50 tables. Of these tables, only some tables and some views are supported and documented by IBM Informix. For your protection, only the supported tables and views should be used in any programs as the unsupported tables may change between any IDS release. Some of the supported tables and views are:
n The sysdatabases table lists databases, their owners, and the characteristics of each database.
n The systabnames table contains the names of all tables in the IDS system. To retrieve all tables in a database, run:
SELECT tabname FROM systabnames WHERE dbsname = db_name

n The syslogs view contains information about the logical logs. You can use syslogs to determine whether the logs need to be backed up. If size = used, the log is full.
n The sysdbspaces view contains information about dbspaces.
n The syschunks view contains the chunks in the IDS system. The nfree column shows the number of pages in the chunk that are free (see the example query after this list).
n The syslocks view lists all active locks.
n The sysvpprof view contains all the active virtual processors.
n The syssessions view lists information about each session.
n The syssesprof view contains more information about each session.

E-2 System Monitoring Interface

n The sysextents view lists extents allocated in the IDS system.
n The syschkio view contains I/O statistics by chunk.
n The sysptprof view lists information about the tblspaces at any one point in time. Only tables currently being used are listed in this view. Once the last user closes the table, the tblspace structure in shared memory is freed, and any profile statistics are lost.
n The sysprofile view lists certain events in the IDS system, such as disk reads, disk writes, rollbacks, and checkpoints. Each row contains one profiled event and its value.
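As an example of the kind of monitoring query these views support, the following sketch (run while connected to the sysmaster database) reports total and free pages per dbspace; the column names used are those listed on the following pages:

SELECT d.name dbspace_name, SUM(c.chksize) pages_total, SUM(c.nfree) pages_free
FROM sysdbspaces d, syschunks c
WHERE d.dbsnum = c.dbsnum
GROUP BY 1
ORDER BY 1;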

System Monitoring Interface E-3

How SMI Works

SELECT * from syslocks


IDS recognizes syslocks as an SMI table, and reads from shared memory instead of disk/buffer pool.

Shared Memory

When a SELECT statement is executed on a regular table, IDS reads the data dictionary information for the table to find the partition number and other information about the table. Then it will access the data from disk, if it is not in the buffer pool. When a SELECT statement is executed on an SMI table, IDS still reads the data dictionary information for the table listed in the SELECT statement. The SMI tables have a special partition number (the dbspace number within the partition number is 0). When IDS detects the special partition number, it knows to read a specific set of data in shared memory to satisfy the query. Because the SELECT statement is accessing real-time data in shared memory, the data between one SMI table and another may not be synchronized.

E-4 System Monitoring Interface

sysdatabases
Column        Description
name          Name of the database
partnum       Tblspace id for the database
owner         Login name of the creator
created       Date created
is_buff_log   1 if buffered logging
is_ansi       1 if ANSI mode
is_nls        1 if NLS enabled
flags         Flags for logging mode

systabnames
Column        Description
partnum       Tblspace id for the table
dbsname       Database name
owner         Login name of the creator
tabname       Table name
collate       Locale to use for collation (for NLS)

syslogs
Column        Description
number        Log number
uniqid        Unique id of the log
size          Log size in pages
used          Pages used in log
is_used       1 if log is used
is_current    1 if log is the current log
is_backed_up  1 if log has been backed up
is_new        1 if log has been added since the last archive
is_archived   1 if log has been placed on the archive tape
is_temp       1 if log is flagged as a temporary log file
flags         Current state of the log

sysvpprof
Column        Description
vpid          Virtual Processor id number
class         VP class
usercpu       Seconds of user CPU consumed
syscpu        Seconds of system CPU consumed

sysdbspaces
Column        Description
dbsnum        Dbspace number
name          Dbspace name
owner         Login name of the creator
fchunk        Primary chunk number
nchunks       Number of chunks in the dbspace
is_mirrored   1 if dbspace is mirrored
is_blobspace  1 if dbspace is a blobspace
is_temp       1 if dbspace is a temporary dbspace
flags         Current state of the dbspace

syslocks
Column        Description
dbsname       Database the lock is held on
tabname       Table name the lock is held on
rowidlk       Rowid the lock is on (0 means table lock)
keynum        Key number the lock is on
type          Type of lock (IS, S, IX, SIX, X, ...)
owner         Session id of the lock owner
waiter        Session id of the first waiter on the lock

System Monitoring Interface E-5

syschunks
Column           Description
chknum           Chunk number
dbsnum           Dbspace number
nxchknum         Next chunk in this dbspace
chksize          Chunk size in pages
offset           Offset into the device in pages
nfree            Number of free pages in the chunk
is_offline       1 if the chunk is offline
is_recovering    1 if the chunk is being recovered
is_blobchunk     1 if the chunk belongs to a blobspace
is_inconsistent  1 if the chunk is part of a logical recovery
flags            Flags indicating the state of the dbspace
fname            Chunk path name
mfname           Path name of mirror (if it exists)
moffset          Offset of the mirror chunk in pages
mis_offline      1 if mirror is off-line
mis_recovering   1 if mirror is being recovered
mflags           Current state of the mirror chunk

sysextents
Column           Description
dbsname          Database name
tabname          Table name
start            Physical page address of start of extent
size             Number of pages in the extent

sysptprof
Column           Description
dbsname          Database name
tabname          Table name
partnum          Tblspace number
lockreqs         # Lock requests for table
lockwts          # Lock waits for table
deadlks          # Deadlocks for table
lktouts          # Lock timeouts for table
isreads          # Read function calls
iswrites         # Write function calls
isrewrites       # of updates
isdeletes        # Deletes for this table
bufreads         Buffer reads for this table
bufwrites        Buffer writes for table
seqscans         Sequential scans for table
pagreads         # Page reads
pagwrites        # Page writes

syschkio
Column           Description
chknum           Chunk number
reads            Number of physical reads
pagesread        Number of pages read
writes           Number of physical writes
pageswritten     Number of pages written
mreads           Number of physical reads from mirror
mpagesread       Number of pages read from mirror
mwrites          Number of physical writes to mirror
mpageswritten    Number of pages written to mirror

E-6 System Monitoring Interface

syssessions
Column             Description
sid                Session id
username           Login of user
uid                User id of the client process
pid                Process id of the client
hostname           Host name of the client
tty                tty name of the client
connected          Time that the user connected (in UNIX time() format)
feprogram          Application program running as the client (not available in this release)
pooladdr           Session pool address
is_wlatch          1 if waiting on a latch (mutex)
is_wlock           1 if waiting on a lock
is_wbuff           1 if waiting on a buffer
is_wckpt           1 if waiting on a checkpoint
is_wlogbuf         1 if waiting on a log buffer
is_wtrans          1 if waiting on a transaction
is_monitor         1 if session is an IDS monitor process
is_incrit          1 if primary thread is in a critical section
state              Current state of the user session

syssesprof
Column             Description
sid                Session id
lockreqs           Number of locks requested
locksheld          Number of locks currently held
lockwts            Number of times session waited for a lock
deadlks            Number of times a deadlock was detected
lktouts            Number of remote deadlock timeouts
logrecs            Number of log records for this session
isreads            Number of row reads (includes catalog)
iswrites           Number of row writes
isrewrites         Number of row rewrites (updates)
isdeletes          Number of row deletes
iscommits          Number of commits
isrollbacks        Number of rollbacks
longtxs            Number of long transactions by this session
bufreads           Number of reads requiring disk access
bufwrites          Number of page modifications
seqscans           Number of sequential scans
pagreads           Number of page reads
pagwrites          Number of page writes
total_sorts        Number of total sorts
dsksorts           Number of sorts not done in memory
max_sortdiskspace  Maximum space used by a sort
logspused          Log space used by this transaction (bytes)
maxlogsp           Maximum log space used by this sid (bytes)

sysdri
Column             Description
type               DR server type
state              DR server state
name               DR server name
intvl              DR buffer flush interval
timeout            Network timeout (sec)
lostfound          Lost/found file path name

System Monitoring Interface E-7

sysconfig
Column        Description
cf_id         Unique numeric identifier
cf_name       Configuration parameter name
cf_flags      Flags
cf_original   Value in ONCONFIG at boot-time
cf_effective  Value effectively in use
cf_default    Value by default

sysprofile
Column    Description
name      Name of profiled event
srtspmax  Maximum disk space required by a sort
totsorts  Total number of sorts performed
value     Value of profiled event

sysseswts
Column    Description
sid       Session id
reason    Description of reason for wait
numwaits  Number of waits for this reason
cumtime   Cumulative time waited for this reason in microseconds
maxtime   Maximum time waited during this session for this reason

E-8 System Monitoring Interface

Appendix F
Working With IBM Informix Customer Support

Working With IBM Informix Customer Support 09-2001 2001 International Business Machines Corporation

F-1

Objectives
At the end of this module, you will be able to: n Understand the Problem Resolution Process n Work efficiently with an IBM Informix Support Engineer n Assign the appropriate priority status to your case n Remain informed of product updates, enhancements, and defects n Escalate problem cases n Plan for new versions of IBM Informix products

F-2 Working With IBM Informix Customer Support

IBM Informix Customer Support

Working to ensure: n Immediate connection to a Customer Support Engineer n Quick problem resolution

Whenever you contact IBM Informix for technical support, you expect your case to be resolved effectively and quickly. IBM Informix Customer Support understands that. IBM Informix constantly measures performance against stated service metrics in areas such as:
n Time to reach an engineer
n Time to resolve a case
n The number of OpenLine calls answered in under one minute

Remember: the engineer on the other end of the phone or e-mail is there to get you back on your way. This module is designed to help you work more effectively with IBM Informix Customer Support whenever you have a problem.

Working With IBM Informix Customer Support F-3

The Customer Service Handbook

The Customer Services Handbook provides: n Time-saving worksheets n Instructions for setting case priorities n Important phone numbers n ACD Phone Menu options

Read the Customer Services Handbook


If available for your region, an IBM Informix Customer Services Handbook is your reference tool for working with the IBM Informix technical support organization. In addition to explaining IBM Informix's support, training, and consulting offerings, the handbook also contains:
n Worksheets for identifying problems
n Instructions for setting case priorities
n Important phone numbers
n Automatic Call Distribution (ACD) phone menu options (OpenLine)

In the Americas, IBM Informix automatically mails new OpenLine customers a welcome kit that contains the Customer Services Handbook. In North America, you can get additional copies of the handbook by calling IBM Informix Customer Services at 1 800 274 8184 and selecting Option 3. Latin American customers can call 1 800 550 8284 using AT&T Direct. An electronic version of the handbook is also available at IBM Informix's on-line support site, www.informix.com/techinfo. IBM Informix-OpenLine is IBM Informix's basic telephone-based support offering, enabling customers to reach IBM Informix Customer Support about technical problems.

F-4 Working With IBM Informix Customer Support

Characterizing Your Problem


The Problem Identification Worksheet ensures you have:
n IBM Informix product and version information
n IBM Informix serial number
n Platform, OS, and OS version information
Additional questions to ask yourself:
n Can I reproduce the problem on demand?
n Why does the problem occur? Possible causes?
n What recent changes may have caused the problem?
n Can I produce a simple test case? Can I reproduce using the IBM Informix demonstration database?

Begin to characterize the problem before you call
The best start you can give to the case resolution process is to gather as much information as possible about your problem before you call. First, the Customer Support Engineer requires system information: the IBM Informix product and version exhibiting the problem, the IBM Informix serial number, and the platform (machine model), the OS, and the OS version on which the product is running. There is a Problem Identification Worksheet in the IBM Informix Customer Services Handbook you can use to ensure you have captured important and relevant diagnostic information. Having answers to these additional questions can speed up diagnosis and resolution:
n Can I reproduce the problem on demand?
n Have I thought about why this problem occurs? What are the possible causes? Can I isolate them and test them individually?
n What has changed that caused the problem? Think about possible changes in the system, network, database, and application.

Working With IBM Informix Customer Support F-5

n Have I created the simplest test case that will reproduce the problem? Does it reproduce on the stores demonstration database? Does it reproduce with only simple SQL, instead of requiring complex custom code, applications, or tables?

F-6 Working With IBM Informix Customer Support

Automatic Call Distribution (ACD)

OpenLine customers use the ACD phone menu option. When calling, ensure that you have your support entitlement identifier, which could be one of the following: n Support Contract Number n Product Serial Number n Support Access Number

Select the correct product from the ACD menu When you call an IBM Informix call center to open a technical support case, you may be greeted by an Automatic Call Distribution (ACD) menu. Select the appropriate product or category about which you are calling. Your call will be routed directly to an available Customer Support Engineer. The ACD menu is documented in the Customer Support Handbook . For the most current ACD menu, please visit the TechInfo Center Web site at www.informix.com/techinfo. At any time, you may select a menu option to direct your call to an IBM Informix customer service representative or you may ask to speak with a manager.

Tip
Have your support entitlement identifier ready. This may be a support contract number, product serial number, or support access number (SAN). The support engineer will ask you for this information before they begin to address your problem.

Working With IBM Informix Customer Support F-7

The Problem Resolution Process


1. Contact and Qualify
2. Characterize
3. Diagnose
4. Classify
5. Bug Resolution
6. Case Resolution

Know the six steps in the Problem Resolution Process
IBM Informix developed the Problem Resolution Process as a standard method for resolving all technical support cases. This process is now part of IBM Informix's ISO 9000-based quality system. Knowing the six steps in the Problem Resolution Process will help you understand the stages that your case goes through, from your initial call to final resolution. Throughout the resolution process, check in from time to time with the support engineer who is working on your case. This will assist the support engineer in gathering any additional information that may be useful in resolving your problem. The six steps of the Problem Resolution Process are:
1. Contact and Qualify
2. Characterize
3. Diagnose
4. Classify
5. Bug Resolution
6. Case Resolution

These steps are detailed on the following pages.

F-8 Working With IBM Informix Customer Support

1. Contact and Qualify


Customer Support will ask for your name, get information on how to contact you after the initial call, and request your support entitlement identifier. You and the support engineer will work together to assess the priority of your case. The priority is based on the impact of the problem on your business operations.

2. Characterize
Give the Customer Support Engineer the details of the difficulty at hand. The engineer may ask questions to assist in getting the details needed. It may be necessary in this phase to provide the engineer with a test case that will reproduce the problem.
n The test case should represent only the problem at hand. To reduce the time needed in the diagnostic phase, please reduce the test case to only the steps needed.
n Include everything needed to execute the test case. Attempt to reproduce the problem using the stores demonstration database. If this is not possible, the database schema and test data will need to be included as part of the test case.

3. Diagnose
Your participation may or may not be required during this phase. If a test case was sent, the test case will be examined. However, many times, changes to the environment of your machine may be needed to correct the problem. These types of problems are generally specific to your machine, and testing will need to be done on your machine, consisting mostly in changes to environment settings.

4. Classify
This step is usually the last one. This is where the answer to the problem is found. This may be a workaround; for example, how to correctly configure the environment for what you are attempting to do.

5. Bug Resolution
If a bug (product defect) is found:
n If the defect is known and a fix has already been completed, you can be issued an interim release or a patch.
n If the defect is new, it is submitted to Research and Development for a fix.

Hint: When a fix is requested, the case can take additional time to close.

6. Case Resolution
Working With IBM Informix Customer Support F-9

This is the verification process. The Customer Support Engineer ensures that this issue is closed. Customer Support does this by getting verbal agreement from the customer to close the case.

Tip
Make sure that you give the IBM Informix engineer working on resolving your case a reliable way to communicate with you. Perhaps you have a mobile phone, pager, or a colleague's phone number you could give to the Customer Support Engineer, in case they need to reach you quickly. You can also request that the Customer Support Engineer send you periodic status updates via e-mail.

F-10 Working With IBM Informix Customer Support

Assigning Priority Settings


n Priority Four: Priority Four cases are typically how-to questions.
n Priority Three: A Priority Three case can be an error code recovery.
n Priority Two: Priority Two cases, such as poor database performance or production application errors during runtime, can negatively impact your business but have not brought you to a standstill.
n Priority One: A Priority One case means a crisis has occurred: your system is down, a major operational function is unavailable, or a critical interface has failed.

Assign the proper priority If the database world were to have an equivalent of life or death situations, priority setting would be it. IBM Informix Customer Support tracks customer cases by the priority level you set. Priority levels are defined by the business impact a problem has on your operations. If you do not set a priority for your case, your case priority will automatically be set to a default, or Priority Three case. IBM Informix Customer Support Engineers can help you determine the priority level of your problem and will attempt to characterize, resolve, or provide a workaround in a time frame that limits the impact on your business. It is important that you understand the priority levels when you open a case with Customer Support. When available for the region, customers should review the Customer Services Handbook for specific information about setting priority levels. They are defined as follows:
n Priority Four: Priority Four cases are typically how-to questions.
n Priority Three: A Priority Three case can be an error code recovery.
n Priority Two: Priority Two cases, such as poor database performance or production application errors during runtime, can negatively impact your business but have not brought you to a standstill.

Working With IBM Informix Customer Support F-11

n Priority One: A Priority One case means a crisis has occurred: your system is down, a major operational function is unavailable, or a critical interface has failed.

Important!
If you have a down system, call your IBM Informix Customer Support Center to speak with an emergency support engineer about your critical issues.

F-12 Working With IBM Informix Customer Support

Dial-Up Access and Confidentiality Agreement

The Dial-Up Access and Confidentiality Agreement grants IBM Informix permission to directly access your system in an emergency or temporary situation


Sign a Dial-Up Access and Confidentiality Agreement The Dial-Up Access and Confidentiality Agreement grants IBM Informix permission to directly access your system in an emergency or temporary situation: for example, to perform root cause analysis, observe a problem in real time, or fix a problem. To conduct a dial-up analysis, Customer Services must have a signed Dial-Up Access and Confidentiality Agreement on file. In most cases, this agreement requires legal review and approval. To facilitate rapid response in a Priority One situation, please request and complete a form from your local IBM Informix support office. The Dial-Up Access and Confidentiality Agreement form can also be found in the IBM Informix Customer Services Handbook.

Working With IBM Informix Customer Support F-13

Extended Hours Support


24 x 7 Emergency Recovery Service
n The Customer Support Engineer's primary goal is to bring your database server on-line.
n Diagnostics are performed during regular business hours.
After-Hours Support
n A Customer Support Engineer is available for all problems, including diagnostic work.

Ensure you have the right extended hours support coverage
IBM Informix call centers are open during regular business hours. If you are running a mission-critical application, or you anticipate special support needs outside of normal business hours, you should consider an extended hours support option. Extended hours support is like an insurance policy for your database. Each year, you should evaluate your needs for technical support outside of business hours. Perhaps you have some development projects or some migration work you will be doing. Don't get stuck without the support you need when the call center is closed.

Tip
The North American call centers are open 7 AM to 7 PM CST. The Latin American call centers are open 7 AM to 7 PM EST.

F-14 Working With IBM Informix Customer Support

Extended Hours support options:


n 24 x 7 Emergency Recovery Service
In order to reach support engineers who handle only down production system emergencies during non-business hours, you should consider the 24 x 7 Support option. If you contract for 24 x 7 Support, you open emergency cases outside of IBM Informix business hours by calling a toll-free telephone number provided in your contract. With 24 x 7 Support, the Customer Support Engineer has one main goal: to bring your database server instance back online. Diagnostic follow-up occurs during normal business hours.
IBM Informix customers with 24 x 7 Support coverage call an emergency telephone number that automatically routes their down system call using a model called Follow-the-Sun. The Follow-the-Sun call-routing model is a system IBM Informix developed to automatically answer the call in an open support hub, one of four worldwide. In order to ensure that each of the hubs has up-to-the-minute access to your support records, all IBM Informix support locations use one common case-tracking system. This enables IBM Informix to seamlessly hand over cases, if required, to the next hub in the Follow-the-Sun chain.
When you initiate a 24 x 7 Support contract, you will be provided with an additional Support Access Number (SAN). When you dial the 24 x 7 Support telephone number in order to resolve an emergency production situation, the support engineer who answers your call will first ask you for your 24 x 7 SAN in order to verify that you are eligible for 24 x 7 Support. In order for the support engineer to resolve your critical situation as quickly as possible, 24 x 7 Support customers are highly encouraged to maintain a signed Dial-Up Access and Confidentiality Agreement on file with IBM Informix Customer Services.
n After-Hours Support
After-Hours Support is designed for customers that require extended support beyond the normal business hours for a specific project or a short period of time, including IBM Informix holidays and weekends. An After-Hours Support agreement is supplementary to any support agreement you have with IBM Informix. After-Hours Support is more flexible than 24 x 7 Emergency Recovery Service support, as support can include diagnostic work. After-Hours Support must be requested prior to use. If you request after-hours support more than one business day in advance, a discount is applied.
n Premier Support Services
IBM Informix offers two Premier Support Services: Regency Services and IBM Informix-Enterprise Support.

Working With IBM Informix Customer Support F-15

The services include access to a dedicated Account Manager, who manages the relationship between the customer and IBM Informix support by delivering escalation of key issues, problem avoidance through pro-active support, coordination of IBM Informix resources for problem resolution, and case management. Regency Services offers multiple tiers to match the customer's needs. The entry level service provides case status reports, regular reports of urgent technical information, and management of critical issues. Higher levels of service provide more of the account manager's time, support for multiple projects, and bundled services such as training and on-site consulting. IBM Informix-Enterprise support offers similar account management services, plus a dedicated advanced support engineer at the customer site who provides the most responsive level of technical support including diagnostic and system recovery services. Further details can be found in the IBM Informix Customer Services Handbook or web site (http://www.informix.com/informix/services/csp/sup_bro/custsupp.htm). Contact your local IBM Informix Customer Support Manager to discuss these services.

F-16 Working With IBM Informix Customer Support

TechInfo
The TechInfo center provides electronic (web-based) access to:
n technical alerts
n special support offerings
n product release information
n defect reports, and
n lifecycle information for planning purposes.
You can also use TechInfo Center to open a technical support case.

Log in to TechInfo Center regularly
All IBM Informix support customers and ICPP professionals are entitled to access TechInfo Center, IBM Informix's on-line technical support and information service, at www.informix.com/techinfo. In TechInfo Center you have electronic access to a wealth of information, including:
n technical alerts
n special support offerings
n product release information
n defect reports, and
n lifecycle information for planning purposes.

You can also use TechInfo Center to open a technical support case. You can set up access to TechInfo Center by completing the TechInfo Center Order Form in the North America Customer Support Handbook . You can also select enroll from the menu options at www.informix.com/techinfo and complete the form on-line.

Working With IBM Informix Customer Support F-17

Tech Notes and CS Times


Tech Notes
n For users who wish to keep up to date on the latest trends and technical information about IBM Informix products
CS Times
n Brings IBM Informix support customers the latest information about Customer Services programs and products

Read Tech Notes and CS Times

Both Tech Notes and CS Times are published by IBM Informix Customer Services for customers. Tech Notes is IBM Informix's quarterly technical journal for users who wish to keep up to date on the latest trends and technical information about IBM Informix products. CS Times is a quarterly newsletter designed to bring IBM Informix support customers the latest information about Customer Services programs and products, as well as relevant company and industry information. Printed versions are mailed quarterly to support subscribers on record with IBM Informix Customer Services. On-line versions can be accessed at any time from the TechInfo Center Web site at www.informix.com/techinfo.

F-18 Working With IBM Informix Customer Support

Note
Tech Notes is currently looking for article submissions from experienced technical professionals. If you have an idea for an article, contact cspubs@informix.com.

Working With IBM Informix Customer Support F-19

Case Escalation
The case escalation process may be instigated for a number of reasons, including:
n need for additional resources
n requirement for a different skill set
n justified concern that the case is not being handled effectively
n a change in the scope of responsibility needed for case resolution, or
n an internal shift of the case for load balancing reasons.


Escalate if necessary

Cases may be escalated by you, a support engineer, or a support manager. The case escalation process may be instigated for a number of reasons, including:
n need for additional resources
n requirement for a different skill set
n justified concern that the case is not being handled effectively
n a change in the scope of responsibility needed for case resolution, or
n an internal shift of the case for load balancing reasons.

If you choose to escalate a case, you should call your local IBM Informix support location and ask to speak with a manager. Please have the specific case number ready and a clear description of your reason for escalation. You may re-open a case if the recommendation or workaround fails to satisfy your requirements. To re-open a case, phone Customer Support and give the support engineer the relevant case number.

F-20 Working With IBM Informix Customer Support

The Latest and Greatest Version


Upgrade for success!
n Patches and fixes are only available to customers running the latest or the previous maintenance/enhancement release
n Upgrades are free to all current maintenance customers

Keep your IBM Informix versions current

To keep your system running as smoothly as possible, please install version upgrades as often as you can. Version upgrades benefit you by helping you to avoid any problems that IBM Informix has already resolved for other customers. Best of all, upgrades are free to all current maintenance customers. Installing product upgrades also entitles you to request product patches and fixes. The IBM Informix Maintenance Delivery policy restricts the availability of patches and fixes to only those customers who are running the latest or the previous maintenance/enhancement release. For example, consider the release of IBM Informix Dynamic Server 7.31. Patches and fixes for the previous maintenance/enhancement release (version 7.30 in this example) are only available to customers for 12 months after the maintenance/enhancement release (version 7.31) is made available on a particular platform.

Working With IBM Informix Customer Support F-21

Use the Product Lifecycle for planning purposes

Planning in advance to move to the latest major or enhancement release of a product, such as version 7.2 and 7.3 and version 7.30 and 7.31 respectively, also keeps the maintenance of your IBM Informix products current. IBM Informix maintains the Product Lifecycle Policy and corresponding product matrix in TechInfo Center on the Web. The product matrix, which is updated every January and July, reflects IBM Informix's plans to sell, enhance, and retire its products. Please refer to the Product Lifecycle section of TechInfo Center for more details about the Maintenance Delivery and Product Lifecycle Policies, or for the latest product lifecycle matrix.

F-22 Working With IBM Informix Customer Support

Appendix G
Using Global Language Support

Using Global Language Support 09-2001 2001 International Business Machines Corporation

G-1

Objectives
At the end of this module, you will be able to:
n Set environment variables necessary for Global Language Support
n List the components of a locale
n Use the NCHAR and NVARCHAR data types
n Explain the effect of collation sequence on various SQL statements

G-2 Using Global Language Support

Global Language Support


Global Language Support (GLS) provides support for:
n International characters (non-ASCII)
n Localized collation sequence
n National currency symbols and format
n Local date format
n Local time format
n Code set conversion

Global Language Support (GLS) provides support for international cultural and language conventions. This feature is available starting with the 7.2 release of IBM Informix Dynamic Server. Using GLS:
n All user-specifiable objects such as tables, columns, views, statements, cursors, and variables, may be identified with national code sets including multibyte code sets.
n You have the option of using a localized collation sequence by using the data types NCHAR and NVARCHAR instead of CHAR and VARCHAR. The localized collation sequence is used in the ORDER BY and GROUP BY clauses of the SELECT statement and when an index is created on an NCHAR or NVARCHAR column. It is also used in the WHERE clause whenever logical, relational, or regular expression operators are used.
n Monetary formats can be used which reflect the language or cultural specifics of a country or culture outside the U.S.
n Different code sets can be specified for client applications, the database, and the database server. A process called code set conversion translates the characters passed from one locale to another.

Using Global Language Support G-3

What is a Locale?
A locale is a language environment composed of:
n A code set
n A collation sequence
n A character classification
n Numeric (non-money) formatting
n Monetary formatting
n Date and time formatting
n Messages
Define a locale with an environment variable. For example:
setenv CLIENT_LOCALE ja_jp.sjis

A GLS locale represents the language environment for a specific location. It contains language specifications as well as regional and cultural information. A locale consists of a code set, a collation sequence, formatting specifications for numeric, money, date, and time values, and message definitions. You can define separate locales for the client application, the database, and the database server. The three environment variables which you can set are:
n CLIENT_LOCALE
n DB_LOCALE
n SERVER_LOCALE

The specification of a locale defines the GLS behavior. No other flags or environment variables need to be set. The default locale for either the application, database, or database server is US 8859-1 English (en_us.8859-1).

Locale Naming Convention

A locale name is composed of the following set of identifiers: language, territory, and code set. The language and the territory identifiers are each two characters separated by an underscore. The code set identifier is the suffix and is prefaced with a period. The client and database locales must be the same (unless there is code set conversion, which is explained in the following

G-4 Using Global Language Support

pages). The optional, 4-character modifier specifies an override collation sequence such as phone or dictionary (phon or dict). Acceptable modifiers are listed along with the locale names when you run glfiles (see following pages). The locale name syntax is as follows:
LocaleName:   language_territory.code_set[@modifier]
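
For example, assuming a French locale and a dictionary-collation modifier are installed on your system (run glfiles to confirm which locales and modifiers are actually available), the parts of the name break down as follows:

setenv DB_LOCALE fr_fr.8859-1        {language fr, territory fr, code set 8859-1}
setenv DB_LOCALE fr_fr.8859-1@dict   {the same locale with the dict collation modifier}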

Messages

If you desire error and warning messages in a language other than English, install an IBM Informix Language Supplement for a particular language. To reference pre-existing error messages in non-GLS directories, use the DBLANG environment variable.

Using Global Language Support G-5

A Locale Specifies a Code Set


ASCII Code Set example:

ASCII Codes (decimal):   65   67   77   69   32   67   111
ASCII Symbol:             A    C    M    E         C     o

All data accessed by computers is represented by a series of 1s and 0s. Non-binary data, such as a letter or symbol, must have a unique binary code to be recognized by the computer. A set of character codes used to represent all the characters and symbols in a language is called a code set. A code set is a mapping of characters to their binary representations. The ASCII code set is composed of 128 symbols including lower and upper case letters, digits 0-9, and various additional symbols such as {,/,+, and (. One byte of storage is required for every character in the ASCII code set. Since the maximum number of symbols which can be represented by one byte is 256, many languages are able to extend the ASCII code set beyond the standard 128 symbols and still use only one byte of storage per character. A locale specifies a particular code set. The default code set used by IBM Informix databases is ISO8859-1 (ASCII is a subset of ISO8859-1). The ISO8859-1 code set utilizes 8 bits, whereas ASCII utilizes 7 bits.

G-6 Using Global Language Support

Multibyte Code Sets


[Diagram: logical character representation vs. physical storage representation, showing one logical character stored as the two bytes A1 A2 and another stored as the three bytes C1 C2 C3]

One byte of storage can represent a maximum of 256 symbols. If a code set defines more than 256 characters, some characters require more than 1 byte of storage. These are referred to as multibyte code sets. Some Asian languages use thousands of characters, some of which require 2 or 3 bytes of storage. IBM Informix GLS supports multibyte code sets using up to 4 bytes of storage per symbol. In an environment that uses multibyte code sets, character strings may contain a mixture of single-byte and multibyte characters. All character data types (CHAR, VARCHAR, NCHAR, and NVARCHAR) can accommodate multibyte characters. The diagram above illustrates the physical storage of multibyte characters and the corresponding logical characters. For example, A1A2 represent the first and second bytes of a logical character which we can designate as A. B and D are single-byte characters, and C is a multibyte character that requires 3 bytes of storage. Most IBM Informix utilities (e.g. onstat and dbschema) and application programming interfaces (ESQL/C, ESQL/COBOL) have been updated to be compatible with multibyte code sets.

Using Global Language Support G-7

Using Multibyte Code Sets


3 logical characters and 6 bytes of storage:

Byte position:   1    2    3    4    5    6
Byte stored:     A1   A2   B1   B2   C1   C2

n Column length: must accommodate physical length
  CREATE TABLE gls_test( multi_col CHAR(6) ...)
n Substring designators: operate on physical length
  multi_col[1,2] = displays logical character A
  multi_col[2,4] = displays logical character B

Column lengths, substring offsets, and substring lengths are defined in terms of the physical number of bytes, not the logical number of characters. When defining column length in a multibyte code set environment, make allowances for the maximum number of bytes each character can require. If the multibyte maximum is 2 bytes per character, the maximum length for any of the character data types would be:
2 * (maximum number of logical characters)

You may want to use VARCHAR, NVARCHAR, or TEXT if the number of characters is variable.

Substring Designators

Substring designators specify the byte offset and length of a portion of a string. They operate on physical storage, not on logical characters. For example, in a string composed of 2-byte characters, the expression multi_col[1,2] retrieves the first character A1A2. It is possible to retrieve a partial character with a substring designator. For example, if multi_col contains the string A1A2B1B2C1C2, the expression multi_col[2,4] retrieves logical character B and partial characters A2 and C1. The database server resolves partial characters by returning white spaces. This behavior is exhibited by data types CHAR, VARCHAR, NCHAR, and NVARCHAR.

G-8 Using Global Language Support

For data types BYTE and TEXT, the database server returns all bytes without partial character replacement. Substring designators should be used only when it is possible to determine the physical location of the logical character(s) desired. The SQL functions LENGTH, OCTET_LENGTH, and CHAR_LENGTH can be used to determine the physical and logical lengths of strings in columns. Function LENGTH returns the string length in bytes minus trailing white spaces. Function OCTET_LENGTH returns the number of bytes, and function CHAR_LENGTH returns the number of logical characters.

SQL Identifiers

GLS allows you to use any alphabetic characters of a code set to form most SQL identifiers (names of tables, columns, views, indexes, etc.). The servername, dbspace names, and blobspace names are the exceptions. The locale defines which characters within a code set are considered alphabetic. Multibyte characters may be used within an identifier, but the physical length of an identifier must be 18 bytes or less. An identifier with multibyte characters will have fewer logical characters than its length in bytes.
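
Returning to the three length functions described above, the following query is a minimal sketch against the gls_test table defined earlier; it assumes multi_col holds the six-byte value A1A2B1B2C1C2 made up of three 2-byte logical characters:

SELECT LENGTH(multi_col),        {6: bytes, minus trailing white spaces}
       OCTET_LENGTH(multi_col),  {6: total bytes}
       CHAR_LENGTH(multi_col),   {3: logical characters}
       multi_col[1,2]            {the first logical character, bytes 1 through 2}
FROM gls_test;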

Using Global Language Support G-9

A Locale Specifies a Collation Order


n Code set order: the physical order of characters in the code set
n Localized order: the language-specific order of characters

Code Set Order:   A  C  a  b  c
Localized Order:  A  a  b  C  c

Collation is the order in which characters are sorted within a code set. IBM Informix database servers support two types of collation.
n Code set order: The order of the character codes in the code set determines the sort order. For example, in the ASCII code set, A = 65 and B = 66. A will sort before B because 65 is less than 66. However, because a = 97 and M = 77, the string abc sorts after Me.
n Localized order: The locale determines the sort order. For example, even though a character might be represented by a code set code of 133, the locale file could list this character after A and before B (A = 65, the character = 133, B = 66). This would represent the more proper sort order for the language represented by the locale.

G-10 Using Global Language Support

Localized Specific Collation: NCHAR & NVARCHAR

Data Types                              Collation Order
CHAR, VARCHAR(max,reserve), TEXT        Code set order
NCHAR, NVARCHAR(max,reserve)            Localized order

The data types NCHAR and NVARCHAR differ from the CHAR and VARCHAR data types in that OnLine sorts the data with a localized collation order. For example, an index created on an NCHAR column is ordered in the localized sequence, whereas an index created on a CHAR column will be ordered in code set sequence. Data selected from a CHAR column and sorted with the ORDER BY clause will be output in code set order, whereas the output from an NCHAR column will be in localized order. The syntax for using NCHAR and NVARCHAR is essentially the same as for CHAR and VARCHAR:
CREATE TABLE gls_test( col_1 NCHAR(20), col_2 NVARCHAR(128,10) );

Using Global Language Support G-11

Collation Order and SQL Statements


Data types NCHAR and NVARCHAR only:
CREATE INDEX nchar_idx ON gls_test(nchar_col1)
SELECT * FROM gls_test ORDER BY nchar_col1
SELECT * FROM gls_test WHERE nchar_col1 BETWEEN and b
... WHERE nchar_col1 IN (lvin,Johnson,Lane)
... WHERE nchar_col1 MATCHES *
... WHERE nchar_col1[1,1] >

The collation order of NCHAR and NVARCHAR data types depends on the localized order as defined in the locale. The major instances where localized collation order impacts processing are:
n CREATE INDEX on an NCHAR or NVARCHAR column
n SELECT ... ORDER BY <NCHAR or NVARCHAR column>
n SELECT ... WHERE <NCHAR or NVARCHAR column> clause containing relational operators (=,<,>,>=,<=,!=), IN, BETWEEN, LIKE, or MATCHES.

The localized collation order specified by a locale dictates a specific ordering of characters. The expression:
SELECT col1 FROM tab1 WHERE col1 < 'c'

executed on a CHAR or VARCHAR column (code set order) might return: A, C, a, and b. Executed on an NCHAR or NVARCHAR column (localized order), the results might be different: A, , a, b, and C. Localized collation sequences may specify case folding (case insensitivity) or characters which are equivalents. For example, if collation is in code set order (data types CHAR or VARCHAR), the expression:

G-12 Using Global Language Support

SELECT lname FROM customer WHERE lname IN ('Azevedo','Llaner','Oatfield')

would return only one of Azevedo, azevedo, or Ázevedo, whereas done in localized order, all three may be returned.

Using Global Language Support G-13

A Locale Specifies Numeric and Monetary Formats


n Numeric
  w US English: 3,225.01
  w French: 3 225,01
n Monetary
  w US English: $100,000.49
  w French: 100 000,49FF

Numeric formats may be specified by the locale. They can impact the decimal separator, the thousands separator (and the number of digits in between), and the positive and negative symbol. This type of formatting applies to the end-user formats of numeric data (DECIMAL, INTEGER, SMALLINT, FLOAT, SMALLFLOAT) within a client application. It does not impact the format of the numeric data types in the database. The locale may have monetary formatting information that impacts values stored as the MONEY data type. The format may impact the currency symbol, the decimal separator, the thousands separator (and the number of digits in between), the positive and negative position and symbol, and the number of fractional digits to display. This formatting applies to the end-user format of MONEY data within a client application. It does not impact the format of the data stored in the database.

DBMONEY

You can also use the DBMONEY environment variable to specify the currency symbol for monetary values and the location of that symbol. The DBMONEY environment variable overrides the settings of the monetary category of the locale file.

G-14 Using Global Language Support

A Locale Specifies Date and Time Formats

Julian Year      1993   1912   1911   1910   1900
Ming Guo Year      82     01     01     02     12

The locale may include date and time formatting specifications. This can influence DATE and DATETIME column values, and may include names and abbreviations for days of the week and months of the year, commonly used representations for dates, time (12-hour and 24-hour), and date/time. GLS supports non-Gregorian calendars (for example, the Taiwanese Ming Guo year and the Arabic lunar calendar). Locale-specific date and time formatting impacts the presentation and entry of data at the client, not the way in which the data is stored in the database.

Using Global Language Support G-15

Date and Time Customization


Order of precedence:
1. DBDATE
   setenv DBDATE Y4MD-  =>  1995-10-25
2. DBTIME (for DATETIME year to second)
   setenv DBTIME %y-%m-%d %H:%M:%S  =>  1995-10-25 16:30:28
3. GL_DATE
   setenv GL_DATE Day %d Month %m Year %Y (%A)  =>  Day 14 Month 11 Year 1995 (Tuesday)
4. GL_DATETIME (for DATETIME year to second)
   setenv GL_DATETIME %b %d, %Y at %H h %M m %S s  =>  Oct 25, 1995 at 16 h 30 m 28 s

GLS recognizes the following environment variables for customizing date and time values (listed in order of precedence):
n DBDATE
n DBTIME (ESQL/C and ESQL/COBOL only)
n GL_DATE
n GL_DATETIME

It is recommended that you use GL_DATE and GL_DATETIME because of the greater flexibility. DBDATE and DBTIME are recognized for backward compatibility. Extensive date and time customization is available using these environment variables. They provide support for alternative dates and times including (Asian) formats such as the Taiwanese Ming Guo year and the Japanese Imperial-era dates. When the client requests a connection, it sends the date and time environment variables to the database server.

G-16 Using Global Language Support

Locales: Client, Database and Server


[Diagram: a client computer running an ESQL/C 7.2 client (client locale) connecting to a database server computer running IDS (server locale), with the acctng database (database locale) and the server's log file and message-log]

A separate locale exists for a client application, a database, and a database server. When a database is created, a condensed version of the database locale is stored in the systables system catalog table. This information is used by the database server for operations such as handling regular expressions, collating character strings, and ensuring proper use of code sets (collation, character classification, and code set). The database locale for a particular database cannot be changed. If you wish to change the locale of a database, you must:
n unload the data.
n drop the database.
n create a new database with the desired locale (by setting DB_LOCALE).
n load the data.
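
As an illustration only, one possible way to carry out these steps uses the dbexport and dbimport utilities covered elsewhere in this course; the database name acctng_db and the dbspace name datadbs below are placeholders, and you should verify the exact options against the utility documentation:

dbexport acctng_db                 {unload the schema and data}
                                   {drop the old database, for example with DROP DATABASE}
setenv DB_LOCALE fr_fr.8859-1      {desired locale for the new database}
dbimport -d datadbs acctng_db      {re-create the database and load the data}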

Client applications use the client locale when they perform read and write operations on the client computer. Operations include reading a keyboard entry or a file, and writing to the screen, a file, or a printer. Most localized date, number, money, and message processing is performed by the client. The server locale determines how the database server performs I/O operations on the server computer. These I/O operations include reading or writing the following files:

Using Global Language Support G-17

n Diagnostic files that the database server generates to provide additional diagnostic information
n Log files that the database server generates to record events
n The explain file, sqexplain.out, that is generated by executing the SQL statement SET EXPLAIN.

The database server is the only IBM Informix product that needs to know the server locale.

Locale Compatibility

The languages and territories of the client, database, and server locales may be different if the code sets are the same. Be careful, however, because GLS does not provide semantic translation. If the locale stored in the database is en_us.8859-1 and the CLIENT_LOCALE is fr_fr.8859-1, a value stored in the database as $10.00 will display on the client as 10,00FF. There is no exchange rate calculation. Additionally, the code set of the locale stored in the database may differ from the CLIENT_LOCALE code set. However, there are restrictions. If a database is created with DB_LOCALE = aa_bb.cs1, then the CLIENT_LOCALE may equal any locale, cc_dd.cs2, but only if locale cc_dd.cs1 exists and there is code set conversion between cs1 and cs2 (code set conversion is explained later in the chapter). If cc_dd.cs1 does not exist, then you will get error -23101. If the SERVER_LOCALE is not compatible with the DB_LOCALE (i.e. the code sets are different and not convertible), data is written to external files without code set conversion.

Note
Most processing relating to collation sequence or character classification is handled by the database server. Most processing related to formatting of date, number, and money values is performed by the client.

G-18 Using Global Language Support

Specifying Locales
n Default
  setenv CLIENT_LOCALE en_us.8859-1
  setenv DB_LOCALE en_us.8859-1
  setenv SERVER_LOCALE en_us.8859-1
n Example
  setenv CLIENT_LOCALE ja_jp.sjis
  setenv DB_LOCALE ja_jp.ujis
  setenv SERVER_LOCALE ja_jp.ujis


The following three environment variables specify the locales for the client application, database and database server, respectively.
n CLIENT_LOCALE
n DB_LOCALE
n SERVER_LOCALE

When the client requests a connection, it sends CLIENT_LOCALE and DB_LOCALE to the database server. If the client and database locales sent by the client are not compatible with what is stored in the database, a warning will be returned to the client in the SQL communications area (SQLCA) via the SQLWARN7 warn flag (except when the code sets differ and code set conversion is available). The client application should check this flag after connecting to a database. The server locale, specified by SERVER_LOCALE, determines how the database server reads and writes external files.

Using Global Language Support G-19

Multiple Locales: Code Set Conversion


[Diagram: an ESQL/C client (client locale) and an IDS database server (server locale) with the acctng database (database locale), log file, and message-log, illustrating where code set conversion can occur]

In a client/server environment, character data might need to be converted from one code set to another if the client, database, or server computers use different code sets to represent the same characters. Converting character data from one code set to another is called code set conversion. Code set conversion is needed when:
n one language has different code sets representing subsets of the language.
n different operating systems encode the same characters in different ways.

In the client/server environment, the following situations require code set conversion:
n If the client locale and database locale specify different code sets, the client application performs code set conversion so that the server computer is not loaded with this type of processing.
n If the server locale and server processing locale specify different code sets, the database server performs code set conversion when it writes to and reads from operating-system files such as log files.

Code set conversion does not convert words to different languages. For example, it does not convert the English word yes to the French word oui. It only ensures that each character is processed or printed the same regardless of how it is encoded. Code set conversion does not:

G-20 Using Global Language Support

n perform semantic translation. Words are not translated from one language to another.
n create characters which do not exist in the target code set. Conversion is from a valid source character to a valid target character via a conversion file.

Code Set Conversion File

A code set conversion file is used to map source characters to target characters. If a conversion file does not exist for the source-to-target relationship, an error is returned to the client application when it begins execution. BYTE data is never converted. Use the glfiles utility to generate a listing of the code set conversion files available on your system.

Compatible Locales

The code set of the CLIENT_LOCALE (cc_dd.cs2) may differ from the code set of the locale stored in the database (aa_bb.cs1), only if locale cc_dd.cs1 exists and there is a code set conversion file between cs1 and cs2.
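
As a concrete illustration of code set conversion, the Japanese settings used earlier in this module combine a ujis database with an sjis client; this is only a sketch, and it assumes that an sjis-to-ujis conversion file is installed on your system (run glfiles -cv to confirm):

setenv DB_LOCALE ja_jp.ujis        {locale stored in the database}
setenv CLIENT_LOCALE ja_jp.sjis    {client code set differs, so the client converts between sjis and ujis}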

Using Global Language Support G-21

Code Set Conversion: Performance Consideration


n Minimize code set conversion
n Determine number and locales of clients
n Build databases with the locales which will minimize code set conversion
n Be aware of where code set conversion occurs:
  Client: CLIENT_LOCALE != DB_LOCALE
  Database server: DB_LOCALE != SERVER_LOCALE

Code set conversion requires processing resources. You should analyze your system configuration to determine the locale settings for clients, databases, and database servers which minimize code set conversion. For example, if an environment consists of 100 clients with locale ja_jp.ujis and 2 clients with locale ja_jp.sjis, it would be reasonable to create the database with locale ja_jp.ujis.
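
Continuing the example in the preceding paragraph, the settings below sketch how the majority of clients would avoid conversion entirely; the locale names come from that example, and the availability of a conversion file for the two sjis clients should be confirmed with glfiles -cv:

setenv DB_LOCALE ja_jp.ujis        {matches the 100 ujis clients, so they need no conversion}
setenv CLIENT_LOCALE ja_jp.sjis    {only the 2 sjis clients pay the conversion cost}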

G-22 Using Global Language Support

Multibyte Character Support for Utilities/APIs


n IDS utilities
  onaudit, oncheck, onshowaudit, onstat, onunload, dbschema, dbload, onload, dbaccess, dbexport, dbimport
n ESQL/C
n ESQL/COBOL

Most OnLine utilities support multibyte characters (and 8-bit characters). ESQL/C and ESQL/COBOL support multibyte characters as long as your compiler supports the same single-byte or multibyte code set that the ESQL source file uses. If your C compiler does not support the code set, you can use the CC8BITLEVEL environment variable (documented in the Guide to GLS Functionality) as a workaround to specify the preprocessing environment for your C compiler. For example, setting CC8BITLEVEL to 0 indicates to the ESQL preprocessor that the compiler does not support utilizing the 8th bit in strings and comments.
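
In the csh syntax used for the other environment variables in this appendix, that workaround setting would look like the following (shown only as an illustration of the value described above):

setenv CC8BITLEVEL 0    {the C compiler does not support the 8th bit in strings and comments}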

Using Global Language Support G-23

The glfiles Utility


Output from the glfiles utility displays GLS files on your system.
n GLS locale files
n IBM Informix code set conversion files
n IBM Informix code set files

glfiles -lc   {for locale files}
glfiles -cv   {for conversion files}
glfiles -cm   {for code set files}


You can use the glfiles utility to find out what locale files, code set conversion files, and code set files are stored on your system. When you execute the glfiles utility, the output will be stored in a series of files in the current directory.

Locale Files

The locales will be stored in a file named lcX.txt, where X is the version of the locale object file. The lcX.txt file lists the locales in alphabetical order sorted on the name of the GLS object locale file.

Code Set Conversion Files

The code set conversion files will be stored in files named cvY.txt, where Y is the version number of the code set conversion object file. The cvY.txt file lists the code set conversions in alphabetical order, sorted on the name of the object code set conversion file. Most code set to code set conversions will have two code set conversion files: code set A => code set B and code set B => code set A.

Code Set Files

The list of code set files will be stored in files named cmZ.txt, where Z is the version number of the code set object file format. The cmZ.txt file lists the code sets in alphabetical order, sorted on the name of the GLS object code set file.
G-24 Using Global Language Support

Migrating to GLS from NLS or ALS


n Determine the NLS or ALS locale.
n Determine whether GLS supports the old locale.
n If the old locale is supported, decide whether to keep the old locale or convert to a GLS custom locale.
n If staying with the old locale, no special steps are needed. The database will be converted to GLS when opened under 7.2.
n If changing to a new locale, the database must be unloaded and then re-created and loaded with the new locale.

Versions of IBM Informix prior to 7.2 supported NLS (Native Language Support) and ALS (Asian Language Support). If you are using NLS or ALS and are migrating to version 7.2, you will need to convert to GLS. Two types of locales are supported in version 7.2:
n IBM Informix GLS custom locales: these are the same for all operating systems. Distributed queries across different platforms will yield the same results as queries between different databases on the same database server.
n Locales compatible with operating-system locales: these may be different from one platform to another. In pre-7.2 versions of OnLine, NLS and ALS use operating system locales.

Decide whether to use the current locale or to convert to an IBM Informix GLS custom locale. Upgrading to version 7.2 with the current, operating-system locale requires no special action on your part. However, distributed queries across dissimilar platforms might give incorrect results because of different locale category definitions. Changing from an operating system locale to a custom locale requires that you unload, then reload your data.

Using Global Language Support G-25

Migration Steps

Follow the steps below when migrating to version 7.2:
n Determine the current NLS or ALS locale. Reference chapter 8 of the IBM Informix Migration Guide for more information.
n Determine whether GLS supports the old locale (run glfiles).
n If the old locale is supported, decide whether to keep the old locale or convert to a GLS custom locale. If the old locale is not supported, then you must choose a new locale.
n If staying with the old locale, no special steps are needed. The database will be converted to GLS when opened under 7.2.
n If changing to a new locale, the database must be unloaded and then re-created and loaded with the new locale.

It is recommended that you read the IBM Informix Migration Guide prior to migrating to version 7.2.

G-26 Using Global Language Support

Exercise 1
1. In your student directory, run the glfiles utility. Examine the three files that are created (lcX.txt (locales), cvY.txt (code set conversion files), and cmZ.txt (code sets)) and answer the following questions: Is there a locale that supports British English? What code set does it use? What file name contains the Japanese codeset sjis? Is there a code set conversion available between ISO8859-1 and ISO 8859-2?
2. Create a database with a different locale than what you are currently running. For example:
   set CLIENT_LOCALE and DB_LOCALE to fr_fr.8859-1
   execute dbaccessdemo7 database_name
   run dbaccess and execute SELECT * FROM orders
   How does the money, date, and numeric format differ from a database created with the en_us.8859-1 locale?
3. Examine the data types of the systables system catalog table of the database created in #2 above. Do they differ from data types in a database that uses the default locale? Can you find where the locale information is stored in the systables table?
4. Set your CLIENT_LOCALE to a different locale than what it was set at in #2 but one that uses the same code set. Run dbaccess and try to access your new database. What happens?
5. Now set CLIENT_LOCALE to a locale that uses a different code set and run dbaccess. What happens?
6. Think of a way to demonstrate code set conversion and see if it works.
7. Display a locale file and identify the different sections (look in $INFORMIXDIR/gls/lcX/..).

If you forget the locale of your database, you will only be able to access it if your CLIENT_LOCALE code set is the same as your database or is convertible. If you cannot access your database, then you can find out the database locale with the following query:

SELECT * FROM sysmaster:systabnames WHERE dbsname=your_database_name

The collate column contains the locale.

Using Global Language Support G-27

G-28 Using Global Language Support

Index

Index
Numerics
24 x 7 Emergency Recovery Service F-15

D
Data distributions what information is kept 13-10 Data type BYTE 2-14 DATE 2-9 DATETIME 2-9 DECIMAL/MONEY 2-6 FLOAT 2-5 INTEGER 2-5 INTERVAL 2-9 SMALLFLOAT 2-5 SMALLINT 2-5 TEXT 2-14 Database administration tasks 1-5 Database level privileges granting 14-5 Database object 11-3 disabled mode 11-4 enabled mode 11-4 Database object mode filtering 11-4 Database privilege DBA 14-4 Database privileges DBA 14-4 RES 14-4 RESOURCE 14-4 Database server 1-3 DBA privilege 14-4 DBCENTURY 2-11 Dbexport utility file structure 16-5 syntax 16-7 Dbimport utility 16-4 Dbload utility 16-17 command file 16-21 syntax 16-18 DBSCHEMA utility 4-17 data distributions 13-12 dbspace definition 3-4 Default values 9-4 Detached checking 9-17 diagnostics table 11-9, 11-22

A
ACD menu F-7 AFTER triggered action list 10-8 After-Hours Support F-15 ALTER FRAGMENT statement 6-25 ALTER INDEX statement 5-10 ALTER INDEX TO CLUSTER statement 5-26 ALTER TABLE statement 4-7, 4-9 , 4-11 Automatic Call Distribution menu F-7

B
BEFORE triggered action list 10-8

C
Cascading deletes 8-8, 8-9 Case Escalation F-20 Character Data Types 2-3 Check constraint 9-7 chunk definition 3-4 Cluster Index 5-7 Composite index 5-6 Confidence 13-15 CONNECT privilege 14-4 Constraint CHECK 9-4 , 9-8 , 11-3 checking 9-12, 9-17 NOT NULL 9-4, 9-6 , 11-3 referential 8-3 transaction-modes 9-12 UNIQUE 9-4 , 9-11, 11-3 Correlation name 10-6 , 10-10 CREATE DATABASE statement 3-10 CREATE INDEX statement 5-8 CREATE ROLE statement 14-15 CREATE TABLE statement 3-19 CREATE TRIGGER 10-6 CS Times F-18 Customer Services Handbook F-4 Cyclic 8-4

Index-1

Dial-Up Access and Confidentiality Agreement F-13 Disable an object 11-6 Disabling an object 11-7 DROP DATABASE statement 4-15 DROP INDEX statement 5-10 DROP table statement 4-15 DROP TRIGGER 10-23 DROP VIEW statement 15-5 DSS queries 6-5 Duplicate index 5-5

round robin 6-9 Fragmented 6-19

H
Hash Join 12-4 High Performance Loader (HPL) utility 17-3 job 17-9

I E
Enabling an object 11-8 Entity integrity 9-3 Environment variable DBCENTURY 2-11 DBDATE 2-9 DBDELIMITER 16-17 DBMONEY 2-6 DBSPACETEMP 4-4, 5-27 DBUPSPACE 13-16 INFORMIXDIR 1-4 INFORMIXSERVER 1-4 PATH 1-4 PDQPRIORITY 5-27 PSORT_DBTEMP 5-27 Example trigger 10-9 Exclusive 7-3 Expressions 6-16 Extent allocation of 3-16 growth of 3-15 size of 3-13 Extent size 3-13 NEXT size 3-13 extent size calculating 3-24 Extents 3-13 IECC user interface 17-11 IECC utility load job 18-3 Index altering 5-10 B+ tree 5-3 clustering 5-7 composite 5-6 creating 5-8 definition 5-3 dropping 5-10 duplicate 5-5 extent size 5-22 fill factor 5-9 key value locking 7-21 unique 5-5 Index fill factor description of 5-9 Indexes fragmented 6-19 Informix Customer Support definition F-3 Informix-Enterprise Support F-15 INFORMIX-OpenLine F-4 In-place alter 4-8, 4-10 Instance 1-3 Ipload utility 17-7

J F
Filter selectivity 12-10 Filtering mode 11-12 FOR EACH ROW triggered action list 10-8 Fragmentation advantages and disadvantages 6-11 expression-based 6-9, 6-13 , 6-16 extent sizes 6-10 guidelines 6-14 indexes 6-21 Join definition 12-3 hash 12-4 nested loop 12-4, 12-5

L
Lock mode page 3-17 page level 7-18 row 3-18 row level 7-18

Index-2

wait 7-17 Locking key value 7-21 Locks exclusive 7-3 shared 7-3 update 7-3

M
memory-resident tables 4-16

N
Nested loop join 12-4

REFERENCING clause 10-10 Referential 8-3 Referential Constraint, cyclic 8-4 REferential Constraint, multiple-path 8-4 Referential Constraint, Self-referencing 8-4 Referential integrity 8-3 Regency Services F-15 Remote tables 10-8 RENAME COLUMN statement 4-14 RENAME DATABASE statement 4-14 RENAME TABLE statement 4-14 RESO 14-4 Resolution, for data distributions 13-13 RESOURCE privilege 14-4 root dbspace 3-4 Row level locking 3-18 rowsize 3-23

O
Object modes 11-4, 11-5 oncheck utility 4-8 Onload utility 16-22 Onpload database 17-6 Onpload utility 17-6 Onunload utility 16-22 syntax 16-23 OpenLine F-4 OPTCOMPIND 12-15 OPTCOMPIND parameter 12-15 Optimization Path 12-9

S
Semantic integrity 9-3 SET 7-9 SET CONSTRAINTS statement 9-12 SET DATASKIP statement 6-29 SET EXPLAIN 12-23 SET ISOLATION 7-9 SET OPTIMIZATION 12-17 SET TRANSACTION statement 7-10 Slot table 3-22 SMI description of E-4 Standard tables 4-12 START VIOLATIONS statement 11-10 STOP VIOLATIONS statement 11-14 Support 24 x 7 Emergency Recovery Service F-15 After-Hours Support F-15 Informix-Enterprise Support F-15 Premier Support Services F-15 Regency Services F-15 Synonyms 4-5 sysdistrib system catalog table 13-21 sysdistrib table 13-11 sysfragments table 6-30 sysindexes table 5-28 , 13-6 Sysmaster database tables E-2 E-7 sysobjstate table 11-23 systables table 13-6 System catalog 3-3 systrigbody 10-24 systriggers 10-24 sysviolations table 11-24

P
page definition 3-13 Page header 3-22 Page level locking 3-17 Page structure 3-22 pageuse 3-23 PDQ queries 6-6 PDQPRIORITY values 6-7 Premier Support Services F-15 Problem Resolution Process F-8

Q
Query path 12-7

R
RAISE EXCEPTION 10-16 Raw tables 4-12

Index-3

T
Table/column level privileges granting 14-6 tblspace 3-14 Tech Notes F-18 TechInfo Center F-7, F-17 Temporary dbspaces 4-4 Temporary table 4-4 Temporary tables discussion of 4-4 Thread btcleaner thread 7-23 Trigger action 10-3 , 10-6 Trigger event 10-3, 10-6 Triggering table 10-3 Tuple 12-3

U
Unique index 5-5 unlogged tables 4-12 UPDATE STATISTICS 7-24 , 13-8 data distributions 13-10 resolution 13-13 UPDATE STATISTICS statement 13-4

V
View 15-3 violations table 11-9 , 11-21

W
Winpload utility 17-7, 17-10 devices 18-5 input fields 18-6 log file 18-18 reject file 18-18 unload job 18-12

Index-4

Managing and Optimizing IBM Informix Dynamic Server 7.x Databases

IBM Data Management Solutions Education Services

Volume 1 of 2 Version 2 09-2001 000-8677 October 29, 2001

Trademarks
IBM and the IBM logo are registered trademarks of International Business Machines Corporation. The following are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both:
AnswersOnLine; C-ISAM; Client SDK; Cloudscape; DataBlade; Dynamic Scalable Architecture; Dynamic Server; Dynamic Server.2000; Dynamic Server with Advanced Decision Support Option; Dynamic Server with Extended Parallel Option; Dynamic Server with Universal Data Option; Dynamic Server with Web Integration Option; Dynamic Server, Workgroup Edition; Foundation.2000; Illustra; Informix; Informix 4GL; Informix Extended Parallel Server; Informix Internet Foundation.2000; Informix Red Brick Decision Server; J/Foundation; MaxConnect; ON-Bar; OnLine Dynamic Server; Red Brick and Design; Red Brick Data Mine; Red Brick Decision Server; Red Brick Mine Builder; Red Brick Decisionscape; Red Brick Ready; Red Brick Systems; Rely on Red Brick; UniData; UniData & Design; Universal Data Warehouse Blueprint; Universal Database Components; Universal Web Connect; UniVerse; Virtual Table Interface; Visionary; Web Integration Suite

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java, JDBC, and all Java-based trademarks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.

The information contained in this document has not been submitted to any formal IBM test and is distributed on an "as is" basis without any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. The original repository material for this course has been certified as being Year 2000 compliant.

Copyright International Business Machines Corporation 2001. All rights reserved. This document may not be reproduced in whole or in part without the prior written permission of IBM.

Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

iii

Objectives
At the end of this course, you will be able to:
n Use IBM Informix Dynamic Server data types
n Estimate the size and extent requirements for tables and indexes
n Use and tune the High Performance Loader
n Learn techniques and uses of triggers
n Create databases, tables and indexes
n Create fragmented tables and indexes
n Implement parallel database query (PDQ)
n Improve application performance through the use of SET EXPLAIN ON
n Understand concurrency control
n Create an indexing strategy to improve performance
n Explain the Informix Dynamic Server optimizer
n Implement referential and entity integrity
n Create and use views
n Control data security
n Use database utilities

Prerequisites
To maximize the benefits of this course, we require that you have met the following prerequisites:
n Relational Database Design or equivalent knowledge
n Structured Query Language or equivalent knowledge

iv

Acknowledgments
Course Developer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jeff Eckert, Monica Njoo, Kitty Stokes Technical Review Team. . . . . . . . .Scott Barney, Greg Butler, Lisa Childress, Helen Dalton, Dan Geoppo, Janet Grzesiak, Jim Jackson, Wendy Lo, Javad Movassaghi, Raj Muralidharan, Sue Rich Course Production Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Susan Dykman

Further Information
To obtain further information regarding IBM Informix training, please visit the IBM Informix Education Services website: http://www.informix.com/training.

Comments or Suggestions
Thank you for attending this training class. We strive to build the best possible courses, and we value your feedback. Help us to develop even better material by sending comments, suggestions and compliments to training_doc@informix.com.

vi

Table of Contents
Module 1 Introduction to Database Administration
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2 An IBM Informix System . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 Accessing the IBM Informix Instance . . . . . . . . . . . . . . . . .1-4 Database Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5 Creating Databases and Tables . . . . . . . . . . . . . . . . . . . . . 1-6 Assuring Data Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-7 Managing Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8 Optimizing Data Access . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9

Module 2

IBM Informix Dynamic Server Data Types


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 CHAR vs. VARCHAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 Numeric Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 SERIAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 DATE, DATETIME, INTERVAL . . . . . . . . . . . . . . . . . . . . . . 2-9 DBCENTURY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11 Binary Large Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13 TEXT vs. BYTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14 True or False . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16 True or False . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20

Module 3

Creating Databases and Tables


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 Creating a Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 Location: Dbspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-4 Logging Modes: No Logging . . . . . . . . . . . . . . . . . . . . . . . . 3-6 Logging Modes: Buffered Logging . . . . . . . . . . . . . . . . . . . . 3-7 Logging Modes: Unbuffered Logging . . . . . . . . . . . . . . . . .3-8 Mode ANSI Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-9 CREATE DATABASE Statement . . . . . . . . . . . . . . . . . . . 3-10 Creating a Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11 Tables and Dbspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12 Extents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13

vii

Extent Growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15 Table Lock Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17 CREATE TABLE Statement . . . . . . . . . . . . . . . . . . . . . . . 3-19 Storing BLOB Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20 Page Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22 The Sysmaster Database . . . . . . . . . . . . . . . . . . . . . . . . . 3-25

Module 4

Table Maintenance
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 System Catalog Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 Temporary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4 Synonyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5 Altering a Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7 In-Place ALTER TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8 In-Place Alter Will Not Be Used If: . . . . . . . . . . . . . . . . . . . 4-10 Next Extent Size and Lock Mode . . . . . . . . . . . . . . . . . . . 4-11 7.31 Feature: Unlogged Tables . . . . . . . . . . . . . . . . . . . . . 4-12 Renaming Columns, Tables and Databases . . . . . . . . . . . 4-14 Dropping Tables and Databases . . . . . . . . . . . . . . . . . . . . 4-15 Memory Residency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16 The DBSCHEMA Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17

Module 5

Indexes and Indexing Strategy


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2 Index Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3 B+ Tree Splits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 Indexes: Unique and Duplicate . . . . . . . . . . . . . . . . . . . . . . 5-5 Composite Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6 Cluster Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7 The CREATE INDEX Statement . . . . . . . . . . . . . . . . . . . . . 5-8 Index Fill Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-9 Managing Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10 Benefits of Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11 Index Join Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12 Index Filter Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13 Index Columns Involved in Sorting . . . . . . . . . . . . . . . . . . 5-14 Avoid Highly Duplicative Indexes . . . . . . . . . . . . . . . . . . . 5-15 Avoid Heavy Indexing of Volatile Tables . . . . . . . . . . . . . . 5-17 Create Composit Indexes . . . . . . . . . . . . . . . . . . . . . . . . . 5-18

viii

Keep Key Size Small . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19 INDEXES AND EXTENT SIZE . . . . . . . . . . . . . . . . . . . . . 5-22 Costs of Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23 Mass Updates to a Table . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25 Indexes and Empty Extents . . . . . . . . . . . . . . . . . . . . . . . . 5-26 Fast Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27 SYSINDEXES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-28

Module 6

Fragmentation
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 Fragmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3 Advantages of Fragmentation . . . . . . . . . . . . . . . . . . . . . . .6-4 Parallel Scans and Fragmentation . . . . . . . . . . . . . . . . . . . 6-5 Parallel Scans (PDQ Queries) . . . . . . . . . . . . . . . . . . . . . . .6-6 Balanced I/O and Fragmentation . . . . . . . . . . . . . . . . . . . . . 6-8 Types of Distribution Schemes . . . . . . . . . . . . . . . . . . . . . . 6-9 Fragments and Extents . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10 Fragmenting a Table: Round Robin . . . . . . . . . . . . . . . . . 6-11 Fragmenting a Table: Expression . . . . . . . . . . . . . . . . . . . 6-13 Logical and Relational Operators . . . . . . . . . . . . . . . . . . . 6-15 Using Hash Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16 Fragmenting by Expression . . . . . . . . . . . . . . . . . . . . . . . . 6-17 Fragmenting Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19 CREATE INDEX Statement . . . . . . . . . . . . . . . . . . . . . . . . 6-21 ROWIDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22 Guidelines for a Fragmentation Strategy . . . . . . . . . . . . . . 6-23 The ALTER FRAGMENT Statement . . . . . . . . . . . . . . . . . 6-25 How is ALTER FRAGMENT Executed? . . . . . . . . . . . . . . 6-28 Skipping Inaccessible Fragments . . . . . . . . . . . . . . . . . . . 6-29 Sysfragments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30

Module 7

Concurrency Control
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2 Types of Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 Read Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 Dirty Reads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5 Committed Reads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6 Cursor Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7 Repeatable Reads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8

ix

Setting the Level of Isolation . . . . . . . . . . . . . . . . . . . . . . . . 7-9 SET TRANSACTION Statement . . . . . . . . . . . . . . . . . . . . 7-10 Degree of Tolerable Interference . . . . . . . . . . . . . . . . . . . . 7-11 RETAIN UPDATE LOCKS - 7.31 Feature . . . . . . . . . . . . . 7-12 Update Concurrency: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 Database Level Locking . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14 Table-Level Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15 Setting the Lock Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17 Page and Row Level Locking . . . . . . . . . . . . . . . . . . . . . . 7-18 Lock Access: Row/Page Level . . . . . . . . . . . . . . . . . . . . . 7-19 Deadlock Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20 Key Value Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21 What Happens After a DELETE? . . . . . . . . . . . . . . . . . . . 7-23 syslocks and syssessions . . . . . . . . . . . . . . . . . . . . . . . . . 7-25

Module 8

Referential Integrity
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2 What is Referential Integrity? . . . . . . . . . . . . . . . . . . . . . . . 8-3 Referential Constraints: Example . . . . . . . . . . . . . . . . . . . .8-5 Creating Referential Constraints . . . . . . . . . . . . . . . . . . . . . 8-6 Constraint Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-7 Cascading Deletes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8 Self-Referencing Referential Constraints . . . . . . . . . . . . . 8-10 Delete/Update of a Parent Row . . . . . . . . . . . . . . . . . . . . . 8-11 Insert/Update of a Child Row . . . . . . . . . . . . . . . . . . . . . . . 8-12

Module 9

Other Constraints and Maintenance


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2 Enforcing Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3 Types of Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4 Default Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5 NOT NULL Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6 Check Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7 Example: Check Constraint . . . . . . . . . . . . . . . . . . . . . . . . . 9-8 Adding Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10 Unique Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11 Constraint Transaction Modes . . . . . . . . . . . . . . . . . . . . . . 9-12 Immediate Constraint Checking . . . . . . . . . . . . . . . . . . . . 9-13 Deferred Constraint Checking . . . . . . . . . . . . . . . . . . . . . . 9-15

Detached Constraint Checking . . . . . . . . . . . . . . . . . . . . . 9-17 Performance Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18 Dropping a Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19 System Catalog Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20

Module 10

Creating and Using Triggers


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2 What is a Trigger? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3 Why Use Triggers? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4 CREATE TRIGGER Components . . . . . . . . . . . . . . . . . . . 10-6 Trigger Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7 Trigger Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8 Trigger Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9 REFERENCING Clause . . . . . . . . . . . . . . . . . . . . . . . . . 10-10 REFERENCING Example . . . . . . . . . . . . . . . . . . . . . . . .10-11 The WHEN Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12 Multiple Update Triggers on One Table . . . . . . . . . . . . . . 10-13 Cascading Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-14 If a Trigger Fails? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15 Discontinuing an Operation . . . . . . . . . . . . . . . . . . . . . . . 10-16 Trigger to Pass Values Into an SP . . . . . . . . . . . . . . . . . 10-17 Returning Values From a Procedure . . . . . . . . . . . . . . . .10-18 Triggers and Stored Procedures . . . . . . . . . . . . . . . . . . . 10-20 Cursors and Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21 Triggers and Constraint Checking . . . . . . . . . . . . . . . . . . 10-22 Dropping a Trigger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23 How a Trigger is Executed . . . . . . . . . . . . . . . . . . . . . . .10-24 Getting Information About Triggers . . . . . . . . . . . . . . . . . 10-25

Module 11

Modes and Violation Detection


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Types of Database Objects . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Database Object Modes . . . . . . . . . . . . . . . . . . . . . . . . . . 11-4
Why Use Object Modes? . . . . . . . . . . . . . . . . . . . . . . . . . . 11-5
Disabling an Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-6
Creating a Disabled Object . . . . . . . . . . . . . . . . . . . . . . . . 11-7
Enabling a Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Recording Violations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9
Violations Tables Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 11-10
Filtering Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Turning Off Violation Logging . . . . . . . . . . . . . . . . . . . . . 11-14
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15
Example (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-16
Example (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-17
Example (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-18
Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-19
Example 2 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-20
Violations Table Schema . . . . . . . . . . . . . . . . . . . . . . . . . 11-21
Diagnostic Table Schema . . . . . . . . . . . . . . . . . . . . . . . . 11-22
System Catalog Tables . . . . . . . . . . . . . . . . . . . . . . . . . . 11-23
System Catalog Tables (cont.) . . . . . . . . . . . . . . . . . . . . 11-24

Module 12

The IBM Informix Cost-Based Optimizer


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
Primary Join Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4
Nested Loop Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5
Hash Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
Query Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-7
Calculating Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-8
Optimization Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-9
Step 1: Examine All Tables . . . . . . . . . . . . . . . . . . . . . . . 12-10
Step 2: Estimate Cost for Joined Pair . . . . . . . . . . . . . . . 12-12
Step 3: Repeat for Each Extra Table . . . . . . . . . . . . . . . . 12-13
OPTCOMPIND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-15
Optimizer Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . 12-17
Optimization LOW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-18
Optimization of Stored Procedures . . . . . . . . . . . . . . . . . 12-19
When to Try OPTIMIZATION LOW . . . . . . . . . . . . . . . . . 12-20
FIRST_ROWS Optimization . . . . . . . . . . . . . . . . . . . . . . 12-22
Using SET EXPLAIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-23
SET EXPLAIN Example 1 . . . . . . . . . . . . . . . . . . . . . . . . 12-25
SET EXPLAIN Example 2 . . . . . . . . . . . . . . . . . . . . . . . . 12-27
SET EXPLAIN Example 3 . . . . . . . . . . . . . . . . . . . . . . . . 12-29
SET EXPLAIN Example 4 . . . . . . . . . . . . . . . . . . . . . . . . 12-31
Current SQL Information . . . . . . . . . . . . . . . . . . . . . . . . . 12-32
Optimizer Directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-34
EXPLAIN Directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-36
Using Directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-37

Module 13

Update Statistics and Data Distributions


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Improving Query Performance . . . . . . . . . . . . . . . . . . . . . 13-3
UPDATE STATISTICS . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4
Statistics Available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-6
MEDIUM and HIGH Mode . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
How Distributions are Created . . . . . . . . . . . . . . . . . . . . . 13-10
What Information is Kept . . . . . . . . . . . . . . . . . . . . . . . . . 13-11
Distribution Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-12
Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-13
Confidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-15
Space Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Guidelines for Creating Distributions . . . . . . . . . . . . . . . . 13-17
Guidelines (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-19
The DROP DISTRIBUTIONS Clause . . . . . . . . . . . . . . . 13-20
The sysdistrib System Catalog Table . . . . . . . . . . . . . . . 13-21
When Table Changes Affect Distribution . . . . . . . . . . . . . 13-22

Module 14

Data Security
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
Levels of Data Security . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-3
Database Level Privileges . . . . . . . . . . . . . . . . . . . . . . . . . 14-4
Granting Database Level Privileges . . . . . . . . . . . . . . . . . 14-5
Table/Column Level Privileges . . . . . . . . . . . . . . . . . . . . . 14-6
Granting Table Level Privileges . . . . . . . . . . . . . . . . . . . . . 14-7
Granting Column Level Privileges . . . . . . . . . . . . . . . . . . . 14-8
Default Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
Stored Procedure Privileges . . . . . . . . . . . . . . . . . . . . . . 14-10
Revoking Database Level Privileges . . . . . . . . . . . . . . . . 14-12
Revoking Table Level Privileges . . . . . . . . . . . . . . . . . . . 14-13
Role-Based Authorization . . . . . . . . . . . . . . . . . . . . . . . . 14-14
Roles and Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . 14-15
Using Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-16
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-17
GRANT and REVOKE FRAGMENT . . . . . . . . . . . . . . . . 14-18
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-19
System Catalog Tables . . . . . . . . . . . . . . . . . . . . . . . . . . 14-20

Module 15

Views
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
What is a View? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
Creating a View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-4
Creating Views: Examples . . . . . . . . . . . . . . . . . . . . . . . . . 15-6
A View that Joins Two Tables . . . . . . . . . . . . . . . . . . . . . . 15-7
A View on Another View . . . . . . . . . . . . . . . . . . . . . . . . . . 15-8
Restrictions on Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-9
Views: INSERT, UPDATE, DELETE . . . . . . . . . . . . . . . . 15-10
The WITH CHECK OPTION Clause . . . . . . . . . . . . . . . . 15-11
More on WITH CHECK . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
Views and Access Privileges . . . . . . . . . . . . . . . . . . . . . . 15-13
System Catalog Tables for Views . . . . . . . . . . . . . . . . . . 15-14

Module 16

IBM Informix Dynamic Server Data Movement Utilities


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
Loading and Unloading Data . . . . . . . . . . . . . . . . . . . . . . . 16-3
Dbexport/Dbimport Highlights . . . . . . . . . . . . . . . . . . . . . . 16-4
Directory/File Structure Created . . . . . . . . . . . . . . . . . . . . 16-5
Using Dbimport and Dbexport . . . . . . . . . . . . . . . . . . . . . . 16-6
Dbexport Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-7
Dbexport Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-9
Using Additional Options . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
Dbimport Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-11
Dbimport Syntax Create Options . . . . . . . . . . . . . . . . . . . 16-13
Dbimport Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-14
Additional Dbimport Options . . . . . . . . . . . . . . . . . . . . . . 16-15
Dbload Highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-17
Dbload Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-18
Dbload Command File: Delimited . . . . . . . . . . . . . . . . . . 16-19
Dbload Command File: Character Position . . . . . . . . . . . 16-21
Onunload/Onload Highlights . . . . . . . . . . . . . . . . . . . . . . 16-22
Onunload Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-23
Onload Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-24

Module 17

Introduction to the High Performance Loader


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-2
High Performance Loader Features . . . . . . . . . . . . . . . . . 17-3
Parallel Loading and Unloading . . . . . . . . . . . . . . . . . . . . . 17-5
HPL Constituents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-6
User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-7
Jobs and Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-9
Exporting and Importing With Winpload . . . . . . . . . . . . . . 17-10
Using the IECC Interface . . . . . . . . . . . . . . . . . . . . . . . . . 17-11

Module 18

Using the Winpload Interface


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-2
Loading a Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-3
Naming the Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-4
Choosing Input Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
Defining Input Field Names . . . . . . . . . . . . . . . . . . . . . . . . 18-6
Defining Data Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-7
Mapping Input Fields to Columns . . . . . . . . . . . . . . . . . . . 18-8
Express Mode Loading . . . . . . . . . . . . . . . . . . . . . . . . . . 18-10
Creating an Unload Table Job . . . . . . . . . . . . . . . . . . . . . 18-12
Defining a Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-13
Unloading Blobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-15
Accessing Existing Jobs . . . . . . . . . . . . . . . . . . . . . . . . . 18-16
Running the Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-17
Reject and Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-18
Importing and Exporting Databases . . . . . . . . . . . . . . . . 18-20
Additional Courses Available . . . . . . . . . . . . . . . . . . . . . . 18-23

Appendix A

Using the IPLoad Interface


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Starting the User Interface . . . . . . . . . . . . . . . . . . . . . . . . . A-3
GUI Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-4
GUI Tips (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5
Selecting a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
Selecting and Creating a Job . . . . . . . . . . . . . . . . . . . . . . . A-7
The Load Job Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8
The Device Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
Device Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
Device Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-12
Creating a New Device Array . . . . . . . . . . . . . . . . . . . . . . A-13
Defining Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-14
Format Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-15
Filters (for Loads) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-16
Map View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-17
Creating a Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-18
Other Mapping Options . . . . . . . . . . . . . . . . . . . . . . . . . . A-19
Discard Records and Logfile . . . . . . . . . . . . . . . . . . . . . . A-20
Generating Formats and Maps . . . . . . . . . . . . . . . . . . . . A-21
The Unload Job Window . . . . . . . . . . . . . . . . . . . . . . . . . A-22
Generating Formats and Maps . . . . . . . . . . . . . . . . . . . . A-23

Appendix B

Monitoring Load Operations


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
HPL Operations Overview . . . . . . . . . . . . . . . . . . . . . . . . . B-3
The Onpload Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
Threads in the Database Server . . . . . . . . . . . . . . . . . . . . B-7
Monitoring Onpload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-9
Configuration Parameters . . . . . . . . . . . . . . . . . . . . . . . . B-11
Configuration Parameters (cont.) . . . . . . . . . . . . . . . . . . . B-12
Improving Performance . . . . . . . . . . . . . . . . . . . . . . . . . . B-14

Appendix C

Advanced Features Available Through IPLoad


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Load Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
Express Load Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-4
Deluxe Load Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-6
Browsing Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
Violations Table Browser . . . . . . . . . . . . . . . . . . . . . . . . . . C-9
Log File Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-10
Connecting to an Active Job . . . . . . . . . . . . . . . . . . . . . . . C-11
No Conversion Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-12

Appendix D

The System Catalog


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-2
The System Catalog: Data About Your Data . . . . . . . . . . . D-3
Automatic Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . D-4
Querying the System Catalog . . . . . . . . . . . . . . . . . . . . . . D-5
SYSTABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-6
SYSCOLUMNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-8
Calculating the Column Size . . . . . . . . . . . . . . . . . . . . . . D-10
SYSINDEXES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-11
SYSFRAGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-12
SYSFRAGAUTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-14
SYSDISTRIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-15
SYSUSERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-16
SYSTABAUTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-18
SYSCOLAUTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-19
SYSVIEWS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-20
SYSDEPEND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-22
SYSSYNTABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-23
SYSCONSTRAINTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-25
SYSREFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-26
SYSCHECKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-27
SYSCOLDEPEND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-28
SYSDEFAULTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-29
SYSPROCEDURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-30
SYSPROCBODY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-31
SYSPROCPLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-32
SYSPROCAUTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-33
SYSTRIGGERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-34
SYSTRIGBODY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-36
SYSBLOBS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-37
SYSOPCLSTR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-38
SYSOLEAUTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-39
SYSOBJSTATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-40
SYSVIOLATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-41
System Catalog Summary . . . . . . . . . . . . . . . . . . . . . . . . D-42

Appendix E

System Monitoring Interface


The Sysmaster Database . . . . . . . . . . . . . . . . . . . . . . . . . E-2
How SMI Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-4

Appendix F

Working With IBM Informix Customer Support


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-2
IBM Informix Customer Support . . . . . . . . . . . . . . . . . . . . .F-3
The Customer Service Handbook . . . . . . . . . . . . . . . . . . . .F-4
Characterizing Your Problem . . . . . . . . . . . . . . . . . . . . . . .F-5
Automatic Call Distribution (ACD) . . . . . . . . . . . . . . . . . . . .F-7
The Problem Resolution Process . . . . . . . . . . . . . . . . . . . .F-8
Assigning Priority Settings . . . . . . . . . . . . . . . . . . . . . . . . .F-11
Dial-Up Access and Confidentiality Agreement . . . . . . . . .F-13
Extended Hours Support . . . . . . . . . . . . . . . . . . . . . . . . . .F-14
TechInfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-17
Tech Notes and CS Times . . . . . . . . . . . . . . . . . . . . . . . . .F-18
Case Escalation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .F-20
The Latest and Greatest Version . . . . . . . . . . . . . . . . . . . .F-21

Appendix G

Using Global Language Support


Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
Global Language Support . . . . . . . . . . . . . . . . . . . . . . . . . G-3
What is a Locale? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-4
A Locale Specifies a Code Set . . . . . . . . . . . . . . . . . . . . . . G-6
Multibyte Code Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-7
Using Multibyte Code Sets . . . . . . . . . . . . . . . . . . . . . . . . . G-8
A Locale Specifies a Collation Order . . . . . . . . . . . . . . . . G-10
Localized Specific Collation: NCHAR & NVARCHAR . . . . G-11
Collation Order and SQL Statements . . . . . . . . . . . . . . . . G-12
A Locale Specifies Numeric and Monetary Formats . . . . . G-14
A Locale Specifies Date and Time Formats . . . . . . . . . . . G-15
Date and Time Customization . . . . . . . . . . . . . . . . . . . . . G-16
Locales: Client, Database and Server . . . . . . . . . . . . . . . G-17
Specifying Locales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-19
Multiple Locales: Code Set Conversion . . . . . . . . . . . . . . G-20
Code Set Conversion: Performance Consideration . . . . . G-22
Multibyte Character Support for Utilities/APIs . . . . . . . . . . G-23
The glfiles Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-24
Migrating to GLS from NLS or ALS . . . . . . . . . . . . . . . . . . G-25
