Does the query reference features that require the CBO
(e.g. materialised views), or is ANSI JOIN syntax
used in the query?                                    -- yes --> COST
  | no
Is there a RULE hint?                                 -- yes --> RULE
  | no
Is RULE set at SYSTEM or SESSION level
and no CHOOSE hint?                                   -- yes --> RULE
  | no
Is ALL_ROWS or FIRST_ROWS set at SYSTEM or
SESSION level and no CHOOSE hint?                     -- yes --> COST
  | no
Are there statistics for any of the tables?           -- yes --> COST
  | no
RULE
Notes:
(1) Hints in view definitions or in the SELECT clause of an INSERT or CREATE TABLE are included in the flowchart.
(2) Recursive SQL chooses the goal based on the instance setting. This includes SQL called from PL/SQL. Of course, hints can be used in any SQL called by PL/SQL to override this default.

    Instance setting    Recursive SQL goal
    ----------------    ------------------
    RULE                RULE
    FIRST_ROWS          CHOOSE
    ALL_ROWS            CHOOSE
    CHOOSE              CHOOSE
(3) If any table in a query has a parallel degree greater than one (including the default degree), Oracle uses the cost-based optimizer for that query, even if OPTIMIZER_MODE = RULE or there is a RULE hint in the query itself.
(4) In general, be wary of the presence of any new features, as they are likely to require the use of the Cost-Based Optimizer.
Tips to Write Effective Queries and Explain Plan Contents

Detect the Driving Table
A small rule of thumb: the table that drives the query should be the one that eliminates the most rows, and it should be chosen first.
The RBO (rule-based optimizer) chooses the driving order by taking the tables from RIGHT to LEFT in the FROM clause, so the table intended to drive the query, typically the one with the fewest rows and columns, should be placed on the right. The CBO (cost-based optimizer) determines the order based on the costs derived from the statistics gathered on the tables.

If there are no statistics and optimizer_mode is cost-based, the CBO chooses the driving order by taking the tables from LEFT to RIGHT in the FROM clause; if optimizer_mode is rule-based, the RBO method is used (RIGHT to LEFT). It is therefore important to know the driving table, which is the one with the smallest number of rows in the query.
The driving table is important because it is retrieved first, and the rows of the second table are then merged into the result set of the first table. If three tables are being joined, select the intersection table as the driving table. The intersection table is the table that has many tables dependent on it.

SELECT something FROM detail, master WHERE detail.key = master.key
Now, there are usually other conditions in the WHERE clause, and the (rule-based) optimizer will decide which table to use as the driving table depending on which conditions are present and which indexes are available. In the following case the detail table will be the driving table because of the explicit condition on the key column of that table:
SELECT something FROM detail, master WHERE detail.key = master.key AND detail.key = 'some value'
While in this case the master table will be the driving table:

SELECT something FROM detail, master WHERE detail.key = master.key AND master.key = 'some value'

So you have to look at the whole statement, and of course this is relevant for the RULE-based approach ONLY! The case for the Cost-Based Optimizer (CBO) is that if it has enough statistical information about the tables, it can construct the plan based on data, not just structure, and might use a different access path for the same statement than the RULE-based approach would. The best way to identify the driving table is to look at the Explain Plan for the query.

The WHERE clause is the main decision maker about which indexes to use. You should always try to use your unique indexes first, and if that is not possible, then use a non-unique index. For a query to use an index, one or more fields from that index need to be mentioned in the WHERE clause. A concatenated index will only be used if its first field is mentioned; the more of its fields appear in the WHERE clause, the better the index is used. Furthermore, if only some of the fields are used in the WHERE clause, they should be the leading fields of the index: if the second field of an index is used in the WHERE clause but not the first, the index will not get used.

The "ORACLE8i: The Complete Reference" book has a chapter on the Optimizer. In this chapter, under the description of the Nested Loops method, the author discusses the importance of the FROM clause and the way the optimizer processes the tables listed there.
The TIPS
Although two SQL statements may produce the same result, Oracle may process one faster than the other. You can use the results of the EXPLAIN PLAN statement to compare the execution plans and costs of the two statements and determine which is more efficient. Following are some tips that help in writing efficient queries.
Existence of a row Do not use Select count(*) to test the existence of a row. Open an explicit cursor, fetch once, and then check cursor%NOTFOUND :
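A sketch of the cursor-based test, reusing the students table and columns from the example below (variable and cursor names are illustrative):

```sql
DECLARE
  v_FirstName VARCHAR2(10) := 'Scott';
  v_LastName  VARCHAR2(10) := 'Urman';
  v_dummy     NUMBER;
  -- The cursor stops at the first matching row, instead of
  -- visiting every matching row the way SELECT COUNT(*) must.
  CURSOR c_exists IS
    SELECT 1
      FROM students
     WHERE first_name = v_FirstName
       AND last_name  = v_LastName;
BEGIN
  OPEN c_exists;
  FETCH c_exists INTO v_dummy;
  IF c_exists%NOTFOUND THEN
    DBMS_OUTPUT.PUT_LINE('Row does not exist');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Row exists');
  END IF;
  CLOSE c_exists;
END;
/
```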
If you are going to insert a row, or update it if it already exists, instead of this:

DECLARE
  /* Declare variables which will be used in SQL statements */
  v_NewMajor  VARCHAR2(10) := 'History';
  v_FirstName VARCHAR2(10) := 'Scott';
  v_LastName  VARCHAR2(10) := 'Urman';
  v_exists    NUMBER := 0;
BEGIN
  SELECT count(1) INTO v_exists FROM students
   WHERE first_name = v_FirstName AND last_name = v_LastName;
  IF v_exists = 1 THEN
    /* Update the students table. */
    UPDATE students SET major = v_NewMajor
     WHERE first_name = v_FirstName AND last_name = v_LastName;
  ELSE
    INSERT INTO students (ID, first_name, last_name, major)
    VALUES (10020, v_FirstName, v_LastName, v_NewMajor);
  END IF;
END;
/
use something like the following:

PROCEDURE PROCESO1 (...) IS
  V_RESULTADO VARCHAR2(4000) := NULL;
BEGIN
  ...
  MODIFICA(V_RESULTADO);
  IF V_RESULTADO IS NOT NULL THEN
    ROLLBACK;
  ELSE
    OTROPROCEDURE;
    IF V_RESULTADO IS NOT NULL THEN
      ROLLBACK;
    ELSE
      COMMIT;
    END IF;
  END IF;
END;
/
PROCEDURE MODIFICA (P_RESULTADO OUT VARCHAR2) IS
  /* Declare variables which will be used in SQL statements */
  v_NewMajor  VARCHAR2(10) := 'History';
  v_FirstName VARCHAR2(10) := 'Scott';
  v_LastName  VARCHAR2(10) := 'Urman';
BEGIN
  p_resultado := NULL;
  BEGIN
    /* Update the students table. */
    UPDATE students SET major = v_NewMajor
     WHERE first_name = v_FirstName AND last_name = v_LastName;
  EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
      p_resultado := 'ERROR UPDATING TABLE students, DUPLICATE KEY ' || SQLERRM;
    WHEN OTHERS THEN
      p_resultado := 'ERROR UPDATING TABLE students, OTHER ERROR ' || SQLERRM;
  END;
  /* Check to see if the record was found. If not, then we need
     to insert this record. */
  IF SQL%NOTFOUND THEN
    BEGIN
      INSERT INTO students (ID, first_name, last_name, major)
      VALUES (10020, v_FirstName, v_LastName, v_NewMajor);
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
        p_resultado := 'ERROR INSERTING INTO TABLE students, DUPLICATE KEY ' || SQLERRM;
      WHEN OTHERS THEN
        p_resultado := 'ERROR INSERTING INTO TABLE students, OTHER ERROR ' || SQLERRM;
    END;
  END IF;
END;
/
INSERT INTO RecognitionLog (MachineName, StartDateTime)
VALUES (p_MachineName, p_StartDateTime);
p_RowsAffected := SQL%ROWCOUNT;
COMMIT;
RETURN 0;
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN
    UPDATE RecognitionLog
       SET EndDateTime = p_EndDateTime,
           TotalRecognized = p_TotalRecognized,
           TotalRecognitionFailed = p_TotalRecognitionFailed
     WHERE MachineName = p_MachineName
       AND StartDateTime = p_StartDateTime;
    p_RowsAffected := SQL%ROWCOUNT;
    COMMIT;
    RETURN 0;
  WHEN OTHERS THEN
    ROLLBACK;
    p_RowsAffected := 0;
    RETURN 1;

1. Avoid the use of IS NULL and IS NOT NULL. Instead of:

Select * from clients where phone_number is null;

Use:

Select * from clients where phone_number = 0000000000000000;

2. Use DECODE when you want to scan the same rows repetitively or join the same table repetitively.
3. If three tables are being joined, select the intersection table as the driving table. The intersection table is the table that has many tables dependent on it.
4. Always use table aliases and prefix all column names with the aliases when you are using more than one table. 5. Use EXISTS instead of IN, and NOT EXISTS in place of NOT IN.
EXISTS vs. IN The EXISTS function searches for the presence of a single row meeting the stated criteria as opposed to the IN statement which looks for all occurrences. For example:
PRODUCT - 1000 rows
ITEMS   - 1000 rows

(A) SELECT p.product_id FROM products p
    WHERE p.item_no IN (SELECT i.item_no FROM items i);

(B) SELECT p.product_id FROM products p
    WHERE EXISTS (SELECT '1' FROM items i WHERE i.item_no = p.item_no);
For query A, all rows in ITEMS will be read for every row in PRODUCTS. The effect will be 1,000,000 rows read from items. In the case of query B, a maximum of 1 row from ITEMS will be read for each row of PRODUCTS, thus reducing the processing overhead of the statement.
6. Use joins in place of EXISTS.

SELECT * FROM emp e
WHERE EXISTS (SELECT d.deptno FROM dept d
              WHERE e.deptno = d.deptno AND d.dname = 'RESEARCH');

To improve performance use the following:

SELECT * FROM emp e, dept d
WHERE e.deptno = d.deptno AND d.dname = 'RESEARCH';

7. Use EXISTS in place of DISTINCT. Instead of:

SELECT DISTINCT d.deptno, d.dname
FROM dept d, emp e
WHERE d.deptno = e.deptno(+);

use:
SELECT d.deptno , d.dname FROM dept d WHERE EXISTS (SELECT e.deptno FROM emp e WHERE d.deptno = e.deptno);
8. Math Expressions.

The optimizer fully evaluates expressions whenever possible and translates certain syntactic constructs into equivalent constructs. This is done either because Oracle can evaluate the resulting expression more quickly than the original expression, or because the original expression is merely a syntactic equivalent of the resulting expression. Any computation of constants is performed only once, when the statement is optimized, rather than each time the statement is executed. Consider these conditions that test for monthly salaries greater than 2000:

sal > 24000/12
sal > 2000
sal*12 > 24000

If a SQL statement contains the first condition, the optimizer simplifies it into the second condition. Note that the optimizer does not simplify expressions across comparison operators: it does not simplify the third expression into the second. For this reason, application developers should write conditions that compare columns with constants whenever possible, rather than conditions with expressions involving columns.

The optimizer does not use an index for the following statement:

SELECT * FROM emp WHERE sal*12 > 24000;

Instead use the following statement:

SELECT * FROM emp WHERE sal > 24000/12;
9. Never use NOT on an indexed column. Whenever Oracle encounters a NOT on an indexed column, it will perform a full-table scan.

SELECT * FROM emp WHERE NOT deptno = 0;

Instead use the following:

SELECT * FROM emp WHERE deptno > 0;

10. Never use a function or calculation on an indexed column (unless you are SURE that you are using a function-based index, new in Oracle8i). If any function is used on an indexed column, the optimizer will not use the index. Use some other alternative. If you don't have another choice, keep functions on the right-hand side of the equal sign. The concatenation operator || will also disable indexes. Examples:

/** Do not use **/
SELECT * FROM emp WHERE SUBSTR(ENAME, 1, 3) = 'MIL';

/** Suggested alternative **/
Note: the optimizer uses the index only when optimizer_goal is set to FIRST_ROWS.
SELECT * FROM emp WHERE ENAME LIKE 'MIL%';
/** Do not use **/
SELECT * FROM emp WHERE sal != 0;
Note: an index can tell you what is in a table, but not what is not in a table.
Note: the optimizer uses the index only when optimizer_goal = FIRST_ROWS.

/** Suggested alternative **/
SELECT * FROM emp WHERE sal > 0;
/** Do not use **/
SELECT * FROM emp WHERE ename || job = 'MILLERCLERK';
Note: || is the concatenation operator. Like other functions, it disables the index.

/** Suggested alternative **/
Note: the optimizer uses the index only when optimizer_goal = FIRST_ROWS.
SELECT * FROM emp WHERE ename = 'MILLER' AND job = 'CLERK';
11. Whenever possible, try to use bind variables. Instead of doing this:

DECLARE
BEGIN
  UPDATE f_sales_detail SET quantity = 0
   WHERE sales_id = 2314;
END;

perform the following:

DECLARE
  v_quantity NUMBER := 0;     -- in-bind variable
  v_sales_id NUMBER := 2314;  -- in-bind variable
BEGIN
  UPDATE f_sales_detail SET quantity = v_quantity
   WHERE sales_id = v_sales_id;
END;
12. Use the same convention for all your queries, so that identical statements can share cursors. Remember that

Select * from emp where dept = :dept_no

is different from

Select * from EMP where dept = :dept_no

13. Do not use the keyword HAVING for conditions that do not involve group functions; use the keyword WHERE instead, so that rows are filtered before they are grouped.
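As an illustration against the standard emp demo table (a sketch; the actual savings depend on the data):

```sql
-- Less efficient: every department is grouped and aggregated first,
-- and the unwanted groups are discarded afterwards by HAVING.
SELECT deptno, AVG(sal)
  FROM emp
 GROUP BY deptno
HAVING deptno = 20;

-- More efficient: WHERE removes the rows before any grouping
-- or aggregation work is done.
SELECT deptno, AVG(sal)
  FROM emp
 WHERE deptno = 20
 GROUP BY deptno;
```

HAVING is still required, of course, for conditions on the aggregates themselves, such as HAVING AVG(sal) > 2000.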
14. Avoid multiple subqueries where possible. Instead of this:

Update emp
   Set emp_cat = (select max(category) from emp_categories),
       sal_range = (select max(sal_range) from emp_categories);

Use:

Update emp
   Set (emp_cat, sal_range) =
       (Select max(category), max(sal_range) from emp_categories);
15. Use OR in place of IN.

Least efficient:
Select . From location Where loc_id in (10,20,30)

Most efficient:
Select . From location Where loc_id = 10 or loc_id = 20 or loc_id = 30
16. Do not commit inside a loop.

Do not use COMMIT or DDL statements inside a loop or cursor, because that can make the undo segments needed by the cursor unavailable. If you must commit inside a cursor loop, commit in batches rather than after every row.

Many applications commit more frequently than necessary, and their performance suffers as a result. In isolation a commit is not a very expensive operation, but lots of unnecessary commits can nevertheless cause severe performance problems. While a few extra commits may not be noticed, the cumulative effect of thousands of extra commits is very noticeable. Look at this test: insert 1,000 rows into a test table, first as a single transaction, and then committing after every row. Your mileage may vary, but these results, taken on an otherwise idle system, show a performance blowout of more than 100% when committing after every row.

create table t (n number);

--BAD METHOD
declare
  start_time number;
begin
  start_time := dbms_utility.get_time;
  for i in 0..999 loop
    insert into t values (i);
    commit;
  end loop;
  dbms_output.put_line(dbms_utility.get_time - start_time || ' centiseconds');
end;
/
102 centiseconds

truncate table t;

--GOOD METHOD
declare
  start_time number;
begin
  start_time := dbms_utility.get_time;
  for i in 0..999 loop
    insert into t values (i);
  end loop;
  commit;
  dbms_output.put_line(dbms_utility.get_time - start_time || ' centiseconds');
end;
/
44 centiseconds
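If a single transaction is genuinely too large for your undo configuration, a common compromise is to commit in batches rather than per row. A sketch (the table name and the 10,000-row interval are illustrative):

```sql
DECLARE
  v_count PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT rowid rid FROM big_table WHERE processed = 'N') LOOP
    UPDATE big_table SET processed = 'Y' WHERE rowid = r.rid;
    v_count := v_count + 1;
    IF MOD(v_count, 10000) = 0 THEN
      COMMIT;  -- commit every 10,000 rows, not every row
    END IF;
  END LOOP;
  COMMIT;  -- pick up the final partial batch
END;
/
```

Note that committing across fetches of the driving cursor still exposes the loop to ORA-01555 (snapshot too old); a single commit at the end remains the safest pattern.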
EXECUTIVE OVERVIEW
In Oracle Database 10g, only one query optimizer is supported: the Cost-Based Optimizer (CBO). Oracle's legacy optimizer, the Rule-Based Optimizer (RBO), is no longer supported. While the majority of applications running on Oracle today use the CBO, some applications continue to use the RBO. This paper describes the steps involved in migrating an application from the RBO to the CBO.
INTRODUCTION
Oracle introduced the Cost-Based Optimizer (CBO) over a decade ago, with the release of Oracle7 in 1992. Prior to that, only the Rule-Based Optimizer (RBO) was available. Over the years, the CBO has been substantially improved, and it supports all of Oracle's new features. In contrast, the RBO has not been enhanced since the introduction of the CBO and lacks support for many fundamental features that have been introduced since Oracle7, such as bitmap indexes, function-based indexes, hash joins, index-organized tables, and partitioning. While major packaged applications, such as SAP, Oracle E-Business Suite, and PeopleSoft, use the CBO, there remain some applications that use the RBO for historical reasons. The purpose of this document is to describe how to migrate an existing application from the RBO to the CBO, a necessary precursor to migrating an RBO-based application to Oracle Database 10g.
Because applications may generate very complex SQL, query optimizers must be extremely sophisticated and robust to ensure good performance. Determining whether or not to use an index, or choosing a specific join technique for joining two tables, are the most basic operations of a query optimizer. However, a query optimizer is also capable of much more sophisticated operations. For example, query optimizers may transform SQL statements so that view definitions are merged into the text of the query, subqueries may be flattened and converted into joins, and predicates may be pushed into other portions of the SQL statement. These sophisticated transformations enable a query optimizer to find more efficient ways to execute complex queries.

Both the RBO and the CBO of the Oracle Database are query optimizers. The inputs to both optimizers are SQL statements, and the outputs are strategies for efficiently executing those SQL statements. These strategies are called execution plans, and they can be viewed using Oracle's EXPLAIN PLAN facility. As their names imply, the RBO is a rule-based (heuristic) query optimizer, while the CBO is a cost-based query optimizer. The RBO chooses its plans based on a set of fixed rules. For example, if one has a query of the form select * from EMP where EMPNO < 5000, and if there is an index on EMPNO, then one of the RBO's rules specifies that this query will always be executed using the index. The behavior of the RBO is entirely fixed; the execution plan for the sample query remains the same regardless of whether the EMP table has 10 rows or 10 million rows, and regardless of whether the EMPNO < 5000 predicate will return 2 rows or 2 million rows.
The CBO, which is Oracle's sole query optimizer starting with Oracle Database 10g, uses a cost-based optimization strategy, in which multiple execution plans are generated for a given query and an estimated cost is computed for each plan. The query optimizer chooses the plan with the lowest estimated cost.
The CBO has built-in knowledge about the cost properties of all of the different access and join methods within the Oracle database. The CBO uses this knowledge in conjunction with statistical information about the database and the objects in the database. There are three primary areas of statistics used as input to the CBO:

- Statistics which describe the database objects involved in the query, e.g., the number of rows in a table, the number of distinct values in a column, and the number of leaf blocks of an index.
- Statistics on the relative performance of the hardware platform (so-called CPU statistics). These statistics help the CBO to understand how efficiently the underlying hardware platform can process CPU-intensive and I/O-intensive operations. Every combination of operating system, hardware server, and storage is different, so CPU statistics allow the CBO to automatically compensate for the strengths and weaknesses of each configuration.
- Statistics on the buffer cache, which describe whether a given table or database object is typically cached or not.
The CBO requires accurate statistics in order to deliver good query performance. Hence, it is crucial that the statistics the CBO uses are available and representative. The section on Statistics Management, below, discusses how Oracle automates statistics collection as well as how DBAs can customize it. The use of statistics implies that the CBO is by its nature more dynamic than the RBO. Collecting new statistics can lead to changes in execution plans if the new statistics are sufficiently different from the old. This behavior is desirable since, as a database grows over time, the optimal execution plans for an application may change as well. Moreover, for a packaged application, different installations may have different properties: a packaged application installed in a 50,000-employee company will have significantly different data characteristics than the same packaged application installed in a 200-employee company. There may not exist a single set of execution plans that is optimal for all installations. With the CBO, the particular properties of the installation determine the execution plans through the use of optimizer statistics.

In contrast to the dynamic properties of the CBO, the RBO has no notion of cost or cardinality. The RBO has no way of distinguishing a small table from a large one, or a highly selective condition from a non-selective one. Consequently, the RBO is unable to generate an execution plan based on such properties, or to adjust execution plans over time as the properties of the database change. As a result, even RBO plans that were hand-tuned to be optimal when the application was first implemented may become inefficient as data sets grow or change. A more in-depth description of the features and functionality of the CBO is available in Oracle's documentation, in particular in the following books: Database Concepts, Database Performance Tuning Guide and Reference, and (for data-warehouse environments) Data Warehousing Guide.
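Object statistics are gathered with the DBMS_STATS package; a minimal sketch (the schema name and options shown are assumptions to adapt):

```sql
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'SCOTT',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);  -- gather index statistics as well
END;
/
```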
APPLICATION MIGRATION
The process of migrating from the RBO to the CBO involves many of the same issues as any other database migration. We assume that the reader is familiar with the recommended practices, such as conducting performance testing on a test system before migrating the production system. General documentation about these practices can be found in the Database Migration book of Oracle's documentation. This white paper adds a more in-depth discussion of the issues related to optimizer migration, and specifically the issues of understanding and addressing changes in execution plans, than is provided in the Database Migration manual.

Changes in execution plans due to different optimizer behavior always carry a certain element of risk. Let us say that an application generates 100,000 SQL statements and 1,000 of them change as a result of the migration. Even if 999 out of those 1,000 plan changes resulted in improved performance, the single query that deteriorated could well outweigh the improvements if it is a critical query whose response time is now unacceptable. In other words, a small number of queries where performance goes from acceptable to unacceptable may well overshadow a larger number of queries where performance goes from acceptable to even better. Hence, it is prudent to conduct sufficient testing before migrating a production system to ensure that there are no performance issues causing disruptions.

Rather than suggest a single migration strategy, this white paper suggests a range of approaches for moving to the CBO depending upon the nature of the current RBO-based application. This section briefly discusses three strategies for migrating to the CBO, based on three application profiles:

Type A:
Description: High-profile, mission-critical application.
Performance requirements: Any performance degradations could potentially have severe consequences on company operations.
Migration goal: Zero performance degradations for all commonly-executed SQL statements.

Type B:
Description: Application used by a large number of users, where significant performance changes would be serious. However, minor or temporary performance changes would only be an inconvenience.
Migration goal: Minor performance degradations for <10% of the queries would be acceptable. Minimize the risk of severe performance degradations, but balance migration effort with risk.

Type C:
Description: Application whose performance is not mission critical. Has a small number of users, and/or the usage of this application is peripheral to the end-user's job function. Minor performance degradations would likely go unnoticed.
Migration goal: Minor performance degradations are acceptable. Minimize the effort required for migration.

The following provides a high-level migration strategy for each of these three types of applications, starting with the simplest scenario (Type C). This is only a high-level outline of the migration strategy; the subsequent sections provide more details on each step.
RELEVANT PARAMETERS
There are a few initialization parameters that are of specific interest when migrating from the RBO to the CBO.
OPTIMIZER_MODE
When migrating from the RBO to the CBO, the optimizer_mode parameter is of crucial importance. It can take the following values in Oracle9i:

Choose. This value is the default and has the following meaning: if any object (table, index, etc.) referenced in the SQL statement being optimized has associated optimizer statistics, the CBO will be used; otherwise, the RBO will be used. If the CBO is used, it will try to optimize the statement in all_rows mode (see below), so that it uses as few resources as possible to complete, thereby maximizing the throughput on the system.

All_rows. This mode forces the CBO to optimize for minimal resource utilization (best throughput) regardless of whether optimizer statistics have been collected for the objects in the query.

Rule. This mode forces the use of the RBO regardless of whether optimizer statistics exist. It should be noted that if certain types of objects, such as partitioned or index-organized tables, are referenced in a statement, the CBO will always be used regardless of the setting of optimizer_mode or the existence of optimizer statistics. Also, the existence of a hint inside the query will trigger the CBO, with some exceptions, such as a RULE hint. Note that the hint need not necessarily occur in the query text itself, but could reside in the definition of a view referenced in the query.

First_rows. This mode is meant to optimize for response time, i.e., the time it takes before the first result row is returned to the screen of the user. In Oracle9i (and later releases), we recommend using first_rows_n instead.

First_rows_n. Legal values for n are 1, 10, 100, and 1000. This mode is meant to optimize for response time, i.e., the time it takes before the first batch of n rows is returned to the user. The size of the batch could be a single row or a screen full, depending on the application; hence the support for different values of n.

The optimizer mode can also be set at the session level with ALTER SESSION, or at the statement level using a hint. The setting at the session level overrides the initialization setting, and the setting at the statement level, in turn, overrides the setting for the session.
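For example (the query and object names are illustrative):

```sql
-- Session level: overrides the instance-wide initialization setting.
ALTER SESSION SET optimizer_mode = first_rows_10;

-- Statement level: a hint overrides the session setting for one query.
SELECT /*+ FIRST_ROWS(10) */ empno, ename
  FROM emp
 WHERE deptno = 10;
```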
When migrating from the RBO to the CBO, the optimizer_mode parameter is of interest for a number of different reasons:

It is possible to optimize for response time rather than throughput (the default) using first_rows_n, if that is appropriate for the application. Most OLTP applications should use the first_rows_n behavior when migrating to the CBO. This mode will provide the best performance for most OLTP applications and, moreover, will produce the execution plans most similar to those of the RBO.

In order for the CBO to generate the same plan on the test system as it would on the production system, it is important that the optimizer statistics used on the test system are the same as they would be on the production system. To ensure that this is the case, it might be necessary to collect the optimizer statistics on the production system and export them to the test system. If the optimizer_mode is set to rule on an RBO-based production system, then statistics can safely be gathered on the production system without impacting the end users' optimizer behavior. However, in order to prevent the existence of statistics from switching the production system to the CBO prematurely, it may be necessary to run the production system using the rule mode.

This parameter can be set on a per-session basis. This enables a DBA to set optimizer_mode to rule at the instance level, but selectively enable specific users or sessions to use other settings of optimizer_mode. This technique can be used for testing, so that the behavior of different settings of optimizer_mode can be evaluated directly on the production system. It can also be used to migrate users individually (by setting this parameter on a per-session basis using a logon trigger) to the CBO, rather than migrating all users at one time.

After the migration of the production system to the CBO, the rule mode can be used to switch back to the RBO in case of an emergency.
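Per-user migration via a logon trigger might be sketched as follows (the trigger and user names are hypothetical):

```sql
-- Instance-wide optimizer_mode stays at rule; only the pilot
-- users listed here are switched to the CBO at logon.
CREATE OR REPLACE TRIGGER cbo_pilot_logon
  AFTER LOGON ON DATABASE
BEGIN
  IF USER IN ('PILOT_USER1', 'PILOT_USER2') THEN
    EXECUTE IMMEDIATE
      'ALTER SESSION SET optimizer_mode = first_rows_10';
  END IF;
END;
/
```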
OTHER PARAMETERS
In addition to the parameters discussed so far, other optimizer parameters can also be of interest for the purpose of performance tuning. See the Oracle documentation manuals Database Performance Tuning Guide and Reference and Database Reference for further details.
CAPTURING A WORKLOAD
In order to analyze differences in execution plans, the SQL statements of the relevant workload must be extracted. If the source code for the application is available, it may be possible to extract the SQL text directly from that code. However, in many cases extracting the SQL text is not an option; for example, the application source code may not be accessible, or the SQL text may be generated dynamically as the application executes. A more general approach is to capture the statements by monitoring the contents of the cursor cache. The text of the cursors in the cache can be accessed using the V$SQL or V$SQLAREA views. (If the statement is longer than 1000 characters, the view V$SQLTEXT is needed in order to retrieve the full text of the statement.)

In Oracle9i, the execution plan for each cursor is also available, as well as statistics about the execution. This information is exposed through V$SQL_PLAN and V$SQL_PLAN_STATISTICS, which can be joined to V$SQL or V$SQLAREA to match up the SQL text with the plan. Being able to get both the SQL text and the corresponding execution plan simultaneously has several advantages. Firstly, it may obviate the extra effort needed to generate the plan using EXPLAIN PLAN. Secondly, it shows the actual plan and is therefore more accurate than EXPLAIN PLAN. This issue is discussed further in the section on Bind Variables.

Oracle's Statspack provides the functionality for monitoring these V$ views and taking snapshots of their contents at regular intervals during a workload; see Oracle9i Database Performance Tuning Guide and Reference Release 2 (9.2). In order to capture the plans as well as the SQL text, the statistics level needs to be set to 6 or higher. Moreover, in order to capture all the SQL statements in the cursor cache, the snapshot thresholds need to be set low enough; for example, the threshold for the number of executions could be set to 0.
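A query along these lines (a sketch) matches each cached statement with its actual execution plan in Oracle9i; the join columns identify a particular child cursor:

```sql
SELECT s.hash_value,
       s.sql_text,
       p.id, p.operation, p.options, p.object_name
  FROM v$sql s, v$sql_plan p
 WHERE p.address      = s.address
   AND p.hash_value   = s.hash_value
   AND p.child_number = s.child_number
 ORDER BY s.hash_value, p.id;
```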
The interval between snapshots needs to be sufficiently small to minimize the risk that a statement is aged out of the cursor cache without being captured. How long that would be depends on the size of the cache and the arrival rate of SQL-statements that are not in the
cache. In addition, the period during which snapshots are taken needs to be long enough to ensure that all interesting aspects of the application have been exercised.

It is also useful to be able to store the captured statements and execution plans in a repository for later use. Statspack records the snapshots in tables, and the data in these tables can be the foundation of such a repository. Ideally, a unique key should be assigned to each distinct SQL statement. Such a key could also be used to identify each plan, e.g., through the STATEMENT_ID column in the standard PLAN_TABLE. This mechanism would support doing EXPLAIN PLAN SET STATEMENT_ID = <key> INTO <plan table> FOR <statement> for each SQL statement if the queries cannot actually be executed on the test system. By creating a table of plans for each optimizer version that is tested, plan diffs can be performed through SQL queries, or through a simple PL/SQL script that compares the plans for each distinct SQL statement to see if they are the same. See Appendix A for examples of such scripts.

Maintaining a mechanism that maps each distinct SQL statement to a unique key can be somewhat cumbersome, so it may be easier to use the SQL hash value instead. This value is directly available in the V$ views and is also stored by Statspack for each statement in the snapshot. However, the hash value is not guaranteed to be unique, and there is a slight risk of collisions, which would make plan comparisons more difficult for those particular statements where collisions occurred. For an application that generates several hundred thousand distinct statements, there may be a small number of such collisions. Dealing with a very small number of collisions when analyzing plan diffs may well be less time consuming than building a mechanism that maintains keys guaranteed to be unique.
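The capture-and-diff approach might be sketched as follows; the statement key 'Q1001' and the tables plan_table_rbo and plan_table_cbo are illustrative copies of the standard PLAN_TABLE, one per optimizer configuration tested:

```sql
-- Explain each captured statement under its assigned key.
EXPLAIN PLAN
  SET STATEMENT_ID = 'Q1001'
  INTO plan_table_rbo
  FOR SELECT * FROM emp WHERE deptno = 10;

-- List the statements whose plan rows differ between two captures.
SELECT DISTINCT statement_id
  FROM (SELECT statement_id, id, operation, options, object_name
          FROM plan_table_rbo
        MINUS
        SELECT statement_id, id, operation, options, object_name
          FROM plan_table_cbo);
```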
BIND VARIABLES
The Oracle EXPLAIN PLAN facility has never been able to fully guarantee that the execution plan shown for a SQL-statement involving bind variables is the same plan that would actually be used when executing the statement, as opposed to merely explaining it. The reason for the potential discrepancy is that EXPLAIN PLAN is allowed without any actual bindings for the bind variables; Oracle is therefore unaware of the actual data types or values of the execution-time bindings. Prior to Oracle9i, the potential discrepancy would only concern statements where Oracle would insert a type conversion operator in some expression based on the actual data type of the value supplied as the binding for a bind variable. The plan generated by EXPLAIN PLAN would be unaware of this conversion operator and could show an index scan when the actual execution would use a full table scan. This kind of discrepancy should be relatively rare, but the introduction of the Bind Variable Peeking feature in Oracle9i increases the likelihood that such discrepancies will occur, since the value of the bind variable may affect the plan as well. We will discuss this feature in the next section. However, it should be pointed out that Oracle9i also introduced the V$SQL_PLAN view, which is the only interface that is guaranteed to show the actual plan used by a SQL-statement being executed.
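As an illustration, the actual plan for a cached statement can be retrieved from V$SQL_PLAN roughly as follows; the literal in the sql_text filter is a hypothetical example, and the bind values come from the first query:

```sql
-- Locate the cached statement in the shared pool
select address, hash_value, child_number
from v$sql
where sql_text like 'SELECT * FROM orders WHERE%';

-- Display the plan Oracle actually uses for that cursor
select id, parent_id, operation, options, object_name
from v$sql_plan
where address = :addr
  and hash_value = :hv
  and child_number = :child
order by id;
```

Unlike EXPLAIN PLAN, this shows the plan of the cursor as it was actually compiled, including any effects of bind variable peeking.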
Case 3 would seem to indicate that the bind variable mechanism is being misused by the application, since some users will invariably get a highly suboptimal execution plan for the query and the overhead of the suboptimal plans will likely more than outweigh the benefits of cursor sharing. However, assuming that the bind variable mechanism is used correctly, it would be very beneficial for the optimizer to know which of cases 1 and 2 applies to the application. Oracle9i introduced the notion of bind variable peeking to deal with this issue. Bind variable peeking means that the optimizer will peek at the bind variable values submitted during a hard parse (that is, a compilation of a query that is not found in the cursor cache) and use those values to determine whether the range is narrow or wide and, hence, determine the optimal plan. Subsequent invocations of the same cursor while the original one is still cached will get the same plan, based on the assumption that the use of bind variables means that cursor sharing is desired. The use of bind peeking could result in the execution plan being different from the plan that EXPLAIN PLAN will generate, if the actual values that the optimizer peeks at affect the optimizer's decisions. If the optimizer does not care about the actual bindings for a bind variable, it will not even bother peeking, but even if it does peek, it may still end up generating the same plan as EXPLAIN PLAN. There are two cases where the optimizer would peek at the actual bindings of a bind variable and where the actual bindings therefore could make a difference for the plan that gets generated.
- Range predicates. Example: sales_date between :1 and :2 and price > :3.
- Equality predicates when the column has histograms. Example: order_status = :4, assuming that order_status has histograms.

In contrast, a condition like order_id = :1 will not trigger bind peeking, assuming that order_id does not have histograms. (It may, for instance, be a primary key column, in which case histograms are not beneficial.) Obviously, the usefulness of a plan generated using EXPLAIN PLAN is limited if it is known that bind variable peeking would be invoked for the actual execution. However, the criteria above may help determine whether bind variable peeking would actually be an issue for a given query. Moreover, in many cases, examining the plan generated without peeking in conjunction with the query itself might be sufficient to determine whether peeking would make a difference, even under the assumption that it will actually take place.
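To illustrate the second case, the following sketch assumes a hypothetical orders table with a histogram on order_status, where 95 percent of the rows have the value 'NORMAL':

```sql
variable status varchar2(10)

-- Hard parse: the optimizer peeks at 'NORMAL' (95% of the rows),
-- so a full table scan is the likely plan.
exec :status := 'NORMAL'
select count(*) from orders where order_status = :status;

-- The cursor is now cached; this execution reuses the full-scan
-- plan even though an index scan would suit the rare value better.
exec :status := 'ERROR'
select count(*) from orders where order_status = :status;
```

EXPLAIN PLAN on the same statement sees no binding at all, so its output can differ from both of the plans above.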
If the performance impact is good overall and no major deterioration is found, this result should increase the confidence in the success of the migration. If, on the other hand, some of the new plans show unacceptable performance, a variety of steps may need to be taken depending on the exact nature of the plan changes. There are numerous techniques for query tuning in Oracle that can be applied to address underperforming queries, like creating histograms, changing the values of tuning parameters, using hints, etc. See Oracle9i Database Performance Tuning Guide and Reference, Release 2 (9.2). In addition, since the plan deterioration is in the context of a migration from the RBO to the CBO or from one version of the CBO to the next, it may be possible to use various features that could make Oracle revert to the old plan. We have already described the parameters optimizer_mode and optimizer_features_enable in an earlier section. There is also reason to consider the use of the Plan Stability feature which we will describe next.
STATISTICS MANAGEMENT
Since the CBO is dependent on accurate optimizer statistics in order to make good decisions, it is crucial to have a process in place that recollects the statistics at regular intervals to ensure that they are representative of the data. Oracle provides an Auto Gathering feature that significantly simplifies the task of keeping the optimizer statistics current in a production environment. We also discuss some of the underlying issues that may have to be addressed during a migration or when doing plan diffs.
For each column, the basic optimizer statistics include the minimum and maximum values and the number of distinct values for the column. However, in some cases, the distribution of values is not uniform but highly skewed. For example, a column for a status flag may take on a small number of distinct values, but 95 percent of the rows may have the value for normal. In order to determine the selectivity of a condition involving a highly skewed column, Oracle supports histograms as a representation of the column's properties. By building a histogram with up to 254 buckets representing the distribution of values in the column, the optimizer can make accurate selectivity estimates even if the data distribution is highly skewed. However, the histogram representation of the column statistics uses more space than the regular one.

Hence, the task of managing the optimizer statistics has two dimensions. One is to make sure that all the tables and indexes have up-to-date statistics. The second is to make sure that all the columns that need histograms have them.

Statistics management is done through the dbms_stats package. See Oracle9i Supplied PL/SQL Packages and Types Reference, Release 2 (9.2). This package contains routines for gathering (or deleting) statistics at various levels of granularity. For example, statistics can be gathered for all tables and indexes in a schema; statistics can be gathered for a specific table; or histograms can be created for a specific column. In addition, the package supports exporting and importing statistics between different systems, as well as storing a copy of the old statistics before starting a new collection. The package also supports sampling the objects in order to generate the statistics from a subset of the data, and parallel execution of the statistics collection, both of which help speed up the statistics collection significantly.
On a production system with tens of thousands of tables and hundreds of thousands of columns, it may be very hard to know which objects need new statistics or which columns need histograms. Gathering statistics can require significant resources, so it should be avoided if the changes to the data of an object have been insignificant since the last time statistics were gathered. Likewise, computing histograms for columns where they are not needed is also a waste of resources. The auto gathering option for dbms_stats addresses these issues through the use of monitoring and automatically sized histogram calculations.
MONITORING
There are two types of monitoring: DML-monitoring and column-usage monitoring. DML-monitoring means that Oracle keeps track of the approximate number of rows that have been modified in a table since the last time the optimizer statistics were gathered. This feature helps dbms_stats determine whether the optimizer statistics for a given table should be considered stale and in need of being recollected. Column-usage monitoring means that Oracle marks those columns that are used in a condition in the WHERE-clause of a query. This type of monitoring helps dbms_stats determine whether there is any point in creating histograms for a column. Over time, precisely those columns that occur in WHERE-clauses will be marked. If a column never occurs in a WHERE-clause, there is no point in creating histograms for it, no matter how skewed its data distribution is. Note that DML-monitoring has to be turned on explicitly since it incurs a small amount of overhead for DML statements (typically less than 0.5 percent). The dbms_stats package supports turning on monitoring for all objects in a schema with a single command.
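For example, monitoring can be enabled for a whole schema and the accumulated modification counts inspected roughly as follows (a sketch; the schema name SCOTT is a placeholder):

```sql
-- Turn on DML-monitoring for every table in the schema
exec dbms_stats.alter_schema_tab_monitoring('SCOTT', monitoring => TRUE);

-- Inspect the approximate modification counts collected so far
select table_name, inserts, updates, deletes
from dba_tab_modifications
where table_owner = 'SCOTT';
```

The counts are approximate and are flushed to the dictionary periodically, so recent DML may not be visible immediately.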
AUTO GATHERING
Assuming that monitoring is used, the following single dbms_stats command can be used to gather the relevant statistics:

dbms_stats.gather_schema_stats(<schema_name>, options => 'GATHER AUTO');
This command tells dbms_stats to gather optimizer statistics for those objects for which the current statistics are considered stale. It will also create histograms for columns based on two criteria: the column must have been marked as occurring in some WHERE-clause, and the data distribution must be nonuniform. The first condition is determined through column-usage monitoring. The second condition is determined during the actual computation of the column statistics. Moreover, the statistics collection will use sampling, with the minimum sample size (and hence the least amount of resources) required in order to generate accurate statistics. Note that the required sample size can be different for different objects and is determined dynamically based on the data distribution of each object. Finally, Oracle will automatically determine the degree of parallelism to use during statistics collection based on the current parameter settings relating to parallel queries.

The only task that is not fully automated is determining when it is appropriate to run the dbms_stats job to auto gather statistics. Even though auto gathering seeks to use as few resources as possible, it could still have a significant impact on the system if a large number of large tables need new statistics. Hence, the DBA still has the responsibility to schedule the statistics gathering job at a point in time when the load and usage pattern on the system permit it. We also strongly recommend that the old statistics be saved prior to collecting the new statistics. Normally, the old set of statistics is obsolete once the new set has been generated, but in the rare event that something goes wrong during the collection of the new statistics, the ability to fall back to the previous set can significantly reduce the negative impact in terms of performance and availability.
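One possible way to save the old statistics before a new collection is to export them into a statistics table; in this sketch, the schema name SCOTT and the table name STATS_BACKUP are placeholders:

```sql
-- Create a holding table and export the current statistics into it
exec dbms_stats.create_stat_table('SCOTT', 'STATS_BACKUP');
exec dbms_stats.export_schema_stats('SCOTT', stattab => 'STATS_BACKUP');

-- Gather the new statistics
exec dbms_stats.gather_schema_stats('SCOTT', options => 'GATHER AUTO');

-- If the new statistics turn out to be bad, fall back to the old set
exec dbms_stats.import_schema_stats('SCOTT', stattab => 'STATS_BACKUP');
```

Because the export lives in an ordinary table, it can also be moved to a test system with export/import utilities, which is useful when reproducing production plans.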
CONCLUSION
This white paper provided a basic strategy for migrating from the RBO to the CBO. By migrating to the CBO, RBO-based applications will be prepared to upgrade to Oracle Database 10g. Moreover, RBO-based applications should gain both short-term and long-term performance and manageability benefits via the CBO.
APPENDIX A
Below are some simple scripts intended to illustrate the process of performing plan diffs. First we give a script for generating plans for a set of statements using EXPLAIN PLAN rather than V$SQL_PLAN. This kind of script can be used to generate plans on the test system if the SQL-statements cannot actually be run there, e.g., due to lack of information about bind variable bindings. In this script, we assume that the text of the SQL-statements is stored in a table called sql_rep1 and that this table includes the columns sql_text and statement_id. We use statement_id as a way to match the plans with statements. We also need a plan table. In the script it is called st_plan and has the same definition as a standard PLAN_TABLE. In case statement_id is not a unique key for the SQL-statements, such as may happen if the SQL hash value is used, we give each plan a unique identifier in the remarks column. To get good performance for this operation, an index should be created on the statement_id column. We also use a table called failed for those statements that give rise to errors during EXPLAIN PLAN, e.g., due to a permission problem.
create table failed(id varchar2(30));
The following PL/SQL script performs the explain plans and populates st_plan and failed.
declare
  id    varchar2(30);
  stmnt long;
  c2    number;
  i     integer := 0;
  dummy number;
  cursor c1 is
    select statement_id, sql_text
    from sql_rep1
    order by statement_id;
begin
  open c1;
  c2 := dbms_sql.open_cursor;
  loop
    fetch c1 into id, stmnt;
    exit when c1%notfound;
    begin
      -- Explain the statement into st_plan, tagged with its statement_id
      dbms_sql.parse(c2,
        'explain plan set statement_id = ''' || id ||
        ''' into st_plan for ' || stmnt,
        dbms_sql.native);
      dummy := dbms_sql.execute(c2);
      -- Give each plan a unique identifier in the remarks column
      update st_plan set remarks = to_char(i) where statement_id = id;
      i := i + 1;
      commit;
    exception
      when others then
        -- Record statements that could not be explained
        insert into failed values (id);
        commit;
    end;
  end loop;
  dbms_sql.close_cursor(c2);
  close c1;
end;
/
The next script performs a plan diff between two different tables with plans, plan_1 and plan_2. We assume that they have the same format as a standard PLAN_TABLE. We assume that plans with the same statement_id in the two tables correspond to the same SQL-statement, and we want to find those plans that are different for the same statement. First we create a table to store the result:
create table plan_diff(statement_id varchar2(30));
We make two assumptions about this particular plan diff. We are only interested in the basic operations, like join orders, join methods, and access paths. This particular diff does not compare PLAN_TABLE columns relating to partitioning keys, parallel query information, or predicate usage. Such information can be included in the diff by adding predicates to the query below. We also assume that there can be some statements for which there are plans in one of the tables but not in the other and that we do not want such plans to register in the diff. For this purpose, we intersect the result with the statement_id columns of the two tables.
insert into plan_diff
select coalesce(p1.statement_id, p2.statement_id)
from plan_1 p1 full outer join plan_2 p2
  on (p1.statement_id = p2.statement_id
      and p1.id = p2.id
      and nvl(p1.operation, '0')   = nvl(p2.operation, '0')
      and nvl(p1.object_name, '0') = nvl(p2.object_name, '0')
      and nvl(p1.options, '0')     = nvl(p2.options, '0'))
where (p1.id is null or p2.id is null)
intersect select statement_id from plan_1
intersect select statement_id from plan_2;
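Once plan_diff is populated, each differing plan can be pulled up for manual inspection. The following query is a sketch using the standard id/parent_id hierarchy of PLAN_TABLE; run it once against plan_1 and once against plan_2 for the same statement_id:

```sql
-- Show one plan in the usual indented form for the statement :sid
select lpad(' ', 2 * level) || operation ||
       ' ' || nvl(options, '') ||
       ' ' || nvl(object_name, '') as plan_step
from plan_1
where statement_id = :sid
start with id = 0 and statement_id = :sid
connect by prior id = parent_id and statement_id = :sid;
```

Comparing the two indented outputs side by side usually makes the nature of the plan change (join order, join method, or access path) immediately apparent.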