Friday, June 20, 2008

SQL Optimizers (RBO & CBO)

Whenever you execute a SQL statement, a component of the
database known as the optimizer must decide how best
to access the data operated on by that statement. Oracle supports two
optimizers: the rule-base optimizer (which was the original), and the cost-based
optimizer.

To figure out the optimal execution path for a statement, the
optimizers consider the following:


  • The syntax you've specified for the statement
  • Any conditions that the data must satisfy (the WHERE
    clauses)
  • The database tables your statement will need to access
  • All possible indexes that can be used in retrieving
    data from the table
  • The Oracle RDBMS version
  • The current optimizer mode
  • SQL statement hints
  • All available object statistics (generated via the
    ANALYZE command)
  • The physical table location (distributed SQL)
  • INIT.ORA settings (parallel query, async I/O, etc.)

Oracle gives you a choice of two optimizing alternatives: the
predictable rule-based optimizer and the more intelligent cost-based optimizer.


Understanding the Rule-Based Optimizer


The rule-based optimizer (RBO) uses a predefined set of
precedence rules to figure out which path it will use to access the database.
The RDBMS kernel defaults to the rule-based optimizer under a number of
conditions, including:



  • OPTIMIZER_MODE = RULE is specified in your INIT.ORA
    file
  • OPTIMIZER_MODE = CHOOSE is specified in your INIT.ORA
    file, and no statistics exist for
    any table involved in the statement
  • An ALTER SESSION SET OPTIMIZER_MODE = RULE command has
    been issued
  • An ALTER SESSION SET OPTIMIZER_MODE = CHOOSE
    command has been issued, and no statistics exist for
    any
    table involved in the statement
  • The rule hint (e.g., SELECT /*+ RULE */. . .) has been
    used in the statement

Using the RBO, the optimizer chooses an execution plan based on the access paths available and the ranks of these access paths. Oracle's ranking of the access paths is heuristic. If there is more than one way to execute a SQL statement, then the RBO always uses the operation with the lower rank. Usually, operations of lower rank execute faster than those associated with constructs of higher rank.

Rule-based optimizer condition rankings

RBO Path 1: Single Row by Rowid
RBO Path 2: Single Row by Cluster Join
RBO Path 3: Single Row by Hash Cluster Key with Unique or Primary Key
RBO Path 4: Single Row by Unique or Primary Key
RBO Path 5: Clustered Join
RBO Path 6: Hash Cluster Key
RBO Path 7: Indexed Cluster Key
RBO Path 8: Composite Index
RBO Path 9: Single-Column Indexes
RBO Path 10: Bounded Range Search on Indexed Columns
RBO Path 11: Unbounded Range Search on Indexed Columns
RBO Path 12: Sort Merge Join
RBO Path 13: MAX or MIN of Indexed Column
RBO Path 14: ORDER BY on Indexed Column
RBO Path 15: Full Table Scan

http://www.oracle.com.cn/other/9ionlinedoc/server.920/a96533/rbo.htm


Understanding the Cost-Based Optimizer


The CBO determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The CBO also considers hints, which are optimization suggestions placed in a comment in the statement.

The CBO performs the following steps:
The optimizer generates a set of potential plans for the SQL statement based on available access paths and hints. The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement.
The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, which includes I/O, CPU, and memory.
Serial plans with higher costs take more time to execute than those with smaller costs. When using a parallel plan, however, resource use is not directly related to elapsed time.
The optimizer compares the costs of the plans and chooses the one with the lowest cost

CBO Statistics in the Data Dictionary

The statistics used by the CBO are stored in the data dictionary. You can collect exact or estimated statistics about physical storage characteristics and data distribution in these schema objects by using the DBMS_STATS package or the ANALYZE statement.
Oracle strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine tune your statistics collection in other ways. Further, the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS. However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:
To use the VALIDATE or LIST CHAINED ROWS clauses To collect information on freelist blocks
To maintain the effectiveness of the CBO, you must have statistics that are representative of the data. For table columns that contain values with large variations in number of duplicates, called skewed data, you should collect histograms.
The resulting statistics provide the CBO with information about data uniqueness and distribution. Using this information, the CBO is able to compute plan costs with a high degree of accuracy. This enables the CBO to choose the best execution plan based on the least cost.

How the CBO Optimizes SQL Statements for Fast Response


The CBO can optimize a SQL statement for fast response when the parameter OPTIMIZER_MODE is set to FIRST_ROWS_n, where n is 1, 10, 100, or 1000, or FIRST_ROWS. A hint FIRST_ROWS(n), where n is any positive integer, or FIRST_ROWS can be used to optimize an individual SQL statement for fast response.
Fast-response optimization is suitable for online users, such as those using Oracle Forms or Web access. Typically, online users are interested in seeing the first few rows and seldom look at the entire query result, especially when the result size is large. For such users, it makes sense to optimize the query to produce the first few rows as quickly as possible, even if the time to produce the entire query result is not minimized.
With fast-response optimization, the CBO generates a plan with the lowest cost to produce the first row or the first few rows. The CBO employs two different fast-response optimizations, referred to here as the old and new methods. The old method is used with the FIRST_ROWS hint or parameter value. With the old method, the CBO uses a mixture of costs and rules to produce a plan. It is retained for backward compatibility reasons.
The new method is totally based on costs, and it is sensitive to the value of n. With small values of n, the CBO tends to generate plans that consist of nested loop joins with index lookups. With large values of n, the CBO tends to generate plans that consist of hash joins and full table scans.
The value of n should be chosen based on the online user requirement and depends specifically on how the result is displayed to the user. Generally, Oracle Forms users see the result one row at a time and they are typically interested in seeing the first few screens. Other online users see the result one group of rows at a time.
With the fast-response method, the CBO explores different plans and computes the cost to produce the first n rows for each. It picks the plan that produces the first n rows at lowest cost. Remember that with fast-response optimization, a plan that produces the first n rows at lowest cost might not be the optimal plan to produce the entire result. If the requirement is to obtain the entire result of a query, then fast-response optimization should not be used. Instead use the ALL_ROWS parameter value or hint.

No comments: