Consider the following example of two tables processed by the Version 6 optimizer:
Table TAB1 has 16,384 rows
Table TAB2 has 1 row.
Suppose you select TAB2 as the driving table (by specifying it second in the FROM clause):
SELECT COUNT (*) FROM TAB1, TAB2 0.96 seconds elapsed
count phys cr cur rows
------ ----- ----- ----- ------
Parse 1 0 0 0
Execute 1 95 100 4 0
Fetch 1 0 0 0 1
Now suppose that you select TAB1 as the driving table:
SELECT COUNT (*) FROM TAB2, TAB1 26.09 seconds elapsed
count phys cr cur rows
------ ----- ----- ----- ------
Parse 1 0 0 0
Execute 1 95 49247 32770 0
Fetch 1 0 0 0 1
You can see that specifying the correct driving table makes a huge difference in performance (0.96 versus 26.09 seconds). What's going on? When Oracle processes multiple tables, it must join the two tables. There are four different join methods in the Oracle database, the Nested Loop Join, the Sort-Merge Join, the Hash Join, and the Cartesian Join. In our case, we performed a Cartesian Join. A Cartesian Join will join every row of the driving table with every row of the second table. You can identify a Cartesian Join by the absence of any join condition in the WHERE clause. If there was a join condition specified, then the other three join operations could have been considered.
A Sort-Merge Join has a sort phase and a merge phase, hence its name. First, it scans and sorts the first (driving) table (the one specified second in the FROM clause). Next, it scans the second table (the one specified first in the FROM clause) and merges all of the rows retrieved from the second table with those retrieved from the first table.
The Nested Loop Join reads every row in the driving table and then finds matches in the second table. If there is no index on the join column, a full table scan is performed on the second table for every row in the driving table. If an index exists on the join column, the rows in the second table can be accessed much more quickly.
The Hash Join method uses a hashing algorithm (hence its name) to divide the driving table's rows into multiple buckets. The second table is hashed with the same algorithm. If the second table will be joined with a row from the first table, it will be found in that small bucket. The Hash Join method can be a quick join operation due to the fact that the work is broken down into tiny pieces without having to resort to expensive sort operations or multiple scans of the same table.
The 95 physical reads shown on the Execute line in the example above is the minimum number of database "block reads" needed to read all rows from the two tables. The large overhead differences arise from the ensuing number of consistent buffer reads ("cr") and current mode buffer reads ("cur").
If TAB1 is processed first, then all 16,384 rows (from TAB1) must be read first. Using the Sort-Merge Join, these records are sorted, but cannot fit into the area of memory allocated for each user's sort space (specified in the initilialization parameter SORT_AREA_SIZE or the PGA_AGGREGATE_TARGET). Consequently, these records are sorted in small runs with the data stored in temporary segments on disk. After all runs are completed, Oracle merges your data to produce the sorted data. The single piece of TAB2 data is read and joined with the TAB1 data.
By comparison, if TAB2 is processed first, only one row needs to be read (from TAB2), and this row can be sorted within the sort area. The TAB1 table is then joined with the TAB2 table.
Now consider an example in which the number of rows in TAB2 is slightly greater-four instead of one. The performance differential is not quite as extreme in this case.
Table TAB1 has 16,384 rows.
Table TAB2 has 4 rows.
SELECT COUNT (*) FROM TAB1, TAB2 4.00 seconds elapsed
count phys cr cur rows
------ ----- ----- ----- ------
Parse 1 0 0 0
Execute 1 384 386 10 0
Fetch 1 0 0 0 1
SELECT COUNT (*) FROM TAB2, TAB1 37.32 seconds elapsed
count phys cr cur rows
------ ----- ----- ----- ------
Parse 1 0 0 0
Execute 1 95 49247 32770 0
Fetch 1 0 0 0 1
Note that the driving table is the second table in the FROM clause of the SQL statement. This convention applies to the Rule Based Optimizer only. The Cost Based Optimizer (CBO) does not take the order of the tables in the FROM clause into account. Instead, it uses statistics on both tables in order to determine the optimal driving table and optimal join method. In this respect, the CBO relieves the DBA or application developer from the duty of determining the most favorable driving table.
Finally, the RBO has been desupported in Oracle 10g. In the future, the only optimizer will be the CBO. DBA's should be moving away from the RBO and converting to the CBO.
0 comments:
Post a Comment