Because of the exponentially escalating choices of binary join orders available to the Optimizer, algorithms and heuristics must replace the brute force method of evaluating all possible combinations against one another to determine the lowest cost join plan.
Join order evaluation is a critical aspect of query optimization that is highly dependent on accurate statistics. Using out of date statistical and demographic data can result in inaccurate assumptions about the cardinalities of intermediate results. Such inaccurate assumptions cause errors in join plan estimation that propagate exponentially as a function of the number of joins a query makes.
Join Order Evaluation Process Overview
The following provides a simplified explanation of the process the Optimizer follows when evaluating join orders in the process of generating a join plan.
- Generate a 1-join lookahead plan.
-
Conditions Optimizer Action - The cost of the 1-join lookahead plan exceeds a threshold value.
- There exists one relation larger than a threshold value.
- The query has no outer join requirement.
Generate a 5-join lookahead plan and evaluate its cost. Keep the less-costly plan from the 1- and 5-join lookahead plans. -
Conditions Optimizer Action - The cost of the current plan exceeds a threshold value.
- A star join may be required.
Generate a plan that uses star/snowflake join optimization, if such a plan can be found, compare the star/snowflake and lookahead join plans, and keep the less-costly plan.
Lookahead Join Planning
When planning an optimal join sequence for a query, the Optimizer uses a "what if?" method to evaluate the possible sequences for their relative usefulness to the query plan. This "what if?" method is called lookahead, because the evaluation looks forward one or more plans to examine the ultimate cost of joining relations in different orders.
The goal of join order analysis is to reduce the number of join order possibilities to a workable subset. The Optimizer search of join space is driven by join conditions, and uses a recursive greedy algorithm to search and evaluate connected relations. In this context, the term greedy means the search generates the next logical expression for evaluation based on the result of the costing determined by the previous step. From the perspective of search trees, a greedy algorithm always begins at the bottom of the tree and works its way toward the top.
The following is a simplified explanation of lookahead join planning:
- Find the best 3-way or n-way join plans among the combinations considered.Follow these rules to find them:
- Use a depth-first search.
- Skip a join plan if any one of the following conditions is detected:
- A less costly plan exists that delivers a result with similar or better attributes.
- The accumulated cost exceeds that of the current candidate join plan.
- Only consider joins between connected relations.
- Join an unconnected relation only after all connected relations have been joined.
The Optimizer does not evaluate all possible combinations of relations, because the effort exceeds any realized optimizations. For example, a 10-way join has 17.6 x 109 possible combinations, and a 64-way join has 1.2 x 10124 possible combinations.
In pursuit of combinations driven mainly by join conditions, the Optimizer may overlook the best possible join plan, but in practice, that generally does not happen.
- Evaluate the best current join plan.
Conditions Optimizer Action For the first 2 joins of the best join plan: - The joins involve one relation connected with 2 relations not connected to each other
- Neither join is a product join
Try a compromise join where the 2 unconnected relations are product-joined in the first join and that result is then joined with a third relation. Compromise join plan is less costly than the best current join plan. Replace the first 2 joins on the best join plan with the compromise plan. The compromise plan is designed to enable a star join plan.
- Materialize the first join of the best n-way join into a relation and evaluate the materialized relation for the following conditions:
- The number of lookaheads is sufficient to generate the complete join plan.
- The compromise join plan is not used.
Conditions Optimizer Action Met Remaining joins of the best plan are also committed. Not met Second join is also committed if joining another relation with the result of the first join using the primary index for the newly joined relation. - Generate connection information for new relations based on the predicates.
- Iterate Stages 1 through 4 until only one active relation remains.
Example: Compromise Join Plan
Consider the following request.
SELECT * FROM t1, t2, t3, t4 WHERE x1=x2 AND y1=y3 AND z1=y4 ;
The first join diagram in the following graphic illustrates the connections among the relations in this query.
- t1 is connected to t2 because of the condition x1=x2
- t1 is connected to t3 because of the condition y1=y3
- t1 is connected to t4 because of the condition z1=y4
Step 2 shows that when the following conditions are true, the Optimizer tries to make a compromise join where the two unconnected relations are product-joined and the result is joined with the third relation:
- The first two joins of the best plan involve a relation connected with two relations not connected to one another
- Neither join is a product join.
The second join diagram in the graphic indicates that the join of t1 with t2 followed by a join of the resulting relation t1t2 with t3 is not as good as the compromise join of the unconnected relations t2 with t3 followed by a join of the resulting relation t2t3 with t1.
Because of this, the compromise join is selected for the join plan, as indicated by the third join diagram in the graphic.
Example: One-Join Lookahead Processing of n-Way Joins
The following graphic illustrates the result of an Optimizer analysis of the join space for a 6-way join. The relative cost of each join is given by an integer within the selected join sequence.
These relationships are explicitly stated in the following table:
Binary Join | Relative Cost | Resulting Relation |
---|---|---|
A - B | 1 | AB |
AB - C | 4 | ABC |
D - F | 1 | DF |
DF - E | 2 | DFE |
ABC - DFE | 5 | ABCDEF |
The following graphic illustrates one-join lookahead processing of an n-way join and how the Optimizer mat select one join sequence over another based on their relative costs:
The cost relationships among the relations are explicitly stated in the following table:
Binary Join | Relative Cost | Resulting Relation |
---|---|---|
D - E | 1 | DE |
D - F | 2 | DF |
DE - F | 5 | DEF |
DF - E | 2 | DEF |
ABC - DFE | 5 | ABCDEF |
The Optimizer sums the individual binary join costs to produce an overall join cost, and evaluates that cost determine the least costly join sequence. For this example, the following join sequence costs are determined:
((DE)F) = 1 + 5 = 6 ((DF)E) = 2 + 2 = 4
As a result, the second sequence, joining relations D and F first following by joining the resulting relation DF with relation E, is less costly and is therefore selected for the join plan.