While possible to select an optimum join order when a small number of relations is to be joined, the exponentially escalating choices of binary join orders available to the Optimizer rapidly reach a point where algorithms and heuristics must replace the brute force method of evaluating all possible combinations to determine the lowest cost join plan.
Join order evaluation is a critical aspect of query optimization that is highly dependent on accurate statistics. Using outdated statistical and demographic data can cause inaccurate assumptions about the cardinalities of intermediate results. Such inaccurate assumptions cause errors in join plan estimation that propagate exponentially as a function of the number of joins a query makes.
Join Order Evaluation Process Overview
The following provides a simplified explanation of the process the Optimizer follows when evaluating join orders in the process of generating a join plan.
- Generate a 1-join lookahead plan.
-
Conditions Action - The cost of the 1-join lookahead plan exceeds a threshold value.
- There exists one relation larger than a threshold value.
- The query has no outer join requirement.
Generate a 5-join lookahead plan and evaluate its cost. Keep the less-costly plan from the 1- and 5-join lookahead plans. -
Conditions Action - The cost of the current plan exceeds a threshold value.
- A star join may be required.
Generate a plan that uses star/snowflake join optimization, if such a plan can be found. If so, compare the star/snowflake and lookahead join plans, and keep the less costly plan.
Lookahead Join Planning
When planning an optimal join sequence for a query, the Optimizer uses a "what if?" method to evaluate the possible sequences for their relative usefulness to the query plan. This "what if?" method is commonly called lookahead in join optimization because the evaluation looks forward one or more plans to examine the ultimate cost of joining relations in orders.
The goal of a join order analysis is to reduce the number of join order possibilities to a workable subset. The Optimizer search of join space is driven by join conditions. To this end, the Optimizer uses a recursive greedy algorithm to search and evaluate connected relations. In this context, the term greedy means the search generates the next logical expression for evaluation based on the result of the costing determined by the previous step. From the perspective of search trees, a greedy algorithm begins at the bottom of the tree and works its way toward the top.
The following is a simplified explanation of lookahead join planning:
- Find the best 3-way or n-way join plans among the combinations considered.Follow these rules to find them:
- Use a depth-first search.
- Skip a join plan if any one of the following conditions is detected:
- A less costly plan exists that delivers a result with similar or better attributes.
- The accumulated cost exceeds that of the current candidate join plan.
- Only consider joins between connected relations.
- Join an unconnected relation only after all connected relations have been joined.
The Optimizer does not evaluate all possible combinations of relations because the effort exceeds any possible optimizations. For example, a 10-way join has 17.6 x 109 possible combinations, and a 64-way join has 1.2 x 10124 possible combinations.
In its pursuit of combinations that are driven mainly by join conditions, the Optimizer may overlook the best possible join plan, but in practice, that rarely happens.
- Evaluate the best current join plan.
Conditions Action Following conditions are found for the first 2 joins of the best join plan. - The joins involve one relation connected with 2 relations not connected to each other
- Neither join is a product join
Try a compromise join where the 2 unconnected relations are product-joined in the first join and that result is then joined with a third relation. Compromise join plan is less costly than the best current join plan Replace the first 2 joins on the best join plan with the compromise plan. The compromise plan is designed to enable a star join plan.
- Materialize the first join of the best n-way join into a relation and evaluate the materialized relation for the following conditions:
- The number of lookaheads is sufficient to generate the complete join plan.
- The compromise join plan is not used.
Conditions Action Met Remaining joins of the best plan are also committed. Not met If second join joins another relation with result of first join using primary index for newly joined relation, second join is also committed. - Generate connection information for new relations based on the predicates.
- Iterate Stages 1 through 4 until only one active relation remains.
Example: Compromise Join Plan
Consider the following request.
SELECT * FROM t1, t2, t3, t4 WHERE x1=x2 AND y1=y3 AND z1=y4 ;
The first join diagram in the following graphic shows the connections among the relations in this query.
- t1 is connected to t2 because of the condition x1=x2
- t1 is connected to t3 because of the condition y1=y3
- t1 is connected to t4 because of the condition z1=y4
You know from step 2 that when the following conditions are true, the Optimizer tries to make a compromise join where the two unconnected relations are product-joined and the result is joined with the third relation:
- The first two joins of the best plan involve a relation connected with two relations not connected.
- Neither join is a product join.
The second join diagram in the graphic indicates that the join of t1 with t2 followed by a join of the resulting relation t1t2 with t3 is not as good as the compromise join of the unconnected relations t2 with t3 followed by a join of the resulting relation t2t3 with t1.
Therefore, the compromise join is selected for the join plan, as indicated by the third join diagram in the graphic.
Example: One-Join Lookahead Processing of n-Way Joins
The following graphic shows the result of an Optimizer analysis of the join space for a 6-way join. The relative cost of each join is given by an integer within the selected join sequence.
These relationships are explicitly stated in the following table:
| Binary Join | Relative Cost | Resulting Relation |
|---|---|---|
| A - B | 1 | AB |
| AB - C | 4 | ABC |
| D - F | 1 | DF |
| DF - E | 2 | DFE |
| ABC - DFE | 5 | ABCDEF |
The following graphic shows one-join lookahead processing of an n-way join and how the Optimizer may select one join sequence over another based on their relative costs:
The cost relationships among the relations are explicitly stated in the following table:
| Binary Join | Relative Cost | Resulting Relation |
|---|---|---|
| D - E | 1 | DE |
| D - F | 2 | DF |
| DE - F | 5 | DEF |
| DF - E | 2 | DEF |
| ABC - DFE | 5 | ABCDEF |
The Optimizer sums the individual binary join costs to produce an overall join cost to evaluate to determine the least costly join sequence. For this example, the following join sequence costs are determined:
((DE)F) = 1 + 5 = 6 ((DF)E) = 2 + 2 = 4
Therefore, the second sequence, joining relations D and F first following by joining the resulting relation DF with relation E, is selected for the join plan as the less costly join.