While it is possible to select an optimum join order when a small number of relations is to be joined, the exponentially escalating choices of binary join orders available to the Optimizer rapidly reach a point at which various algorithms and heuristics must replace the brute force method of evaluating all possible combinations against one another to determine the lowest cost join plan.
Join order evaluation is another critical aspect of query optimization that is highly dependent on accurate statistics. Using out of date statistical and democraphic data can result in inaccurate assumptions about the cardinalities of intermetidate results. Such inaccurate assumptions cause errors in join plan estimation that propagate exponentially as a function of the number of joins a query makes.
Note that the enumeration of join orders is a separate process from the costing of alternatives.
Join Order Evaluation Process Overview
The following table provides a logical explanation of the process the Optimizer follows when it evaluates join orders in the process of generating a join plan.
- Use a recursive greedy algorithm with a 1-join lookahead to evaluate the existing join conditions.
IF the following conditions are found … THEN …
- The cost of the 1-join lookahead plan exceeds a threshold value.
- There exists one relation larger than the threshold.
- The query has no outer join requirement.
Generate a 5-join lookahead plan and evaluate its cost.
- Keep the less costly plan generated.
- Evaluate the new existing join conditions.
IF the following conditions are found … THEN …
- The cost of the current plan exceeds some threshold.
- A star join might be required.
Generate a better plan that uses star join optimization if such a plan can be found.
Otherwise, go to Stage 4.
- Find the best 3-way or n-way join plans among the combinations considered.Follow these rules to find them:
- Use a depth-first search.
- Skip a join plan if any one of the following conditions is detected:
- A less costly plan exists that delivers a result with similar or better attributes.
- The accumulated cost exceeds that of the current candidate join plan.
- Only consider joins between connected relations.
- Join an unconnected relation only after all connected relations have been joined.
The Optimizer does not evaluate all possible combinations of relations because the effort would exceed any optimizations realized from it. For example, a 10-way join has 17.6 x 109 possible combinations, and a 64-way join has 1.2 x 10124 possible combinations.
In its pursuit of combinations that are driven mainly by join conditions, the Optimizer might overlook the best possible join plan, but as a general rule, that does not happen.
- Create a temporary join relation before doing the next lookahead.
IF a unique ID is … THEN do the following … not used
- Build the assignment list for the temporary relation.
- Use it to estimate the spool size and output row cost.
- Map the term information of the residual term list to refer to the new relation.
When this temporary relation is removed, the assignment list is reused and the term list is remapped to refer to the inputs.
used Use existing map list information to estimate the spool size and output row cost.
- Evaluate the best current join plan.
IF the … THEN … following conditions are found for the first 2 joins of the best join plan.
- The joins involve one relation connected with 2 relations not connected to each other
- Neither join is a product join
try a compromise join where the 2 unconnected relations are product-joined in the first join and that result is then joined with a third relation. compromise join plan is less costly than the best current join plan replace the first 2 joins on the best join plan with the compromise plan.
The compromise plan is designed to enable a star join plan.
A compromise join is also considered during binary join planning when one or more IN condition are specified on one of the relations in the join. Call this table_a.
The list of values associated with the set of IN list conditions is product joined with the other relation, table_b, and that result is then joined with table_a. If table_b happens to be the result of joining two unconnected relations, then the resultant join plan is a star join plan similar to what is described in Star and Snowflake Join Optimization.
Consider the following schema and query, for example:
Table Name Table Type Relevant Index Columns Index Type daily_sales Fact sku_id
NUSI corp_day Dimension day_dt UPI locn Dimension locn_nmbr UPI alloc_sku Dimension sku_id UPI
Now consider how the Optimizer treats the following query against these tables:
SELECT SUM(sell_amt) FROM daily_sales AS a, locn AS l WHERE a.locn_nmbr = l.locn_nbr AND l.dvsn_code = ‘C’ AND a.day_dt IN (990101, 1010101, 971201);
During binary join planning for the daily_sales and locn tables, the Optimizer considers a compromise join by first using a product join to join locn with the list of day_dt values specified by the IN list condition, and then joining that result with daily_sales.
- Materialize the first join of the best N joins into a relation and evaluate the materialized relation for the following conditions:
- The number of lookaheads is sufficient to generate the complete join plan.
- The compromise join plan is not used.
IF the conditions are … THEN the … met remaining joins of the best plan are also committed. not met second join is also committed if it joins another relation with the result of the first join via the primary index for the newly joined relation.
- Generate connection information for new relations based on the predicates.
- Iterate Stages 1 through 8 until only one active relation remains.
For example, consider the following request.
SELECT * FROM t1, t2, t3, t4 WHERE x1=x2 AND y1=y3 AND z1=y4 ;
The first join diagram in the following graphic illustrates the connections among the relations in this query.
The explicit connections are these.
- t1 is connected to t2 because of the condition x1=x2
- t1 is connected to t3 because of the condition y1=y3
- t1 is connected to t4 because of the condition z1=y4
We know from step 6 that when the following conditions are true, the Optimizer tries to make a compromise join where the two unconnected relations are product-joined and the result is joined with the third relation:
- The first two joins of the best plan involve a relation connected with two relations not connected to one another
- Neither join is a product join.
The second join diagram in the graphic indicates that the join of t1 with t2 followed by a join of the resulting relation t1t2 with t3 is not as good as the compromise join of the unconnected relations t2 with t3 followed by a join of the resulting relation t2t3 with t1.
Because of this, the compromise join is selected for the join plan, as indicated by the third join diagram in the graphic.