Determining the Order of Joins | Join Planning/Optimization | Teradata Vantage - Determining the Order of Joins - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
ykx1561500561173.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantage™

While it is possible to select an optimum join order when a small number of relations is to be joined, the exponentially escalating choices of binary join orders available to the Optimizer rapidly reach a point at which various algorithms and heuristics must replace the brute force method of evaluating all possible combinations against one another to determine the lowest cost join plan.

Join order evaluation is a critical aspect of query optimization that is highly dependent on accurate statistics. Using out of date statistical and demographic data can result in inaccurate assumptions about the cardinalities of intermediate results. Such inaccurate assumptions cause errors in join plan estimation that propagate exponentially as a function of the number of joins a query makes.

Join Order Evaluation Process Overview

The following provides a simplified explanation of the process the Optimizer follows when it evaluates join orders in the process of generating a join plan.

  1. Generate a 1-join lookahead plan.
  2. IF the following conditions are found … THEN …
    • The cost of the 1-join lookahead plan exceeds a threshold value.
    • There exists one relation larger than a threshold value.
    • The query has no outer join requirement.
    Generate a 5-join lookahead plan and evaluate its cost. Keep the less-costly plan from the 1- and 5-join lookahead plans.
  3. IF the following conditions are found … THEN …
    • The cost of the current plan exceeds a threshold value.
    • A star join might be required.
    Generate a plan that uses star/snowflake join optimization, if such a plan can be found. If it is, compare the star/snowflake and lookahead join plans, and keep the less-costly plan.

Lookahead Join Planning

When it is planning an optimal join sequence for a query, the Optimizer uses a "what if?" method to evaluate the possible sequences for their relative usefulness to the query plan. This "what if?" method is commonly referred to as lookahead in join optimization because the evaluation looks forward one or more plans to examine the ultimate cost of joining relations in various orders.

The goal of any join order analysis is to reduce the number of join order possibilities to a workable subset. The Optimizer search of join space is driven by join conditions. To this end, it uses a recursive greedy algorithm to search and evaluate connected relations. In this context, the term greedy means the search generates the next logical expression for evaluation based on the result of the costing determined by the previous step. From the perspective of search trees, a greedy algorithm always begins at the bottom of the tree and works its way toward the top.

The following is a simplified explanation of lookahead join planning:

  1. Find the best 3-way or n-way join plans among the combinations considered.
    Follow these rules to find them:
    • Use a depth-first search.
    • Skip a join plan if any one of the following conditions is detected:
      • A less costly plan exists that delivers a result with similar or better attributes.
      • The accumulated cost exceeds that of the current candidate join plan.
    • Only consider joins between connected relations.
    • Join an unconnected relation only after all connected relations have been joined.

    The Optimizer does not evaluate all possible combinations of relations because the effort would exceed any optimizations realized from it. For example, a 10-way join has 17.6 x 109 possible combinations, and a 64-way join has 1.2 x 10124 possible combinations.

    In its pursuit of combinations that are driven mainly by join conditions, the Optimizer might overlook the best possible join plan, but in practice, that generally does not happen.

  2. Evaluate the best current join plan.
    IF the … THEN …
    following conditions are found for the first 2 joins of the best join plan.
    • The joins involve one relation connected with 2 relations not connected to each other
    • Neither join is a product join
    try a compromise join where the 2 unconnected relations are product-joined in the first join and that result is then joined with a third relation.
    compromise join plan is less costly than the best current join plan replace the first 2 joins on the best join plan with the compromise plan.

    The compromise plan is designed to enable a star join plan.

  3. Materialize the first join of the best n-way join into a relation and evaluate the materialized relation for the following conditions:
    • The number of lookaheads is sufficient to generate the complete join plan.
    • The compromise join plan is not used.
    IF the conditions are … THEN the …
    met remaining joins of the best plan are also committed.
    not met second join is also committed if it joins another relation with the result of the first join via the primary index for the newly joined relation.
  4. Generate connection information for new relations based on the predicates.
  5. Iterate Stages 1 through 4 until only one active relation remains.

Example: Compromise Join Plan

Consider the following request.

SELECT *
FROM t1, t2, t3, t4
WHERE x1=x2
AND   y1=y3
AND   z1=y4 ;

The first join diagram in the following graphic illustrates the connections among the relations in this query.

The explicit connections are these.
  • t1 is connected to t2 because of the condition x1=x2
  • t1 is connected to t3 because of the condition y1=y3
  • t1 is connected to t4 because of the condition z1=y4

We know from step 2 that when the following conditions are true, the Optimizer tries to make a compromise join where the two unconnected relations are product-joined and the result is joined with the third relation:

  • The first two joins of the best plan involve a relation connected with two relations not connected to one another
  • Neither join is a product join.

The second join diagram in the graphic indicates that the join of t1 with t2 followed by a join of the resulting relation t1t2 with t3 is not as good as the compromise join of the unconnected relations t2 with t3 followed by a join of the resulting relation t2t3 with t1.

Because of this, the compromise join is selected for the join plan, as indicated by the third join diagram in the graphic.



Example: One-Join Lookahead Processing of n-Way Joins

The following graphic illustrates the result of an Optimizer analysis of the join space for a 6-way join. The relative cost of each join is given by an integer within the selected join sequence.



These relationships are explicitly stated in the following table:

This binary join … Has this relative cost … And produces this relation as a result …
A - B                     1 AB
AB - C                     4 ABC
D - F                     1 DF
DF - E                     2 DFE
ABC - DFE                     5 ABCDEF

The following graphic illustrates one-join lookahead processing of an n-way join and how the Optimizer might select one join sequence over another based on their relative costs:



The cost relationships among the relations are explicitly stated in the following table:

This binary join … Has this relative cost … And produces this relation as a result …
D - E                     1 DE
D - F                     2 DF
DE - F                     5 DEF
DF - E                     2 DEF
ABC - DFE                     5 ABCDEF

The Optimizer sums the individual binary join costs to produce an overall join cost which it then evaluates to determine the least costly join sequence. For this example, the following join sequence costs are determined:

((DE)F) = 1 + 5 = 6
((DF)E) = 2 + 2 = 4

As a result, the second sequence, joining relations D and F first following by joining the resulting relation DF with relation E, is selected for the join plan because it is the less costly join.