EXPLAIN Confidence Levels | Interpreting EXPLAIN Output | Teradata Vantage - EXPLAIN Confidence Levels - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-28
dita:mapPath
uqf1592445067244.ditamap
dita:ditavalPath
uqf1592445067244.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantageā„¢

When the Optimizer estimates relation and join cardinalities, it does so with a level of confidence in the accuracy of the estimate. Optimizer confidence levels express a qualitative level of confidence that a given cardinality estimate is accurate given certain knowledge about the available statistics for the tables being analyzed in the process of optimizing a query.

Confidence levels are one of the factors the Optimizer employs to determine which of its available strategies is best at each step of creating a query plan. The lower the confidence level, the more conservative the strategy employed, particularly for join planning, because the errors in a query plan are cumulative (and in the case of join planning, errors are multiplicative). Because of this, the Optimizer chooses to pursue a less aggressive query plan, particularly when it comes to join planning, whenever it suspects the accuracy of the data it is using to plan a query step is not high, or is not as reliable as it might be if complete and accurate statistics were available.

A cardinality estimate that is based on stale statistics can be inaccurate, causing the Optimizer to generate a less optimal query plan than it otherwise would. It is even possible for the partial statistics collected from a dynamic single-AMP sample to produce a better plan than complete, but stale, statistics, depending on how poorly the stale statistics reflect the demographics of the current set of values making up the population of a column set or index.

About Optimizer Confidence Levels

An EXPLAIN can report any or all of the following confidence levels for a cardinality estimate:
  • No confidence
  • Low confidence
  • High confidence
  • Index Join confidence

    Vantage reports Index Join confidence only for join operations.

These confidence levels are based heavily on the presence or absence of statistics for the column and index sets specified as predicates in the SQL request being reported. The only exception to this is the case where the query conditions are so complex that statistics cannot be used. In such cases, an EXPLAIN reports no confidence.

Even when a join operation has no confidence, the Optimizer still uses any statistics that are available for the condition to enhance the likelihood of producing a better query plan than would otherwise be developed. A join operation where there is no confidence on one of the relations, but low, index join, or high on the other, has an overall confidence level of no confidence, even though there are statistics on the join columns of one of the relations. This is because the Optimizer always assumes a confidence level that is equal to the lower confidence level assigned to one of the relations in the join.

Understand that no confidence by itself does not indicate a plan is bad. Even with no confidence in some steps, the Optimizer can usually generate a reasonable plan, especially if you have collected appropriate statistics. If a step only references base tables, it can mean that you should consider collecting statistics on these tables.

Similarly, a high confidence level is not a guarantee of an accurate cardinality estimate. For example, suppose the Optimizer locates a query predicate value in one of the statistical histograms for a column or index set. In this case, confidence is assumed to be High. But suppose the available statistics are stale. The Optimizer, by pursuing the assumptions that accrue to a High confidence level, can then produce a bad plan as a result.

The following sections are meant to provide an overview of confidence levels only. They are in no way meant to be comprehensive, and do not take into account any special cases.

In general, confidence levels are assigned to the cardinality estimates for only two types of operations:
  • Single-table retrievals
  • Joins

Note that the confidence levels the Optimizer reports and the confidence levels recorded in the Level attribute of the QCD StatsRec table are not related in any way.

Confidence Levels For Single-Table Retrieval Operations

The following table lists the meaning of each confidence level in context and some of the reasons why each confidence level is assigned for single-table retrieval operations:

Confidence Level Meaning Reason
No Vantage has neither Low nor High confidence in the cardinality and distinct value estimates for the relation.

The Optimizer pursues conservative strategies to optimize the relevant steps.

Any of the following situations exists:
  • No statistics have been collected on the column or index sets specified in the predicate.
  • The predicate contains nondeterministic expressions for which base table statistics cannot be collected.

    For example, you cannot collect base table statistics for either of the following expressions:

    • udf_random(col1) (UDF declared as nondeterministic)
    • random(col1)

    You can collect statistics on a single-table join index or hash index defined using a complex expression (see Using Join Index Statistics to Estimate Single-Table Expression Cardinalities), and the optimizer can use those statistics to make single-table cardinality estimates. If you have collected such statistics, then a cardinality estimate can be made with high confidence.

  • For an aggregate estimation, no statistics have been collected on the grouping columns.
Low Vantage is moderately certain that the estimated cardinality and distinct value estimates for the relation are accurate.

The Optimizer pursues more aggressive strategies to optimize the relevant steps.

One of the following states exists for the relation:
  • There are conditions in the query on an index set for which no statistics have been collected.

    Cardinality estimates can be made based on sampling the index set.

  • There are conditions in the query on an index or column set with collected statistics that are ANDed with conditions on unindexed columns.
  • There are conditions in the query on an index or column set with collected statistics that are ORed with other conditions.
  • For an aggregate estimation, there are statistics on single columns of the grouping column set or statistics on some, but not all, of the grouping columns.

The confidence for single-AMP dynamic AMP statistical samples is always Low.

EXPLAIN reports always express No confidence in estimates where no statistics have been collected, but Vantage always samples dynamically from at least one AMP in such cases when the query is actually executed.

High Vantage is fairly certain that the estimated cardinality and distinct value estimates for the relation are accurate.

The Optimizer pursues more aggressive strategies to optimize the relevant steps.

  • Retrieval from a single relation with no predicates:
    • There are conditions in the query on the primary index and statistics have been collected on the primary index.
    • There are conditions in the query on the primary index, but no statistics have been collected on the primary index.

      The confidence is High under any of the following situations:

      • 5 or more AMPs are sampled dynamically.
      • Rows per value are sampled using a dynamic AMP sample.
      • No skew is detected.
  • Retrieval from a single relation with predicates:
    • Statistics have been collected on the predicate columns or indexes and there is no skew.
    • There are multiple equality predicates on which covering multitable statistics have been collected.
    • Single-table join index or hash index statistics have been collected on a complex expression specified in a predicate written against the underlying base table of the hash or single-table join index.
  • Retrieval from a single temporal relation with predicates and a unique join index that can be used as an access path:
    • Statistics have been collected on the join index that match the query predicate columns.
  • For an aggregate estimation, the confidence is high under any of the following situations:
    • The grouping columns are constants.
    • The grouping columns have equality predicates.
    • The grouping columns are all covered by a set of single multicolumn statistics.
    • Statistics have been collected on the single grouping column.
Index Join Not applicable Applies only to joins.

For a retrieval operation from a spool, the confidence level is the same as the confidence level for the step that generated the spool.

Confidence Levels For Join Operations

In the case of join operations, the Optimizer needs to know approximately how many rows will result from each step in each join operation required to perform an SQL request. It uses this information to select an optimal plan for joining the relations. Among other things, such as join method and join geography, the join plan determines the best order for joining the relations. Because you can join as many as 128 tables and single-table views per join clause, it is critical to minimize join cardinality estimation errors to ensure that an optimal query plan is generated.

Keep in mind that errors in join processing are cumulative, so it is critical to minimize the possibilities for errors to occur in join planning. The only way to ensure optimal join processing is to keep fresh statistics on all your indexes and non-index join columns.

Join cardinality and rows per value estimates nearly always have a lower confidence level than is seen for a single table retrieval under analogous conditions for several reasons, including the following:
  • Join cardinality estimates only rate High confidence when there is only a single row in both the left and right relation in the join.
  • The confidence level for a join operation never exceeds that of its input relations and assumes the confidence for the relation having the lower confidence.

The following table lists the meaning of each confidence level in context and some of the reasons why each confidence level is assigned for join operations. All relational joins are binary operations: no more than two relations are ever joined in a single operation. Instead, joins on multiple relations are chained together such that the result of an earlier join operation is spooled and then joined to the next relation in the sequence the Optimizer determines for its join plan.

Confidence Level Meaning Reason
No Vantage has neither Low nor High nor Index Join confidence in the estimated join cardinality. One or both of the relations in the join does not have statistics on the join columns.
Low Vantage is moderately certain that the estimated join cardinality is accurate.
  • Statistics have been collected on the join columns of both the left and right relations.
  • One relation in the join has Low confidence and the other has any of the following confidence levels:
    • Low
    • High
    • Index Join
High Vantage is fairly certain that the estimated join cardinality is accurate. One relation in the join has High confidence and the other has either of the following confidence levels:
  • High
  • Index Join
Index Join Vantage is fairly certain that the estimated join cardinality is accurate because of a uniqueness constraint on the join columns.
  • There is a unique index on the join columns.
  • There is a foreign key relationship between the two relations in the join.

Because of the way Vantage implements PRIMARY KEY and UNIQUE INDEX constraints, these are essentially two ways of saying the same thing.

Effect of Dynamic AMP Sampling On Reported Confidence Levels

The analysis of skew is based on the distribution of rows from each of the AMPs, and is a contributing factor to the confidence level expressed by the Optimizer.

Skew analysis computes the expected number of rows per AMP, and if that number is less than 5 percent of the expected number of rows per AMP, the AMP is moved to the skewed AMP list. If the total number of AMPs in the skewed list is less than or equal to 5 percent of the total number of AMPs sampled, then the confidence level is set to low, otherwise it is set to high.

When statistics are sampled from only one dynamically selected AMP, the confidence level is always set to low.