The Optimizer estimates relation and join cardinalities with a level of confidence in the accuracy of the estimate. Optimizer confidence levels express a qualitative level of confidence that a given cardinality estimate is accurate given certain knowledge about the available statistics for the tables being analyzed in the process of optimizing a query.
Confidence levels are one factor the Optimizer uses to determine the best strategy at each step of creating a query plan. The lower the confidence level, the more conservative the chosen strategy, especially for join planning. Errors in a query plan are cumulative, and for join planning, errors are multiplicative. Therefore, the Optimizer chooses a less aggressive query plan, especially for join planning, whenever the accuracy of the data used to plan a query step is not high, or is not as reliable as with complete and accurate statistics.
A cardinality estimate based on stale statistics can be inaccurate, causing the Optimizer to generate a less optimal query plan.Even the partial statistics collected from a dynamic single-AMP sample may produce a better plan than complete but stale statistics, depending on how poorly the stale statistics reflect the demographics of the current set of values making up the population of a column set or index.
Optimizer Confidence Levels
- No confidence
- Low confidence
- High confidence
- Index Join confidence
Vantage reports Index Join confidence only for join operations.
These confidence levels are based heavily on the presence or absence of statistics for the column and index sets specified as predicates in the SQL request being reported. The only exception to this is the case where the query conditions are so complex that statistics cannot be used. In such cases, an EXPLAIN reports no confidence.
Even when a join operation has no confidence, the Optimizer uses any statistics available for the condition to enhance the likelihood of producing a better query plan. A join operation where there is no confidence on one of the relations, but low, index join, or high on the other, has an overall confidence level of no confidence, even though there are statistics on the join columns of one of the relations. This is because the Optimizer always assumes a confidence level that is equal to the lower confidence level assigned to one of the relations in the join.
The term no confidence by itself does not indicate a bad plan. Even with no confidence in certain steps, the Optimizer can typically generate a reasonable plan, especially if you collected appropriate statistics. If a step references only base tables, consider collecting statistics on these tables.
Similarly, a high confidence level is not a guarantee of an accurate cardinality estimate. For example, suppose the Optimizer locates a query predicate value in one of the statistical histograms for a column or index set. In this case, confidence is assumed to be High. But suppose the available statistics are stale. The Optimizer, by pursuing the assumptions that accrue to a High confidence level, can then produce a bad plan as a result.
The following sections are meant to provide an overview of confidence levels only. These sections are in no way meant to be comprehensive, and do not take into account any special cases.
- Single-table retrievals
- Joins
Note that the confidence levels the Optimizer reports and the confidence levels recorded in the Level attribute of the QCD StatsRec table are not related in any way.
Confidence Levels for Single-Table Retrieval Operations
The following table lists the meaning of each confidence level in context and reasons each confidence level is assigned for single-table retrieval operations:
Confidence Level | Meaning | Reason |
---|---|---|
No | Vantage has neither Low nor High confidence in the
cardinality and distinct value estimates for the relation. The Optimizer pursues conservative strategies to optimize the relevant steps. |
Any of the following situations
exists:
|
Low | Vantage is moderately certain that the estimated
cardinality and distinct value estimates for the relation are
accurate. The Optimizer pursues more aggressive strategies to optimize the relevant steps. |
One of the following states exists for the relation:
The confidence for single-AMP dynamic AMP statistical samples is always Low. EXPLAIN reports express No confidence in estimates where no statistics have been collected, but Vantage samples dynamically from at least one AMP when the query is run. |
High | Vantage considers the estimated cardinality and distinct value estimates for the relation to be accurate. The Optimizer pursues more aggressive strategies to optimize the relevant steps. |
Retrieval from a single relation with no predicates:
Retrieval from a single relation with predicates:
Retrieval from a single temporal relation with predicates and a unique join index that can be used as an access path:
For an aggregate estimation, the confidence is high under any of the following situations:
For an aggregate estimation, the confidence is high under any of the following situations:
|
Index Join | Not applicable | Applies only to joins. |
For a retrieval operation from a spool, the confidence level is the same as the confidence level for the step that generated the spool.
Confidence Levels for Join Operations
For join operations, the Optimizer must know approximately how many rows to expect from each step in each join operation required to perform an SQL request. With this information, join method, and join geography, the Optimizer selects an optimal plan for joining the relations. Because each join clause can join up to 128 tables and single-table views, generating an optimal query plan depends on minimizing join cardinality estimation errors.
Errors in join processing are cumulative, so minimizing the possibilities for errors to occur in join planning is critical. The only way to make sure of optimal join processing is to keep fresh statistics on all your indexes and non-index join columns.
- Join cardinality estimates only rate High confidence when there is only a single row in both the left and right relation in the join.
- The confidence level for a join operation never exceeds that of its input relations and assumes the confidence for the relation having the lower confidence.
The following table gives the meaning of each confidence level in context and reasons each confidence level is assigned for join operations. All relational joins are binary operations: no more than two relations are ever joined in a single operation. Instead, joins on multiple relations are chained together such that the result of an earlier join operation is spooled and then joined to the next relation in the sequence the Optimizer determines for its join plan.
Confidence Level | Meaning | Reason |
---|---|---|
No | Vantage has neither Low nor High nor Index Join confidence in the estimated join cardinality. | One or both of the relations in the join does not have statistics on the join columns. |
Low | Vantage is moderately certain that the estimated join cardinality is accurate. |
|
High | Vantage considers the estimated join cardinality to be accurate. | One relation in the join has High
confidence and the other has either of the following confidence
levels:
|
Index Join | Vantage considers the estimated join cardinality to be accurate because of a uniqueness constraint on the join columns. |
Because of the way Vantage implements PRIMARY KEY and UNIQUE INDEX constraints, these are essentially two ways of saying the same thing. |
Effect of Dynamic AMP Sampling on Reported Confidence Levels
The analysis of skew is based on the distribution of rows from each of the AMPs, and is a contributing factor to the confidence level expressed by the Optimizer.
Skew analysis computes the expected number of rows per AMP. If that number is less than 5 percent, the AMP is moved to the skewed AMP list. If the total number of AMPs in the skewed list is less than or equal to 5 percent of the total number of AMPs sampled, the confidence level is set to low. Otherwise the confidence level is set to high.
When statistics are sampled from only one dynamically selected AMP, the confidence level is always set to low.