Optimizer Use of Statistical Profiles | VantageCloud Lake - Optimizer Use of Statistical Profiles - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

The COLLECT STATISTICS (Optimizer Form) statement creates histograms for, and updates statistics and demographics about, a specified column set or index. The statement uses this information to compute a statistical synopsis or profile of that index or column set to summarize its characteristics in a form that is useful for the Optimizer when generating access and join plans.

This topic explains the importance of collecting statistics frequently. The topic first describes basic statistics calculated and then explains, at a high level, how the Optimizer uses the computed statistical profile of your database.

See COLLECT STATISTICS (Optimizer Form) for syntax and usage information.

Statistics and Demographics Collected and Computed

The following set of variables represents the essential set of column statistics that are computed each time you perform the Optimizer form of the COLLECT STATISTICS statement.

You can view the summary statistics for a column or index by submitting a HELP STATISTICS (Optimizer Form) request. To view the detailed statistics for an index or column set, you must submit a SHOW STATISTICS request.

The descriptions of certain statistics depend on whether those statistics describe an equal-height interval or a high-biased interval. See Interval Histograms.

Different statistics are stored for a column depending on whether its values are highly skewed or not. If the distribution of column values is not skewed, then its statistics are stored in an equal-height interval histogram. If the distribution of column values is highly skewed, then its statistics are stored in a High-biased interval histogram.

The values for a column interval are exact only at the time of collection. Statistics for a column are a snapshot of its value distributions. Therefore, attribute descriptions in the following table use the word estimate.

Interval Type Attribute Description
All Date collected Reported by HELP STATISTICS and SHOW STATISTICS as Date.

The date on which statistics were last collected.

Time collected Reported by HELP STATISTICS and SHOW STATISTICS as Time.

The time at which statistics were last collected.

Number of rows Reported by SHOW STATISTICS as Number of Rows.

An estimate of the cardinality of the table.

Number of nulls Reported by SHOW STATISTICS as Number of Nulls.

An estimate of the number of rows with partial or completely null columns for the column or index column statistics set.

Number of intervals Reported by SHOW STATISTICS as Number of Intervals.

The number of intervals in the frequency distribution histogram containing the column or index statistics.

Number of all nulls Reported by SHOW STATISTICS as Number of All Nulls.

An estimate of the number of rows with all columns null for the column or index column statistics set.

Sampled percent Reported by SHOW STATISTICS as Sampled Percent.

The approximate percentage of total rows in the table included in the sample.

Null or 0 indicates that sampling is not active, which means that full statistics (a 100% sample) are being collected.

Version number Reported by SHOW STATISTICS as Version.

The version number of the statistics structure in effect when the statistics were collected.

Table cardinality estimate from a single-AMP sample Reported by SHOW STATISTICS as OneAMPSampleEst for SUMMARY statistics.

Used by the Optimizer to extrapolate cardinality estimates and to detect table growth.

Updated whenever summary statistics or statistics on any column set are collected or refreshed.

Table cardinality estimate from an all-AMP sample Reported by SHOW STATISTICS as AllAMPSampleEst for SUMMARY statistics.

Used by the Optimizer to extrapolate cardinality estimates and to detect table growth.

Updated whenever summary statistics or statistics on any column set are collected or refreshed.

Number of distinct values Reported by HELP STATISTICS and SHOW STATISTICS as Number of Uniques.

An estimate of the number of unique values for the column.

Minimum value for the interval Reported by SHOW STATISTICS as Min Value.

An estimate of the smallest value for the specified column or index in the interval.

Maximum number of rows per value Not reported by HELP STATISTICS or SHOW STATISTICS.

An estimate of the maximum number of rows having the specified value for the column.

Typical number of rows per value Not reported by HELP STATISTICS or SHOW STATISTICS.

An estimate of the most common number of rows having the specified value for the column.

Equal-height Maximum value for the interval Reported by SHOW STATISTICS as MaxValue.

An estimate of the largest value for the column or index in the interval.

Modal value for the Reported by SHOW STATISTICS as ModeValue.

An estimate of the most frequently occurring value or values for the column or index in the interval.

Number of rows having the modal value Reported by SHOW STATISTICS as Mode Frequency.

An estimate of the distinct number of rows in the interval having its modal value for the column or index.

Number of non-modal values Reported by SHOW STATISTICS as Non-Modal Values.

An estimate of the number of distinct non-modal values for the column or index in the interval.

Number of rows not having the modal value Reported by SHOW STATISTICS as Non-Modal Rows.

An estimate of the skewness of the distribution of the index or column values within the interval.

High-biased Biased value Reported by SHOW STATISTICS as BiasedValue.
Biased value frequency Reported by SHOW STATISTICS as BiasedValueFreq.