The COLLECT STATISTICS (Optimizer Form) statement creates histograms for, and updates statistics and demographics about, a specified column set or index. It then uses this information to compute a statistical synopsis or profile of that index or column set to summarize its characteristics in a form that is useful for the Optimizer when it generates its access and join plans.
Sometimes it must seem as if every page you read in the Teradata Vantage SQL manual set recommends that you collect statistics frequently. This topic explains why you should do so. The topic first describes some of the basic statistics calculated and then explains, at a very high level, how the Optimizer uses the computed statistical profile of your database.
For syntax and usage information about COLLECT STATISTICS (Optimizer Form), see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
Statistics and Demographics Collected and Computed
The following set of variables represents the essential set of column statistics that are computed each time you perform the Optimizer form of the COLLECT STATISTICS statement.
You can view the summary statistics for a column or index by submitting a HELP STATISTICS (Optimizer Form) request. To view the detailed statistics for an index or column set, you must submit a SHOW STATISTICS request.
For more information, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
The description of some statistics depends on whether they describe an equal-height interval or a high-biased interval. For more information, see Interval Histograms.
Different statistics are stored for a column depending on whether its values are highly skewed or not. If the distribution of column values is not skewed, then its statistics are stored in an equal-height interval histogram. If the distribution of column values is highly skewed, then its statistics are stored in a High-biased interval histogram.
Note the use of the term estimate in the attribute descriptions documented by the following table. The values for a column interval are exact only at the moment their demographics are collected. The statistics stored for a column are, at best, only a snapshot of its value distributions.
Attribute | Description |
---|---|
Statistics Maintained for All Interval Types | |
Date collected | Reported by HELP STATISTICS and SHOW STATISTICS as Date. The date on which statistics were last collected. |
Time collected | Reported by HELP STATISTICS and SHOW STATISTICS as Time. The time at which statistics were last collected. |
Number of rows | Reported by SHOW STATISTICS as Number of Rows. An estimate of the cardinality of the table. |
Number of nulls | Reported by SHOW STATISTICS as Number of Nulls. An estimate of the number of rows with partial or completely null columns for the column or index column statistics set. |
Number of intervals | Reported by SHOW STATISTICS as Number of Intervals. The number of intervals in the frequency distribution histogram containing the column or index statistics. |
Number of all nulls | Reported by SHOW STATISTICS as Number of All Nulls. An estimate of the number of rows with all columns null for the column or index column statistics set. |
Sampled percent | Reported by SHOW STATISTICS as Sampled Percent. The approximate percentage of total rows in the table included in the sample. Null or 0 indicates that sampling is not active, which means that full statistics (a 100% sample) are being collected. |
Version number | Reported by SHOW STATISTICS as Version. The version number of the statistics structure in effect when the statistics were collected. |
Table cardinality estimate from a single-AMP sample | Reported by SHOW STATISTICS as OneAMPSampleEst for SUMMARY statistics. Used by the Optimizer to extrapolate cardinality estimates and to detect table growth. Updated whenever summary statistics or statistics on any column set are collected or refreshed. |
Table cardinality estimate from an all-AMP sample | Reported by SHOW STATISTICS as AllAMPSampleEst for SUMMARY statistics. Used by the Optimizer to extrapolate cardinality estimates and to detect table growth. Updated whenever summary statistics or statistics on any column set are collected or refreshed. |
Number of distinct values | Reported by HELP STATISTICS and SHOW STATISTICS as Number of Uniques. An estimate of the number of unique values for the column. |
Minimum value for the interval | Reported by SHOW STATISTICS as Min Value. An estimate of the smallest value for the specified column or index in the interval. |
Maximum number of rows per value | Not reported by HELP STATISTICS or SHOW STATISTICS. An estimate of the maximum number of rows having the particular value for the column. |
Typical number of rows per value | Not reported by HELP STATISTICS or SHOW STATISTICS. An estimate of the most common number of rows having the particular value for the column. |
Equal-Height Interval Statistics | |
Maximum value for the interval | Reported by SHOW STATISTICS as MaxValue. An estimate of the largest value for the column or index in the interval. |
Modal value for the | Reported by SHOW STATISTICS as ModeValue. An estimate of the most frequently occurring value or values for the column or index in the interval. |
Number of rows having the modal value | Reported by SHOW STATISTICS as Mode Frequency. An estimate of the distinct number of rows in the interval having its modal value for the column or index. |
Number of non-modal values | Reported by SHOW STATISTICS as Non-Modal Values. An estimate of the number of distinct non-modal values for the column or index in the interval. |
Number of rows not having the modal value | Reported by SHOW STATISTICS as Non-Modal Rows. An estimate of the skewness of the distribution of the index or column values within the interval. |
High-Biased Interval Statistics | |
Biased value | Reported by SHOW STATISTICS as BiasedValue. |
Biased value frequency | Reported by SHOW STATISTICS as BiasedValueFreq. |