16.20 - Keep Statistics Current - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)

If a table or partition has been extensively modified since its statistics were last computed, residual statistics are likely to cause the Optimizer to generate poor access and join plans. A table or partition is considered extensively modified if more than 10 percent of the rows are added to or deleted from the table or, in the case of a row-partitioned table, more than 10 percent of the rows are added to or deleted from a partition. Therefore, statistics should be periodically recollected or dropped. For all partitioned tables, any refreshment operation should include the system-derived PARTITION column set. See Collecting Statistics on the PARTITION Column and the Row Partitioning Column Set. Keeping statistics fresh is an important factor in producing accurate query plans even when your system enables derived statistics and extrapolation. For details, see Teradata Vantage™ SQL Request and Transaction Processing, B035-1142.

If you recollect PARTITION statistics any time a partition or table becomes populated after having been empty, or becomes empty after having been populated, the operation should be fairly fast.

Following is an example of how stale statistics can produce an inefficient query plan.

Table Cardinality Statistics? Remarks
A 1,000,000 Y Statistics are not current.

They were collected during an earlier phase, when the cardinality of the table was only 1,000 rows.

B 75,000 N The Optimizer collects a dynamic AMP statistical sample for this table.

You submit a request that performs a product join between tables A and B, and one of the tables must be duplicated on all AMPs. The Optimizer chooses to redistribute the rows from table A because when it checks the dictionary for statistics, it sees that the cardinality of table A is only 1,000 rows, which is far fewer than the 75,000 rows of table B (estimated from a dynamic AMP sample). Because table A currently has 1,000,000 rows, a difference of three orders of magnitude from what the current, stale, statistics indicate, the Optimizer uses an incorrect assumption to redistribute the 1,000,000 rows of table A instead of the 75,000 rows of table B. As a result, the query runs much longer than it would have if the Optimizer had current statistics from which to create its query plan.

Even though the Optimizer can use derived statistics and extrapolation when it detects stale statistics, those methods base their calculations on the current statistics in the interval histogram for a column set, so their estimates can be more accurate because they are based on statistics that are more current than the statistics currently stored. See Teradata Vantage™ SQL Request and Transaction Processing, B035-1142.

The derived statistics framework uses statistics taken from a dynamic AMP sample if statistics have not been collected on a column set. For details, see Teradata Vantage™ SQL Request and Transaction Processing, B035-1142. The Optimizer also compares stored dynamic AMP cardinality estimates with current dynamic AMP cardinality estimates to extrapolate cardinalities to determine table growth.

You should consider statistics to be stale in the following cases.
  • The cardinality of the table or, in the case of a partitioned table, of any of its partitions, has changed significantly, for example, by more than 10 percent. However, this percentage is just a recommendation and can vary for a particular table or system.
  • The range of values for an index, column, or partition on which statistics have been collected has changed significantly, for example, by more than 10 percent. However, this percentage is just a recommendation and can vary for a particular table or system.

To make an informal assessment of whether the table or partition cardinality has changed significantly, you can use the following statements:

Statement Information Returned
HELP STATISTICS table_name;
  • Number of unique values for each statistic on a table
  • Date and time statistics were last collected
SELECT COUNT(*)
FROM table_name;
Cardinality of statistics on unique indexes for comparison with the report produced by HELP CURRENT STATISTICS
SELECT COUNT (DISTINCT column_name)
FROM table_name;
Cardinality of statistics on nonunique columns for comparison with the report produced by HELP CURRENT STATISTICS

HELP CURRENT STATISTICS shows the extrapolated unique value counts. SHOW CURRENT STATISTICS VALUES displays the extrapolated summary information for a column or index.

You can update previously collected statistics by performing a COLLECT STATISTICS request without specifying an explicit list of columns or indexes. See Collecting Statistics When No COLUMN or INDEX Clause Is Specified. The system then automatically recollects statistics for all existing statistics. This operation requires a full-table scan, so it can take a significant amount of time. See Collecting and Recollecting Statistics Is Often Time Consuming.

The efficacy of collected statistics varies with the types of access used on a table. If performance does not seem to be improved by the statistics you have collected, then the Optimizer is probably not using the column to access or join the table. When you observe this behavior and EXPLAIN reports do not indicate that the Optimizer is using the column set as you thought it might, use the DROP STATISTICS statement to remove the statistics for that column from the data dictionary.

Perform a HELP STATISTICS request to determine which columns and indexes currently have statistics and to see a portion of the collected information or use SHOW STATISTICS VALUES to report complete information.

The Teradata Viewpoint Stats Manager portlet allows you to manage Teradata Database statistics collection, which includes the ability to collect and analyze statistics, create and control jobs, and manage recommendations. See Teradata® Viewpoint User Guide, B035-2206.

You can also run a query that reports the last time statistics were collected for each column. The following query lists database names, column names, and the date statistics were last collected.

SELECT DatabaseName
       ,TableName
       ,ColumnName
       ,CAST(LastCollectTimeStamp AS Date) As CollectionDate
       ,CAST(LastAlterTimeStamp AS Date) As LastAlter
       ,current_date - collectiondate AS FromCurrent
       ,lastalter - collectiondate AS FromAlter
FROM DBC.StatsV
WHERE ColumnName IS NOT NULL;

To limit the report, you can edit the query to specify a subset of columns.