Options for Use with Collecting Statistics | VantageCloud Lake - Using the THRESHOLD Options to Collect and Recollect Statistics - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Statistics collection thresholds enable you to minimize the unnecessary collection of statistics. For a system-determined threshold, the Optimizer automatically determines whether a recollection is needed or whether extrapolation is adequate. You can collect and recollect threshold statistics on tables.

The THRESHOLD options enable you to automatically skip recollecting statistics if the amount of data change since the last statistics collection or the age of the current statistics is below the thresholds in effect for the statistics.

The Optimizer can determine thresholds based on collected statistics, column histories, change counts, and so on. Alternatively, you can determine thresholds in a COLLECT STATISTICS request by specifying a change percentage, number of days, or both.

You can submit collect statistics requests at regular intervals, and the Optimizer can use stored threshold and historical data to determine whether the specified recollection of statistics is necessary or not. The Optimizer can use the same data to determine when to recollect statistics for those columns that exceed the specified threshold. For example, if you determine that the threshold is to be a 10% data change and you submit a request to recollect statistics, the Optimizer does not recollect those statistics if the change in the data from the last collection is less than 10%.

If you specify a THRESHOLD option for first time statistics collections, the Optimizer collects the statistics and stores the THRESHOLD options you specified for future use without consulting the threshold values you specify. The Optimizer uses the saved collection thresholds to determine whether the current statistics collection can be skipped or not when you submit recollection requests at a later time.

Specifying a new THRESHOLD option for statistics recollection overrides the current setting, if any. The Optimizer applies the new THRESHOLD value to the current request, and remembers the updated THRESHOLD option for future statistics recollection.

You can specify the following kinds of thresholds to control recollecting statistics.
  • Change-based
  • Time-based

For a change-based threshold, the optimizer consults the system maintained UDI (Update, Delete, Insert) counts, random AMP samples and available table history to determine the amount of data change from the last statistics collection.

You can also specify a combination of change-based and time-based thresholds in a single COLLECT STATISTICS request.

For SYSTEM THRESHOLD or SYSTEM THRESHOLD PERCENT, the Optimizer considers column history and extrapolations techniques to determine the threshold. When recollecting full statistics for a small table NUSI, for example, the Optimizer ignores threshold information and collects full statistics because collecting full statistics provides more accurate statistics for a small cost.

A time-based threshold uses the age of the current statistics to determine when statistics are refreshed.

For SYSTEM THRESHOLD DAYS, the system uses a change-based threshold for recollecting statistics.