Statistics collection thresholds enable you to minimize the unnecessary collection of statistics. For a system-determined threshold, the Optimizer automatically determines whether a recollection is needed or whether extrapolation is adequate. You can collect and recollect threshold statistics on tables.
The THRESHOLD options enable you to automatically skip recollecting statistics if the amount of data change since the last statistics collection or the age of the current statistics is below the thresholds in effect for the statistics.
The Optimizer can automatically determine the appropriate thresholds to apply based on previously collected statistics, column histories, change counts, and other factors, or you can explicitly specify the thresholds as part of a COLLECT STATISTICS request by specifying a change percentage, a number of days, or both.
You can submit collect statistics requests at regular intervals, and the Optimizer can use stored threshold and historical data to determine whether the specified recollection of statistics is necessary or not. The Optimizer can use the same data to determine when to recollect statistics for those columns that exceed the specified threshold. For example, if you determine that the threshold is to be a 10% data change and you submit a request to recollect statistics, the Optimizer does not recollect those statistics if the change in the data from the last collection is less than 10%.
If you specify a THRESHOLD option for first time statistics collections, the Optimizer collects the statistics and stores the THRESHOLD options you specified for future use without consulting the threshold values you specify. The Optimizer uses the saved collection thresholds to determine whether the current statistics collection can be skipped or not when you submit recollection requests at a later time.
Specifying a new THRESHOLD option for statistics recollection overrides the current setting, if any. The Optimizer applies the new THRESHOLD value to the current request, and remembers the updated THRESHOLD option for future statistics recollection.
- Change-based
- Time-based
For a change-based threshold, the optimizer consults the system maintained UDI (Update, Delete, Insert) counts, random AMP samples and available table history to determine the amount of data change from the last statistics collection.
You can also specify a combination of change-based and time-based thresholds in a single COLLECT STATISTICS request.
For SYSTEM THRESHOLD or SYSTEM THRESHOLD PERCENT, the Optimizer considers column history and internal extrapolations techniques to determine the appropriate change threshold. When the Optimizer recollects full statistics for a small table NUSI, for example, it ignores threshold information and collects full statistics because, in that case, collecting full statistics provides more accurate statistics for a very small cost.
A time-based threshold uses the age of the current statistics to determine when statistics are refreshed.