Using Sampling to Collect Statistics | Teradata Vantage - Reducing the Cost of Collecting Statistics by Sampling - Advanced SQL Engine

Using Sampling to Collect Statistics | Teradata Vantage - Reducing the Cost of Collecting Statistics by Sampling - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

imq1591724555718.ditamap

dita:ditavalPath

imq1591724555718.ditaval

dita:id

B035-1184

lifecycle

Product Category

Teradata Vantage™

You can specify a system-determined sample percentage to use a downgrade approach for determining when to switch to sampling. The Optimizer samples intelligently and adjusts the percentage of sampling to use, up to full statistics, while maintaining the quality of the collected statistics.

The downgrade approach works as follows.

When you initially submit a request to collect sampled statistics, the Optimizer collects full statistics. That is, the statistics are not sampled.
The Optimizer uses the full statistics to determine when to reduce sampling to a smaller percentage.
On subsequent requests to recollect statistics, the Optimizer collects full statistics until adequate statistics have been captured to provide a reliable historical record.
With the statistics history, the Optimizer can recognize, for example, whether the column data is skewed, the column is rolling or static, and so forth.
The Optimizer then considers the column usage data (detailed buckets or only summary data) that it maintains in DBC.StatsTbl.UsageType and the user-specified number of intervals to determine an appropriate time to reduce collection from full statistics to sampled statistics.

The database is more aggressive in reducing the percentage of sampled statistics for histograms whose summary data is frequently used, but whose detailed interval data is not.
The Optimizer determines an optimal sampling percentage to use for statistics recollections and determines the appropriate formula to use for scaling based on the history and nature of the column. For more information, see Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142.
Finally, the Optimizer compares recollected statistics to its history data for quality.
The Optimizer lowers the sampling percentage only if a smaller sampling percentage provides quality results.

The automated sampling percentage reduction feature alleviates you from having to determine the columns for which sampled statistics can be collected and the columns for which full statistics should be collected. By collecting full statistics initially, this method provides the Optimizer with the necessary information to make an intelligent decision about how much to sample in subsequent recollections of statistics.

For small tables, skewed columns, columns included in a partitioning expression, and columns that require detailed histogram buckets, the Optimizer always collects full statistics.

The Optimizer can detect very non-singular columns using full statistics, and in subsequent recollections intelligently switch to sampled statistics at a significantly lower percentage than 100%. For example, the Optimizer might sample only 2% of table rows and then apply the appropriate scaling formula for nonunique columns to obtain the extrapolated full statistics, and obtain quality statistics much faster than by collecting full statistics.

For rolling, unique, and nearly-unique columns as detected in the full statistics collection, the Optimizer can subsequently collect sampled statistics at a much lower percentage and apply a linear scaling formula to generate extrapolated full statistics.

After a few collections of sampled statistics, depending on demographics patterns and change counts, the Optimizer can recollect full statistics to verify, adjusting the sample percentage and the extrapolation method used as necessary.