Sampling Statistics with the USING SAMPLE Option - Teradata Database

Teradata Database Administration

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
Product Category
Software

If you find that recollection on columns with evenly distributed data (that is, data with little or no data skew) is too costly in terms of performance, try using sampled statistics instead.

To use sample statistics, specify the USING SAMPLE option of the COLLECT STATISTICS request. You can collect sample statistics with a system-determined sample or you can specify a sample percentage. If you specify the sample percentage, Teradata reads the specified percent of rows to collect statistics.

If Teradata determines the sample percent, the system determines when to switch to sampling and the appropriate degree of sampling that can be used without sacrificing quality. The downgrade approach works like this:

1 The first time you request a sample statistics collection, Teradata collects full statistics. This gives Teradata information it needs to determine when to downgrade to a smaller percentage.

2 On subsequent requests to recollect statistics, Teradata collects full statistics until it has collected enough statistics history.

3 Using the statistics history, Teradata evaluates the nature of the column (for example, skewed, rolling, or static). Teradata considers the column usage (detailed or summary data), which it maintains in DBC.StatsTbl.UsageType field, and the user-specified number of intervals to decide when to downgrade to sample statistics. The system is more aggressive in downgrading to sample statistics for the histograms whose summary data is used.

4 Teradata determines a sample percentage (2% to 100%) for recollecting statistics and decides which formula to use for scaling based on the history and nature of the column.

5 To ensure quality, Teradata compares the recollected statistics to the history. The system lowers the sampling percentage only if a smaller sample provides quality results.

For example, Teradata never switches to sampled statistics for small tables, skewed columns, a column that is member of the partitioning expression, or columns that require details.

After few sample statistics collections, Teradata may collect full statistics to verify and adjust the sample percentage and the extrapolation method.

Sample statistics are not supported on views and queries.

A DBS Control utility Cost Profile field AutoSampleStats defaults all the COLLECT STATISTICS requests to the SYSTEM SAMPLE option. You can override the Cost Profile default for individual COLLECT STATISTICS requests by explicitly specifying the required sample percentage.

To determine the sampling percentage in use, you can query the Data Dictionary table column DBC.StatsTbl.SampleSizePct.

Example  

The following requests sample statistics on the o_orderkey with user-specified sample percentage. Statistics are collected by reading 10% of the table rows:

COLLECT STATISTICS
             USING SAMPLE 10 PERCENT
            COLUMN o_orderkey
                ON orders;

Example  

The following requests system-determined sample statistics on the o_orderdatetime column. The default sample percentage in DBS Control determines the number of rows sampled. Teradata collects full statistics and remembers the sample option. Teradata considers downgrading the sample percentage after the system captures enough history and the column is identified as eligible for sample statistics:

COLLECT STATISTICS
             USING SYSTEM SAMPLE
            COLUMN o_orderdatetime
                ON orders; 

For more information, also see “Sampled Statistics” in SQL Request and Transaction Processing.

For More Information

For guidelines and suggested tools for collecting statistics, see the following:

  • “COLLECT STATISTICS (Optimizer Form)” in SQL Data Definition Language
  • SQL Request and Transaction Processing:
  • “Query Capture Facility”
  • Submit the SHOW STATISTICS statement with either the SUMMARY option to see summary statistics or the VALUES option to see detailed statistics. For more information, see “SHOW STATISTICS (Optimizer Form)” in SQL Data Definition Language.
  • Teradata Viewpoint Stats Manager portlet described in Viewpoint documentation.