Comparison of Full and Sampled Statistics - Analytics Database - Teradata Vantage

SQL Data Definition Language Detailed Topics

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
ft:locale
en-US
ft:lastEdition
2024-12-13
dita:mapPath
vuk1628111288877.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
jbg1472252759029
lifecycle
latest
Product Category
Teradata Vantage™

The following table compares the two methods of collecting statistics using COLLECT STATISTICS (Optimizer Form) and lists the most productive uses.

The ability to specify thresholds for collecting and recollecting statistics minimizes these issues.

Method Characteristics Best Use
Full statistics
  • Collects all statistics for all of the data.
  • Time consuming.
  • Most accurate of the three methods of collecting statistics.
  • Stored in interval histograms in the data dictionary.
  • Best choice for columns or indexes with highly skewed data values.
  • Recommended for small tables, where a small table is one with fewer than 1,000 rows per AMP.
  • Recommended for selection columns having a moderate to low number of distinct values.
  • Recommended for most NUSIs and other selection columns.

    Collection time on NUSIs is very fast.

  • Recommended for all column sets or index column sets for the following cases.

    Where full statistics add value.

    Where sampling does not provide satisfactory statistical estimates.

User-sampled statistics
  • Collects all statistics for a sample of the data, not just cardinality estimates.
  • Significantly faster collection time than full statistics.
  • Stored in interval histograms in the data dictionary.
  • Acceptable for columns or indexes that are highly singular, that is, the number of distinct values approaches the cardinality of the table.
  • Recommended for unique columns, unique indexes, and for columns or indexes that are highly singular.

    Sampled statistics are useful for very large tables, such as tables with tens of billions of rows.

  • Not recommended for small tables, that is, tables whose cardinality is less than 20 times the number of AMPs in the system.