Comparison of Full and Sampled Statistics - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-27
dita:mapPath
imq1591724555718.ditamap
dita:ditavalPath
imq1591724555718.ditaval
dita:id
B035-1184
lifecycle
previous
Product Category
Teradata Vantage™

The following table compares the two methods of collecting statistics using COLLECT STATISTICS (Optimizer Form) and lists the most productive uses.

The ability to specify thresholds for collecting and recollecting statistics minimizes these issues.

Method Characteristics Best Use
Full statistics
  • Collects all statistics for all of the data.
  • Time consuming.
  • Most accurate of the three methods of collecting statistics.
  • Stored in interval histograms in the data dictionary.
  • Best choice for columns or indexes with highly skewed data values.
  • Recommended for small tables, where a small table is one with fewer than 1,000 rows per AMP.
  • Recommended for selection columns having a moderate to low number of distinct values.
  • Recommended for most NUSIs and other selection columns.

    Collection time on NUSIs is very fast.

  • Recommended for all column sets or index column sets for the following cases.

    Where full statistics add value.

    Where sampling does not provide satisfactory statistical estimates.

User-sampled statistics
  • Collects all statistics for a sample of the data, not just cardinality estimates.
  • Significantly faster collection time than full statistics.
  • Stored in interval histograms in the data dictionary.
  • Acceptable for columns or indexes that are highly singular, that is, the number of distinct values approaches the cardinality of the table.
  • Recommended for unique columns, unique indexes, and for columns or indexes that are highly singular.

    Sampled statistics are useful for very large tables, such as tables with tens of billions of rows.

  • Not recommended for small tables, that is, tables whose cardinality is less than 20 times the number of AMPs in the system.