16.20 - Rules and Guidelines for COLLECT STATISTICS (Optimizer Form) - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)
The following rules apply to using COLLECT STATISTICS.
  • You can submit a COLLECT STATISTICS request in the following ways:
    • As a single-statement request.
    • As the only statement in a macro.
    • As the only or last statement in an explicit Teradata session mode transaction bracketed by BEGIN and END TRANSACTION statements.
  • Because base global temporary tables do not contain data, the statistics you collect on them have dummy histograms without intervals and a cardinality of 0 rows.

    When you specify TEMPORARY to collect statistics on a materialized global temporary table for the first time, without specifying COLUMN or INDEX options, the materialized table inherits its statistics definitions from the base temporary table.

    Specifying the TEMPORARY keyword to collect statistics on a global temporary table materializes the table in the current session, if it is not already materialized.

    When you log off from a session or if the system forces a logoff, Teradata Database automatically drops the statistics from all materialized temporary tables.

  • The system treats multicolumn statistics the same as index statistics if an index has been defined on the same column set. For an example, see Collecting Statistics on Multiple Columns.
  • Teradata Database preserves the ordering of columns when you collect multicolumn statistics. However, if the base table statistics are collected on an index using the INDEX keyword, the system uses index column ordering in the ascending order of the field ids.
  • The maximum number of column and index sets you can recollect on a base table, global temporary table, hash index, or join index is 512.

    This limit can be lower than 512 column or index sets subject to limits on the other resources such as the number of available spools and plastic or concrete step segment sizes.

  • You cannot collect statistics on columns with a data type of Period, XML, BLOB, CLOB, or any UDT other than Geospatial.

    You can collect statistics on the BEGIN and END expressions of a Period column.

    You cannot collect statistics on an entire column with the JSON data type. However, you can collect statistics on extracted portions of the JSON data type. See Teradata Vantage™ JSON Data Type, B035-1150.

  • You can collect statistics on a base table column defined using a complex expression. You can also collect statistics on a single-table join index or hash index column defined using a complex expression in the respective select or column list. Either method enables the Optimizer to make accurate single-table cardinality estimates for queries that specify complex expressions in their predicates that it can match to a simple index column that is defined on that expression or a superset of it. For details, see Teradata Vantage™ SQL Request and Transaction Processing, B035-1142. However, COLLECT STATISTICS on base table is preferred as base table's statistics can be directly inherited by all join indexes.
  • The following rules apply to collecting statistics on partitioned tables.
    • You can collect statistics on both the primary index, if the table is primary-indexed, and row partitioning column sets.
    • If you specify to collect statistics using the name of the primary index, then statistics are collected only on the primary index column set.
  • You cannot submit a COLLECT STATISTICS request on a table where:
    • Statistics have been dropped
    • COLLECT STATISTICS … COLUMN or COLLECT STATISTICS … INDEX clause has not been specified

      You cannot recollect statistics on a table on which statistics have been dropped.

  • You cannot specify a USING SAMPLE clause for standard recollection of statistics on implicitly specified column and index sets.
  • When you specify a USING option for first time statistics collection or to reset an existing option, Teradata Database automatically applies the same options for subsequent recollections of statistics.
  • When you specify the FOR CURRENT option, Teradata Database uses the USING options that you specify only for the current COLLECT STATISTICS request.

    Teradata Database does not remember the USING options you specify FOR CURRENT for future recollections.

  • To collect multiple statistics, you should group the statistics into a single COLLECT STATISTICS request.

    For first time collections, group the statistics that specify the same USING options together. This enables the Optimizer to apply global optimizations such as early aggregation and aggregation rollups to hasten the collection process.

    You can group recollections of statistics without regard to the various USING options specified for individual statistics.

    You should use a table- level COLLECT STATISTICS request without specifying column or index references to refresh all the statistics for a given table or constant expression.

  • You cannot specify the SUMMARY option with column references or USING options.
  • When you recollect statistics on a column or index, Teradata Database automatically updates the table-level demographics.