COLLECT STATISTICS (QCD Form) - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Collects demographic data for one or more columns and indexes of a table, computes a statistical profile of the collected data, and stores the synopsis in the TableStatistics table of the specified QCD database.

Statistics collected by this statement are used for index analysis and validation tasks performed by various database query analysis tools: they are not used by the Optimizer to process queries. For information on COLLECT STATISTICS (Optimizer Form), see SQL Data Definition Language Syntax and Examples and SQL Data Definition Language Detailed Topics.

where:

 

Syntax Element …

Specifies …

FOR SAMPLE percentage

FOR SAMPLE percentage PERCENT

Percentage of table rows to be sampled to build the statistics for the specified column and index sets. The percentage value must be less than 100.

For example, if you specify a value of 5, then 5 percent of the rows for table_name on each AMP are read to collect their statistics.

This option is not recognized for single‑column statistics requests for the PARTITION column of a row-partitioned table. The system automatically increases the percentage to 100.

Note that the restriction for COLLECT STATISTICS (Optimizer Form) on using sampling with columns of a row-partitioned table that are also members of the partitioning expression for that table do not apply to COLLECT STATISTICS (QCD Form).

PERCENT is an optional keyword to indicate that the value specified for percentage is a percentage.

INTO QCD_name

Name of the QCD database into which the collected sampled statistics are to be written.

The sampled statistics are written to the TableStatistics table of the specified QCD database.

SET QUERY query_ID

Value inserted into QCD.TableStatistics.QueryId.

Each unique composite of query_ID and statistics_ID enables you to store a separate set of statistics in QCD.TableStatistics for a given column or index.

The default value is 0.

SAMPLEID statistics_ID

Value to be inserted into QCD.TableStatistics.StatisticsId.

Each unique composite of query_ID and statistics_ID enables you to store a separate set of statistics in QCD.TableStatistics for a given column or index.

The default value is 1.

UPDATE MODIFIED

Store modified statistics stored in QCD.TableStatistics.ModifiedStats.

If you do not specify the UPDATE MODIFIED option, the system stores unmodified statistics in QCD.TableStatistics.StatisticsInfo.

database_name

user_name

Name of the containing database or user for table_name if different from the current database or user.

table_name

Name of the table for which the specified column and index sampled statistics and sample size are to be collected.

You cannot collect QCD statistics on volatile, journal, global temporary, or global temporary trace tables.

COLUMN column_name

Name of a column set on which sampled statistics are to be collected.

You can collect statistics on both a column set and on the system‑derived PARTITION column in the same COLLECT STATISTICS request.

You cannot collect statistics on UDT or LOB columns.

COLUMN PARTITION

Statistics are to be collected on the system‑derived PARTITION column for a table. The value collected is the system‑derived partition number, which ranges from 1 through 65,535 inclusive, or is zero for non-partitioned.

You can collect statistics on both a column set and on the system‑derived PARTITION column in the same COLLECT STATISTICS request.

Even though table_name need not identify a partitioned table, and the system does not return an error if you request PARTITION column statistics on a nonpartitioned table, the operation serves no purpose.

You cannot collect PARTITION statistics on the following:

  • Join indexes
  • Global temporary tables
  • Volatile tables
  • INDEX index_name

    Name of the index on which sampled statistics are to be collected.

    INDEX column_name

    Names of the columns in the column set on which sampled statistics are to be collected.

    COLLECT STATISTICS (QCD Form) is a Teradata extension to the ANSI SQL:2011 standard.

    You must have the following privileges to perform COLLECT STATISTICS (QCD Form):

  • INDEX or DROP TABLE on table_name or its containing database.
  • INSERT and UPDATE on the TableStatistics table or its containing QCD database.
  • Normally invoked by client‑based Teradata Database query analysis tools.

    Table statistics collected by the QCD form of COLLECT STATISTICS are stored in the QCD table named TableStatistics. For details about the TableStatistics table, see SQL Request and Transaction Processing.

    Unless otherwise noted, the rules and guidelines for collecting QCD statistics are the same as those for Optimizer statistics. See “Rules and Guidelines for COLLECT STATISTICS (Optimizer Form)” in SQL Data Definition Language Detailed Topics.

    If you have defined a column set as a composite index, it makes no difference whether you collect statistics on the composite using the COLUMN keyword or the INDEX keyword. The result is identical.

    For example, suppose you have created an index on the column set (x1,y1) of table t1. The following COLLECT STATISTICS requests gather the identical statistics for the named QCD into the identical QCD table columns:

         COLLECT STATISTICS FOR SAMPLE 30 PERCENT ON t1 
           INTO myqcd COLUMN (x1,y1);
     
         COLLECT STATISTICS FOR SAMPLE 30 PERCENT ON t1 
           INTO myqcd INDEX (x1,y1);

    For information about why you should collect PARTITION statistics for the Optimizer, see “Collecting Statistics on the System‑Derived PARTITION Column and the Row Partitioning Column Set of a Partitioned Table” in SQL Data Definition Language Detailed Topics.

    Because PARTITION is not a reserved keyword, you can use it as the name for any column in a table definition, but you should not. This practice is strongly discouraged in all cases, and particularly for partitioned tables, because if you name a non‑PARTITION column partition, the system resolves it as a regular column. As a result, you cannot collect statistics on the system‑derived PARTITION column of any table that also has a user‑defined column named partition.

    If a table is partitioned by a single‑column expression, its column statistics are inherited as PARTITION statistics. In this special case, you need not collect single-column PARTITION statistics.

    For example, assume the following table definition:

         CREATE TABLE table_1 (
           col_1 INTEGER, 
           col_2 INTEGER) 
         PRIMARY INDEX(col_1) PARTITION BY col_2; 

    If you collect individual column statistics on col_2, which is the partitioning column, then those statistics are also inherited as PARTITION statistics.

    You can also collect PARTITION statistics on NPPI tables, which quickly provides up to date statistics on the number of rows in the table. This is because the partition number for every row in an NPPI table is 0.

    You can specify a USING SAMPLE clause to collect single‑column PARTITION statistics, but the specification is ignored. The system automatically resets the sampling percentage to 100. The sampling field in a detailed statistics report is always reported as 0 to document this behavior. See “COLLECT STATISTICS (QCD Form)” on page 537. Note that all valid sampling percentages specified for a USING SAMPLE clause are honored for multicolumn PARTITION statistics.

    There is a limit of 32 sets of multicolumn statistics that can be collected on a given table. Because single‑column PARTITION statistics use index ids in the same range as multicolumn statistics (between 129 and 160, inclusive), collecting PARTITION statistics effectively reduces the limit on the number of multicolumn statistics that can be collected to 31.

    The system ignores user‑specified column ordering for multicolumn PARTITION statistics. This is consistent with non‑PARTITION multicolumn statistics and multicolumn indexes. The columns are ordered based on their internal field id. Because the system‑derived PARTITION column has field id value of 0, it always takes the first position in multicolumn statistics.

    You cannot collect statistics on UDT columns.

    You cannot collect statistics on BLOB or CLOB columns.

    Unlike the standard COLLECT STATISTICS statement, COLLECT STATISTICS (QCD Form) does not write the statistics it collects into the data dictionary for use by the Optimizer. The COLLECT STATISTICS (QCD Form) writes its statistics into the TableStatistics table of the specified QCD database, where the information is used by various database query analysis tools to perform index analysis and validation.

    Because COLLECT STATISTICS (QCD Form) does not write its statistics to the data dictionary, it does not place locks on dictionary tables. Through appropriate control of the sample size, you can collect sample statistics for index analysis during production hours without a noticeable negative effect on system performance.

    Unlike COLLECT STATISTICS (Optimizer Form), you cannot recollect statistics implicitly on indexes and columns for which statistics have been collected in the past. You must specify an explicit index or column name each time you collect statistics using COLLECT STATISTICS (QCD Form).

    You can place a COLLECT STATISTICS (QCD Form) request anywhere within a transaction because it is not treated as a DDL statement by the Transaction Manager.

    COLLECT STATISTICS (QCD Form) always collects its own fresh statistics on candidate index columns, even if a statistical profile of the specified columns or index columns already exists in the data dictionary or QCD for table_name, to ensure that index analyses are always performed using current statistics.

    Fresh statistics are also collected whenever you perform an INSERT EXPLAIN request and specify the WITH STATISTICS clause. See “INSERT EXPLAIN” on page 577 for more information.

    The larger the sample size used to collect statistics, the more likely the sampled statistics are an accurate representation of population statistics. There is a trade-off for this accuracy: the larger the sample size, the longer it takes to collect the statistics.

    You cannot use sampling to collect COLUMN statistics for the partitioning columns of row‑partitioned tables and you should not use sampling to collect INDEX statistics for those columns. The collect statistics on the system­‑derived PARTITION or PARTITION#Ln columns.

    The following example collects statistics on a single‑column NUSI named orderDateNUPI on the order_date column from a random sample of 10 percent of the rows in the orders table and writes them to the TableStatistics table of the QCD database named MyQCD.

         COLLECT STATISTICS FOR SAMPLE 10 PERCENT 
         ON orders INDEX orderDateNUPI
         INTO MyQCD; 

    The following example collects sampled statistics on the same index as “Example 1: Collect Statistics on a Single-Column NUSI” using different syntax:

         COLLECT STATISTICS FOR SAMPLE 10 PERCENT 
         ON orders INDEX (order_date)
         INTO MyQCD; 

    The following example collects statistics on the system‑derived PARTITION column for the row-partitioned orders table and writes them to a user‑defined QCD called myqcd.

    Even though you have specified a sampling percentage of 30 percent, the system ignores it and uses a value of 100 percent (see “Collecting QCD Statistics on the PARTITION Column of a Table” on page 540).

         COLLECT STATISTICS FOR SAMPLE 30 PERCENT 
         ON orders COLUMN PARTITION
         INTO myqcd;

    The following example collects multicolumn sampled statistics on the following column set for the row‑partitioned orders table and writes them to a user‑defined QCD called myqcd:

  • System‑derived PARTITION column
  • quant_ord
  • quant_shpd
  • Because you are requesting multicolumn statistics, the system honors the specified sampling percentage of 20 percent, unlike the case for single‑column PARTITION statistics. See “Collecting QCD Statistics on the PARTITION Column of a Table” on page 540 and “Example 3: Collecting Single‑Column PARTITION Statistics” on page 542.

         COLLECT STATISTICS FOR SAMPLE 20 PERCENT 
         ON orders COLUMN (quant_ord, PARTITION, quant_shpd)
         INTO myqcd; 

    For more information about index analysis, see:

  • “COLLECT DEMOGRAPHICS” on page 534
  • “INITIATE INDEX ANALYSIS” on page 560
  • “INSERT EXPLAIN” on page 577
  • “RESTART INDEX ANALYSIS” on page 590
  • SQL Request and Transaction Processing
  • “DROP STATISTICS (QCD Form)” on page 544
  • “HELP STATISTICS (Optimizer Form)” in SQL Data Definition Language Detailed Topics
  • For more information about client‑based query analysis tools, see:

  • Teradata Index Wizard User Guide
  • Teradata Viewpoint User Guide, chapter on the Stats Manager