COLLECT STATISTICS (QCD Form) - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.00
Language
English (United States)
Last Update
2018-09-28
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

COLLECT STATISTICS (QCD Form)

Purpose  

Collects demographic data for one or more columns and indexes of a table, computes a statistical profile of the collected data, and stores the synopsis in the TableStatistics table of the specified QCD database.

Statistics collected by this statement are used for index analysis and validation tasks performed by various database query analysis tools: they are not used by the Optimizer to process queries. For information on COLLECT STATISTICS (Optimizer Form), see SQL Data Definition Language Syntax and Examples and SQL Data Definition Language Detailed Topics.

Syntax  

where:

 

Syntax Element …

Specifies …

FOR SAMPLE percentage

FOR SAMPLE percentage PERCENT

the percentage of table rows to be sampled to build the statistics for the specified column and index sets. The percentage value must be less than 100.

For example, if you specify a value of 5, then 5 percent of the rows for table_name on each AMP are read to collect their statistics.

This option is not recognized for single‑column statistics requests for the PARTITION column of a row-partitioned table. The system automatically increases the percentage to 100.

Note that the restriction for COLLECT STATISTICS (Optimizer Form) on using sampling with columns of a row-partitioned table that are also members of the partitioning expression for that table do not apply to COLLECT STATISTICS (QCD Form).

PERCENT is an optional keyword to indicate that the value specified for percentage is a percentage.

INTO QCD_name

the name of the QCD database into which the collected sampled statistics are to be written.

The sampled statistics are written to the TableStatistics table of the specified QCD database.

SET QUERY query_ID

a value inserted into QCD.TableStatistics.QueryId.

Each unique composite of query_ID and statistics_ID enables you to store a separate set of statistics in QCD.TableStatistics for a given column or index.

The default value is 0.

SAMPLEID statistics_ID

a value to be inserted into QCD.TableStatistics.StatisticsId.

Each unique composite of query_ID and statistics_ID enables you to store a separate set of statistics in QCD.TableStatistics for a given column or index.

The default value is 1.

UPDATE MODIFIED

that you want to store modified statistics stored in QCD.TableStatistics.ModifiedStats.

If you do not specify the UPDATE MODIFIED option, the system stores unmodified statistics in QCD.TableStatistics.StatisticsInfo.

database_name

user_name

the name of the containing database or user for table_name if different from the current database or user.

table_name

the name of the table for which the specified column and index sampled statistics and sample size are to be collected.

You cannot collect QCD statistics on volatile, journal, global temporary, or global temporary trace tables.

COLUMN column_name

the name of a column set on which sampled statistics are to be collected.

You can collect statistics on both a column set and on the system‑derived PARTITION column in the same COLLECT STATISTICS request.

You cannot collect statistics on UDT or LOB columns.

COLUMN PARTITION

that statistics are to be collected on the system‑derived PARTITION column for a table. The value collected is the system‑derived partition number, which ranges from 1 through 65,535 inclusive, or is zero for non-partitioned.

You can collect statistics on both a column set and on the system‑derived PARTITION column in the same COLLECT STATISTICS request.

Even though table_name need not identify a partitioned table, and the system does not return an error if you request PARTITION column statistics on a nonpartitioned table, the operation serves no purpose.

You cannot collect PARTITION statistics on the following PPI tables:

  • Join indexes
  • Global temporary tables
  • Volatile tables
  • INDEX index_name

    the name of the index on which sampled statistics are to be collected.

    INDEX column_name

    the names of the columns in the column set on which sampled statistics are to be collected.

    ANSI Compliance

    COLLECT STATISTICS (QCD Form) is a Teradata extension to the ANSI SQL:2011 standard.

    Required Privileges

    You must have the following privileges to perform COLLECT STATISTICS (QCD Form):

  • INDEX or DROP TABLE on table_name or its containing database.
  • INSERT and UPDATE on the TableStatistics table or its containing QCD database.
  • Invocation

    Normally invoked by client‑based Teradata Database query analysis tools.

    Where Column and Index Statistics Are Stored

    Table statistics collected by the QCD form of COLLECT STATISTICS are stored in the QCD table named TableStatistics. See SQL Request and Transaction Processing for details about the TableStatistics table.

    Rules and Guidelines for COLLECT STATISTICS (QCD Form)

    Unless otherwise noted, the rules and guidelines for collecting QCD statistics are the same as those for Optimizer statistics. See “Rules and Guidelines for COLLECT STATISTICS (Optimizer Form)” in SQL Data Definition Language Detailed Topics.

    Collecting QCD Statistics on Multiple Columns

    If you have defined a column set as a composite index, it makes no difference whether you collect statistics on the composite using the COLUMN keyword or the INDEX keyword. The result is identical.

    For example, suppose you have created an index on the column set (x1,y1) of table t1. The following COLLECT STATISTICS requests gather the identical statistics for the named QCD into the identical QCD table columns:

         COLLECT STATISTICS FOR SAMPLE 30 PERCENT ON t1 
           INTO myqcd COLUMN (x1,y1);
     
         COLLECT STATISTICS FOR SAMPLE 30 PERCENT ON t1 
           INTO myqcd INDEX (x1,y1);

    Collecting QCD Statistics on the PARTITION Column of a PPI Table

    For information about why you should collect PARTITION statistics for the Optimizer, see “Collecting Statistics on the System-Derived PARTITION Column and the Partitioning Column Set of a PPI Table” in SQL Data Definition Language Detailed Topics.

    Because PARTITION is not a reserved keyword, you can use it as the name for any column in a table definition, but you should not. This practice is strongly discouraged in all cases, and particularly for PPI tables, because if you name a non‑PARTITION column partition, the system resolves it as a regular column. As a result, you cannot collect statistics on the system‑derived PARTITION column of any PPI table that also has a user‑defined column named partition.

    If a table is partitioned by a single‑column expression, its column statistics are inherited as PARTITION statistics. In this special case, you need not collect single-column PARTITION statistics.

    For example, assume the following table definition:

         CREATE TABLE table_1 (
           col_1 INTEGER, 
           col_2 INTEGER) 
         PRIMARY INDEX(col_1) PARTITION BY col_2; 

    If you collect individual column statistics on col_2, which is the partitioning column, then those statistics are also inherited as PARTITION statistics.

    You can also collect PARTITION statistics on NPPI tables, which quickly provides up to date statistics on the number of rows in the table. This is because the partition number for every row in an NPPI table is 0.

    You can specify a USING SAMPLE clause to collect single‑column PARTITION statistics, but the specification is ignored. The system automatically resets the sampling percentage to 100. The sampling flag in a detailed statistics report (see “COLLECT STATISTICS (QCD Form)” on page 557) is always reported as 0 to document this behavior. Note that all valid sampling percentages specified for a USING SAMPLE clause are honored for multicolumn PARTITION statistics.

    There is a limit of 32 sets of multicolumn statistics that can be collected on a given table. Because single‑column PARTITION statistics use index ids in the same range as multicolumn statistics (between 129 and 160, inclusive), collecting PARTITION statistics effectively reduces the limit on the number of multicolumn statistics that can be collected to 31.

    The system ignores user‑specified column ordering for multicolumn PARTITION statistics. This is consistent with non‑PARTITION multicolumn statistics and multicolumn indexes. The columns are ordered based on their internal field id. Because the system‑derived PARTITION column has field id value of 0, it always takes the first position in multicolumn statistics.

    COLLECT STATISTICS (QCD Form) and UDTs

    You cannot collect statistics on UDT columns.

    COLLECT STATISTICS (QCD Form) And Large Objects

    You cannot collect statistics on BLOB or CLOB columns.

    COLLECT STATISTICS (QCD Form) Does Not Collect Statistics For The Optimizer

    Unlike the standard COLLECT STATISTICS statement, COLLECT STATISTICS (QCD Form) does not write the statistics it collects into the data dictionary for use by the Optimizer. The COLLECT STATISTICS (QCD Form) writes its statistics into the TableStatistics table of the specified QCD database, where the information is used by various database query analysis tools to perform index analysis and validation.

    Because COLLECT STATISTICS (QCD Form) does not write its statistics to the data dictionary, it does not place locks on dictionary tables. Through appropriate control of the sample size, you can collect sample statistics for index analysis during production hours without a noticeable negative effect on system performance.

    Implicit Recollection of QCD Statistics Is Not Supported

    Unlike COLLECT STATISTICS (Optimizer Form), you cannot recollect statistics implicitly on indexes and columns for which statistics have been collected in the past. You must specify an explicit index or column name each time you collect statistics using COLLECT STATISTICS (QCD Form).

    COLLECT STATISTICS (QCD Form) Is Not Treated As DDL By The Transaction Manager

    You can place a COLLECT STATISTICS (QCD Form) request anywhere within a transaction because it is not treated as a DDL statement by the Transaction Manager.

    COLLECT STATISTICS (QCD Form) Collects New Statistics Even If Statistics Exist In the Data Dictionary or QCD

    COLLECT STATISTICS (QCD Form) always collects its own fresh statistics on candidate index columns, even if a statistical profile of the specified columns or index columns already exists in the data dictionary or QCD for table_name, to ensure that index analyses are always performed using current statistics.

    Fresh statistics are also collected whenever you perform an INSERT EXPLAIN request and specify the WITH STATISTICS clause. See “INSERT EXPLAIN” on page 596 for more information.

    Quality of Statistics As a Function of Sample Size

    The larger the sample size used to collect statistics, the more likely the sampled statistics are an accurate representation of population statistics. There is a trade-off for this accuracy: the larger the sample size, the longer it takes to collect the statistics.

    You cannot use sampling to collect COLUMN statistics for the partitioning columns of PPI tables and you should not use sampling to collect INDEX statistics for those columns. The collect statistics on the system­‑derived PARTITION or PARTITION#Ln columns.

    Example  

    The following example collects statistics on a single‑column NUSI named orderDateNUPI on the order_date column from a random sample of 10 percent of the rows in the orders table and writes them to the TableStatistics table of the QCD database named MyQCD.

         COLLECT STATISTICS FOR SAMPLE 10 PERCENT 
         ON orders INDEX orderDateNUPI
         INTO MyQCD; 

    Example  

    The following example collects sampled statistics on the same index as “Example 1” using different syntax:

         COLLECT STATISTICS FOR SAMPLE 10 PERCENT 
         ON orders INDEX (order_date)
         INTO MyQCD; 

    Example : Collecting Single‑Column PARTITION Statistics

    The following example collects statistics on the system‑derived PARTITION column for the PPI orders table and writes them to a user‑defined QCD called myqcd.

    Even though you have specified a sampling percentage of 30 percent, the system ignores it and uses a value of 100 percent (see “Collecting QCD Statistics on the PARTITION Column of a PPI Table” on page 560).

         COLLECT STATISTICS FOR SAMPLE 30 PERCENT 
         ON orders COLUMN PARTITION
         INTO myqcd;

    Example : Collecting Multicolumn PARTITION Statistics

    The following example collects multicolumn sampled statistics on the following column set for the PPI orders table and writes them to a user‑defined QCD called myqcd:

  • System‑derived PARTITION column
  • quant_ord
  • quant_shpd
  • Because you are requesting multicolumn statistics, the system honors the specified sampling percentage of 20 percent, unlike the case for single‑column PARTITION statistics (see “Collecting QCD Statistics on the PARTITION Column of a PPI Table” on page 560 and “Example 3: Collecting Single‑Column PARTITION Statistics” on page 562).

         COLLECT STATISTICS FOR SAMPLE 20 PERCENT 
         ON orders COLUMN (quant_ord, PARTITION, quant_shpd)
         INTO myqcd; 

    For More Information

    For more information about index analysis, see:

  • “COLLECT DEMOGRAPHICS” on page 554
  • “INITIATE INDEX ANALYSIS” on page 579
  • “INSERT EXPLAIN” on page 596
  • “RESTART INDEX ANALYSIS” on page 609
  • SQL Request and Transaction Processing.
  • “DROP STATISTICS (QCD Form)” on page 564.
  • “HELP STATISTICS (Optimizer Form)” in SQL Data Definition Language Detailed Topics.
  • For more information about client‑based query analysis tools, see:

  • Teradata Index Wizard User Guide
  • Teradata Viewpoint User Guide, chapter on the Stats Manager