Usage Notes - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
16.10
Published
June 2017
Language
English (United States)
Last Update
2018-04-25
dita:mapPath
psg1480972718197.ditamap
dita:ditavalPath
changebar_rev_16_10_exclude_audience_ie.ditaval
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Invocation

Normally invoked by client-based Teradata Database query analysis tools.

Where Column and Index Statistics Are Stored

Table statistics collected by the QCD form of COLLECT STATISTICS are stored in the QCD table named TableStatistics. See SQL Request and Transaction Processing .

Rules and Guidelines for COLLECT STATISTICS (QCD Form)

Unless otherwise noted, the rules and guidelines for collecting QCD statistics are the same as those for Optimizer statistics. See “Rules and Guidelines for COLLECT STATISTICS (Optimizer Form)” in SQL Data Definition Language Detailed Topics .

Collecting QCD Statistics on Multiple Columns

If you have defined a column set as a composite index, it makes no difference whether you collect statistics on the composite using the COLUMN keyword or the INDEX keyword. The result is identical.

For example, suppose you have created an index on the column set (x1,y 1) of table t1. The following COLLECT STATISTICS requests gather the identical statistics for the named QCD into the identical QCD table columns:

     COLLECT STATISTICS FOR SAMPLE 30 PERCENT ON t1
       INTO myqcd COLUMN (x1,y1);

     COLLECT STATISTICS FOR SAMPLE 30 PERCENT ON t1
       INTO myqcd INDEX (x1,y1);

Collecting QCD Statistics on the PARTITION Column of a Table

For information about why you should collect PARTITION statistics for the Optimizer, see “Collecting Statistics on the System-Derived PARTITION Column and the Row Partitioning Column Set of a Partitioned Table” in SQL Data Definition Language Detailed Topics .

Because PARTITION is not a reserved keyword, you can use it as the name for any column in a table definition, but you should not. This practice is strongly discouraged in all cases, and particularly for partitioned tables, because if you name a non-PARTITION column partition, the system resolves it as a regular column. As a result, you cannot collect statistics on the system-derived PARTITION column of any table that also has a user-defined column named partition.

If a table is partitioned by a single-column expression, its column statistics are inherited as PARTITION statistics. In this special case, you need not collect single-column PARTITION statistics.

For example, assume the following table definition:

     CREATE TABLE table_1 (
       col_1 INTEGER,
       col_2 INTEGER)
     PRIMARY INDEX(col_1) PARTITION BY col_2;

If you collect individual column statistics on col_2, which is the partitioning column, then those statistics are also inherited as PARTITION statistics.

You can also collect PARTITION statistics on NPPI tables, which quickly provides up to date statistics on the number of rows in the table. This is because the partition number for every row in an NPPI table is 0.

You can specify a USING SAMPLE clause to collect single-column PARTITION statistics, but the specification is ignored. The system automatically resets the sampling percentage to 100. The sampling field in a detailed statistics report is always reported as 0 to document this behavior. Note that all valid sampling percentages specified for a USING SAMPLE clause are honored for multicolumn PARTITION statistics.

There is a limit of 32 sets of multicolumn statistics that can be collected on a given table. Because single-column PARTITION statistics use index ids in the same range as multicolumn statistics (between 129 and 160, inclusive), collecting PARTITION statistics effectively reduces the limit on the number of multicolumn statistics that can be collected to 31.

The system ignores user-specified column ordering for multicolumn PARTITION statistics. This is consistent with non-PARTITION multicolumn statistics and multicolumn indexes. The columns are ordered based on their internal field id. Because the system-derived PARTITION column has field id value of 0, it always takes the first position in multicolumn statistics.

COLLECT STATISTICS (QCD Form) and UDTs

You cannot collect statistics on UDT columns.

COLLECT STATISTICS (QCD Form) And Large Objects

You cannot collect statistics on BLOB or CLOB columns.

COLLECT STATISTICS (QCD Form) Does Not Collect Statistics For The Optimizer

Unlike the standard COLLECT STATISTICS statement, COLLECT STATISTICS (QCD Form) does not write the statistics it collects into the data dictionary for use by the Optimizer. The COLLECT STATISTICS (QCD Form) writes its statistics into the TableStatistics table of the specified QCD database, where the information is used by various database query analysis tools to perform index analysis and validation.

Because COLLECT STATISTICS (QCD Form) does not write its statistics to the data dictionary, it does not place locks on dictionary tables. Through appropriate control of the sample size, you can collect sample statistics for index analysis during production hours without a noticeable negative effect on system performance.

Implicit Recollection of QCD Statistics Is Not Supported

Unlike COLLECT STATISTICS (Optimizer Form), you cannot recollect statistics implicitly on indexes and columns for which statistics have been collected in the past. You must specify an explicit index or column name each time you collect statistics using COLLECT STATISTICS (QCD Form).

COLLECT STATISTICS (QCD Form) Is Not Treated As DDL By The Transaction Manager

You can place a COLLECT STATISTICS (QCD Form) request anywhere within a transaction because it is not treated as a DDL statement by the Transaction Manager.

COLLECT STATISTICS (QCD Form) Collects New Statistics Even If Statistics Exist In the Data Dictionary or QCD

COLLECT STATISTICS (QCD Form) always collects its own fresh statistics on candidate index columns, even if a statistical profile of the specified columns or index columns already exists in the data dictionary or QCD for table_name, to ensure that index analyses are always performed using current statistics.

Fresh statistics are also collected whenever you perform an INSERT EXPLAIN request and specify the WITH STATISTICS clause. See INSERT EXPLAIN for more information.

Quality of Statistics As a Function of Sample Size

The larger the sample size used to collect statistics, the more likely the sampled statistics are an accurate representation of population statistics. There is a trade-off for this accuracy: the larger the sample size, the longer it takes to collect the statistics.

You cannot use sampling to collect COLUMN statistics for the partitioning columns of row-partitioned tables and you should not use sampling to collect INDEX statistics for those columns. The collect statistics on the system-derived PARTITION or PARTITION#Ln columns.