COLLECT STATISTICS Syntax Elements (Optimizer Form) - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
SUMMARY
Update only object-level statistical information such as cardinality, one-AMP and all-AMPs sampling estimates, average row size, average block size, and so on.
Calculates an estimated compression ratio for the primary subtable of tables compressed manually with block-level compression. The compression ratio is also calculated implicitly when statistics are collected on a column or index. The compression information is collected for informational purposes only and is not used by the Optimizer in determining the execution plan.
If you specify SUMMARY, you cannot specify USING options, column references, or explicit COLUMN or INDEX references.
You cannot specify SUMMARY for volatile tables.

using_option


You cannot specify USING options with the SUMMARY option.

SAMPLE
Scan a system-determined percentage of table rows to collect the specified statistics. SAMPLE has the same meaning as SYSTEM SAMPLE and is only provided for backward compatibility to enable existing COLLECT STATISTICS scripts that specify the USING SAMPLE option to continue to run. Use the SYSTEM SAMPLE option instead of SAMPLE.
Geospatial NUSIs are sampled at 100%.
SYSTEM SAMPLE
Scan a system-determined percentage of table rows to collect statistics.
The database may collect a sample of 100% multiple times before downgrading the sample percentage to a lower value.
SYSTEM SAMPLE is the default if you do not specify a sample option.
You can only specify this option if you also specify an explicit column or index. You cannot specify SYSTEM SAMPLE for a standard recollection of statistics on an implicitly specified column or index set.
Geospatial NUSIs are sampled at 100%.
SAMPLE n PERCENT
Scans the percentage of table rows that you specify rather than scanning all of the rows in the table to collect statistics, where n is a decimal or an integer from 2 through 100.
Specifying SAMPLE 100 PERCENT is equivalent to collecting full statistics.
For recollection of statistics, SAMPLE n PERCENT overrides any previous SAMPLE option specifications and instead scans n percent of the rows in the table.
Geospatial NUSIs are sampled at 100%.
NO SAMPLE
Use a full-table scan to collect the specified statistics.
You can only specify this option if you also specify an explicit index or column set.
Geospatial NUSIs are sampled at 100%.
SYSTEM THRESHOLD
Do not collect statistics if the percentage of changed data and the age of the current statistics are below the values determined by the system.
This option is valid only for tables.
You can only specify this option if you also specify an explicit index or column set.
You cannot use this option to collect statistics on geospatial NUSIs.
SYSTEM THRESHOLD PERCENT
Change the percentages. This is the default if you do not specify a PERCENT threshold option.
This option is valid only for tables.
You can only specify this option if you also specify an explicit index or column set.
You cannot use this option to collect statistics on geospatial NUSIs.
SYSTEM THRESHOLD DAYS
The age in days of the current statistics. SYSTEM THRESHOLD DAYS is the default if you do not specify a DAYS threshold option.
This option is only valid for tables.
The system does not enforce day thresholds, so recollection of statistics is determined by the PERCENT threshold option.
You can only specify this option if you also specify an explicit index or column set.
You cannot use this option to collect statistics on geospatial NUSIs.
THRESHOLD n PERCENT
Recollect statistics if the percentage of change in the statistics exceeds the specified percentage, where n is a decimal or an integer.
Statistics are not recollected if:
  • The age of the statistics and the percentage of change do not exceed the values specified for THRESHOLD n DAYS and THRESHOLD n PERCENT.
  • The percentage of change does not exceed the number specified in THRESHOLD n PERCENT and NO THRESHOLD DAYS is specified.
You cannot use this option to collect statistics on geospatial NUSIs.
THRESHOLD n DAYS
Recollect statistics if the age of the statistic is greater than or equal to the number of days specified, where n is an integer from 1 through 9999.
This option is only valid for tables.
You can only specify this option if you also specify an explicit index or column set.
Statistics are not recollected if:
  • The age of the statistics and the percentage of change do not exceed the values specified for THRESHOLD n DAYS and THRESHOLD n PERCENT.
  • The age of statistics does not exceed the specified number of days specified in THRESHOLD n DAYS and NO THRESHOLD PERCENT is specified.
You cannot use this option to collect statistics on geospatial NUSIs.
NO THRESHOLD
One of the following:
  • Do not apply any thresholds to the collection of statistics.
  • Remove the existing threshold before collecting the statistics.
You can only specify this option if you also specify an explicit index or column set.
You cannot use this option to collect statistics on geospatial NUSIs.
NO THRESHOLD PERCENT
One of the following:
  • Do not apply a PERCENT change threshold to the collection of statistics.
  • Remove the existing PERCENT change threshold before collecting the statistics.
You can only specify this option if you also specify an explicit index or column set.
The effect of this option varies, depending on whether the request is for a first collection of statistics or for a recollection:
  • For the first collection of statistics, NO THRESHOLD PERCENT overrides the system default setting.
  • For recollections of statistics, NO THRESHOLD PERCENT overrides any previous change threshold percent specification.
Statistics are not recollected if the age of the statistics does not exceed the specified number of days specified in THRESHOLD n DAYS and NO THRESHOLD PERCENT is specified.
You cannot use this option to collect statistics on geospatial NUSIs.
NO THRESHOLD DAYS
One of the following:
  • Do not apply a DAYS threshold to the collection of statistics.
  • Remove the existing DAYS threshold before collecting the statistics.
You can only specify this option if you also specify an explicit index or column set.
For recollection of statistics, NO THRESHOLD DAYS overrides any previous THRESHOLD n DAYS specification.
Statistics are not recollected if the percentage of change does not exceed the number specified in THRESHOLD n PERCENT and NO THRESHOLD DAYS is specified.
You cannot use this option to collect statistics on geospatial NUSIs.
SYSTEM MAXINTERVALS
The system-determined maximum number of intervals is used for this histogram.
You can only specify this option if you also specify an explicit index or column set.
This option is only valid for tables.
You cannot use this option to collect statistics on a geospatial NUSI.
MAXINTERVALS n
Specifies the maximum number of histogram intervals to be used for the collected statistics, where n an integer from 0 through 500. The database may adjust the specified maximum number of intervals depending on the maximum histogram size.
This option is valid only for tables.
You can only specify this option if you also specify an explicit index or column set.
If the optimizer uses summary statistics most of the time and rarely uses detailed statistics, consider lowering the MAXINTERVALS value you specify to 0 to make the Optimizer aware of its statistics usage. However, make sure that this changed does not degrade the performance of queries that use detailed statistics.
You cannot use this option to collect statistics on a geospatial NUSI.
SYSTEM MAXVALUELENGTH
The system-determined maximum column width for histogram values such as MinValue, ModeValue, MaxValue, and so on.
This option is valid only for tables.
You can only specify this option if you also specify an explicit index or column set.
You cannot use this option to collect statistics on a geospatial NUSI.
MAXVALUELENGTH n
Specifies the maximum size for histogram values such as MinValue, ModeValue, MaxValue, and so on, where n is an integer.
For single-character statistics on CHARACTER and VARCHAR columns, n specifies the number of characters. For all other options, n specifies number of bytes.
You can only specify this option if you also specify an explicit index or column set.
You cannot use this option to collect statistics on a geospatial NUSI.
This option is valid only for tables.
FOR CURRENT
The USING options specified for the current statistics recollection request are to be used only for the current COLLECT STATISTICS request and that the previously specified options for the request are to be used for any future recollections of statistics, unless otherwise overridden.

index_specification

Index for which statistics are to be collected.

UNIQUE
The index for which statistics are to be collected is a unique index. This option is not supported for geospatial indexes, because a geospatial index must be a NUSI.
index_name
The name of the named index for which statistics are to be collected. Specify column names with a named index only when the index is defined with an ORDER BY clause.
ALL
The ALL option was used to create the index. You cannot use this option to collect statistics on a geospatial NUSI.
column_name_1
The names of one or more columns on which the index exists and for which statistics are to be collected.
You cannot collect statistics on columns with a data type of Period, JSON, XML, BLOB, CLOB.
However, you can collect statistics on the BEGIN and END expressions for a Period type column.
A geospatial NUSI column cannot be specified with other columns.
Statistics can be collected on extracted portions of the JSON type. See Collecting Statistics on JSON Data.
ORDER BY
An ORDER BY clause was used to create the index.
ORDER BY orders the index by the contents of a single column.
You cannot use this option to collect statistics on a geospatial NUSI.
VALUES
The ORDER BY VALUES option was used to create the index.
VALUES can order a single numeric column of four bytes or less. VALUES is the default.
You cannot use this option to collect statistics on a geospatial NUSI.
HASH
The ORDER BY HASH option was used to create the index.
HASH hash-orders a single column instead of hash-ordering all columns (the default).
You cannot use this option to collect statistics on a geospatial NUSI.
column_name_2
The name of the column on which the index is ordered.
You cannot collect statistics on columns with a data type of Period, JSON, XML, BLOB, CLOB.
However, you can collect statistics on the BEGIN and END expressions for a Period type column.
Statistics can be collected on extracted portions of the JSON type. See Collecting Statistics on JSON Data.

column_specification

Statistics are to be collected for a column set.

expression
An expression or a list of expressions for which statistics are to be collected. The result of the expression must be a data type on which field statistics can be collected directly, such as an integer, character, or date. Expressions can reference scalar UDFs, and the BEGIN or END expressions of a Period type column. See SQL Expressions.
You cannot specify nondeterministic expressions such as RANDOM, nondeterministic UDFs, and nondeterministic CAST or EXTRACT operations.
column_name
The names of the columns for which statistics are to be collected.
The maximum number of columns on which joint statistics can be collected for a single COLLECT STATISTICS request is 64.
You cannot collect statistics on columns with a data type of Period, JSON, XML, BLOB, CLOB.
You can collect statistics on the BEGIN and END expressions for a Period type column.
A geospatial NUSI column cannot be specified with other columns.
Statistics can be collected on extracted portions of the JSON type. See Collecting Statistics on JSON Data.
PARTITION
Statistics are to be collected on the system-derived PARTITION column set for a table.
Collect PARTITION statistics on tables that are partitioned by row or column, because the Optimizer uses those statistics to do the costing and cardinality estimation based on the partition elimination. For all partitioned tables, include the system-derived PARTITION column set in any refreshment operation.
You cannot use this option to collect statistics on a geospatial NUSI.
You cannot collect statistics on the system-derived PARTITION#Ln columns.
statistics_name
The statistics collected are to be saved under statistics_name.
A statistics_name is required if statistics are for other than column references. When recollecting statistics, if the column ordering is different or if the expressions do not find an exact match with existing statistics, the Optimizer collects the statistics as new. Naming the statistics and using the names during recollections makes sure that the existing statistics are recollected instead of creating a new set of statistics.
You cannot specify this option for COLUMN PARTITION or COLUMN (PARTITION).
You can use the statistics name for recollections, copies, transfers, HELP STATISTICS, SHOW STATISTICS, and DROP STATISTICS.

collection_source

Following are the ON clause options.

TEMPORARY
Statistics are to be collected for a materialized instance of a global temporary table in the current session. A request to collect statistics on a global temporary table materializes the global temporary table, if the table is not already materialized in the session.
You cannot use this option to collect statistics on a geospatial NUSI. This keyword is not valid for permanent tables.
You cannot collect system-derived PARTITION or PARTITION#L n statistics for global temporary tables.
database_name
The containing database for the table_name or join_index_name, if other than the default database.
user_name
The containing user for the table_name or join_index_name, if other than the default database.
table_name_1
The name of the table or global temporary table for which column or index statistics are to be collected. You cannot collect statistics on journal tables.
join_index_name
The name of the join index for which statistics are to be collected.

from_option

You can specify the following options.

TEMPORARY
The name of the global temporary source table from which statistics are to be copied to the target table.
database_name
The containing database for the table_name_2 or join_index_name, if other than the default database.
user_name
The containing user for the table_name_2 or join_index_name, if other than the default database.
table_name_2
The name of the base source table from which statistics are to be copied to the target table.
join_index_name
The name of the join index for which statistics are to be collected.
column_name_3
The column name set, partition, or index for which statistics are to be copied from the source table to the target table.
PARTITION
The column name set, partition, or index for which statistics are to be copied from the source table to the target table.