Using COLLECT STATISTICS | VantageCloud Lake - General Usage Guidelines for COLLECT STATISTICS (Optimizer Form) - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Why You Need to Collect Statistics

The purpose of collecting statistics is to compute statistical data that the Optimizer uses to optimize table access and join plans. The computed statistical data is retained in a synopsis data structure that is stored in the data dictionary for use during the optimizing phase of SQL request parsing.

Accurate demographics enable the Optimizer to determine the least costly access and join plans for a query. Accurate table demographic information is especially important when data to be accessed using a column or index may be skewed.

Naming Collected Statistics

When you collect statistics on an index or column set, you can optionally specify a name for them. A name is required if statistics are for other than column references.

You can later use the name for recollections, copies, transfers, help information, show information, and for dropping the collected statistics. The rules for naming statistics are the same as the rules for naming database objects, which are documented in System Validated Object Names.

When you recollect statistics, if the column ordering is different, the Optimizer treats the statistics as new. By naming the statistics you collect and then using the names when you recollect statistics guarantees that the existing statistics are recollected instead of creating a new set of statistics.

Duplicate names are not allowed for table or constant expression statistics. You can only use COLUMN specification to recollect statistics using statistics names.

PARTITION Statistics and ALTER TABLE TO CURRENT Requests

Refresh PARTITION statistics when an ALTER TABLE TO CURRENT request for a table or join index completes because the existing PARTITION statistics are longer be valid after the reconciliation process.

Statistics on the partitioning column itself are not stale and are still valid if you do not specify a null partition handler, but refresh the statistics on a partitioning column if you specify a null partition handler.

Location of Stored Statistics

Table statistics collected by the Optimizer form of COLLECT STATISTICS are stored in the form of interval histograms in DBC.StatsTbl. See How the AMP Software Collects Statistics.

Collecting and Recollecting Statistics

Because Vantage does a full-table scan when collecting full statistics on a table, or an index scan when collecting statistics on an index (except when collecting only PARTITION statistics), the process can take a long time. See Reducing the Cost of Collecting Statistics by Sampling. The exception is collecting statistics on the system-derived column PARTITION columns, a fast operation. The scan duration depends on table size, number of partitions, system configuration, and system workload.

Additional resources are consumed by the creation of interval histograms and the computation of the statistical measures used to summarize the characteristics of a column set or index. See Interval Histograms.

Columns that are not indexed or that are unique indexes take more time to process than NUSI columns.

Because of the time needed to collect and recollect statistics, consider specifying one or more of the USING options, particularly those that establish recollection thresholds, to establish rules for when Vantage recollects statistics. For more information about these options, see Comparison of Full and Sampled Statistics and Using the THRESHOLD Options to Collect and Recollect Statistics

Using the THRESHOLD Options to Collect and Recollect Statistics.

Collect statistics on newly created, empty tables. This defines the columns, indexes, and partitions for a partitioned table and the synoptic data structures for subsequent collection of statistics and demographics. For a definition of synoptic data structure, see Optimal Times to Collect or Recollect Statistics.

Also recollect statistics on newly created tables after population.

When you perform a HELP STATISTICS request on an empty table, the Unique Value column returns a value of 0.

FOR CURRENT Option

If you specify the FOR CURRENT option in a USING clause, Vantage applies that USING clause only for the current collection of statistics.

If you do not specify FOR CURRENT, then Vantage applies the USING clause to the current statistics collection and all subsequent collections of statistics on the specified column set.

Locks and Concurrency

When you perform a COLLECT STATISTICS request, the system places an ACCESS lock on the table from which the demographic data is being collected. The database places a rowhash-level WRITE lock on DBC.StatsTbl after the statistics have been collected and holds the lock only long enough to update the dictionary rows. You can easily see this behavior documented in the boldface text in stage 4 of the following EXPLAIN of a COLLECT STATISTICS request on COLUMN x1.

EXPLAIN COLLECT STATISTICS ON t1 COLUMN x1;
*** Help information returned. 18 rows.
*** Total elapsed time was 1 second.

Explanation
----------------------------------------------------------------
1) First, we lock DF2.t1 for access.
2) Next, we do a COLLECT STATISTICS step from DF2.t1 by way
   of an all-rows scan into Spool 3 (all_amps), which is built
   locally on the AMPs.
3) Then we save the UPDATED STATISTICS from Spool 3 (Last Use) into
   Spool 4, which is built locally on the AMP derived from 
    DBC.StatsTbl by way of the primary index.
 4) We lock DBC.StatsTbl for write on a RowHash. 
5) We do a single-AMP MERGE DELETE to DBC.StatsTbl from Spool 4 
   (Last Use) by way of a RowHash match scan.  New updated rows are
   built and the result goes into Spool 5 (one-amp), which is built
   locally on the AMPs.
6) We do a single-AMP MERGE into DBC.StatsTbl from Spool 5 (Last Use).
7) We spoil the parser's dictionary cache for the table.
8) Finally, we send out an END TRANSACTION step to all AMPs involved 
   in processing the request.
-> No rows are returned to the user as the result of statement 1.

These locks prevent parsing of new requests against the data table for which the statistics are being collected until the operation completes.

COLLECT STATISTICS requests can run concurrently with DML requests, other COLLECT STATISTICS requests, DROP STATISTICS requests, and HELP STATISTICS requests against the same table.

Do not collect statistics for the first time concurrently on the same table. However, you can collect or recollect statistics from different tables and run other statistics-related requests such as DROP STATISTICS, HELP STATISTICS, and SHOW STATISTICS concurrently.

COLLECT STATISTICS requests can also run concurrently with a CREATE INDEX request against the same table if the requests do not specify the same index.

The following are exceptions to this rule. The exceptions assume you do not specify a LOCKING modifier on the table with the COLLECT STATISTICS request (or if you do specify a LOCKING modifier, the modifier requests only ACCESS-level locking).
  • If a transaction containing a DML request places an EXCLUSIVE lock on a table, the DML request blocks COLLECT STATISTICS requests, or a COLLECT STATISTICS request blocks the other DML request, depending on which request places its table lock first. COLLECT STATISTICS uses an ACCESS lock on the table by default. Do not specify an EXCLUSIVE lock on a table with a DML request.
  • A COLLECT STATISTICS request is blocked from getting its row-hash WRITE locks on DBC.StatsTbl. These locks are requested near the completion of the COLLECT STATISTICS operation and are held only long enough to update the dictionary rows with the newly collected statistics.

    This can cause other requests to be blocked if the dictionary information for the table is not already cached. After getting the WRITE lock, the COLLECT STATISTICS request finishes quickly and parsing of the blocked requests resumes.