Automating Statistics Collection and Monitoring | Teradata Vantage - Automating Statistics Collection and Monitoring - Advanced SQL Engine - Teradata Database

Application Programming Reference

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
cpn1571792172880.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1090
lifecycle
previous
Product Category
Teradata Vantage™

You can manage statistics collection manually using Automated Statistics Management open APIs or automatically using the Teradata Viewpoint Stats Manager portlet. For more information about this portlet, see the Teradata Viewpoint documentation.

To identify any already existing statistics that you want to incorporate into automated procedure:

  1. Call an Automate-related open API.

    This API copies the qualifying statistics definitions from the DBC dictionary to the TDSTATS database.

    The TDSTATS database only stores metadata required to conduct Automated Statistic Management specific tasks on the statistics. You can use Teradata Studio or Teradata Studio Express to view details about the TDSTATS database, including tables, table columns, and views.

    For a given database or table, the initial Automate operation is performed once.

    If you subsequently collect additional statistics on these same tables or create new tables within a previously automated database, any resulting new dictionary stored statistics can be copied by calling the Automate operation again.

    Teradata recommends that you periodically call the Automate operation on the same objects to ensure all statistics on those objects are automated. You can repeat the same Automate-related call as many times as needed.

  2. After a given set of statistics are automated, call the following open APIs:
    1. PrepCollect to generate a prioritized list of collections based on the automated statistics definitions stored in the TDSTATS database. For more information, see Preparing and Collecting Statistics or PrepCollect.
    2. RunCollect or ReCollectTable to perform a recollection on the set of statistics. For more information, see Preparing and Collecting Statistics.
    3. DeAutomateStats to reverse the operation or the SelectAutomatedStats open API to list the statistics that are currently automated. For more information, see DeAutomateStats or SelectAutomatedStats.

Orphaned Statistics

If you issue an SQL DROP STATISTICS, DROP TABLE, or DROP DATABASE statement on a statistic that was previously automated, the resulting entry in the TDSTATS database is orphaned (that is, the PrepCollect API will not prepare a collection on that particular statistic again). For more information about this open API, see PrepCollect.

To remove an orphaned statistic from the TDSTATS database, you can call the ResyncStats open API. For more information about this open API, see ResyncStats.

Statistics from Dropped and Recreated Objects

A table is said to be Reincarnated after being temporarily dropped and subsequently recreated with the same column definitions, but potentially different data. Reincarnated tables are common to the ETL process where a "query" table is briefly dropped and then immediately recreated by renaming a "shadow" table that contains data from the original query table plus some recently loaded data. All reincarnated tables share the same underlying condition of having a newly assigned internal table id stored in their definition in system table DBC.TVM. Rather than leaving them orphaned and eventually removed, users have the option of repairing the TDStats stored definitions by marking them for preservation. For more information, see Open API PreserveAfterRecreate.

Example of Preserving TDStats Data During Table Recreation

A typical ETL flow involving a previously automated query table that is reincarnated by renaming a shadow table requires the following SQL statements and API calls.

Initial setup steps (performed once):

  1. CREATE TABLE QueryTable (….);
  2. Load or INSERT data into QueryTable
  3. COLLECT STATISTICS …. ON QueryTable; /* user defined stats on columns and indexes */
  4. CALL AutomateStats('Db1','QueryTable',…);
  5. CALL PreserveAfterRecreate('Db1','QueryTable',NULL,:NumStatsToPreserve);
    Step number 5 is required to ensure that TDStats metadata for QueryTable will survive the table recreation.

Table recreation steps (periodically repeated):

  1. CREATE TABLE ShadowTable AS Db1.QueryTable WITH DATA AND STATS;
  2. Load additional data into ShadowTable using bulk load methods
  3. COLLECT SUMMARY STATISTICS ON ShadowTable; /* refresh summary-only stats */
  4. DROP TABLE QueryTable;
  5. RENAME TABLE ShadowTable AS QueryTable;
  6. CALL ResyncStats('Db1','QueryTable',:ResyncId,:NumRepaired);
    Step number 6 is required to ensure that TDStats metadata for QueryTable will survive the table recreation.

Excluded Statistics

Automate-related open APIs automatically exclude statistics (such as Geospatial statistics) that do not support the SQL COLLECT STATISTICS statement with THRESHOLD option.

If a statistic is excluded, you will receive a warning message from the AutomateReport open API highlighting the statistic (for details, see AutomateReport).

For more information about the SQL COLLECT STATISTICS statement with THRESHOLD option, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

Functionality

If you want to ... Use the following SQL interface...
copy statistics definitions for a specified database or table from the DBC dictionary to the TDSTATS database AutomateStats.
copy the definition of a single specified statistic from the dictionary to the TDSTATS database AutomateSingleStat.
copy the definitions of those statistics identified by a prior Analyzer-related operation as actively used but not yet automated to the TDSTATS database AutomateUsedStats.
remove specified statistics definitions from the TDSTATS database DeAutomateStats.
list statistics definitions copied or removed by a prior invocation of an Automate-related open API AutomateReport.
list automated statistics definitions that currently reside in the TDSTATS database SelectAutomatedStats.
remove orphaned statistics definitions from the TDSTATS database that are no longer defined in the dictionary ResyncStats.