Collecting Statistics on a Single-Table Join Index - Advanced SQL Engine

Collecting Statistics on a Single-Table Join Index - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

imq1591724555718.ditamap

dita:ditavalPath

imq1591724555718.ditaval

dita:id

B035-1184

lifecycle

Product Category

Teradata Vantage™

As is true for hash indexes, you should collect the statistics on base table columns rather than on single-table join index columns for most applications.

An important exception to this guideline is the case where a single-table join index is defined on a complex expression that is also specified as a predicate expression on the relevant base table column. Collecting statistics on the single-table join index for those expressions enhances the ability of the Optimizer to estimate single-table cardinalities for a query that specifies the base table expression in its predicate. See Collecting Statistics on Join Indexes and Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142.

If the join index is sparse (see Sparse Join Indexes), you should collect statistics on the join index itself rather than on its underlying base table.

See Collecting Statistics on Base Table Columns Instead of Single-Table Join Index Columns for further information.

You can collect and drop statistics on the primary index columns of a single-table join index using an INDEX clause to specify the column set.

For example, suppose a join index has the following definition.

    CREATE JOIN INDEX OrdJIdx8 AS
      SELECT o_custkey, o_orderdate
      FROM Orders
    PRIMARY INDEX (o_custkey, o_orderdate)
    ORDER BY (o_orderdate);

Then you can collect statistics on the index columns as a single object as shown in the following example:

    COLLECT STATISTICS ON OrdJIdx8 INDEX (o_custkey, o_orderdate);

Note that you can only use columns specified in a multicolumn PRIMARY INDEX clause definition if you specify the keyword INDEX in your COLLECT STATISTICS request.

A poorer choice would be to collect statistics using the column clause syntax. To do this, you must perform two separate statements:

    COLLECT STATISTICS ON OrdJIdx8 COLUMN (o_custkey);

    COLLECT STATISTICS ON OrdJIdx8 COLUMN (o_orderdate);

It is always better to collect the statistics for multicolumn indexes on the index itself rather than individually on its component columns because its selectivity is much better when you collect statistics on the index columns as a set. Note that you can collect statistics on multiple unindexed columns as well. See Collecting Statistics on Multiple Columns.

For example, a query with a condition like WHERE x=1 AND y=2 is better optimized if statistics are collected on INDEX (x,y) than if they are collected individually on column x and column y.

The same restrictions hold for the DROP STATISTICS statement.

See Comparison of Hash and Single-Table Join Indexes for a list of the similarities between single-table join indexes and hash indexes.