16.20 - Collecting Statistics on a Hash Index - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)

You should collect statistics on appropriate columns of a hash index frequently just as you would for any base table or join index. For most applications, you should collect the statistics on base table columns rather than on hash index columns. See Collect Statistics on Base Table Columns Instead of Hash Index Columns.

An important exception to this guideline is the case where a hash index is defined on a complex expression that a predicate expression specifies on the relevant base table column.

A complex expression is defined as any expression that specifies something other than a simple column reference on the right hand side of a predicate. The Optimizer can only use statistics from those complex expressions that can be mapped completely to a hash index expression.

You cannot collect statistics on a UDT column. This includes UDT columns that are components of an index. The Optimizer uses dynamic AMP sampling information for equality predicates on UDT columns and default selectivity for other predicates on UDT indexes for costing. The dynamic AMP sampling provides limited statistics information about the index. For example, it cannot detect nulls or skew. If a UDT index access path does not show any improved performance, you should consider dropping the index to avoid the overhead involved in its storage and maintenance.

Collecting statistics on the hash index for those expressions enhances the ability of the Optimizer to estimate single-table cardinalities for a query that specifies the base table expressions in its predicate. See Teradata Vantage™ SQL Request and Transaction Processing, B035-1142.

Always consider using the SAMPLE options when you collect and recollect statistics on a hash index. See Reducing the Cost of Collecting Statistics by Sampling for further information about these options and recommendations on how to use them.

When you create a hash index with a BY clause, you can collect and drop statistics on those columns using an INDEX clause to specify the column set.

For example, suppose a hash index has the following definition:

    CREATE HASH INDEX ord_hidx_8 (o_custkey, o_orderdate) ON orders
    BY (o_custkey, o_orderdate)
    ORDER BY (o_orderdate);

Then you can collect statistics on the partitioning columns as shown in the following example:

    COLLECT STATISTICS ON ord_hidx_8 INDEX (o_custkey, o_orderdate);

Note that you can only use columns specified in a BY clause definition if you use the keyword INDEX in your COLLECT STATISTICS statement.

A poorer choice would be to collect statistics using the column clause syntax. To do this, you must perform two separate requests.

    COLLECT STATISTICS ON ord_hidx_8 COLUMN o_custkey;

    COLLECT STATISTICS ON ord_hidx_8 COLUMN o_orderdate;

You should collect the statistics for a multicolumn index on the index itself rather than on the individual columns because you have more selectivity when you collect statistics on the index columns as a set.

For example, a query with a condition like WHERE x=1 AND y=2 is better optimized if statistics are collected on INDEX (x,y) than if they are collected individually on column x and column y.

The same restrictions hold for the DROP STATISTICS statement.