15.00 - Collecting Statistics on a Hash Index - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

prodname
Teradata Database
vrm_release
15.00
category
Programming Reference
featnum
B035-1184-015K

Collecting Statistics on a Hash Index

You should collect statistics on appropriate columns of a hash index frequently just as you would for any base table or join index. For most applications, you should collect the statistics on base table columns rather than on hash index columns. See “Why You Should Collect Statistics on Base Table Columns Instead of Hash Index Columns” on page 328 for further information.

An important exception to this guideline is the case where a hash index is defined on a complex expression that a predicate expression specifies on the relevant base table column.

A complex expression is defined as any expression that specifies something other than a simple column reference on the right hand side of a predicate. The Optimizer can only use statistics from those complex expressions that can be mapped completely to a hash index expression.

Note: You cannot collect statistics on a UDT column. This includes UDT columns that are components of an index. The Optimizer uses dynamic AMP sampling information for equality predicates on UDT columns and default selectivity for other predicates on UDT indexes for costing. The dynamic AMP sampling provides limited statistics information about the index. For example, it cannot detect nulls or skew. If a UDT index access path does not show any improved performance, you should consider dropping the index to avoid the overhead involved in its storage and maintenance.

Collecting statistics on the hash index for those expressions enhances the ability of the Optimizer to estimate single‑table cardinalities for a query that specifies the base table expressions in its predicate (see SQL Request and Transaction Processing for details).

Always consider using the SAMPLE options when you collect and recollect statistics on a hash index. See“Reducing the Cost of Collecting Statistics by Sampling” on page 179 for further information about these options and recommendations on how to use them.

When you create a hash index with a BY clause, you can collect and drop statistics on those columns using an INDEX clause to specify the column set.

For example, suppose a hash index has the following definition:

    CREATE HASH INDEX ord_hidx_8 (o_custkey, o_orderdate) ON orders
    BY (o_custkey, o_orderdate)
    ORDER BY (o_orderdate);

Then you can collect statistics on the partitioning columns as shown in the following example:

    COLLECT STATISTICS ON ord_hidx_8 INDEX (o_custkey, o_orderdate);

Note that you can only use columns specified in a BY clause definition if you use the keyword INDEX in your COLLECT STATISTICS statement.

A poorer choice would be to collect statistics using the column clause syntax. To do this, you must perform two separate requests.

    COLLECT STATISTICS ON ord_hidx_8 COLUMN o_custkey;
 
    COLLECT STATISTICS ON ord_hidx_8 COLUMN o_orderdate;

It is always better to collect the statistics for a multicolumn index on the index itself rather than individually on its component columns because its selectivity is much better when you collect statistics on the index columns as a set.

For example, a query with a condition like WHERE x=1 AND y=2 is better optimized if statistics are collected on INDEX (x,y) than if they are collected individually on column x and column y.

The same restrictions hold for the DROP STATISTICS statement.