You should collect statistics on appropriate columns of a hash index frequently just as you would for any base table or join index. For most applications, you should collect the statistics on base table columns rather than on hash index columns. See Collect Statistics on Base Table Columns Instead of Hash Index Columns.
An important exception to this guideline is the case where a hash index is defined on a complex expression that a predicate expression specifies on the relevant base table column.
A complex expression is defined as any expression that specifies something other than a simple column reference on the right hand side of a predicate. The Optimizer can only use statistics from those complex expressions that can be mapped completely to a hash index expression.
Collecting statistics on the hash index for those expressions enhances the ability of the Optimizer to estimate single-table cardinalities for a query that specifies the base table expressions in its predicate. See Teradata Vantage™ SQL Request and Transaction Processing, B035-1142.
Always consider using the SAMPLE options when you collect and recollect statistics on a hash index. See Reducing the Cost of Collecting Statistics by Sampling for further information about these options and recommendations on how to use them.
When you create a hash index with a BY clause, you can collect and drop statistics on those columns using an INDEX clause to specify the column set.
For example, suppose a hash index has the following definition:
CREATE HASH INDEX ord_hidx_8 (o_custkey, o_orderdate) ON orders BY (o_custkey, o_orderdate) ORDER BY (o_orderdate);
Then you can collect statistics on the partitioning columns as shown in the following example:
COLLECT STATISTICS ON ord_hidx_8 INDEX (o_custkey, o_orderdate);
Note that you can only use columns specified in a BY clause definition if you use the keyword INDEX in your COLLECT STATISTICS statement.
A poorer choice would be to collect statistics using the column clause syntax. To do this, you must perform two separate requests.
COLLECT STATISTICS ON ord_hidx_8 COLUMN o_custkey; COLLECT STATISTICS ON ord_hidx_8 COLUMN o_orderdate;
You should collect the statistics for a multicolumn index on the index itself rather than on the individual columns because you have more selectivity when you collect statistics on the index columns as a set.
For example, a query with a condition like WHERE x=1 AND y=2 is better optimized if statistics are collected on INDEX (x,y) than if they are collected individually on column x and column y.
The same restrictions hold for the DROP STATISTICS statement.