16.10 - Hash Indexes - Teradata Database

Teradata Database Design

Teradata Database
Release Number
Release Date
June 2017
Content Type
User Guide
Publication ID
English (United States)

Hash indexes are file structures that share properties in common with both single-table join indexes and secondary indexes.

From an architectural perspective, the incorporation of auxiliary structures as a transparently embedded element of the hash index column set is what most distinctly distinguishes hash indexes from single-table join indexes. These auxiliary structures are components of the base table, and are added to the hash index definition by default if they are not explicitly declared by the CREATE HASH INDEX column set definition. Because it is not clear what the default auxiliary structures Teradata Database uses when it create a hash index, you should always consider creating an equivalent single-table join index in preference to a hash index. Also, multivalue compression from the base table may be carried over to a join index, but it is not carried over to a hash index.

If the columns you specify for the hash index column set duplicate the default auxiliary structure columns, then those columns are not added redundantly. The auxiliary structures provide indexed access to base table rows.

If you do not specify a partition key explicitly with the BY clause of the CREATE HASH INDEX statement, then the system adds this auxiliary pointer data to the hash index rows automatically and then uses it to partition them.

Comparison of Hash Indexes and Single-Table Join Indexes

The reasons for using hash indexes are similar to those for using single-table join indexes. Not only can hash indexes optionally be specified to be distributed in such a way that their rows are AMP-local with their associated base table rows, they can also provide a transparent direct access path to those base table rows to complete a query only partially covered by the index. This facility makes hash indexes somewhat similar to secondary indexes in function. Hash indexes are also useful for covering queries so that the base table need not be accessed at all.

The most apparent external difference between hash and single-table join indexes is in the syntax of the SQL statements used to create them. The syntax for CREATE HASH INDEX is similar to that for CREATE INDEX. As a result, it is simpler to create a hash index than to create a functionally comparable single-table join index.

The following list summarizes the similarities of hash and single-table join indexes.

  • The primary function of both is to improve query performance.
  • Both are maintained automatically by the system when the relevant columns of their base table are updated by a DELETE, INSERT, or UPDATE statement.
  • Both can be the object of any of the following SQL statements.
    • COLLECT STATISTICS (Optimizer Form)
  • Both receive their space allocation from permanent space and are stored in distinct tables.
  • Both can be hash- or value-ordered. You must drop and rebuild all value-ordered (but not hash-ordered) hash and join indexes after you run the Reconfig utility (see Support Utilities).
  • Both can be row compressed.
  • Both can be FALLBACK protected.
  • Both can be used to transform a complex expression into a simple index column. This transformation permits you to collect statistics on the expression, which the Optimizer can then use to make single-table cardinality estimates for a matching complex column predicate specified on the base table and for mapping a query expression that is identical to an expression defined in the join index, but is found within a non-matching predicate (see SQL Request and Transaction Processing for details).
  • Neither can be queried or directly updated.
  • A hash index cannot have a partitioned primary index, but a single-table join index can.
  • A hash index must have a primary index, but a single-table join index can be created with or without a primary index if it is column-partitioned.
  • A hash index cannot be column-partitioned, but a single-table join index can be column-partitioned.
  • Neither can be used to partially cover a query that contains a TOP n or TOP m PERCENT clause.
  • Neither can be implemented with row compression if they specify a UDT in their select list because both create an internal column1 and column2 index when compressed.
  • Neither can be defined using the system-derived PARTITION column.
  • Both share the same restrictions for use with the MultiLoad, FastLoad, and Archive/Recovery utilities.

The following table summarizes the important differences between hash and join indexes.

Hash Index Join Index
Indexes one table only. Can index multiple tables.

This is not true for column-partitioned join indexes, which can only index a single table.

A logical row corresponds to one and only one row in its referenced base table. A logical row can correspond to either of the following, depending on how the join index is defined:
  • One and only one row in the referenced base table.
  • Multiple rows in the referenced base tables.
Column list cannot specify aggregate or ordered analytical functions. Select list can specify aggregate functions.
Can specify a UDT column in its column list. Can only specify a UDT in its select list if it is not row-compressed.
Cannot have a nonunique secondary index. Can have a nonunique secondary index.
Supports transparently added, system-defined primary index columns that point to the underlying base table rows. Does not add underlying base table row pointers implicitly.

Pointers to underlying base table rows can be created explicitly by defining one element of the column list using the keyword ROWID.

Note that you can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement. See “CREATE JOIN INDEX” in SQL Data Definition Language Detailed Topics.

Cannot be specified for NoPI or column-partitioned base tables. Can be specified for both NoPI and column-partitioned base tables.
Cannot be column partitioned. Can be column partitioned.
Cannot be row partitioned. Primary index of uncompressed row forms and column-partitioned join indexes, can be row partitioned.
Cannot be defined on a table that also has triggers. Can be defined on a table that also has triggers.
Column multivalue compression, if defined on a referenced base table, is not added transparently by the system and cannot be specified explicitly in the hash index definition. Column multivalue compression, if defined on a referenced base table, is added transparently by the system with no user input, but cannot be specified explicitly in the join index definition.
Index row compression is added transparently by the system with no user input. Index row compression, if used, must be specified explicitly in the CREATE JOIN INDEX request by the user.

It is possible to define a join index that has nearly the identical functionality to a hash index. The only essential differences between hash and join indexes is their respective DDL creation syntax. Otherwise, the functionality of hash indexes is a proper subset of the functionality of join indexes.

Summary of Hash Index Functions

A hash index always has at least one of the following functions.

  • Replicates all, or a vertical subset, of a single base table and partitions its rows with a user-specified partition key column set, such as a foreign key column to facilitate joins of very large tables by hashing them to the same AMP.
  • Provides an access path to base table rows to complete partial covers.

The AMP software updates hash indexes automatically, so the only task a DBA must perform is to keep the statistics on hash index or base table columns current (see Collecting Statistics on Hash Index Columns).

Similarities of Hash Indexes to Base Tables

In many respects, a hash index is identical to a base table.

For example, you perform any of the following statements against a hash index:

  • COLLECT STATISTICS (Optimizer Form)
  • DROP STATISTICS (Optimizer Form)

You cannot do the following things with hash indexes:

  • Query or update hash index rows.

    For example, if ordCustHdx is a hash index, then the following query is not legal:

         SELECT o_status, o_date, o_comment
         FROM ordCustHdx;
  • Store and maintain arbitrary query results.
  • Create secondary indexes on its columns.
  • Create them on either a NoPI or column-partitioned table.

Similarities of Hash Indexes to Secondary Indexes

Hash indexes are file structures that can be used either to resolve queries by accessing the index instead of its underlying base table or to enhance access performance when they do not cover a query by providing a secondary access path to requested base table rows. They can either substitute for or point to base table rows.

Because of these properties, hash indexes are useful for queries where the index structure contains all the columns referenced by a query, thereby covering the query, and making it possible to retrieve the requested data from the index rather than accessing its underlying base tables. For obvious reasons, an index with this property is often referred to as a covering index.

Hash indexes put in double duty by also providing pointers to base table rows to facilitate their access in situations where the index does not cover the query. This application of hash indexes is similar to how the Optimizer uses secondary indexes when it creates its access plans.

Because the distribution of rows to AMPs of hash index rows is under user control through the specification of an explicit partition key in the CREATE HASH INDEX statement, rows can be distributed in various ways, depending on the requirements of an application. Distribution of secondary index rows, on the other hand, is not under user control. Because of this, they are less adaptable to the specific requirements of an individual application than hash indexes.

Both secondary and hash indexes provide access paths to base table rows and can be selected by the Optimizer to cover SQL queries. Both also share a similar DDL syntax that is very different from the syntax used to create single-table join indexes.

The key difference between hash and secondary indexes is that the hash index partition key is user-selectable, which often makes it more useful for processing queries.

Hash Index Applications

Hash indexes are useful for queries where the index table contains the columns referenced by a query, thereby allowing the Optimizer to cover it by planning to access the index rather than its underlying base table. An index that supplies all the table columns requested by a query is said to cover that table for that query and, for obvious reasons, is referred to as a covering index. Note that query covering is not restricted to hash indexes: other indexes can also cover queries.

Hash indexes can be defined on a table in place of secondary indexes. This usage makes more sense when you distribute the hash index row to the AMPs so that it facilitates a wider range of query processing than a secondary index might. Because hash indexes can potentially carry more of an update burden than secondary indexes, you should not define them on a table when a secondary index would serve the same intended function. Keep in mind that this depends on how the updates are done. For example, a single row update might be faster with a secondary index, but an INSERT … SELECT might be faster with a hash index.

Compression of Hash Index Rows

Compression refers to a logical row compression in which multiple sets of non-repeating column values are appended to a single set of repeating column values. This allows the system to store the repeating value set only once, while any non-repeating column values are stored as logical segmental extensions of the base repeating set.

When the following is true, the system automatically compresses the rows of a hash index:

  • The ORDER BY column set is specified in the column_1 list and none of the columns in the order key is unique.
  • The primary index columns are specified in the column_1 list.

    As the following table indicates, the primary index columns for a hash index are always part of the column_1 list, whether specified explicitly or not.

IF you create the index … THEN the primary index columns …
without a BY clause are part of the column_1 list by default.
with a BY clause must be part of the column_1 list because all columns specified in the BY clause column list must also be specified in the column_1 list.

Rows having the same values for the order key are compressed into a single physical row having fixed and repeating parts. If the columns do not fit into a single physical row, they spill over to additional physical rows as is necessary.

The fixed part of the row is made up of the explicitly-defined columns that define the column_1 list. The repeating part is composed of the remaining, implicitly-defined columns.

The system only compresses row sets together if they are inserted by the same INSERT statement. This means that rows that are subsequently inserted are not appended as logical rows to existing compressed row sets, but rather are compressed into their own self-contained row sets.

Fallback With Hash Indexes

You can define fallback for hash indexes. The criteria for deciding whether to define a hash index with fallback are similar to those used for deciding whether to define fallback on base tables.

If you do not define fallback for a hash index and an AMP is down, then the following additional factors become critical.

  • The hash index cannot be used by the Optimizer to solve any queries that it covers.
  • The base tables on which the hash index is defined cannot be updated.

Restrictions for NoPI Tables

You cannot create a hash index on a NoPI or a column-partitioned table because hash indexes are defined with transparently added, system-defined primary index columns from the underlying base table primary index that point to the underlying base table rows, and NoPI tables do not have a primary index.

Restrictions on Load Utilities

You cannot use FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE to load data into base tables that have hash indexes because those indexes are not maintained during the execution of these utilities. If you attempt to load data into base tables with hash indexes using these utilities, the load operation aborts and the system returns an error message to the requestor.

To load data into a hash-indexed base table, you must drop all defined hash indexes on the table before you can run FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE.

Load utilities like Teradata Parallel Data Pump, BTEQ, and the Teradata Parallel Transporter operators INSERT and STREAM, which perform standard SQL row inserts and updates, are supported for hash-indexed tables.

Restriction on Partial Coverage of Queries Containing a TOP n or TOP m PERCENT Clause

A hash index cannot be used to partially cover a query that specifies the TOP n or TOP m PERCENT option.

Compression of Hash Indexes at the Block Level

Because hash indexes are primary tables, they can be compressed at the block level. See Compression Types Supported by Teradata Database for more information about data block compression.

Further Information

Consult SQL Data Definition Language for more detailed information on creating and using hash indexes to enhance the performance of your database applications.