Summary of Primary Index Selection Criteria | Teradata Vantage - Summary of Primary Index Selection Criteria - Advanced SQL Engine - Teradata Database

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
qby1588121512748.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1094
lifecycle
previous
Product Category
Teradata Vantage™

The following table summarizes the guidelines for selecting columns to be used as primary indexes.

Guideline Comments
Select columns that are most frequently used to access rows. Restrict selection to columns that are either unique or highly singular.
Select columns that are most frequently used in equality predicate conditions. Equality conditions permit the system to hash directly to the row having the conditional value. When the primary index is unique, the response is never more than one row.

Inequality conditions require additional processing.

Select columns that distribute rows evenly across the AMPs. Distinct values distribute evenly across all AMPs in the configuration. This maximizes parallel processing.

Rows having duplicate NUPI values hash to the same AMP and often are stored in the same data block. This is good when rows are only moderately nonunique.

Rows having NUPI columns that are highly nonunique distribute unevenly, use multiple data blocks, and incur multiple I/Os.

Extremely nonunique primary index values can skew space usage so markedly that the system returns a message indicating that the database is full even when it is not.

This occurs when an AMP exceeds the maximum bytes threshold for a user or database calculated by dividing the PERMANENT = n BYTES specification by the number of AMPs in the configuration, causing the system to incorrectly perceive the database to be “full.”

Select columns that are not volatile. Volatile columns force frequent row redistribution.
Select columns having very many more distinct values than the number of AMPs in the configuration. If this guideline is not followed, row distribution skews heavily, not only wasting disk space, but also devastating system performance.

This rule is particularly important for large tables.

Do not select columns defined with Period, ARRAY, VARRAY, Geospatial, JSON, XML, BLOB, CLOB, XML-based UDT, BLOB-based UDT, or CLOB-based UDT data types. You cannot specify columns that have BLOB, CLOB, BLOB-based UDT, CLOB-based UDT, XML-based UDT, Period, ARRAY, VARRAY, Geospatial, or JSON data types in a primary index definition. If you attempt to do so, the CREATE request aborts.

You can, however, specify Period data type columns in the partitioning expression of a partitioned table.

Do not select aggregated columns of a join index. When defining the primary index for a join index, you cannot specify any aggregated columns.

If you attempt to do so, the CREATE JOIN INDEX request aborts.