Summary of Primary Index Selection Criteria | Teradata Vantage - Summary of Primary Index Selection Criteria - Advanced SQL Engine

Summary of Primary Index Selection Criteria | Teradata Vantage - Summary of Primary Index Selection Criteria - Advanced SQL Engine - Teradata Database

Database Design

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

17.00

Published

June 2020

Language

English (United States)

Last Update

2021-01-22

dita:mapPath

qby1588121512748.ditamap

dita:ditavalPath

lze1555437562152.ditaval

dita:id

B035-1094

lifecycle

Product Category

Teradata Vantage™

The following table summarizes the guidelines for selecting columns to be used as primary indexes.

Guideline	Comments
Select columns that are most frequently used to access rows.	Restrict selection to columns that are either unique or highly singular.
Select columns that are most frequently used in equality predicate conditions.	Equality conditions permit the system to hash directly to the row having the conditional value. When the primary index is unique, the response is never more than one row. Inequality conditions require additional processing.
Select columns that distribute rows evenly across the AMPs.	Distinct values distribute evenly across all AMPs in the configuration. This maximizes parallel processing. Rows having duplicate NUPI values hash to the same AMP and often are stored in the same data block. This is good when rows are only moderately nonunique. Rows having NUPI columns that are highly nonunique distribute unevenly, use multiple data blocks, and incur multiple I/Os. Extremely nonunique primary index values can skew space usage so markedly that the system returns a message indicating that the database is full even when it is not. This occurs when an AMP exceeds the maximum bytes threshold for a user or database calculated by dividing the PERMANENT = n BYTES specification by the number of AMPs in the configuration, causing the system to incorrectly perceive the database to be “full.”
Select columns that are not volatile.	Volatile columns force frequent row redistribution.
Select columns having very many more distinct values than the number of AMPs in the configuration.	If this guideline is not followed, row distribution skews heavily, not only wasting disk space, but also devastating system performance. This rule is particularly important for large tables.
Do not select columns defined with Period, ARRAY, VARRAY, Geospatial, JSON, XML, BLOB, CLOB, XML-based UDT, BLOB-based UDT, or CLOB-based UDT data types.	You cannot specify columns that have BLOB, CLOB, BLOB-based UDT, CLOB-based UDT, XML-based UDT, Period, ARRAY, VARRAY, Geospatial, or JSON data types in a primary index definition. If you attempt to do so, the CREATE request aborts. You can, however, specify Period data type columns in the partitioning expression of a partitioned table.
Do not select aggregated columns of a join index.	When defining the primary index for a join index, you cannot specify any aggregated columns. If you attempt to do so, the CREATE JOIN INDEX request aborts.