Guidelines for Choosing Primary Index Columns - Advanced SQL Engine

Guidelines for Choosing Primary Index Columns - Advanced SQL Engine - Teradata Database

Database Administration

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

upb1600054424724.ditamap

dita:ditavalPath

upb1600054424724.ditaval

dita:id

B035-1093

lifecycle

Product Category

Teradata Vantage™

There are three essential factors to consider when choosing a primary index:

Uniform data distribution (the most important consideration)
Optimal access to the data
The volatility of indexed column values

Use the following guidelines for selecting columns to be used as primary indexes:

Select columns that consist mainly of unique, distinct values.
This is the most important consideration. Columns with distinct values distribute data rows evenly across all AMPs in the configuration. This maximizes parallel processing by allowing all the processors to participate in processing the data rows for a request.

Avoid using columns with a small number of distinct values that are repeated frequently or columns that have many nulls. This will cause uneven distribution of data rows resulting in some AMPs having more rows to process than others, and will increase the potential for performance bottlenecks.

Recommendation: Select columns having significantly more distinct values than the number of AMPs in the configuration.
Select columns that are most frequently used in equality predicate conditions.
Select columns that are most frequently used to access rows.
The PI provides the most efficient method to access data in a table. Therefore, choosing the PI on the most frequently used access path provides for optimal data access. For example, if the table is frequently joined with a specific set of tables, consider defining the PI on the column set that is typically used as the join condition.

Equality conditions permit direct access to the row. Inequality conditions require additional processing.
Select columns that do not have any of the following data types: BLOB, CLOB, Period, UDT, ST_Geometry, MBR.
Choose as few columns as possible for the PI to optimize its generality.
Select primary indexes so that query plans use AMP-local processing and avoid row redistributions.
Select columns that are not volatile.
Volatile columns force frequent row redistribution because Advanced SQL Engine assigns table rows to AMPs based on the PI values.

Several DBSControl values, including those for AMP buffer size, affect actual row distribution.
Recommendation: Select a PI column set that will never be (preferred) or rarely be updated.

You may encounter conflicts when selecting the optimum primary index for a table. For example, choosing a PI that gives good distribution of data across the AMPs must be balanced against choosing a PI that reflects the most common usage pattern of the table. Also, some applications might favor one type of primary index, while other applications might perform optimally using a different primary index.

Recommendation: Make even data distribution, not data access patterns, the main priority when selecting a primary index. Tables and queries change. Access patterns change. However, data distribution does not. You can always add additional non-primary indexes, such as secondary and join indexes, to facilitate particular applications.

Related Information

Topic	Resources for Further Information
Indexing: Primary indexes Partitioned primary indexes Secondary indexes Join indexes (materialized views)	Teradata Vantage™ - Database Design, B035-1094 Teradata Vantage™ - Database Introduction, B035-1091 Teradata Vantage™ - SQL Fundamentals, B035-1141
No primary index tables	Teradata Vantage™ - Database Design, B035-1094
Referential integrity	Teradata Vantage™ - SQL Fundamentals, B035-1141