Guidelines for Choosing Primary Index Columns - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
There are three essential factors to consider when choosing a primary index:
  • Uniform data distribution (the most important consideration)
  • Optimal access to the data
  • The volatility of indexed column values
Use the following guidelines for selecting columns to be used as primary indexes:
  • Select columns that consist mainly of unique, distinct values.

    This is the most important consideration. Columns with distinct values distribute data rows evenly across all AMPs in the configuration. This maximizes parallel processing by allowing all the processors to participate in processing the data rows for a request.

    Avoid using columns with few distinct values that are repeated frequently or columns that have many nulls. This causes uneven distribution of data rows, resulting in some AMPs having more rows to process than others, and increases the potential for performance bottlenecks.

    Recommendation: Select columns having significantly more distinct values than the number of AMPs in the configuration.

  • Select columns that are most frequently used in equality predicate conditions.
  • Select columns that are most frequently used to access rows.

    The PI provides the most efficient method to access data in a table. Therefore, choosing the PI on the most frequently used access path provides for optimal data access. For example, if the table is frequently joined with a specific set of tables, consider defining the PI on the column set that is typically used as the join condition.

    Equality conditions permit direct access to the row. Inequality conditions require additional processing.

  • Select columns that do not have any of the following data types: BLOB, CLOB, Period,ST_Geometry, MBR.
  • Minimize the number of columns for the PI to optimize its generality.
  • Select primary indexes so that query plans use AMP-local processing and avoid row redistributions.
  • Select columns that are not volatile.

    Volatile columns force frequent row redistribution because Analytics Database assigns table rows to AMPs based on the PI values.

You may encounter conflicts when selecting the optimum primary index for a table. For example, a PI that gives good distribution of data across the AMPs must be balanced against a PI that reflects the most common usage pattern of the table. Also, different applications may perform optimally using different primary indexes.

Recommendation: Make even data distribution, not data access patterns, the main priority when selecting a primary index. Tables and queries change. Access patterns change. However, data distribution does not. You can add additional non-primary indexes, such as secondary and join indexes, to facilitate individual applications.

Related Information

Topic Resources for Further Information
Indexing:
  • Primary indexes
  • Partitioned primary indexes
  • Secondary indexes
  • Join indexes (materialized views)
No primary index tables NoPI Tables, Column-Partitioned NoPI Tables, and Column-Partitioned NoPI Join Indexes
Referential integrity
Referential integrity is supported only on the Block File System on the primary cluster, not on the Object File System.
Referential Integrity