Performance Considerations for Primary Indexes | Teradata Vantage - 17.10 - Performance Considerations for Primary Indexes - Advanced SQL Engine - Teradata Database

Teradata Vantageā„¢ - Database Design

Advanced SQL Engine
Teradata Database
Release Number
July 2021
English (United States)
Last Update

This topic describes general performance considerations for both unique and nonunique primary indexes, paying particular attention to how the range of primary index singularity affects performance.

Guideline for All Primary Indexes

Define the primary index for a table on as few columns as possible. Retrievals on columns that do not match the entire primary index do not use the primary index to retrieve the matching rows.

Hashing efficiency increases as the number of primary index columns decreases.

Guidelines for Row Partitioning

The key guideline for determining the optimum granularity for the row partitions of a partitioned table is the nature of the workloads that most commonly access the PPI table or join index. The higher the number of row partitions you define for a partitioned table, the more likely an appropriate range query against the table will perform more quickly, given that the partition granularity is such that the Optimizer can eliminate all but one partition.

On the other hand, it is generally best to avoid specifying too fine a partition granularity. For example if query workloads never access data at a granularity of less than one month, there is no benefit to be gained by defining partitions with a granularity of less than one month. Furthermore, unnecessarily fine partition granularity is likely to increase the maintenance load for a partitioned table, which can lead to overall system performance degradation. So even though too fine a partition granularity itself does not introduce performance degradations, the underlying maintenance on such a table can indirectly degrade performance.

You should also consider the following items if a table you are designing is planned to support tactical queries. By knowing the specifics of your workloads, you can optimize your partitioning to best suit the queries in those workloads.

  • For all-AMP tactical queries against partitioned tables, you should specify a constraint on the partitioning column set in the WHERE clause.
  • If a query joins partitioned tables that are partitioned identically, using their common partitioning column set as a constraint enhances join performance still more if you also include an equality constraint between the partitioning columns of the joined tables.

General Considerations

The following factors apply equally to UPIs and NUPIs.

  • Each table has 0 or 1, and no more than 1, primary index.
  • The primary index for a table need not be isomorphic with the primary key for the table. This is obviously true for NUPIs, because primary key values are unique by definition.
  • A primary index can contain nulls, while a primary key cannot.

    Primary index nulls are not a good thing. If you anticipate that the primary index for a table might frequently be null, you should consider using a different column.

  • All single-value primary index accesses are confined to a single AMP. In most cases, a single-value primary index access requires only 1 I/O.

    If the cylinder index is not in cache, a primary index access requires 2 I/Os.

Unique Primary Index Considerations

The following factors describe the general performance considerations associated with UPIs.

  • UPIs guarantee an even distribution of table rows across the AMPs when there are many more unique values than AMPs on the system.
  • UPIs always guarantee the best retrieval and update performance for single table row access.
  • UPI values are never associated more than one base table row.
  • UPIs retrievals and updates never require spool.
  • Uniqueness is enforced by the system using the UPI (thereby avoiding the need for duplicate row checks for multiset tables).

Nonunique Primary Index Considerations

The following factors describe the general performance considerations associated with NUPIs.

  • At best, NUPIs distribute table rows evenly across the AMPs and hash values.

    At worst, NUPI table row distribution can be skewed in various ways (see SQL Scripts for Detecting Skew) and, if so, decrement both retrieval and update performance.

  • NUPIs at best can effect good retrieval and update performance.
  • NUPI values can involve more than one base table row.
  • NUPI retrievals can require spool.
  • Duplicate NUPI values hash to the same AMP and often are stored in the same data block.
  • If all the rows for a duplicate NUPI hash value fit into a single data block, then only 1 or 2 I/Os are required to store or access the entire set.

    2 I/Os are only required when the cylinder index is not cached.

  • If duplicate rows are excluded because the table is defined as SET and has no uniqueness constraint, then the system must make a duplicate row check for every table row inserted or updated.

    See Duplicate Row Checks for SET Tables with NUPIs and Using Unique Secondary Indexes to Enforce Row Uniqueness for further details.