Considerations for Choosing a Primary Index - Teradata Database

Teradata Database Design

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-10-06
Product Category: Software

Considerations for Choosing a Primary Index

Selecting the optimum primary index for a table or uncompressed join index is often a complex task, because some applications might favor one type of primary index while other applications perform better with a different one. Because a table can have only one primary index, you must select the one that best suits the majority of the applications the table serves. If the benefits justify the overhead costs, you can also define multiple join indexes, each with a different primary index.

You can always add other indexes, such as secondary, hash, and join indexes, to facilitate particular applications. Be aware that all of these indexes incur overhead costs, including:

  • Disk space required to store their subtables.
  • System performance degrades whenever base table rows are updated because the index values for any indexed columns affected by that update must also be updated.
You should always consider these tradeoffs when planning your indexes, and then test them to confirm that the assumptions that led to your choices are correct. For example, if you design a primary index with even row distribution as your principal criterion, analyze the actual distribution of table rows to verify that they are evenly distributed.
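One way to perform that check is to count rows per AMP and summarize the spread. The Python sketch below assumes you have already gathered one row count per AMP, for example with a query that groups rows by HASHAMP(HASHBUCKET(HASHROW(primary_index_columns))). The function names, the skew formula (1 - average/maximum, a common rule of thumb), and the 10% limit are illustrative assumptions, not Teradata features.

```python
from statistics import mean


def skew_report(rows_per_amp):
    """Summarize how evenly rows are spread across AMPs.

    rows_per_amp: one row count per AMP (hypothetical input gathered beforehand).
    """
    if not rows_per_amp:
        raise ValueError("need at least one per-AMP row count")
    avg = mean(rows_per_amp)
    peak = max(rows_per_amp)
    # Skew factor (assumed rule of thumb): 0.0 means perfectly even; values
    # approaching 1.0 mean one AMP holds far more rows than the average AMP.
    skew = 0.0 if peak == 0 else 1.0 - avg / peak
    return {"avg": avg, "max": peak, "skew_factor": skew}


def is_acceptably_even(rows_per_amp, max_skew=0.10):
    """Hypothetical pass/fail check; 0.10 is an arbitrary example limit."""
    return skew_report(rows_per_amp)["skew_factor"] <= max_skew
```

For example, per-AMP counts of 1000, 1010, 990, and 1000 give a skew factor below 0.01, while counts of 4000, 0, 0, and 0 give 0.75, which would indicate a poor primary index choice for this criterion.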

For many applications, particularly those that use range queries heavily, a partitioned primary index (PPI) can resolve these issues better than a nonpartitioned primary index, because it provides efficient access both through the primary index columns and through a constraint on the partitioning columns. As always, confirm that the partitioning actually improves query performance by carefully examining EXPLAIN reports and collecting the appropriate statistics.
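The following Python sketch is a simplified model of why a constraint on the partitioning column lets rows in other partitions be skipped. It assumes single-level partitioning on a date column with fixed-width ranges, similar in spirit to RANGE_N with an EACH clause. The function names are hypothetical, out-of-range rows (which a real definition might route to NO RANGE or UNKNOWN partitions) are not modeled, and the actual elimination decision is made by the Optimizer.

```python
from datetime import date, timedelta


def partition_of(d, start, end, width_days):
    """1-based partition number for date d under fixed-width ranges [start, end)."""
    if not (start <= d < end):
        # A real definition might send such rows to NO RANGE/UNKNOWN; not modeled.
        raise ValueError(f"{d} falls outside the defined ranges")
    return (d - start).days // width_days + 1


def partitions_to_scan(lo, hi, start, end, width_days):
    """Partitions needed for the predicate lo <= col <= hi.

    Pass None for an unconstrained bound; with both None nothing is eliminated.
    """
    last_day = end - timedelta(days=1)
    lo = start if lo is None else max(lo, start)
    hi = last_day if hi is None else min(hi, last_day)
    if lo > hi:
        return []  # the predicate excludes every defined range
    first = partition_of(lo, start, end, width_days)
    last = partition_of(hi, start, end, width_days)
    return list(range(first, last + 1))


# Weekly ranges over calendar year 2024 (53 partitions; the last is partial).
START, END = date(2024, 1, 1), date(2025, 1, 1)
march_week = partitions_to_scan(date(2024, 3, 1), date(2024, 3, 7), START, END, 7)
full_scan = partitions_to_scan(None, None, START, END, 7)
```

With weekly ranges over 2024, a predicate covering 1 to 7 March touches only 2 of the 53 partitions, whereas a query with no constraint on the partitioning column must read all of them.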

You should always collect statistics on the PARTITION column and the partitioning columns.

The recommended practice for recollecting statistics is to set appropriate recollection thresholds using the THRESHOLD options of the COLLECT STATISTICS statement. See “COLLECT STATISTICS” in SQL Data Definition Language for details on how to do this.
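As a rough model of the decision that recollection thresholds express, the Python sketch below flags statistics as stale when either the percentage of changed rows or the age of the last collection reaches a limit. The function, its parameters, the default limits, and the either-limit rule are assumptions for illustration; in Teradata you state thresholds declaratively in COLLECT STATISTICS rather than coding the check, so consult that statement's documentation for the exact semantics.

```python
from datetime import date


def should_recollect(rows_at_last_collect, rows_changed_since, last_collected,
                     today, change_pct_limit=10.0, age_days_limit=7):
    """Hypothetical staleness check: True if either limit is reached.

    The either-limit rule and the default limits are assumptions for
    illustration, not Teradata's documented THRESHOLD semantics.
    """
    if rows_at_last_collect <= 0:
        return True  # no usable baseline, so collect
    change_pct = 100.0 * rows_changed_since / rows_at_last_collect
    age_days = (today - last_collected).days
    return change_pct >= change_pct_limit or age_days >= age_days_limit
```

With these example limits, 150 changed rows out of 1,000 (15%) triggers recollection the next day, whereas 10 changed rows two days after collection does not.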

You should also weigh the costs of the index against the benefits it provides. This is particularly important if you have also defined a USI on the table, because the additional maintenance required to enforce uniqueness can neutralize the performance advantage of the index or even reduce overall performance.

Creating a partitioned table does not guarantee that the Optimizer will choose a row-partition elimination plan. Partitioning might not be used for any of the following common reasons:

  • It is not applicable to the actual queries in the workload.
  • The Optimizer cost analysis for a query determines that another plan is less expensive.
  • The query does not conform to one or more of a number of restrictions.
  • In some cases, a query plan with partitioning might not perform as well as one without partitioning (see “Row‑Partitioned and Nonpartitioned Primary Index Access for Typical Operations” on page 351 for specific examples).

You can follow any of several partitioning strategies:

  • For some applications, defining the partition expressions so that each row partition has approximately the same number of rows might be an effective strategy.

    This task is far easier for single-level PPIs than for multilevel PPIs, though even for multilevel PPIs it remains a goal worth approximating as closely as possible.

  • For other applications, having a varying number of rows per partition might be desirable. For example, more frequently accessed data (such as data for the current year) might be divided into finer partitions (such as weeks), while other data (such as data for previous years) might have coarser partitions (such as months or multiples of months).

    Note that partitioning in this manner can make altering the partitions more difficult.

  • Alternatively, defining each range with equal width, even if the number of rows per range varies, might be important.
The most important factors for row partitioning are accessibility and maximization of row-partition elimination. In all cases, defining a primary index (or having no primary index) that distributes the rows of the table fairly evenly across the AMPs is critical for efficient parallel processing. See “Evaluating the Relative Merits of Partitioning Versus Not Partitioning” on page 361 for further information.
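The contrast between the equal-rows and equal-width strategies shows up clearly when you compute range boundaries both ways over a skewed sample of partitioning-column values. This Python sketch works on plain integers (for example, day numbers), and its function names are hypothetical; it illustrates only the arithmetic, not how Teradata defines or stores partitions.

```python
from bisect import bisect_right


def equal_width_bounds(lo, hi, n):
    """Start value of each of n equal-width ranges covering [lo, hi)."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n)]


def equal_count_bounds(values, n):
    """Start value of each of n ranges holding roughly equal numbers of rows."""
    ordered = sorted(values)
    step = len(ordered) / n
    # Ties can repeat a boundary, leaving an empty range; a real design would merge them.
    return [ordered[int(i * step)] for i in range(n)]


def rows_per_range(values, starts):
    """Count values in each range; starts must be ascending."""
    counts = [0] * len(starts)
    for v in values:
        idx = bisect_right(starts, v) - 1  # last range starting at or below v
        if idx >= 0:
            counts[idx] += 1
    return counts


# Skewed sample: 90 values between 0 and 9, plus 10 sparse values up to 91.
sample = [d % 10 for d in range(90)] + list(range(10, 100, 9))
by_width = rows_per_range(sample, equal_width_bounds(0, 100, 4))
by_count = rows_per_range(sample, equal_count_bounds(sample, 4))
```

On this sample, four equal-width ranges over 0 to 100 hold 92, 3, 3, and 2 rows, while equal-count boundaries produce ranges holding 18, 27, 27, and 28 rows. Duplicate values prevent an exact split, which is one reason equal row counts can only be approximated.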