Considerations for Choosing a Primary Index - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

Selecting the optimum primary index for a table or uncompressed join index is often a complex task because different applications may favor different types of primary index or perform better with different primary index columns. A table can have only one primary index, so you must select the one that best suits the majority of the applications the table serves. If the benefits justify the overhead costs, you can define multiple join indexes with different primary indexes.
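As a minimal sketch of that last option, assume a hypothetical base table sales_db.orders whose primary index is customer_id, and an application that instead looks rows up by order_id. The database, table, column, and index names here are illustrative assumptions, not objects from this documentation. A single-table join index can give that application its own primary index:

```sql
-- Assumption: sales_db.orders (order_id, customer_id, order_date, amount)
-- already exists with PRIMARY INDEX (customer_id).
-- The join index stores the same columns, hashed on order_id instead.
CREATE JOIN INDEX sales_db.orders_by_order_ji AS
  SELECT order_id, customer_id, order_date, amount
  FROM sales_db.orders
PRIMARY INDEX (order_id);
```

The system maintains the join index on every change to the base table, so this is worthwhile only when the lookup savings outweigh the storage and update costs.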

You can define additional indexes, such as secondary and join indexes, to facilitate individual applications. These indexes incur overhead costs, including:
  • Disk space required to store their subtables.
  • Degraded system performance whenever base table rows are updated, because the index values for any indexed columns affected by the update must also be updated.

Consider these tradeoffs when planning your indexes, then test to make sure that the assumptions that lead to your choices are correct. For example, if you design a primary index with even row distribution as your principal criterion, check that the actual distribution of table rows is even.
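One way to check distribution is to count rows per AMP using the hashing functions. The following is a sketch only: sales_db.orders and customer_id are hypothetical names standing in for your table and its primary index column.

```sql
-- Rows per AMP for a candidate or existing primary index column.
-- HASHROW computes the row hash, HASHBUCKET maps it to a hash bucket,
-- and HASHAMP maps the bucket to the AMP that would own the row.
SELECT HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_no,
       COUNT(*) AS row_count
FROM sales_db.orders
GROUP BY 1
ORDER BY row_count DESC;
```

If the largest row_count is far above the average, the column distributes rows unevenly. Because the expression depends only on the column values, the same query can evaluate a candidate column before you commit to it as the primary index.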

For applications that use range queries heavily, a partitioned primary index can be a better choice than a nonpartitioned primary index, because it provides efficient access both through the primary index columns and through a constraint on the partitioning columns. Confirm that the partitioning improves query performance by carefully examining EXPLAIN reports and collecting the appropriate statistics.
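The following sketch shows what this can look like, assuming a hypothetical orders table that is hashed on customer_id and row-partitioned by month on order_date. All object names and date ranges are illustrative assumptions.

```sql
-- Hypothetical table: rows are distributed by customer_id and
-- row-partitioned by month on order_date.
CREATE TABLE sales_db.orders (
  order_id    INTEGER NOT NULL,
  customer_id INTEGER NOT NULL,
  order_date  DATE NOT NULL,
  amount      DECIMAL(12,2)
)
PRIMARY INDEX (customer_id)
PARTITION BY RANGE_N(
  order_date BETWEEN DATE '2023-01-01' AND DATE '2024-12-31'
             EACH INTERVAL '1' MONTH
);

-- Range query constrained on the partitioning column.
-- Inspect the EXPLAIN text to see whether the retrieve step reads
-- a subset of partitions rather than all of them.
EXPLAIN
SELECT customer_id, SUM(amount) AS total_amount
FROM sales_db.orders
WHERE order_date BETWEEN DATE '2024-03-01' AND DATE '2024-03-31'
GROUP BY customer_id;
```

If the plan still scans all partitions, the partitioning expression may not match how the workload filters the data.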

Collect statistics on the PARTITION column and the partitioning columns.

The recommended practice for recollecting statistics is to set appropriate thresholds for recollection using the THRESHOLD options of the COLLECT STATISTICS statement. For details, see COLLECT STATISTICS (Optimizer Form).
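As a sketch, assuming a hypothetical table sales_db.orders that is partitioned on order_date, the statistics and thresholds might be specified as follows. The threshold values are placeholders to tune for your workload, not recommendations from this documentation.

```sql
-- Collect statistics on the system-derived PARTITION column and on the
-- partitioning column. With these thresholds, a later recollection request
-- is skipped unless the data has changed by at least 10 percent and the
-- existing statistics are at least 7 days old (placeholder values).
COLLECT STATISTICS
  USING THRESHOLD 10 PERCENT AND THRESHOLD 7 DAYS
  COLUMN (PARTITION),
  COLUMN (order_date)
ON sales_db.orders;
```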

Weigh the costs of the index against its benefits. This is particularly important if you have also defined a unique secondary index (USI) on the table, because the additional maintenance required to enforce uniqueness can reduce or even neutralize the overall performance advantage of the index.

Creating a partitioned table does not guarantee that the Optimizer generates a plan that uses row-partition elimination. The partitioning might not be used for any of the following reasons:
  • The partitioning is not applicable to the actual queries in the workload.
  • The Optimizer cost analysis for a query determines that another plan is less expensive.
  • The query does not conform to one or more of the restrictions on row-partition elimination.

A query plan with partitioning may not perform as well as one without partitioning.

Partitioning strategies:
  • Defining the partition expressions such that each row partition has approximately the same number of rows.

    This task is far easier for single-level PPIs than for multilevel PPIs, but it remains a goal to approximate as closely as possible.

  • Varying the number of rows per partition. For example, more frequently accessed data (such as for the current year) may be divided into finer partitions (such as weeks), but other data (such as previous years) may have coarser partitions (such as months or multiples of months).

    Partitioning in this manner can make altering the partitions more difficult.

  • Alternatively, defining each range with equal width, which may be important even if the number of rows per range varies.
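The varying-granularity strategy above can be sketched with a RANGE_N expression that lists more than one range. The table name, columns, and date boundaries are hypothetical: older data is partitioned by month and the current year by week.

```sql
-- Hypothetical table with mixed partition granularity:
-- 2022-2023 in monthly partitions, 2024 in 7-day partitions.
CREATE TABLE sales_db.orders_mixed (
  order_id    INTEGER NOT NULL,
  customer_id INTEGER NOT NULL,
  order_date  DATE NOT NULL,
  amount      DECIMAL(12,2)
)
PRIMARY INDEX (customer_id)
PARTITION BY RANGE_N(
  order_date BETWEEN DATE '2022-01-01' AND DATE '2023-12-31'
                     EACH INTERVAL '1' MONTH,
             DATE '2024-01-01' AND DATE '2024-12-31'
                     EACH INTERVAL '7' DAY
);
```

An equal-width design would instead use a single range with one EACH interval. The mixed form gives finer elimination on recent data, at the cost of more involved changes when the ranges have to be altered later.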

The most important factors for row partitioning are accessibility and maximization of row-partition elimination. Defining a primary index that distributes the rows of the table evenly across the AMPs (or having no primary index) is critical for efficient parallel processing.