16.10 - Selecting a Secondary Index - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

When assigning columns to be a secondary index for a table, there are numerous factors to consider, the most important of all being the selectivity of the index.

While USI retrievals are always very efficient, the efficiency of NUSI retrievals varies greatly depending on their selectivity.

You can use the Teradata Index Wizard client utility to assist your selection of secondary and single-table join indexes for a table. The utility recommends a set of indexes based on specific query workloads you supply. The Teradata Index Wizard can also validate its recommendations to ensure they actually provide the benefits expected. See Teradata Index Wizard User Guide for further information about how to use the utility and SQL Request and Transaction Processing for information about how the utility performs its component tasks.

Optimal Data Access

Selectivity is a relative term that refers to the number of rows returned by an index. Most retrievals aim to return only a select few rows: very specific answers in response to a very specific request.

An index that returns a small number of rows is said to be highly selective. This is a positive attribute.

Indexes that return a large number of rows are said to have low selectivity. This is generally a negative attribute; so negative that, as often as not, the Optimizer selects a full-table scan over a NUSI with low selectivity because the full-table scan can be less costly.

All UPIs and USIs are highly selective by definition, as are most well-chosen NUPIs. High selectivity is favored not only because of its precision, but also because of its low cost, involving a very small number of disk I/Os, which is always a performance-enhancing attribute.

Criteria for Selecting a Secondary Index

The following rules of thumb and performance considerations apply to selecting a unique or nonunique column set as a secondary index for a table.

  • Consider naming secondary indexes whenever possible using a standard naming convention.
  • Avoid assigning secondary indexes to frequently updated column sets.
  • Avoid assigning secondary indexes to columns with lumpy distributions because there is a slight chance the Optimizer might mistake their usefulness.
  • Avoid creating excessive secondary indexes on a table, particularly for a table used heavily, or even moderately, for OLTP processing. The less frequently the table is updated, the more desirable a multiple index solution.
  • Consider building secondary indexes on column sets frequently involved in the following clauses, predicates, and other logical operations:
    • Selection criteria
    • Join criteria
    • ORDER BY clauses
    • GROUP BY clauses
    • Foreign keys (because of join and subquery processing)
    • UNION, DISTINCT, and other sort operations

      When these operations act on well-indexed column sets, the number of scans and sorts that must be performed on the data by the database manager can be greatly reduced.

  • Consider creating USIs for NoPI tables that require frequent single-row access because the only alternative is a full-table scan.
  • Consider creating NUSIs for NoPI tables that require frequent set selection access because the only alternative is a full-table scan.
  • Consider creating a simple NUSI on geospatial columns that are frequently queried. This is especially true for requests that contain geospatial predicate terms, geospatial join terms, or both.

    Note the following about geospatial secondary indexes:

    • You cannot create a USI on a geospatial column.
    • You cannot create a composite geospatial NUSI. Geospatial indexes can only be defined on a single geospatial column.
  • Consider creating covering indexes when possible and cost effective (including considering the cost of maintaining the index). The Optimizer frequently selects covering indexes to substitute for a base table access whenever the overall cost of the query plan is reduced. Such index-only access promotes faster retrievals.

    Many applications are well served by join indexes, which can be used profitably in many covering situations where multiple columns are frequently joined. See Join and Hash Indexes for further information about join indexes.

  • Consider creating secondary indexes on columns frequently operated on by built-in functions such as aggregates.
  • Consider assigning a uniqueness constraint such as PRIMARY KEY, UNIQUE, or USI, as appropriate, to the primary or other alternate key of any table built with a NUPI. This both enforces uniqueness, eliminating the burden of making row uniqueness checks, and enhances retrieval for applications where the primary or other alternate key is frequently used as a selection or join criterion.

    This guideline is situational and is contingent on a number of factors. The various factors involved in the recommendation are described in Using Unique Secondary Indexes to Enforce Row Uniqueness.

    A primary or alternate key USI might not be a good decision for a table that is frequently updated by OLTP applications.

  • Plan to dynamically drop and recreate secondary indexes to accommodate specific processing and performance requirements such as bulk data loading utilities, database archives, and so on.

    Create appropriate macros to perform these drop and create index operations if you need to undertake such specific processing tasks regularly.

  • Ensure that your indexes are being used as planned by submitting EXPLAIN request modifiers to audit index selection for those queries they are designed to facilitate.

    Indexes that are never selected by the Optimizer are a burden to the system for the following reasons.

    • They consume disk resources that could profitably be used to store data or indexes that are used.
    • They degrade update processing performance unnecessarily.
  • You can include UDT columns in a secondary index definition.
  • Never attempt to include columns defined with an XML, BLOB, CLOB, BLOB-based UDT, CLOB-based UDT, XML-based UDT, Period, JSON, ARRAY, or VARRAY data type in a secondary index definition.

    You can define a simple NUSI, but not a composite NUSI, on a geospatial column.

    You cannot include a column defined with a geospatial data type in a USI definition.

  • Never attempt to define a secondary index on a global temporary trace table. See SQL Data Definition Language.