15.00 - Indexes - Teradata Database

Teradata Database SQL Fundamentals

prodname
Teradata Database
vrm_release
15.00
category
Programming Reference
featnum
B035-1141-015K

Indexes

An index is a mechanism that the SQL query optimizer can use to make table access more performant. Indexes enhance data access by providing a more-or-less direct path to stored data to avoid performing full table scans to locate the small number of rows you typically want to retrieve or update.

The Teradata Database parallel architecture makes indexing an aid to better performance, not a crutch necessary to ensure adequate performance. Full table scans are not something to be feared in the Teradata Database environment. This means that the sorts of unplanned, ad hoc queries that characterize the data warehouse process, and that often are not supported by indexes, perform very effectively for Teradata Database using full table scans.

Classic Indexes and Teradata Database Indexes

The classic index for a relational database is itself a file made up of rows having these parts:

  • A (possibly unique) data field in the referenced table.
  • A pointer to the location of that row in the base table (if the index is unique) or a pointer to all possible locations of rows with that data field value (if the index is nonunique).
  • Because Teradata Database is a massively parallel architecture, it requires a more efficient means of distributing and retrieving its data. One such method is hashing. All Teradata Database indexes are based on row hash values rather than raw table column values, even though secondary, hash, and join indexes can be stored in order of their values to make them more useful for satisfying range conditions.

    Selectivity of Indexes

    An index that retrieves many rows is said to have weak selectivity.

    An index that retrieves few rows is said to be strongly selective.

    The more strongly selective an index is, the more useful it is. In some cases, it is possible to link together several weakly selective nonunique secondary indexes by bit mapping them. The result is effectively a strongly selective index and a dramatic reduction in the number of table rows that must be accessed.

    For more information on linking weakly selective secondary indexes into a strongly selective unit using bit mapping, see “NUSI Bit Mapping” on page 39.

    Row Hash and RowID

    Teradata Database table rows are self-indexing with respect to their primary index and so require no additional storage space. When a row is inserted into a table, Teradata Database stores the 32-bit row hash value of the primary index with it.

    Because row hash values are not necessarily unique, Teradata Database also generates a unique 32-bit numeric value (called the Uniqueness Value) that it appends to the row hash value, forming a unique RowID. This RowID makes each row in a table uniquely identifiable and ensures that hash collisions do not occur.

    If a table is defined with a partitioned primary index (PPI), the RowID also includes the combined partition number to which the row is assigned, where the combined partition number is derived from the partition numbers for each level of the PPI. For more information on PPIs, see “Partitioned and Nonpartitioned Primary Indexes” on page 31.

     

    For tables with no PPI …

    For tables with a PPI …

    The first row having a specific row hash value is always assigned a uniqueness value of 1, which becomes the current high uniqueness value. Each time another row having the same row hash value is inserted, the current high value increments by 1, and that value is assigned to the row.

    The first row having a specific combined partition number and row hash value is always assigned a uniqueness value of 1, which becomes the highest current uniqueness value. Each time another row having the same combined partition number and row hash value is inserted, the current high value increments by 1, and that value is assigned to the row.

    Table rows having the same row hash value are stored on disk sorted in the ascending order of RowID.

    Uniqueness values are not reused except for the special case in which the highest valued row within a row hash is deleted from a table.

    Table rows having the same combined partition number and row hash value are stored on disk sorted in the ascending order of RowID.

    Uniqueness values are not reused except for the special case in which the highest valued row within a combined partition number and row hash is deleted from a table.

    A RowID for a row might change, for instance, when a primary index or partitioning column is changed, or when there is complex update of the table.

    Index Hash Mapping

    Rows are distributed across the AMPS using a hashing algorithm that computes a row hash value based on the primary index. The row hash is a 32-bit value. Depending on the system setting for the hash bucket size, either the higher-order 16 bits or the higher-order 20 bits of a hash value determine an associated hash bucket.

    Normally, the hash bucket size is 20 bits for new systems. If you are upgrading from an older release, the hash bucket size may be 16 bits. If a 20-bit hash bucket size is more appropriate for the size of a system, you can use the DBS Control Utility to change the system setting for the hash bucket size. For details, see “DBS Control utility” in Utilities: Volume 1 (A-K).

     

    IF the hash bucket size is …

    THEN the number of hash buckets is …

    16 bits

    65536.

    20 bits

    1048576.

    The hash buckets are distributed as evenly as possible among the AMPs on a system.

    Teradata Database maintains a hash map—an index of which hash buckets live on which AMPs—that it uses to determine whether rows belong to an AMP based on their row hash values. Row assignment is performed in a manner that ensures as equal a distribution as possible among all the AMPs on a system.

    Advantages of Indexes

    The intent of indexes is to lessen the time it takes to retrieve rows from a database. The faster the retrieval, the better.

    Disadvantages of Indexes

  • They must be updated every time a row is updated, deleted, or added to a table.
  • This is only a consideration for indexes other than the primary index in the Teradata Database environment. The more indexes you have defined for a table, the bigger the potential update downside becomes.

    Because of this, secondary, join, and hash indexes are rarely appropriate for OLTP situations.

  • All Teradata Database secondary indexes are stored in subtables, and join and hash indexes are stored in separate tables, exerting a burden on system storage space.
  • When FALLBACK is defined for a table, a further storage space burden is created because secondary index subtables are always duplicated whenever FALLBACK is defined for a table. An additional burden on system storage space is exerted when FALLBACK is defined for join indexes or hash indexes or both.
  • For this reason, it is extremely important to use the EXPLAIN modifier to determine optimum data manipulation statement syntax and index usage before putting statements and indexes to work in a production environment. For more information on EXPLAIN, see SQL Data Manipulation Language.

    Teradata Database Index Types

  • Primary index
  • In general, all Teradata Database tables require a primary index because the system distributes tables on their primary indexes. Primary indexes can be:

  • Unique or nonunique
  • Partitioned or nonpartitioned
  • Secondary index
  • Secondary indexes can be unique or nonunique.

  • Join index (JI)
  • Hash index
  • Unique Indexes

    A unique index, like a primary key, has a unique value for each row in a table.

    Teradata Database defines two different unique indexes:

  • Unique primary index (UPI)
  • UPIs provide optimal data distribution and are typically assigned to the primary key for a table. When a NUPI makes better sense for a table, then the primary key is frequently assigned to be a USI.

  • Unique secondary index (USI)
  • USIs guarantee that each complete index value is unique, while ensuring that data access based on it is always a two-AMP operation.

    Nonunique Indexes

    A nonunique index does not require its values to be unique. There are occasions when a nonunique index is the best choice as the primary index for a table.

    NUSIs are also very useful for many decision support situations.

    Partitioned and Nonpartitioned Primary Indexes

    Primary indexes can be partitioned or nonpartitioned.

    A nonpartitioned primary index (NPPI) is the traditional primary index by which rows are assigned to AMPs.

    A partitioned primary index (PPI) allows rows to be partitioned, based on some set of columns, on the AMP to which they are distributed, and ordered by the hash of the primary index columns within the partition.

    A PPI can improve query performance through partition elimination. A PPI provides a useful alternative to an NPPI for executing range queries against a table, while still providing efficient access, join, and aggregation strategies on the primary index.

    A multilevel PPI allows each partition at a level to be subpartitioned based on a partitioning expression, where the maximum number of levels is 15.

    A multilevel PPI provides multiple access paths to the rows in the base table and can improve query performance through partition elimination at each of the various levels or combination of levels.

    A PPI can only be defined as unique if all the partitioning columns are included in the set of primary index columns.

    Join Indexes

    A join index is an indexing structure containing columns from one or more base tables and is generally used to resolve queries and eliminate the need to access and join the base tables it represents.

    Teradata Database join indexes can be defined in the following general ways.

  • Simple or aggregate
  • Single-table or multitable
  • Hash-ordered or value‑ordered
  • Complete or sparse
  • For details, see “Join Indexes” on page 31.

    Hash Indexes

    Hash indexes are used for the same purposes as are single-table join indexes, and are less complicated to define. However, a join index offers more choices.

    For additional information, see “Hash Indexes” on page 32.

    Creating Indexes For a Table

    Use the CREATE TABLE statement to define a primary index and one or more secondary indexes. (You can also define secondary indexes using the CREATE INDEX statement.) You can define the primary index (and any secondary index) as unique, depending on whether duplicate values are to be allowed in the indexed column set. A partitioned primary index cannot be defined as unique if one or more partitioning columns are not included in the primary index.

    To create hash or join indexes, use the CREATE HASH INDEX and CREATE JOIN INDEX statements, respectively.

    Determining the Usefulness of Indexes

    The selection of indexes to support a query is not under user control. You cannot provide the Teradata Database query optimizer with pragmas or hints, nor can you specify resource options or control index locking.

    The only references made to indexes in the SQL language concern their definition and not their use. Teradata SQL data manipulation language statements do not provide for any specification of indexes.

    There are several implications of this behavior.

  • To ensure that the optimizer has access to current information about how to best optimize any query or update made to the database, you must use the COLLECT STATISTICS statement to collect statistics regularly on all indexed columns, all frequently joined columns, and all columns frequently specified in query predicates.
  • To ensure that your queries access your data in the most efficient manner possible:
  • Use the EXPLAIN request modifier or the Teradata Visual Explain client utility to try out various candidate queries or updates and to note which indexes are used by the optimizer in their execution (if any) as well as to examine the relative cost of the operation.
  • Use the Teradata Index Wizard client utility to recommend and validate sets of secondary indexes, single-table join indexes, and PPIs for a given query workload.
  •  

    For more information on …

    See …

    using the EXPLAIN request modifier

    SQL Data Manipulation Language

    using the Teradata Visual Explain client utility

    Teradata Visual Explain User Guide

    additional performance-related information about how to use the access and join plan reports produced by EXPLAIN to optimize the performance of your databases

    Database Design

    using the Teradata Index Wizard client utility

  • Teradata Index Wizard User Guide
  • SQL Request and Transaction Processing
  • collecting and maintaining accurate database statistics

    “COLLECT STATISTICS” in SQL Data Definition Language