15.10 - Primary Indexes - Teradata Database

Teradata Database SQL Fundamentals

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06

The primary index for a table controls the distribution and retrieval of the data for that table across the AMPs. Both distribution and retrieval of the data is controlled using the Teradata Database hashing algorithm (see “Row Hash and RowID” on page 28).

If the primary index is defined as a partitioned primary index (PPI), the data is partitioned, based on some set of columns, on each AMP, and ordered by the hash of the primary index columns within the partition.

Data accessed based on a primary index is always a one-AMP operation because a row and its index are stored on the same AMP. This is true whether the primary index is unique or nonunique, and whether it is partitioned or nonpartitioned.

In general, most Teradata Database tables require a primary index. (Some tables in the Data Dictionary do not have primary indexes and a global temporary trace table is not allowed to have a primary index.) To create a primary index, use the CREATE TABLE statement.

If you do not assign a primary index explicitly when you create a table, Teradata Database assigns a primary index, based on the following rules.

 

Primary Index

Primary
Key

Unique Column Constraint

Teradata Database selects the …

no

YES

no

primary key column set to be a UPI.

no

no

YES

first column or columns having a UNIQUE constraint to be a UPI.

no

YES

YES

primary key column set to be a UPI.

no

no

no

first column defined for the table to be a NUPI.

If the data type of the first column in the table is a LOB, then the CREATE TABLE operation aborts and the system returns an error message.

In general, the best practice is to specify a primary index instead of having Teradata Database select a default primary index.

When choosing the primary index for a table, there are two essential factors to keep in mind: uniform distribution of the data and optimal access.

With respect to uniform data distribution, consider the following factors:

  • The more distinct the primary index values, the better.
  • Rows having the same primary index value are distributed to the same AMP.
  • Parallel processing is more efficient when table rows are distributed evenly across the AMPs.
  • With respect to optimal data access, consider the following factors:

  • Choose the primary index on the most frequently used access path.
  • For example:

  • If rows are generally accessed by a range query, consider defining a PPI on the table that creates a useful set of partitions.
  • If the table is frequently joined with a specific set of tables, consider defining the primary index on the column set that is typically used as the join condition.
  • Primary index operations must provide the full primary index value.
  • Primary index retrievals on a single value are always one-AMP operations.
  • Although it is true that the columns you choose to be the primary index for a table are often the same columns that define the primary key, it is also true that primary indexes often comprise fields that are neither unique nor components of the primary key for the table.

    In addition to uniform distribution of data and optimal access considerations, other guidelines and performance considerations apply to selecting a unique or a nonunique column set as the primary index for a table.

    Other considerations can include:

  • Primary and other alternate key column sets
  • The value range seen when using predicates in a WHERE clause
  • Whether access can involve multiple rows or a spool file or both
  • For more information on criteria for selecting a primary index, see Database Design.

    The decision to define a single-level or multilevel Partitioned Primary Index (PPI) for a table depends on how its rows are most frequently accessed. PPIs are designed to optimize range queries while also providing efficient primary index join strategies and may be appropriate for other classes of queries. Performance of such queries is improved by accessing only the rows of the qualified partitions.

    A PPI increases query efficiency by avoiding full table scans without the overhead and maintenance costs of secondary indexes.

    The most important factors for PPIs are accessibility and maximization of partition elimination. In all cases, it is critical for parallel efficiency to define a primary index that distributes the rows of the table fairly evenly across the AMPs.

    For more information on partitioning considerations, including information on column partitioning, called Teradata Columnar, see Database Design, SQL Data Definition Language Detailed Topics, and SQL Data Manipulation Language.

    Restrictions apply to the columns you choose to be the primary index for a table. For partitioned primary indexes, further restrictions apply to the columns you choose to be partitioning columns. Here are some of the restrictions:

  • A primary index column or partitioning column cannot be a column that has a CLOB, BLOB, ST_Geometry, MBR, or Period data type.
  • Partitioning expressions for single-level PPIs may only use the following general forms:
  • Column (must be INTEGER or a data type that casts to INTEGER)
  • Expressions based on one or more columns, where the expression evaluates to INTEGER or a data type that casts to INTEGER
  • The CASE_N and RANGE_N functions
  • Partitioning expressions of a multilevel PPI may only specify CASE_N and RANGE_N functions.
  • You cannot compress columns that are members of the primary index column set or are partitioning columns.
  • Other restrictions apply to the partitioning expression for PPIs. For example:

  • Comparison of character data where the server character set is KANJI1 or KANJISJIS is not allowed.
  • The expression for the RANGE_N test value must evaluate to BYTEINT, SMALLINT, INTEGER, DATE, CHAR, VARCHAR, GRAPHIC or VARGRAPHIC.
  • Nondeterministic partitioning expressions are not allowed, including cases that might not report errors, such as casting TIMESTAMP to DATE, because of the potential for wrong results.
  • For details on all the restrictions that apply to primary indexes and partitioned primary indexes, see CREATE TABLE in SQL Data Definition Language.

  • Defined with the CREATE TABLE data definition statement.
  • CREATE INDEX is used only to create secondary indexes.

  • Modified with the ALTER TABLE data definition statement.
  • Some modifications, such as partitioning and primary index columns, require an empty table.

  • Automatically assigned by CREATE TABLE if you do not explicitly define a primary index. However, the best practice is to always specify the primary index, because the default may not be appropriate for the table.
  • Can be composed of as many as 64 columns.
  • A maximum of one can be defined per table.
  • Can be partitioned or nonpartitioned.
  • Partitioned primary indexes are not automatically assigned. You must explicitly define a partitioned primary index.

  • Can be unique or nonunique.
  • Note that a partitioned primary index can only be unique if all the partitioning columns are also included as primary index columns. If the primary index does not include all the partitioning columns, uniqueness on the primary index columns may be enforced with a unique secondary index on the same columns as the primary index.

  • Defined as nonunique if the primary index is not defined explicitly as unique or if the primary index is specified for a single column SET table.
  • Controls data distribution and retrieval using the Teradata Database hashing algorithm.
  • Improves performance when used correctly in the WHERE clause of an SQL data manipulation statement to perform the following actions.
  • Single-AMP retrievals
  • Joins between tables with identical primary indexes, the optimal scenario
  • Partition elimination when the primary index is partitioned
  • For more information on using primary indexes to enhance the performance of your databases, see Database Design.