16.10 - Unique Secondary Indexes - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

USIs are always preferable to NUSIs for row access using a single value, but might not be as efficient as NUSIs for range query access. The usual criterion for choosing between a USI and a NUSI is the data to which it is applied. Data to be indexed tends to be either inherently unique or inherently nonunique.

USIs are useful both for base table access (because USI access is, at worst, a two-AMP operation) and for enforcing data integrity by applying a uniqueness constraint on a column set. Like a unique primary index, a unique secondary index can be used to guarantee row uniqueness.

USIs can be assigned to any column you want to constrain to contain unique values, including row-level security constraint columns.

Using Unique Secondary Indexes to Enforce Row Uniqueness

When a non-primary index uniqueness constraint is created, whether it is a PRIMARY KEY or UNIQUE constraint, Teradata Database implements it as a USI.

As a general guideline, whenever you define a primary index for a multiset table to be a NUPI, particularly if the table is created in ANSI/ISO session mode (where the default for tables is multiset), you should consider defining one of the following uniqueness constraints on its primary key or other alternate key to facilitate row access and joins.

  • Unique secondary index
  • UNIQUE NOT NULL constraint
  • PRIMARY KEY NOT NULL constraint

    PRIMARY KEY and UNIQUE constraints are both mapped internally as USIs unless they are used to define the default UPI for a table. See Primary Index Defaults.

Of course, you should always consider adding such constraints, including unique join indexes (see Functions of Single-Table Join Indexes), to your tables when they facilitate row access or joins. This is particularly true for NoPI tables, because specifying a USI in a request is the only way to access a single row in a NoPI table (see NoPI Tables, Column-Partitioned NoPI Tables, and Column-Partitioned NoPI Join Indexes). This is true unless a NUSI happens to be defined that indexes only a single row.

All manner of database constraints are often useful for query optimization, and the richer the constraint set specified for a database, the more opportunities there are to enhance query optimization.

The likely benefits of adding uniqueness constraints are not restricted to multiset NUPI tables. It is also true that if you create a SET NUPI table, the system performs duplicate row checks by default unless you place a uniqueness constraint on the table. Unique constraint enforcement is often a less costly method of enforcing row uniqueness than system duplicate row checks.

Avoid defining a uniqueness constraint on the primary or alternate key of a multiset NUPI table solely to enforce row uniqueness because MultiLoad and FastLoad do not support target tables with non-primary index uniqueness constraints. You can avoid MultiLoad and FastLoad problems associated with indexes by loading data into a table that is otherwise identical with the target table, but has no constraints or non-primary indexes defined on it. After loading the data into that table, you can then INSERT … SELECT it into the target table that has the desired constraints and indexes defined on it.

Furthermore, USIs impose a performance cost because their index subtables must be maintained by the system as rows are inserted, deleted, and each time the column set in the base table they reference is updated.

USI Access

USI access is usually a two-AMP operation because Teradata Database typically distributes a USI row to a different AMP than the base table row the index points to. If the system distributes the USI subtable row to the same AMP as the base table row it points to, then only one AMP is accessed (but it is still a two-step operation).

The following stages are involved in a USI base table row access.

  • The requested USI value is accessed by hashing to its subtable.
  • The pointer to the base table row is read and used to access the stored row directly.

The flow diagram illustrates the following query. Note that table_name is an NPPI table.

     SELECT *
     FROM table_name
     WHERE USI_column = value;
 


The process for locating a row using a USI is as follows.

  1. After checking the syntax and lexicon of the query, the Parser looks up the Table ID for the USI subtable that contains the specified USI value.
  2. The hashing algorithm hashes the USI value.
  3. The Generator creates an AMP step message containing the USI Table ID, USI row hash value, and USI data value.
  4. The Dispatcher uses the USI row hash to send the message across the BYNET to AMP 3, which contains the appropriate USI subtable row.
  5. The file system on AMP 3 locates the appropriate USI subtable using the USI Table ID.
  6. The file system on AMP 3 uses the USI rowID to locate the appropriate index row in the subtable.

    This operation might require a search through a number of rows with the same row hash value before the row with the desired value is located.

  7. AMP 3 reads the base table rowID from the USI row and distributes a a message containing the base table ID and the rowID for the requested row across the BYNET to AMP 10, which contains the requested base table row.

    The distribution is based on the hash bucket value in the rowID of the base table row.

  8. The file system uses the rowID to locate the base table row.

Example: Single-Row Lookup

The following example performs a single-row lookup. The column named cust_num is a USI for the customer table, which is an NPPI table.

     SELECT name, phone
     FROM customer
     WHERE cust = 3;

The example customer table and USI subtable are as follows.

customer table and USI subtable
 
USI Subtable   Customer
RowID CustNum BaseTable_RowID RowID CustNum CustName CustPhone
PK
USI USI
DQ + 1 1 H6 + 1 B4 + 1 3 Brown 444-3333
VP + 1 5 J5 + 1 A2 + 1 7 Black 333-4444
R9 + 2 3 B4 + 1 N6 + 1 13 Rice 888-9999
R9 + 1 13 N6 + 1 E3 + 1 2 James 555-4444
3J + 1 2 E3 + 1 E3 + 3 14 Brown 555-4444
22 + 1 14 E3 + 3 L2 + 1 10 Smith 222-9999
3S + 1 9 D7 + 2 J5 + 1 5 Smith 444-6666

The secondary index value (cust_num) is hashed to generate rowID R9 (note that +2 represents the uniqueness number). The AMP retrieves row R9+2 from the secondary index subtable. The subtable row contains the rowID of the base table row, which can then be accessed.

Unique Secondary Indexes and Performance

USIs provide alternate access paths.

Statistics play an important part in optimizing access when USIs define conditions for the following operations.

  • Joining tables
  • Satisfying WHERE predicates that specify comparisons, string matching, or complex conditionals
  • Satisfying LIKE expressions
  • Processing aggregates

Because of the additional overhead for index maintenance, USI values should not change frequently. When you change the value of a secondary index, Teradata Database must undertake the following maintenance operations.

  1. Delete secondary index references to the current value.
  2. Generate secondary index references to the new value.

The same considerations apply to NUSIs.

Creating a Unique Secondary Index as a Composite of a Row-Level Security Constraint Column and a NUPI Column Set

You can create a USI for a row-level security-protected table as a composite of a row-level security constraint column and the columns of a NUPI for the table. This property can be used to implement polyinstantiation.

Polyinstantiation is a property that allows a relation to contain multiple rows with the same primary key value, where the multiple instances are distinguished by their security levels, where a security level is defined by a row-level security constraint column.

For this property not to violate the relational model, the security level instances would need to be defined as components of a composite primary key.

Restrictions on Load Utilities

You cannot use FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE to load data into base tables that have unique secondary indexes. If you attempt to load data into base tables with USIs using these utilities, the load operation aborts and returns an error message to the requestor.

Before you can load data into a USI-indexed base table, you must first drop all defined USIs before you can run FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE.

Load utilities like Teradata Parallel Data Pump, BTEQ, and the Teradata Parallel Transporter operators INSERT and STREAM, which perform standard SQL row inserts and updates, are supported for USI-indexed tables.

You cannot drop a USI to enable batch data loading by utilities such as MultiLoad and FastLoad as long as requests are running that use that index. Each such query places an ACCESS or READ lock on the index subtable while it is running, so it blocks the completion of any DROP INDEX transactions until the ACCESS or READ lock is removed. Furthermore, as long as a DROP INDEX transaction is waiting to get an EXCLUSIVE lock or is running, requests and batch data loading jobs against the underlying table of the index cannot begin processing.

ALWAYS GENERATED … NO CYCLE identity columns can be a better choice than USIs for the task of enforcing row uniqueness in multiset NUPI tables because they are supported by both MultiLoad and FastLoad. However, identity columns cannot be used to facilitate row access or joins.

If you define any non-primary index uniqueness constraints on a table, you must drop them all before you use MultiLoad or FastLoad to load rows into that table, then recreate them after the load operation completes (see Using Unique Secondary Indexes to Enforce Row Uniqueness for a workaround).

If the MultiLoad or FastLoad operation loads any duplicate rows into the table, then you cannot recreate a uniqueness constraint on the table. You must first detect the duplicates and then remove them from the table.

Application-based methods of duplicate row management make several assumptions that are difficult to enforce. These assumptions are.

  • The application code used to detect and reject duplicate rows is implemented identically across all applications in the enterprise.
  • The application code used to detect and reject duplicate rows, even if universally implemented, is correct for all situations.
  • All updates to the database are made by means of those applications. This assumption neglects the possibility of ad hoc interactive SQL updates that bypass the application-based duplicate row detection and rejection code.

See Designing for Database Integrity for more information about the advantages of using system-based declarative constraints to enforce database integrity.

See Design Issues for Tactical Queries for a description of the special design considerations that must be evaluated for using USIs to support tactical queries.

USI Hashing

USIs are hash-partitioned on their index columns, as indicated by the following graphic.



USI Subtable Row Structure

See Row Structure for Packed64 Systems and Row Structure for Aligned Row Format Systems for additional information about the row structures of secondary indexes.

The internal partition number for a secondary index row is always 0, which is indicated by 2 flag bit settings.

The table header for the base table indicates whether it is partitioned or not and, if it is partitioned, whether it has 2-byte or 8-byte partitioning.

Teradata Database creates additional USI index subtable rows as they are needed.

Row hash and uniqueness may actually represent other values for NoPI, PA, and value-ordered objects.

Packed64 Format USI Subtable Row Structure for a Nonpartitioned Load-Isolated Base Table

The USI subtable row structure for a nonpartitioned load-isolated base table is described by the following graphic.



The internal partition number for the base table row is implicitly 0 and is not stored.

This graphic shows a load-isolated table. Tables without load isolation do not have the RowLoadID.

Packed64 Format USI Subtable Row Structure for a Partitioned Load-Isolated Base Table

The USI subtable row structure for a partitioned load-isolated base table is described by the following graphic.



Because the base table is a partitioned table, the Internal Partition Number field stores the internal partition number for the corresponding base table row in a physical 2-byte or 8-byte field.

This graphic shows a load-isolated table. Tables without load isolation do not have the RowLoadID.

Aligned Row Format USI Subtable Row Structure for a Nonpartitioned Load-Isolated Base Table

The USI subtable row structure for a nonpartitioned load-isolated base table is described by the following graphic.



Because the internal partition number for nonpartitioned base table rows is always 0, the Internal Partition Number field pictured here is logical only, which is why it does not consume 2 bytes or 8-bytes of overhead for the row like USI rows do for partitioned base tables.

This graphic shows a load-isolated table. Tables without load isolation do not have the RowLoadID.

Aligned Row Format USI Subtable Row Structure for a Partitioned Load-Isolated Base Table

The USI subtable row structure for an index on a partitioned load-isolated base table is described by the following graphic.



Because the base table is a partitioned table, the Partition Number field stores the internal partition number for the corresponding base table row in a physical 2-byte or 8-byte field at the beginning of the base table RowID field.

This graphic shows a load-isolated table. Tables without load isolation do not have the RowLoadID.

USI Row Structure Field Definitions

Stored Data Length (bytes) Function
Row length 2 Defines the number of bytes in the row.

If the row is aligned, this field includes any required pad bytes necessary to make the row length a multiple of 8 bytes. If the row is packed or packed 64, the row length does not include any extra pad bytes. Note that in this case, when space is allocated for the row, value is rounded up to a multiple of 2 bytes.

RowID 8 Defines the USI row uniquely for its subtable by combining its row hash value with a uniqueness value.
Row hash 4 Defines the output of the hashing algorithm, which is a unique (or nearly unique) value based on a mathematical transformation of the unique secondary index value.
Uniqueness value 4 Defines a system-generated integer that ensures that the rowID is unique within a table.
Overhead (flag byte and 1st presence byte) 2 2 single-bit flag fields (set to zero) indicate the internal partition number is 0 for the USI row.
Secondary index value up to 65,524 Defines the column values for the unique secondary index.
Alignment pad bytes between 0 and 7 bytes Ensures that the Base Table Row ID field begins on a modulo(8) boundary. If the Base Table Row IDs field naturally aligns on a modulo 8 boundary, there is no field of alignment pad bytes at the end of the Secondary Index Value field.
Base table rowID
  • 8 bytes for an nonpartitioned base table row.
  • 10-16 bytes for a partitioned base table row.
Defines the rowID of the base table row this secondary index row identifies. This row is usually on a different AMP from its unique secondary index rowID.

Because the rowID of a partitioned table row also contains the internal partition number for the row, it is 2 or 8 bytes longer than the rowID of a nonpartitioned table row.

 Internal partition number
  • 0 for an nonpartitioned base table row.
  • 2 for a partitioned base table row with 2-byte partitioning.
  • 8 for a partitioned base table row with 8-byte partitioning.
Defines the partition number for a partitioned base table row.

If the base table row is from an nonpartitioned base table, the internal partition number is implicitly 0 and is not stored.

The table header for the base table indicates whether it is partitioned or not and, if it is partitioned, whether it has 2-byte or 8-byte partitioning.

     Row hash 4 Defines the output of the hashing algorithm on the primary index value.
    Uniqueness value 4 Defines a system-generated integer that ensures that the rowID is unique within a table.
    RowLoadID Value 8 An integer that records the committed property of a row in a load-isolated table.
Trailing pad bytes 0-7 pad bytes The trailing bytes are used to round up the row length of an aligned row to an 8 byte multiple. The trailing pad bytes are included in the row-length field (the 1st two bytes of the row). If the entire row naturally aligns on a modulo(8) boundary, there is no field of trailing alignment pad bytes

For a NoPI or column-partitioned table, the 8 bytes of row hash and uniqueness value are treated as a hash bucket, which has either 16 or 20 bits, and as a 44-bit uniqueness value. If the hash bucket is a 16-bit bucket, the 4 bits between the 16-bit hash bucket and the 44-bit uniqueness value are not used.

Using Unique Secondary Index Maintenance and Rollback to Optimize Query Design

Teradata Database processes USI maintenance operations (INSERT … SELECT, full-file DELETE, join DELETE, and UPDATE) block-at-a-time rather than row-at-a-time, whenever possible.

When the original index maintenance is processed block-at-a-time, the USI change rows are transient journaled block-at-a-time. As a result, the rollback of the USI change rows are block-at-a-time, that is, block optimized.

USI change rows are redistributed to the owner AMP, sorted, and applied block-at-a-time to the USI subtable, such that the index data blocks are updated once rather than multiple times.