16.10 - Nonunique Secondary Indexes - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

Nonunique secondary indexes are typically assigned to nonunique column sets that frequently appear in WHERE clause selection conditions, join conditions, ORDER BY and GROUP BY clauses, foreign keys, and miscellaneous other conditions such as UNION, DISTINCT, and any attribute that is frequently sorted.

You can define a NUSI on a row-level security constraint column.

You can also define a simple NUSI, but not a composite NUSI, on a geospatial column.

Highly selective NUSIs are useful for reducing the cost of frequently made selections and joins on nonunique columns, and provide extremely fast access for equality conditions. This is particularly true for NoPI tables (see NoPI Tables, Column-Partitioned NoPI Tables, and Column-Partitioned NoPI Join Indexes), where the only other access method might be a full-table scan. Note that NUSIs with low selectivity can be less efficient than a full-table scan.

NUSIs are also useful for range access and in-list conditions and for geospatial indexes.

Also note the following about NUSIs:

  • NUSI access is always an all-AMPs operation unless the index is defined on the same columns as the primary index. This is allowed when the NUSI is value-ordered or when the table or join index is partitioned and not all the partitioning columns are included in the primary index.
  • The subtables must be scanned in order to locate the relevant pointers to base table rows. This is a fast lookup process when a NUSI is specified in an equality or range condition because the NUSI rows are either hash-ordered or value-ordered on each AMP.
  • NUSI subtables are not covered by the active read fallback feature (see Physical Database Integrity for details).

Relationship Between a NUSI Subtable Row and Base Table Rows

A particular NUSI subtable row points to one or many base table rows on that same AMP. The relationship between a NUSI value and any individual AMP in a configuration is either 0:1, 1:1, or 1:M.

This relationship … Reflects the fact that …
0:1 an AMP contains no NUSI subtable index rows for a particular NUSI value.
1:1 an AMPs contains 1 NUSI subtable index row for a particular NUSI value.
1:M an AMP contains more than 1 NUSI subtable index row for a particular NUSI value.

NUSI Row Structure

The subtable structure of a non-geospatial NUSI row is described by the following graphics. See Row Structure for Packed64 Systems and Row Structure for Aligned Row Format Systems for additional information.

The number of base table rows that can be referenced in a NUSI row is limited by the maximum row size for the system and the length of the secondary index value. See NUSI Sizing Equation for more information about sizing NUSI subtables.

Teradata Database creates additional NUSI index subtable rows as they are needed.

For geospatial indexes, the Secondary Index Value field contains the Minimum Bounding Rectangle (MBR) for the object being indexed. Unlike other Teradata Database indexes, geospatial NUSIs are maintained in a Hilbert R-tree structure that sits on top of the Teradata file system rather than as rows in a relational index subtable.

Scalar NUSIs contain index key values that are equal to their base table counterparts, while geospatial NUSIs contain index keys that are only an approximation of their base table counterparts (other than case of characters if the ALL option is not specified). Leaf rows in the Teradata implementation for geospatial R-trees have the form MBR (index key), base table row ID list.

For more information about Hilbert R-trees and geospatial NUSIs, see SQL Request and Transaction Processing.

Packed64 Format NUSI Subtable Row Structure for a Nonpartitioned Base Table

The NUSI subtable row structure for an index on an nonpartitioned base table is described by the following graphic. The figure shows a row in a table without load isolation. If the table is load isolated and NUSI is defined WITH LOAD IDENTITY, an 8-byte RowLoadID is added to the Base Table Row ID.

On a load-isolated table, a NUSI without a RowLoadID is possible if the user indicates WITH NO LOAD IDENTITY explicitly in the index definition.


Packed64 Format NUSI Subtable Row Structure for a Partitioned Base Table

The NUSI subtable row structure for an index on a partitioned base table is described by the following graphic. The figure shows a row in a table without load isolation. If the table is load isolated and NUSI is defined WITH LOAD IDENTITY, an 8-byte RowLoadID is added to the Base Table Row ID.

On a load-isolated table, a NUSI without a RowLoadID is possible if the user indicates WITH NO LOAD IDENTITY explicitly in the index definition.


Aligned Row Format NUSI Subtable Row Structure for a Nonpartitioned Base Table

The NUSI subtable row structure for an index on an nonpartitioned base table is described by the following graphic. The figure shows a row in a table without load isolation. If the table is load isolated and NUSI is defined WITH LOAD IDENTITY, an 8-byte RowLoadID is added to the Base Table Row ID.



Aligned Row Format NUSI Subtable Row Structure for a Partitioned Base Table

The NUSI subtable row structure for an index on a partitioned base table is described by the following graphic. The figure shows a row in a table without load isolation. If the table is load isolated and NUSI is defined WITH LOAD IDENTITY, an 8-byte RowLoadID is added to the Base Table Row ID.



Each rowID in the base table rowID list consumes an additional 2 or 8 bytes over rowIDs in NUSI subtable rows for an nonpartitioned base table or join index. This is because either 2 or 8 additional bytes are required to store the internal partition number of the base table rowID.

Definitions for NUSI Row Structure Fields

Stored Data Length (bytes) Function
Row length 2 Defines the number of bytes in the row.

If the row is aligned, this field includes any required pad bytes necessary to make the row length a multiple of 8 bytes. If the row is packed or packed 64, the row length does not include any extra pad bytes. Note that in this case, when space is allocated for the row, value is rounded up to a multiple of 2 bytes.

RowID 8 Defines the NUSI row uniquely for its subtable by combining its row hash value with a uniqueness value.
Row hash 4 Defines the output of the hashing algorithm, which is a unique (or nearly unique) value based on a mathematical transformation of the nonunique secondary index value.
Uniqueness value 4 Defines a unique system-generated integer that ensures that the rowID is unique within a table.
Overhead 2 2 single-bit flag fields indicate whether the internal partition number is 0.
Secondary index value up to (65,524) - (table rowID list bytes) Defines the column values for the nonunique secondary index.
Alignment pad bytes between 0 and 7 bytes Ensures that the Base Table Row ID field begins on a modulo(8) boundary. If the Base Table Row IDs field naturally aligns on a modulo 8 boundary, there is no field of alignment pad bytes at the end of the Secondary Index Value field.
Base table rowID list
  • 8 per nonpartitioned base table row.
  • 10 per 2-byte partitioned base table row.
  • 16 per 8-byte partitioned base table row.
Defines the row IDs for the rows on the same AMP to which this nonunique index points.
Trailing pad bytes 0-7 pad bytes The trailing bytes are used to round up the row length of an aligned row to an 8 byte multiple. The trailing pad bytes are included in the row-length field (the 1st two bytes of the row). If the entire row naturally aligns on a modulo(8) boundary, there is no field of trailing alignment pad bytes.

When the system adds new base table Row IDs to a NUSI Row ID list, it continues to write from the end of the row. This means that it overwrites the previously existing trailing alignment pad bytes with the newly added entries to the list of Row IDs. Note that because the two alignment pads serve independent requirements, there might be pad bytes at the beginning of the row ID list as well as at the end of row, but there are never any pad bytes in the middle of the Row ID list.

NUSI Access and Performance

NUSI access specifies a three-part BYNET message that is identical to the three-part message used for primary index access except that the subtable ID in the message references the NUSI subtable rather than the base table.

NUSI requests are all-AMP requests unless the NUSI is defined on the same columns as primary index.

The usefulness of a NUSI is correlated with the number of rows per value: the higher number of rows per value, the less useful the index. If the number of rows for a NUSI value exceeds the number of data blocks in the table, the usefulness of the NUSI might be questionable. On the other hand, as NUSI values approach uniqueness (meaning that the number of rows per value is either close to 1 or is significantly less than the number of AMPs in the system), an all-AMPs table access is wasteful and you should consider defining a join index (see Join and Hash Indexes) to support DML requests against the table instead of a NUSI.

Because NUSI access is usually an all-AMPs operation, NUSIs may seem to have limited value. If you have to access all AMPs in the configuration to locate the requested rows, why bother with an index?

The answer to that question is provided by the following list of ways that NUSIs can improve the performance of your decision support queries:

  • NUSI access is often faster than a full-table scan, particularly for extremely large tables. A full-table scan is also an all-AMP operation.
  • A NUSI that covers (see NUSIs and Query Covering for a definition of covering) the columns requested by a query is often included in Optimizer access plans.
  • A NUSI that covers a LIKE expression or any selective inequality conditions is often included in Optimizer access plans.
  • A NUSI on the same columns as the primary index (this is only allowed when the primary index does not include all the columns of all the partitioning columns) may be more efficient than accessing using the primary index when there is no or limited partition elimination for a query.

While NUSI access is usually an all-AMPs operation, keep in mind that the AMPs work in parallel. If all the AMPs have qualified rows, then this is a very efficient operation. If some or many of the AMPs do not have qualified rows, then those AMPs are doing work just to determine that they have no qualified rows. Note that if there are more rows per NUSI value than AMPs, it is likely that every AMP will have one or more qualified rows.

At the same time, as the number of rows pointed to per NUSI value increases, the efficiency of a NUSI read decreases proportionately. Depending on demographics and environmental cost variables, the Optimizer will specify a full-table scan instead of a NUSI access when it determines that the scan would be a more efficient access method.

The flow diagram illustrates the following query.

     SELECT *
     FROM table_name
     WHERE NUSI_column = ‘CA’;


The process used by this example for locating a row using the NUSI value CA is as follows.

  1. After checking the syntax and lexicon of the query, the Parser looks up the TableID for the NUSI subtable that contains the NUSI value CA.
  2. The hashing algorithm hashes the NUSI value.
  3. The Generator creates an AMP steps message containing the NUSI TableID (734596), NUSI row hash value (53), and NUSI data value (CA) and then the Dispatcher distributes it across the BYNET to all AMPs.
  4. The file system on a receiving AMP locates the appropriate NUSI subtable using its TableID.
  5. The file system on a receiving AMP uses the NUSI row hash value to locate the appropriate index row in the subtable.
  6. If there is a NUSI row, its table rowID list is scanned for base table row IDs.
  7. The file system uses the row IDs to locate the base table rows containing the NUSI value CA.

For more information about secondary indexes and performance, see Unique Secondary Indexes and Performance.

Using NUSIs

When using NUSIs there should be fewer rows that satisfy the NUSI qualification condition than there are data blocks in the table. Whether the Optimizer uses a NUSI depends on the percent of rows that satisfy the NUSI qualification condition and the overhead of reading the NUSI table, as follows.

Qualifying Rows                                                         Result
< 1 per block NUSI access is generally faster than full-table scan. As the number of qualified rows approaches the number of data blocks, the additional cost of reading the NUSI subtable can make NUSI access more costly.

For example, if there are 100 rows per block and 1 in 1000 rows qualify, the Optimizer reads 1 in every 10 blocks. NUSI access is faster.

>= 1 per block A full-table scan is faster than NUSI access.

For example, if there are 100 rows per block and 1% of the data qualifies, the Optimizer reads almost every block. A full-table scan might be faster.

In some cases, the values of a NUSI are distributed unevenly throughout a table. At other times, some values might represent a large percent of the table, while other values have few instances. When values are distributed unevenly and statistics are available on the NUSI that allow the Optimizer to see the values distribution, Teradata Database can use a NUSI as follows.

  • Perform a full-table scan for queries on values that represent a large percentage of table.
  • Use a NUSI for queries on the remaining values.

You can use a request like the following to report secondary index nonuniqueness.

     .SET RETLIMIT 20
     SELECT index_column_set, COUNT(*)
     FROM table_name
     GROUP BY 1 
     ORDER BY 2 desc;

NUSIs and Block Size

Effective use of a NUSI requires that fewer rows qualify than there are data blocks in the base table.

Consider, for example, 100-byte rows. With a average block size of 31.5 KB, each multirow data block contains approximately 315 rows. This means fewer than one in 315 rows (that is, less than 0.32%) can qualify for a given NUSI value if the index is to be effective.

When the average block size is 63.5 KB, fewer than one in 635 rows (that is, less than 0.16%) can qualify for the NUSI to be effective.

When the average block size is 127.5 KB, fewer than one in 1275 rows (that is, less than 0.08%) can qualify for the NUSI to be effective. When the average block size is 1KB, fewer than one in 10 rows (that is, less than 10%) can qualify for the NUSI to be effective. This becomes even more significant if your system uses 1MB data blocks.

To reset the global absolute-maximum size for data blocks, see “PermDBSize” in Utilities.

Selectivity Considerations

Selectivity refers to the percentage of rows in a table containing the same nonunique secondary index value. An index that has high selectivity retrieves few rows. A unique primary index retrieval, for example, is highly selective because it never returns more than one row. An index that has low selectivity retrieves many rows.

Low Selectivity Indexes

When an index is said to have low selectivity, that means that many rows have the same NUSI value and there are relatively few distinct values.

A column with those characteristics is usually a poor choice for a NUSI because the cost of using it may be as high or higher than a full-table scan.

For example, assume that employee table contains 10,000 rows of about 100 bytes each, and there are only 10 different departments. If an average employee data block is 2,560 bytes and can store about 25 rows, then the entire table requires about 400 data blocks.

If dept_no is defined as a nonunique secondary index on the employee table, and the dept_no values are evenly distributed, then the following query accesses about 1,000 row selections.

     SELECT *
     FROM employee
     WHERE dept_no = 300;

Each AMP reads its own rows of the dept_no secondary index subtable. If any rows contain the index value 300, the AMP uses the associated rowIDs to select the data rows from its portion of the employee table.

Regardless of the number of AMPs involved, this retrieval requires 1,000 row selections from the employee table. To satisfy this number of select operations, it is likely that all 400 employee data blocks would have to be read.

If that were the case, then the number of I/O operations undertaken by the retrieval could easily exceed the number required for a full-table scan. In such instances, a table scan would actually be a much more efficient solution than a NUSI-based retrieval.

High Selectivity Index

If deptno is a high selectivity index, where few employee rows share the same deptno value, then using deptno for retrieval provides better performance than a full-table scan. Because of these selective conditions, the Optimizer specifies NUSI access for request processing when it is less costly than a full-table scan.

Generic NUSI Read Process Flow

The following flowchart illustrates the logic of a NUSI read. The graphic also illustrates a somewhat more concrete NUSI read example where the rowID value B2,1 read (where B2 is the row hash value and 1 is the uniqueness value for the row and assuming that the table being accessed is an nonpartitioned base table) is followed in turn through the master index, the cylinder index, and the data block to access the base table row set matching that value.



The following process describes the process flow for locating a row using a NUSI.

  1. Read the first rowID from the subtable row.

    In the graphic example, the value read is B2, 1.

  2. Scan the Master Index for the required Cylinder Index.
  3. Scan the Cylinder Index for the required data block.
  4. Scan the rows in the data block until the row having the requested rowID is located.
  5. Return the retrieved row set to the requestor.