16.10 - Minimizing Duplicate NUPI Row Checks - Teradata Database

Teradata Database Design

Product: Teradata Database
Release Number: 16.10
Release Date: June 2017
Content Type: User Guide
Publication ID: B035-1094-161K
Language: English (United States)

There are several ways to minimize or even eliminate duplicate NUPI row checks while still preventing duplicate rows from being inserted. This topic describes some of the ways to achieve that goal.

  • Use a non-primary index column set to define a UNIQUE constraint or USI.
  • Use multiple columns to define NUPIs in order to make the index as close to being unique as possible.
  • Keep the number of NUPI duplicate rows for each value below 100.

Eliminating NUPI Duplicate Row Checks

You can eliminate duplicate row checking entirely by defining a uniqueness constraint, such as PRIMARY KEY or UNIQUE NOT NULL, or a USI on the primary key of the table (or on some other alternate key that can be constrained uniquely). See Using Unique Secondary Indexes to Enforce Row Uniqueness for the factors to consider before making this design decision.
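
As a rough sketch of this approach, the following Teradata SQL defines a SET table with a NUPI and enforces row uniqueness through a separate alternate key. The table name person, the column person_id, and all data types are hypothetical and are not taken from this manual; they serve only as illustration.

```sql
-- Hypothetical table: all names and data types are assumptions for illustration.
-- The NUPI on last_name controls row distribution. The UNIQUE constraint on
-- person_id (an alternate key outside the primary index) enforces uniqueness.
CREATE SET TABLE person (
  person_id   INTEGER     NOT NULL UNIQUE,
  last_name   VARCHAR(30) NOT NULL,
  first_name  VARCHAR(30) NOT NULL,
  middle_name VARCHAR(30)
)
PRIMARY INDEX (last_name);

-- Alternative: omit UNIQUE from the column definition above and create the
-- USI explicitly after the table exists.
-- CREATE UNIQUE INDEX (person_id) ON person;
```

Either form gives the system a unique index on person_id with which to enforce row uniqueness, which is the condition this section describes for eliminating the duplicate row check.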

Optimizing Performance by Using Multicolumn NUPIs

You can minimize duplicate row check performance issues by making your NUPIs as close to being unique as possible. The more singular the NUPI value (that is, the closer to being unique it is), the more likely all rows having that NUPI can be stored within a single data block.

A powerful method for achieving the goal of maximal singularity is to define the NUPI on multiple columns. If you decide to use this approach to enhance the uniqueness of a NUPI, keep in mind that you should also define primary indexes on the smallest possible column set (see Access Demographics). The goal is to optimize the tradeoff between enhancing the generality of the index for row retrieval and reducing the number of duplicate row checks that must be performed.

Drawbacks of this method are described in Advantages and Disadvantages of Multicolumn NUPIs.

Consider the following example. Suppose you have a table with three name columns: last_name, first_name, and middle_name. You determine that you must use one or more of the name columns as the NUPI for the table.

  1. Start with last_name.

    Depending on your demographics, this might be a usable choice. If your demographics indicate that your population has mostly English names, however, you will probably find numerous Johnsons, Smiths, and Joneses among your last name pool.

    This would provide a fairly skewed distribution of rows across your AMPs.

  2. Now add first_name to last_name.

    Given the same population, you are still likely to find multiple names like Robert Smith or Jennifer Johnson, but the singularity of this NUPI is still greatly enhanced over the single-column last_name NUPI.

  3. Now add middle_name to last_name and first_name.

    You might still find duplicate names like Robert David Smith, but there are very probably fewer Robert David Smiths in your population than there are Robert Smiths.

    The distribution of rows should be fairly even using this NUPI.
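
One way to check these steps against real data is to measure how many rows share each value of a candidate column set before committing to a NUPI. The queries below are a sketch only; the table names person and person_by_name and the data types are assumptions, not objects defined in this manual.

```sql
-- Hypothetical source table person(last_name, first_name, middle_name, ...).

-- Candidate 1: last_name alone.
SELECT COUNT(*)            AS distinct_values,
       MAX(rows_per_value) AS max_rows_per_value,
       AVG(rows_per_value) AS avg_rows_per_value
FROM (
  SELECT last_name, COUNT(*) AS rows_per_value
  FROM person
  GROUP BY last_name
) AS c1;

-- Candidate 3: all three name columns.
SELECT COUNT(*)            AS distinct_values,
       MAX(rows_per_value) AS max_rows_per_value,
       AVG(rows_per_value) AS avg_rows_per_value
FROM (
  SELECT last_name, first_name, middle_name, COUNT(*) AS rows_per_value
  FROM person
  GROUP BY last_name, first_name, middle_name
) AS c3;

-- If the three-column set shows acceptable numbers, it can be declared as
-- the NUPI (PRIMARY INDEX without UNIQUE is nonunique).
CREATE SET TABLE person_by_name (
  last_name   VARCHAR(30) NOT NULL,
  first_name  VARCHAR(30) NOT NULL,
  middle_name VARCHAR(30)
)
PRIMARY INDEX (last_name, first_name, middle_name);
```

A lower max_rows_per_value for the wider column set indicates fewer rows sharing each NUPI value, and therefore fewer rows to compare during each duplicate row check.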

Advantages and Disadvantages of Multicolumn NUPIs

The following table lists the advantages and disadvantages of multicolumn NUPIs.

Advantages                              Disadvantages
--------------------------------------  ------------------------------------------
Singularity is enhanced.                Primary index retrievals are possible only
                                        when you specify all of the NUPI columns
                                        in the WHERE clause of your SELECT
                                        requests.
Number of rows per value is lessened.   Partial primary index values cannot be
                                        hashed.
Selectivity is enhanced.                Usability is decreased.

Summary Statement
--------------------------------------  ------------------------------------------
The more columns defined for a NUPI,    The more columns defined for a NUPI, the
the closer it is to being unique.       less generalized its use.
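
The retrieval restriction listed among the disadvantages can be illustrated with two requests against a hypothetical table person whose NUPI is (last_name, first_name, middle_name). The table and the literal values are assumptions chosen to match the name example in this topic.

```sql
-- Assumes: PRIMARY INDEX (last_name, first_name, middle_name) on person.

-- All NUPI columns are specified with equality conditions, so the complete
-- index value can be hashed and the request qualifies for primary index
-- retrieval.
SELECT *
FROM person
WHERE last_name   = 'Smith'
  AND first_name  = 'Robert'
  AND middle_name = 'David';

-- Only part of the NUPI is specified. A partial primary index value cannot
-- be hashed, so this request cannot use primary index retrieval.
SELECT *
FROM person
WHERE last_name = 'Smith';
```

Prefixing either request with the EXPLAIN request modifier reports the access path the Optimizer chooses, which is a practical way to confirm the difference on your own system.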

Keep the Number of NUPI Duplicate Rows Per Value Below 100

See Duplicate Row Checks for NUPIs for a detailed explanation of why it is essential to follow this guideline.
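
To find out whether an existing table meets this guideline, a query along the following lines lists the NUPI values that have 100 or more rows. The table person and its single-column NUPI on last_name are assumptions; substitute your own table and NUPI column set.

```sql
-- Hypothetical table person with NUPI (last_name).
-- Lists every NUPI value shared by 100 or more rows, largest counts first.
SELECT last_name,
       COUNT(*) AS rows_per_value
FROM person
GROUP BY last_name
HAVING COUNT(*) >= 100
ORDER BY rows_per_value DESC;
```

An empty result means that every NUPI value in the table is below the threshold.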