16.10 - Primary Index Uniqueness and Row Distribution - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

The singularity of the primary index for a table is directly related to how evenly table rows are distributed across the AMPs. The more singular the index, the more optimal the use of disk space, with primary index uniqueness producing the most efficient space utilization. Because primary indexes are a component of base table rows, they do not require subtables or any other disk space beyond that required to store the primary index column data in the base table row.

The following mini-case studies illustrate how the degree of singularity of the values in the column selected to be the primary index for a table affects the distribution of rows across the AMPs.

The examples use the following customer table as their basis. Assignment of the primary index varies from case to case. Only a small sample of table rows is shown. Assume a 6-AMP system.

customer
customer
cust_num cust_name cust_status parent_cust_num sales_emp_num
PK FK NN
SA
00001 Caspian Sales A 183 1252
00002 Nota Bene Ltd. I 520 0492
00003 Maxwell, Inc. A 301 0655
00004 Saionz Inc. A Null 1973
00005 JellyJamUp A 222 0034
00006 Kimble Korp. A Null 0005
00007 Kraus GmbH I Null 0708
00008 Piccolo SpA I 183 0708
00009 The David Fields Company A 222 1439

Case Study 1: Fewer Distinct Primary Index Values Than AMPs

This study defines a NUPI on parent_cust_number, the foreign key. There are only five distinct values for this attribute in the entire customer table.

Never define a primary index on a column that has fewer distinct values than the number of AMPs in the system unless no other columns are available. Because only five of the 65,536 available hash buckets are used for this table, the very best distribution you can hope for is that each of those five buckets is assigned to a different AMP.

The illustration shows an example of such a best case distribution of rows.



Case Study 2: Highly Nonunique Primary Index Values

This study defines a NUPI on customer_status. Because there are only two distinct values for this attribute (Active and Inactive), all rows hash to two AMPs, leaving the other four AMPs in the system unaffected by the customer table.

Distributions that are skewed like this cause major performance problems because they fail to harness the powerful parallel processing capabilities that make Teradata unique among commercially available relational database management systems.

The illustration shows the customer table rows stored on only two AMPs.



Case Study 3: Highly Singular Primary Index Values

A carefully selected NUPI distributes the rows of a table nearly as evenly as a UPI. If you are fortunate, the NUPI column might never contain duplicates.

The illustration shows how evenly customer table rows are hashed when a NUPI is defined on the sales_employee_number attribute.



Case Study 4: Unique Primary Index Values

If a table has many more unique values than the system has AMPS, a UPI always provides an even distribution of rows across the AMPs. To ensure efficient space utilization, you should define the primary index for a table on a unique column unless you plan to hash the rows of a minor entity table to the same AMPs as matching rows from a major entity table in order to enhance join processing. Be aware that join indexes might be a better solution, depending on the particular application. See Join and Hash Indexes for more information.

The illustration shows an ideal distribution of customer table rows when a UPI is defined on the system-assigned primary key, the customer_number attribute.