Identifying Candidate Primary Keys | Database Design | Teradata Vantage - Identifying Candidate Primary Keys - Advanced SQL Engine

Identifying Candidate Primary Keys | Database Design | Teradata Vantage - Identifying Candidate Primary Keys - Advanced SQL Engine - Teradata Database

Database Design

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

kko1591750222108.ditamap

dita:ditavalPath

kko1591750222108.ditaval

dita:id

B035-1094

lifecycle

Product Category

Teradata Vantage™

The primary key for a relation variable is an attribute set that uniquely identifies each tuple in that relation variable. This property is true for any alternate key, not just the candidate key that is selected to be the primary key for a relation.

Primary keys do not denote either of the following properties.

Order
Tuples within a relation are not ordered in any way.
Access path
Keep in mind that normalization is a logical process, not a physical implementation of the database.

Primary keys are not used to access rows on disk, though they are a frequent choice as the primary index for tables (see Indexes and Maps and Primary Index, Primary AMP Index, and NoPI Objects). The primary index does define a storage path and at least one access path for table rows.

Procedure for Identifying Primary Keys

The selection of a primary key for a relation is the final step in a (possible) series of identifications of unique attribute sets as follows:

Identify any superkeys that exist in the attribute set.
A superkey is any set of (possibly redundant) attributes that uniquely identifies the tuples of a relation.

Every relation has at least one superkey and might have only one.
Eliminate any redundant, or otherwise unnecessary, attributes in the identified superkeys and produce a candidate key set.
A candidate key is a nonredundant attribute set that uniquely identifies the tuples of a relation. Note that it is possible for composite candidate keys to overlap.

By definition, every relation has at least one candidate key and often has only one candidate key.
Select the primary key from the set of identified candidate keys.
Selection of the primary key from a set of candidate keys is ultimately an arbitrary decision, but when there are multiple candidate keys to choose from, a good rule of thumb is to select the one having the fewest attributes, particularly if you plan to use the primary key as the primary index for the table.

See Performance Considerations for Primary Indexes for specific information on selecting primary indexes to optimize retrieval and hashing performance.
The unselected candidate keys, if any, are referred to as alternate keys. Assign a UNIQUE constraint to any alternate key not selected to be the primary key for a relation to ensure its integrity with respect to the referential integrity rule (see The Referential Integrity Rule).
Every relation has one and only one primary key.

Surrogate Keys

Situations occur where the identification and choice of a simple primary key is difficult, if not impossible. There might be no single column that uniquely identifies the tuples of a relation variable or, looking ahead to physical design, there might be performance or query condition considerations that argue against using a composite key. In these situations, and only in these situations, surrogate keys are a solution.

As previously defined (see Surrogate key in Definitions), a surrogate key is an artificial, simple key used to identify individual entities when there is no natural key or when the situation demands a simple key, but no natural simple key exists.

Surrogate keys do not identify individual entities in a meaningful way: they are simply an arbitrary method to distinguish among those entities. As a result, it is often difficult to maintain referential integrity relationships among tables with surrogate keys.

Surrogate keys are typically arbitrary system-generated sequential integers. See the information about CREATE TABLE (Column Definition Clause) in Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184 for information about how to generate surrogate keys in Vantage.

Rules for Primary Keys

There are several rules that define the bounds of primary keys. Note that these rules actually apply to all candidate keys for a relation, not just to its primary key.

The first and second rules in the following list are absolute, and the third is strongly advised:

Primary key attributes cannot be null.
This principle is known as the entity integrity rule, and it is one of the fundamental principles of relational database theory. See The Referential Integrity Rule for a definition of the other fundamental integrity rule.

By definition, nulls are not unique because they represent missing values that cannot be distinguished from one another.

Although the database relational model does not explicitly state that alternate keys cannot be null, the constraint is implicit because a column set cannot be a potential primary key (a candidate key) if it is null or contains nulls.
Primary and alternate key attributes cannot contain duplicate values.
By definition, a primary key is a unique identifier. If it contains duplicate values, it cannot be unique (this means that multiset tables cannot have true primary keys).
Primary and alternate key values should never be modified.
This rule is neither part of the relational model, nor is it absolute, because there are occasions when primary key updates must be made.

Whenever a primary key is updated, it is possible, and even probable, that a series of coordinated foreign key updates must also be made to maintain the consistency of the database. However, it is equally likely that those cascaded updates will never be performed, leaving the database in a state that does not reflect reality even if all the system integrity constraints have been satisfied, which is the principal reason primary key updates are discouraged so strongly.

Guidelines for Selecting Primary Keys

This topic presents several recommendations for selecting the attributes that make up the primary key for a relation variable. These guidelines apply, the necessary changes being made, to all candidate keys:

Select numeric attributes as primary keys.
Numeric keys are both easier to produce uniquely and easier to manage than character data.

Always assign numeric attributes as the primary key if the key is to be system-generated.
Whenever possible, use system-assigned keys (see Surrogate key in Definitions) to simplify the maintenance of uniqueness on the primary key attribute set. This is another factor that argues in favor of numeric keys.
Select primary key attributes that can remain unique for the life of a relation variable.
Construct primary keys on attributes that rarely, if ever, change during the life of a relation variable.
Never use intelligent keys.
An intelligent key carries semantics about the tuple it identifies.

The expression is also used to describe a key constructed around one or more misprojected attributes that would not be in the relation variable were the database properly normalized.

Note that intelligent keys and natural keys are not the same thing. See Intelligent key and Natural key in Definitions for details.
Whenever relation variables are paired in a supertype-subtype relationship, always assign the same primary key to both.
Use a consistent convention when naming primary key attributes (see below).
Because it is important to track the originating entity for attributes, always identify foreign keys with a unique code.

For example, if you use the column naming convention described in Guidelines for Naming Columns, you might extend the convention as indicated in the following example:

Suppose the relation Lineitem has the following primary key composed of the three attributes indicated by preliminary attribute names:

Line_item
line_item_number order_number order_year_date

PK

FK FK
You might consider naming the primary key attributes as follows.
- line_item_number
- line_item_order_FK_number
- line_item_order_FK_year_date
  The FK embedded in order_number and order_year_date indicate that the attributes are foreign keys that reference the order relation variable.
  
  When attribute naming has been completed, the primary key attribute names should look like the following relation definition fragment:
  
  Line_item with Renamed Primary Key Attributes
  line_item_number order_FK_number order_FK_year_date
  
  PK
  
  FK FK

Line_item
line_item_number	order_number	order_year_date
PK
	FK	FK

Line_item with Renamed Primary Key Attributes
line_item_number	order_FK_number	order_FK_year_date
PK
	FK	FK