Identity Columns, Duplicate Column Values, and Duplicate Rows - Analytics Database - Teradata Vantage

SQL Data Definition Language Detailed Topics

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-12-13
Product Category: Teradata Vantage™

The use of identity columns presents 2 different duplication issues.

  • Duplicate identity column values
  • Unintentionally duplicated rows

Unless the column is otherwise constrained to be unique, values for an identity column are guaranteed to be unique only when the column is specified using GENERATED ALWAYS … NO CYCLE.
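
As a minimal sketch of the unique-by-construction case, the following definition combines GENERATED ALWAYS with NO CYCLE. The table name, column names, data types, and the START WITH, INCREMENT BY, MINVALUE, and MAXVALUE settings are illustrative assumptions, not values taken from this topic.

```sql
-- Hypothetical table: all names and numeric limits are assumptions for illustration.
CREATE MULTISET TABLE employee (
  -- GENERATED ALWAYS: the system supplies every value for this column.
  -- NO CYCLE: generation stops at MAXVALUE instead of wrapping around,
  -- so a previously generated value is never issued again.
  employee_number INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1
     INCREMENT BY 1
     MINVALUE 1
     MAXVALUE 2147483647
     NO CYCLE),
  first_name VARCHAR(30) NOT NULL,
  last_name  VARCHAR(30) NOT NULL
)
PRIMARY INDEX (employee_number);
```

Unique here means only that no value repeats. The sketch makes no assumption about the order or contiguity of the generated numbers.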

Duplicate identity column values can occur in either of the following situations.

  • The column is specified to be GENERATED ALWAYS, but with the CYCLE option.

    In this case, the column can reach its maximum or minimum value and then begin recycling through previously generated values.

  • The column is specified to be GENERATED BY DEFAULT and an application specifies a duplicate value for it.
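
The two situations above can be sketched as follows. Both tables, their column names, and the deliberately small MAXVALUE are hypothetical and chosen only to make the behavior easy to see.

```sql
-- Case 1 (hypothetical table): GENERATED ALWAYS with CYCLE.
-- Once the generator reaches MAXVALUE it restarts from MINVALUE,
-- so values that were already issued can be issued again.
CREATE MULTISET TABLE cycle_demo (
  id SMALLINT GENERATED ALWAYS AS IDENTITY
    (START WITH 1
     INCREMENT BY 1
     MINVALUE 1
     MAXVALUE 1000
     CYCLE),
  payload VARCHAR(20)
)
PRIMARY INDEX (payload);

-- Case 2 (hypothetical table): GENERATED BY DEFAULT.
-- The system generates a value only when the application does not supply one.
CREATE MULTISET TABLE default_demo (
  id INTEGER GENERATED BY DEFAULT AS IDENTITY
    (START WITH 1
     INCREMENT BY 1
     NO CYCLE),
  payload VARCHAR(20)
)
PRIMARY INDEX (payload);

-- Both inserts supply the same explicit id. Nothing in this table definition
-- enforces uniqueness on id, so both rows are stored.
INSERT INTO default_demo (id, payload) VALUES (42, 'first');
INSERT INTO default_demo (id, payload) VALUES (42, 'second');
```

In the second case NO CYCLE does not help, because the duplicate value comes from the application rather than from the generator.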

Duplicate rows can occur in any of the following situations.

  • A previously completed load task is resubmitted erroneously.

    By contrast, a table that does not have an identity column, but that is either specified to be a SET table or has at least 1 unique index, does not permit the insertion of duplicate rows.

  • Teradata Parallel Data Pump runs without the ROBUST option enabled and a restart occurs.
  • A session aborts, but rows inserted before the abort occurred are not deleted before the session is manually restarted.

In many cases, such rows are not duplicates in the sense defined by the relational model. For example, when a load task is mistakenly run multiple times, the reloaded rows originate from the same client row, which does not carry the uniqueness-enforcing identity column value that is defined for it on the server. Because the server assigns each inserted copy a different identity column value, the copies are not duplicates of one another in the strict relational sense.

Suppose, for example, you accidentally load employee Reiko Kawabata into an employee table twice, where the employee_number column is an identity column. After doing this, you have 2 employee rows that are identical except for their different employee_number values. While this is an error from the perspective of the enterprise, the 2 rows are not duplicates of one another because they have different employee_number values. The problem is not with the feature, which works exactly as it is designed to work.
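
One way to surface this kind of duplication is to group on the columns that arrive from the client and ignore the identity column. The query below is a sketch only: it assumes a hypothetical employee table whose client-supplied columns are first_name and last_name, and it treats that pair as the business key, which may not hold at your site (two different employees can share a name).

```sql
-- Assumed schema: employee(employee_number identity, first_name, last_name).
-- Rows that match on every client-supplied column but differ in
-- employee_number are candidates for an accidental reload.
SELECT first_name,
       last_name,
       COUNT(*)             AS copies,
       MIN(employee_number) AS lowest_employee_number,
       MAX(employee_number) AS highest_employee_number
FROM employee
GROUP BY first_name, last_name
HAVING COUNT(*) > 1;
```

A row returned by this query is a candidate for review, not proof of a load error.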

This means that it is imperative for you to enforce rigorous guidelines for loading and restarting loads on identity column tables at your installation to ensure that these hard-to-detect duplications do not corrupt your databases.
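
As one possible guideline, and not a recommendation made by this topic, a table can carry a uniqueness constraint on a key that originates on the client, so that a repeated load is rejected instead of receiving fresh identity values. The table and the source_employee_key column below are hypothetical.

```sql
-- Hypothetical table: source_employee_key is an assumed client-side key.
CREATE MULTISET TABLE employee_guarded (
  employee_number INTEGER GENERATED ALWAYS AS IDENTITY
    (START WITH 1
     INCREMENT BY 1
     NO CYCLE),
  -- The UNIQUE constraint rejects a second row with the same client key,
  -- regardless of the identity value the system would have generated for it.
  source_employee_key VARCHAR(20) NOT NULL UNIQUE,
  first_name VARCHAR(30) NOT NULL,
  last_name  VARCHAR(30) NOT NULL
)
PRIMARY INDEX (employee_number);
```

This approach depends on the client data having a reliable key of its own. Where no such key exists, procedural controls on resubmitting and restarting load tasks remain the main safeguard.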