Identity columns present 2 distinct duplication issues:
- Duplicate identity column values
- Unintentionally duplicated rows
Unless a column is otherwise constrained, its identity values are guaranteed to be unique only when the column is specified as GENERATED ALWAYS … NO CYCLE.
Duplicate identity column values can occur in either of the following situations.
- The column is specified as GENERATED ALWAYS, but with the CYCLE option.
In this case, the column can reach its maximum or minimum value and then begin recycling through previously generated values.
- The column is specified as GENERATED BY DEFAULT and an application supplies a value for it that duplicates an existing value.
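
The recycling behavior described for the CYCLE option can be illustrated with a small Python model. This is a sketch only: the function identity_values and its parameters are hypothetical names, the model assumes that an ascending sequence restarts at its minimum value (and a descending one at its maximum) when it cycles, and it does not reproduce how Teradata Database generates values internally.

```python
from itertools import islice


def identity_values(start, increment, minvalue, maxvalue, cycle):
    """Yield identity values from a toy generator (hypothetical, not Teradata's code)."""
    value = start
    while True:
        yield value
        nxt = value + increment
        if nxt > maxvalue or nxt < minvalue:
            if not cycle:
                # NO CYCLE: the range is exhausted, so no value is ever reused.
                return
            # CYCLE (assumed behavior): wrap around to the opposite bound.
            nxt = minvalue if increment > 0 else maxvalue
        value = nxt


# Ask each generator for 5 values from the deliberately tiny range 1..3.
cycled = list(islice(identity_values(1, 1, 1, 3, cycle=True), 5))
no_cycle = list(islice(identity_values(1, 1, 1, 3, cycle=False), 5))

print(cycled)    # values repeat once the maximum has been reached
print(no_cycle)  # generation stops at the maximum instead of repeating
```

With CYCLE, the fourth and fifth values repeat values that were already handed out. Without it, the model simply stops producing values once the range is exhausted.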
Unintentionally duplicated rows can occur in any of the following situations.
- A previously completed load task is resubmitted erroneously.
Tables that do not have identity columns, but that either are defined as SET tables or have at least 1 unique index, do not permit the insertion of duplicate rows.
- Teradata Parallel Data Pump runs without the ROBUST option enabled and a restart occurs.
- A session aborts, and the rows it inserted before the abort are not deleted before the session is manually restarted.
Suppose, for example, you accidentally load employee Reiko Kawabata into an employee table twice, where the employee_number column is an identity column. You then have 2 employee rows that are identical except for their employee_number values. Although this is an error from the perspective of the enterprise, the system does not treat the 2 rows as duplicates of one another because their employee_number values differ. The problem is not with the identity column feature, which works exactly as designed.
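
The Reiko Kawabata scenario can be modeled in a few lines of Python. This is an illustrative sketch, not Teradata code: the in-memory employee list, the load and business_duplicates functions, and the first_name and last_name columns are all hypothetical, and itertools.count merely stands in for a GENERATED ALWAYS identity column.

```python
from collections import defaultdict
from itertools import count

# Stand-in for an identity column: every inserted row gets the next number.
next_employee_number = count(start=1)
employee = []  # each row: (employee_number, first_name, last_name)


def load(batch):
    """Insert every (first_name, last_name) pair, assigning a fresh employee_number."""
    for first_name, last_name in batch:
        employee.append((next(next_employee_number), first_name, last_name))


batch = [("Reiko", "Kawabata")]
load(batch)
load(batch)  # the same load task, resubmitted by mistake

# As whole rows, the two entries differ, so a duplicate-row check finds nothing.
rows_are_distinct = len(set(employee)) == len(employee)


def business_duplicates(rows):
    """Group rows by every column except the identity column and keep repeated groups."""
    groups = defaultdict(list)
    for employee_number, *business_columns in rows:
        groups[tuple(business_columns)].append(employee_number)
    return {key: numbers for key, numbers in groups.items() if len(numbers) > 1}


duplicates = business_duplicates(employee)
print(rows_are_distinct)
print(duplicates)
```

Comparing rows on every column except the identity column is one way to surface this kind of duplication after the fact, because the generated values are the only thing that distinguishes the rows.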
You must therefore enforce rigorous guidelines for handling identity column tables at your installation so that these hard-to-detect duplications do not corrupt your databases.