Teradata Database allocates identity column numbers differently depending on whether an operation is a single row or USING request modifier-based insert or an INSERT ... SELECT operation.
|Type of Insert Operation||Location of Identity Column Numbers Cache|
|INSERT … SELECT||AMP.|
Teradata Database uses a batch numbering scheme to generate identity column numbers. When the initial batch of rows for a bulk insert arrives at a PE or AMP, the system reserves a range of numbers before it begins to process the rows. Each PE or AMP retrieves the next available value for the identity column from dictionary table DBC.IdCol and immediately increments it by the value of the setting for the IdCol Batch Size parameter in the DBS Control record.
After a range of numbers is reserved, the system stores the first number in the range in a vproc-local identity column cache. Different tasks doing concurrent inserts on the same identity column table allot a number for each row being inserted and increment it in the cache. When the last reserved number has been issued, the PE or AMP reserves another range of numbers and updates the entry for this identity column in DBC.IdCol.
This process explains the following apparent numbering anomalies.
- Because the Teradata architecture is highly parallel, generated identity column numbers do not necessarily reflect the chronological order in which rows are inserted.
- Sequential numbering gaps can occur. Because the cached range of reserved numbers is not preserved across system restarts, exact increments cannot be guaranteed. For example, the identity column values for 1,000 rows inserted into a table with an INCREMENT BY value of 1 might not be numbered from 1 to 1,000 if a system restart occurs before the identity column number pool is exhausted.
The following properties of an identity column INSERT operation determine whether the identity number for a row is generated on the PE before being distributed to its destination AMP or if the identity number for a row is generated on the destination AMP after having been distributed.
- If you insert a value into an identity column that is also the primary index for the table and defined as GENERATED ALWAYS, Teradata Database generates the identity column value on the PE before it hashes the row to its AMP.
- If you insert a value into an identity column that is not the primary index for a table, Teradata Database generates the identity column value on the destination AMP for the row.