15.00 - Rowhash and RowID - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
15.00
Content Type
User Guide
Publication ID
B035-1094-015K
Language
English (United States)

Rowhash and RowID

Rows of a primary‑indexed Teradata Database table are self-indexing based on their primary index, so the primary index requires no additional storage space.

When a row is inserted into a table, the file system stores the 32‑bit rowhash value of the primary index with the row.

Rows inserted into a NoPI table or column‑partitioned table (see “NoPI Tables, Column‑Partitioned Tables, and Column-Partitioned Join Indexes” on page 280) are allocated to the AMPs differently (see “Row Allocation for Teradata Parallel Data Pump” on page 237). For these rows, the rowID stores the hash bucket number for the assigned AMP with a 44‑bit uniqueness value, rather than the rowhash value with a 32‑bit uniqueness value.

A rowID always includes an internal partition number, which is 0 if there is no partitioning and which is compressed in some cases when its value is 0.

Because rowhash values are not necessarily unique (see “Hash Bucket Number” on page 225), the AMP software also produces a unique 32‑bit numeric value (called the uniqueness value, see “Uniqueness Value” on page 227) that it appends to the rowhash value, forming a unique rowID. The value assigned depends on the uniqueness values that have already been assigned: the system assigns uniqueness values in ascending numerical order. For rows in a table, uniqueness is achieved by combining the internal partition number (0 for a non-PPI table), the rowhash value, and the uniqueness value, in that order.
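The three-part composition described above can be pictured with a small Python model. This is only an illustrative sketch: the `RowID` class, its field names, and the sample hash values are assumptions made for the example, not Teradata structures, and the 44‑bit uniqueness case for NoPI and column‑partitioned tables is not modeled.

```python
from typing import NamedTuple

class RowID(NamedTuple):
    # Hypothetical model; field order mirrors the order given in the text.
    partition: int   # internal partition number (0 for a non-PPI table)
    rowhash: int     # 32-bit rowhash of the primary index value
    uniqueness: int  # 32-bit uniqueness value

# Two rows whose primary index values hash to the same rowhash still
# receive distinct rowIDs because their uniqueness values differ.
a = RowID(partition=0, rowhash=0x1A2B3C4D, uniqueness=1)
b = RowID(partition=0, rowhash=0x1A2B3C4D, uniqueness=2)
c = RowID(partition=1, rowhash=0x00000001, uniqueness=1)

# Tuples compare field by field, so sorting orders the rowIDs by
# partition number first, then rowhash, then uniqueness value.
ascending = sorted([c, b, a])
```

In this model, `a` and `b` differ only in their uniqueness value, and `ascending` places both partition 0 rows ahead of the partition 1 row even though the latter has a smaller rowhash.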

This rowID makes each row in a table uniquely identifiable, even if it is otherwise a duplicate of one or more other rows. Duplicate rows can only occur for MULTISET tables that have no uniqueness constraints. You are strongly discouraged from creating such tables when they are primary-indexed tables.

Note the following things about rowIDs.

  • The rowID for a given row can change if the value of a primary index or partitioning column for that row changes.
  • The rowID for a given row of a column-partitioned table can also change if the row is updated.
  • RowIDs can be reused after they are no longer associated with a row. This means that while a rowID uniquely identifies a row at any particular time, a rowID that once identified a row might later be associated with no row, or with a different row, because the original row either took a different rowID or was deleted.

    The first row having a specific internal partition number and rowhash value that is inserted into a table on the AMP that owns the hash bucket value within the rowhash is always assigned a uniqueness value of 1. Additional table rows for that AMP that have the same internal partition number and rowhash value are assigned uniqueness values in ascending numerical order. The rows are stored on disk, sorted in ascending order of rowID.

    To determine uniqueness violations for UPIs or duplicate rows for SET NUPI tables, Teradata Database scans all rows in the rowhash. A duplicate row is defined as a row that matches one or more other rows in a relation exactly.

    For a UPI PPI or SET NUPI PPI, Teradata Database scans only the rows with the same internal partition number and rowhash value to search for uniqueness violations or duplicate rows, respectively, not all the rows in other partitions that have the same rowhash value.
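    The scoping of this check can be sketched in Python. The sketch is a hypothetical in-memory stand-in for one AMP, not Teradata code: `rows_by_key`, `find_conflict`, and the column names are invented for illustration, and a non-PPI table is represented simply by using partition number 0 for every row. It also assumes that a UPI check compares the actual primary index values, since different values can share a rowhash.

```python
from collections import defaultdict

# Hypothetical store: (internal partition number, rowhash) -> rows on this AMP.
rows_by_key = defaultdict(list)

def find_conflict(partition, rowhash, new_row, pi_columns, unique_pi):
    """Return a description of the conflict, or None if there is none."""
    # Only rows with the same partition number and rowhash are scanned;
    # rows in other partitions with the same rowhash are never visited.
    for row in rows_by_key[(partition, rowhash)]:
        if unique_pi:
            # UPI: the primary index value itself must not repeat.
            if all(row[c] == new_row[c] for c in pi_columns):
                return "uniqueness violation"
        elif row == new_row:
            # SET NUPI: a duplicate matches an existing row exactly.
            return "duplicate row"
    return None

rows_by_key[(3, 0xBEEF)].append({"id": 7, "name": "x"})

upi_check = find_conflict(3, 0xBEEF, {"id": 7, "name": "y"}, ["id"], True)
set_dup = find_conflict(3, 0xBEEF, {"id": 7, "name": "x"}, ["id"], False)
set_ok = find_conflict(3, 0xBEEF, {"id": 7, "name": "y"}, ["id"], False)
other_partition = find_conflict(4, 0xBEEF, {"id": 8, "name": "z"}, ["id"], True)
```

    Here `upi_check` reports a uniqueness violation, `set_dup` reports a duplicate row, and the last two calls find no conflict: one because the rows differ in a non-index column, the other because partition 4 holds no rows with that rowhash.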

    This scan terminates by reading the last row in the rowhash, and the uniqueness value of that last row is the highest value currently in use. When the system next needs to assign a uniqueness value, it adds one to the uniqueness value of the last row read.

    Uniqueness values are not reused except for the special case in which the row with the highest uniqueness value within a rowhash is deleted from a table.
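    The assignment and reuse rules described above can be simulated with a short Python sketch. It is a simplified model under stated assumptions: the `AmpModel` class and its methods are hypothetical names, and the highest uniqueness value is read from an in-memory list rather than found by scanning rows on disk.

```python
from collections import defaultdict

class AmpModel:
    """Hypothetical model of uniqueness value assignment on one AMP."""

    def __init__(self):
        # (partition, rowhash) -> uniqueness values in use, ascending.
        self._in_use = defaultdict(list)

    def insert(self, partition, rowhash):
        values = self._in_use[(partition, rowhash)]
        # First row for this partition and rowhash gets 1; otherwise
        # add one to the highest value currently in use.
        next_value = values[-1] + 1 if values else 1
        values.append(next_value)
        return (partition, rowhash, next_value)

    def delete(self, rowid):
        partition, rowhash, uniqueness = rowid
        self._in_use[(partition, rowhash)].remove(uniqueness)

amp = AmpModel()
r1 = amp.insert(0, 0xABCD)
r2 = amp.insert(0, 0xABCD)
r3 = amp.insert(0, 0xABCD)

amp.delete(r2)               # frees 2, but 3 is still the highest
r4 = amp.insert(0, 0xABCD)   # gets 4; the freed value 2 is not reused

amp.delete(r4)               # the row with the highest value is deleted
r5 = amp.insert(0, 0xABCD)   # the highest is 3 again, so 4 is reused
```

    Because the next value is always derived from the highest value still in use, a gap left by deleting a row in the middle is never filled, while deleting the row with the highest value makes that value available again.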