Rowhash and RowID
Primary‑indexed and primary-AMP-indexed Teradata Database table rows are self-indexing based on their primary index and so require no additional storage space.
When a row is inserted into a PI table, the file system stores the 32-bit rowhash value of the primary index and a 32-bit uniqueness value in the rowID, which is stored with the row.
Rows inserted into a PA or NoPI table (see “NoPI Tables, Column-Partitioned NoPI Tables, and Column-Partitioned NoPI Join Indexes” on page 230) are stored on the AMPs differently (see “Row Assignment for NoPI Tables” on page 185). Their rowID stores a 20-bit hash bucket number for the assigned AMP and a 44-bit uniqueness value, rather than a 32-bit rowhash value and a 32-bit uniqueness value.
The system selects hash buckets in increasing order from those defined, per the NoPI hash map, for the AMP on which the row is inserted. A 44-bit uniqueness value enables a maximum of 17,592,186,044,415 rows per hash bucket. If the maximum for a hash bucket is exceeded, Teradata Database sets the hash bucket bits to the next bucket number for the AMP per the NoPI hash map and resets the uniqueness value to 1.
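The two field widths and the bucket rollover rule can be illustrated with a minimal Python sketch. It is not Teradata code: the function names (pack_pi, pack_nopi, next_nopi_slot) and the list of bucket numbers are hypothetical, and packing the two fields into a single 64-bit integer is only an assumption made for illustration. The internal partition number is left out.

```python
# Illustrative sketch only; the actual on-disk layout is not described here.
UNIQ_BITS_PI = 32      # PI: 32-bit rowhash + 32-bit uniqueness value
UNIQ_BITS_NOPI = 44    # PA/NoPI: 20-bit hash bucket + 44-bit uniqueness value
BUCKET_BITS_NOPI = 20

MAX_UNIQ_NOPI = (1 << UNIQ_BITS_NOPI) - 1  # 17,592,186,044,415


def pack_pi(rowhash: int, uniq: int) -> int:
    """Combine a 32-bit rowhash and a 32-bit uniqueness value (assumed layout)."""
    if not (0 <= rowhash < (1 << 32) and 0 <= uniq < (1 << UNIQ_BITS_PI)):
        raise ValueError("field out of range")
    return (rowhash << UNIQ_BITS_PI) | uniq


def pack_nopi(bucket: int, uniq: int) -> int:
    """Combine a 20-bit hash bucket and a 44-bit uniqueness value (assumed layout)."""
    if not (0 <= bucket < (1 << BUCKET_BITS_NOPI) and 0 <= uniq <= MAX_UNIQ_NOPI):
        raise ValueError("field out of range")
    return (bucket << UNIQ_BITS_NOPI) | uniq


def next_nopi_slot(amp_buckets, bucket_idx, uniq):
    """Return (bucket_idx, uniq) for the next row inserted on this AMP.

    amp_buckets is a hypothetical list of the bucket numbers that the
    NoPI hash map defines for the AMP, in increasing order.
    """
    if uniq < MAX_UNIQ_NOPI:
        return bucket_idx, uniq + 1
    # Current bucket is full: move to the AMP's next bucket and restart at 1.
    if bucket_idx + 1 >= len(amp_buckets):
        raise OverflowError("no further hash buckets defined for this AMP")
    return bucket_idx + 1, 1
```

In this sketch both forms occupy 64 bits; only the split between the hash portion and the uniqueness portion differs.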
Rows inserted into a primary-indexed table or a primary-AMP-indexed table are assigned to an AMP the same way but are arranged on that AMP differently. For details, see “Row Assignment for Primary-Indexed Column-Partitioned Tables” on page 185 and “Row Assignment for Primary AMP Indexed Column-Partitioned Tables” on page 185.
A rowID always includes an internal partition number, which is 0 if there is no partitioning and which is compressed in some cases when its value is 0.
Because rowhash values are not necessarily unique (see “Hash Bucket Number” on page 174), the AMP software also produces a unique numeric value (called the uniqueness value, see “Uniqueness Value” on page 176) that it appends to the rowhash (for PI) or hash bucket (for PA or NoPI) value, forming a unique rowID. The value assigned depends on the uniqueness values that have already been assigned. The system assigns uniqueness values in ascending numerical order for each unique combination of internal partition number and rowhash (for PI) or hash bucket (for PA or NoPI) value. For rows in a table, uniqueness is achieved by combining the internal partition number (0 for a non-partitioned table), rowhash (for PI) or hash bucket (for PA or NoPI) value, and uniqueness value, in that order.
This rowID makes each row in a table uniquely identifiable, even if it is otherwise a duplicate of one or more other rows. Duplicate rows can only occur for MULTISET tables that have no uniqueness constraints.
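As a rough illustration of how uniqueness values distinguish otherwise identical rows, the following Python sketch models a rowID as a tuple of internal partition number, rowhash or hash bucket value, and uniqueness value. The RowIdAllocator class is hypothetical. It assumes a simple per-combination counter starting at 1 and ignores deletions and any reuse of values, which the text above does not describe.

```python
from collections import defaultdict


class RowIdAllocator:
    """Hypothetical model: one ascending counter per (partition, hash) pair."""

    def __init__(self):
        # Last uniqueness value handed out for each (partition, hash) pair.
        self._last = defaultdict(int)

    def assign(self, partition: int, hash_value: int):
        """Return a rowID tuple (partition, hash_value, uniqueness)."""
        key = (partition, hash_value)
        self._last[key] += 1
        return (partition, hash_value, self._last[key])


alloc = RowIdAllocator()
first = alloc.assign(0, 0xABCD)   # first row for this combination
second = alloc.assign(0, 0xABCD)  # duplicate row, same partition and hash
other = alloc.assign(1, 0xABCD)   # same hash, different internal partition

# Tuple comparison follows the field order: partition, hash, uniqueness.
ordered = sorted([other, second, first])
```

Because the tuple fields are compared in the order partition, hash, uniqueness, sorting the tuples mirrors the field order used to form the rowID.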
Note the following things about rowIDs.
The rowID may also change in some cases for a given row of a column-partitioned table if the row is updated.
This means that while a rowID uniquely identifies a row at any particular time, a rowID that once identified a row later might not be associated with any row, or might be associated with a different row because the original row it was associated with might either have taken a different rowID or have been deleted.
The first row inserted into a table on the AMP with a specific internal partition number and rowhash (for a PI) or hash bucket (for a PA or NoPI) value is assigned a uniqueness value of 1. Additional table rows for that AMP that have the same internal partition number and rowhash (for a PI) or hash bucket (for a PA or NoPI) value are assigned uniqueness values in numerically increasing order. The rows are stored on disk, sorted in ascending order of rowID.
To determine uniqueness violations for UPIs or duplicate rows for SET NUPI tables, Teradata Database scans all rows in the rowhash. A duplicate row is defined as a row that matches one or more other rows in a relation exactly.
For a row-partitioned UPI or row-partitioned SET NUPI, Teradata Database scans only the rows with the same internal partition number and rowhash value to search for uniqueness violations or duplicate rows, respectively, not all the rows in other partitions that have the same rowhash value.
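The scope of the scan described above can be sketched in Python as follows. This is a simplified model, not the actual file system logic: StoredRow, scan_candidates, and has_duplicate are hypothetical names, and rows are held in a plain list rather than in rowID order on disk. For a non-partitioned table every row has internal partition number 0, so the same filter reduces to scanning all rows in the rowhash.

```python
from typing import List, NamedTuple, Tuple


class StoredRow(NamedTuple):
    partition: int   # internal partition number (0 if not partitioned)
    rowhash: int     # rowhash of the primary index value
    values: Tuple    # full column values of the row


def scan_candidates(rows: List[StoredRow], partition: int, rowhash: int):
    """Rows that must be examined: same partition number and same rowhash."""
    return [r for r in rows if r.partition == partition and r.rowhash == rowhash]


def has_duplicate(rows: List[StoredRow], new_row: StoredRow) -> bool:
    """SET NUPI check: does an exactly matching row exist in the scanned set?"""
    return any(
        r.values == new_row.values
        for r in scan_candidates(rows, new_row.partition, new_row.rowhash)
    )


table = [
    StoredRow(1, 0x1F, ("a", 10)),
    StoredRow(2, 0x1F, ("b", 20)),  # same rowhash, different partition
]
```

A uniqueness check for a UPI would compare only the primary index column values of the candidate rows instead of the whole row; that variant is omitted here.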
Join indexes, hash indexes, and secondary indexes may have referencing rowIDs. For a rowID that references a column-partitioned object, the column partition number (within the internal partition number) is set to 1 by convention and for consistency. A referencing rowID indicates a specific table row, not necessarily a specific physical row. On dereferencing, a rowID can easily be adjusted to a desired column partition number by adding a constant delta, in order to access a physical row containing a specific column partition value for the table row.