Minimizing Hash Collisions - Teradata Database

Teradata Database Design

Minimizing Hash Collisions

To minimize hash collisions for a primary index or secondary index, Teradata Database defines approximately 4.2 billion (2^32) hash values. The AMP software adds a system‑generated 32‑bit uniqueness value to the 32‑bit rowhash value. The resulting 64‑bit value, prefixed with an internal partition number, is called the rowID. This value uniquely identifies each row in a system, which makes a scan to retrieve a particular row among several having the same rowhash a trivial task. The scan must still check each of those rows to determine whether it contains the searched-for value rather than another value that has the same rowhash.
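As a rough illustration of the paragraph above, the following Python sketch packs a 32-bit rowhash and a 32-bit uniqueness value into a 64-bit value and shows why a lookup must still compare the stored index value. The hash function (toy_rowhash), the in-memory Amp class, and the counter used to assign uniqueness values are hypothetical stand-ins, not Teradata's actual algorithm or storage format. The internal partition number is omitted here.

```python
from collections import defaultdict

HASH_BITS = 32  # width of the rowhash
UNIQ_BITS = 32  # width of the system-generated uniqueness value


def toy_rowhash(value: str) -> int:
    """Hypothetical stand-in for the real hashing algorithm.

    Deliberately weak (only 4 distinct outputs) so collisions are easy to see.
    """
    return sum(value.encode("utf-8")) % 4


def make_rowid(rowhash: int, uniqueness: int) -> int:
    """Pack a 32-bit rowhash and a 32-bit uniqueness value into 64 bits."""
    assert 0 <= rowhash < 2**HASH_BITS
    assert 0 <= uniqueness < 2**UNIQ_BITS
    return (rowhash << UNIQ_BITS) | uniqueness


class Amp:
    """Toy in-memory model of rows grouped by rowhash (an assumption)."""

    def __init__(self):
        self.rows = defaultdict(list)  # rowhash -> [(rowid, index_value), ...]

    def insert(self, index_value: str) -> int:
        rowhash = toy_rowhash(index_value)
        uniqueness = len(self.rows[rowhash]) + 1  # assumed: simple counter
        rowid = make_rowid(rowhash, uniqueness)
        self.rows[rowhash].append((rowid, index_value))
        return rowid

    def lookup(self, index_value: str) -> list:
        rowhash = toy_rowhash(index_value)
        # Rows sharing a rowhash may hold different index values,
        # so each candidate is compared against the searched-for value.
        return [rid for rid, val in self.rows[rowhash] if val == index_value]


amp = Amp()
rid_ab = amp.insert("ab")
rid_ba = amp.insert("ba")  # same byte sum, so the toy rowhash collides
matches = amp.lookup("ab")  # only the row that really holds "ab"
```

In this sketch the two rows share a rowhash but receive different uniqueness values, so their rowIDs differ, and the lookup filters out the colliding row by comparing values.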

The following graphics illustrate the structure of a Teradata Database rowID for a PI table on systems defined to have 65,536 hash buckets and 1,048,576 hash buckets, respectively.
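The two hash bucket counts correspond to 16 bits (2^16 = 65,536) and 20 bits (2^20 = 1,048,576). The sketch below assumes that the hash bucket number occupies the high-order bits of the 32-bit rowhash and that the remaining low-order bits follow it. The function name and the returned field names are illustrative only.

```python
def split_rowhash(rowhash: int, bucket_bits: int) -> dict:
    """Split a 32-bit rowhash into a bucket number and the remaining bits.

    Assumes the bucket number is the high-order `bucket_bits` bits.
    """
    if bucket_bits not in (16, 20):
        raise ValueError("bucket_bits must be 16 or 20")
    if not 0 <= rowhash < 2**32:
        raise ValueError("rowhash must fit in 32 bits")
    remainder_bits = 32 - bucket_bits
    return {
        "hash_bucket": rowhash >> remainder_bits,
        "remainder": rowhash & ((1 << remainder_bits) - 1),
        "bucket_count": 1 << bucket_bits,
    }


small = split_rowhash(0xABCD1234, 16)  # system with 65,536 hash buckets
large = split_rowhash(0xABCD1234, 20)  # system with 1,048,576 hash buckets
```

The same rowhash maps to a different bucket number depending on how many bits the system reserves for hash buckets.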

If the rowID is for a nonpartitioned table, the internal partition number is 0 and is compressed in the rowID in the row header and in any referencing rowID in a secondary index. For a table with 2-byte partitioning, the internal partition number is compressed to two bytes.
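One way to picture the variable-width prefix is a serializer that writes no bytes for a nonpartitioned table and two bytes for a table with 2-byte partitioning, followed by the 8 bytes of rowhash and uniqueness value. This is only a sketch: the byte order, the function name encode_rowid, and the reading of "compressed" as "not stored" when the partition number is 0 are assumptions, not a description of the actual on-disk format.

```python
import struct


def encode_rowid(partition: int, rowhash: int, uniqueness: int,
                 partition_bytes: int = 0) -> bytes:
    """Serialize a rowID with an optional internal partition number prefix.

    partition_bytes=0 models a nonpartitioned table (partition must be 0);
    partition_bytes=2 models a table with 2-byte partitioning.
    """
    if partition_bytes == 0:
        if partition != 0:
            raise ValueError("nonpartitioned tables use partition 0")
        prefix = b""  # assumed: partition 0 takes no space
    elif partition_bytes == 2:
        prefix = struct.pack(">H", partition)  # unsigned 16-bit, big-endian
    else:
        raise ValueError("this sketch models only 0 or 2 partition bytes")
    # 32-bit rowhash followed by the 32-bit uniqueness value
    return prefix + struct.pack(">II", rowhash, uniqueness)


nonpartitioned = encode_rowid(0, 1, 2)
partitioned = encode_rowid(5, 1, 2, partition_bytes=2)
```

Under these assumptions the nonpartitioned rowID occupies 8 bytes and the 2-byte partitioned rowID occupies 10 bytes.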