15.00 - Hash Bucket Number - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Hash Bucket Number

The first 16 or 20 bits of the rowhash are a hash bucket number, which is used to define the number of the hash bucket to which the hashed row belongs. The number of hash buckets for a Teradata system depends on the DBS Control parameters CurHashBucketSize and NewHashBucketSize. See Utilities: Volume 1 (A-K) and “Hash Maps” on page 229 for more information on hash bucket sizes.

The remaining 16 (or 12) bits are a remainder from the operation of the hash function on the original input value. There is no advantage to a 16‑bit hash bucket number, and you should use 20‑bit numbers unless you have a good reason not to. The next several pages explain the reasons for this.

The values input to the hashing algorithm are the column set values that constitute an index on the table in question.

The following graphics illustrate the structure of a Teradata Database hash bucket number and the remainder (modulo) of the hashing operation on the primary index value for systems with 65,536 hash buckets and 1,048,576 hash buckets, respectively.


A 32-bit rowhash provides 4.2 billion possible rowhash values, which reduces hash collisions to a level that eliminates their impact on retrieval performance for all intents and purposes.

Because the number of possible index column values for a given domain can easily exceed that number of values, further information, called the uniqueness value, is required to make each row uniquely identifiable. This situation occurs for tables hashed on a nonunique primary index, where different rows can have the same primary index value, and so the same rowhash value, and also for tables hashed on a unique primary index if there is a hash collision.

The expectation is that all new customers will configure their systems to use the larger number of buckets no matter how many AMPs their system has. However, if your system currently uses a 16‑bit hash bucket size, you must decide when it is most advantageous for you to convert to the larger number of buckets.

This decision is based on the following factors.

  • Whether the number of AMPs configured on the system results in a large enough imbalance to mandate a change to a 20‑bit hash bucket size.
  • When there is sufficient down time available to you to run the reconfiguration or to perform a sysinit and restore your data from an archive.
  • There is no simple formula to determine when this conversion should be done. The following table shows the percentage of imbalance that can be expected for a given number of AMPs. As the number of AMPs increases, the percentage of imbalance also increases. A system with 1,000 or more AMPs should consider converting to a 20-bit hash bucket size.


    Number of AMPs

     Number of Hash

      Low Number of
    Buckets Per AMP

     High Number of
    Buckets Per AMP