Hash Bucket Number - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
Product Category
Software

Hash Bucket Number

The first 16 or 20 bits of the rowhash are a hash bucket number, which is used to define the number of the hash bucket to which the hashed row belongs. The number of hash buckets for a Teradata system depends on the DBS Control parameters CurHashBucketSize and NewHashBucketSize. See Utilities and “Hash Maps” on page 177 for more information on hash bucket sizes.

The remaining 16 (or 12) bits are a remainder from the operation of the hash function on the original input value. There is no advantage to a 16‑bit hash bucket number, and you should use 20‑bit numbers unless you have a good reason not to. The next several pages explain the reasons for this.

The values input to the hashing algorithm are the column set values that constitute an index on the table in question or, for a NoPI object, the hashed combination of their AMP vproc ID and an internal counter.

The following graphics illustrate the structure of a Teradata Database hash bucket number and the remainder (modulo) of the hashing operation for systems with 65,536 hash buckets and 1,048,576 hash buckets, respectively.

 

A 32-bit rowhash provides 4.2 billion possible rowhash values, which reduces hash collisions to a level that eliminates their impact on retrieval performance for all intents and purposes.

Because the number of possible values for a given domain may exceed 4.2 billion, further information, called the uniqueness value, is required to make each row uniquely identifiable. This situation occurs for tables hashed on a nonunique primary index or primary AMP index, where different rows can have the same index value, and so the same rowhash value, and also for tables hashed on a unique primary index if there is a hash collision.

The expectation is that all new customers will configure their systems to use the larger number of buckets no matter how many AMPs their system has. However, if your system currently uses a 16‑bit hash bucket size, you must decide when it is most advantageous for you to convert to the larger number of hash buckets.

This decision is based on the following factors.

  • Whether the number of AMPs configured on the system results in a large enough imbalance to mandate a change to a 20‑bit hash bucket size.
  • When there is sufficient down time available to you to run the reconfiguration or to perform a sysinit and restore your data from an archive.
  • There is no simple formula to determine when this conversion should be done. The following table shows the percentage of imbalance that can be expected for a given number of AMPs. As the number of AMPs increases, the percentage of imbalance also increases. A system with 1,000 or more AMPs should consider converting to a 20-bit hash bucket size.

     

    Number of AMPs

     Number of Hash
           Buckets

      Low Number of
    Buckets Per AMP

     High Number of
    Buckets Per AMP

          Percentage
           Imbalance

                  100

              65,536

                 655

                 656

                 0.15

                  200

              65,536

                 327

                 328

                 0.31

                  300

              65,536

                 218

                 219

                 0.46

                  400

              65,536

                 163

                 164

                 0.61

                  500

              65,536

                 131

                 132

                 0.76

                  750

              65,536

                   87

                   88

                 1.14

               1,000

              65,536

                   65

                   66

                 1.52

               2,000

              65,536

                   32

                   33

                 3.03

               3,000

              65,536

                   21

                   22

                 4.55

               4,000

              65,536

                   16

                   17

                 5.88

               5,000

              65,536

                   13

                   14

                 7.14

               6,000

              65,536

                   10

                   11

                 9.09

               7,000

              65,536

                     9

                   10

               10.00

               8,000

              65,536

                     8

                     9

               11.11

               9,000

              65,536

                     7

                     8

               12.50

             10,000

              65,536

                     6

                     7

               14.29