15.00 - Hash Bucket Number - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Hash Bucket Number

The first 16 or 20 bits of the rowhash are a hash bucket number, which is used to define the number of the hash bucket to which the hashed row belongs. The number of hash buckets for a Teradata system depends on the DBS Control parameters CurHashBucketSize and NewHashBucketSize. See Utilities: Volume 1 (A-K) and “Hash Maps” on page 229 for more information on hash bucket sizes.

The remaining 16 (or 12) bits are a remainder from the operation of the hash function on the original input value. There is no advantage to a 16‑bit hash bucket number, and you should use 20‑bit numbers unless you have a good reason not to. The next several pages explain the reasons for this.

The values input to the hashing algorithm are the column set values that constitute an index on the table in question.

The following graphics illustrate the structure of a Teradata Database hash bucket number and the remainder (modulo) of the hashing operation on the primary index value for systems with 65,536 hash buckets and 1,048,576 hash buckets, respectively.

 

A 32-bit rowhash provides 4.2 billion possible rowhash values, which reduces hash collisions to a level that eliminates their impact on retrieval performance for all intents and purposes.

Because the number of possible index column values for a given domain can easily exceed that number of values, further information, called the uniqueness value, is required to make each row uniquely identifiable. This situation occurs for tables hashed on a nonunique primary index, where different rows can have the same primary index value, and so the same rowhash value, and also for tables hashed on a unique primary index if there is a hash collision.

The expectation is that all new customers will configure their systems to use the larger number of buckets no matter how many AMPs their system has. However, if your system currently uses a 16‑bit hash bucket size, you must decide when it is most advantageous for you to convert to the larger number of buckets.

This decision is based on the following factors.

  • Whether the number of AMPs configured on the system results in a large enough imbalance to mandate a change to a 20‑bit hash bucket size.
  • When there is sufficient down time available to you to run the reconfiguration or to perform a sysinit and restore your data from an archive.
  • There is no simple formula to determine when this conversion should be done. The following table shows the percentage of imbalance that can be expected for a given number of AMPs. As the number of AMPs increases, the percentage of imbalance also increases. A system with 1,000 or more AMPs should consider converting to a 20-bit hash bucket size.

     

    Number of AMPs

     Number of Hash
           Buckets

      Low Number of
    Buckets Per AMP

     High Number of
    Buckets Per AMP

          Percentage
           Imbalance

                  100

              65,536

                 655

                 656

                 0.15

                  200

              65,536

                 327

                 328

                 0.31

                  300

              65,536

                 218

                 219

                 0.46

                  400

              65,536

                 163

                 164

                 0.61

                  500

              65,536

                 131

                 132

                 0.76

                  750

              65,536

                   87

                   88

                 1.14

               1,000

              65,536

                   65

                   66

                 1.52

               2,000

              65,536

                   32

                   33

                 3.03

               3,000

              65,536

                   21

                   22

                 4.55

               4,000

              65,536

                   16

                   17

                 5.88

               5,000

              65,536

                   13

                   14

                 7.14

               6,000

              65,536

                   10

                   11

                 9.09

               7,000

              65,536

                     9

                   10

               10.00

               8,000

              65,536

                     8

                     9

               11.11

               9,000

              65,536

                     7

                     8

               12.50

             10,000

              65,536

                     6

                     7

               14.29