Number of Presence Bits Required to Represent Compressed Values - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
Product Category
Software

Number of Presence Bits Required to Represent Compressed Values

Because compression is signaled by populating octets of presence bit fields in the row header, there is a tradeoff between the capacity required to add presence octets and the savings realized from multivalue compression at the base table row level: the size of the bit field required to express multivalue compression increases non-linearly as a function of the number of values to be compressed. With respect to the savings gained by multivalue compression, this tradeoff is almost always worth doing.

The following table indicates the relationship between the number of values (and null, which is always compressed by default) specified for compression per column and the number of compress bits required to indicate that number of values.

 

          Nullable?

       Number of Column Values

                   Compressed

   Number of Compress Bits in the Row

                     Header Bit Field

              Yes

                 null

                                1

Shared with the presence bit.

               No

                 1

                                1

               No

                 2 - 3

                                2

               No

                 4 - 7

                                3

               No

                 8 - 15

                                4

               No

                 16 - 31

                                5

               No

                 32 - 63

                                6

               No

                 64 - 127

                                7

               No

                 128 - 255

                                8

              Yes

                 1 + null

                                1

              Yes

                 2 - 3 + null

                                2

              Yes

                 4 - 7 + null

                                3

              Yes

                 8 - 15 + null

                                4

              Yes

                 16 - 31 + null

                                5

              Yes

                 32 - 63 + null

                                6

              Yes

                 64 - 127 + null

                                7

              Yes

                 128 - 255 + null

                                8

The compress bits value represents an index into the compress multivalue array for that column in the table header. This is illustrated by the following graphic showing that the presence bit pattern 10 points to the compressed character string “Los Angeles” in the table header. Note that because only 3 values are compressed for this column, 2 compression bits are required to uniquely identify them.

The table definition DDL for this illustration is as follows:

     CREATE TABLE MVCompress (
       StreetAddress VARCHAR(40),
       City          CHARACTER(20) COMPRESS (’New York’,
                     ’Los Angeles’, ’Chicago’) NOT NULL,
       StateCode     CHARACTER(2));

As you can see, all the possible combinations of bit patterns are used to identify and locate the values. For this particular example, the bit patterns and their corresponding values are those indicated in the following table:

 

FOR this pattern of presence bits …

The value for City is …

AND stored in …

                    00

not compressed.

the row.

                    01

’Chicago’ and is compressed

the table header.

                    10

’Los Angeles’ and is compressed

the table header.

                    11

’New York’ and is compressed

the table header.

From the perspective of the row header, the best policy is to compress the highest number of values in each octet bit array, because doing so does not add any more space in the row header (nor does null compression, which is handled by presence bits in the row header for each row). In other words, the best practice is to optimize multivalued compression for 1, 3, 7, 15, 31, 63, 127, or 255 values. You should evaluate all these possibilities to determine which yields the best compression for a particular column.

You must also take into account the number of bytes compression transfers to the table header, because those bytes count against the maximum row length of 64 KB.

The compressed values are stored adjacent to one another in ascending value order and descending order of field size following the end of Field5 in the table header. See “Table Header Components” on page 711.