Presence Bits | Database Design | Teradata Vantage - Presence Bits - Advanced SQL Engine - Teradata Database

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
qby1588121512748.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1094
lifecycle
previous
Product Category
Teradata Vantage™
Much of the following discussion does not apply to column-partitioned tables and join indexes except as it applies to column partitions with ROW format and to multi-column partition values in containers.

Presence bits indicate the status of each column with respect to its nullability, multivalue compressibility, and autocompressibility. Compression presence bits are added to the row header of each row to specify how multivalue compression is used for that row.

Each row has at least one octet of presence bits and can have more, depending on the degree of the table and the cumulative number of values compressed. The difference between bytes and octets is conceptual. The 8 bits of a byte define an atomic unit, while each individual bit of an octet is an atomic flag in a bit array. For the purpose of storage capacity analysis, both are treated as bytes. All rows for a given table have the same presence bits defined because nullability and multivalue compressibility are table attributes.

The first bit of the first presence bits octet is always set to 1, so the first octet defines the nullability and multivalue compressibility for no more than seven columns. When necessary, additional presence bit octets are added to the row header. Although only 7 of the bits in the first octet are available for use as presence bits, all 8 bits of successive octets are available. Eight bits are used because 255 compressed values plus null require 28, or 256 bit combinations, to be represented.

For a column-partitioned table or join index there is one set of presence bits for each column partition value in the container. If a presence bit is 1, a value is present. If a presence bit is 0, no value is present.

A compression-enabled column for a table has a maximum of 256 presence bits set, depending on how many values are compressed using multivalue compression.

The meaning of the number of presence bits set per column for compression of a single value is provided in the following table.

This type of presence bit field… Has this many presence bits … For this type of column …
Nullability 0 Non-nullable.
1 Nullable.
Compressibility 0 Not compressed or compressed on nulls only.
1 Compressed on a value.

The total number of presence bits for a given row is the sum of the nullability presence bits and the compressibility presence bits.

The following table expresses the same information in a slightly different way.

Compressible Nullable
Bit Value Meaning Bit Value Meaning
       0 The column is multivalue compressed.        0 The column is null.
       1 A non-compressed column value is present.        1 The column is not null.
An algorithmically compressed column adds an extra bit to indicate whether the column is algorithmically compressed or not, as follows:
Compressible Nullable
Bit Value Meaning
       0 The column is not algorithmically compressed.
       1 The column is algorithmically compressed.
1 to 8 bits for each multivalue compressible column (8 bits because 255 compressed values plus null require 28, or 256 bit combinations, to be represented). 1 bit for each nullable column.

The following table provides a comprehensive mapping of the presence bits and their various combinations for the multivalue compression case:

WHEN the presence bits for a column have these values … THEN the column … AND …
Compress Null
no bit no bit is not compressible is not nullable.
0 no bit is compressed is not nullable.
1 no bit contains uncompressed column values is not nullable.
no bit 0 is not compressible is null.
no bit 1 is not compressible is not null.
0 0 is compressed is null.
1 1 is not compressed is not null.
1 0 is not compressed is null.
0 1 is not compressed. is null.

Mappings of COMPRESS bit values for multivalued compression generalize from this specific case as illustrated by the following table.

IF the presence bit is … THEN the data is …
1 not compressed.

The corresponding compress bits are all 0.

0 compressed.
  • If the corresponding compress bits are all 0, then the compress value is null.
  • If the corresponding compress bits are not all 0, then the compress value is an index to the compress multivalue array in the table header.

The following table presents a set of examples that clarifies the correspondence between presence bits and data attribute specifications.

FOR this column definition … The presence bits are … For these characters …
col_1 CHAR(1) NOT NULL none A
col_1 CHAR(1) NOT NULL COMPRESS (‘A’)  0 A
1 B
col_1 CHAR(1) COMPRESS 0 null
1 A
col_1 CHAR(1) COMPRESS (‘A’) 00 null
01 A
10 B
10 C
col_1 CHAR(1) COMPRESS (‘A’, ‘B’, ‘C’, ‘D’) 0000 null
0001 A
0010 B
0011 C
0100 D
1000 E
col_1 CHAR(1) NOT NULL COMPRESS (‘A’, ‘B’, ‘C’, ‘D’) 001 A
010 B
011 C
100 D
000 E

Number of Presence Bits Required to Represent Compressed Values

Because compression is signaled by populating octets of presence bit fields in the row header, there is a tradeoff between the capacity required to add presence octets and the savings realized from multivalue compression at the base table row level: the size of the bit field required to express multivalue compression increases non-linearly as a function of the number of values to be compressed. With respect to the savings gained by multivalue compression, this tradeoff is almost always worth doing.

The following table indicates the relationship between the number of values (and null, which is always compressed by default) specified for compression per column and the number of compress bits required to indicate that number of values.

Nullable? Number of Column Values Compressed Number of Compress Bits in the Row Header Bit Field
Yes                  null                                 1

Shared with the presence bit.

No                  1                                 1
No                  2 - 3                                 2
No                  4 - 7                                 3
No                  8 - 15                                 4
No                  16 - 31                                 5
No                  32 - 63                                 6
No                  64 - 127                                 7
No                  128 - 255                                 8
Yes                  1 + null                                 1
Yes                  2 - 3 + null                                 2
Yes                  4 - 7 + null                                 3
Yes                  8 - 15 + null                                 4
Yes                  16 - 31 + null                                 5
Yes                  32 - 63 + null                                 6
Yes                  64 - 127 + null                                 7
Yes                  128 - 255 + null                                 8

The compress bits value represents an index into the compress multivalue array for that column in the table header. This is illustrated by the following graphic showing that the presence bit pattern 10 points to the compressed character string “Los Angeles” in the table header. Note that because only 3 values are compressed for this column, 2 compression bits are required to uniquely identify them.

The table definition DDL for this illustration is as follows:

     CREATE TABLE MVCompress (
       StreetAddress VARCHAR(40),
       City          CHARACTER(20) COMPRESS (’New York’,
                     ’Los Angeles’, ’Chicago’) NOT NULL,
       StateCode     CHARACTER(2));


As you can see, all the possible combinations of bit patterns are used to identify and locate the values. For this particular example, the bit patterns and their corresponding values are those indicated in the following table:

FOR this pattern of presence bits … The value for City is … AND stored in …
                    00 not compressed. the row.
                    01 ’Chicago’ and is compressed the table header.
                    10 ’Los Angeles’ and is compressed the table header.
                    11 ’New York’ and is compressed the table header.

From the perspective of the row header, the best policy is to compress the highest number of values in each octet bit array, because doing so does not add any more space in the row header (nor does null compression, which is handled by presence bits in the row header for each row). In other words, the best practice is to optimize multivalued compression for 1, 3, 7, 15, 31, 63, 127, or 255 values. You should evaluate all these possibilities to determine which yields the best compression for a particular column.

You must also take into account the number of bytes compression transfers to the table header, because those bytes count against the maximum row length of 1 MB.

Systems using small cylinders have a maximum row length of 64 KB.

The compressed values are stored adjacent to one another in ascending value order and descending order of field size following the end of Field5 in the table header.