Presence bits indicate the status of each column with respect to its nullability, multivalue compressibility, and autocompressibility. Compression presence bits are added to the row header of each row to specify how multivalue compression is used for that row.
Each row has at least one octet of presence bits and can have more, depending on the degree of the table and the cumulative number of values compressed. The difference between bytes and octets is conceptual. The 8 bits of a byte define an atomic unit, while each individual bit of an octet is an atomic flag in a bit array. For the purpose of storage capacity analysis, both are treated as bytes. All rows for a given table have the same presence bits defined because nullability and multivalue compressibility are table attributes.
The first bit of the first presence bits octet is always set to 1, so the first octet defines the nullability and multivalue compressibility for no more than seven columns. When necessary, additional presence bit octets are added to the row header. Although only 7 of the bits in the first octet are available for use as presence bits, all 8 bits of successive octets are available. Eight bits are used because 255 compressed values plus null require 28, or 256 bit combinations, to be represented.
For a column-partitioned table or join index there is one set of presence bits for each column partition value in the container. If a presence bit is 1, a value is present. If a presence bit is 0, no value is present.
A compression-enabled column for a table has a maximum of 256 presence bits set, depending on how many values are compressed using multivalue compression.
The meaning of the number of presence bits set per column for compression of a single value is provided in the following table.
This type of presence bit field… | Has this many presence bits … | For this type of column … |
---|---|---|
Nullability | 0 | Non-nullable. |
1 | Nullable. | |
Compressibility | 0 | Not compressed or compressed on nulls only. |
1 | Compressed on a value. |
The total number of presence bits for a given row is the sum of the nullability presence bits and the compressibility presence bits.
The following table expresses the same information in a slightly different way.
Compressible | Nullable | |||
---|---|---|---|---|
Bit Value | Meaning | Bit Value | Meaning | |
0 | The column is multivalue compressed. | 0 | The column is null. | |
1 | A non-compressed column value is present. | 1 | The column is not null. | |
An algorithmically compressed column adds an extra bit to indicate whether the column is algorithmically compressed or not, as follows: | ||||
Compressible | Nullable | |||
Bit Value | Meaning | |||
0 | The column is not algorithmically compressed. | |||
1 | The column is algorithmically compressed. | |||
1 to 8 bits for each multivalue compressible column (8 bits because 255 compressed values plus null require 28, or 256 bit combinations, to be represented). | 1 bit for each nullable column. |
The following table provides a comprehensive mapping of the presence bits and their various combinations for the multivalue compression case:
WHEN the presence bits for a column have these values … | THEN the column … | AND … | |
---|---|---|---|
Compress | Null | ||
no bit | no bit | is not compressible | is not nullable. |
0 | no bit | is compressed | is not nullable. |
1 | no bit | contains uncompressed column values | is not nullable. |
no bit | 0 | is not compressible | is null. |
no bit | 1 | is not compressible | is not null. |
0 | 0 | is compressed | is null. |
1 | 1 | is not compressed | is not null. |
1 | 0 | is not compressed | is null. |
0 | 1 | is not compressed. | is null. |
Mappings of COMPRESS bit values for multivalued compression generalize from this specific case as illustrated by the following table.
IF the presence bit is … | THEN the data is … | |||
---|---|---|---|---|
1 | not compressed. The corresponding compress bits are all 0. |
|||
0 | compressed.
|
The following table presents a set of examples that clarifies the correspondence between presence bits and data attribute specifications.
FOR this column definition … | The presence bits are … | For these characters … |
---|---|---|
col_1 CHAR(1) NOT NULL | none | A |
col_1 CHAR(1) NOT NULL COMPRESS (‘A’) | 0 | A |
1 | B | |
col_1 CHAR(1) COMPRESS | 0 | null |
1 | A | |
col_1 CHAR(1) COMPRESS (‘A’) | 00 | null |
01 | A | |
10 | B | |
10 | C | |
col_1 CHAR(1) COMPRESS (‘A’, ‘B’, ‘C’, ‘D’) | 0000 | null |
0001 | A | |
0010 | B | |
0011 | C | |
0100 | D | |
1000 | E | |
col_1 CHAR(1) NOT NULL COMPRESS (‘A’, ‘B’, ‘C’, ‘D’) | 001 | A |
010 | B | |
011 | C | |
100 | D | |
000 | E |
Number of Presence Bits Required to Represent Compressed Values
Because compression is signaled by populating octets of presence bit fields in the row header, there is a tradeoff between the capacity required to add presence octets and the savings realized from multivalue compression at the base table row level: the size of the bit field required to express multivalue compression increases non-linearly as a function of the number of values to be compressed. With respect to the savings gained by multivalue compression, this tradeoff is almost always worth doing.
The following table indicates the relationship between the number of values (and null, which is always compressed by default) specified for compression per column and the number of compress bits required to indicate that number of values.
Nullable? | Number of Column Values Compressed | Number of Compress Bits in the Row Header Bit Field |
---|---|---|
Yes | null | 1 Shared with the presence bit. |
No | 1 | 1 |
No | 2 - 3 | 2 |
No | 4 - 7 | 3 |
No | 8 - 15 | 4 |
No | 16 - 31 | 5 |
No | 32 - 63 | 6 |
No | 64 - 127 | 7 |
No | 128 - 255 | 8 |
Yes | 1 + null | 1 |
Yes | 2 - 3 + null | 2 |
Yes | 4 - 7 + null | 3 |
Yes | 8 - 15 + null | 4 |
Yes | 16 - 31 + null | 5 |
Yes | 32 - 63 + null | 6 |
Yes | 64 - 127 + null | 7 |
Yes | 128 - 255 + null | 8 |
The compress bits value represents an index into the compress multivalue array for that column in the table header. This is illustrated by the following graphic showing that the presence bit pattern 10 points to the compressed character string “Los Angeles” in the table header. Note that because only 3 values are compressed for this column, 2 compression bits are required to uniquely identify them.
The table definition DDL for this illustration is as follows:
CREATE TABLE MVCompress ( StreetAddress VARCHAR(40), City CHARACTER(20) COMPRESS (’New York’, ’Los Angeles’, ’Chicago’) NOT NULL, StateCode CHARACTER(2));
As you can see, all the possible combinations of bit patterns are used to identify and locate the values. For this particular example, the bit patterns and their corresponding values are those indicated in the following table:
FOR this pattern of presence bits … | The value for City is … | AND stored in … |
---|---|---|
00 | not compressed. | the row. |
01 | ’Chicago’ and is compressed | the table header. |
10 | ’Los Angeles’ and is compressed | the table header. |
11 | ’New York’ and is compressed | the table header. |
From the perspective of the row header, the best policy is to compress the highest number of values in each octet bit array, because doing so does not add any more space in the row header (nor does null compression, which is handled by presence bits in the row header for each row). In other words, the best practice is to optimize multivalued compression for 1, 3, 7, 15, 31, 63, 127, or 255 values. You should evaluate all these possibilities to determine which yields the best compression for a particular column.
You must also take into account the number of bytes compression transfers to the table header, because those bytes count against the maximum row length of 1 MB.
The compressed values are stored adjacent to one another in ascending value order and descending order of field size following the end of Field5 in the table header.