15.00 - Byte Alignment Considerations for Multi-Value Compression - Teradata Database

Teradata Database Design

prodname: Teradata Database
vrm_release: 15.00
category: User Guide
featnum: B035-1094-015K

Byte Alignment Considerations for Multi-Value Compression

As described in “Byte Alignment” on page 770, rows are always aligned on even byte boundaries. Take this into account when you determine how many values to compress for a given column: for short rows in which all columns have fixed lengths, compression can actually increase the number of bytes in the table.

Byte alignment effects on compression are important when the following are true for a table:

  • Row lengths are short.
  • Most or all columns have fixed length data types.
  • Few columns are compressed.

Byte alignment effects on compression are less important when the following are true for a table:

  • Row lengths are highly variable because of variable length columns.
  • The number of columns is high and many are compressed.

Consider the following example. Suppose you have a table with these row characteristics:

  • 14‑byte row header
  • 4‑byte nullable non-unique primary index column
  • 2‑byte nullable SMALLINT non‑index data column

The total number of bytes for this row is 14 + 4 + 2 = 20.

Because the primary index and SMALLINT columns are both nullable in this scenario, each uses a null presence bit in the row header, so 5 unused presence bits remain in the default presence octet.

Suppose you decide to compress 63 distinct values in the SMALLINT column. This requires an additional 6 presence bits (see “Presence Bits” on page 780 and “Number of Presence Bits Required to Represent Compressed Values” on page 784), one more than the 5 that remain, so the presence bits roll over into a new presence octet. The row header is now 15 bytes wide instead of 14. When a row contains a compressed value for the SMALLINT column, that value is not stored in the row, so the row is 15 + 4 = 19 bytes wide, an apparent savings of 1 byte for each such row in the table.

On closer analysis, the rows that appear to be 19 bytes wide are actually 20 bytes wide, so multi-value compression saves nothing. The reason the rows expand from 19 to 20 bytes is the system‑enforced even‑byte row alignment: the system adds a filler byte as the 20th byte of the row to ensure an even offset. By the same arithmetic, a row whose SMALLINT value is not one of the compressed values is 15 + 4 + 2 = 21 bytes wide and is padded to 22 bytes, which is larger than the original 20‑byte row.
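The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a Teradata utility: the function names are hypothetical, and the figure of 7 usable bits in the default presence octet is an assumption inferred from the example (2 null bits in use plus 5 unused bits).

```python
from typing import Tuple

ROW_HEADER_BYTES = 14           # from the example; includes the default presence octet
USABLE_BITS_IN_FIRST_OCTET = 7  # assumption: 2 null bits in use + 5 unused, per the example


def compression_bits(distinct_values: int) -> int:
    """Presence bits needed to identify one of `distinct_values` compressed values.

    One bit pattern must remain for "value is not compressed", so b bits
    cover at most 2**b - 1 values; int.bit_length() returns exactly that b.
    """
    return distinct_values.bit_length()


def row_length(null_bits: int, distinct_values: int, stored_bytes: int) -> Tuple[int, int]:
    """Return (unaligned, aligned) row length in bytes for the example table."""
    bits = null_bits + compression_bits(distinct_values)
    overflow_bits = max(0, bits - USABLE_BITS_IN_FIRST_OCTET)
    extra_octets = (overflow_bits + 7) // 8      # whole presence octets added to the header
    unaligned = ROW_HEADER_BYTES + extra_octets + stored_bytes
    aligned = unaligned + unaligned % 2          # filler byte when the length is odd
    return unaligned, aligned


# No compression: both column values are stored in the row.
no_compression = row_length(null_bits=2, distinct_values=0, stored_bytes=4 + 2)
# 63 compressed values, row holds a compressed SMALLINT: only the 4-byte index is stored.
compressed_63 = row_length(null_bits=2, distinct_values=63, stored_bytes=4)

print(no_compression)  # (20, 20)
print(compressed_63)   # (19, 20)
```

The second result shows the apparent 19-byte row being padded back to 20 bytes by the even-byte alignment rule.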

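Suppose that instead of compressing 63 distinct values in the SMALLINT column, you compress only 31. This requires only 5 additional presence bits, which fit in the 5 unused bits, so there is no need to roll over to a second presence octet. The row header stays at 14 bytes, and many rows compress to 14 + 4 = 18 bytes, a length that is already even and needs no filler byte.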

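You could also make the primary index non‑nullable (the recommended practice anyway). This frees one more presence bit, so the 6 bits needed for 63 compressed values fit in the default presence octet, which also removes the need to roll over to a second presence octet. In this case, all rows can compress to 18 bytes.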
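To compare the two alternatives, the following minimal sketch computes how many distinct values can be compressed before the presence bits spill into a second octet. The function name is hypothetical, and the 7 usable bits in the default presence octet are again an assumption inferred from the example rather than a documented constant.

```python
USABLE_BITS_IN_FIRST_OCTET = 7  # assumption: inferred from the example (2 null bits + 5 unused)


def max_values_without_rollover(null_bits_in_use: int) -> int:
    """Largest number of compressed values that fits in the default presence octet.

    With b free bits, one pattern is kept for "not compressed",
    leaving 2**b - 1 patterns for compressed values.
    """
    free_bits = max(0, USABLE_BITS_IN_FIRST_OCTET - null_bits_in_use)
    return 2 ** free_bits - 1


# Nullable primary index and nullable SMALLINT: 2 null bits in use.
nullable_pi = max_values_without_rollover(null_bits_in_use=2)
# Non-nullable primary index: only the SMALLINT null bit is in use.
non_nullable_pi = max_values_without_rollover(null_bits_in_use=1)

print(nullable_pi)      # 31
print(non_nullable_pi)  # 63
```

Under these assumptions, the limits of 31 and 63 values match the two scenarios described above, and in both the compressed row stays at 14 + 4 = 18 bytes.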