Tradeoffs between Multivalue Compression and Storage Requirements for Compressed Values - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

Multivalue Compression and Net Capacity for Nulls and Values

While multivalue compression removes specified values from row storage, those values must be stored somewhere, unless the values are nulls. Null compression is handled by the presence bits in the row header and does not impact the table header.

Storage of Compressed Values

The presence bits in the row header index into field 5 of the table header, where the compressed values are stored once per column per AMP. This does not apply to algorithmically compressed data, which is stored in place within the row. The exception is algorithmically compressed BLOB, BLOB-related UDT, CLOB, CLOB-related UDT, XML, XML-related UDT, or Geospatial data, which is typically stored in subtables.
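
The idea of storing each compressed value once and pointing at it from every row can be illustrated with a toy model. The Python sketch below is only a conceptual picture, not the actual row or table header layout: the function names, the tuple representation of a row, and the convention that index 0 means "not compressed" are all assumptions made for illustration.

```python
def compress_column(values, compress_list):
    """Toy model: replace listed values with an index into a shared list.

    Index 0 means "not compressed, value kept in the row".
    Index k (k >= 1) points at compress_list[k - 1].
    """
    position = {v: i + 1 for i, v in enumerate(compress_list)}
    rows = []
    for v in values:
        idx = position.get(v, 0)
        # A compressed value takes no space in the row; others stay in place.
        rows.append((idx, None if idx else v))
    return rows


def read_value(row, compress_list):
    """Recover the original value from a toy row."""
    idx, stored = row
    return compress_list[idx - 1] if idx else stored


cities = ["Chicago", "Los Angeles", "Chicago", "Boise", "Chicago"]
compress_list = ["Chicago", "Los Angeles"]  # stored once, outside the rows
rows = compress_column(cities, compress_list)
decoded = [read_value(r, compress_list) for r in rows]
```

In this model "Chicago" appears three times in the column but is stored only once in the shared list, while "Boise", which is not in the list, remains in its row.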

The size of the table header is limited to 1 MB, limiting the number of bytes that can be compressed for a given column. If the number of bytes compressed exceeds the maximum row length, the CREATE TABLE or ALTER TABLE statement is not valid and the DDL statement ends, even if the number of values specified for compression does not exceed the upper limit of 255.
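
The two limits described above (a count limit of 255 values and a separate byte limit) can be checked independently. The following Python sketch is a rough pre-check under stated assumptions: `byte_budget` is a hypothetical parameter standing in for the space actually available to compressed values, and every value is assumed to occupy the full column width. Neither assumption comes from the product documentation.

```python
MAX_COMPRESS_VALUES = 255  # upper limit on values per column, per the text


def check_compress_list(values, column_width_bytes, byte_budget):
    """Return (ok, reason) for a proposed list of values to compress.

    byte_budget is a hypothetical stand-in for the real byte limit.
    Each value is assumed to occupy the full column width.
    """
    distinct = list(dict.fromkeys(values))  # drop duplicates, keep order
    if len(distinct) > MAX_COMPRESS_VALUES:
        return False, f"{len(distinct)} values exceeds limit of {MAX_COMPRESS_VALUES}"
    total = len(distinct) * column_width_bytes
    if total > byte_budget:
        return False, f"{total} bytes exceeds budget of {byte_budget}"
    return True, "ok"
```

For example, 200 values of a 500-byte column pass the count check but need 100,000 bytes, so `check_compress_list([str(i) for i in range(200)], 500, 64000)` reports a byte-budget failure even though the list is under 255 values.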

The following graph plots the number of compressible values that can be specified for a column as a function of column width.


Figure: Number of compressible values that can be specified for a column as a function of column width

The plot shows that the wider the column, the fewer values can be compressed for it. To optimize compression, particularly for wider columns, analyze your tables to determine which values occur most frequently, and then limit compression to the top n values from that list.
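
The frequency analysis described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the column values have already been sampled into a list. The function name `pick_compress_values` and the `byte_budget` parameter are hypothetical, and the budget is a placeholder for whatever byte limit applies to the column, not a documented figure.

```python
from collections import Counter


def pick_compress_values(column_values, column_width_bytes, byte_budget,
                         max_values=255):
    """Pick the most frequent non-null values that fit a byte budget.

    Nulls are skipped because null compression does not use the value list.
    Each value is assumed to occupy the full column width.
    """
    counts = Counter(v for v in column_values if v is not None)
    # n is capped by both the count limit and the hypothetical byte budget.
    n = min(max_values, byte_budget // column_width_bytes)
    return [value for value, _ in counts.most_common(n)]


sample = ["NY"] * 5 + ["CA"] * 3 + ["TX"] * 2 + ["WY"] + [None] * 4
top_values = pick_compress_values(sample, column_width_bytes=2, byte_budget=6)
```

With a 2-byte column and a 6-byte budget, only three values fit, so the sketch keeps the three most frequent and leaves the rare value uncompressed. A wider column with the same budget would yield a shorter list, which matches the trend in the plot.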