Tradeoffs between Multivalue Compression and Storage Requirements for Compressed Values

Teradata® VantageCloud Lake
Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

Multivalue Compression and Net Capacity for Nulls and Values

While multivalue compression removes specified values from row storage, those values must still be stored somewhere, unless they are nulls. Null compression is handled entirely by the presence bits in the row header and does not affect the table header.

Storage of Compressed Values

The presence bits in the row header index into field 5 of the table header, where the compressed values are stored once per column per AMP. This does not apply to algorithmically compressed data, which is stored in place within the row. The exception is algorithmically compressed BLOB, BLOB-related UDT, CLOB, CLOB-related UDT, XML, XML-related UDT, or Geospatial data, which is typically stored in subtables.
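The following toy Python model illustrates the distinction between nulls and compressed values. It does not reproduce Teradata's actual byte layout or presence-bit encoding. The class `ToyColumnHeader`, the `"null"`/`"header"`/`"row"` tags, and `row_bytes` are hypothetical names used only for illustration. The model shows that a compressed value is kept once in a per-column list that rows merely index into, a null needs no stored value at all, and only uncompressed values remain in the row.

```python
from typing import List, Optional, Sequence, Tuple, Union

MAX_COMPRESS_VALUES = 255  # per-column upper limit stated in the text

# A field as stored in a toy row: (kind, payload).
Field = Tuple[str, Union[int, str, None]]


class ToyColumnHeader:
    """Hypothetical stand-in for one column's compressed-value list."""

    def __init__(self, compress_values: Sequence[str]) -> None:
        if len(compress_values) > MAX_COMPRESS_VALUES:
            raise ValueError("at most 255 values can be compressed per column")
        # Stored once for the column, not repeated in every row.
        self.values = list(compress_values)

    def encode(self, value: Optional[str]) -> Field:
        if value is None:
            return ("null", None)  # null: flagged in the row, nothing stored anywhere
        if value in self.values:
            # compressed: the row keeps only an index into the header list
            return ("header", self.values.index(value))
        return ("row", value)  # not in the compress list: stored in the row itself

    def decode(self, field: Field) -> Optional[str]:
        kind, payload = field
        if kind == "null":
            return None
        if kind == "header":
            return self.values[payload]
        return payload


def row_bytes(fields: List[Field]) -> int:
    """Bytes of value data left in the row (index/flag overhead not counted)."""
    return sum(len(p.encode("utf-8")) for kind, p in fields if kind == "row")


header = ToyColumnHeader(["Active", "Closed"])
fields = [header.encode(v) for v in ["Active", None, "Pending", "Closed"]]
print(fields)
print(row_bytes(fields))
```

Only `"Pending"` still occupies space in the row. `"Active"` and `"Closed"` are stored once in the header list, and the null costs nothing beyond its flag.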

The table header is limited to 1 MB, which limits the number of bytes that can be compressed for a given column. If the number of bytes compressed exceeds the maximum row length, the CREATE TABLE or ALTER TABLE statement is invalid and fails, even if the number of values specified for compression does not exceed the upper limit of 255.
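The two independent limits can be sketched as a simple pre-check on a proposed compress list. This is a hypothetical helper, not a Teradata API. The `byte_limit` parameter stands in for the size limit described above. The real limit and any per-value or header overhead are not modeled here, so the numbers used below are arbitrary illustrations.

```python
from typing import Sequence

MAX_COMPRESS_VALUES = 255  # per-column value limit stated in the text


def check_compress_list(values: Sequence[bytes], byte_limit: int) -> None:
    """Raise ValueError if a proposed compress list would be rejected.

    byte_limit is a hypothetical stand-in for the real size limit;
    overheads are ignored in this sketch.
    """
    if len(values) > MAX_COMPRESS_VALUES:
        raise ValueError(
            f"{len(values)} values exceeds the limit of {MAX_COMPRESS_VALUES}"
        )
    total = sum(len(v) for v in values)
    if total > byte_limit:
        # Rejected even though the value count is within the 255 limit.
        raise ValueError(f"{total} bytes of compressed values exceeds {byte_limit}")


# 10 wide values fit; 100 of the same width do not, although 100 <= 255.
check_compress_list([b"x" * 1000] * 10, byte_limit=50_000)
try:
    check_compress_list([b"x" * 1000] * 100, byte_limit=50_000)
except ValueError as err:
    print("rejected:", err)
```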

The following graph plots the number of compressible values that can be specified for a column as a function of column width.

Figure: Number of compressible values that can be specified for a column as a function of column width
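The shape of the graph described above can be approximated with a deliberately simplified model. Assume each compressed value of a fixed-width column costs exactly `column_width` bytes against a fixed byte budget. Then at most `byte_limit // column_width` values fit, capped at 255. The function name and the 50,000-byte budget are hypothetical. The real curve depends on Teradata's actual limit and storage overhead, so only the inverse relationship, not the specific numbers, should be taken from this sketch.

```python
MAX_COMPRESS_VALUES = 255  # per-column value limit stated in the text


def max_compressible_values(column_width: int, byte_limit: int) -> int:
    """Simplified model: each value costs column_width bytes, capped at 255.

    Ignores per-value and header overhead; byte_limit is hypothetical.
    """
    if column_width <= 0:
        raise ValueError("column_width must be positive")
    return min(MAX_COMPRESS_VALUES, byte_limit // column_width)


for width in (10, 200, 1000, 5000):
    print(width, max_compressible_values(width, byte_limit=50_000))
```

Narrow columns hit the 255-value cap first. Past a certain width, the byte budget becomes the binding constraint, and the number of compressible values falls roughly in proportion to 1 / width.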

As expected, the plot shows that the wider the column, the fewer values can be compressed for it. For wider columns in particular, optimize compression by analyzing your tables to determine which values occur most frequently, and then limit compression to the top n values from that list.
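The "top n values" approach can be sketched as follows. Rank the non-null values of a column by frequency, then keep the longest prefix of that ranking that fits both the 255-value limit and a byte budget. In practice, the frequencies would come from a `GROUP BY` count on the table. Here they are computed in memory from a sample list. The function name, the `byte_limit` parameter, and the UTF-8 byte sizing are hypothetical simplifications. Nulls are skipped because, as noted earlier, null compression needs no stored value.

```python
from collections import Counter
from typing import Iterable, List, Optional

MAX_COMPRESS_VALUES = 255  # per-column value limit stated in the text


def choose_compress_values(
    column: Iterable[Optional[str]], byte_limit: int
) -> List[str]:
    """Return the most frequent non-null values that fit within byte_limit.

    Takes a prefix of the frequency ranking (the "top n"), stopping at the
    first value that would exceed the count limit or the byte budget.
    """
    counts = Counter(v for v in column if v is not None)
    chosen: List[str] = []
    used = 0
    for value, _count in counts.most_common():
        if len(chosen) == MAX_COMPRESS_VALUES:
            break
        size = len(value.encode("utf-8"))  # simplified per-value cost
        if used + size > byte_limit:
            break
        chosen.append(value)
        used += size
    return chosen


sample = ["NY"] * 5 + ["CA"] * 3 + ["TX"] * 2 + [None] * 4 + ["WA"]
print(choose_compress_values(sample, byte_limit=6))
```

With a 6-byte budget, the three most frequent two-byte values fit and `"WA"` does not. Stopping at the first value that does not fit keeps the result a true top-n list rather than skipping ahead to smaller, rarer values.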