Tradeoffs Between Multivalue Compression and Storage Requirements for Compressed Values - Advanced SQL Engine - Teradata Database

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
qby1588121512748.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1094
lifecycle
previous
Product Category
Teradata Vantageā„¢

About Multivalue Compression and Net Capacity for Nulls and Values

While multivalue compression removes specified values from row storage, those values do not disappear: they must be stored somewhere. This statement applies only to values, not nulls. Null compression is handled by the presence bits in the row header and does not have an impact on the table header.

Storage of Compressed Values

The presence bits in the row header index into field 5 of the table header, where the compressed values are stored, once per column per AMP. This does not apply to algorithmically compressed data, which is stored in place within the row except for algorithmically compressed BLOB, BLOB-related UDT, CLOB, CLOB-related UDT, XML, XML-related UDT, or Geospatial data that is generally stored in subtables. See Teradata Vantageā„¢ - SQL External Routine Programming , B035-1147 for more information about algorithmic compression.

Because the size of the table header is limited to 1 MB, there is a limit to how many bytes can be compressed for a given column. If the number of bytes compressed exceeds the maximum row length, then the CREATE or ALTER TABLE statement used to create the new table is not valid and the DDL statement aborts. This is true even if the number of values specified for compression does not exceed the upper limit of 255.

The following graph plots the number of compressible values that can be specified for a column as a function of column width.



Not surprisingly, the plot clearly indicates that the wider the column, the fewer the number of values that can be compressed for the column. Particularly for wider columns, this means that in order to optimize compression, you must carefully analyze your tables to determine which values occur the most frequently and then limit compression to the top n values from that list.