17.10 - Compressing Column Values Using Only Multivalue Compression - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Data Definition Language Detailed Topics

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B035-1184-171K
Language
English (United States)

Teradata systems use a variety of compression methods. Multivalue and row compression are lossless. The original data can be reconstructed exactly from their compressed forms. Algorithmic compression can be a non-loss or lossy process.

Multivalue compression refers to the storage of column values values one time only in the table header, not in the row itself, and pointing to them using an array of presence bits in the row header. This method of value compression is called Dictionary Indexing. See Teradata Vantage™ - Database Design, B035-1094.

You can specify multivalue compression only for columns having one of the following data types.
  • All numeric types
  • DATE
  • CHARACTER and CHARACTER SET GRAPHIC
  • VARCHAR and CHARACTER SET VARGRAPHIC
  • BYTE
  • VARBYTE
  • A distinct UDT type based on any of the predefined data types in this list.
Because the list of compressed data values is memory-resident any time the table is being accessed, the system can access compressed values in 2 ways, depending on the presence bits for the column.
  • Using a pointer reference to the value in the current row
  • Using a pointer reference to the value in the multivalue

There is a small cost for compressing a value when a row is stored for the first time, and a one-time cost to convert an existing noncompressed column to a compressed column. But for queries, even those made against small tables, multivalue compression is a net win as long as the chosen compression reduces the size of the table. See Teradata Vantage™ - Database Design, B035-1094.

For multivalue-compressed spools, if a column is copied to spool with no expressions applied against it, Vantage copies just the compressed data into the spool, saving CPU and I/O costs. Once in spool, compression works exactly as it does in a base table. There is a compressed multivalue list in the table header of the spool that is memory-resident while the system is operating on the spool.

You can compress nulls and as many as 255 distinct values per column using standard default multivalue compression. You can compress the values for an unlimited number of columns per table. Although there is no defined limit on the number of columns that can be compressed per table, the actual number of columns with compression that can be achieved is a function of the amount of space available in the table header that is not consumed by other optional table characteristics. See Teradata Vantage™ - Database Design, B035-1094.

Type of Multivalue Compression Description
Single-valued The default is null if no value is specified explicitly.
Multivalued You must specify all values explicitly. There is no default.

There is no essential ordering for specifying values to be compressed for a column.

You cannot specify the same value or NULL more than once in the compression multivalue for a column.

Note that multivalue compression, once defined for table columns, is retained as a property of spools containing data from the table as well, so compression is not just a property of data stored on disk.

Multivalue Compression Capacity

Using multivalue compression to compress a substantial number of values in a table can cause the following capacity issues.

Table Header Overflow

Compressed multivalue information is stored in the table header, and like other database rows, table header rows have a maximum length of 64 KB. Null storage is indicated by presence bits in the row header and has no impact on the size of the table header. The maximum size of a table header is 1 MB. If table header overflow occurs, you must resubmit the CREATE TABLE statement fewer compressed column values.

For additional information about modifying multivalue compressed columns, see Rules for Adding or Modifying Multivalue Compression for a Column.

Several factors contribute to the length of table header rows, including partitioning and UDT columns, so it is not possible to provide rigid guidelines to avoid overflow. For more information about table headers, see Teradata Vantage™ - Database Design, B035-1094.

Dictionary Cache Overflow

When this happens, the system reports a 3930 error. Increase the size of your dictionary cache from its current value to the maximum size of 1 MB. The default value for the size of the dictionary cache is also 1 MB, so you would only need to increase its size if it had reduced for some reason.