15.00 - Compressing Column Values Using Only Multi-Value Compression - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1184-015K
Language
English (United States)

Compressing Column Values Using Only Multi-Value Compression

The term compression is used to mean different things for Teradata systems. Multi-value and row compression are both lossless, meaning that the original data can be reconstructed exactly from their compressed forms, while algorithmic compression can be done using either a non‑loss method or a lossy method.

When describing compression of column values, multi-value compression refers to the storage of those values one time only in the table header, not in the row itself, and pointing to them by means of an array of presence bits in the row header. This method of value compression is called Dictionary Indexing, and it is a variation of Run‑Length Encoding (see Database Design for more information).

You can specify multi-value compression only for columns having one of the following data types.

  • All numeric types
  • DATE
  • CHARACTER and CHARACTER SET GRAPHIC
  • VARCHAR and CHARACTER SET VARGRAPHIC
  • BYTE
  • VARBYTE
  • A distinct UDT type based on any of the predefined data types in this list.
  • Because the list of compressed data values is memory‑resident any time the table is being accessed, the system can access compressed values in 2 ways, depending on the presence bits for the column.

  • Using a pointer reference to the value in the current row
  • Using a pointer reference to the value in the multi-value
  • There is a small cost for compressing a value when a row is stored for the first time, and a one‑time cost to convert an existing uncompressed column to a compressed column. But for queries, even those made against small tables, multi-value compression is a net win as long as the chosen compression reduces the size of the table (see Database Design for details).

    With respect to multi-value‑compressed spool files, if a column is copied to spool with no expressions applied against it, Teradata Database copies just the compressed data into the spool file, saving on both CPU and I/O costs. Once in spool, compression works exactly as it does in a base table. There is a compressed multi-value list in the table header of the spool that is memory‑resident while the system is operating on the spool.

    You can compress nulls and as many as 255 distinct values per column using standard default multi-value compression. You can compress the values for an unlimited number of columns per table. Although there is no defined limit on the number of columns that can be compressed per table, the actual number of columns with compression that can be achieved is a function of the amount of space available in the table header that is not consumed by other optional table characteristics. See Database Design for details.

     

    IF you use this type of multi-value compression …

    THEN …

    single‑valued

    the default is null if no value is specified explicitly.

    multi-valued

  • There is no default and you must specify all values explicitly.
  • There is no essential ordering for specifying values to be compressed for a given column.
  • You cannot specify the same value or NULL more than once in the compression multi-value for a given column.
  • Note that columnar multi-value compression, once defined for table columns, is retained as a property of spool files containing data from the table as well, so compression is not just a property of data stored on disk.

    Note: Using multi-value compression to compress a substantial number of values in a table can cause the following capacity issues.

  • Table header overflow.
  • Compressed multi-values are stored in the table header, and like other Teradata Database rows, table header rows have a maximum length of 64 KB. Null storage is indicated by presence bits in the row header and has no impact on the size of the table header. The maximum size of a table header is 1 MB. If table header overflow occurs, your CREATE TABLE request fails. Resubmit the request compressing fewer column values.

    For additional information about modifying multi-value compressed columns, see “Rules for Adding or Modifying Multi-Value Compression for a Column” on page 63.

    Several factors contribute to the length of table header rows, including partitioning and UDT columns, so it is not possible to provide rigid guidelines to avoid overflow. See Database Design for more information about table headers.

  • Dictionary cache overflow
  • When this happens, Teradata Database reports a 3930 error. Increase the size of your dictionary cache from its current value to the maximum size of 1 MB. The default value for the size of the dictionary cache is also 1 MB, so you would only need to increase its size if it had reduced for some reason.