15.00 - Compression Types Supported by Teradata Database - Teradata Database

Teradata Database Design

Compression Types Supported by Teradata Database

Compression reduces the physical size of stored information. Its goal is to represent information accurately using as few bits as possible. Compression methods are either logical or physical: physical data compression re-encodes information independently of its meaning, while logical data compression substitutes one set of data with another, more compact set.

Compression is used for two reasons:

  • To reduce storage costs
  • To enhance system performance

    Compression reduces storage costs by storing more logical data per unit of physical capacity. Compression produces smaller rows, so more rows fit in each data block and fewer data blocks are needed.

    Compression enhances system performance because there is less physical data to retrieve per row for queries. Also, because compressed data remains compressed while in memory, the FSG cache can hold more rows, which reduces the amount of disk I/O.
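    As a rough illustration of the block arithmetic above, the following Python sketch counts the data blocks a table needs before and after its rows shrink. The block size, row sizes, and row count are hypothetical, and the model assumes fixed-size rows that never span blocks, with no block header or free-space overhead; real Teradata block layouts are more complex.

```python
def blocks_needed(row_count: int, row_bytes: int, block_bytes: int) -> int:
    """Count blocks for fixed-size rows that never span a block boundary.

    Simplified model (assumption): no block header, no reserved free space.
    """
    if row_bytes <= 0 or row_bytes > block_bytes:
        raise ValueError("row size must be positive and fit in one block")
    rows_per_block = block_bytes // row_bytes  # whole rows only
    # Integer ceiling division: a partly filled last block still counts.
    return (row_count + rows_per_block - 1) // rows_per_block


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    rows = 1_000_000
    block = 64 * 1024  # assumed block size in bytes
    before = blocks_needed(rows, row_bytes=200, block_bytes=block)
    after = blocks_needed(rows, row_bytes=120, block_bytes=block)
    print(f"blocks before: {before}, after: {after}")
```

    With these assumed numbers, shrinking the average row from 200 to 120 bytes cuts the block count from 3,059 to 1,832, so roughly 40% fewer blocks have to be read for a full-table scan.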

    Most forms of compression are transparent to applications, ETL utilities, and queries. This can be less true of algorithmic compression: a poorly performing decompression algorithm can degrade system performance, and in some cases a poorly written one can even corrupt data.

    Experience with very large tables in real-world customer production databases indicates that compression yields performance benefits for a table even when more than 100 of its columns are compressed.

    Teradata Database uses several types of compression.


    FOR this database element...

    Compression refers to...

    column values

    the storage of those values only once, in the table header rather than in each row, with an array of presence bits in the row header pointing to them. It applies to:

  • Multi-value compression (see “Multi-Value Compression” on page 699)
  • Algorithmic compression (see “Algorithmic Compression” on page 700)

    You cannot apply either multi‑value compression or algorithmic compression to row‑level security constraint columns.

    hash and join indexes

    a logical row compression in which multiple sets of nonrepeating column values are appended to a single set of repeating column values. This allows the system to store the repeating value set only once, while any nonrepeating column values are stored as logical segmental extensions of the base repeating set.

    See “Row Compression” on page 709.

    data blocks

    the compressed storage of data blocks that hold primary table data or join index or hash index subtable data. Secondary index (SI) subtable data cannot be compressed.

    See “Block-Level Compression” on page 704.

    There are no restrictions on using block-level compression for a row-level security-protected table.

    partition containers

    the set of autocompression methods that Teradata Database selects for a container of a column-partitioned table or join index, provided the NO AUTO COMPRESS option was not specified when the object was created.

    See “Autocompression” on page 300 for further information about autocompression for column‑partitioned tables and join indexes.
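    The row compression described above for hash and join indexes can be pictured with a small Python sketch: each repeating value set is stored once as a key, and the nonrepeating values of every row that shares it are appended as extensions. Which columns repeat is an assumption passed in by the caller (`n_repeating`), and the dictionary is only a stand-in for the physical row format, which this section does not describe.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]


def row_compress(rows: Sequence[Row], n_repeating: int) -> Dict[Row, List[Row]]:
    """Store each repeating value set once, with its nonrepeating extensions.

    Assumption for illustration: the first `n_repeating` columns form the
    repeating set and the remaining columns are the nonrepeating part.
    """
    compressed: Dict[Row, List[Row]] = {}
    for row in rows:
        base, extension = tuple(row[:n_repeating]), tuple(row[n_repeating:])
        compressed.setdefault(base, []).append(extension)
    return compressed


def row_decompress(compressed: Dict[Row, List[Row]]) -> List[Row]:
    """Rebuild full rows; rows come back grouped by their repeating set."""
    return [base + ext for base, exts in compressed.items() for ext in exts]


if __name__ == "__main__":
    # Hypothetical join index rows: (dept_no, dept_name, emp_no, emp_name)
    rows = [
        (10, "Sales", 1001, "Ito"),
        (10, "Sales", 1002, "Diaz"),
        (20, "R&D", 1003, "Novak"),
    ]
    packed = row_compress(rows, n_repeating=2)
    print(packed)
    # {(10, 'Sales'): [(1001, 'Ito'), (1002, 'Diaz')], (20, 'R&D'): [(1003, 'Novak')]}
    assert sorted(row_decompress(packed)) == sorted(rows)  # nothing is lost
```

    The saving comes from `(10, "Sales")` being held once instead of twice; the more rows share a repeating set, the larger the saving.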

    Row compression, multi-value compression, block-level compression, and autocompression are lossless: the original data can be reconstructed exactly from its compressed form. Algorithmic compression can be either lossless or lossy, depending on the algorithm used.
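    The lossless property can be checked mechanically with a round trip: compress, decompress, and compare with the original. The Python sketch below uses the standard-library `zlib` module purely as a stand-in for a lossless method (this section does not say which algorithm block-level compression uses), and `truncate` is a hypothetical, deliberately lossy transform. A check like `is_lossless` is also a cheap way to catch the kind of data-corrupting decompression routine mentioned earlier before relying on it.

```python
import zlib
from typing import Callable, Iterable


def is_lossless(
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    samples: Iterable[bytes],
) -> bool:
    """True if every sample is reconstructed exactly after a round trip."""
    return all(decompress(compress(s)) == s for s in samples)


def truncate(data: bytes) -> bytes:
    """Hypothetical lossy 'compressor': keeps only the first 8 bytes."""
    return data[:8]


def identity(data: bytes) -> bytes:
    """Matching 'decompressor' for truncate: it cannot restore what was dropped."""
    return data


if __name__ == "__main__":
    # Hypothetical data block made of repetitive rows.
    block = b"".join(b"%06d|Sales|Tokyo|ACTIVE\n" % i for i in range(1000))
    packed = zlib.compress(block)
    print(len(block), len(packed))  # original size, compressed size
    print(is_lossless(zlib.compress, zlib.decompress, [block]))  # True
    print(is_lossless(truncate, identity, [block]))  # False
```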

    Compression carries a small initial cost, but even for queries against small tables it is a net win if the chosen compression method reduces table size.

    For compressed spool files, if a column is copied to spool with no expressions applied against it, the system copies just the compressed bits into the spool file, saving both CPU and disk I/O. Once in spool, compression works exactly as it does in a base table: the multi-value compression value list is kept in the spool table header, which stays in memory while the system operates on the spool. When algorithmically compressed data is carried to a spool file, the compressed data is carried along with its compress bits.

    The column attributes COMPRESS and NULL (see SQL Data Types and Literals) are useful for minimizing table storage space. You can use these attributes to selectively compress as many as 255 distinct, frequently repeated column values (not characters), to compress all nulls in a column, or to compress both.
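    To show how a value list plus per-row codes can stand in for repeated values and nulls, here is a toy Python model of multi-value compression for a single column. It is a hypothetical illustration, not Teradata's storage format: `value_list` plays the role of the table header, and the small integer code returned by `encode` stands in for the presence bits in the row header. Only the 255-value limit from the text is modeled; the per-column byte and character limits are not.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

MAX_COMPRESSED_VALUES = 255  # value-count limit described in the text

NULL_CODE = 0  # row holds a compressed NULL
UNCOMPRESSED = -1  # row stores the value itself


class MultiValueCompressedColumn:
    """Toy model of multi-value compression (MVC) for one column.

    Hypothetical illustration: the value list acts as the table header and
    the per-row code stands in for the row-header presence bits.
    """

    def __init__(self, values: Iterable[Hashable], compress_nulls: bool = True):
        self.value_list: List[Hashable] = list(dict.fromkeys(values))  # dedupe, keep order
        if len(self.value_list) > MAX_COMPRESSED_VALUES:
            raise ValueError(f"at most {MAX_COMPRESSED_VALUES} values can be compressed")
        self._code_of: Dict[Hashable, int] = {v: i + 1 for i, v in enumerate(self.value_list)}
        self.compress_nulls = compress_nulls

    def encode(self, value: Any) -> Tuple[int, Any]:
        """Return (code, payload); payload is None whenever the value is compressed."""
        if value is None and self.compress_nulls:
            return NULL_CODE, None
        code = self._code_of.get(value)
        if code is not None:
            return code, None  # value lives in the header, not the row
        return UNCOMPRESSED, value

    def decode(self, code: int, payload: Any) -> Any:
        """Reconstruct the original value from a row's code and payload."""
        if code == NULL_CODE:
            return None
        if code == UNCOMPRESSED:
            return payload
        return self.value_list[code - 1]


if __name__ == "__main__":
    city = MultiValueCompressedColumn(["Tokyo", "Paris", "Lima"])
    values = ["Paris", None, "Oslo", "Tokyo"]
    encoded = [city.encode(v) for v in values]
    print(encoded)  # [(2, None), (0, None), (-1, 'Oslo'), (1, None)]
    assert [city.decode(c, p) for c, p in encoded] == values
```

    Only "Oslo", which is not in the value list, occupies space in its row; the other three rows carry just a code.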

    In practice, fewer than 255 values may fit, because there is also a limit on the number of bytes or characters per column that can be multi-value compressed. These limits vary by data type, as the following table shows.


    FOR this type of data …        THE maximum storage per column is approximately …

    BYTE                           4,093 bytes
    KanjiSJIS, Latin, Unicode      8,188 characters