Compression Types Supported by VantageCloud Lake - Compression Types Supported by Vantage - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Compression reduces the physical size of stored information. The goal of compression is to represent information accurately using the fewest number of bits. Compression methods are either logical or physical. Physical data compression re-encodes information independently of its meaning, while logical data compression substitutes one set of data with another, more compact set.

Compression is used for the following reasons.
  • To reduce storage costs
  • To enhance system performance

Compression reduces storage costs by storing more logical data per unit of physical capacity. Compression produces smaller rows, resulting in more rows stored per data block and fewer data blocks.

Compression enhances system performance because there is less physical data to retrieve per row for queries. Also, because compressed data remains compressed while in memory, the FSG cache can hold more rows, reducing the size of disk I/O.

Most forms of compression are transparent to applications, ETL utilities, and queries. This can be less true of algorithmic compression, because a poorly performing decompression algorithm can have a negative effect on system performance, and a poorly written decompression algorithm can even corrupt data.

Experience with real world customer production databases with large tables indicates that compression produces performance benefits for a table even when more than 100 of its columns have been compressed.

Vantage uses the following types of compression:

Database Element Compression Type
Column value The storage of those values one time only in the table header, not in the row itself, and pointing to the values by means of an array of presence bits in the row header. This description applies to the following:

You cannot apply either multivalue compression or algorithmic compression to row-level security constraint columns.

Join index A logical row compression in which multiple sets of nonrepeating column values are appended to a single set of repeating column values. This allows the system to store the repeating value set only once, while any nonrepeating column values are stored as logical segmental extensions of the base repeating set.

See Row Compression.

Data block The storage of primary table data or join index subtable data. Secondary Index (SI) subtable data cannot be compressed.

See Block-Level Compression.

Partition container The autocompression method set determined by Vantage to apply to a container of a column-partitioned table or join index when you have not specified the NO AUTO COMPRESS option at the time the object was created.

See Autocompression for further information about autocompression for column-partitioned tables and join indexes.

Row compression, multivalue compression, block-level compression, and autocompression are lossless methods, meaning that the original data can be reconstructed exactly from the compressed forms, while algorithmic compression can be lossless or lossy.

There is a small initial cost, but even for queries made against small tables, compression is a net win if the chosen compression method reduces table size.

For compressed spool, if a column is copied to spool with no expressions applied against it, then the system copies only the compressed bits into the spool, saving CPU and reducing disk I/O size. In spool, compression works as in a base table. There is a compress multivalue in the table header of the spool that stays in memory while the system is operating on the spool. When algorithmic compression is carried to spool, the compressed data is carried with the compress bits.

The column attributes COMPRESS and NULL can minimize table storage space. You can use these attributes to compress up to 255 distinct, frequently repeated column values (not characters), to compress all nulls in a column, or both. See COMPRESS and DECOMPRESS Phrases and NOT NULL Phrase .

The limit of 255 values is approximate because there is also a limit on the number of bytes or characters per column that can be multivalue compressed. These limits vary for different types of character data, as the following table explains.

Data Type Approximate Maximum Storage Per Column
BYTE 4,093 bytes
  • KanjiSJIS
  • GRAPHIC
  • Latin
  • Unicode
8,188 characters