Compression reduces the physical size of stored information. The goal of compression is to represent information accurately using the fewest number of bits. Compression methods are either logical or physical. Physical data compression re-encodes information independently of its meaning, while logical data compression substitutes one set of data with another, more compact set.
Compression is used for the following reasons.
- To reduce storage costs
- To enhance system performance
Compression reduces storage costs by storing more logical data per unit of physical capacity. Compression produces smaller rows, resulting in more rows stored per data block and fewer data blocks.
Compression enhances system performance because there is less physical data to retrieve per row for queries. Also, because compressed data remains compressed while in memory, the FSG cache can hold more rows, reducing the size of disk I/O.
Most forms of compression are transparent to applications, ETL utilities, and queries. This can be less true of algorithmic compression, because a poorly performing decompression algorithm can have a negative effect on system performance, and in some cases a poorly written decompression algorithm can even corrupt data.
Experience with real world customer production databases with very large tables indicates that compression produces performance benefits for a table even when more than 100 of its columns have been compressed.
Teradata Database uses several types of compression.
|FOR this database element...||Compression refers to...|
|column values||the storage of those values one time only in the table header, not in the row itself, and pointing to them by means of an array of presence bits in the row header. It applies to:
You cannot apply either multivalue compression or algorithmic compression to row-level security constraint columns.
|hash and join indexes||a logical row compression in which multiple sets of nonrepeating column values are appended to a single set of repeating column values. This allows the system to store the repeating value set only once, while any nonrepeating column values are stored as logical segmental extensions of the base repeating set.
See Row Compression.
|data blocks||the storage of primary table data, or join or hash index subtable data. Secondary Index (SI) subtable data cannot be compressed.|
|partition containers||the autocompression method set determined by Teradata Database to apply to a container of a column-partitioned table or join index when you have not specified the NO AUTO COMPRESS option at the time the object was created.
See Autocompression for further information about autocompression for column-partitioned tables and join indexes.
Row compression, multivalue compression, block-level compression, and autocompression are lossless methods, meaning that the original data can be reconstructed exactly from the compressed forms, while algorithmic compression can be either lossless or lossy, depending on the algorithm used.
There is a small initial cost, but even for queries made against small tables, compression is a net win if the chosen compression method reduces table size.
For compressed spool, if a column is copied to spool with no expressions applied against it, then the system copies just the compressed bits into the spool, saving both CPU and disk I/O size. Once in spool, compression works exactly as it does in a base table. There is a compress multivalue in the table header of the spool that stays in memory while the system is operating on the spool. When algorithmic compression is carried to spool, the compressed data is carried along with the compress bits.
The column attributes COMPRESS and NULL (see SQL Data Types and Literals) are useful for minimizing table storage space. You can use these attributes to selectively compress as many as 255 distinct, frequently repeated column values (not characters), to compress all nulls in a column, or to compress both.
The limit of 255 values is approximate because there is also a limit on the number of bytes or characters per column that can be multivalue compressed. These limits vary for different types of character data, as the following table explains.
|FOR this type of data …||THE maximum storage per column is approximately …|