Multivalue Compression | Database Design | Teradata Vantage - Multivalue Compression - Advanced SQL Engine - Teradata Database

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
qby1588121512748.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1094
lifecycle
previous
Product Category
Teradata Vantage™

Multivalue compression (MVC) compresses repeating values in a column when you specify the value in a compression list in the column definition.

When data in the column matches a value specified in the compression list, the database stores the value only once in the table header, regardless of how many times it occurs as a field value for the column. The database then stores a smaller substitute value, often as small as 2 bits, in each row where the value occurs.

MVC generally provides the best cost/benefit ratio compared to other methods. Because it requires minimal resources to uncompress the data during query processing, you can use MVC for hot (frequently used) data without compromising query/load performance. MVC is also considered the easiest to implement of all the compression methods.

MVC is a logical data compression form and is lossless.

Besides storage capacity and disk I/O size improvements, MVC has the following performance impacts:

  • Improves table scan response times for most configurations and workloads
  • Provides moderate to little CPU savings

Procedure

  1. Use the COMPRESS phrase in a CREATE TABLE or ALTER TABLE statement to specify a list of frequently occurring values for the column that contains the values, for example:
    CREATE TABLE Employee
       (EmpNo INT NOT NULL,
        EmpLoc CHAR(30) COMPRESS ('Seattle','Dallas','Chicago')
    ...
        DOB DATE COMPRESS (DATE '1972-02-29', DATE '1976-02-29')
    ...);
  2. The system automatically compresses the specified values when data moves into the table, and uncompresses them when the containing rows are accessed.
The system automatically compresses NULLs when you specify COMPRESS.
Compressing many values in a table can cause the dictionary cache to overflow. If this happens increase your dictionary cache to the default value of 1 MB.
You cannot apply MVC to row-level security columns.

You can use MVC to compress columns with these data types:

  • Any numeric data type
  • BYTE
  • VARBYTE
  • CHARACTER
  • VARCHAR
  • DATE

    To compress a DATE value, you must specify the value as a Date literal using the ANSI DATE format (DATE 'YYYY-MM-DD'). For example:

        COMPRESS (DATE '2000-06-15')
       
  • TIME and TIME WITH TIME ZONE
  • TIMESTAMP and TIMESTAMP WITH TIME ZONE

To compress a TIME or TIMESTAMP value, you must specify the value as a TIME or TIMESTAMP literal. For example:

   COMPRESS (TIME '15:30:00')
   COMPRESS (TIMESTAMP '2006-11-23 15:30:23')

In addition, you can use COMPRESS (NULL) for columns with these data types:

  • ARRAY
  • Period
  • Non-LOB distinct or structured UDT

Multivalue Compression Example

The following CREATE TABLE fragment specifies that all occurrences of 'cashier', 'manager', and 'programmer' for the jobtitle column as well as all nulls are to be compressed to zero space.

Nulls are compressed by default whether or not there is an argument to the COMPRESS attribute specified for a column.

This definition saves 30 bytes for each row whenever an employee has one of the following job titles:

  • Null
  • Cashier
  • Manager
  • Programmer
     CREATE TABLE employee (
       employee_number INTEGER
       ...
       jobtitle        CHARACTER(30) COMPRESS (’cashier’,
                       ’manager’, ’programmer’)
       ...
       );

Guidelines for Using Multivalue Compression

  • To change the values that are compressed or to change whether or not a column uses compression, use an ALTER TABLE request. For example, the following turns off the compression in the mycol_varchar column:
    ALTER TABLE DB.mytable_vlc
    ADD mycol_varchar NO COMPRESS;
  • You cannot compress more than 255 distinct values for an individual column.
  • You cannot create a table with more bytes compressed than there is room to store them in the table header.
  • The maximum number of characters that can be listed in a COMPRESS clause is 8,188.
  • You cannot compress values in columns that are any of the following:
    • Primary index columns
    • Identity columns
    • Derived table columns
    • Derived period columns
    • Row-level security constraint columns
    • Referenced primary key columns
    • Referencing foreign key columns for standard referential integrity relationships

      You can compress values in referencing foreign key columns for Batch and Referential Constraint referential integrity relationships.

  • You can compress columns that are a component of a secondary index, but MultiLoad operations on a table with a secondary index can take longer if the secondary index column set is compressed.

    To avoid this problem, drop any compressed secondary indexes before starting the MultiLoad job and then recreate them afterward.

  • You can compress columns that are components of a referential integrity relationship.
  • You can assign multivalue compression to columns that contain the following types of data:
    • Nulls, including nulls for distinct and structured non-LOB and non-XML UDTs, ARRAY/VARRAY and Period UDT data types
    • Zeros
    • Blanks
    • Constants having any of the data types supported by multivalue compression.