15.10 - COMPRESS and DECOMPRESS Phrases - Teradata Database

Teradata Database SQL Data Types and Literals

prodname
Teradata Database
vrm_release
15.10
category
Programming Reference
featnum
B035-1143-151K

Compresses data values and nulls in one or more columns of a table, and decompresses the previously compressed values.

where:

 

Syntax element …

Specifies …

COMPRESS with no arguments

to compress nulls only.

Note that if you do not specify COMPRESS, nulls are not automatically compressed.

Multi-value Compression (MVC) Specification

COMPRESS

to compress column data using multi-value compression.

constant

a set of values to be compressed.

You can specify a single constant value, or a multi-valued, comma‑separated list of up to 255 distinct constant values enclosed in parentheses.

NULL

that nulls are to be compressed.

NULL can be specified alone, but it must be enclosed in parentheses. This is the same as specifying COMPRESS without an argument.

NULL can be specified with up to 255 constant values in a multi-valued, comma‑separated list enclosed in parentheses.

Algorithmic Compression (ALC) Specification

COMPRESS USING

to compress column data using algorithmic compression.

DECOMPRESS USING

to decompress column data that was previously compressed using algorithmic compression.

dbname

the name of the database in which the compress or decompress user-defined function (UDF) is stored. If a database name is not specified, the default database used is SYSUDTLIB.

You should create all compress and decompress UDFs in the SYSUDTLIB database. Note that compression functions supplied by Teradata are located in TD_SYSFNLIB.

compress_udf

the name of the UDF used to compress values in the column.

decompress_udf

the name of the UDF used to decompress values in the column.

COMPRESS and DECOMPRESS are Teradata extensions to the ANSI SQL:2011 standard.

You can compress as many columns as practical for any table. However, compressing a substantial number of values in a table using MVC can contribute to table header overflow.

 

IF you specify …

THEN …

COMPRESS with:

  • no argument
  • NULL enclosed in parentheses
  • nulls are compressed for nullable columns.

    NULL cannot be specified in a COMPRESS phrase for a column that is declared NOT NULL.

    COMPRESS with a constant

    the indicated value is compressed. For nullable columns, nulls are also compressed.

    COMPRESS with:

  • a multi-valued, comma-separated list of constant values enclosed in parentheses
  • a multi-valued, comma-separated list of constant values and NULL enclosed in parentheses
  • nulls and the specified distinct values are compressed for nullable columns. Nulls are compressed for nullable columns regardless of whether NULL appears in the list.

    NULL cannot appear in a COMPRESS phrase for a column that is declared NOT NULL.

    COMPRESS USING compress_udf DECOMPRESS USING decompress_udf

    nulls are compressed for nullable columns. Non-null column values that are not specified in the value compression list are compressed using the specified compress UDF, and decompressed using the specified decompress UDF.

    You can compress data at the column level using multi-value compression, a lossless, dictionary-based compression scheme. With MVC, you specify a list of values to be compressed when defining a column in the CREATE TABLE/ALTER TABLE statement. When you insert a value into the column which matches a value in the compression list, a corresponding compress bit is set instead of storing the actual value, thus saving the disk storage space.

    The best candidates for compression are the most frequently occurring values in each column. MVC is a good compression scheme when there are many repeating values for a column.

    If MVC is not an efficient compression scheme for data in a particular column, you can compress the column using algorithmic compression (ALC). You can specify MVC alone, or both MVC and ALC on the same column. If you define both on the same column, ALC is applied only to those non-null values that are not specified in the value compression list of the MVC specification. You can also use MVC together with block-level compression (BLC).

    You can use MVC to compress columns with these data types:

  • Any numeric data type
  • BYTE
  • VARBYTE
  • CHARACTER
  • VARCHAR
  • DATE
  • To compress a DATE value, you must specify the value as a Date literal using the ANSI DATE format (DATE 'YYYY-MM-DD'). For example:

       COMPRESS (DATE '2000-06-15')
  • TIME and TIME WITH TIME ZONE
  • TIMESTAMP and TIMESTAMP WITH TIME ZONE
  • To compress a TIME or TIMESTAMP value, you must specify the value as a TIME or TIMESTAMP literal. For example:

       COMPRESS (TIME '15:30:00')
       COMPRESS (TIMESTAMP '2006-11-23 15:30:23')

    In addition, you can use COMPRESS (NULL) for columns with these data types:

  • ARRAY
  • Period
  • Non-LOB distinct or structured UDT
  • When you specify a constant for a CHARACTER or VARCHAR data type without the UPPERCASE option, a data value must match the case of the constant to be compressed.

    For example, if a column named Doc_Type is defined as follows:

       Doc_Type CHARACTER(6) COMPRESS 'Manual' 

    then each value must be entered as ' Manual ' in order to be compressed. A value in a different case, such as ' MANUAL ' or ' manual,' is not compressed.

    The following rules apply to a MVC specification that specifies a comma-separated list of constant values:

  • The maximum number of distinct constant values in the list is 255 plus NULL.
  • Values can appear in any order.
  • NULL can appear at any point in the multi-valued constant list.
  • The multi-value list cannot contain duplicate values. Note that case variants are not considered duplicate values if UPPERCASE is not specified.
  • NULL cannot appear in the list if the column is declared NOT NULL.
  • When the value compression list of a CHARACTER or VARCHAR column contains characters that are not displayable in the current session character set, a SHOW TABLE query displays a Unicode delimited identifier.

    Maximum benefit is achieved when the COMPRESS phrase is applied to a column under the following conditions:

  • Enough rows contain a compressible value (for example, null, zero, blank, or constant value) in the compressed field to exceed the break-even point. For details, see “Break-even point for value compression” in Database Design.
  • The shortened row length results in the elimination of one or more data blocks.
  • In some cases, such as when column values are mostly unique, algorithmic compression can provide better compression results than multi-value compression. Algorithmic compression allows you to define your own compression and decompression algorithms and apply them to data at the column level.

    You implement the algorithms as external C/C++ scalar UDFs, and then specify them in the column definition of a CREATE TABLE/ALTER TABLE statement. Teradata Database invokes these algorithms to compress and uncompress the column data when the data is moved into the tables or when data is retrieved from the tables.

    ALC allows you to implement the compression scheme that is most suitable for data in a particular column. The cost of compression and uncompression depends on the algorithm chosen.

    You can specify ALC alone, or both MVC and ALC on the same column. If you define both on the same column, ALC is applied only to those non-null values that are not specified in the value compression list of the MVC specification.

    Note: Using ALC together with block-level compression (BLC) may degrade performance, so this practice is not recommended.

    You can use algorithmic compression to compress table columns with the following data types:

  • ARRAY
  • BYTE
  • VARBYTE
  • BLOB
  • CHARACTER
  • VARCHAR
  • CLOB
  • JSON, with some restrictions listed below
  • TIME and TIME WITH TIME ZONE
  • TIMESTAMP and TIMESTAMP WITH TIME ZONE
  • Period types
  • Distinct UDTs, with some restrictions listed below
  • System-defined UDTs, with some restrictions listed below
  • Teradata provides the following embedded services functions for compressing and uncompressing data. These functions are stored in the TD_SYSFNLIB system database.

     

    Function Type

    UDF Name

    Description

    Compress

    TransUnicodeToUTF8

    Takes UNICODE character input and stores it in UTF-8 format.

    This is useful when the characters are in the ASCII script (U+0000 to U+007F) because UTF-8 uses one byte to represent these characters and UNICODE (UTF-16) uses two bytes.

    Decompress

    TransUTF8ToUnicode

    Takes the data previously compressed using the TransUnicodeToUTF8 function and converts it back to UNICODE.

    Compress

    LZCOMP

    Compresses UNICODE character data using the Lempel-Ziv algorithm.

    Decompress

    LZDECOMP

    Uncompresses UNICODE data that was compressed using LZCOMP.

    Compress

    LZCOMP_L

    Compresses LATIN character data using the Lempel-Ziv algorithm.

    Decompress

    LZDECOMP_L

    Uncompresses LATIN data that was compressed using LZCOMP_L.

    Compress

    CAMSET

    Compresses UNICODE character data into partial byte (for example, 4-bit digits or 5-bit alphabetic letters), one byte, or two byte values using a proprietary Teradata algorithm.

    Decompress

    DECAMSET

    Uncompresses the UNICODE character data that was compressed using CAMSET.

    Compress

    CAMSET_L

    Compresses LATIN character data into partial byte (for example, 4-bit digits or 5-bit alphabetic letters), or one byte values using a proprietary Teradata algorithm.

    Decompress

    DECAMSET_L

    Uncompresses the LATIN character data that was compressed using CAMSET_L.

    Compress

    TD_LZ_COMPRESS

    Compresses any supported ALC data type or predefined type data using Lempel-Ziv coding.

    Decompress

    TD_LZ_DECOMPRESS

    Uncompresses any supported ALC data type or predefined type data that was compressed using TZ_LZ_COMPRESS.

    Compress

    TS_COMPRESS

    Compresses TIME and TIMESTAMP data.

    Decompress

    TS_DECOMPRESS

    Uncompresses TIME and TIMESTAMP data that was compressed using TS_COMPRESS.

    Compress

    JSON_COMPRESS

    Compresses JSON data.

    Decompress

    JSON_DECOMPRESS

    Uncompresses the JSON data that was compressed using JSON_COMPRESS.

    For more information about these functions, see SQL Functions, Operators, Expressions, and Predicates.

    If these functions do not provide optimal compression for your data, you may implement your own UDFs for use in compressing and uncompressing table columns. For more information, see “Defining Functions for Algorithmic Compression” in SQL External Routine Programming.

    Notice:

    If you create custom ALC UDFs, make sure to test them thoroughly. If the compression or decompression algorithm fails, compressed data may not be recoverable, or may be corrupted.

  • You cannot use ALC to compress columns that have a data type of structured UDT.
  • The TD_LZ_COMPRESS and TD_LZ_DECOMPRESS system functions compress all large UDTs including UDT-based system types such as Geospatial, XML, and JSON. However, if you write your own compression functions, the following restrictions apply:
  • Custom compression functions cannot be used to compress UDT-based system types (except for ARRAY and Period types).
  • Custom compression functions cannot be used to compress distinct UDTs that are based on UDT-based system types (except for ARRAY and Period types).
  • You cannot write your own compression functions to perform algorithmic compression on JSON type columns. However, Teradata provides the JSON_COMPRESS and JSON_DECOMPRESS functions that you can use to perform ALC on JSON type columns.
  • You cannot use ALC to compress temporal columns:
  • A column defined as SYSTEM_TIME, VALIDTIME, or TRANSACTIONTIME.
  • The DateTime columns that define the beginning and ending bounds of a temporal derived period column (SYSTEM_TIME, VALIDTIME, or TRANSACTIONTIME).
  • You can use ALC to compress Period data types in columns that are nontemporal; however, you cannot use ALC to compress derived period columns.

    For details about temporal tables, see Temporal Table Support and ANSI Temporal Table Support.

  • You cannot specify multi-value or algorithmic compression for a row-level security constraint column.
  • The following example uses MVC to compress a specified date in a DATE column.

       CREATE TABLE Duration
          (DurationID INTEGER,
           StartDate  DATE COMPRESS (DATE '2000-01-01'));

    The following example uses MVC to compress a list of date values in a DATE column.

       CREATE TABLE Duration 
          (DurationID INTEGER,
           StartDate  DATE COMPRESS (DATE '2000-01-01', 
                                     DATE '2000-01-02', 
                                     DATE '2000-01-03'));

    The following example uses MVC to compress a list of numeric values in an INTEGER column.

       CREATE TABLE ID
          (IDNum INTEGER,
           Post  INTEGER COMPRESS (44, 45, 63));

    In this example, assume that the default server character set is UNICODE. The NULLs in the Description column are compressed. The non-null values of the Description column are compressed using the Teradata-supplied function, TransUnicodeToUTF8. The TransUTF8ToUnicode function uncompresses the values compressed by the TransUnicodeToUTF8 function.

       CREATE TABLE Pendants
          (ItemNo INTEGER, 
           Description VARCHAR(1000) 
              COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
              DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode);

    In this example, assume that the default server character set is UNICODE. The NULLs in the Gem column are compressed. MVC is used to compress the values 'amethyst' and 'amber' in the Gem column. Because UPPERCASE is specified, all values of 'amethyst' and 'amber' are compressed regardless of case. The Teradata-supplied function, TransUnicodeToUTF8, compresses all non-null values in the Gem column that are not 'amethyst' or 'amber'. The TransUTF8ToUnicode function uncompresses the values compressed by TransUnicodeToUTF8.

       CREATE TABLE Pendants
          (ItemNo INTEGER, 
           Gem CHAR(10) UPPERCASE COMPRESS ('amethyst', 'amber')
               COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
               DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode);

    In this example, the NULLs in the Description column are compressed. The non-null UNICODE values in the Description column are compressed using the Teradata-supplied LZCOMP function. The LZDECOMP function uncompresses the values compressed by LZCOMP.

       CREATE MULTISET TABLE Pendants
          (ItemNo INTEGER,
           Gem CHAR(10) UPPERCASE CHARACTER SET UNICODE,
           Description VARCHAR(1000) CHARACTER SET UNICODE
              COMPRESS USING TD_SYSFNLIB.LZCOMP
              DECOMPRESS USING TD_SYSFNLIB.LZDECOMP);
     

    For information on …

    See …

    the performance and storage capacity savings benefits of value compression

    “Database-Level Considerations” in Database Design.

    using COMPRESS when defining or modifying columns in a table

    “CREATE TABLE” and “ALTER TABLE” in SQL Data Definition Language.

    Teradata compression and decompression functions

    SQL Functions, Operators, Expressions, and Predicates.

    rules and restrictions for compressing column data

    “CREATE TABLE” in SQL Data Definition Language.

    rules for creating compress and decompress UDFs

  • “CREATE TABLE” in SQL Data Definition Language.
  • “Defining Functions for Algorithmic Compression” in SQL External Routine Programming.
  • compression methods supported by Teradata Database and a comparison of the various methods

    “Reducing Space Usage with Data Compression” in Database Administration.