15.00 - Algorithmic Compression - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Algorithmic Compression

You can use algorithmic compression to compress table columns with the following data types:

  • ARRAY
  • BYTE
  • VARBYTE
  • BLOB
  • CHARACTER
  • VARCHAR
  • CLOB
  • JSON, with some restrictions listed below
  • TIME and TIME WITH TIME ZONE
  • TIMESTAMP and TIMESTAMP WITH TIME ZONE
  • Period types
  • Distinct UDTs, with some restrictions listed below
  • System-defined UDTs, with some restrictions listed below
  • These restrictions apply:

  • You cannot use ALC to compress columns that have a data type of structured UDT.
  • The TD_LZ_COMPRESS and TD_LZ_DECOMPRESS system functions compress all large UDTs including UDT-based system types such as Geospatial, XML, and JSON. However, if you write your own compression functions, the following restrictions apply:
  • Custom compression functions cannot be used to compress UDT-based system types (except for ARRAY and Period types).
  • Custom compression functions cannot be used to compress distinct UDTs that are based on UDT-based system types (except for ARRAY and Period types).
  • You cannot write your own compression functions to perform algorithmic compression on JSON type columns. However, Teradata provides the JSON_COMPRESS and JSON_DECOMPRESS functions that you can use to perform ALC on JSON type columns.
  • You cannot use ALC to compress temporal columns:
  • A column defined as SYSTEM_TIME, VALIDTIME, or TRANSACTIONTIME.
  • The DateTime columns that define the beginning and ending bounds of a temporal derived period column (SYSTEM_TIME, VALIDTIME, or TRANSACTIONTIME).
  • You can use ALC to compress Period data types in columns that are nontemporal; however, you cannot use ALC to compress derived period columns.

    For details about temporal tables, see Temporal Table Support and ANSI Temporal Table Support.

  • You cannot specify multi-value or algorithmic compression for a row-level security constraint column.
  • For more information about row‑level security, see Security Administration.

    You can apply algorithmic compression to referential integrity columns.

    Depending on the implementation, algorithmic compression can be either physical or logical, though most implementations use physical data compression. Algorithmic compression can be either lossy or lossless, depending on the algorithm used.

    Teradata provides the following system‑defined external UDFs for algorithmic compression and decompression.

     

    UDF Name

    Function

    TransUnicodeToUTF8

    Compresses the specified Unicode character data into UTF‑8 format.

    The result data type is VARBYTE(64000).

    This is useful when the characters are in the ASCII script (U+0000 to U+007F) because UTF‑8 uses one byte to represent these characters and Unicode (UTF-16) uses two bytes.

    TransUnicodeToUTF8 can only compress characters from the ASCII script of Unicode (U+0000 to U+007F).

    TransUTF8ToUnicode

    Uncompresses data that was compressed using TransUnicodeToUTF8.

    The result data type is VARCHAR(32000) CHARACTER SET UNICODE.

    LZCOMP

    Compresses the specified Unicode character data using Lempel-Ziv coding.

    The result data type is VARBYTE(64000).

    LZDECOMP

    Uncompresses Unicode data that was compressed using LZCOMP.

    The result data type is VARCHAR(32000) CHARACTER SET UNICODE.

    LZCOMP_L

    Compresses the specified Latin character data using Lempel-Ziv coding.

    The result data type is VARBYTE(64000).

    LZDECOMP_L

    Uncompresses Latin data that was compressed using LZCOMP_L.

    CAMSET

    Compresses the specified Unicode character data into partial byte 4‑bit values for numeric characters or partial 5‑bit values for alphabetic characters using a proprietary Teradata algorithm and an embedded services function.

    The result data type is VARBYTE(64000).

    You should not use this UDF to compress CHARACTER SET GRAPHIC character data.

    DECAMSET

    Uncompresses the Unicode character data that was compressed using CAMSET.

    The result data type is VARCHAR(32000) CHARACTER SET UNICODE.

    CAMSET_L

    Compresses the specified Latin character data in the range U+0000 to U+00FF into partial byte 4‑bit values for numeric characters or partial 5‑bit values for alphabetic characters using a proprietary Teradata algorithm and an embedded services function.

    The result data type is VARBYTE(64000).

    You should not use this UDF to compress CHARACTER SET GRAPHIC character data.

    DECAMSET_L

    Uncompresses the Latin character data that was compressed using CAMSET_L.

    The result data type is VARCHAR(64000) CHARACTER SET LATIN.

    TD_LZ_COMPRESS

    Compresses the specified ALC-supported data type or predefined type data using Lempel‑Ziv coding.

    Valid column data types include distinct UDTs, distinct BLOB‑, CLOB‑, and XML‑based UDTs, ARRAY, VARRAY, Period, CHARACTER, VARCHAR, BLOB, CLOB, XML, JSON, Geospatial, BYTE, and VARBYTE.

    For a column that specifies both multi-value compression and algorithmic compression, only distinct UDTs based on predefined types that are valid for multi-value compression can be algorithmically compressed using TD_LZ_COMPRESS. This includes all numeric types, Period types, BLOB, CLOB, XML, distinct BLOB‑based UDTs, distinct CLOB‑based UDTs, distinct XML‑based UDTs, Geospatial, DATE, CHARACTER, VARCHAR, BYTE, and VARBYTE types.

    The parameter data type of TD_LZ_COMPRESS must match the data type of TD_LZ_DECOMPRESS exactly.

    The result data type must match the data type of the UDT or predefined type column exactly.

    TD_LZ_DECOMPRESS

    Uncompresses data that was compressed using TD_LZ_COMPRESS.

  • For algorithmic compression, the return data type of TD_LZ_DECOMPRESS must match the data type of the compressed column exactly.
  • For algorithmic compression, the VARBYTE return length for TD_LZ_COMPRESS must match the length of the TD_LZ_DECOMPRESS VARBYTE parameter exactly.
  • TS_COMPRESS

    Compresses TIME and TIMESTAMP data.

    TS_DECOMPRESS

    Uncompresses TIME and TIMESTAMP data that was compressed using TS_COMPRESS.

    JSON_COMPRESS

    Compresses JSON data.

    JSON_DECOMPRESS

    Uncompresses the JSON data that was compressed using JSON_COMPRESS.

    Related Topics

     

    Topic

    Reference

    How to code scalar UDFs to perform algorithmic compression on column data

    SQL External Routine Programming

    How to create the SQL definition for an algorithmic compression UDF

    CREATE FUNCTION (External Form) in SQL Data Definition Language

    How to specify those scalar UDFs in a table definition

    ALTER TABLE and CREATE TABLE in SQL Data Definition Language

    Guidelines for selecting an algorithmic compression functions

    http://developer.teradata.com/extensibility/articles/selecting-an-alc-compression-algorithm

    Evaluating algorithmic compression UDFs

    download the test suite from http://downloads.teradata.com/download/extensibility/algorithmic-compression-test-package