15.00 - Compressing Column Values Using Only Algorithmic Compression - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1184-015K
Language
English (United States)

Compressing Column Values Using Only Algorithmic Compression

The term compression is used to mean different things for Teradata systems. Algorithmic compression can be done using either a non‑loss method or a lossy method.

  • When describing algorithmic compression of column values, compression refers to whatever methods you choose to implement to compress byte, character, and graphic data. Algorithmic compression is implemented by assigning a compression and a decompression UDF to a table column. Although you can only specify one compression UDF and one decompression UDF per column, those functions can compress and decompress multiple column values (see SQL Data Definition Language Syntax and Examples for the syntax used to assign algorithmic compression UDFs to columns and SQL External Routine Programming for details about coding UDFs to implement algorithmic compression).
  • When you specify algorithmic compression for a column, you might be using a form of lossless compression, or you might be using lossy compression for sound, video, graphic, or picture files.
  • When you specify a UDF name for algorithmic compression, you must also specify its containing database or user.
  • You can specify algorithmic compression only for columns having one of the following data types.
  • BYTE
  • CHARACTER
  • VARBYTE
  • VARCHAR
  • All UDT types that are not based on BLOBs or CLOBs.
  • You can specify algorithmic compression for Period types.

  • Algorithmic compression is not supported for the columns of join indexes
  • Algorithmic compression automatically compresses nulls.
  • Algorithmic compression is carried to spool files whenever possible.
  • You can specify algorithmic compression for an unlimited number of columns in a table.
  • The column DBC.TVFields.CompressValueList stores the names of the compression and decompression algorithms specified for a column as well as the compressed multi-value for any multi-value compression specified for a column.
  • The maximum size of DBC.TVFields.CompressValueList is 8,192 characters. If the composite size of the value and algorithmic compression information exceeds this limit, Teradata Database reports an error.

    The following rules apply to the compression and decompression algorithms you create as UDFs.

  • The UDFs must be created in either the SYSUDTLIB or the TD_SYSFNLIB database.
  • The UDFs must be scalar UDFs.
  • Neither aggregate nor table UDFs are valid for algorithmic compression.

  • The UDFs must be deterministic.
  • The UDFs can only have 1 input parameter, and the data type of the value passed must be a type that is valid for algorithmic compression.
  • The input to an DECOMPRESS USING UDF must have a variable data type.
  • The output from an DECOMPRESS USING UDF must be compatible with the data type of the column being decompressed.

  • Algorithmic compression cannot truncate data values.
  • The length of the column must be less than or equal to the length of the input parameter passed to the COMPRESS USING UDF.

    Similarly, the length of the column must be less than or equal to the length of the output parameter returned by the DECOMPRESS USING UDF.

  • A user who references a column defined with algorithmic compression must have the privilege on the COMPRESS USING and DECOMPRESS USING UDFs specified for that column.
  • Teradata Database supports function overloading for the COMPRESS USING and DECOMPRESS USING UDFs you write.
  • For example, you can overload an algorithm that handles different data types, and the system executes the appropriate function depending on the data type of the input parameter value passed to it.

  • Teradata Database supports algorithmic compression of Period columns.
  • The costs of compressing and decompressing column data algorithmically depends on the algorithms used, and you must evaluate those costs yourself.

    Teradata provides the following system‑defined external UDFs that you can use for algorithmic compression and decompression (see SQL Functions, Operators, Expressions, and Predicates for details). These functions are stored in the TD_SYSFNLIB database.

     

    UDF Name

    Function

    TransUnicodeToUTF8

    Compresses the specified Unicode character data into UTF‑8 format.

    The result data type is VARBYTE(64000).

    This is useful when the characters are in the ASCII script (U+0000 to U+007F) because UTF‑8 uses one byte to represent these characters and Unicode (UTF-16) uses two bytes.

    TransUnicodeToUTF8 can only compress characters from the ASCII script of Unicode (U+0000 to U+007F).

    TransUTF8ToUnicode

    Uncompresses data that was compressed using TransUnicodeToUTF8.

    The result data type is VARCHAR(32000) CHARACTER SET UNICODE.

    LZCOMP

    Compresses the specified Unicode character data using Lempel-Ziv coding.

    The result data type is VARBYTE(64000).

    LZDECOMP

    Uncompresses Unicode data that was compressed using LZCOMP.

    The result data type is VARCHAR(32000) CHARACTER SET UNICODE.

    LZCOMP_L

    Compresses the specified Latin character data using Lempel-Ziv coding.

    The result data type is VARBYTE(64000).

    LZDECOMP_L

    Uncompresses Latin data that was compressed using LZCOMP_L.

    CAMSET

    Compresses the specified Unicode character data into partial byte 4‑bit values for numeric characters or partial 5‑bit values for alphabetic characters using a proprietary Teradata algorithm and an embedded services function.

    The result data type is VARBYTE(64000).

    You should not use this UDF to compress CHARACTER SET GRAPHIC character data.

    DECAMSET

    Uncompresses the Unicode character data that was compressed using CAMSET.

    The result data type is VARCHAR(32000) CHARACTER SET UNICODE.

    CAMSET_L

    Compresses the specified Latin character data in the range U+0000 to U+00FF into partial byte 4‑bit values for numeric characters or partial 5‑bit values for alphabetic characters using a proprietary Teradata algorithm and an embedded services function.

    The result data type is VARBYTE(64000).

    You should not use this UDF to compress CHARACTER SET GRAPHIC character data.

    DECAMSET_L

    Uncompresses the Latin character data that was compressed using CAMSET_L.

    The result data type is VARCHAR(64000) CHARACTER SET LATIN.

    TD_LZ_COMPRESS

    Compresses the specified ALC-supported data type or predefined type data using Lempel‑Ziv coding.

    Valid column data types include distinct UDTs, distinct BLOB‑, CLOB‑, and XML‑based UDTs, ARRAY, VARRAY, Period, CHARACTER, VARCHAR, BLOB, CLOB, XML, JSON, Geospatial, BYTE, and VARBYTE.

    For a column that specifies both multi-value compression and algorithmic compression, only distinct UDTs based on predefined types that are valid for multi-value compression can be algorithmically compressed using TD_LZ_COMPRESS. This includes all numeric types, Period types, BLOB, CLOB, XML, distinct BLOB‑based UDTs, distinct CLOB‑based UDTs, distinct XML‑based UDTs, Geospatial, DATE, CHARACTER, VARCHAR, BYTE, and VARBYTE types.

    The parameter data type of TD_LZ_COMPRESS must match the data type of TD_LZ_DECOMPRESS exactly.

    The result data type must match the data type of the UDT or predefined type column exactly.

    TD_LZ_DECOMPRESS

    Uncompresses data that was compressed using TD_LZ_COMPRESS.

  • For algorithmic compression, the return data type of TD_LZ_DECOMPRESS must match the data type of the compressed column exactly.
  • For algorithmic compression, the VARBYTE return length for TD_LZ_COMPRESS must match the length of the TD_LZ_DECOMPRESS VARBYTE parameter exactly.
  • TS_COMPRESS

    Compresses TIME and TIMESTAMP data.

    TS_DECOMPRESS

    Uncompresses TIME and TIMESTAMP data that was compressed using TS_COMPRESS.

    JSON_COMPRESS

    Compresses JSON data.

    JSON_DECOMPRESS

    Uncompresses the JSON data that was compressed using JSON_COMPRESS.

    For information about the compression algorithm used by LZCOMP, LZCOMP_L, and TD_LZ_COMPRESS and the decompression algorithm used by LZDECOMP, LZDECOMP_L, and TD_LZ_DECOMPRESS, see http://zlib.net.