Compressing Column Values Using Only Algorithmic Compression - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
The term compression is used to mean different things for Teradata systems. Algorithmic compression can be done using either a non-loss method or a lossy method.
  • When describing algorithmic compression of column values, compression refers to whatever methods you choose to implement to compress byte, character, and graphic data. Algorithmic compression is implemented by assigning a compression and a decompression UDF to a table column. Although you can only specify one compression UDF and one decompression UDF per column, those functions can compress and decompress multiple column values. For the syntax used to assign algorithmic compression UDFs to columns, see COMPRESS USING.
  • When you specify algorithmic compression for a column, you may be using a form of lossless compression, or you may be using lossy compression for sound, video, graphic, or picture files.
  • When you specify a UDF name for algorithmic compression, you must also specify its containing database or user.
  • You can specify algorithmic compression only for columns having one of the following data types.
    • BYTE
    • CHARACTER
    • VARBYTE
    • VARCHAR

    You can specify algorithmic compression for Period types.

  • Algorithmic compression is not supported for the columns of join indexes
  • Algorithmic compression automatically compresses nulls.
  • Algorithmic compression is carried to spools whenever possible.
  • You can specify algorithmic compression for an unlimited number of columns in a table.
  • The column DBC.TVFields.CompressValueList stores the names of the compression and decompression algorithms specified for a column and the compressed multivalue for any multivalue compression specified for a column.

    The maximum size of DBC.TVFields.CompressValueList is 8,192 characters. If the composite size of the value and algorithmic compression information exceeds this limit, the system reports an error.

The following rules apply to the compression and decompression algorithms you create as UDFs.
  • The UDFs must be created in either the SYSUDTLIB or the TD_SYSFNLIB database.
  • The UDFs must be scalar UDFs.

    Neither aggregate nor table UDFs are valid for algorithmic compression.

  • The UDFs must be deterministic.
  • The UDFs can only have 1 input parameter, and the data type of the value passed must be a type that is valid for algorithmic compression.
  • The input to an DECOMPRESS USING UDF must have a variable data type.

    The output from an DECOMPRESS USING UDF must be compatible with the data type of the column being decompressed.

  • Algorithmic compression cannot truncate data values.

    The length of the column must be less than or equal to the length of the input parameter passed to the COMPRESS USING UDF.

    Similarly, the length of the column must be less than or equal to the length of the output parameter returned by the DECOMPRESS USING UDF.

  • A user who references a column defined with algorithmic compression must have the privilege on the COMPRESS USING and DECOMPRESS USING UDFs specified for that column.
  • Vantage supports function overloading for the COMPRESS USING and DECOMPRESS USING UDFs you write.

    For example, you can overload an algorithm that handles different data types, and the system runs the appropriate function depending on the data type of the input parameter value passed to it.

  • Vantage supports algorithmic compression of Period columns.

The costs of compressing and decompressing column data algorithmically depends on the algorithms used, and you must evaluate those costs yourself.

Teradata provides the following system-defined external UDFs that you can use for algorithmic compression and decompression. See Compression/Decompression Functions. These functions are stored in the TD_SYSFNLIB database.