15.00 - TransUnicodeToUTF8 - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product: Teradata Database
Release Number: 15.00
Content Type: Programming Reference
Publication ID: B035-1145-015K
Language: English (United States)
Last Update: 2018-09-24

TransUnicodeToUTF8

Purpose  

Compress the specified Unicode character data into UTF8 format.

Syntax

   TD_SYSFNLIB.TransUnicodeToUTF8(Unicode_string)

where:

Syntax element…   Specifies…
TD_SYSFNLIB       the name of the database where the function is located.
Unicode_string    a Unicode character string or string expression.

Note: This function takes no arguments when used as part of the COMPRESS USING or DECOMPRESS USING phrases. For more information about the COMPRESS/DECOMPRESS phrase, see SQL Data Types and Literals.

ANSI Compliance

This is a Teradata extension to the ANSI SQL:2011 standard.

Argument Type and Rules

Expressions passed to this function must have a data type of VARCHAR(n) CHARACTER SET UNICODE, where the maximum supported size (n) is 32000. You can also pass arguments with data types that can be converted to VARCHAR(32000) CHARACTER SET UNICODE using the implicit data type conversion rules that apply to UDFs. For example, a CHAR argument is allowed because CHAR can be implicitly converted to VARCHAR.

Note: The UDF implicit type conversion rules are more restrictive than the implicit type conversion rules normally used by Teradata Database. If an argument cannot be converted to VARCHAR following the UDF implicit conversion rules, it must be explicitly cast.

For details, see “Compatible Types” in SQL External Routine Programming.

The input to this function must be Unicode character data.

If you specify NULL as input, the function returns NULL.
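The argument rules above can be sketched with a few direct calls. This is a minimal illustration only: the literal values and the column alias utf8_bytes are made up for the example, and the third statement assumes a numeric value is one of the types the UDF rules do not convert implicitly.

```sql
-- Direct call with an argument explicitly typed as Unicode VARCHAR.
SELECT TD_SYSFNLIB.TransUnicodeToUTF8(
          CAST('Teardrop pendant' AS VARCHAR(100) CHARACTER SET UNICODE)) AS utf8_bytes;

-- NULL input: per the rules above, the function returns NULL.
SELECT TD_SYSFNLIB.TransUnicodeToUTF8(
          CAST(NULL AS VARCHAR(100) CHARACTER SET UNICODE)) AS utf8_bytes;

-- Non-character input (assumed not implicitly convertible for UDFs):
-- cast it explicitly to the expected type before calling the function.
SELECT TD_SYSFNLIB.TransUnicodeToUTF8(
          CAST(12345 AS VARCHAR(11) CHARACTER SET UNICODE)) AS utf8_bytes;
```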

Result Type

The result data type is VARBYTE(64000).

Usage Notes

TransUnicodeToUTF8 compresses the specified Unicode character data into UTF8 format and returns the compressed result. This is useful when the input data is predominantly US-ASCII characters, because UTF8 represents each of those characters in one byte, whereas the UNICODE server character set uses two bytes per character.
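One way to see the size difference is to compare the character count of a value with the byte count of the function result. The sketch below uses the standard CHARACTER_LENGTH and BYTES functions. The literal and the column aliases are illustrative only, and no specific output is claimed here. For an all-ASCII value, the byte count would be expected to be close to the character count rather than twice that number.

```sql
-- Compare the number of characters in a Unicode value with the number of
-- bytes in its TransUnicodeToUTF8 result.
SELECT CHARACTER_LENGTH(
          CAST('Silver pendant' AS VARCHAR(100) CHARACTER SET UNICODE)) AS char_count,
       BYTES(TD_SYSFNLIB.TransUnicodeToUTF8(
          CAST('Silver pendant' AS VARCHAR(100) CHARACTER SET UNICODE))) AS utf8_byte_count;
```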

TransUnicodeToUTF8 provides good compression for Unicode strings of any length and is best used:

  • On a Unicode column that contains mostly US-ASCII characters
  • When the data frequently switches between:
      • Uppercase and lowercase letters
      • Digits and letters
      • Latin and non-Latin characters
  • When the data is very dynamic (under frequent update)

For a detailed comparison between the Teradata-supplied compression functions and guidelines for choosing a compression function, see Database Administration.

Although you can call the function directly, TransUnicodeToUTF8 is normally used with algorithmic compression (ALC) to compress table columns. If TransUnicodeToUTF8 is used with ALC, nulls are also compressed if those columns are nullable.

For more information about ALC, see “COMPRESS and DECOMPRESS Phrases” in SQL Data Types and Literals.

Restrictions

TransUnicodeToUTF8 can only compress character values in the 7-bit ASCII character range, from U+0000 to U+007F, also known as US-ASCII.

Uncompressing Data Compressed with TransUnicodeToUTF8

To uncompress Unicode data that was compressed using TransUnicodeToUTF8, use the TransUTF8ToUnicode function. See “TransUTF8ToUnicode”.
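As a rough illustration of the pairing, the two functions can be nested in a direct call so that the compressed result is immediately uncompressed. The literal and the alias round_trip are invented for this sketch. The result would be expected to match the original string for input within the supported character range.

```sql
-- Compress a Unicode value to UTF8, then uncompress it again.
SELECT TD_SYSFNLIB.TransUTF8ToUnicode(
          TD_SYSFNLIB.TransUnicodeToUTF8(
             CAST('Opal pendant' AS VARCHAR(100) CHARACTER SET UNICODE))) AS round_trip;
```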

Example

In this example, assume that the default server character set is UNICODE. The values of the Description column are compressed using the TransUnicodeToUTF8 function with ALC, which stores the Unicode input in UTF8 format. The TransUTF8ToUnicode function uncompresses the previously compressed values.

       CREATE TABLE Pendants
          (ItemNo INTEGER, 
           Gem CHAR(10) UPPERCASE,
           Description VARCHAR(1000)
              COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
              DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode);
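With ALC in place, ordinary statements against the table do not call either function explicitly. The following sketch assumes the Pendants table above has been created. The row values are invented for illustration.

```sql
-- The Description value is compressed by TransUnicodeToUTF8 when stored.
INSERT INTO Pendants (ItemNo, Gem, Description)
VALUES (1001, 'opal', 'Oval opal pendant on a silver chain');

-- The Description value is uncompressed by TransUTF8ToUnicode when read.
SELECT ItemNo, Gem, Description
FROM Pendants
WHERE ItemNo = 1001;
```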