15.00 - TRANSLATE - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)
Last Update
2018-09-24

TRANSLATE

Purpose  

Converts a character string or character string expression from one server character set to another server character set.

Syntax  

where:

 

Syntax element …

Specifies …

character_string_expression

a character string to translate to another server character set.

If character_string_expression is not a character type, an error is returned.

source_repertoire_name

the source character set of the string to translate. For supported values, see “Supported Translations Between Character Sets” on page 1259.

A value of LOCALE can be specified for source_repertoire_name to translate a character string from LATIN or KANJI1 to UNICODE using a source repertoire determined by the language support mode of the system and the client character set of the session. For details, see “Supported Translations Between Character Sets” on page 1259.

_encoding

an optional literal for translating from KANJI1 to UNICODE that indicates a specific encoding of KANJI1.

The _encoding option is not allowed if LOCALE is specified for source_repertoire_name or target_repertoire_name.

_encoding

(continued)

If the translation is from these character sets:

  • KatakanaEBCDIC
  • KanjiEBCDIC5026_0I
  • KanjiEBCDIC5038_0I
  • use the following value for _encoding: _KanjiEBCDIC

    KanjiEUC_0U, use the following value for _encoding: _KanjiEUC

    KanjiShiftJIS_0S, use the following value for _encoding: _KANJISJIS

    ASCII or EBCDIC, use the following value for _encoding: _SBC

    target_repertoire_name

    the target character set of the string to translate. For supported values, see “Supported Translations Between Character Sets” on page 1259.

    A value of LOCALE can be specified for target_repertoire_name to translate a character string from UNICODE to LATIN or KANJI1 using a target repertoire determined by the language support mode of the system and the client character set of the session. For details, see “Supported Translations Between Character Sets” on page 1259.

    _suffix

    that the translation maps some source characters to semantically different characters.

    For example, a translation that specifies the _Halfwidth suffix maps any character with a halfwidth variant to that variant, and all fullwidth variants to their non-fullwidth counterparts.

    The _suffix option also indicates the form of character data translated from UNICODE to the KANJI1 server character set, for example, _KanjiEUC.

    Valid values are:

  • _KanjiEBCDIC
  • _KanjiEUC
  • _KANJISJIS
  • _SBC
  • _PadSpace
  • _PadGraphic
  • _Fullwidth
  • _Halfwidth
  • _FoldSpace
  • _VarGraphic
  • The _suffix option is not allowed if LOCALE is specified for source_repertoire_name or target_repertoire_name.

    WITH ERROR

    that the translation replaces offending characters in the string with a designated error character, instead of reporting an error.

    For details, see “Error Characters Assigned by the WITH ERROR Option” on page 1262).

    Argument Types

    Use TRANSLATE on character strings or character string expressions.

    By default, Teradata Database performs implicit type conversion on UDT arguments that have implicit casts to predefined character types.

    To define an implicit cast for a UDT, use the CREATE CAST statement and specify the AS ASSIGNMENT clause. For more information on CREATE CAST, see SQL Data Definition Language.

    Implicit type conversion of UDTs for system operators and functions, including TRANSLATE, is a Teradata extension to the ANSI SQL standard. To disable this extension, set the DisableUDTImplCastForSysFuncOp field of the DBS Control Record to TRUE. For details, see Utilities: Volume 1 (A-K).

    For more information on implicit type conversion of UDTs, see Chapter 13: “Data Type Conversions.”

    Result Type and Attributes

    The default attributes for TRANSLATE (string USING source_TO_target) are as follows.

  • If the argument is CHAR or VARCHAR, the result is VARCHAR(n) or CHARACTER SET target
  • If the argument is CLOB, the result is CLOB(n) or CHARACTER SET target
  • where source_TO_target determines the character set value of target, according to the supported translations in “Supported Translations Between Character Sets” on page 1259.

     

    Supported Translations for CLOB Strings

    The following translations are supported for CLOB strings:

  • LATIN_TO_UNICODE
  • UNICODE_TO_LATIN
  • Supported Translations Between Character Sets

    Caution:

    In accordance with Teradata internationalization plans, KANJI1 support is deprecated and is to be discontinued in the near future. KANJI1 is not allowed as a default character set; the system changes the KANJI1 default character set to the UNICODE character set. Creation of new KANJI1 objects is highly restricted. Although many KANJI1 queries and applications may continue to operate, sites using KANJI1 should convert to another character set as soon as possible. “”

    The following table lists the supported values that you can use for source_repertoire_name_TO_target_repertoire_name to translate between server character sets.

     

    Value of source_TO_target

    Source Character Set

    Target Character Set

    GRAPHIC_TO_KANJISJIS

    GRAPHIC

    KANJISJIS

    GRAPHIC_TO_LATIN

    GRAPHIC

    LATIN

    GRAPHIC_TO_UNICODE

    GRAPHIC

    UNICODE

    GRAPHIC_TO_UNICODE_PadSpace

    GRAPHIC

    UNICODE

    KANJI1_KanjiEBCDIC_TO_UNICODE

    KANJI1

    UNICODE

    KANJI1_KanjiEUC_TO_UNICODE

    KANJI1

    UNICODE

    KANJI1_KANJISJIS_TO_UNICODE

    KANJI1

    UNICODE

    KANJI1_SBC_TO_UNICODE

    KANJI1

    UNICODE

    KANJISJIS_TO_GRAPHIC

    KANJISJIS

    GRAPHIC

    KANJISJIS_TO_LATIN

    KANJISJIS

    LATIN

    KANJISJIS_TO_UNICODE

    KANJISJIS

    UNICODE

    LATIN_TO_GRAPHIC

    LATIN

    GRAPHIC

    LATIN_TO_KANJISJIS

    LATIN

    KANJISJIS

    LATIN_TO_UNICODE

    LATIN

    UNICODE

    LOCALE_TO_UNICODE

     

    KANJI1

    UNICODE

     

    LATIN

    UNICODE_TO_GRAPHIC

    UNICODE

    GRAPHIC

    UNICODE_TO_GRAPHIC_PadGraphic

    UNICODE

    GRAPHIC

    UNICODE_TO_GRAPHIC_VarGraphic

    UNICODE

    GRAPHIC

    UNICODE_TO_KANJI1_KanjiEBCDIC

    UNICODE

    KANJI1

    UNICODE_TO_KANJI1_KanjiEUC

    UNICODE

    KANJI1

    UNICODE_TO_KANJI1_KANJISJIS

    UNICODE

    KANJI1

    UNICODE_TO_KANJI1_SBC

    UNICODE

    KANJI1

    UNICODE_TO_KANJISJIS

    UNICODE

    KANJISJIS

    UNICODE_TO_LATIN

    UNICODE

    LATIN

    UNICODE_TO_LOCALE

     

    UNICODE

     

    KANJI1

    LATIN

    UNICODE_TO_UNICODE_FoldSpace

    UNICODE

    UNICODE

    UNICODE_TO_UNICODE_Fullwidth

    UNICODE

    UNICODE

    UNICODE_TO_UNICODE_Halfwidth

    UNICODE

    UNICODE

    UNICODE_TO_UNICODE_NFC

    UNICODE

    UNICODE

    UNICODE_TO_UNICODE_NFD

    UNICODE

    UNICODE

    UNICODE_TO_UNICODE_NFKC

    UNICODE

    UNICODE

    UNICODE_TO_UNICODE_NFKD

    UNICODE

    UNICODE

    If the value specified for source_repertoire_name_TO_target_repertoire_name is UNICODE_TO_LOCALE or LOCALE_TO_UNICODE, the repertoire that the translation uses for LOCALE is determined by the language support mode for the system and the client character set for the session.

     

    IF the language support mode is …

    AND the session character set is …

    THEN the repertoire that the translation uses for LOCALE is …

    standard

    any

    LATIN

    Japanese

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

  • ASCII
  • LATIN1252_0A
  • LATIN1_0A
  • LATIN9_0A
  • EBCDIC
  • EBCDIC037_0E
  • EBCDIC273_0E
  • EBCDIC277_0E
  • KANJI1_SBC

     

     

  • any other client character set with a name that has a suffix of _0A or _0E
  • a single-byte, extended site-defined client character set
  • KANJIEBCDIC5026_0I
  • KANJIEBCDIC5035_0I
  • KATAKANAEBCDIC
  • any other client character set with a name that has a suffix of _0I
  • KANJI1_KANJIEBCDIC

  • UTF8
  • UTF16
  • KanjiShiftJIS_0S
  • any other client character set with a name that has a suffix of _0S
  • a multibyte extended site-defined client character set
  • KANJI1_KANJISJIS

     

  • KanjiEUC_0U
  • any other client character set with a name that has a suffix of _0U
  • KANJI1_KanjiEUC

    Source Characters That Generate Errors

    The following table lists the characters that generate errors for specific source_repertoire_name_TO_target_repertoire_name translations. For supported translations that do not appear in the table, only the error character generates errors.

     

    Value of source_TO_target

    Source Characters That Generate Errors

  • LATIN_TO_GRAPHIC
  • KANJISJIS_TO_GRAPHIC
  • UNICODE_TO_GRAPHIC
  • non-GRAPHIC

  • LATIN_TO_KANJISJIS
  • KANJI1_KANJISJIS_TO_UNICODE
  • GRAPHIC_TO_KANJISJIS
  • UNICODE_TO_KANJI1_KANJISJIS
  • UNICODE_TO_KANJISJIS
  • LOCALE_TO_UNICODE or UNICODE_TO_LOCALE
    where the repertoire that the translation uses for LOCALE is KANJI1_KANJISJIS
  • non-KANJISJIS

  • KANJI1_KanjiEBCDIC_TO_UNICODE
  • UNICODE_TO_KANJI1_KanjiEBCDIC
  • LOCALE_TO_UNICODE or UNICODE_TO_LOCALE
    where the repertoire that the translation uses for LOCALE is KANJI1_KanjiEBCDIC
  • non-KanjiEBCDIC

    KANJI1 is very permissive, so there may be characters outside the defined region of the encoding as well as illegal form-of-use errors.

  • KANJI1_KanjiEUC_TO_UNICODE
  • UNICODE_TO_KANJI1_KanjiEUC
  • LOCALE_TO_UNICODE or UNICODE_TO_LOCALE
    where the repertoire that the translation uses for LOCALE is KANJI1_KanjiEUC
  • non-KanjiEUC

  • KANJISJIS_TO_LATIN
  • GRAPHIC_TO_LATIN
  • UNICODE_TO_LATIN
  • UNICODE_TO_KANJI1_SBC
  • UNICODE_TO_LOCALE
    where the repertoire that the translation uses for LOCALE is LATIN or KANJI1_SBC
  • non-LATIN

    Error Characters Assigned by the WITH ERROR Option

    The error characters substituted for offending characters that cannot be translated to a designated target character set are defined in the following table.

     

    Target Character Set

    Error Character

    LATIN

    0x1A

    KANJI1

    0x1A

    KANJISJIS

    0x1A

    UNICODE

    U+FFFD

    GRAPHIC

    U+FFFD

    Suffixes

    The _suffix variable is used for translations that map source characters to semantically different characters. They indicate the nature of the semantic transformation.

    The translations perform minor, yet essential, semantic changes to the data, such as halfwidth/fullwidth conversions, and Space folding modification.

    The _suffix variable also indicates the form of character data translated from UNICODE to the KANJI1 server character set in one of the four possible encodings, for example Unicode_TO_Kanji1_KanjiEBCDIC. For a list of the encodings, see the definition of _encoding in “Syntax” on page 1256.

    This form of translation is also useful for migrating object names. For information, see “Migration” on page 1265.

    Translations Between Fullwidth and Halfwidth Character Data

    UNICODE has an area known as the compatibility zone. Among other things, this zone includes halfwidth and fullwidth variants of characters that exist elsewhere in the standard.

    Translations between fullwidth and halfwidth are provided by the following source_repertoire_name_TO_target_repertoire_name values.

     

    source_TO_target

    Meaning

    UNICODE_TO_UNICODE_Halfwidth

    Maps the fullwidth characters of Unicode to the halfwidth characters of Unicode. Other characters remain unchanged by the translation.

    See Unicode Fullwidth to Halfwidth (publication B035-1201).

    UNICODE_TO_UNICODE_Fullwidth

    Maps the halfwidth characters of Unicode to the fullwidth characters of Unicode. At the same time, it maps any character defined by the standard as a halfwidth variant to its non-halfwidth counterpart outside the compatibility zone.

    Other characters remain unchanged by the translation.

    See Unicode Halfwidth to Fullwidth (publication B035-1202).

    UNICODE_TO_GRAPHIC_VarGraphic

    This translation is an ANSI equivalent to the VARGRAPHIC function.

    See Unicode to Vargraphic (publication B035-1057).

    Note: The mapping and translation files are readable, but are intended to be used by software. In most cases, items not in a mapping file are mapped to themselves.

    Also note that these translations are useful for maintaining more information as a step in translating GRAPHIC to LATIN and vice versa.

    For details on the mappings, see International Character Set Support.

    Space Folding

    Space folding is performed via UNICODE_TO_UNICODE_FoldSpace. All characters defined as space are converted to U+0020.

    All other characters are left unchanged.

    For details on which characters are converted to U+0020, see International Character Set Support.

    UNICODE Normalization Form Translations

    Teradata supports translation using the 4 UNICODE normalization forms: NFC, NFD, NFKC, and NFKD, which correspond to the ANSI NORMALIZE function. You can perform these translations using:

  • UNICODE_TO_UNICODE_NFC
  • UNICODE_TO_UNICODE_NFD
  • UNICODE_TO_UNICODE_NFKC
  • UNICODE_TO_UNICODE_NFKD
  • Because normalization functions can cause errors due to not preserving BMP characters, you should use the TRANSLATE_CHK function to verify a clean translation. See

    Pad Character Translation

    The following translations do not translate the pad character.

     

    source_TO_target

    Pad Character Translation

    GRAPHIC_TO_UNICODE

    A GRAPHIC string that includes an Ideographic Space is translated to a UNICODE string with an Ideographic Space.

    UNICODE_TO_GRAPHIC

    A UNICODE string with a Space character generates an error when translated to GRAPHIC.

    If you require pad character translation, use one of the following translations.

     

    source_TO_target

    Pad Character Translation

    GRAPHIC_TO_UNICODE_PadSpace

    Converts all occurrences of Ideographic Space (U+3000) to Space (U+0020).

    UNICODE_TO_GRAPHIC_PadGraphic

    Converts all occurrences of Space to Ideographic Space.

    Other characters are not affected. Note that the position of a character does not affect the translation, so not only trailing pad characters are modified.

    Migration

    During the migration process, any GRAPHIC data in the old form must be translated to the new canonical form. Note that this involves converting the pad characters from Null (U+0000) to Ideographic Space (U+3000).

    Implicit Character Data Type Conversion

    TRANSLATE performs implicit conversion if the string server character set does not match the type implied by source_repertoire_name.

    An implicit conversion generates an error if a character from character_string_expression has no corresponding character in the source_repertoire_name type. This holds regardless of whether you specify the WITH ERROR option.

    For example, the following function first translates the string from UNICODE to LATIN, because Teradata Database treats literals as UNICODE, and then translates the string from LATIN to KANJISJIS. However, the translation generates an error because the last character is not in the LATIN repertoire.

       ...
       TRANSLATE('abc' USING LATIN_TO_KanjiSJIS WITH ERROR)
       ...

    To circumvent the problem if error character substitution is acceptable, specify two levels of translation, as used in the following example.

       ...
       TRANSLATE((TRANSLATE(_UNICODE 'abc' USING UNICODE_TO_LATIN WITH ERROR)) USING LATIN_TO_KanjiSJIS WITH ERROR)
       ...

    Examples  

     

    Function

    Result

    Type of the Result

    TRANSLATE('abc' USING UNICODE_TO_LATIN)

    'abc'

    VARCHAR(3)
    CHARACTER SET LATIN

    TRANSLATE('abc' USING UNICODE_TO_UNICODE_Fullwidth)

    'abc'

    VARCHAR(3)
    CHARACTER SET UNICODE

    TRANSLATE('abc' USING UNICODE_TO_LATIN WITH ERROR)

    where ε represents the designated error character for LATIN (0x1A).

    'abcε'

    VARCHAR(4)
    CHARACTER SET LATIN

    Related Topics

    For details on the mappings that Teradata Database uses for the TRANSLATE function, see International Character Set Support.