15.10 - Mapping File for a Multibyte Character Set - Teradata Database

Teradata Database International Character Set Support

prodname: Teradata Database
vrm_release: 15.00, 15.10
category: Configuration, User Guide
featnum: B035-1125-015K

The mapping file for a multibyte client character set contains single-byte and multibyte translation tables that define the mapping between an internal transition form and Unicode for the character set.

The single-byte translation tables are analogous to the E2I and I2E fields in the translation tables in DBC.CharTranslationV, which define the conversion between the client character set and the internal transition form.

The mapping file for a multibyte client character set must contain the following statement:

#STATEMACHINE smachine

where the client character set name determines the value of smachine.

 

The following table lists, for each group of character set names, the value of smachine and the encoding form of the character set for which the mapping file provides translation tables.

IF the character set name is one of:

  • SDSCHEBCDIC935_6IJ
  • SDTCHEBCDIC937_7IB
  • SDKATAKANAEBCDIC_4IF
  • SDKANJIEBCDIC5026_4IG
  • SDKANJIEBCDIC5035_4IH
  • SDHANGULEBCDIC933_5II

THEN the value of smachine is SOSI0E0F.

The encoding form is EBCDIC Shift-Out/Shift-In. Shift-out character 0x0E and shift-in character 0x0F bracket zero or more double-byte characters.

IF the character set name is one of:

  • SDTCHBIG5_3R0
  • SDHANGULKSC5601_4R4

THEN the value of smachine is S81.

The value of the first byte in the sequence distinguishes single-byte characters from double-byte characters. If the value of the first byte is:

  • less than 0x81, the length of the character is one byte
  • equal to or greater than 0x81, the length of the character is 2 bytes

Note: S80 is no longer supported. S80 behaves like S81. You should change your map files to reflect this change (distributed map files are changed by Teradata).

IF the character set name is SDSCHGB2312_2T0

THEN the value of smachine is EUC1211.

The encoding form is Extended UNIX Code (EUC), composed of two code sets: cs0 for single-byte characters and cs1 for double-byte characters.

IF the character set name is SDKANJISJIS_1S3

THEN the value of smachine is S80A1E0.

The value of the first byte in the sequence distinguishes single-byte characters from double-byte characters. If the value of the first byte is:

  • less than 0x81, the length of the character is one byte
  • equal to or greater than 0x81 and less than 0xA1, the length of the character is 2 bytes
  • greater than or equal to 0xA1 and less than 0xE0, the length of the character is one byte
  • greater than or equal to 0xE0, the length of the character is 2 bytes

IF the character set name is SDKANJIEUC_1U3

THEN the value of smachine is EUC1223.

The encoding form is Extended UNIX Code (EUC), composed of four code sets: cs0 for one-byte characters, cs1 and cs2 for two-byte characters, and cs3 for three-byte characters.
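The first-byte rules for S81 and S80A1E0, and the shift-out/shift-in bracketing for SOSI0E0F, can be sketched in Python as follows. This is an illustration only, not Teradata code: the function names are hypothetical, the S80A1E0 ranges are read as non-overlapping (0x81 through 0xA0 is two bytes, 0xA1 through 0xDF is one byte), and the EUC state machines are left out because their lead-byte values are not stated here.

```python
SHIFT_OUT = 0x0E  # starts a run of double-byte characters (SOSI0E0F)
SHIFT_IN = 0x0F   # ends a run of double-byte characters (SOSI0E0F)


def char_length(smachine, first_byte):
    """Return the character length in bytes implied by the first byte."""
    if smachine == "S81":
        return 1 if first_byte < 0x81 else 2
    if smachine == "S80A1E0":
        if first_byte < 0x81:
            return 1
        if first_byte < 0xA1:   # assumed range 0x81..0xA0
            return 2
        if first_byte < 0xE0:   # 0xA1..0xDF
            return 1
        return 2                # 0xE0 and above
    raise ValueError("unsupported state machine: " + smachine)


def split_by_first_byte(smachine, data):
    """Split a byte string into characters using the first-byte rule."""
    chars, i = [], 0
    while i < len(data):
        n = char_length(smachine, data[i])
        chars.append(data[i:i + n])
        i += n
    return chars


def split_sosi(data):
    """Split an EBCDIC SO/SI byte string into characters.

    The shift bytes themselves are dropped from the result.
    """
    chars, i, double = [], 0, False
    while i < len(data):
        byte = data[i]
        if byte == SHIFT_OUT:
            double, i = True, i + 1
        elif byte == SHIFT_IN:
            double, i = False, i + 1
        else:
            n = 2 if double else 1
            chars.append(data[i:i + n])
            i += n
    return chars
```

For instance, `split_by_first_byte("S81", b"A\x81\x40B")` yields three characters, the middle one two bytes long.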

Translation tables map characters between an extended site-defined client character set and Unicode.

A mapping file must minimally provide translation tables that map characters from the client character set to Unicode.

A mapping file may optionally provide translation tables that map characters from Unicode to the client character set. If the optional translation tables are not defined, the system derives them by inverting the corresponding mandatory tables.

If one of the following conditions exists, however, the optional translation tables become mandatory:

  • More than one character from the client character set maps to a single Unicode character.
  • More than one Unicode character maps to a single character of the client character set.
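To see why a many-to-one mapping makes the optional tables mandatory, consider inverting a client-to-Unicode table by hand. The sketch below is a hypothetical illustration, not the system's actual derivation: it models a table as a Python dict from client code to Unicode code point and refuses to invert when two client characters share a Unicode character, because the inverse would then be ambiguous.

```python
def invert_table(client_to_unicode):
    """Derive a Unicode-to-client table by inverting the forward table.

    Raises ValueError when more than one client character maps to the
    same Unicode character, since no single inverse entry is correct.
    """
    unicode_to_client = {}
    for client_code, code_point in client_to_unicode.items():
        if code_point in unicode_to_client:
            raise ValueError(
                "U+%04X is the target of more than one client character; "
                "an explicit Unicode-to-client table is needed" % code_point
            )
        unicode_to_client[code_point] = client_code
    return unicode_to_client
```

A dict cannot represent the second condition (several Unicode characters mapping to one client character), because that relationship exists only in the reverse direction; it can be stated only in an explicit Unicode-to-client table.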
Each translation table starts with the following statement:

#BEGINMAP table_name

and ends with the following statement:

#ENDMAP table_name

where the client character set name determines the value of table_name.

     

The following table lists, for each character set name, the tables that the mapping file must define and the tables that it may optionally define.

SDHANGULEBCDIC933_5II

  • Must define: 5I_SBC_2_UNICODE, 5I_MBC_2_UNICODE
  • May optionally define: UNICODE_2_5I_SBC, UNICODE_2_5I_MBC

SDHANGULKSC5601_4R4

  • Must define: 4R_SBC_2_UNICODE, 4R_MBC_2_UNICODE
  • May optionally define: UNICODE_2_4R_SBC, UNICODE_2_4R_MBC

SDKANJIEBCDIC5026_4IG

  • Must define: 4I_SBC_2_UNICODE, 4I_MBC_2_UNICODE
  • May optionally define: UNICODE_2_4I_SBC, UNICODE_2_4I_MBC

SDKANJIEBCDIC5035_4IH

  • Must define: 4I_SBC_2_UNICODE, 4I_MBC_2_UNICODE
  • May optionally define: UNICODE_2_4I_SBC, UNICODE_2_4I_MBC

SDKANJIEUC_1U3

  • Must define: 1U_CS0_2_UNICODE, 1U_CS1_2_UNICODE, 1U_CS2_2_UNICODE, 1U_CS3_2_UNICODE
  • May optionally define: UNICODE_2_1U_CS0, UNICODE_2_1U_CS1, UNICODE_2_1U_CS2, UNICODE_2_1U_CS3

SDKANJISJIS_1S3

  • Must define: 1S_SBC_2_UNICODE, 1S_MBC_2_UNICODE
  • May optionally define: UNICODE_2_1S_SBC, UNICODE_2_1S_MBC

SDKATAKANAEBCDIC_4IF

  • Must define: 4I_SBC_2_UNICODE, 4I_MBC_2_UNICODE
  • May optionally define: UNICODE_2_4I_SBC, UNICODE_2_4I_MBC

SDSCHEBCDIC935_6IJ

  • Must define: 6I_SBC_2_UNICODE, 6I_MBC_2_UNICODE
  • May optionally define: UNICODE_2_6I_SBC, UNICODE_2_6I_MBC

SDSCHGB2312_2T0

  • Must define: 2T_CS0_2_UNICODE, 2T_CS1_2_UNICODE
  • May optionally define: UNICODE_2_2T_CS0, UNICODE_2_2T_CS1

SDTCHBIG5_3R0

  • Must define: 3R_SBC_2_UNICODE, 3R_MBC_2_UNICODE
  • May optionally define: UNICODE_2_3R_SBC, UNICODE_2_3R_MBC

SDTCHEBCDIC937_7IB

  • Must define: 7I_SBC_2_UNICODE, 7I_MBC_2_UNICODE
  • May optionally define: UNICODE_2_7I_SBC, UNICODE_2_7I_MBC
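The table names above appear to follow a pattern: the first two characters after the final underscore of the character set name form a prefix, and the EUC state machines use CS0 through CSn where the others use SBC and MBC. The following sketch encodes that observed pattern. It is an inference from the listed names, not a documented rule, and the function name `table_names` is hypothetical; treat the table itself as authoritative.

```python
# Per-code-set suffixes for the EUC state machines, as seen in the table.
EUC_SUFFIXES = {
    "EUC1211": ["CS0", "CS1"],
    "EUC1223": ["CS0", "CS1", "CS2", "CS3"],
}
# All other state machines use one single-byte and one multibyte table.
DEFAULT_SUFFIXES = ["SBC", "MBC"]


def table_names(charset_name, smachine):
    """Return (required, optional) table names for a character set."""
    # Assumed pattern: "SDHANGULKSC5601_4R4" gives the prefix "4R".
    prefix = charset_name.rsplit("_", 1)[1][:2]
    suffixes = EUC_SUFFIXES.get(smachine, DEFAULT_SUFFIXES)
    required = ["%s_%s_2_UNICODE" % (prefix, s) for s in suffixes]
    optional = ["UNICODE_2_%s_%s" % (prefix, s) for s in suffixes]
    return required, optional
```

Calling `table_names("SDHANGULKSC5601_4R4", "S81")` reproduces the four names listed for that character set.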
For example, a mapping file named map_4R has the following statement:

#STATEMACHINE S81

and defines the following translation tables:

4R_SBC_2_UNICODE
4R_MBC_2_UNICODE

and optionally defines the following translation tables:

UNICODE_2_4R_SBC
UNICODE_2_4R_MBC

A mapping file consists of multiple lines, each terminated by a linefeed character. This can cause problems in editors that expect a carriage return, or a carriage return followed by a linefeed, to terminate a line.

Note: Linefeed termination is the UNIX convention. Carriage return followed by linefeed is the Windows convention.

Use the # character to start a comment that continues to the end of the line. Blank lines are ignored.
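The overall file structure can be illustrated with a small reader that picks out the #STATEMACHINE statement and the #BEGINMAP/#ENDMAP pairs. This is a minimal sketch under stated assumptions, not Teradata's loader: the function name is hypothetical, a line whose first word is one of the three statement keywords is treated as a statement and any other # as the start of a comment, and entry lines inside a table are kept as uninterpreted text because their syntax is not described in this section.

```python
STATEMENTS = ("#STATEMACHINE", "#BEGINMAP", "#ENDMAP")


def parse_mapping_file(text):
    """Return (smachine, tables) for linefeed-terminated mapping file text.

    tables maps each table name to its list of raw entry lines.
    """
    smachine = None
    tables = {}
    current = None  # name of the table currently open, if any
    for raw_line in text.split("\n"):
        line = raw_line.strip()
        if not line:
            continue  # blank lines are ignored
        parts = line.split(None, 1)
        word = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if word in STATEMENTS:
            arg = rest.split("#", 1)[0].strip()
            if word == "#STATEMACHINE":
                smachine = arg
            elif word == "#BEGINMAP":
                if current is not None:
                    raise ValueError("nested #BEGINMAP " + arg)
                current = arg
                tables[current] = []
            else:  # "#ENDMAP"
                if arg != current:
                    raise ValueError("unmatched #ENDMAP " + arg)
                current = None
            continue
        body = line.split("#", 1)[0].strip()  # drop a trailing comment
        if body and current is not None:
            tables[current].append(body)
    if current is not None:
        raise ValueError("missing #ENDMAP " + current)
    return smachine, tables
```

Because each line is stripped of surrounding whitespace, a stray carriage return left by a Windows editor does not end up in a table name; whether the real loader is as forgiving is not stated here.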