Mapping File for a Multibyte Character Set

The mapping file for a multibyte client character set contains single-byte and multibyte translation tables that define the mapping between an internal transition form and Unicode for the character set.

The single-byte translation tables are analogous to the E2I and I2E fields in the translation tables in DBC.CharTranslationV, which define the conversion between the client character set and the internal transition form.

The mapping file for a multibyte client character set must contain the following statement:

#STATEMACHINE smachine

where the client character set name determines the value of smachine.

IF the character set name is…	THEN the value of smachine is…	AND the mapping file provides translation tables for a character set that uses this encoding form…
SDSCHEBCDIC935_6IJ SDTCHEBCDIC937_7IB SDKATAKANAEBCDIC_4IF SDKANJIEBCDIC5026_4IG SDKANJIEBCDIC5035_4IH SDHANGULEBCDIC933_5II	SOSI0E0F	EBCDIC Shift-Out/Shift-In. Shift-out character 0x0E and shift-in character 0x0F bracket zero or more double-byte characters.
SDTCHBIG5_3R0 SDHANGULKSC5601_4R4	S81	The value of the first byte in the sequence distinguishes single-byte characters from double-byte characters. If the first byte is less than 0x81, the character is one byte long; if it is equal to or greater than 0x81, the character is 2 bytes long. Note: S80 is no longer supported and behaves like S81. Change your map files to specify S81 (Teradata has already changed the distributed map files).
SDSCHGB2312_2T0	EUC1211	Extended UNIX Code (EUC), composed of two code sets: cs0 for single-byte characters and cs1 for double-byte characters
SDKANJISJIS_1S3	S80A1E0	The value of the first byte in the sequence distinguishes single-byte characters from double-byte characters. If the first byte is less than 0x81, the character is one byte long; if it is equal to or greater than 0x81 and less than 0xA1, the character is 2 bytes long; if it is equal to or greater than 0xA1 and less than 0xE0, the character is one byte long; if it is equal to or greater than 0xE0, the character is 2 bytes long.
SDKANJIEUC_1U3	EUC1223	The encoding form is Extended UNIX Code (EUC), composed of four code sets: cs0 for one-byte characters, cs1 and cs2 for two-byte characters, and cs3 for three-byte characters.
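The length rules in the table above can be illustrated with a short Python sketch. It is not Teradata code: the function names are hypothetical, and the byte ranges are copied from the S81, S80A1E0, and SOSI0E0F rows. The sketch splits a byte string into characters according to the chosen state machine.

```python
def char_length_s81(first_byte):
    """S81: a first byte below 0x81 is a one-byte character, otherwise two bytes."""
    return 1 if first_byte < 0x81 else 2


def char_length_s80a1e0(first_byte):
    """S80A1E0: lengths by first-byte range, as listed in the table."""
    if first_byte < 0x81:
        return 1
    if first_byte < 0xA1:
        return 2
    if first_byte < 0xE0:
        return 1
    return 2


def split_chars(data, char_length):
    """Split data into characters using a first-byte length function."""
    chars = []
    i = 0
    while i < len(data):
        n = char_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated multibyte character at offset %d" % i)
        chars.append(data[i:i + n])
        i += n
    return chars


def split_sosi(data):
    """SOSI0E0F: 0x0E switches to double-byte mode, 0x0F switches back."""
    shift_out, shift_in = 0x0E, 0x0F
    chars = []
    double_byte = False
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == shift_out:
            double_byte = True
            i += 1
        elif byte == shift_in:
            double_byte = False
            i += 1
        elif double_byte:
            if i + 2 > len(data):
                raise ValueError("truncated double-byte character at offset %d" % i)
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars
```

For example, split_chars(data, char_length_s81) treats any byte at or above 0x81 as the first half of a double-byte character, while split_sosi relies only on the bracketing shift characters and does not inspect byte values.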

Translation tables map characters between an extended site-defined client character set and Unicode.

A mapping file must minimally provide translation tables that map characters from the client character set to Unicode.

A mapping file may optionally provide translation tables that map characters from Unicode to the client character set. If the optional translation tables are not defined, the system derives them by inverting the corresponding mandatory tables.

If one of the following conditions exists, however, the optional translation tables become mandatory:

More than one character from the client character set maps to a single Unicode character.

More than one Unicode character maps to a single character of the client character set.
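The inversion rule and the first condition above can be sketched in Python. This is an illustration only, not the system's actual algorithm: a translation table is modeled as a hypothetical dict from client character code to Unicode code point, and invert_table is an invented name. When two client characters map to the same Unicode character, the inverse has no unique answer, which is why the reverse table must then be supplied explicitly.

```python
def invert_table(client_to_unicode):
    """Derive a Unicode-to-client table by inverting a client-to-Unicode table.

    Raises ValueError when two client codes share one Unicode code point,
    because the inverse would then be ambiguous.
    """
    unicode_to_client = {}
    for client_code, code_point in client_to_unicode.items():
        if code_point in unicode_to_client:
            raise ValueError(
                "U+%04X is the target of more than one client character; "
                "an explicit reverse table is required" % code_point
            )
        unicode_to_client[code_point] = client_code
    return unicode_to_client
```

The second condition cannot be detected this way, because a client-to-Unicode table holds only one target per client character; it arises when the reverse direction needs several Unicode characters to land on the same client character.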

Each translation table starts with the following statement:

#BEGINMAP table_name

and ends with the following statement:

#ENDMAP table_name

where the value of table_name is determined by the client character set name.

IF the character set name is …	THEN the mapping file must define these tables …	AND optionally define these tables …
SDHANGULEBCDIC933_5II	5I_SBC_2_UNICODE 5I_MBC_2_UNICODE	UNICODE_2_5I_SBC UNICODE_2_5I_MBC
SDHANGULKSC5601_4R4	4R_SBC_2_UNICODE 4R_MBC_2_UNICODE	UNICODE_2_4R_SBC UNICODE_2_4R_MBC
SDKANJIEBCDIC5026_4IG	4I_SBC_2_UNICODE 4I_MBC_2_UNICODE	UNICODE_2_4I_SBC UNICODE_2_4I_MBC
SDKANJIEBCDIC5035_4IH	4I_SBC_2_UNICODE 4I_MBC_2_UNICODE	UNICODE_2_4I_SBC UNICODE_2_4I_MBC
SDKANJIEUC_1U3	1U_CS0_2_UNICODE 1U_CS1_2_UNICODE 1U_CS2_2_UNICODE 1U_CS3_2_UNICODE	UNICODE_2_1U_CS0 UNICODE_2_1U_CS1 UNICODE_2_1U_CS2 UNICODE_2_1U_CS3
SDKANJISJIS_1S3	1S_SBC_2_UNICODE 1S_MBC_2_UNICODE	UNICODE_2_1S_SBC UNICODE_2_1S_MBC
SDKATAKANAEBCDIC_4IF	4I_SBC_2_UNICODE 4I_MBC_2_UNICODE	UNICODE_2_4I_SBC UNICODE_2_4I_MBC
SDSCHEBCDIC935_6IJ	6I_SBC_2_UNICODE 6I_MBC_2_UNICODE	UNICODE_2_6I_SBC UNICODE_2_6I_MBC
SDSCHGB2312_2T0	2T_CS0_2_UNICODE 2T_CS1_2_UNICODE	UNICODE_2_2T_CS0 UNICODE_2_2T_CS1
SDTCHBIG5_3R0	3R_SBC_2_UNICODE 3R_MBC_2_UNICODE	UNICODE_2_3R_SBC UNICODE_2_3R_MBC
SDTCHEBCDIC937_7IB	7I_SBC_2_UNICODE 7I_MBC_2_UNICODE	UNICODE_2_7I_SBC UNICODE_2_7I_MBC
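Every row of the table above follows one naming pattern: the table prefix is the first two characters after the underscore in the character set name, and the two EUC character sets use one table per code set (CS0, CS1, ...) instead of SBC and MBC. The Python sketch below reproduces that pattern. It is an observation about the listed rows rather than a documented rule, and table_names and EUC_CODE_SETS are hypothetical names.

```python
# Number of code sets for the two EUC character sets listed in the table.
EUC_CODE_SETS = {"SDKANJIEUC_1U3": 4, "SDSCHGB2312_2T0": 2}


def table_names(charset_name):
    """Return (required, optional) table names for a listed character set."""
    # "SDHANGULKSC5601_4R4" -> suffix "4R4" -> prefix "4R"
    prefix = charset_name.rsplit("_", 1)[1][:2]
    count = EUC_CODE_SETS.get(charset_name)
    if count is None:
        parts = ["SBC", "MBC"]
    else:
        parts = ["CS%d" % n for n in range(count)]
    required = ["%s_%s_2_UNICODE" % (prefix, part) for part in parts]
    optional = ["UNICODE_2_%s_%s" % (prefix, part) for part in parts]
    return required, optional
```

Calling table_names("SDHANGULKSC5601_4R4") yields the 4R_SBC_2_UNICODE and 4R_MBC_2_UNICODE tables as required and their UNICODE_2_4R_* counterparts as optional.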

For example, a mapping file named map_4R for the SDHANGULKSC5601_4R4 character set has the following statement:

#STATEMACHINE S81

and defines the following translation tables:

4R_SBC_2_UNICODE

4R_MBC_2_UNICODE

and optionally defines the following translation tables:

UNICODE_2_4R_SBC

UNICODE_2_4R_MBC
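The overall structure of such a file can be checked with a small Python sketch. It assumes only what this section states: one #STATEMACHINE statement and tables bracketed by matching #BEGINMAP and #ENDMAP statements. The format of the entries inside a table is not described here, so the sketch skips every other line. The names scan_mapping_file and SAMPLE are hypothetical, and the sample tables are left empty on purpose.

```python
SAMPLE = (
    "#STATEMACHINE S81\n"
    "#BEGINMAP 4R_SBC_2_UNICODE\n"
    "#ENDMAP 4R_SBC_2_UNICODE\n"
    "#BEGINMAP 4R_MBC_2_UNICODE\n"
    "#ENDMAP 4R_MBC_2_UNICODE\n"
)


def scan_mapping_file(text):
    """Return (state_machine, table_names) found in linefeed-terminated text."""
    state_machine = None
    tables = []
    open_table = None
    for line in text.split("\n"):
        words = line.split()
        if not words:
            continue  # blank lines are ignored
        keyword = words[0]
        if keyword in ("#STATEMACHINE", "#BEGINMAP", "#ENDMAP") and len(words) < 2:
            raise ValueError("%s requires a name" % keyword)
        if keyword == "#STATEMACHINE":
            state_machine = words[1]
        elif keyword == "#BEGINMAP":
            if open_table is not None:
                raise ValueError("table %s is not closed" % open_table)
            open_table = words[1]
        elif keyword == "#ENDMAP":
            if open_table != words[1]:
                raise ValueError("#ENDMAP %s has no matching #BEGINMAP" % words[1])
            tables.append(open_table)
            open_table = None
        # Any other line (table entry or comment) is skipped by this sketch.
    if state_machine is None:
        raise ValueError("missing #STATEMACHINE statement")
    if open_table is not None:
        raise ValueError("table %s is not closed" % open_table)
    return state_machine, tables
```

Running scan_mapping_file(SAMPLE) returns the state machine name together with the two mandatory table names in file order.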

The mapping file consists of multiple lines, each terminated by a linefeed character. This can cause problems in editors that expect a carriage return, or a carriage return followed by a linefeed, to terminate a line.

Note: Linefeed termination is the UNIX convention. A carriage return followed by a linefeed is the Windows convention.
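If an editor has saved the file with other line terminators, they can be converted back before the file is used. The Python sketch below is one way to do this; the function name is hypothetical, and it assumes that no carriage return is meaningful inside a line.

```python
def to_linefeed_endings(data):
    """Convert CR LF and bare CR line terminators in bytes to LF."""
    # Replace the two-byte sequence first so it does not become two linefeeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Reading and writing the file in binary mode avoids any further newline translation by the runtime.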

Use the # character to start a comment that continues to the end of the line. Blank lines are ignored.
