15.10 - Naming Character Sets - Teradata Database

Teradata Database International Character Set Support

Product: Teradata Database
Release: 15.00, 15.10
Category: Configuration, User Guide
Publication number: B035-1125-015K

Choose a letter as a mnemonic for the group of character sets being defined. Use the mnemonic as the last letter in the suffix of the character set name and in the name of the mapping file. If you choose ‘x’, for example, the suffix of the character set name is _0x, and the name of the mapping file is map_0x. The system uses the letter as a link between the name of the client character set, as specified in DBC.CharTranslationsV, and the name of the mapping file.

For example, Cyrillic character sets can use the mnemonic ‘c’. If CYRILLIC_0C is the name of a client character set, the system uses the internal transitional form specified in map_0c.
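The naming rule above can be sketched in Python. This is an illustration only. `single_byte_names` is a hypothetical helper, not a Teradata API, and the casing (uppercase character set name, lowercase mapping file name) simply follows the CYRILLIC_0C and map_0c example.

```python
import string


def single_byte_names(base: str, mnemonic: str) -> tuple[str, str]:
    """Hypothetical helper: build the client character set name and the
    mapping file name for an extended single-byte character set.

    Casing follows the CYRILLIC_0C / map_0c example in the text.
    """
    if len(mnemonic) != 1 or mnemonic not in string.ascii_letters:
        raise ValueError("the mnemonic must be a single ASCII letter")
    charset_name = f"{base.upper()}_0{mnemonic.upper()}"  # suffix _0<letter>
    mapping_file = f"map_0{mnemonic.lower()}"             # file map_0<letter>
    return charset_name, mapping_file


print(single_byte_names("Cyrillic", "c"))  # ('CYRILLIC_0C', 'map_0c')
```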

If you omit the mnemonic from the character set name, the system does not look for a mapping file and assumes that the character set is based on standard Latin characters. For example, a client character set named CYRILLIC does not produce Cyrillic characters, although it may appear to work correctly because of the display characteristics of the client.
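The lookup in the other direction, including the case where the mnemonic is omitted, might look like the following sketch. `mapping_file_for` is hypothetical: it models only the `_0<letter>` pattern for extended single-byte character sets and does not reflect how Teradata Database performs the lookup internally.

```python
import re
from typing import Optional

# Matches a trailing "_0<letter>" suffix, such as the "_0C" in "CYRILLIC_0C".
_MNEMONIC_SUFFIX = re.compile(r"_0([A-Za-z])$")


def mapping_file_for(charset_name: str) -> Optional[str]:
    """Hypothetical lookup: return the mapping file name implied by an
    extended single-byte character set name, or None when the name carries
    no mnemonic (the set is then treated as standard Latin)."""
    match = _MNEMONIC_SUFFIX.search(charset_name)
    if match is None:
        return None  # no mnemonic, so no mapping file is consulted
    return f"map_0{match.group(1).lower()}"


print(mapping_file_for("CYRILLIC_0C"))  # map_0c
print(mapping_file_for("CYRILLIC"))     # None
```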

Note: The characters ‘a’, ‘e’, ‘i’, ‘r’, ‘s’, ‘t’, and ‘u’ are reserved and cannot be used as the mnemonic in the mapping file name of an extended single-byte character set.
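A candidate mnemonic can be checked against the reserved letters before you name the character set and its mapping file. In this minimal sketch, `is_usable_mnemonic` is a hypothetical name, and treating uppercase letters the same as lowercase is an assumption.

```python
# Reserved letters listed in the note above.
RESERVED_MNEMONICS = frozenset("aeirstu")


def is_usable_mnemonic(letter: str) -> bool:
    """Return True if `letter` is one ASCII letter that is not reserved.
    Case-insensitive comparison is an assumption."""
    return (
        len(letter) == 1
        and letter.isascii()
        and letter.isalpha()
        and letter.lower() not in RESERVED_MNEMONICS
    )


print(is_usable_mnemonic("c"))  # True
print(is_usable_mnemonic("e"))  # False
```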

If the Teradata-defined Chinese character sets described in “Chinese Character Sets” on page 40 are not appropriate for your site, you can define your own character sets using the following names, IDs, and encodings.

| Character Set Name | ID | Description |
|---|---|---|
| SDSCHEBCDIC935_6IJ | 75 | Simplified Chinese for mainframe clients. The encoding form is EBCDIC Shift-Out/Shift-In, where the shift-out character 0x0E and shift-in character 0x0F bracket zero or more double-byte characters. |
| SDTCHEBCDIC937_7IB | 76 | Traditional Chinese for mainframe clients. The encoding form is EBCDIC Shift-Out/Shift-In, where the shift-out character 0x0E and shift-in character 0x0F bracket zero or more double-byte characters. |
| SDSCHGB2312_2T0 | 94 | Simplified Chinese for network-attached clients. The encoding form is Extended UNIX Code (EUC), composed of two code sets: cs0 for single-byte characters and cs1 for double-byte characters. |
| SDTCHBIG5_3R0 | 95 | Traditional Chinese for network-attached clients. The value of the first byte in a sequence distinguishes single-byte characters from double-byte characters. |
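To illustrate the Shift-Out/Shift-In form used by IDs 75 and 76, the sketch below splits an EBCDIC byte string into single-byte and double-byte runs. It only segments bytes and does not translate them. `split_so_si` is a hypothetical helper, and the sample byte values are arbitrary placeholders rather than real Chinese characters.

```python
SO = 0x0E  # shift-out: double-byte characters follow
SI = 0x0F  # shift-in: return to single-byte characters


def split_so_si(data: bytes) -> list[tuple[str, bytes]]:
    """Split EBCDIC SO/SI data into ("single", ...) and ("double", ...)
    segments. Each double-byte segment holds the bytes between one SO and
    the next SI, and may be empty."""
    segments: list[tuple[str, bytes]] = []
    mode = "single"
    current = bytearray()
    for byte in data:
        if byte == SO and mode == "single":
            if current:
                segments.append(("single", bytes(current)))
            current = bytearray()
            mode = "double"
        elif byte == SI and mode == "double":
            if len(current) % 2 != 0:
                raise ValueError("odd number of bytes between SO and SI")
            segments.append(("double", bytes(current)))
            current = bytearray()
            mode = "single"
        else:
            current.append(byte)
    if mode == "double":
        raise ValueError("SO without a matching SI")
    if current:
        segments.append(("single", bytes(current)))
    return segments


print(split_so_si(bytes([0xC1, 0x0E, 0x45, 0x41, 0x0F, 0xC2])))
```

An SO immediately followed by SI yields an empty double-byte segment, matching the “zero or more double-byte characters” wording in the table.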

The system uses the first two characters following the underscore (_) in the character set name to link the character set to the mapping file you create in the TPA etc or TPA cfg directory. The name of the mapping file must start with “map_” and end with those two characters.

For example, if you define a character set for SDTCHBIG5_3R0, you must create a mapping file named map_3R that provides the translation tables between the transitional forms and Unicode.
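Deriving the mapping file name from a user-defined multibyte character set name can be sketched as follows. `multibyte_mapping_file` is hypothetical, and splitting at the last underscore is an assumption that holds for every user-defined name listed in this document.

```python
def multibyte_mapping_file(charset_name: str) -> str:
    """Hypothetical helper: "map_" plus the first two characters that
    follow the underscore in a user-defined multibyte character set name."""
    _, sep, suffix = charset_name.rpartition("_")
    if not sep or len(suffix) < 2:
        raise ValueError(f"no usable suffix in {charset_name!r}")
    return "map_" + suffix[:2]


print(multibyte_mapping_file("SDTCHBIG5_3R0"))  # map_3R
```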

If the Teradata-defined Korean character sets described in “Korean Character Sets” on page 46 are not appropriate for your site, you can define your own character sets using the following names, IDs, and encodings.

| Character Set Name | ID | Description |
|---|---|---|
| SDHANGULEBCDIC933_5II | 74 | Korean for mainframe clients. The encoding form is EBCDIC Shift-Out/Shift-In, where the shift-out character 0x0E and shift-in character 0x0F bracket zero or more double-byte characters. |
| SDHANGULKSC5601_4R4 | 93 | Korean for network-attached clients. The value of the first byte in a sequence distinguishes single-byte characters from double-byte characters. |

The system uses the first two characters following the underscore (_) in the character set name to link the character set to the mapping file you create in the TPA etc or TPA cfg directory. The name of the mapping file must start with “map_” and end with those two characters.

For example, if you define a character set for SDHANGULKSC5601_4R4, you must create a mapping file named map_4R that provides the translation tables between the transitional forms and Unicode.

If the Teradata-defined Japanese character sets described in “Japanese Client Character Set Support” on page 29 are not appropriate for your site, you can define your own character sets using the following names, IDs, and encodings.

| Character Set Name | ID | Description |
|---|---|---|
| SDKATAKANAEBCDIC_4IF | 77 | Japanese Katakana EBCDIC for mainframe clients. The encoding form is EBCDIC Shift-Out/Shift-In, where the shift-out character 0x0E and shift-in character 0x0F bracket zero or more double-byte characters. |
| SDKANJIEBCDIC5026_4IG | 78 | IBM Japanese Extended Katakana character set for mainframe clients. The high-order bit of the first byte in a sequence distinguishes single-byte characters from double-byte characters. |
| SDKANJIEBCDIC5035_4IH | 79 | IBM Japanese Extended English character set for mainframe clients. The high-order bit of the first byte in a sequence distinguishes single-byte characters from double-byte characters. |
| SDKANJIEUC_1U3 | 91 | Japanese character set for network-attached clients that is compatible with the UNIX operating system. The encoding form is Extended UNIX Code (EUC), composed of four code sets: cs0 for one-byte characters, cs1 and cs2 for two-byte characters, and cs3 for three-byte characters. |
| SDKANJISJIS_1S3 | 92 | Windows-compatible Japanese character set for network-attached clients. The first byte in a sequence distinguishes single-byte characters from double-byte characters. |
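The four EUC code sets described for SDKANJIEUC_1U3 can be told apart by their first byte. The sketch below follows standard EUC-JP conventions, which the table does not spell out: bytes below 0x80 are cs0, 0x8E (single shift 2) starts a two-byte cs2 character, 0x8F (single shift 3) starts a three-byte cs3 character, and 0xA1 to 0xFE start a two-byte cs1 character. The function names are hypothetical.

```python
SS2 = 0x8E  # single shift 2: introduces a two-byte cs2 character
SS3 = 0x8F  # single shift 3: introduces a three-byte cs3 character


def euc_jp_code_set(lead: int) -> tuple[str, int]:
    """Map a lead byte to (code set, character length in bytes)."""
    if lead < 0x80:
        return "cs0", 1
    if lead == SS2:
        return "cs2", 2
    if lead == SS3:
        return "cs3", 3
    if 0xA1 <= lead <= 0xFE:
        return "cs1", 2
    raise ValueError(f"invalid EUC-JP lead byte 0x{lead:02X}")


def split_euc_jp(data: bytes) -> list[tuple[str, bytes]]:
    """Split EUC-JP bytes into (code set, character bytes) pairs."""
    result: list[tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        code_set, length = euc_jp_code_set(data[i])
        if i + length > len(data):
            raise ValueError(f"truncated {code_set} character at offset {i}")
        result.append((code_set, data[i:i + length]))
        i += length
    return result


print(split_euc_jp("A漢ｱ".encode("euc_jp")))
```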

The system uses the first two characters following the underscore (_) in the character set name to link the character set to the mapping file you create in the TPA etc or TPA cfg directory. The name of the mapping file must start with “map_” and end with those two characters.

For example, if you define a character set for SDKANJISJIS_1S3, you must create a mapping file named map_1S that provides the translation tables between the transitional forms and Unicode.
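Applying the same rule to every name in the three tables gives the list of mapping files a site would need. In this sketch, the names and IDs are copied from the tables, and the helper functions are hypothetical. Read literally, the rule sends SDKATAKANAEBCDIC_4IF, SDKANJIEBCDIC5026_4IG, and SDKANJIEBCDIC5035_4IH to the same file, map_4I; confirm that this matches your configuration before relying on it.

```python
from collections import defaultdict
from collections.abc import Iterable

# User-defined character set names and IDs, copied from the tables above.
USER_DEFINED_CHARSETS = {
    "SDSCHEBCDIC935_6IJ": 75,
    "SDTCHEBCDIC937_7IB": 76,
    "SDSCHGB2312_2T0": 94,
    "SDTCHBIG5_3R0": 95,
    "SDHANGULEBCDIC933_5II": 74,
    "SDHANGULKSC5601_4R4": 93,
    "SDKATAKANAEBCDIC_4IF": 77,
    "SDKANJIEBCDIC5026_4IG": 78,
    "SDKANJIEBCDIC5035_4IH": 79,
    "SDKANJIEUC_1U3": 91,
    "SDKANJISJIS_1S3": 92,
}


def mapping_file(charset_name: str) -> str:
    """"map_" plus the first two characters after the underscore."""
    suffix = charset_name.split("_", 1)[1]  # each name has exactly one underscore
    return "map_" + suffix[:2]


def group_by_mapping_file(names: Iterable[str]) -> dict[str, list[str]]:
    """Group character set names by the mapping file they resolve to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[mapping_file(name)].append(name)
    return dict(groups)


for file_name, names in sorted(group_by_mapping_file(USER_DEFINED_CHARSETS).items()):
    ids = ", ".join(str(USER_DEFINED_CHARSETS[n]) for n in names)
    print(f"{file_name}: {', '.join(names)} (IDs {ids})")
```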