Chinese Character Sets - Teradata Database

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
lifecycle: previous
Product Category: Teradata® Database

Teradata supplies several multibyte character sets to support Simplified Chinese and Traditional Chinese on mainframe and network-attached clients.

Each character set uses a specific encoding form to distinguish single-byte characters from multibyte characters.


| Character Set | Description | Encoding Form |
| --- | --- | --- |
| SCHEBCDIC935_2IJ | Simplified Chinese (IBM CCSID 935) for mainframe clients. | EBCDIC Shift-Out/Shift-In. Shift-out character 0x0E and shift-in character 0x0F bracket each string of double-byte characters. |
| TCHEBCDIC937_3IB | Traditional Chinese (IBM CCSID 937) for mainframe clients. | EBCDIC Shift-Out/Shift-In. Shift-out character 0x0E and shift-in character 0x0F bracket each string of double-byte characters. |
| SCHGB2312_1T0 | Simplified Chinese (mixed GB2312) for network-attached clients. | Extended UNIX Code (EUC) composed of two code sets: cs0 for single-byte characters and cs1 for double-byte characters. |
| TCHBIG5_1R0 | Traditional Chinese (Big5) for network-attached clients. | Value of first byte in sequence distinguishes single-byte characters from double-byte characters. |
| SCHINESE936_6R0 | Simplified Chinese (mixed GB2312) for network-attached clients. | Value of first byte in sequence distinguishes single-byte characters from double-byte characters. |
| TCHINESE950_8R0 | Traditional Chinese (Big5) for network-attached clients. | Value of first byte in sequence distinguishes single-byte characters from double-byte characters. |
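The encoding forms listed above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the lead-byte threshold of 0x81 is a simplifying assumption: the exact lead-byte ranges differ between the Big5 and GB2312-based sets and are not given here. Python's built-in `big5` and `gb2312` codecs stand in for the Teradata client character sets and may not match them byte for byte.

```python
SHIFT_OUT = 0x0E  # starts a run of double-byte characters
SHIFT_IN = 0x0F   # ends the run and returns to single-byte mode


def split_so_si(data: bytes) -> list[tuple[str, bytes]]:
    """Split EBCDIC Shift-Out/Shift-In data into ("sbcs" | "dbcs", bytes) runs.

    Assumes 0x0E and 0x0F never occur inside a double-byte character.
    """
    runs = []
    mode = "sbcs"
    start = 0
    for i, b in enumerate(data):
        if b in (SHIFT_OUT, SHIFT_IN):
            if i > start:
                runs.append((mode, data[start:i]))
            mode = "dbcs" if b == SHIFT_OUT else "sbcs"
            start = i + 1
    if start < len(data):
        runs.append((mode, data[start:]))
    return runs


def split_by_lead_byte(data: bytes, lead_min: int = 0x81) -> list[bytes]:
    """Split mixed data into characters using only the value of the first byte.

    Assumption: any byte >= lead_min starts a double-byte character.
    """
    chars = []
    i = 0
    while i < len(data):
        width = 2 if data[i] >= lead_min else 1
        chars.append(data[i:i + width])
        i += width
    return chars


# "\u4e2d" is the Chinese character for "middle".
big5_chars = split_by_lead_byte("A\u4e2dB".encode("big5"))
euc_chars = split_by_lead_byte("A\u4e2dB".encode("gb2312"))
```

In both `big5_chars` and `euc_chars` the middle element is two bytes long and the outer elements are one byte each, which is the distinction the Encoding Form column describes.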

To determine whether a Chinese character is valid in an object name:

1. Find the text file that maps the client character set to Unicode. The file is on the documentation CD and on the Web at http://www.info.teradata.com/.

2. In the text file, find the Unicode character to which the client character in question maps.

3. Find UOBJNEXT.txt, the file that identifies valid Unicode characters. It is available on the Teradata User Documentation CD and at http://www.info.teradata.com.

4. If the Unicode character appears in the file that applies to your system, you can use the client character that maps to it in an object name.
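The lookup procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the layouts of the mapping file and of UOBJNEXT.txt are not described here, so the sketch takes already-parsed data (a dictionary and a set) rather than reading the files, and treating an unmapped character as invalid is an assumption.

```python
def is_valid_in_object_name(client_char: bytes,
                            client_to_unicode: dict[bytes, int],
                            valid_code_points: set[int]) -> bool:
    """Return True if the client character maps to a valid Unicode character.

    client_to_unicode: parsed contents of the character-set mapping file
                       (client bytes -> Unicode code point).
    valid_code_points: parsed contents of the valid-character file.
    """
    code_point = client_to_unicode.get(client_char)  # step 2: find the mapping
    if code_point is None:
        return False  # assumption: an unmapped character is treated as invalid
    return code_point in valid_code_points           # step 4: check validity


# Stand-in data for illustration only, NOT taken from the Teradata files:
# Python's gb2312 codec supplies the mapping, and the valid set is a placeholder.
sample_mapping = {ch.encode("gb2312"): ord(ch) for ch in "\u4e2d\u6587"}
sample_valid = {0x4E2D}

first_ok = is_valid_in_object_name("\u4e2d".encode("gb2312"),
                                   sample_mapping, sample_valid)
second_ok = is_valid_in_object_name("\u6587".encode("gb2312"),
                                    sample_mapping, sample_valid)
```

With this placeholder data, `first_ok` is true because U+4E2D is in the sample set, and `second_ok` is false because U+6587 is not. Real results depend entirely on the contents of the files for your system.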

Character data entered using Chinese client character sets should be stored in columns defined as CHARACTER SET UNICODE. The UNICODE server character set requires two bytes of storage per character, so a CHAR(5) CHARACTER SET UNICODE column occupies 10 bytes of storage.

Given the 64000-byte limit on column size, a UNICODE column cannot exceed 32000 characters. Furthermore, the character data and the data in all other columns of a row together cannot exceed the 64000-byte limit on row size.
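The storage arithmetic above can be checked with a short Python sketch. The constant and function names are hypothetical, and the row check ignores any per-row or per-column overhead, which is not described here.

```python
UNICODE_BYTES_PER_CHAR = 2  # UNICODE server character set: two bytes per character
MAX_COLUMN_BYTES = 64000    # limit on column size
MAX_ROW_BYTES = 64000       # limit on row size


def unicode_storage_bytes(num_chars: int) -> int:
    """Bytes occupied by a CHAR(num_chars) CHARACTER SET UNICODE column."""
    return num_chars * UNICODE_BYTES_PER_CHAR


def max_unicode_column_chars() -> int:
    """Largest character count a single UNICODE column can hold."""
    return MAX_COLUMN_BYTES // UNICODE_BYTES_PER_CHAR


def fits_in_row(column_bytes: list[int]) -> bool:
    """True if the summed column sizes stay within the row limit.

    Simplification: overhead beyond the raw column sizes is not counted.
    """
    return sum(column_bytes) <= MAX_ROW_BYTES


char5_bytes = unicode_storage_bytes(5)    # 5 characters * 2 bytes = 10
max_chars = max_unicode_column_chars()    # 64000 bytes / 2 = 32000
```

For example, a row holding one CHAR(32000) CHARACTER SET UNICODE column already uses the full 64000 bytes, so `fits_in_row([unicode_storage_bytes(32000), 4])` is false once a 4-byte column is added.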