Extended UNIX Code (EUC) - Teradata Database

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
lifecycle: previous
Product Category: Teradata® Database

For UNIX client systems, the Teradata Database supports the Extended UNIX Code (EUC).

The KANJIEUC_0U client character set is based on EUC.

EUC is composed of one primary code set and three supplementary code sets.

The primary code set, code set 0, is used for ASCII characters.

The three supplementary code sets, code sets 1, 2, and 3, can be assigned to different character sets by the user.

There is a system default assignment for these code sets.

The primary code set is defined as single-byte, with the most significant (high-order) bit set to 0. Characters in the supplementary code sets can occupy multiple bytes, and the most significant bit of each byte is set to 1.

Characters from code sets 2 and 3 are preceded by a single-shift character, known as ss2 and ss3, respectively, where ss2 is 0x8E and ss3 is 0x8F. The code sets are differentiated as follows.

 

| IF the most significant bit is this value … | THEN … |
|---|---|
| 0 | the code set is one-byte ASCII. |
| 1 | the byte is checked for ss2 or ss3 to determine the code set. The length in bytes of characters from that code set is retrieved from an ANSI localization table governing character classification, and that number of bytes is read in. |
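The decision in the table above can be sketched in a few lines of Python. This is an illustration only, not Teradata code: the function name `classify_lead_byte` and the `CHAR_LENGTH` dictionary are hypothetical, and the dictionary is an assumed stand-in for the localization table, filled in with the KanjiEUC character lengths.

```python
SS2 = 0x8E  # single-shift 2: introduces code set 2
SS3 = 0x8F  # single-shift 3: introduces code set 3

# Assumed stand-in for the localization table: total bytes per character,
# counting the single-shift byte, for the KanjiEUC assignment.
CHAR_LENGTH = {0: 1, 1: 2, 2: 2, 3: 3}


def classify_lead_byte(byte):
    """Return (code_set, total_length_in_bytes) for the first byte of a character."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte & 0x80 == 0:
        code_set = 0          # most significant bit is 0: one-byte ASCII
    elif byte == SS2:
        code_set = 2          # most significant bit is 1 and the byte is ss2
    elif byte == SS3:
        code_set = 3          # most significant bit is 1 and the byte is ss3
    else:
        code_set = 1          # any other byte with the most significant bit set
    return code_set, CHAR_LENGTH[code_set]
```

The sketch only selects a code set and a length. It does not check that the bytes fall within the valid ranges for that code set.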

The following table shows the KanjiEUC Code Set Localization.

 

| Code Set | EUC Representation (In Bits) | Japanese Language Environment Implementation |
|---|---|---|
| cs0 | 0xxxxxxx | JIS X 0201 |
| cs1 | 1xxxxxxx 1xxxxxxx | JIS X 0208 (Kanji characters). The first 1xxxxxxx must not be ss2 or ss3. The valid range of the first byte is A1-FE and the valid range of the second byte is A1-FE. These ranges are implied by the JIS X 0208 standard. |
| cs2 | ss2 1xxxxxxx | JIS X 0201 (half-size Katakana). The valid range of the byte following ss2 is A1-DF. |
| cs3 | ss3 1xxxxxxx 1xxxxxxx | JIS X 0212. The valid range of each of the two bytes following ss3 is A1-FE. These ranges are implied by the JIS X 0212 standard. |
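As a rough illustration of how the byte ranges in this table could be enforced, the following Python sketch splits a byte string into characters and rejects bytes outside the listed ranges. The function name `split_kanjieuc` is hypothetical, and the character lengths are hard-coded as an assumption. This is not how the Teradata Database is known to implement the check.

```python
SS2, SS3 = 0x8E, 0x8F


def split_kanjieuc(data):
    """Split EUC bytes into (code_set, character_bytes) pairs, validating ranges."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:        # cs0: one byte, high-order bit 0
            code_set, length, payload_max = 0, 1, None
        elif lead == SS2:      # cs2: ss2 + one byte in A1-DF
            code_set, length, payload_max = 2, 2, 0xDF
        elif lead == SS3:      # cs3: ss3 + two bytes in A1-FE
            code_set, length, payload_max = 3, 3, 0xFE
        else:                  # cs1: two bytes in A1-FE
            code_set, length, payload_max = 1, 2, 0xFE
        char = data[i:i + length]
        if len(char) < length:
            raise ValueError(f"truncated character at offset {i}")
        # Range-check every byte except a leading single-shift byte.
        payload = char[1:] if code_set in (2, 3) else char
        if payload_max is not None:
            for b in payload:
                if not 0xA1 <= b <= payload_max:
                    raise ValueError(f"byte 0x{b:02X} out of range at offset {i}")
        chars.append((code_set, char))
        i += length
    return chars
```

For example, `split_kanjieuc(b"A\xa1\xa1\x8e\xb1")` returns one entry each for cs0, cs1, and cs2, while `split_kanjieuc(b"\x8e\xe0")` raises `ValueError` because E0 is outside A1-DF.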

The following table identifies selected characters in the KanjiEUC character set.

 

| Double-Byte Space | Double-Byte Underscore | Double-Byte Percent |
|---|---|---|
| 0xA1A1 | 0xA1B2 | 0xA1F3 |
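These code points can be cross-checked against an independent EUC implementation. The sketch below uses Python's built-in `euc_jp` codec, which is assumed here to agree with KANJIEUC_0U for these three characters. It is not the Teradata client character set itself, and the helper name `decode_selected` is hypothetical.

```python
import unicodedata

SELECTED = {
    "Double-Byte Space": 0xA1A1,
    "Double-Byte Underscore": 0xA1B2,
    "Double-Byte Percent": 0xA1F3,
}


def decode_selected(code):
    """Decode a two-byte EUC value with Python's euc_jp codec."""
    return code.to_bytes(2, "big").decode("euc_jp")


for label, code in SELECTED.items():
    ch = decode_selected(code)
    # Show the Unicode code point and name the codec maps each value to.
    print(f"{label}: 0x{code:04X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under that assumption, the three values decode to U+3000, U+FF3F, and U+FF05 respectively.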

The following table identifies the EUC code set 2 introducer (ss2) and code set 3 introducer (ss3).

 

| ss2 | ss3 |
|---|---|
| 0x8E | 0x8F |
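To see an introducer in encoded data, the following sketch encodes a half-size Katakana character with Python's `euc_jp` codec and inspects the first byte. The helper `introducer_of` is hypothetical, and the codec is used only as a convenient stand-in for an EUC encoder, not as the Teradata implementation.

```python
SS2 = 0x8E
SS3 = 0x8F


def introducer_of(char_bytes):
    """Return 'ss2', 'ss3', or None depending on the first byte of one character."""
    if not char_bytes:
        raise ValueError("empty character")
    if char_bytes[0] == SS2:
        return "ss2"
    if char_bytes[0] == SS3:
        return "ss3"
    return None


# HALFWIDTH KATAKANA LETTER A (U+FF71) belongs to code set 2, so the codec
# emits ss2 followed by the JIS X 0201 Katakana byte.
halfwidth_a = "\uFF71".encode("euc_jp")
print(halfwidth_a.hex(), introducer_of(halfwidth_a))  # prints: 8eb1 ss2
```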

The following graphic illustrates the KanjiEUC encoding for Kanji alphabets.

 

| For more information on … | See … |
|---|---|
| the JIS X 0201 standard | “JIS X 0201” on page 151. |
| the JIS X 0208 standard | “JIS X 0208” on page 153. |
| the KANJIEUC_0U client character set that is based on EUC | “UNIX Compatible Japanese Character Set (KANJIEUC_0U)” on page 36. |