UNIX Compatible Japanese Character Set (KANJIEUC_0U) - Teradata Database

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
lifecycle: previous
Product Category: Teradata® Database

KanjiEUC refers to the character set KANJIEUC_0U, which is compatible with the UNIX operating system.

KanjiEUC emulates the standard Extended UNIX Code style of mixed single- and multibyte character data, where the most significant bit of each byte classifies the byte as a single-byte character or part of a multibyte character.
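The most-significant-bit rule can be illustrated with a short Python sketch. The helper name `is_multibyte_part` is hypothetical, and the sample bytes come from Python's standard `euc_jp` codec, which is assumed here to agree with KANJIEUC_0U for these two characters; it is not the Teradata implementation.

```python
def is_multibyte_part(byte: int) -> bool:
    """Return True when the most significant bit of the byte is set."""
    return (byte & 0x80) != 0


# 'A' (single byte) followed by HIRAGANA LETTER A (two bytes in EUC).
# Assumption: Python's euc_jp codec stands in for KANJIEUC_0U here.
sample = "A\u3042".encode("euc_jp")

# One flag per byte: False = single-byte character,
# True = part of a multibyte character.
flags = [is_multibyte_part(b) for b in sample]
```

In this sketch the first byte has its high bit clear, and the two bytes that encode the Hiragana character both have it set.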

The KanjiEUC character set includes all characters in the JIS X 0201, JIS X 0208, and JIS X 0212 standards, plus extensions.

The valid ranges for JIS X 0201 characters in KanjiEUC dictionary data include all of U.S. ASCII and the portion of JIS X 0201 for which the second byte ranges from 0xA1 through 0xDF. See the rows for code set 0 (cs0) and code set 2 (cs2) in the following table.

KanjiEUC uses the four external code sets defined in the following table.

 

Code Set   Description                                               Note
0          Single-byte character data                                Not permitted for the GRAPHIC server character set; JIS X 0201; for the detailed encoding, see “Shift-JIS Encoding: Detailed View” on page 167.
1          Two-byte character data                                   JIS X 0208
2          Two-byte multibyte character with first byte ss2=0x8E     Not permitted for the GRAPHIC server character set; JIS X 0201; Hankaku Katakana
3          Three-byte multibyte character with first byte ss3=0x8F   JIS X 0212
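As a rough illustration of how the four code sets partition a byte stream, the following Python sketch splits EUC-style bytes into characters by lead byte. The function name `split_code_sets` is hypothetical, and the lead-byte range 0xA1 through 0xFE for code set 1 is the standard EUC convention, assumed here rather than taken from the table. The sketch does not validate trailing bytes.

```python
SS2 = 0x8E  # single-shift 2: first byte of a two-byte cs2 character
SS3 = 0x8F  # single-shift 3: first byte of a three-byte cs3 character


def split_code_sets(data: bytes) -> list:
    """Split EUC-style bytes into (code set, character bytes) pairs."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            code_set, width = 0, 1      # cs0: single-byte character
        elif lead == SS2:
            code_set, width = 2, 2      # cs2: ss2 plus one byte
        elif lead == SS3:
            code_set, width = 3, 3      # cs3: ss3 plus two bytes
        elif 0xA1 <= lead <= 0xFE:
            code_set, width = 1, 2      # cs1: assumed standard EUC range
        else:
            raise ValueError(f"unexpected lead byte 0x{lead:02X} at offset {i}")
        char = data[i:i + width]
        if len(char) < width:
            raise ValueError(f"truncated character at offset {i}")
        out.append((code_set, char))
        i += width
    return out


# One character from each code set, in order cs0, cs1, cs2, cs3.
pairs = split_code_sets(b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1")
```

Each returned pair holds the code set number and the raw bytes of one character, so a caller could, for example, reject code sets 0 and 2 when handling data destined for the GRAPHIC server character set.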

Object names on systems enabled with Japanese language support can contain single-byte Latin and Katakana characters from the JIS X 0201 standard, and double-byte characters from the JIS X 0208 standard.

The valid ranges for JIS X 0201 characters in object names under the KanjiEUC client character set appear in rows cs0 and cs2 in “KanjiEUC Code Set Localization” on page 162. The set does not permit Katakana symbols 0x8EA1 through 0x8EA5, or Unicode symbols other than $, #, and _.

The valid ranges for JIS X 0208 characters in object names under the KanjiEUC client character set appear in row cs1 in “KanjiEUC Code Set Localization” on page 162. Characters in the reserved regions of the standard are not allowed.

Characters from JIS X 0212 (row cs3) are not valid in object names. Additionally, some characters that are valid in JIS X 0208 do not map to the KanjiEBCDIC encoding and are not valid in KanjiEUC object names. The following table provides a complete list of multibyte character codes that are not valid for object names under the KANJIEUC_0U character set.

 

First Byte     Second Byte                   Third Byte
0xA1           0xA1 - 0xAA, 0xAD - 0xB1,
               0xB3 - 0xBB, 0xBD - 0xEF,
               0xF1 - 0xF3, 0xF5 - 0xFE
0xA2           0xA1 - 0xFE
0xA6 - 0xA8    0xA1 - 0xFE
0xF4           0xA5 - 0xA6
0x8E           0xA1 - 0xA5
0x8F           0xA1 - 0xFE                   0xA1 - 0xFE
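The table above can be turned into a lookup, sketched here in Python. The function and constant names are hypothetical, and the check covers only the rows of this table: a False result means the character is not listed, not that Teradata Database accepts it in an object name.

```python
# Second-byte ranges listed in the table for first byte 0xA1.
_A1_INVALID = [(0xA1, 0xAA), (0xAD, 0xB1), (0xB3, 0xBB),
               (0xBD, 0xEF), (0xF1, 0xF3), (0xF5, 0xFE)]


def _in_ranges(value: int, ranges) -> bool:
    """Return True if value falls inside any inclusive (low, high) range."""
    return any(low <= value <= high for low, high in ranges)


def is_listed_invalid(char: bytes) -> bool:
    """Return True if one multibyte character matches a row of the table."""
    if len(char) < 2:
        return False  # the table lists multibyte codes only
    first, second = char[0], char[1]
    if first == 0x8F:
        # Three-byte cs3 (JIS X 0212) characters: both trailing bytes listed.
        return (len(char) == 3
                and 0xA1 <= second <= 0xFE
                and 0xA1 <= char[2] <= 0xFE)
    if first == 0xA1:
        return _in_ranges(second, _A1_INVALID)
    if first == 0xA2 or 0xA6 <= first <= 0xA8:
        return 0xA1 <= second <= 0xFE
    if first == 0xF4:
        return 0xA5 <= second <= 0xA6
    if first == 0x8E:
        return 0xA1 <= second <= 0xA5
    return False
```

For example, `is_listed_invalid(b"\x8e\xa1")` returns True because the code falls in the 0x8E row, while `is_listed_invalid(b"\x8e\xb1")` returns False because second byte 0xB1 lies outside the listed range.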

For information on the rules and restrictions for naming Teradata Database objects, see the topics beginning with “About Object Names” on page 17.

Also see SQL Fundamentals, which covers topics such as:

  • Translation conventions for storing object names in the data dictionary
  • Rules for object name comparison

For more information on:

  • the JIS X 0201 standard, see “JIS X 0201” on page 151.
  • the JIS X 0208 standard, see “JIS X 0208” on page 153.
  • the standard Extended UNIX Code (EUC) style of mixed single- and multibyte character data, see “Extended UNIX Code (EUC)” on page 162.