UNIX Compatible Japanese Character Set (KANJIEUC_0U)

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
Product Category: Teradata® Database

KanjiEUC refers to the character set KANJIEUC_0U, which is compatible with the UNIX operating system.

KanjiEUC emulates the standard Extended UNIX Code style of mixed single- and multibyte character data, where the most significant bit of each byte classifies the byte as a single-byte character or part of a multibyte character.
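The most-significant-bit rule can be illustrated with a short sketch. It uses Python's built-in `euc_jp` codec as a stand-in for the client encoding; whether that codec matches KANJIEUC_0U byte for byte is an assumption, and the helper name `is_multibyte_part` is hypothetical.

```python
def is_multibyte_part(byte: int) -> bool:
    """Return True when the most significant bit of the byte is set."""
    return bool(byte & 0x80)


# "A" is a single-byte character; HIRAGANA LETTER A encodes to two bytes.
data = "Aあ".encode("euc_jp")

# One flag per byte: False for single-byte data, True for multibyte parts.
flags = [is_multibyte_part(b) for b in data]
```

Every byte with the high bit clear stands alone as a single-byte character, so a scanner never needs to look backward to decide whether a byte below 0x80 is part of a longer character.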

The KanjiEUC character set includes all characters in the JIS X 0201, JIS X 0208, and JIS X 0212 standards, plus extensions.

The valid ranges for JIS X 0201 characters in KanjiEUC dictionary data include all of U.S. ASCII and the portion of JIS X 0201 for which the second byte ranges from 0xA1 through 0xDF. See the rows for code set 0 (cs0) and code set 2 (cs2) in the following table.
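As a rough illustration of these two ranges, the following sketch tests whether one encoded character falls in the JIS X 0201 portion of KanjiEUC. The function name is hypothetical, U.S. ASCII is taken to mean bytes 0x00 through 0x7F, and the 0x8E lead byte for the second range is taken from the cs2 row of the code set table.

```python
def is_jisx0201_kanjieuc(ch: bytes) -> bool:
    """Check one encoded character against the cs0 and cs2 ranges."""
    if len(ch) == 1:
        # cs0: U.S. ASCII (assumed here to be 0x00 through 0x7F).
        return ch[0] <= 0x7F
    if len(ch) == 2:
        # cs2: lead byte 0x8E followed by a second byte in 0xA1..0xDF.
        return ch[0] == 0x8E and 0xA1 <= ch[1] <= 0xDF
    return False
```

For example, `is_jisx0201_kanjieuc(b"\x8e\xb1")` is true, while a two-byte JIS X 0208 character such as `b"\xa4\xa2"` is rejected.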

KanjiEUC uses the four external code sets defined in the following table.

Code Set	Description	Note
0	Single-byte character data. Not permitted for the GRAPHIC server character set.	JIS X 0201. For the detailed encoding, see “Shift-JIS Encoding: Detailed View” on page 167.
1	Two-byte character data.	JIS X 0208
2	Two-byte multibyte character with first byte ss₂ = 0x8E. Not permitted for the GRAPHIC server character set.	JIS X 0201 Hankaku Katakana
3	Three-byte multibyte character with first byte ss₃ = 0x8F.	JIS X 0212
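The code set table can be read as a dispatch on the first byte of each character. The sketch below splits a byte string into characters tagged with their code set. The function names are hypothetical, and the 0xA1 through 0xFE lead-byte range for code set 1 is an assumption based on standard EUC-JP rather than something the table states.

```python
from typing import List, Tuple

SS2 = 0x8E  # single-shift 2: introduces a code set 2 character
SS3 = 0x8F  # single-shift 3: introduces a code set 3 character


def code_set(lead: int) -> Tuple[int, int]:
    """Return (code set number, character length in bytes) for a lead byte."""
    if lead < 0x80:
        return 0, 1
    if lead == SS2:
        return 2, 2
    if lead == SS3:
        return 3, 3
    if 0xA1 <= lead <= 0xFE:  # assumed cs1 lead-byte range
        return 1, 2
    raise ValueError(f"invalid lead byte 0x{lead:02X}")


def split_characters(data: bytes) -> List[Tuple[int, bytes]]:
    """Split an EUC byte string into (code set, character bytes) pairs."""
    result = []
    i = 0
    while i < len(data):
        cs, length = code_set(data[i])
        ch = data[i:i + length]
        if len(ch) < length:
            raise ValueError("truncated multibyte character")
        result.append((cs, ch))
        i += length
    return result
```

This sketch validates only the lead byte and the character length. It does not check that trailing bytes fall in a valid range.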

Object names on systems enabled with Japanese language support can contain single-byte Latin and Katakana characters from the JIS X 0201 standard, and double-byte characters from the JIS X 0208 standard.

The valid ranges for JIS X 0201 characters in object names under the KanjiEUC client character set appear in rows cs0 and cs2 in “KanjiEUC Code Set Localization” on page 162. The set permits neither the Katakana symbols 0x8EA1 through 0x8EA5 nor Unicode symbols other than $, #, and _.

The valid ranges for JIS X 0208 characters in object names under the KanjiEUC client character set appear in row cs1 in “KanjiEUC Code Set Localization” on page 162. Characters in the reserved regions of the standard are not allowed.

Characters from JIS X 0212 (row cs3) are not valid in object names. Additionally, some characters that are valid in JIS X 0208 do not map to the KanjiEBCDIC encoding and are not valid in KanjiEUC object names. The following table provides a complete list of multibyte character codes that are not valid for object names under the KANJIEUC_0U character set.

First Byte	Second Byte	Third Byte
0xA1	0xA1 - 0xAA, 0xAD - 0xB1, 0xB3 - 0xBB, 0xBD - 0xEF, 0xF1 - 0xF3, 0xF5 - 0xFE
0xA2	0xA1 - 0xFE
0xA6 - 0xA8	0xA1 - 0xFE
0xF4	0xA5 - 0xA6
0x8E	0xA1 - 0xA5
0x8F	0xA1 - 0xFE	0xA1 - 0xFE
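The table above lends itself to a lookup keyed by first byte. The following is a minimal sketch, not Teradata's implementation: the names `INVALID_SECOND_BYTES` and `is_invalid_in_object_name` are hypothetical, and the input is assumed to be one already well-formed KanjiEUC character. It covers only the multibyte codes listed in the table, not the separate restrictions on single-byte symbols.

```python
from typing import Dict, List, Tuple

# Invalid second-byte ranges keyed by first byte, transcribed from the table.
INVALID_SECOND_BYTES: Dict[int, List[Tuple[int, int]]] = {
    0xA1: [(0xA1, 0xAA), (0xAD, 0xB1), (0xB3, 0xBB),
           (0xBD, 0xEF), (0xF1, 0xF3), (0xF5, 0xFE)],
    0xA2: [(0xA1, 0xFE)],
    0xA6: [(0xA1, 0xFE)],
    0xA7: [(0xA1, 0xFE)],
    0xA8: [(0xA1, 0xFE)],
    0xF4: [(0xA5, 0xA6)],
    0x8E: [(0xA1, 0xA5)],
    # Code set 3: the third byte also spans 0xA1..0xFE, so every
    # well-formed three-byte character is excluded.
    0x8F: [(0xA1, 0xFE)],
}


def is_invalid_in_object_name(ch: bytes) -> bool:
    """Return True if the multibyte character appears in the invalid table."""
    if len(ch) < 2:
        return False  # single-byte characters are not covered by this table
    ranges = INVALID_SECOND_BYTES.get(ch[0], [])
    return any(low <= ch[1] <= high for low, high in ranges)
```

For instance, `is_invalid_in_object_name(b"\x8e\xa3")` is true because 0x8EA3 falls in the excluded Katakana symbol range, while `is_invalid_in_object_name(b"\x8e\xb1")` is false.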

For information on the rules and restrictions for naming Teradata Database objects, see the topics beginning with “About Object Names” on page 17.

Also see SQL Fundamentals, which covers topics such as:

Translation conventions for storing object names in the data dictionary

Rules for object name comparison

For more information on …	See …
the JIS X 0201 standard	“JIS X 0201” on page 151.
the JIS X 0208 standard	“JIS X 0208” on page 153.
the standard Extended UNIX Code (EUC) style of mixed single- and multibyte character data	“Extended UNIX Code (EUC)” on page 162.