This document uses the Unicode naming convention for characters. For example, the lowercase character ‘a’ is more formally specified as either LATIN SMALL LETTER A or U+0061. The U+xxxx notation refers to a particular code point in the Unicode standard, where xxxx stands for the hexadecimal representation of the 16-bit value defined in the standard.
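As a quick illustration of the naming convention, the following Python sketch prints the U+xxxx form and the formal Unicode name of a character. It relies only on the standard `unicodedata` module; the helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the U+xxxx code point and the formal Unicode name of one character."""
    # ord() gives the code point; :04X renders it as at least four hex digits.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("a"))       # U+0061 LATIN SMALL LETTER A
print(describe("A"))       # U+0041 LATIN CAPITAL LETTER A
print(describe("\u3000"))  # U+3000 IDEOGRAPHIC SPACE
```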
This document refers to the following Japanese character encodings:
- KanjiEBCDIC
- KanjiEUC
- KanjiShift-JIS
These encodings are further defined in International Character Set Support, B035-1125.
Character Symbols
Symbol | Encoding | Meaning |
---|---|---|
a-z A-Z 0-9 | Any | Any single byte Latin letter or digit. |
ａ-ｚ Ａ-Ｚ ０-９ | Any | Any fullwidth Latin letter or digit. |
< | KanjiEBCDIC | Shift Out [SO] (0x0E). Indicates the transition from single byte to multibyte characters in KanjiEBCDIC. |
> | KanjiEBCDIC | Shift In [SI] (0x0F). Indicates the transition from multibyte to single byte characters in KanjiEBCDIC. |
T | Any | Any multibyte character. The encoding depends on the current character set. For KanjiEUC, code set 3 characters are always preceded by ss3. |
I | Any | Any single byte Hankaku Katakana character. In KanjiEUC, it must be preceded by ss2, forming an individual multibyte character. |
Δ | Any | Represents the graphic pad character. |
Δ | Any | Represents a single or multibyte pad character, depending on context. |
ss2 | KanjiEUC | Represents the EUC code set 2 introducer (0x8E). |
ss3 | KanjiEUC | Represents the EUC code set 3 introducer (0x8F). |
For example, the string "TEST", where each letter is intended to be a fullwidth character, is written as ＴＥＳＴ. When the encoding is important, hexadecimal representation is used.
For example, the following mixed single byte/multibyte character data in the KanjiEBCDIC character set:
LMN<ＴＥＳＴ>QRS
is represented as:
D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2
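The mapping from the notation to the hexadecimal form in the example above can be sketched in Python. This is an illustration only, built on two assumptions that are not stated in this document: Python's `cp037` codec stands in for the single byte EBCDIC code page, and a fullwidth Latin letter is encoded as 0x42 followed by the EBCDIC byte of its single byte counterpart (a pattern inferred from the example itself). The function name `notation_to_hex` is hypothetical.

```python
def notation_to_hex(notation: str) -> str:
    """Convert the symbolic notation into space-separated hexadecimal tokens."""
    tokens = []
    for ch in notation:
        if ch == "<":
            tokens.append("0E")  # Shift Out: single byte -> multibyte
        elif ch == ">":
            tokens.append("0F")  # Shift In: multibyte -> single byte
        elif 0xFF01 <= ord(ch) <= 0xFF5E:
            # Fullwidth form: find the single byte counterpart, then apply the
            # assumed 0x42 prefix seen in the example.
            halfwidth = chr(ord(ch) - 0xFEE0)
            tokens.append("42" + format(halfwidth.encode("cp037")[0], "02X"))
        else:
            # Single byte character; cp037 is an assumed stand-in code page.
            tokens.append(format(ch.encode("cp037")[0], "02X"))
    return " ".join(tokens)


# LMN<TEST>QRS with TEST written as fullwidth letters (U+FF34, U+FF25, U+FF33, U+FF34)
print(notation_to_hex("LMN<\uFF34\uFF25\uFF33\uFF34>QRS"))
# D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2
```

Real KanjiEBCDIC conversion covers far more than fullwidth Latin letters, so this sketch is useful only for checking the notation against short examples like the one shown.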
Pad Characters
Server Character Set | Pad Character Name | Pad Character Value |
---|---|---|
LATIN | SPACE | 0x20 |
UNICODE | SPACE | U+0020 |
GRAPHIC | IDEOGRAPHIC SPACE | U+3000 |
KANJISJIS | ASCII SPACE | 0x20 |
KANJI1 | ASCII SPACE | 0x20 |
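The table above can be expressed as a small lookup, sketched here in Python. The sketch assumes that pad characters are appended on the right to fill a value to a fixed length counted in characters, which this document does not state; the names `PAD_CHARACTERS` and `pad_value` are hypothetical.

```python
# Pad character for each server character set, taken from the table above.
# The byte values 0x20 are written as the equivalent Python character.
PAD_CHARACTERS = {
    "LATIN": "\u0020",      # SPACE
    "UNICODE": "\u0020",    # SPACE
    "GRAPHIC": "\u3000",    # IDEOGRAPHIC SPACE
    "KANJISJIS": "\u0020",  # ASCII SPACE
    "KANJI1": "\u0020",     # ASCII SPACE
}


def pad_value(value: str, length: int, server_charset: str) -> str:
    """Right-pad value to length characters (assumed behavior, see lead-in)."""
    if len(value) > length:
        raise ValueError("value is longer than the target length")
    pad = PAD_CHARACTERS[server_charset]
    return value + pad * (length - len(value))


print(repr(pad_value("AB", 4, "LATIN")))              # 'AB  '
print(repr(pad_value("\uFF34\uFF25", 4, "GRAPHIC")))  # two fullwidth letters, then two U+3000
```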