15.10 - KANJI1 Server Character Set - Teradata Database

Teradata Database SQL Data Types and Literals

Product: Teradata Database
Release: 15.10
Category: Programming Reference
Publication number: B035-1143-151K

Purpose: Supports Japanese applications that must remain compatible with previous Teradata Database Kanji releases.

Notice:

KANJI1 support is deprecated. KANJI1 is not allowed as a default character set; if KANJI1 is specified as the default, the system changes it to UNICODE. Creation of new KANJI1 objects is highly restricted. Although many KANJI1 queries and applications may continue to operate, sites using KANJI1 should convert to another character set as soon as possible.

Default pad character: ASCII SPACE (0x20)

Data types:

CHARACTER(n) CHARACTER SET KANJI1
VARCHAR(n) CHARACTER SET KANJI1

Maximum string size: 64000 bytes

This is also the size of LONG VARCHAR CHARACTER SET KANJI1.

KANJI1 contains mixed single-byte and multibyte characters in the KanjiEBCDIC, KanjiShift-JIS, or KanjiEUC client character set, as determined by the current character set for the session.

KANJI1 inherits the semantics and limitations of the character types in prior releases. For example, the SUBSTR function can produce invalid strings.
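The following Python sketch shows one way a byte-oriented substring of mixed single-byte and multibyte data can produce an invalid string. It runs outside Teradata and does not reproduce Teradata's SUBSTR. It assumes 1-based byte offsets and uses Python's standard `shift_jis` codec. The helpers `byte_substr` and `is_valid_shift_jis` are hypothetical names introduced for illustration.

```python
# Illustration only: slicing Shift-JIS bytes at an arbitrary byte offset.
# This models a byte-oriented substring, not Teradata's SUBSTR itself.


def byte_substr(data: bytes, start: int, length: int) -> bytes:
    """Return `length` bytes starting at 1-based byte position `start`."""
    return data[start - 1:start - 1 + length]


def is_valid_shift_jis(data: bytes) -> bool:
    """True if `data` decodes cleanly as Shift-JIS."""
    try:
        data.decode("shift_jis")
        return True
    except UnicodeDecodeError:
        return False


# Two single-byte characters followed by one two-byte character (fullwidth S).
value = "ab" + "\uff33"               # FULLWIDTH LATIN CAPITAL LETTER S
encoded = value.encode("shift_jis")   # b'ab\x82r' (0x82 0x72 is fullwidth S)

whole = byte_substr(encoded, 1, 4)    # keeps the complete two-byte character
split = byte_substr(encoded, 1, 3)    # cuts the two-byte character in half

print(is_valid_shift_jis(whole))      # True
print(is_valid_shift_jis(split))      # False
```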

Note the following peculiarities of KANJI1:

  • The single byte character portion of the type is stored in the canonical form defined by JIS X 0201.
  • The multibyte character portion of the type is stored in the client form-of-use and thus cannot be shared among heterogeneous clients.
  • The maximum length in bytes of the exported character data for the KANJI1 server character set is always n for CHARACTER(n) and VARCHAR(n).

On a system enabled with Japanese language support, Teradata Database assumes that all character data consists of mixed single-byte and multibyte characters. This applies to any column defined as CHAR, VARCHAR, or LONG VARCHAR.

The encoding of KANJI1 data is based on the KanjiEBCDIC, KanjiEUC, or KanjiShift-JIS client character set, depending on the current character set for the session.

Depending on the client character set of the session, single-byte and multibyte characters are distinguished as follows:

  • KanjiEBCDIC: Characters are assumed to be single byte until a Shift-Out character is encountered. Subsequent characters are assumed to be multibyte until a Shift-In character is encountered. If the end of the string is reached without finding a Shift-In character, an error occurs.
  • KanjiEUC: The first byte of a multibyte character always has the most significant bit on. Multibyte characters are two bytes, except KanjiEUC cs3 characters, which require three bytes.
  • KanjiShift-JIS: The first byte of a multibyte character is in the range 0x81 through 0x9F or 0xE0 through 0xFC. Multibyte characters are two bytes.
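The Python function below is a rough sketch of how a byte string could be split into single-byte and multibyte characters for each client character set. The names `char_width` and `split_chars` and the error handling are hypothetical. The sketch makes three assumptions:

- KanjiEBCDIC multibyte characters are two bytes long between Shift-Out (0x0E) and Shift-In (0x0F).
- KanjiEUC cs3 characters start with ss3 (0x8F).
- KanjiShift-JIS lead bytes fall in the ranges 0x81 through 0x9F and 0xE0 through 0xFC.

```python
SO, SI = 0x0E, 0x0F  # KanjiEBCDIC Shift-Out and Shift-In bytes
SS3 = 0x8F           # KanjiEUC single-shift 3 (introduces a cs3 character)


def char_width(b: int, charset: str, shifted: bool) -> int:
    """Byte width of the character that starts with byte `b` (assumed rules)."""
    if charset == "KanjiEBCDIC":
        return 2 if shifted else 1          # assumption: multibyte = 2 bytes
    if charset == "KanjiEUC":
        if b & 0x80:                        # most significant bit on -> multibyte
            return 3 if b == SS3 else 2     # cs3 needs three bytes
        return 1
    if charset == "KanjiShift-JIS":
        # Assumed lead-byte ranges for two-byte Shift-JIS characters.
        return 2 if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC else 1
    raise ValueError(f"unsupported client character set: {charset}")


def split_chars(data: bytes, charset: str) -> list[bytes]:
    """Split `data` into characters; SO/SI bytes are consumed, not returned."""
    chars: list[bytes] = []
    shifted = False
    i = 0
    while i < len(data):
        b = data[i]
        if charset == "KanjiEBCDIC" and b in (SO, SI):
            shifted = b == SO               # enter or leave multibyte mode
            i += 1
            continue
        width = char_width(b, charset, shifted)
        if i + width > len(data):
            raise ValueError("truncated multibyte character")
        chars.append(data[i:i + width])
        i += width
    if shifted:
        raise ValueError("end of string reached without Shift-In")
    return chars


print(split_chars(bytes.fromhex("6D A3D3 8EB1 8FB0A1"), "KanjiEUC"))
# [b'm', b'\xa3\xd3', b'\x8e\xb1', b'\x8f\xb0\xa1']
```

The KanjiEBCDIC branch raises an error when the string ends while still shifted. This mirrors the missing Shift-In error condition described above.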

If the CASESPECIFIC option is defined for the character column, conversion to uppercase is not performed, and simple Latin letters (A through Z, a through z) are considered to match only if they are the same letters in the same case.

For more information about the KANJI1 server character set, see International Character Set Support.

The following table shows the effect of uppercase conversion on various Japanese character strings. In the examples, < and > represent the KanjiEBCDIC Shift-Out and Shift-In characters, and ss2 and ss3 represent the KanjiEUC single-shift bytes 0x8E and 0x8F.

Client character set   Character string   Conversion result
KanjiEBCDIC            mN<abc>b           MN<abc>B
KanjiEUC               mna ss2B ss3c      MNa ss2B ss3c
KanjiShift-JIS         mnIabc             MNIabc

Because single-byte simple Latin letters are always stored in the same canonical form, uppercase conversion affects them the same way in all the character sets that Teradata Database supports. Multibyte characters are left unchanged.
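Here is a minimal Python sketch of this behavior for KanjiShift-JIS data. It converts only single-byte a through z and copies two-byte characters unchanged. It walks the data character by character for a specific reason: a Shift-JIS trailing byte can fall in the range 0x61 through 0x7A, so a plain `bytes.upper()` can silently change a multibyte character. The function name `upper_kanji_sjis` and the lead-byte ranges are assumptions. This is not Teradata's implementation.

```python
def upper_kanji_sjis(data: bytes) -> bytes:
    """Uppercase single-byte a-z; copy two-byte Shift-JIS characters unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:   # assumed lead-byte ranges
            out += data[i:i + 2]                     # multibyte: leave as-is
            i += 2
        else:
            # Single-byte: convert simple Latin a-z only.
            out.append(b - 0x20 if 0x61 <= b <= 0x7A else b)
            i += 1
    return bytes(out)


sample = b"mn" + "\u30c3".encode("shift_jis")  # 'mn' + KATAKANA SMALL TU (0x83 0x62)
print(upper_kanji_sjis(sample))  # b'MN\x83b'  (two-byte character intact)
print(sample.upper())            # b'MN\x83B'  (trail byte 0x62 wrongly changed)
```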

CLOB types do not support the KANJI1 server character set.

In Teradata mode, if a character expression is assigned to a shorter CHAR column, the excess bytes are truncated without warning. Because truncation is byte-based, it can leave an improper string, for example one that ends partway through a multibyte character.

In ANSI mode, an error occurs if a nonblank character would be truncated.

If a character expression is assigned to a longer CHAR column, the value is padded with single-byte spaces (0x20) in both Teradata and ANSI mode. Only truncation behavior differs between the two modes.
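The assignment rules above can be modeled as byte operations. In the following Python sketch, `assign_char` is a hypothetical helper, not a Teradata API. It pads short values with single-byte spaces (0x20). When the value is too long, it either truncates silently (Teradata mode) or raises an error when a truncated byte is not a space (ANSI mode). Treating "nonblank" as "any byte other than 0x20" is a simplifying assumption.

```python
PAD = b"\x20"  # KANJI1 default pad character: ASCII SPACE


class TruncationError(Exception):
    """Raised in the ANSI-mode model when nonblank data would be truncated."""


def assign_char(value: bytes, n: int, mode: str = "teradata") -> bytes:
    """Model assigning `value` to a CHAR(n) CHARACTER SET KANJI1 column."""
    if len(value) <= n:
        return value + PAD * (n - len(value))   # pad with single-byte spaces
    excess = value[n:]
    if mode == "ansi" and excess.strip(PAD):    # any non-space byte cut off?
        raise TruncationError(f"nonblank data truncated: {excess!r}")
    return value[:n]                            # Teradata mode: silent, byte-based


data = b"ab" + "\uff33".encode("shift_jis")     # 'ab' + fullwidth S, 4 bytes
print(assign_char(data, 6))   # b'ab\x82r  '  (padded with two spaces)
print(assign_char(data, 3))   # b'ab\x82'     (two-byte character split in half)
```

Calling `assign_char(data, 3, "ansi")` raises `TruncationError` instead of returning the improper string.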

Translation and storage of validated multibyte character data on the server depend on the character set of the current session, as follows.

KanjiEUC: Multibyte characters are translated and stored according to the client encoding of the current session:

  • Code set cs0: All characters are translated from EUC to the internal representation (based on JIS X 0201) and stored as single-byte characters.
  • Code set cs1 (character data only): Each character is translated in full to KanjiShift-JIS (based on JIS X 0208), as illustrated in the translation map in International Character Set Support. GRAPHIC data is stored without translation.
  • Code set cs2 (character data only): The first byte (ss2 = 0x8E) is translated to 0x80, and the second byte is left unmodified.
  • Code set cs3 (character data only): The first byte (ss3 = 0x8F) is translated to 0xFF, and the remaining two bytes are left unmodified. GRAPHIC data is stored without translation.

KanjiEBCDIC: Multibyte characters are stored as received. They are not translated and remain in the client encoding. The Shift-Out and Shift-In characters, which only KanjiEBCDIC uses, are stored as part of the string with their codes unchanged (0x0E and 0x0F, respectively).

KanjiShift-JIS: Multibyte characters are stored as received, in KanjiShift-JIS form, without translation.
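The KanjiEUC rules can be sketched in Python as a byte-level translation. The helper names `euc_cs1_to_sjis` and `euc_to_kanji1` are hypothetical. The sketch rests on three assumptions:

- cs1 uses the standard JIS X 0208 EUC-to-Shift-JIS arithmetic, assumed to match the translation map in International Character Set Support, which is not reproduced here.
- cs0 bytes are copied unchanged, on the assumption that the internal JIS X 0201 form uses the same byte values.
- GRAPHIC data is not handled.

```python
SS2, SS3 = 0x8E, 0x8F  # KanjiEUC single-shift bytes for cs2 and cs3


def euc_cs1_to_sjis(c1: int, c2: int) -> bytes:
    """Standard JIS X 0208 conversion of one EUC cs1 character to Shift-JIS."""
    j1, j2 = c1 - 0x80, c2 - 0x80              # EUC -> JIS row/cell (0x21-0x7E)
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2:                                  # odd JIS row
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:                                       # even JIS row
        s2 = j2 + 0x7E
    return bytes((s1, s2))


def euc_to_kanji1(data: bytes) -> bytes:
    """Translate KanjiEUC character data to the KANJI1 stored form (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                            # cs0: stored as single bytes
            out.append(b)
            i += 1
        elif b == SS2:                          # cs2: 0x8E -> 0x80, 2nd byte kept
            out += bytes((0x80, data[i + 1]))
            i += 2
        elif b == SS3:                          # cs3: 0x8F -> 0xFF, 2 bytes kept
            out += b"\xff" + data[i + 1:i + 3]
            i += 3
        elif 0xA1 <= b <= 0xFE:                 # cs1: translate to Shift-JIS
            out += euc_cs1_to_sjis(b, data[i + 1])
            i += 2
        else:
            raise ValueError(f"invalid KanjiEUC lead byte: {b:#04x}")
    return bytes(out)


stored = euc_to_kanji1("\uff33\uff34\uff32\uff29\uff2e\uff27" "12".encode("euc_jp"))
print(stored.hex(" ").upper())  # 82 72 82 73 82 71 82 68 82 6D 82 66 31 32
```

For cs1 text, the result matches Python's own `shift_jis` encoding of the same characters. The cs2 and cs3 branches apply only the byte substitutions listed above.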

Assume that a fixed-length column is to contain the following KanjiEBCDIC data, where S, T, R, I, N, and G are multibyte characters and 1 and 2 are single-byte characters:

   < S T R I N G > 12

The column must be defined as at least 16 bytes (CHAR(16) CHARACTER SET KANJI1) to hold the internal representation of the data, including the Shift-Out (<) and Shift-In (>) characters:

   0E 42E2 42E3 42D9 42C9 42D5 42C7 0F 31 32

Each of the three client representations of multibyte character data can require a different length for the same sequence of symbols. For example, the same string in KanjiShift-JIS is:

   S T R I N G 12

It requires only 14 bytes (CHAR(14) CHARACTER SET KANJISJIS). The internal representation of the KanjiShift-JIS string is:

   8272 8273 8271 8268 826D 8266 31 32
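To make the length comparison concrete, the following Python sketch builds the stored form of the KanjiEBCDIC example. It then compares that length with the KanjiShift-JIS and KanjiEUC byte counts. The function name `kanji1_from_ebcdic_text` is hypothetical. By assumption, the sketch covers only two kinds of character:

- Fullwidth A through Z, encoded as 0x42 followed by the letter's EBCDIC code, taken from Python's `cp037` codec.
- ASCII characters, stored in their canonical single-byte form.

Under this rule, fullwidth G encodes as 0x42C7, because EBCDIC 0xC7 is G (0xD7 is P).

```python
SO, SI = b"\x0e", b"\x0f"  # Shift-Out / Shift-In as stored in KANJI1


def kanji1_from_ebcdic_text(text: str) -> bytes:
    """Stored KANJI1 bytes for `text` from a KanjiEBCDIC session (sketch).

    Supports only fullwidth A-Z (multibyte) and ASCII (single-byte) characters.
    """
    out = bytearray()
    shifted = False
    for ch in text:
        if "\uff21" <= ch <= "\uff3a":              # FULLWIDTH LATIN CAPITAL A-Z
            if not shifted:
                out += SO                           # open a multibyte run
                shifted = True
            ascii_letter = chr(ord(ch) - 0xFEE0)    # fullwidth -> ASCII letter
            out += b"\x42" + ascii_letter.encode("cp037")
        elif ord(ch) < 0x80:
            if shifted:
                out += SI                           # close the multibyte run
                shifted = False
            out += ch.encode("ascii")               # canonical single-byte form
        else:
            raise NotImplementedError(f"not covered by this sketch: {ch!r}")
    if shifted:
        out += SI
    return bytes(out)


text = "\uff33\uff34\uff32\uff29\uff2e\uff27" "12"  # fullwidth STRING + '12'
ebcdic = kanji1_from_ebcdic_text(text)
print(ebcdic.hex(" ").upper())        # 0E 42 E2 42 E3 42 D9 42 C9 42 D5 42 C7 0F 31 32
print(len(ebcdic))                    # 16
print(len(text.encode("shift_jis")))  # 14
print(len(text.encode("euc_jp")))     # 14

kana = "\uff71\uff72\uff73"           # halfwidth katakana A, I, U
print(len(kana.encode("shift_jis")), len(kana.encode("euc_jp")))  # 3 6
```

Halfwidth katakana show where KanjiShift-JIS and KanjiEUC lengths diverge. Each is one byte in Shift-JIS but two bytes (ss2 plus one byte) in EUC.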