KANJI1 Server Character Set [Deprecated] - Advanced SQL Engine - Teradata Database

SQL Data Types and Literals

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.10
Published: July 2021
Language: English (United States)
Last Update: 2021-07-27
Publication ID: B035-1143
Product Category: Teradata Vantage™

Intended Use

Japanese applications that must remain compatible with previous Vantage and Teradata Database Kanji releases.

In accordance with Teradata internationalization plans, KANJI1 support is deprecated and is to be discontinued in the near future. KANJI1 is not allowed as a default character set; the system changes the KANJI1 default character set to the UNICODE character set. Creation of new KANJI1 objects is highly restricted. Although many KANJI1 queries and applications may continue to operate, sites using KANJI1 should convert to another character set as soon as possible.

Pad Character

ASCII SPACE (0x20)

SQL Declaration

CHARACTER(n) CHARACTER SET KANJI1

VARCHAR(n) CHARACTER SET KANJI1

Maximum Value for n

64000

This is also the maximum length of LONG VARCHAR CHARACTER SET KANJI1.

Usage Notes

KANJI1 contains mixed single-byte and multibyte characters in the KanjiEBCDIC, KanjiShift-JIS, or KanjiEUC client character set, as determined by the current character set for the session.

KANJI1 inherits the semantics and limitations on character types of prior releases. For example, the SUBSTR function can generate invalid strings, such as a substring that cuts through a multibyte character.
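As a rough illustration of how a byte-oriented substring can yield an invalid string, the following Python sketch assumes KanjiShift-JIS data, a SUBSTR that counts positions and lengths in bytes, and the standard Shift-JIS lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC. The helpers `byte_substr` and `is_valid_sjis` are hypothetical and are not Vantage functions.

```python
# Sketch: a byte-counting SUBSTR on KanjiShift-JIS data (hypothetical helpers).
# Assumption: SUBSTR positions and lengths are counted in bytes, not characters.


def is_sjis_lead(b: int) -> bool:
    """Return True if b is a Shift-JIS lead byte of a two-byte character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def byte_substr(data: bytes, start: int, length: int) -> bytes:
    """Mimic a 1-based, byte-counting SUBSTR(data, start, length)."""
    return data[start - 1:start - 1 + length]


def is_valid_sjis(data: bytes) -> bool:
    """Return False if a two-byte character is cut off at the end of data.

    A string that starts with a stray trail byte is not always detectable
    this way, because many trail-byte values are also valid single bytes.
    """
    i = 0
    while i < len(data):
        if is_sjis_lead(data[i]):
            if i + 1 >= len(data):
                return False  # lead byte without its trail byte
            i += 2
        else:
            i += 1
    return True


# "ＳＴ" in KanjiShift-JIS: two characters, four bytes (0x8272 0x8273).
st = bytes.fromhex("8272 8273")
print(is_valid_sjis(byte_substr(st, 1, 4)))  # whole characters
print(is_valid_sjis(byte_substr(st, 1, 3)))  # second character cut in half
```

Taking bytes 1 through 3 of the two-character string leaves a lead byte (0x82) without its trail byte, which is the kind of invalid result described above.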

Note the following peculiarities of KANJI1:
  • The single byte character portion of the type is stored in the canonical form defined by JIS X 0201.
  • The multibyte character portion of the type is stored in the client form-of-use and thus cannot be shared among heterogeneous clients.

The maximum length in bytes of exported character data for the KANJI1 server character set is always n for CHARACTER(n) and VARCHAR(n).

On a system that is enabled with Japanese language support, the database assumes that all character data consists of mixed single-byte and multibyte characters. Such mixed data is associated with any column defined as CHAR, VARCHAR, or LONG VARCHAR.

Encoding of the KANJI1 character set is based on the KanjiEBCDIC, KanjiEUC, or KanjiShift-JIS client character set, depending on the current character set for the session.

Depending on the character set of the session, single-byte and multibyte characters are distinguished as follows:

KanjiEBCDIC: Characters are assumed to be single byte until a Shift-Out character is encountered.

Subsequent characters are assumed to be multibyte characters until a Shift-In character is encountered. If the end of the string is reached without finding a Shift-In character, an error condition occurs.

KanjiEUC and KanjiShift-JIS: The first byte of a multibyte character always has the most significant bit set. Multibyte characters are two bytes long, except KanjiEUC cs3 characters, which require three bytes.
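The boundary rules above can be sketched in Python as follows. The functions `split_kanji_ebcdic` and `split_kanji_euc` are hypothetical helpers; the sketch assumes that every character between Shift-Out (0x0E) and Shift-In (0x0F) is two bytes long and that the input is otherwise well formed.

```python
# Sketch of the single-byte/multibyte boundary rules (hypothetical helpers).
SHIFT_OUT = 0x0E
SHIFT_IN = 0x0F


def split_kanji_ebcdic(data: bytes) -> list[tuple[str, bytes]]:
    """Split KanjiEBCDIC data into single-byte and multibyte runs.

    Shift-Out starts a multibyte run and Shift-In ends it; the shift bytes
    themselves are not included. Assumes two-byte multibyte characters.
    """
    runs: list[tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        if data[i] == SHIFT_OUT:
            end = data.find(bytes([SHIFT_IN]), i + 1)
            if end == -1:
                raise ValueError("end of string reached without Shift-In")
            run = data[i + 1:end]
            if len(run) % 2 != 0:
                raise ValueError("odd number of bytes in multibyte run")
            runs.append(("multibyte", run))
            i = end + 1
        else:
            end = data.find(bytes([SHIFT_OUT]), i)
            if end == -1:
                end = len(data)
            runs.append(("single", data[i:end]))
            i = end
    return runs


def split_kanji_euc(data: bytes) -> list[bytes]:
    """Split KanjiEUC data into characters.

    Most significant bit off: single-byte character. 0x8F (ss3): three-byte
    cs3 character. Any other byte with the bit on: two-byte character.
    """
    chars: list[bytes] = []
    i = 0
    while i < len(data):
        lead = data[i]
        size = 1 if lead < 0x80 else 3 if lead == 0x8F else 2
        if i + size > len(data):
            raise ValueError("truncated multibyte character")
        chars.append(data[i:i + size])
        i += size
    return chars
```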

If the CASESPECIFIC option is defined for the character column, conversion to uppercase is not performed, and simple Latin letters (A...Z, a...z) are considered to match only if they are the same letters and the same case.

For more information about the KANJI1 server character set, see Teradata Vantage™ - Advanced SQL Engine International Character Set Support, B035-1125.

Conversion to Uppercase

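The effect of the UPPERCASE function on various Japanese character strings is illustrated by the following examples. Only single-byte simple Latin letters are converted; multibyte characters are left unchanged. In the KanjiEBCDIC example, < and > represent the Shift-Out and Shift-In characters. In the KanjiEUC example, ss2 and ss3 represent the single-shift bytes 0x8E and 0x8F.

  • KanjiEBCDIC: mN<abc>b converts to MN<abc>B
  • KanjiEUC: mn a ss2B ss3c converts to MN a ss2B ss3c
  • KanjiShift-JIS: mn Iabc converts to MN Iabc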

Because simple Latin letters always have the same canonical representation, the effect of converting to uppercase is the same across all the character sets supported by Vantage.
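A minimal Python sketch of the conversion rule, assuming KanjiShift-JIS data and the standard Shift-JIS lead-byte ranges. `kanji_upper` and `kanji_equal` are hypothetical helpers, and `kanji_equal` is only a simplified model of CASESPECIFIC versus NOT CASESPECIFIC matching that ignores other comparison rules.

```python
# Sketch: UPPERCASE and case-specific matching on KanjiShift-JIS bytes.
# Assumptions: two-byte characters start with 0x81-0x9F or 0xE0-0xFC, and
# only single-byte a-z are converted (hypothetical helpers).


def is_sjis_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def kanji_upper(data: bytes) -> bytes:
    """Uppercase single-byte Latin letters; copy two-byte characters as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if is_sjis_lead(b) and i + 1 < len(data):
            out += data[i:i + 2]  # whole two-byte character, unchanged
            i += 2
        else:
            out.append(b - 0x20 if 0x61 <= b <= 0x7A else b)
            i += 1
    return bytes(out)


def kanji_equal(a: bytes, b: bytes, case_specific: bool) -> bool:
    """CASESPECIFIC: exact match. Otherwise: compare after uppercasing."""
    if case_specific:
        return a == b
    return kanji_upper(a) == kanji_upper(b)


sample = b"mn" + bytes.fromhex("8361") + b"b"
print(kanji_upper(sample))  # the two-byte character 0x83 0x61 is left intact
```

A naive `sample.upper()` would also change the trail byte 0x61 (the same value as `a`) to 0x41 and corrupt the two-byte character; scanning character by character avoids that.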

Restrictions

CLOB types do not support the KANJI1 server character set.

Padding and Truncation for CHARACTER Types

In Teradata mode, if a character expression is assigned to a shorter CHAR column, the extra bytes are truncated. Because truncation is by byte, this may produce an invalid string, for example one that ends partway through a multibyte character. No warning or error is returned.

In ANSI mode, an error occurs if a nonblank character is truncated.

If a character expression is assigned to a longer CHAR column, the value is padded with single-byte SPACE characters (0x20). Padding works the same way in Teradata and ANSI modes; only truncation differs between the two modes.
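The padding and truncation rules can be modeled with the following Python sketch, which treats values as raw bytes. `assign_to_char` is a hypothetical helper, the `mode` strings are illustrative, and treating any non-0x20 byte in the truncated part as "nonblank" is an assumption.

```python
# Sketch of CHAR(n) assignment under byte semantics (hypothetical helper).
PAD = b" "  # pad character: ASCII SPACE (0x20)


def assign_to_char(value: bytes, n: int, mode: str) -> bytes:
    """Model assigning value to a CHAR(n) CHARACTER SET KANJI1 column.

    Shorter values are padded with 0x20 in both modes. Longer values are
    truncated silently in "teradata" mode; in "ansi" mode an error is
    raised if any truncated byte is not a space.
    """
    if mode not in ("teradata", "ansi"):
        raise ValueError("mode must be 'teradata' or 'ansi'")
    if len(value) <= n:
        return value + PAD * (n - len(value))
    extra = value[n:]
    if mode == "ansi" and extra.strip(PAD):
        raise ValueError("right truncation of nonblank characters")
    return value[:n]


# Teradata mode can silently split a two-byte character:
print(assign_to_char(bytes.fromhex("8272 8273"), 3, "teradata"))
```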

Multibyte Character Data Validation and Storage

Translation and storage of validated multibyte character data on the server depend on the character set of the current session, as follows:

KanjiEUC: Multibyte characters are translated and stored according to the client encoding of the current session, as follows:
  • Code set cs0:

    All characters are translated from EUC to the internal representation (based on JIS X 0201) and stored as single-byte characters.

  • Code set cs1:

    Only character data is translated. It is converted to KanjiShift-JIS (based on JIS X 0208), as illustrated in the translation map in Teradata Vantage™ - Advanced SQL Engine International Character Set Support, B035-1125.

    GRAPHIC data is stored without translation.

    Subsequent characters are also translated to KanjiShift-JIS.

  • Code set cs2:

    Only character data is translated. The first byte (ss2 = 0x8E) is translated to 0x80, and the second byte is left unmodified.

  • Code set cs3:

    Only character data is translated. The first byte (ss3 = 0x8F) is translated to 0xFF, and the remaining two bytes are left unmodified.

    GRAPHIC data is stored without translation.

KanjiEBCDIC and KanjiShift-JIS: Multibyte characters are stored as received. They are not translated and remain in the client encoding.

For KanjiEBCDIC only, the Shift-Out and Shift-In characters are stored as part of the string, with the same encoding (0x0E and 0x0F, respectively).
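The KanjiEUC rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the Vantage implementation: cs1 characters are converted with the standard JIS X 0208 EUC-to-Shift-JIS arithmetic (the translation map in B035-1125 may differ for some code points), cs0 byte values are assumed to be kept unchanged, the input is assumed to be well-formed character data (not GRAPHIC data), and `euc_cs1_to_sjis` and `euc_to_internal` are hypothetical names.

```python
# Sketch of KanjiEUC-to-internal translation for character data.
# Assumption: cs1 uses standard EUC-JP -> Shift-JIS arithmetic.


def euc_cs1_to_sjis(e1: int, e2: int) -> bytes:
    """Convert one two-byte EUC cs1 character to Shift-JIS."""
    j1, j2 = e1 - 0x80, e2 - 0x80  # EUC bytes -> JIS X 0208 bytes (0x21-0x7E)
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 > 0x9F:
        s1 += 0x40  # rows 0x5F and up map to lead bytes 0xE0-0xEF
    if j1 % 2 == 1:
        s2 = j2 + 0x1F
        if j2 >= 0x60:
            s2 += 1  # skip 0x7F, which is not a valid trail byte
    else:
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def euc_to_internal(data: bytes) -> bytes:
    """Apply the per-code-set rules: cs0 kept, cs1 to Shift-JIS, cs2/cs3 remapped."""
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        size = 1 if lead < 0x80 else 3 if lead == 0x8F else 2
        if i + size > len(data):
            raise ValueError("truncated multibyte character")
        if size == 1:               # cs0: kept as a single byte
            out.append(lead)
        elif lead == 0x8E:          # cs2: ss2 (0x8E) -> 0x80, second byte kept
            out += bytes([0x80, data[i + 1]])
        elif lead == 0x8F:          # cs3: ss3 (0x8F) -> 0xFF, two bytes kept
            out += b"\xff" + data[i + 1:i + 3]
        else:                       # cs1: converted to KanjiShift-JIS
            out += euc_cs1_to_sjis(lead, data[i + 1])
        i += size
    return bytes(out)


print(euc_to_internal(bytes.fromhex("A3D3 A3D4 8EB1 8FB0A1 3132")).hex(" "))
# 82 72 82 73 80 b1 ff b0 a1 31 32
```

The two fullwidth letters in the usage line come out as 0x8272 and 0x8273, the same KanjiShift-JIS values used for S and T in the fixed-length example.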

Example: Fixed Length KanjiEBCDIC

Assume that a fixed-length column is to contain the following KanjiEBCDIC data:

<ＳＴＲＩＮＧ>12

The column must be defined with a length of at least 16 bytes (CHAR(16) CHARACTER SET KANJI1) to hold the internal representation of the data, including the Shift-Out (<) and Shift-In (>) characters. The internal representation is as follows:

0E 42E2 42E3 42D9 42C9 42D5 42C7 0F 31 32

Note that each of the three client representations of multibyte character data could require a different length for the same sequence of symbols.

Example: Fixed Length KanjiShift-JIS

The same string in KanjiShift-JIS is as follows:

ＳＴＲＩＮＧ12

and requires a length of only 14 bytes (CHAR(14) CHARACTER SET KANJISJIS). The internal equivalent of the KanjiShift-JIS string is as follows:

8272 8273 8271 8268 826D 8266 31 32
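The byte counts in the two fixed-length examples can be checked with a short Python sketch. It assumes that a fullwidth Latin letter in KanjiEBCDIC is 0x42 followed by the letter's single-byte EBCDIC code (taken here from Python's `cp037` codec), and that Python's `shift_jis` codec matches the KanjiShift-JIS encoding for these characters.

```python
# Sketch: byte lengths of the internal representations in the examples.
# Assumption: KanjiEBCDIC fullwidth Latin letter = 0x42 + EBCDIC letter code.
SHIFT_OUT, SHIFT_IN = b"\x0e", b"\x0f"

# Fullwidth "STRING" between Shift-Out and Shift-In, then "12" in the
# canonical single-byte (JIS X 0201) form.
dbcs = b"".join(bytes([0x42, code]) for code in "STRING".encode("cp037"))
ebcdic_internal = SHIFT_OUT + dbcs + SHIFT_IN + b"12"

# The same symbols in KanjiShift-JIS: six two-byte letters plus "12".
sjis_internal = "ＳＴＲＩＮＧ12".encode("shift_jis")

print(len(ebcdic_internal), ebcdic_internal.hex(" "))
print(len(sjis_internal), sjis_internal.hex(" "))
```

The resulting lengths, 16 and 14 bytes, are the minimum n for the CHAR(n) definitions in the two examples.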