KANJI1 is intended for Japanese applications that must remain compatible with previous Teradata Database Kanji releases.
Pad character: ASCII SPACE (0x20)
Data types:
- CHARACTER(n) CHARACTER SET KANJI1
- VARCHAR(n) CHARACTER SET KANJI1
Maximum Value for n
This is also the size of LONG VARCHAR CHARACTER SET KANJI1.
KANJI1 contains mixed single-byte and multibyte characters in the KanjiEBCDIC, KanjiShift-JIS, or KanjiEUC client character set, as determined by the current client character set of the session.
KANJI1 inherits the semantics and the character-type limitations of prior releases. For example, the SUBSTR function can produce invalid strings.
- The single-byte character portion of the type is stored in the canonical form defined by JIS X 0201.
- The multibyte character portion of the type is stored in the client form-of-use and thus cannot be shared among heterogeneous clients.
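The following Python sketch illustrates the kind of invalid string mentioned above: slicing mixed single-byte and multibyte data by bytes can stop in the middle of a multibyte character. It uses Python's `shift_jis` codec as a stand-in for KanjiShift-JIS data, and `byte_substr` is a hypothetical byte-oriented helper; it is not a model of exactly how Teradata's SUBSTR counts positions for KANJI1.

```python
def byte_substr(data: bytes, start: int, length: int) -> bytes:
    """Hypothetical byte-oriented substring: 1-based start, length in bytes."""
    return data[start - 1:start - 1 + length]


def is_valid_sjis(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as Shift-JIS."""
    try:
        data.decode("shift_jis")
    except UnicodeDecodeError:
        return False
    return True


# Fullwidth "ＳＴＲＩＮＧ" (two bytes each) followed by single-byte "12": 14 bytes.
value = "ＳＴＲＩＮＧ12".encode("shift_jis")
print(value.hex(" "))

whole = byte_substr(value, 1, 4)  # two complete multibyte characters
split = byte_substr(value, 1, 3)  # ends with a dangling lead byte 0x82
print(is_valid_sjis(whole), is_valid_sjis(split))  # True False
```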
The maximum length in bytes of the exported character data for the KANJI1 server character set is always n for CHARACTER(n) and VARCHAR(n).
On a system enabled with Japanese language support, Teradata Database assumes that all character data consists of mixed single-byte and multibyte characters. Such mixed data is associated with any column defined as CHAR, VARCHAR, or LONG VARCHAR.
Encoding of the KANJI1 character set is based on the KanjiEBCDIC, KanjiEUC, or KanjiShift-JIS client character set, depending on the current character set for the session.
Depending on the character set of the session, single byte and multibyte characters are distinguished as described by the following table.
|Client Character Set||Definition|
|KanjiEBCDIC||Characters are assumed to be single-byte until a Shift-Out character is encountered. Subsequent characters are assumed to be multibyte until a Shift-In character is encountered. If the end of the string is reached without a Shift-In, an error condition occurs.|
|KanjiEUC||The first byte of a multibyte character always has the most significant bit on. Multibyte characters are two bytes except KanjiEUC cs3 characters, which require three bytes.|
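A minimal Python sketch of the two rules in the table, working on raw bytes. Assumptions: Shift-Out and Shift-In are 0x0E and 0x0F (the values used elsewhere in this section), each KanjiEBCDIC multibyte character is two bytes, and a KanjiEUC character that begins with SS3 (0x8F) is the three-byte code set 3 case. The function names are hypothetical.

```python
SHIFT_OUT, SHIFT_IN = 0x0E, 0x0F
SS3 = 0x8F  # EUC-JP single shift 3: introduces a three-byte code set 3 character


def kanji_ebcdic_chars(data: bytes) -> list[bytes]:
    """Split KanjiEBCDIC data into characters using Shift-Out/Shift-In.

    Assumption: characters between Shift-Out and Shift-In are two bytes each.
    The Shift-Out/Shift-In bytes themselves are not returned as characters.
    """
    chars, i, in_multibyte = [], 0, False
    while i < len(data):
        b = data[i]
        if b == SHIFT_OUT:
            in_multibyte = True
            i += 1
        elif b == SHIFT_IN:
            in_multibyte = False
            i += 1
        elif in_multibyte:
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    if in_multibyte:
        raise ValueError("end of string reached without Shift-In")
    return chars


def kanji_euc_chars(data: bytes) -> list[bytes]:
    """Split KanjiEUC data into characters using the most significant bit."""
    chars, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            width = 1  # MSB off: single-byte character
        elif b == SS3:
            width = 3  # code set 3: SS3 plus two bytes
        else:
            width = 2  # other multibyte characters
        chars.append(data[i:i + width])
        i += width
    return chars


print(kanji_ebcdic_chars(bytes.fromhex("0E 42E2 42E3 0F 31 32")))
print(kanji_euc_chars(b"m\xa3\xd3\x8e\xb1\x8f\xb0\xa1"))
try:
    kanji_ebcdic_chars(bytes.fromhex("0E 42E2"))
except ValueError as exc:
    print("error:", exc)
```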
If a character column is defined with the CASESPECIFIC option, no conversion to uppercase is performed, and simple Latin letters (A-Z, a-z) match only if they are the same letter in the same case.
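To make the CASESPECIFIC rule concrete, here is a Python sketch under two assumptions: without CASESPECIFIC, values are compared after converting simple Latin letters to uppercase, and (as in the conversion table under "Conversion to Uppercase") multibyte characters are not changed by that conversion. `kanji1_equal` and `upper_simple_latin` are hypothetical names; trailing-pad handling and collation are ignored.

```python
def upper_simple_latin(s: str) -> str:
    """Uppercase only single-byte simple Latin letters (a-z); leave all else unchanged."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def kanji1_equal(a: str, b: str, casespecific: bool) -> bool:
    """Hypothetical model of equality on a KANJI1 column."""
    if casespecific:
        # CASESPECIFIC: no uppercase conversion, so case must match exactly.
        return a == b
    # Not case specific (assumed): compare after uppercasing simple Latin letters.
    return upper_simple_latin(a) == upper_simple_latin(b)


print(kanji1_equal("abc", "ABC", casespecific=False))  # True
print(kanji1_equal("abc", "ABC", casespecific=True))   # False
```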
For more information about the KANJI1 server character set, see Teradata Vantage™ NewSQL Engine International Character Set Support, B035-1125.
Conversion to Uppercase
The effect of the UPPERCASE function on various Japanese characters is illustrated in the following table.
|Client Character Set||Character String||Conversion Result|
|KanjiEUC||mn a ss2 B ss3 c||MN a ss2 B ss3 c|
|KanjiShift-JIS||mn I abc||MN I abc|
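In both examples, only the single-byte simple Latin letters (m and n) are converted to uppercase; the multibyte characters, including those introduced by the KanjiEUC single-shift characters ss2 and ss3, are unchanged. Because simple Latin letters always have the same canonical representation, the effect of converting to uppercase is the same across all the character sets supported by Teradata Database.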
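The following Python sketch mirrors the table above: only the single-byte simple Latin letters are converted, and because those letters have the same single-byte encoding in the Shift-JIS and EUC-JP codecs Python provides, the converted letters come out as the same bytes in both. `upper_simple_latin` is a hypothetical helper, and the fullwidth letters stand in for multibyte characters. Python's own `str.upper` would also convert the fullwidth letters, which is why the sketch does not use it.

```python
def upper_simple_latin(s: str) -> str:
    # Convert only a-z; multibyte (for example, fullwidth) letters are left as they are.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


source = "mnａｂｃ"          # single-byte "mn" followed by fullwidth "ａｂｃ"
result = upper_simple_latin(source)
print(result)               # MNａｂｃ
print(source.upper())       # MNＡＢＣ (Python converts fullwidth letters too)

for codec in ("shift_jis", "euc_jp"):
    encoded = result.encode(codec)
    print(codec, encoded[:2])  # b'MN' in both encodings
```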
CLOB types do not support the KANJI1 server character set.
Padding and Truncation for CHARACTER Types
In Teradata mode, if a character expression is assigned to a shorter CHAR column, the extra bytes are truncated without warning. Because truncation is by bytes, the result may be an invalid string, for example one that ends in the middle of a multibyte character.
In ANSI mode, an error occurs if a nonblank character is truncated.
If a character expression is assigned to a longer CHAR column, the value is padded on the right with single-byte SPACE characters (0x20) in both Teradata and ANSI modes. Only truncation behavior differs between the two modes.
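The padding and truncation rules above can be sketched in Python as follows. `assign_char` and the mode strings "TERA" and "ANSI" are hypothetical; the sketch treats values as raw bytes and uses Python's `shift_jis` codec only to build sample data. It also shows how Teradata-mode truncation can leave a dangling lead byte.

```python
def assign_char(value: bytes, n: int, mode: str) -> bytes:
    """Hypothetical model of assigning a byte string to a CHAR(n) KANJI1 column.

    mode is "TERA" (Teradata mode) or "ANSI".
    """
    if mode not in ("TERA", "ANSI"):
        raise ValueError(f"unknown mode: {mode}")
    if len(value) <= n:
        # Shorter values are padded with single-byte SPACE (0x20) in both modes.
        return value + b" " * (n - len(value))
    if mode == "ANSI" and value[n:].strip(b" "):
        # ANSI mode: truncating a nonblank character is an error.
        raise ValueError("nonblank character would be truncated")
    # Teradata mode (or blank-only excess in ANSI mode): silent byte truncation.
    return value[:n]


data = "ＳＴＲＩＮＧ12".encode("shift_jis")  # 14 bytes
print(assign_char(b"12", 4, "TERA"))           # b'12  '
print(assign_char(data, 5, "TERA"))            # ends with a lone lead byte 0x82
```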
Multibyte Character Data Validation and Storage
Translation and storage of validated multibyte character data on the server depends on the character set of the current session, as explained in the following table.
|For this client character set …||Multibyte characters are translated and stored on the server …|
|KanjiEUC||according to the client encoding of the current session; that is, they are not translated and remain in the client encoding.|
For KanjiEBCDIC only, the Shift-Out and Shift-In characters are also stored as part of the string, as 0x0E and 0x0F respectively.
Example: Fixed Length KanjiEBCDIC
Assume that a fixed-length column is to contain the following KanjiEBCDIC data:
< S T R I N G > 12
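The column must be defined as at least CHAR(16) CHARACTER SET KANJI1 to hold the internal representation: 1 byte for the Shift-Out (<), 2 bytes for each of the six multibyte characters, 1 byte for the Shift-In (>), and 1 byte for each of the two single-byte digits (1 + 12 + 1 + 2 = 16). The internal representation is as follows: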
0E 42E2 42E3 42D9 42C9 42D5 42C7 0F 31 32
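As a cross-check of the byte count, the following Python sketch assembles this internal representation. It assumes that each multibyte (fullwidth) Latin letter is 0x42 followed by the single-byte EBCDIC code of that letter, consistent with 42E2 for S and 42E3 for T, and it borrows Python's `cp037` codec for those EBCDIC letter codes. The single-byte digits are written in JIS X 0201 form, which matches ASCII for digits. The helper name is hypothetical.

```python
SHIFT_OUT, SHIFT_IN = b"\x0e", b"\x0f"


def kanji1_internal_ebcdic(multibyte_letters: str, single_byte: str) -> bytes:
    """Hypothetical builder for a KanjiEBCDIC value stored in a KANJI1 column."""
    # Assumption: fullwidth letter = 0x42 + EBCDIC code of the letter (cp037 codes).
    dbcs = b"".join(b"\x42" + ch.encode("cp037") for ch in multibyte_letters)
    # Single-byte portion in JIS X 0201 canonical form (same as ASCII for digits).
    return SHIFT_OUT + dbcs + SHIFT_IN + single_byte.encode("ascii")


stored = kanji1_internal_ebcdic("STRING", "12")
print(stored.hex(" ").upper(), len(stored))
```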
Note that each of the three client representations of multibyte character data could require a different length for the same sequence of symbols.
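To illustrate this point, the Python sketch below estimates the length of a string in each client representation. The KanjiEUC and KanjiShift-JIS lengths come from Python's `euc_jp` and `shift_jis` codecs. The KanjiEBCDIC length is computed from a stated assumption (a character is double-byte in KanjiEBCDIC exactly when it is double-byte in Shift-JIS, and each run of double-byte characters is wrapped in one Shift-Out/Shift-In pair), because the Python standard library has no Japanese EBCDIC codec. The function names are hypothetical.

```python
def kanji_ebcdic_length(text: str) -> int:
    """Estimated KanjiEBCDIC byte length under the assumption stated above."""
    length, in_multibyte = 0, False
    for ch in text:
        multibyte = len(ch.encode("shift_jis")) == 2
        if multibyte and not in_multibyte:
            length += 1  # Shift-Out before a run of multibyte characters
        elif not multibyte and in_multibyte:
            length += 1  # Shift-In after the run
        in_multibyte = multibyte
        length += 2 if multibyte else 1
    if in_multibyte:
        length += 1      # closing Shift-In at the end of the string
    return length


def client_lengths(text: str) -> dict[str, int]:
    return {
        "KanjiEBCDIC": kanji_ebcdic_length(text),
        "KanjiEUC": len(text.encode("euc_jp")),
        "KanjiShift-JIS": len(text.encode("shift_jis")),
    }


print(client_lengths("ＳＴＲＩＮＧ12"))  # {'KanjiEBCDIC': 16, 'KanjiEUC': 14, 'KanjiShift-JIS': 14}
print(client_lengths("ｱｲ12"))           # {'KanjiEBCDIC': 4, 'KanjiEUC': 6, 'KanjiShift-JIS': 4}
```

The half-width katakana case shows that even KanjiEUC and KanjiShift-JIS can differ: each half-width katakana character is one byte in Shift-JIS but two bytes (SS2 plus one byte) in EUC-JP.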
Example: Fixed Length KanjiShift-JIS
The same string in KanjiShift-JIS is as follows:
S T R I N G 12
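It requires a length of only 14 bytes (CHAR(14) CHARACTER SET KANJISJIS): 2 bytes for each of the six multibyte characters plus 1 byte for each of the two single-byte digits. The internal equivalent for the KanjiShift-JIS string is as follows: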
8272 8273 8271 8268 826D 8266 31 32
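The Shift-JIS bytes above can be reproduced with Python's standard `shift_jis` codec, which is a quick way to check such worked examples. The sketch assumes the multibyte characters shown as S T R I N G are the fullwidth letters ＳＴＲＩＮＧ.

```python
# Fullwidth ＳＴＲＩＮＧ (multibyte) followed by single-byte digits 12.
encoded = "ＳＴＲＩＮＧ12".encode("shift_jis")

# Group bytes as in the text: two per multibyte character, one per digit.
groups = [encoded[i:i + 2].hex().upper() for i in range(0, 12, 2)]
groups += [f"{b:02X}" for b in encoded[12:]]
print(" ".join(groups))  # 8272 8273 8271 8268 826D 8266 31 32
print(len(encoded))      # 14
```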