Japanese Encoding Schemes - Teradata Database

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
lifecycle: previous
Product Category: Teradata® Database

Because writing Japanese requires thousands of characters, they cannot all be represented in a single byte. For this reason, Japanese character sets use one of the following:

  • A multibyte mapping standard.
  • A combination of a multibyte standard, which handles most of the large number of required characters, and a single-byte standard, which efficiently encodes a smaller number of frequently used characters.

The character sets supported by Teradata Database Japanese character support use several mapping standards:

     

| Standard | Description |
|---|---|
| JIS X 0201 | A single-byte standard similar to the ISO 8859 family of standards, except for a few changes in the ASCII region (for example, 0x5C is the yen sign rather than the backslash). The range 0xA1-0xDF is used mainly for Hankaku (half-width) Katakana. |
| JIS X 0208 | A double-byte standard that includes the more common Kanji characters along with many uncommon ones. It also includes Hiragana, Katakana, and Zenkaku (full-width) Romaji characters, as well as Greek, Cyrillic, and various other characters. |
| JIS X 0212 | A double-byte standard designed to include many of the rarer Kanji characters. |
| IBM Code Page 300 | A double-byte standard similar in content to JIS X 0208, but designed for EBCDIC platforms. |
| IBM-provided single-byte standards for Japanese | Based on EBCDIC, but also include Hankaku Katakana characters. These mapping standards are covered in more detail in the descriptions of the individual supported character sets. |
| UTF-8 | An encoding of Unicode optimized for backward compatibility with ASCII. In Teradata UTF8, a character consists of one to three bytes. |
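
As a rough illustration of how a single-byte standard and a double-byte standard can be combined in one character set, the following Python sketch measures the encoded width of individual characters. It assumes Python's built-in `shift_jis` codec as a stand-in for a mixed-width Japanese encoding. That codec name belongs to Python, not to Teradata, and the helper `byte_widths` is hypothetical.

```python
# Illustrative sketch only. Python's "shift_jis" codec is used as an example
# of a mixed single-byte/double-byte Japanese encoding; it is not a Teradata
# client character set name.

def byte_widths(text, encoding="shift_jis"):
    """Return (character, encoded byte count) pairs for each character."""
    return [(ch, len(ch.encode(encoding))) for ch in text]

# "A" (ASCII), U+FF71 (Hankaku Katakana A), U+6F22 (a Kanji), U+3042 (Hiragana A)
sample = "A\uff71\u6f22\u3042"
widths = byte_widths(sample)

for ch, width in widths:
    print(f"U+{ord(ch):04X} -> {width} byte(s)")

# The Hankaku Katakana character encodes to one byte in the 0xA1-0xDF range
# that JIS X 0201 reserves mainly for Hankaku Katakana.
hankaku = "\uff71".encode("shift_jis")
```

In this sketch the ASCII letter and the Hankaku Katakana character each take one byte, while the Kanji and Hiragana characters take two, which is the space saving the single-byte standard provides for frequently used characters.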

For more information on Japanese encodings and mapping standards, see Appendix B: “Japanese Encodings and Mapping Standards.”
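
The one-to-three-byte range given for UTF-8 above can be checked with a short Python sketch. In standard UTF-8, characters up to U+FFFF (the Basic Multilingual Plane) need at most three bytes, and characters beyond that need four. The helper names `utf8_width` and `fits_three_byte_limit` are hypothetical, and the check is only an approximation of the stated limit, not a description of how Teradata validates data.

```python
# Illustrative sketch only: measures standard UTF-8 widths using Python's
# built-in codec. It does not reproduce Teradata's own validation rules.

def utf8_width(ch):
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def fits_three_byte_limit(text):
    """Return True if every character in text needs at most three UTF-8 bytes."""
    return all(utf8_width(ch) <= 3 for ch in text)

# ASCII letter, Latin small e with acute, a Kanji, and a supplementary-plane ideograph
examples = ["A", "\u00e9", "\u6f22", "\U00020000"]
example_widths = [utf8_width(ch) for ch in examples]

for ch, width in zip(examples, example_widths):
    print(f"U+{ord(ch):04X} -> {width} byte(s)")
```

A check like `fits_three_byte_limit` could be run on client data before loading to flag characters that would need four bytes.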