15.10 - Character Shorthand Notation Used In This Book - Teradata Database

Teradata Database SQL Data Types and Literals

prodname: Teradata Database
vrm_release: 15.10
category: Programming Reference
featnum: B035-1143-151K

This book uses the Unicode naming convention for characters. For example, the lowercase character ‘a’ is more formally specified as either LATIN SMALL LETTER A or U+0061. The U+xxxx notation refers to a particular code point in the Unicode standard, where xxxx stands for the hexadecimal representation of the 16-bit value assigned to that character in the standard.
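As a quick check of this convention, Python's standard `unicodedata` module returns the formal Unicode name of a character, and `ord()` returns its code point, which can be formatted in the U+xxxx style. The helper name `describe` is hypothetical and used only for illustration. A fullwidth letter and the ideographic space are included because both appear in the notation that follows.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a character's code point in U+xxxx form plus its formal Unicode name."""
    # :04X formats the code point as at least four uppercase hex digits.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ["a", "A", "\uff21", "\u3000"]:
    print(describe(ch))
```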

In parts of the book, it is convenient to use a symbol to represent a special character or a particular class of characters. This is particularly true in discussions of the following Japanese character encodings:

  • KanjiEBCDIC
  • KanjiEUC
  • KanjiShift-JIS

The symbols, along with the character sets with which they are used, are defined in the following table.

Symbol                Encoding      Meaning
--------------------  ------------  ----------------------------------------------------
a–z, A–Z, 0–9         Any           Any single-byte Latin letter or digit.
ａ–ｚ, Ａ–Ｚ, ０–９   Any           Any fullwidth Latin letter or digit.
<                     KanjiEBCDIC   Shift Out [SO] (0x0E). Indicates the transition from
                                    single-byte to multibyte characters in KanjiEBCDIC.
>                     KanjiEBCDIC   Shift In [SI] (0x0F). Indicates the transition from
                                    multibyte to single-byte characters in KanjiEBCDIC.
T                     Any           Any multibyte character. The encoding depends on the
                                    current character set. For KanjiEUC, code set 3
                                    characters are always preceded by ss3.
I                     Any           Any single-byte Hankaku Katakana character. In
                                    KanjiEUC, it must be preceded by ss2, forming an
                                    individual multibyte character.
Δ (fullwidth)         Any           Represents the graphic pad character.
Δ                     Any           Represents a single-byte or multibyte pad character,
                                    depending on context.
ss2                   KanjiEUC      Represents the EUC code set 2 introducer (0x8E).
ss3                   KanjiEUC      Represents the EUC code set 3 introducer (0x8F).
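The difference between single-byte and multibyte symbol classes can be seen by encoding one sample character from each class. This sketch assumes that Python's built-in `euc_jp` and `shift_jis` codecs are reasonable stand-ins for KanjiEUC and KanjiShift-JIS; Teradata's own translation tables may differ in details, and only the ss2 introducer is exercised here. The names `SAMPLES` and `hex_bytes` are hypothetical.

```python
SS2 = 0x8E  # EUC code set 2 introducer ("ss2" in the table)

# One sample character per symbol class (labels are illustrative).
SAMPLES = {
    "single-byte Latin letter": "A",
    "fullwidth Latin letter": "\uff21",  # FULLWIDTH LATIN CAPITAL LETTER A
    "Hankaku Katakana (I)": "\uff71",    # HALFWIDTH KATAKANA LETTER A
}


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. '8E B1'."""
    return " ".join(f"{b:02X}" for b in data)


for label, ch in SAMPLES.items():
    euc = ch.encode("euc_jp")      # assumed stand-in for KanjiEUC
    sjis = ch.encode("shift_jis")  # assumed stand-in for KanjiShift-JIS
    starts_with_ss2 = euc[0] == SS2
    print(f"{label}: EUC {hex_bytes(euc)} (ss2: {starts_with_ss2}), "
          f"Shift-JIS {hex_bytes(sjis)}")
```

The Hankaku Katakana character is a single byte in Shift-JIS but becomes a two-byte sequence in EUC because ss2 precedes it, which is what the I row describes.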

In this notation, the string “TEST”, where each letter is intended to be a fullwidth character, is written as ＴＥＳＴ. Occasionally, when the encoding is important, a hexadecimal representation is used.

For example, the following mixed single-byte/multibyte character data in the KanjiEBCDIC character set

    LMN<ＴＥＳＴ>QRS

is represented as:

    D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2
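The byte sequence above can be reproduced with a short sketch. Python's standard codecs do not include a Japanese EBCDIC code page, so this sketch rests on two assumptions: it uses the `cp037` (EBCDIC US/Canada) codec as a stand-in for the single-byte part, limited to uppercase Latin letters and digits, and it encodes each fullwidth Latin letter or digit as 0x42 followed by the single-byte EBCDIC code of its halfwidth counterpart, as the bytes in the example suggest. The function name `to_kanji_ebcdic_tokens` is hypothetical.

```python
import string

SO = 0x0E  # Shift Out: start of a multibyte run ('<' in the shorthand)
SI = 0x0F  # Shift In: end of a multibyte run ('>' in the shorthand)
# Assumption: only characters whose cp037 codes match the example are handled.
SUPPORTED = set(string.ascii_uppercase + string.digits)


def to_kanji_ebcdic_tokens(text: str) -> list[str]:
    """Return hex tokens, one per single-byte or double-byte unit (hypothetical helper)."""
    tokens: list[str] = []
    in_multibyte = False
    for ch in text:
        fullwidth = 0xFF01 <= ord(ch) <= 0xFF5E
        # Fullwidth ASCII variants sit at a fixed offset from their halfwidth forms.
        base = chr(ord(ch) - 0xFEE0) if fullwidth else ch
        if base not in SUPPORTED:
            raise ValueError(f"unsupported character in this sketch: {ch!r}")
        sb = base.encode("cp037")[0]  # single-byte EBCDIC code (cp037 as a stand-in)
        if fullwidth:
            if not in_multibyte:
                tokens.append(f"{SO:02X}")  # emit SO once when a multibyte run begins
                in_multibyte = True
            tokens.append(f"42{sb:02X}")  # assumed 0x42 row for fullwidth Latin/digits
        else:
            if in_multibyte:
                tokens.append(f"{SI:02X}")  # emit SI once when the run ends
                in_multibyte = False
            tokens.append(f"{sb:02X}")
    if in_multibyte:
        tokens.append(f"{SI:02X}")  # close a multibyte run at the end of the string
    return tokens


print(" ".join(to_kanji_ebcdic_tokens("LMNＴＥＳＴQRS")))
```

SO and SI are emitted only at the boundaries of a multibyte run, which is what the < and > symbols mark in the shorthand; a real converter would need the full KanjiEBCDIC mapping tables rather than this narrow rule.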

The following table lists the pad character for each server character set.

Server Character Set  Pad Character Name  Pad Character Value
--------------------  ------------------  -------------------
LATIN                 SPACE               0x20
UNICODE               SPACE               U+0020
GRAPHIC               IDEOGRAPHIC SPACE   U+3000
KANJISJIS             ASCII SPACE         0x20
KANJI1                ASCII SPACE         0x20
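To show how the pad character depends on the server character set, the sketch below right-pads a value to a declared length using the pad character from the table. The helper `pad_fixed` and the `PAD_CHARACTERS` mapping are hypothetical; the sketch assumes padding is appended on the right and counts length in characters, a simplification that ignores how each server character set actually measures lengths and stores values.

```python
# Pad characters from the table, keyed by server character set.
PAD_CHARACTERS = {
    "LATIN": "\u0020",      # SPACE (0x20)
    "UNICODE": "\u0020",    # SPACE (U+0020)
    "GRAPHIC": "\u3000",    # IDEOGRAPHIC SPACE (U+3000)
    "KANJISJIS": "\u0020",  # ASCII SPACE (0x20)
    "KANJI1": "\u0020",     # ASCII SPACE (0x20)
}


def pad_fixed(value: str, length: int, server_charset: str) -> str:
    """Right-pad value to `length` characters with the pad character of server_charset.

    Simplification: length is counted in characters, not bytes.
    """
    if len(value) > length:
        raise ValueError("value is longer than the declared length")
    pad = PAD_CHARACTERS[server_charset]
    return value + pad * (length - len(value))


print(repr(pad_fixed("AB", 5, "LATIN")))
print(repr(pad_fixed("\uff21", 3, "GRAPHIC")))
```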