Character Shorthand Notation Used in This Document

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Locale: en-US
Last edition: 2024-12-11

This document uses the Unicode naming convention for characters. For example, the lowercase character ‘a’ is more formally specified as either LATIN SMALL LETTER A or U+0061. The U+xxxx notation refers to a code point in the Unicode standard, where xxxx stands for the hexadecimal representation of the 16-bit value defined in the standard.
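As a minimal sketch of the naming convention, the following Python snippet prints the U+xxxx code point and the formal Unicode name of a character using the standard `unicodedata` module. The helper name `describe` is hypothetical and is used only for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the 'U+XXXX NAME' form of a single character."""
    # ord() gives the code point; :04X formats it as at least four hex digits.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("a"))       # U+0061 LATIN SMALL LETTER A
print(describe("A"))       # U+0041 LATIN CAPITAL LETTER A
print(describe("\u3000"))  # U+3000 IDEOGRAPHIC SPACE
```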

In parts of the document, it is convenient to use a symbol to represent a special character or a class of characters. This is particularly true in discussions of the following Japanese character encodings:
  • KanjiEBCDIC
  • KanjiEUC
  • KanjiShift-JIS

These encodings are further defined in Japanese Encodings and Mapping Standards.

Character Symbols

The following table defines the symbols, along with the character sets with which they are used.
Symbol          Encoding      Meaning
a-z, A-Z, 0-9   Any           Any single byte Latin letter or digit.
a-z, A-Z, 0-9   Any           Any fullwidth Latin letter or digit.
<               KanjiEBCDIC   Shift Out [SO] (0x0E). Indicates transition from single to multibyte character in KanjiEBCDIC.
>               KanjiEBCDIC   Shift In [SI] (0x0F). Indicates transition from multibyte to single byte KanjiEBCDIC.
T               Any           Any multibyte character. The encoding depends on the current character set. For KanjiEUC, code set 3 characters are preceded by ss3.
I               Any           Any single byte Hankaku Katakana character. In KanjiEUC, it must be preceded by ss2, forming an individual multibyte character.
Δ               Any           Represents the graphic pad character.
Δ               Any           Represents a single or multibyte pad character, depending on context.
ss2             KanjiEUC      Represents the EUC code set 2 introducer (0x8E).
ss3             KanjiEUC      Represents the EUC code set 3 introducer (0x8F).
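
The role of the ss2 and ss3 introducers can be sketched with Python's built-in `euc_jp` codec. This is an illustration under the assumption that `euc_jp` is a reasonable stand-in for KanjiEUC; the two are not guaranteed to map every character identically. The function name `euc_shorthand` is hypothetical.

```python
SS2 = 0x8E  # EUC code set 2 introducer
SS3 = 0x8F  # EUC code set 3 introducer


def euc_shorthand(ch: str) -> str:
    """Classify one character using this document's shorthand symbols.

    Assumption: Python's 'euc_jp' codec approximates KanjiEUC.
    """
    encoded = ch.encode("euc_jp")
    if encoded[0] == SS2:
        return "ss2 I"   # Hankaku Katakana, preceded by ss2
    if encoded[0] == SS3:
        return "ss3 T"   # code set 3 character, preceded by ss3
    if len(encoded) > 1:
        return "T"       # any other multibyte character
    return "single byte"


halfwidth_a = "\uFF71"  # HALFWIDTH KATAKANA LETTER A
print(halfwidth_a.encode("euc_jp").hex(" "))  # first byte is 8e (ss2)
print(euc_shorthand(halfwidth_a))             # ss2 I
print(euc_shorthand("\uFF34"))                # T (FULLWIDTH LATIN CAPITAL LETTER T)
print(euc_shorthand("T"))                     # single byte
```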

For example, the string “TEST”, where each letter is intended to be a fullwidth character, is written as TEST. Occasionally, when the encoding is important, a hexadecimal representation is used.

For example, the following mixed single byte/multibyte character data in the KanjiEBCDIC character set:

LMN<TEST>QRS

is represented as:

D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2
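
The mapping from the shorthand string to the hexadecimal form can be reproduced with a short Python sketch. It rests on two assumptions that are inferred from the example, not taken from a KanjiEBCDIC specification: single byte letters use the EBCDIC values of Python's `cp037` codec, and each fullwidth Latin letter is encoded as 0x42 followed by the EBCDIC byte of the corresponding single byte letter. The function name `shorthand_to_hex` is hypothetical.

```python
SHIFT_OUT = 0x0E  # '<' : single byte -> multibyte
SHIFT_IN = 0x0F   # '>' : multibyte -> single byte


def shorthand_to_hex(text: str) -> str:
    """Render shorthand such as 'LMN<TEST>QRS' as KanjiEBCDIC-style hex.

    Assumptions (inferred from the example only):
      * single byte letters follow the cp037 EBCDIC code page;
      * a fullwidth Latin letter is 0x42 plus the letter's EBCDIC byte.
    """
    units = []
    multibyte = False
    for ch in text:
        if ch == "<":
            units.append(f"{SHIFT_OUT:02X}")
            multibyte = True
        elif ch == ">":
            units.append(f"{SHIFT_IN:02X}")
            multibyte = False
        else:
            ebcdic = ch.encode("cp037")[0]
            # Inside < >, emit the two-byte form as one hex unit.
            units.append(f"42{ebcdic:02X}" if multibyte else f"{ebcdic:02X}")
    return " ".join(units)


print(shorthand_to_hex("LMN<TEST>QRS"))
# D3 D4 D5 0E 42E3 42C5 42E2 42E3 0F D8 D9 E2
```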

Pad Characters

The following table lists the pad characters for the character data types.
Server Character Set   Pad Character Name   Pad Character Value
LATIN                  SPACE                0x20
UNICODE                SPACE                U+0020
GRAPHIC                IDEOGRAPHIC SPACE    U+3000
KANJISJIS              ASCII SPACE          0x20
KANJI1                 ASCII SPACE          0x20
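
To illustrate how the pad characters in the table might be applied, the sketch below pads a value to a fixed length with the pad character of its server character set. The dictionary `PAD_CHARACTERS` and the function `pad_to_length` are hypothetical names, and the length is counted in characters, which is an assumption made for simplicity and not a statement about how the database measures column length.

```python
# Pad characters from the table, expressed as Unicode code points.
PAD_CHARACTERS = {
    "LATIN": "\u0020",      # SPACE (0x20)
    "UNICODE": "\u0020",    # SPACE (U+0020)
    "GRAPHIC": "\u3000",    # IDEOGRAPHIC SPACE (U+3000)
    "KANJISJIS": "\u0020",  # ASCII SPACE (0x20)
    "KANJI1": "\u0020",     # ASCII SPACE (0x20)
}


def pad_to_length(value: str, length: int, charset: str) -> str:
    """Right-pad value to length characters (assumed unit: characters)."""
    pad = PAD_CHARACTERS[charset.upper()]
    return value.ljust(length, pad)


latin = pad_to_length("ab", 4, "LATIN")
graphic = pad_to_length("\uFF34\uFF25", 4, "GRAPHIC")  # fullwidth 'TE'
print(repr(latin))    # 'ab  '
print(len(graphic))   # 4
```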