15.10 - UNICODE Server Character Set - Teradata Database

Teradata Database International Character Set Support

Product: Teradata Database
Release: 15.00, 15.10
Category: Configuration, User Guide
Document number: B035-1125-015K

As stored by Teradata Database, Unicode is a 16-bit encoding that covers virtually all characters in all current world languages.

UNICODE is a canonical character set. Data stored as UNICODE can be shared among heterogeneous clients.

UNICODE supports all single- and multibyte client character sets.
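The idea of a canonical character set can be sketched outside the database: text held in one Unicode form is encoded for each client and decoded back without loss. This is only an illustration. Python's built-in codecs (`shift_jis`, `euc_jp`, `utf_8`, `utf_16_be`) stand in for client character sets and are not Teradata's own KanjiShift-JIS or KanjiEUC definitions; the `round_trip` helper is hypothetical.

```python
# Python codecs used as stand-ins for client character sets (an assumption
# of this sketch; Teradata's client character sets are defined separately).
CLIENT_ENCODINGS = ["shift_jis", "euc_jp", "utf_8", "utf_16_be"]


def round_trip(text: str, encoding: str) -> str:
    """Encode canonical text for a client, then decode it back."""
    return text.encode(encoding).decode(encoding)


canonical = "Teradata データベース"

for enc in CLIENT_ENCODINGS:
    client_bytes = canonical.encode(enc)
    unchanged = round_trip(canonical, enc) == canonical
    print(f"{enc:10} {len(client_bytes):2d} bytes  round-trip ok: {unchanged}")
```

The byte length differs per client encoding, while the decoded text is the same in every case.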

Teradata Database supports characters from Unicode 6.0, referred to in this chapter as UNICODE.

For a list of the supported characters, see the UNICODE Server Character Set, B035-1056-036K, available on the documentation CD and on the Web at http://www.info.teradata.com/.

Note: Teradata Database currently does not support surrogate pairs (4-byte UTF-16 sequences), which encode the supplementary characters defined in Unicode 3.1 and later.
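Because of this restriction, it can help to screen text for characters that need a surrogate pair before loading it. The following is a minimal client-side sketch, not a Teradata utility; the function names `needs_surrogates` and `utf16_width` are hypothetical.

```python
from typing import List


def needs_surrogates(text: str) -> List[str]:
    """Return the characters in text that lie above U+FFFF.

    Such characters require a surrogate pair (4 bytes) in UTF-16.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


def utf16_width(ch: str) -> int:
    """Number of bytes one character occupies in UTF-16 (no byte order mark)."""
    return len(ch.encode("utf-16-be"))


# Latin A, Hiragana A, and U+20000 (a supplementary CJK ideograph).
sample = "A\u3042\U00020000"

for ch in sample:
    print(f"U+{ord(ch):04X} -> {utf16_width(ch)} bytes in UTF-16")

print("needs surrogates:", [f"U+{ord(c):05X}" for c in needs_surrogates(sample)])
```

Characters reported by `needs_surrogates` fall outside the 16-bit range described above.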

It is often useful to divide the UNICODE character set into eight areas.

| Area | Description |
|------|-------------|
| General scripts | Latin, Greek, Cyrillic, Hebrew, Arabic, Indic, and other characters. |
| Symbols | Arrows, mathematical symbols, and punctuation. |
| CJK Phonetics and Symbols | Hiragana, Katakana, and Bopomofo. CJK stands for Chinese, Japanese, and Korean. |
| CJK Ideographs | Chinese, Japanese, and Korean ideographs. |
| Hangul Syllables | Complete set of modern Hangul. |
| Surrogates | Code points designed to extend the range of Unicode within the ISO 10646 encoding scheme. |
| Private Use Area | The Private Use Area contains characters for sharing site-defined characters from the KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS client character sets. |
| Compatibility Zone | The Compatibility Zone contains halfwidth and fullwidth variants of characters defined by Japanese standards and, among others, includes Hankaku (halfwidth) Katakana and fullwidth ASCII characters. |
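The division into areas can be sketched as a lookup from code point to area name. The text above does not give the area boundaries, so the ranges below are an assumption taken from the early Unicode allocation layout; code points in the gaps between them are reported as falling outside the sketch. The `area_of` function is hypothetical.

```python
# Area boundaries are an assumption of this sketch (early Unicode allocation
# layout); they are not stated in the documentation text.
AREAS = [
    (0x0000, 0x1FFF, "General scripts"),
    (0x2000, 0x2FFF, "Symbols"),
    (0x3000, 0x33FF, "CJK Phonetics and Symbols"),
    (0x4E00, 0x9FFF, "CJK Ideographs"),
    (0xAC00, 0xD7A3, "Hangul Syllables"),
    (0xD800, 0xDFFF, "Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF900, 0xFFFF, "Compatibility Zone"),
]


def area_of(code_point: int) -> str:
    """Return the name of the area that contains code_point."""
    for low, high, name in AREAS:
        if low <= code_point <= high:
            return name
    return "outside the eight areas in this sketch"


for ch in ["A", "\u2192", "\u3042", "\u6F22", "\uD55C", "\uE000", "\uFF71"]:
    print(f"U+{ord(ch):04X}: {area_of(ord(ch))}")
```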

The first 1880 characters of the Private Use Area are used for sharing site-defined characters from KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS client character sets.

The following table defines the Teradata Database use of the Private Use Area.

| Name | Unicode Range | KanjiShift-JIS Range | KanjiEUC Range | KanjiEBCDIC Range | Comments |
|------|---------------|----------------------|----------------|-------------------|----------|
| Gaiji-1 | U+E000 to U+E3AB | 0xF040 to 0xF4FC (1st 940 Gaiji) | 0xF5A1 to 0xFEFE (Rows 85-94 JIS X 0208) | 0x6941 to 0x6DF4 (1st 940 Gaiji) | Shared by all. (940 characters) |
| Gaiji-2 | U+E3AC to U+E757 | 0xF540 to 0xF9FC (2nd 940 Gaiji) | 0x8FF5A1 to 0x8FFEFE (Rows 85-94 JIS X 0212) | 0x6DF5 to 0x72EA (2nd 940 Gaiji) | Shared by all. (940 characters) |
| Graphic Error Character | U+F8FF | Not applicable | Not applicable | Not applicable | Associated with the VARGRAPHIC function |
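The ranges in the table can be checked with a short sketch: each Gaiji block spans 940 code points, for 1880 in total. The sketch also maps a KanjiShift-JIS Gaiji code to Unicode, assuming the mapping is linear in code order with 188 valid trail bytes per lead byte (0x40 to 0xFC, skipping 0x7F). That ordering is an assumption; the table gives only the range endpoints, which the assumed mapping reproduces. The names `pua_role` and `sjis_gaiji_to_unicode` are hypothetical.

```python
GAIJI_1 = range(0xE000, 0xE3AB + 1)
GAIJI_2 = range(0xE3AC, 0xE757 + 1)
GRAPHIC_ERROR = 0xF8FF

TRAILS_PER_LEAD = 188  # trail bytes 0x40-0x7E and 0x80-0xFC (assumed layout)


def pua_role(code_point: int) -> str:
    """Name the Teradata use of a Private Use Area code point, per the table."""
    if code_point in GAIJI_1:
        return "Gaiji-1"
    if code_point in GAIJI_2:
        return "Gaiji-2"
    if code_point == GRAPHIC_ERROR:
        return "Graphic Error Character"
    return "no use listed in the table"


def sjis_gaiji_to_unicode(code: int) -> int:
    """Map a KanjiShift-JIS Gaiji code (0xF040-0xF9FC) to U+E000-U+E757.

    Assumes a linear mapping in code order; not confirmed by the table.
    """
    lead, trail = code >> 8, code & 0xFF
    if not 0xF0 <= lead <= 0xF9 or not 0x40 <= trail <= 0xFC or trail == 0x7F:
        raise ValueError(f"not a Gaiji code: 0x{code:04X}")
    # Trail byte 0x7F is skipped, so bytes above it shift down by one.
    trail_index = trail - 0x40 if trail < 0x7F else trail - 0x41
    return 0xE000 + (lead - 0xF0) * TRAILS_PER_LEAD + trail_index


print(len(GAIJI_1), len(GAIJI_2), len(GAIJI_1) + len(GAIJI_2))
for sjis in (0xF040, 0xF4FC, 0xF540, 0xF9FC):
    cp = sjis_gaiji_to_unicode(sjis)
    print(f"0x{sjis:04X} -> U+{cp:04X} ({pua_role(cp)})")
```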

For additional characteristics of the UNICODE server character set, see SQL Data Types and Literals.