UNICODE Server Character Set - Teradata Database

International Character Set Support (B035-1125), Teradata Database 15.10. Last update: 2018-09-25.

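The Unicode standard encodes virtually all characters in all current world languages. It began as a 16-bit encoding, and each character that Teradata Database supports fits in a single 16-bit (UTF-16) code unit.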

UNICODE is a canonical character set. Data stored as UNICODE can be shared among heterogeneous clients.

UNICODE can store characters from all supported single-byte and multibyte client character sets.
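To see why a canonical character set lets heterogeneous clients share data, the sketch below encodes the same Hiragana character in two Japanese encodings and decodes both back to Unicode. It is a minimal illustration, assuming Python's standard `shift_jis` and `euc_jp` codecs as stand-ins for the Teradata KanjiShift-JIS and KanjiEUC client character sets; it does not use Teradata's own translation tables, and `to_canonical` is a hypothetical helper name.

```python
# Python's "shift_jis" and "euc_jp" codecs approximate the Teradata
# KanjiShift-JIS and KanjiEUC client character sets (an assumption).
HIRAGANA_A = "\u3042"  # Hiragana letter A


def to_canonical(raw: bytes, client_encoding: str) -> str:
    """Hypothetical helper: decode client bytes into a Unicode string."""
    return raw.decode(client_encoding)


sjis_bytes = HIRAGANA_A.encode("shift_jis")
euc_bytes = HIRAGANA_A.encode("euc_jp")

# The bytes differ per client, but both decode to the same code point.
from_sjis = to_canonical(sjis_bytes, "shift_jis")
from_euc = to_canonical(euc_bytes, "euc_jp")

print(sjis_bytes.hex(), euc_bytes.hex())                 # 82a0 a4a2
print(from_sjis == from_euc, f"U+{ord(from_sjis):04X}")  # True U+3042
```

Once both clients' data is held as the single code point U+3042, either client can read it back through its own encoding.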

Teradata Database supports characters from Unicode 6.0, referred to in this chapter as UNICODE.

For a list of the supported characters, see the UNICODE Server Character Set, B035-1056-036K, available on the documentation CD and on the Web at http://www.info.teradata.com/.

Note: Teradata Database currently does not support supplementary characters (code points above U+FFFF), which UTF-16 encodes as 4-byte surrogate pairs. Unicode first assigned such characters in Unicode 3.1.
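A client application might want to check text before loading it. The sketch below counts UTF-16 code units and flags any character outside the range described above. It is a minimal sketch under stated assumptions: `fits_single_code_unit` and `utf16_code_units` are hypothetical helper names, and rejecting lone surrogate code points (U+D800 to U+DFFF) is an assumption based on the note, not a documented Teradata validation rule.

```python
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def fits_single_code_unit(text: str) -> bool:
    """Hypothetical check: True if no character needs a surrogate pair
    and no character is itself a surrogate code point."""
    return all(
        ord(ch) <= 0xFFFF and not SURROGATE_FIRST <= ord(ch) <= SURROGATE_LAST
        for ch in text
    )


japanese = "\u65e5\u672c\u8a9e"  # three CJK ideographs, all below U+FFFF
emoji = "\U0001F600"             # a supplementary character (above U+FFFF)

print(utf16_code_units(japanese), fits_single_code_unit(japanese))  # 3 True
print(utf16_code_units(emoji), fits_single_code_unit(emoji))        # 2 False
```

The emoji needs two code units (one surrogate pair, 4 bytes), which is exactly the case the note excludes.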

It is often useful to divide the UNICODE character set into eight areas.

Area	Description
General scripts	Latin, Greek, Cyrillic, Hebrew, Arabic, Indic, and other characters.
Symbols	Arrows, mathematical symbols, and punctuation.
CJK Phonetics and Symbols	Hiragana, Katakana, and Bopomofo. CJK stands for Chinese, Japanese, and Korean.
CJK Ideographs	Chinese, Japanese, and Korean ideographs.
Hangul Syllables	Complete set of modern Hangul.
Surrogates	Code points U+D800 to U+DFFF, used in pairs in UTF-16 to represent characters beyond U+FFFF in the ISO/IEC 10646 code space.
Private Use Area	Code points reserved for site-defined characters. Teradata Database uses part of this area to share site-defined characters from the KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS client character sets.
Compatibility Zone	Halfwidth and fullwidth variants of characters defined by Japanese standards, including Hankaku (halfwidth) Katakana and fullwidth ASCII characters.
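The areas in the table can be related to concrete code points. The sketch below sorts a character into one of the areas using standard Unicode block ranges. It is an approximation: the ranges are representative Unicode blocks chosen by assumption, not Teradata's own area boundaries, and `unicode_area` is a hypothetical helper that does not try to separate General scripts from Symbols.

```python
# Representative Unicode block ranges for some of the areas (assumed;
# Teradata's exact area boundaries may differ).
AREA_RANGES = [
    # CJK Symbols and Punctuation, Hiragana, Katakana
    (0x3000, 0x30FF, "CJK Phonetics and Symbols"),
    (0x3100, 0x312F, "CJK Phonetics and Symbols"),  # Bopomofo
    (0x4E00, 0x9FFF, "CJK Ideographs"),             # CJK Unified Ideographs
    (0xAC00, 0xD7A3, "Hangul Syllables"),           # 11,172 modern syllables
    (0xD800, 0xDFFF, "Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xFF00, 0xFFEF, "Compatibility Zone"),         # Halfwidth and Fullwidth Forms only
]

FALLBACK = "General scripts or Symbols (not distinguished here)"


def unicode_area(ch: str) -> str:
    """Hypothetical helper: return the area name for a single character."""
    cp = ord(ch)
    for first, last, name in AREA_RANGES:
        if first <= cp <= last:
            return name
    return FALLBACK


# Latin A, Hiragana A, an ideograph, a Hangul syllable, the first
# Private Use Area code point, and a halfwidth Katakana letter.
for ch in ["A", "\u3042", "\u6f22", "\uac00", "\ue000", "\uff71"]:
    print(f"U+{ord(ch):04X}  {unicode_area(ch)}")
```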

The first 1880 characters of the Private Use Area (U+E000 to U+E757) are used to share site-defined characters from the KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS client character sets.

The following table shows how Teradata Database uses the Private Use Area.

Name	Unicode Range	KanjiShift-JIS Range	KanjiEUC Range	KanjiEBCDIC Range	Comments
Gaiji-1	U+E000 to U+E3AB	0xF040 to 0xF4FC (first 940 Gaiji)	0xF5A1 to 0xFEFE (rows 85-94 of JIS X 0208)	0x6941 to 0x6DF4 (first 940 Gaiji)	Shared by all three client character sets (940 characters)
Gaiji-2	U+E3AC to U+E757	0xF540 to 0xF9FC (second 940 Gaiji)	0x8FF5A1 to 0x8FFEFE (rows 85-94 of JIS X 0212)	0x6DF5 to 0x72EA (second 940 Gaiji)	Shared by all three client character sets (940 characters)
Graphic Error Character	U+F8FF	Not applicable	Not applicable	Not applicable	Associated with the VARGRAPHIC function
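The ranges in the table fix only the endpoints of each mapping. The sketch below converts KanjiEUC and KanjiShift-JIS Gaiji codes to Private Use Area code points, assuming the codes are assigned in ascending byte order with no gaps. That ordering is an assumption, not something the table states, and both function names are hypothetical. The endpoints it produces agree with the table: 10 EUC rows of 94 cells, or 5 Shift-JIS lead bytes of 188 trail bytes, give 940 characters per Gaiji block.

```python
GAIJI_1_BASE = 0xE000  # first Gaiji-1 code point (from the table)
GAIJI_2_BASE = 0xE3AC  # first Gaiji-2 code point (from the table)


def euc_gaiji_to_unicode(euc: int) -> int:
    """Hypothetical: map 0xF5A1-0xFEFE (Gaiji-1) or 0x8FF5A1-0x8FFEFE
    (Gaiji-2) to a Private Use Area code point, row by row."""
    if euc >> 16 == 0x8F:      # 3-byte JIS X 0212 form -> Gaiji-2
        base, code = GAIJI_2_BASE, euc & 0xFFFF
    elif euc <= 0xFFFF:        # 2-byte JIS X 0208 form -> Gaiji-1
        base, code = GAIJI_1_BASE, euc
    else:
        raise ValueError(f"not a KanjiEUC Gaiji code: {euc:#x}")
    row, cell = code >> 8, code & 0xFF
    if not (0xF5 <= row <= 0xFE and 0xA1 <= cell <= 0xFE):
        raise ValueError(f"not a KanjiEUC Gaiji code: {euc:#x}")
    # Rows 85-94 (bytes 0xF5-0xFE), 94 cells per row (bytes 0xA1-0xFE).
    return base + (row - 0xF5) * 94 + (cell - 0xA1)


def sjis_gaiji_to_unicode(sjis: int) -> int:
    """Hypothetical: map 0xF040-0xF9FC to a Private Use Area code point.
    Gaiji-2 (lead bytes 0xF5-0xF9) continues directly after Gaiji-1."""
    lead, trail = sjis >> 8, sjis & 0xFF
    if not (0xF0 <= lead <= 0xF9 and 0x40 <= trail <= 0xFC) or trail == 0x7F:
        raise ValueError(f"not a KanjiShift-JIS Gaiji code: {sjis:#x}")
    # Trail bytes skip 0x7F, so each lead byte covers 188 characters.
    offset = trail - 0x40 if trail < 0x7F else trail - 0x41
    return GAIJI_1_BASE + (lead - 0xF0) * 188 + offset


print(f"U+{euc_gaiji_to_unicode(0x8FFEFE):04X}")  # U+E757
print(f"U+{sjis_gaiji_to_unicode(0xF540):04X}")   # U+E3AC
```

The KanjiEBCDIC ranges could be handled the same way, but they are left out here because their per-row layout is not described in this section.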

For additional characteristics of the UNICODE server character set, see SQL Data Types and Literals.