The Unicode standard encodes virtually all characters in all current world languages. The Teradata UNICODE server character set stores each character as a single 16-bit (UTF-16) code unit.
UNICODE is a canonical character set. Data stored as UNICODE can be shared among heterogeneous clients.
UNICODE supports all single- and multibyte client character sets.
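Because UNICODE is canonical, text entered from one client character set can be converted to another by going through Unicode. The minimal Python sketch below illustrates the idea. It assumes Python's standard `shift_jis` and `euc_jp` codecs as stand-ins for Teradata's client translation; the helper names `to_canonical` and `to_client` are hypothetical and are not part of any Teradata API.

```python
# Python's standard codecs stand in here for Teradata's client-to-server translation.
CLIENT_ENCODINGS = ["shift_jis", "euc_jp", "utf-8"]


def to_canonical(raw: bytes, client_encoding: str) -> str:
    """Decode client bytes into a Unicode string (the shared, canonical form)."""
    return raw.decode(client_encoding)


def to_client(text: str, client_encoding: str) -> bytes:
    """Encode canonical Unicode text for a particular client."""
    return text.encode(client_encoding)


if __name__ == "__main__":
    text = "日本語テキスト"
    payloads = {enc: to_client(text, enc) for enc in CLIENT_ENCODINGS}
    for enc, raw in payloads.items():
        print(enc, raw.hex())
    # Different byte sequences, one shared Unicode value.
    assert len({to_canonical(raw, enc) for enc, raw in payloads.items()}) == 1
    # A Shift-JIS client's bytes reach an EUC client by way of Unicode.
    sjis_bytes = payloads["shift_jis"]
    assert to_client(to_canonical(sjis_bytes, "shift_jis"), "euc_jp") == payloads["euc_jp"]
```

The byte sequences differ for each client, but all of them decode to the same Unicode string, which is what makes a single stored value usable by heterogeneous clients.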
Teradata Database supports characters from Unicode 6.0, referred to in this chapter as UNICODE.
For a list of the supported characters, see the publication UNICODE Server Character Set (B035-1056-036K), available on the documentation CD and on the Web at http://www.info.teradata.com/.
Note: Teradata currently does not support supplementary characters (code points above U+FFFF). In UTF-16 these characters require a 4-byte surrogate pair, and the first of them were assigned in Unicode 3.1.
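A client can detect characters affected by this limitation before loading data. The Python sketch below assumes the limitation applies to every code point above U+FFFF; the function names are hypothetical helpers, not part of any Teradata tool.

```python
def needs_surrogate_pair(ch: str) -> bool:
    """True if ch is a supplementary character (above U+FFFF).

    Such characters take 4 bytes (a surrogate pair) in UTF-16.
    """
    return ord(ch) > 0xFFFF


def supplementary_chars(text: str) -> list:
    """Return the characters in text that a 16-bit-only store cannot hold."""
    return [ch for ch in text if needs_surrogate_pair(ch)]


if __name__ == "__main__":
    sample = "A\u00e9\u4e2d\U0001F600"  # Latin A, e-acute, CJK ideograph, emoji
    for ch in sample:
        units = len(ch.encode("utf-16-le")) // 2
        print(f"U+{ord(ch):04X}: {units} UTF-16 code unit(s)")
    print(supplementary_chars(sample))
```

Only the last character in the sample needs two UTF-16 code units, so it is the only one reported.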
It is often useful to divide the UNICODE character set into eight areas.
| Area | Description |
| --- | --- |
| General scripts | Latin, Greek, Cyrillic, Hebrew, Arabic, Indic, and other characters. |
| Symbols | Arrows, mathematical symbols, and punctuation. |
| CJK Phonetics and Symbols | Hiragana, Katakana, and Bopomofo. CJK stands for Chinese, Japanese, and Korean. |
| CJK Ideographs | Chinese, Japanese, and Korean ideographs. |
| Hangul Syllables | Complete set of modern Hangul. |
| Surrogates | Code points designed to extend the range of Unicode within the ISO 10646 encoding scheme. |
| Private Use Area | Code points used to share site-defined characters from the KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS client character sets. |
| Compatibility Zone | Halfwidth and fullwidth variants of characters defined by Japanese standards, including Hankaku (halfwidth) Katakana and fullwidth ASCII characters. |
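The eight areas can be expressed as code point ranges. The Python sketch below is illustrative only: the boundaries are an assumption taken from the allocation layout of the 16-bit code space in older editions of the Unicode Standard, not a table published by Teradata, and later Unicode versions assign characters in some of the gaps. The names `AREAS` and `area_of` are hypothetical.

```python
from typing import Optional

# Assumed area boundaries (inclusive), based on the older Unicode allocation
# layout of the 16-bit code space. Not a Teradata-defined table.
AREAS = [
    (0x0000, 0x1FFF, "General scripts"),
    (0x2000, 0x2FFF, "Symbols"),
    (0x3000, 0x33FF, "CJK Phonetics and Symbols"),
    (0x4E00, 0x9FFF, "CJK Ideographs"),
    (0xAC00, 0xD7A3, "Hangul Syllables"),
    (0xD800, 0xDFFF, "Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF900, 0xFFFF, "Compatibility Zone"),
]


def area_of(code_point: int) -> Optional[str]:
    """Return the area name for a 16-bit code point, or None if it falls in a gap."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError(f"U+{code_point:X} is outside the 16-bit range")
    for low, high, name in AREAS:
        if low <= code_point <= high:
            return name
    return None  # gaps such as U+3400..U+4DFF and U+A000..U+ABFF


if __name__ == "__main__":
    for cp in (0x0041, 0x2192, 0x30AB, 0x4E2D, 0xAC00, 0xE000, 0xFF76):
        print(f"U+{cp:04X}: {area_of(cp)}")
```

A linear scan is enough for eight ranges; a sorted list with `bisect` would be the natural next step if the table grew.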
The first 1880 characters of the Private Use Area (U+E000 to U+E757) are used to share site-defined characters from the KanjiEBCDIC, KanjiEUC, and KanjiShift-JIS client character sets.
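The 1880 shared code points split into two blocks of 940 Gaiji (site-defined) characters each. The short Python check below derives the block endpoints from that count; the variable names are illustrative only.

```python
PUA_START = 0xE000
GAIJI_BLOCK = 940  # site-defined characters per Gaiji block

gaiji_1 = range(PUA_START, PUA_START + GAIJI_BLOCK)
gaiji_2 = range(gaiji_1.stop, gaiji_1.stop + GAIJI_BLOCK)

print(f"Gaiji-1: U+{gaiji_1[0]:04X} to U+{gaiji_1[-1]:04X}")  # Gaiji-1: U+E000 to U+E3AB
print(f"Gaiji-2: U+{gaiji_2[0]:04X} to U+{gaiji_2[-1]:04X}")  # Gaiji-2: U+E3AC to U+E757
print(len(gaiji_1) + len(gaiji_2))  # 1880
```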
The following table shows how Teradata Database uses the Private Use Area.
| Name | Unicode | KanjiShift-JIS | KanjiEUC | KanjiEBCDIC | Comments |
| --- | --- | --- | --- | --- | --- |
| Gaiji-1 | U+E000 to U+E3AB | 0xF040 to 0xF4FC (1st 940 Gaiji) | 0xF5A1 to 0xFEFE (Rows 85-94 JIS X 0208) | 0x6941 to 0x6DF4 (1st 940 Gaiji) | Shared by all. |
| Gaiji-2 | U+E3AC to U+E757 | 0xF540 to 0xF9FC (2nd 940 Gaiji) | 0x8FF5A1 to 0x8FFEFE (Rows 85-94 JIS X 0212) | 0x6DF5 to 0x72EA (2nd 940 Gaiji) | Shared by all. |
| Graphic Error Character | U+F8FF | Not applicable | Not applicable | Not applicable | Associated with the VARGRAPHIC function |
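Each client range in the table holds exactly 940 codes, so a Gaiji code can be mapped to its Unicode code point by counting its position within the range. The Python sketch below assumes that Teradata assigns the Private Use Area code points in the same linear order as the client codes; the table gives only the range endpoints, so treat this ordering as an assumption, and the function names as hypothetical.

```python
PUA_START = 0xE000
GAIJI_TOTAL = 1880  # Gaiji-1 (940) + Gaiji-2 (940)


def _to_unicode(index: int) -> int:
    """Map a 0-based Gaiji position to its assumed Private Use Area code point."""
    if not 0 <= index < GAIJI_TOTAL:
        raise ValueError("code is outside the Gaiji ranges")
    return PUA_START + index


def sjis_gaiji_to_unicode(code: int) -> int:
    """KanjiShift-JIS 0xF040..0xF9FC -> U+E000..U+E757 (assumed linear order)."""
    lead, trail = code >> 8, code & 0xFF
    if not (0xF0 <= lead <= 0xF9 and 0x40 <= trail <= 0xFC and trail != 0x7F):
        raise ValueError(f"0x{code:04X} is not a KanjiShift-JIS Gaiji code")
    # 188 trail bytes per lead byte: 0x40..0x7E and 0x80..0xFC.
    offset = trail - 0x40 - (1 if trail > 0x7F else 0)
    return _to_unicode((lead - 0xF0) * 188 + offset)


def euc_gaiji_to_unicode(code: int) -> int:
    """KanjiEUC rows 85-94 of JIS X 0208 (2 bytes) or JIS X 0212 (0x8F prefix)."""
    base = 0
    if code > 0xFFFF:  # three-byte sequence 0x8F xx yy selects JIS X 0212
        if code >> 16 != 0x8F:
            raise ValueError(f"0x{code:06X} is not a KanjiEUC Gaiji code")
        base, code = 940, code & 0xFFFF
    lead, trail = code >> 8, code & 0xFF
    # 94 trail bytes (0xA1..0xFE) per row; rows 85-94 use lead bytes 0xF5..0xFE.
    if not (0xF5 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
        raise ValueError(f"0x{code:04X} is not a KanjiEUC Gaiji code")
    return _to_unicode(base + (lead - 0xF5) * 94 + (trail - 0xA1))


def ebcdic_gaiji_to_unicode(code: int) -> int:
    """KanjiEBCDIC 0x6941..0x72EA -> U+E000..U+E757 (assumed linear order)."""
    lead, trail = code >> 8, code & 0xFF
    if not (0x69 <= lead <= 0x72 and 0x41 <= trail <= 0xFE):
        raise ValueError(f"0x{code:04X} is not a KanjiEBCDIC Gaiji code")
    # 190 trail bytes per lead byte: 0x41..0xFE.
    return _to_unicode((lead - 0x69) * 190 + (trail - 0x41))


if __name__ == "__main__":
    print(hex(sjis_gaiji_to_unicode(0xF540)))    # 0xe3ac
    print(hex(euc_gaiji_to_unicode(0x8FFEFE)))   # 0xe757
    print(hex(ebcdic_gaiji_to_unicode(0x6DF5)))  # 0xe3ac
```

Under this assumption, every range endpoint in the table lines up: for example, KanjiEBCDIC 0x6DF4 is the 940th code (U+E3AB) and 0x6DF5 starts Gaiji-2 at U+E3AC.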
For additional characteristics of the UNICODE server character set, see SQL Data Types and Literals.