15.10 - About Extended Character Sets - Teradata Database

Teradata Database International Character Set Support (User Guide)

An extended site-defined character set defines the mapping of hexadecimal values to characters for the single-byte and multibyte components of a character set. If you use a non-Western European language such as Russian, Arabic, or Urdu, you can define and install your own single-byte character set. If you use a language such as Japanese, Korean, or Chinese and the Teradata-supplied character sets do not fully meet your site's needs, you can define and install your own multibyte character set.
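To make "single-byte and multibyte components" concrete, the sketch below decodes a byte stream in which certain lead bytes start a two-byte sequence and all other bytes stand alone. The tables `SINGLE_BYTE`, `DOUBLE_BYTE`, and `LEAD_BYTES` are hypothetical toy data with illustrative values. They are not the format of a Teradata character set definition file, which this sketch does not attempt to reproduce.

```python
# Hypothetical toy tables; NOT the Teradata definition file format.
SINGLE_BYTE = {0x41: "A", 0x42: "B", 0x31: "1"}   # single-byte component
LEAD_BYTES = range(0x81, 0xA0)                    # assumed lead-byte range
DOUBLE_BYTE = {0x8140: "\u3000", 0x824F: "\uFF10"}  # two-byte component

def decode(data: bytes) -> str:
    """Translate bytes to characters using the toy single/double-byte tables."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in LEAD_BYTES:
            # A lead byte must be followed by a trail byte.
            if i + 1 >= len(data):
                raise ValueError(f"truncated multibyte sequence at offset {i}")
            code = (b << 8) | data[i + 1]
            if code not in DOUBLE_BYTE:
                raise ValueError(f"unmapped code 0x{code:04X} at offset {i}")
            out.append(DOUBLE_BYTE[code])
            i += 2
        else:
            if b not in SINGLE_BYTE:
                raise ValueError(f"unmapped byte 0x{b:02X} at offset {i}")
            out.append(SINGLE_BYTE[b])
            i += 1
    return "".join(out)

print(repr(decode(bytes([0x41, 0x81, 0x40, 0x31]))))
```

Raising on unmapped or truncated input, rather than silently substituting a replacement character, keeps gaps in a draft mapping visible while you are still defining it.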

Extended site-defined character sets can support, with certain constraints, any subset of the Unicode repertoire.

A user who is sufficiently privileged can define the relevant client character set, mapping bytes from the client to their corresponding Unicode values.
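One way to draft the byte-to-Unicode mapping for a single-byte client character set is to derive it from an existing reference encoding and then adjust it. The sketch below assumes Python's built-in `cp1251` codec as the reference (chosen because CYRILLIC1251_2A0 appears among the shipped examples); the function name `build_single_byte_table` is hypothetical, and the resulting dictionary still has to be transcribed into whatever file format Teradata requires, which is not shown here.

```python
def build_single_byte_table(codec: str) -> dict[int, int]:
    """Map each byte 0x00-0xFF to a Unicode code point using a Python codec."""
    table = {}
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte undefined in the reference codec; leave it unmapped
        table[b] = ord(ch)
    return table

table = build_single_byte_table("cp1251")
for b in (0x41, 0xC0, 0xE0):
    print(f"0x{b:02X} -> U+{table[b]:04X}")
```

Bytes that the reference codec leaves undefined are simply omitted, so they stand out as decisions your site still has to make.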

Character data entered through extended site-defined client character sets should be stored in columns defined as Unicode. The UNICODE server character set requires two bytes of storage per character, so a CHAR(5) CHARACTER SET UNICODE column occupies 10 bytes of storage.

Given the 64000-byte limit on column size, a UNICODE column cannot exceed 32000 characters. Furthermore, the combined size of the character data and all other data in a row cannot exceed the 64000-byte limit on row size.
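The storage arithmetic above can be checked with a few lines of code. This is a minimal sketch that uses only the two figures from the text (two bytes per UNICODE character, a 64000-byte limit) plus a 4-byte INTEGER for the mixed-row example; the function names are hypothetical, and the sketch ignores any per-row overhead a real table carries.

```python
# Figures taken from the text; per-row overhead is deliberately ignored.
UNICODE_BYTES_PER_CHAR = 2
BYTE_LIMIT = 64000  # column-size and row-size limit cited above

def char_unicode_bytes(n_chars: int) -> int:
    """Storage for a CHAR(n) CHARACTER SET UNICODE column."""
    return n_chars * UNICODE_BYTES_PER_CHAR

def max_unicode_chars(limit: int = BYTE_LIMIT) -> int:
    """Largest character count whose UNICODE storage fits in `limit` bytes."""
    return limit // UNICODE_BYTES_PER_CHAR

def row_fits(column_bytes: list[int], limit: int = BYTE_LIMIT) -> bool:
    """True if the summed column sizes stay within the row-size limit."""
    return sum(column_bytes) <= limit

print(char_unicode_bytes(5))   # 10
print(max_unicode_chars())     # 32000
# CHAR(20000) UNICODE + CHAR(10000) UNICODE + a 4-byte INTEGER = 60004 bytes
print(row_fits([char_unicode_bytes(20000), char_unicode_bytes(10000), 4]))  # True
# Two CHAR(20000) UNICODE columns = 80000 bytes
print(row_fits([char_unicode_bytes(20000)] * 2))  # False
```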

A user who is sufficiently privileged can also define an appropriate collation.

If a custom collation is required and the CHARSET_COLL collation does not produce the desired result, you can modify the MULTINATIONAL collation. For more information, see “MULTINATIONAL Collation for Extended Site-Defined Character Sets.”
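Conceptually, a collation assigns each character a sort weight, and strings are compared weight by weight. The sketch below illustrates only that idea, assuming a hypothetical weight table that interleaves lowercase and uppercase letters; it does not reproduce the MULTINATIONAL collation or its definition format. Characters without an explicit weight fall back to code point order after all weighted characters.

```python
def make_sort_key(weights: dict[str, int]):
    """Build a sort key from a hypothetical character-to-weight table."""
    # Unweighted characters sort after every weighted one, by code point.
    base = max(weights.values(), default=-1) + 1

    def key(s: str) -> list[int]:
        return [weights.get(ch, base + ord(ch)) for ch in s]

    return key

weights = {"a": 0, "A": 1, "b": 2, "B": 3}
words = ["b", "A", "B", "a"]
print(sorted(words))                              # ['A', 'B', 'a', 'b']
print(sorted(words, key=make_sort_key(weights)))  # ['a', 'A', 'b', 'B']
```

The first result is plain code point order; the second shows how a custom weight table changes the ordering without changing the data.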

Only sufficiently privileged users can define extended site-defined character sets and collations. A sufficiently privileged user is one who can:

  • Edit and place files in the appropriate directories on every node in the Teradata Database.
  • Modify records in DBC.CharTranslationsV and DBC.CollationsV.
  • Restart the Teradata Database.

Teradata ships the following character sets, which you can use as examples of extended site-defined character sets:

  • SCHGB2312_1T0
  • TCHBIG5_1R0
  • HANGULKSC5601_2R4
  • KANJI932_1S0
  • LATIN1252_3A0
  • LATIN1250_1A0
  • LATIN1254_7A0
  • LATIN1258_8A0
  • SCHINESE936_6R0
  • HANGUL949_7R0
  • HEBREW1255_5A0
  • ARABIC1256_6A0
  • CYRILLIC1251_2A0
  • THAI1874_4A0
  • TCHINESE950_8R0