An extended site-defined character set defines the mapping of hexadecimal values to characters for single- and multibyte components of a character set. If you use a non-Western European language such as Russian, Arabic, or Urdu, you can define and install your own single-byte character set. If you use a non-Western European language such as Japanese, Korean, or Chinese, and the Teradata-supplied character sets are not entirely sufficient for your site, you can define and install your own multibyte character set.
Extended site-defined character sets can support, with certain constraints, any subset of the Unicode repertoire.
A user who is sufficiently privileged can define the relevant client character set, mapping bytes from the client to their corresponding Unicode values.
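As a rough illustration of what such a mapping expresses, the following Python sketch translates client bytes to Unicode characters through a lookup table. The table contents, the name `CLIENT_TO_UNICODE`, and the function `decode_client_bytes` are hypothetical and chosen only for this example. They do not represent a Teradata-supplied character set or the actual definition mechanism.

```python
# Hypothetical single-byte mapping: client byte value -> Unicode code point.
# These assignments are illustrative only, not a Teradata-supplied set.
CLIENT_TO_UNICODE = {
    0x41: 0x0041,  # LATIN CAPITAL LETTER A
    0xE0: 0x0430,  # CYRILLIC SMALL LETTER A
    0xE1: 0x0431,  # CYRILLIC SMALL LETTER BE
}


def decode_client_bytes(data: bytes) -> str:
    """Translate client bytes to a Unicode string using the table above."""
    try:
        return "".join(chr(CLIENT_TO_UNICODE[b]) for b in data)
    except KeyError as exc:
        # A byte with no table entry cannot be translated.
        raise ValueError(f"byte 0x{exc.args[0]:02X} has no mapping") from exc
```

A byte that has no entry in the table raises an error in this sketch. How a real site-defined character set treats unmapped bytes is not covered here.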
Character data entered using extended site-defined client character sets should be stored in columns defined as Unicode. The UNICODE server character set requires two bytes of storage per character, so a CHAR(5) CHARACTER SET UNICODE field occupies 10 bytes of storage.
Given the 64000-byte limit on column size, a Unicode column cannot exceed 32000 characters. Furthermore, the combined size of character data and data of other types cannot exceed the 64000-byte limit on row size.
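The storage arithmetic can be checked with a short Python sketch. The constants come from the figures stated in the text (two bytes per character, 64000 bytes per column and per row). The function names are hypothetical, and the row check assumes that row size is simply the sum of the column sizes, ignoring any per-row overhead.

```python
BYTES_PER_UNICODE_CHAR = 2  # UNICODE server character set: two bytes per character
MAX_COLUMN_BYTES = 64000    # column size limit stated in the text
MAX_ROW_BYTES = 64000       # row size limit stated in the text


def unicode_column_bytes(n_chars: int) -> int:
    """Storage needed for a Unicode column of n_chars characters."""
    return n_chars * BYTES_PER_UNICODE_CHAR


def max_unicode_chars() -> int:
    """Largest character count that fits in a single Unicode column."""
    return MAX_COLUMN_BYTES // BYTES_PER_UNICODE_CHAR


def fits_in_row(unicode_chars: int, other_bytes: int) -> bool:
    """Simplified check: Unicode data plus other column data within the row limit."""
    return unicode_column_bytes(unicode_chars) + other_bytes <= MAX_ROW_BYTES
```

For example, `unicode_column_bytes(5)` gives 10, matching the CHAR(5) case, and `max_unicode_chars()` gives 32000.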
A user who is sufficiently privileged can also define an appropriate collation.
If a custom collation is required and the CHARSET_COLL collation does not produce the desired result, you can modify the MULTINATIONAL collation. For more information, see “MULTINATIONAL Collation for Extended Site-Defined Character Sets” on page 124.
Only sufficiently privileged users can define extended site-defined character sets and collations. A sufficiently privileged user is one who can:
Teradata ships the following character sets that you can use as examples of extended site-defined character sets: