CHARSET_COLL Collation - Teradata Database

International Character Set Support

Product

Teradata Database

Release Number

15.10

Language

English (United States)

Last Update

2018-09-25

dita:id

B035-1125

lifecycle

Product Category

Teradata® Database

The CHARSET_COLL collation produces a binary ordering based on the current client character set. The NOT CASESPECIFIC version is designed to produce the results that would be obtained if the strings were converted to uppercase and then sorted in binary order on the client.

CHARSET_COLL gives you a collation that matches the client character set. For example, if the client character set is KANJIEBCDIC5035_0I, then the collation matches KANJIEBCDIC5035_0I order (rather than EBCDIC order).
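As an illustration only, the following Python sketch shows what "binary ordering in the client character set" means. It is not Teradata code. The function name client_binary_key is hypothetical, and Python's ascii and cp037 (an EBCDIC code page) codecs are assumed stand-ins for client character sets.

```python
def client_binary_key(text, client_charset, case_specific=True):
    """Sort key: the bytes the client would hold for this string.

    Hypothetical helper; a Python codec stands in for the client character set.
    """
    if not case_specific:
        # NOT CASESPECIFIC: convert to uppercase, then order in binary.
        text = text.upper()
    return text.encode(client_charset)


words = ["a", "A", "1"]

# ASCII bytes: '1' = 0x31, 'A' = 0x41, 'a' = 0x61
ascii_order = sorted(words, key=lambda w: client_binary_key(w, "ascii"))

# EBCDIC (cp037) bytes: 'a' = 0x81, 'A' = 0xC1, '1' = 0xF1
ebcdic_order = sorted(words, key=lambda w: client_binary_key(w, "cp037"))

print(ascii_order)   # ['1', 'A', 'a']
print(ebcdic_order)  # ['a', 'A', '1']
```

The same three strings sort differently because the ordering follows the byte values of whichever character set the client uses.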

Strings are compared character-by-character.

The comparison rules for CHARSET_COLL are:

If one string is shorter, it is padded with the pad character for the character set.

If the comparison is not case specific, lowercase characters are mapped to their uppercase counterparts.

If the strings are now identical, the equality relation holds. Otherwise, the first pair of characters that are not equal determines the collating sequence.

If both characters are in the repertoire of the current client character set, then the binary ordering of the two characters in the client form-of-use becomes the ordering of the two strings.

If one of the characters is not within the repertoire of the current client character set, then the error character is used as the collation point for that character.

If both characters being compared are outside the repertoire of the current client character set, then the binary ordering of the characters (case blind or case specific, as appropriate) in the Unicode form-of-use becomes the ordering of the two strings.
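The comparison rules above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the database's implementation: the function names are hypothetical, a Python codec stands in for the client character set, the pad character is assumed to be a space, and the error character is assumed to be U+001A (SUB).

```python
def _encode_or_none(ch, charset):
    """Return the client bytes for ch, or None if ch is outside the repertoire."""
    try:
        return ch.encode(charset)
    except UnicodeEncodeError:
        return None


def _fold(ch):
    """Map a character to uppercase, leaving it alone if that changes its length."""
    upper = ch.upper()
    return upper if len(upper) == 1 else ch


def charset_coll_compare(left, right, charset, case_specific=True,
                         pad=" ", error_char="\x1a"):
    """Return -1, 0, or 1. pad and error_char are assumed values."""
    # Pad the shorter string with the pad character.
    width = max(len(left), len(right))
    left, right = left.ljust(width, pad), right.ljust(width, pad)

    # If the comparison is not case specific, map lowercase to uppercase.
    if not case_specific:
        left = "".join(_fold(c) for c in left)
        right = "".join(_fold(c) for c in right)

    # The first pair of unequal characters determines the result.
    for a, b in zip(left, right):
        if a == b:
            continue
        ea, eb = _encode_or_none(a, charset), _encode_or_none(b, charset)
        if ea is None and eb is None:
            # Both outside the repertoire: Unicode binary ordering.
            ka, kb = ord(a), ord(b)
        else:
            # A character outside the repertoire collates as the error character.
            err = error_char.encode(charset)
            ka = ea if ea is not None else err
            kb = eb if eb is not None else err
        return (ka > kb) - (ka < kb)

    # No unequal pair: the strings are equal.
    return 0


print(charset_coll_compare("abc", "abc  ", "ascii"))                      # 0
print(charset_coll_compare("abc", "ABC", "ascii", case_specific=False))   # 0
print(charset_coll_compare("a", "A", "ascii"))                            # 1
print(charset_coll_compare("a", "A", "cp037"))                            # -1
```

The last two calls compare the same characters but give opposite results, because lowercase letters sort after uppercase in ASCII and before uppercase in EBCDIC.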

Kanji data

CHARSET_COLL is of limited use with the KANJI1 server character set.

KANJI1 character data can contain mixed single-byte/multibyte characters. Single-byte characters are translated into the Teradata Database form-of-use and multibyte characters are not translated.

Single-byte characters are collated based on the current character set and multibyte characters based on their internal value.

For KanjiEBCDIC and KanjiShift-JIS client character sets, the collation is like a binary sort on the client.

For a KanjiEUC client character set, the collation is like Kanji Phase I ASCII collation.

This collation differs from a binary sort on the client in that the JIS X 0208 characters collate before, rather than after, the JIS X 0212 characters.
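The difference from a client binary sort can be illustrated with a short Python sketch. It assumes the standard EUC-JP layout, in which JIS X 0212 characters carry a 0x8F lead byte and JIS X 0208 characters begin with a byte in the range 0xA1 to 0xFE. The function names are hypothetical, and the sketch models only the reordering described here, not the full Kanji Phase I ASCII collation.

```python
def split_euc_jp(data: bytes):
    """Split an EUC-JP byte string into per-character byte sequences."""
    chars, i = [], 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1          # single-byte (ASCII range)
        elif lead == 0x8E:
            n = 2          # half-width katakana
        elif lead == 0x8F:
            n = 3          # JIS X 0212
        else:
            n = 2          # JIS X 0208
        chars.append(data[i:i + n])
        i += n
    return chars


def kanjieuc_key(data: bytes):
    """Sort key that places JIS X 0212 characters after all other characters."""
    return [(1 if ch[0] == 0x8F else 0, ch) for ch in split_euc_jp(data)]


x0208 = b"\xb0\xa1"        # a JIS X 0208 character in EUC-JP
x0212 = b"\x8f\xb0\xa1"    # a JIS X 0212 character in EUC-JP

binary_order = sorted([x0208, x0212])                     # 0x8F sorts before 0xB0
collated_order = sorted([x0212, x0208], key=kanjieuc_key)

print(binary_order == [x0212, x0208])    # True
print(collated_order == [x0208, x0212])  # True
```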

You can specify CHARSET_COLL as the default user collation with the CREATE USER or MODIFY USER statements.

You can also use the SQL SET SESSION COLLATION CHARSET_COLL statement to override any user defaults.

For information on Teradata Database collating conventions, see:

“Comparison Operators” and “Comparisons for KANJI1 Characters” in SQL Functions, Operators, Expressions, and Predicates

“ORDER BY Clause” in SQL Data Manipulation Language