16.10 - Comparison Rules - Teradata Database

Teradata Database International Character Set Support

prodname: Teradata Database
vrm_release: 16.10
created_date: June 2017
category: Configuration, User Guide
featnum: B035-1125-161K

Strings are compared character-by-character.

The comparison rules for CHARSET_COLL are:

  • If one string is shorter, it is padded to the length of the longer string with the pad character for the character set.
  • If the comparison is not case specific, lowercase characters are mapped to their uppercase counterparts.
  • If the strings are now identical, the equality relation holds. Otherwise, the first pair of characters that are not equal determines the collating sequence.
  • If both characters are in the repertoire of the current client character set, then the binary ordering of the two characters in the client form-of-use becomes the ordering of the two strings.
  • If only one of the characters is outside the repertoire of the current client character set, then the error character is used as the collation point for that character.
  • If both characters being compared are outside the repertoire of the current client character set, then the binary ordering of the characters (case blind or case specific, as appropriate) in the Unicode form-of-use becomes the ordering of the two strings.
  • Kanji data
    • CHARSET_COLL is of limited use with the KANJI1 server character set.

      KANJI1 character data can contain a mix of single-byte and multibyte characters. Single-byte characters are translated into the Teradata Database form-of-use; multibyte characters are not translated.

    • Single-byte characters are collated based on the current character set and multibyte characters based on their internal value.
    • For KanjiEBCDIC and KanjiShift-JIS client character sets, the collation is like a binary sort on the client.
    • For a KanjiEUC client character set, the collation is like Kanji Phase I ASCII collation.

      The distinction between this and a binary sort on the client is that the JIS X 0208 characters collate before, rather than after, the JIS X 0212 characters.
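
The general CHARSET_COLL rules above (excluding the KANJI1 special cases) can be sketched in Python as follows. This is an illustrative model only, not Teradata code. It rests on several assumptions that do not come from this document: the client character set is represented by a Python codec name, the pad character is a space, the error character is 0x1A, and the Unicode form-of-use is modeled as UTF-16 big-endian. The function name charset_coll_compare and all of its parameters are hypothetical.

```python
def _client_bytes(ch, encoding):
    """Return ch in the client form-of-use, or None if it is outside the repertoire."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


def _fold(ch):
    """Map a lowercase character to uppercase when the mapping is one-to-one."""
    up = ch.upper()
    return up if len(up) == 1 else ch


def charset_coll_compare(a, b, encoding="latin-1", case_specific=False,
                         pad=" ", error_char="\x1a"):
    """Return -1, 0, or 1 for a < b, a == b, a > b (hypothetical sketch).

    Assumptions: space as pad character, 0x1A as error character,
    and a Python codec standing in for the client character set.
    """
    # Pad the shorter string to the length of the longer one.
    width = max(len(a), len(b))
    a, b = a.ljust(width, pad), b.ljust(width, pad)

    # For a comparison that is not case specific, map lowercase to uppercase.
    if not case_specific:
        a = "".join(_fold(c) for c in a)
        b = "".join(_fold(c) for c in b)

    # The first pair of unequal characters decides the ordering.
    for x, y in zip(a, b):
        if x == y:
            continue
        bx, by = _client_bytes(x, encoding), _client_bytes(y, encoding)
        if bx is None and by is None:
            # Both outside the client repertoire: order by Unicode form-of-use
            # (modeled here as UTF-16 big-endian bytes, an assumption).
            kx, ky = x.encode("utf-16-be"), y.encode("utf-16-be")
        else:
            # At most one is outside the repertoire: the error character
            # stands in for it as its collation point.
            err = error_char.encode(encoding)
            kx = bx if bx is not None else err
            ky = by if by is not None else err
        return (kx > ky) - (kx < ky)

    # No unequal pair was found: the strings are equal.
    return 0
```

With these assumptions, charset_coll_compare("abc", "ABC") treats the strings as equal, while the same call with case_specific=True orders "abc" after "ABC" because the client byte for "a" is greater than the byte for "A". Passing encoding="ascii" makes any accented character fall outside the repertoire, which exercises the error-character and Unicode fallback branches.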