About Collation Sequences

Teradata Database International Character Set Support

brand: Software
prodname: Teradata Database
vrm_release: 15.00, 15.10
category: Configuration, User Guide
featnum: B035-1125-015K

Collations control character ordering and comparison operations during Teradata Database sessions.

A collation is either single-level or two-level. A two-level collation orders character strings according to a two-level comparison.

Characters are first partitioned into equivalence classes whose members share the same collating value. Both the relative ordering of the classes and the ordering of characters within a class are significant.

Comparisons obey the following rules:

  • All characters in a class have the same collation value.
  • A character from class i is less than any character from class i+1.

The process is as follows:

    1 Convert characters in the strings to be compared into equivalence classes.

    2 Compare the strings.

      | IF the strings are… | THEN processing… |
      |---------------------|------------------|
      | not equal           | stops.           |
      | equal               | continues.       |

    3 Order characters within each class using criteria defined for the collation sequence, and compare.

The MULTINATIONAL Norwegian standard collation sequence is an example of a two-level collation.
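The three steps above can be sketched in a few lines of Python. This is only an illustration of the general technique: the `TOY_COLLATION` table, its class and rank values, and the `two_level_compare` function are hypothetical and are not taken from any Teradata collation definition. The sketch also assumes every character appears in the table and does not model any padding of unequal-length strings.

```python
import functools
from typing import Dict, Tuple

# Hypothetical collation table: character -> (class index, rank within class).
# The values are illustrative only, not a real Teradata collation.
TOY_COLLATION: Dict[str, Tuple[int, int]] = {
    "a": (0, 0), "A": (0, 1), "á": (0, 2),
    "b": (1, 0), "B": (1, 1),
    "c": (2, 0), "C": (2, 1),
}


def two_level_compare(left: str, right: str,
                      table: Dict[str, Tuple[int, int]] = TOY_COLLATION) -> int:
    """Return -1, 0, or 1 using a two-level comparison."""
    # Step 1: convert each character to its equivalence class.
    left_classes = [table[ch][0] for ch in left]
    right_classes = [table[ch][0] for ch in right]

    # Step 2: compare the class sequences; stop here if they differ.
    if left_classes != right_classes:
        return -1 if left_classes < right_classes else 1

    # Step 3: the strings are equal at the class level, so fall back
    # to the rank of each character within its class.
    left_ranks = [table[ch][1] for ch in left]
    right_ranks = [table[ch][1] for ch in right]
    if left_ranks == right_ranks:
        return 0
    return -1 if left_ranks < right_ranks else 1


# Usage: sort a list with the two-level comparison.
ordered = sorted(["b", "A", "a", "á"],
                 key=functools.cmp_to_key(two_level_compare))
print(ordered)  # ['a', 'A', 'á', 'b']
```

Because the class comparison runs first, "B" sorts after every member of the "a" class regardless of case or accent; the within-class rank only breaks ties between strings that are otherwise equal.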

Teradata Database offers five standard collation sequences. Data can be defined as CASESPECIFIC or NOT CASESPECIFIC, which affects how each of the five sequences collates and compares the data.

The five collations, selected either by default or by explicit use of the SET SESSION COLLATION statement, are:

  • ASCII
  • EBCDIC
  • CHARSET_COLL
  • JIS_COLL
  • MULTINATIONAL

CASESPECIFIC or NOT CASESPECIFIC can be chosen at table definition time, or specified as part of the SQL statement.

The default collation sequence is based upon the client type:

  • EBCDIC for mainframe clients
  • ASCII for all other clients

Collation sequence ordering is as follows:

  • ASCII collation orders the data essentially as if the data were converted to Teradata extended ASCII (the LATIN server character set) and the resulting byte string were then ordered in binary.
  • EBCDIC collation orders the data essentially as if the data were converted to Teradata extended EBCDIC and the resulting byte string were then ordered in binary.
  • CHARSET_COLL collation orders the data essentially as if the string were converted to the current client character set and the resulting byte string were then ordered in binary.
  • JIS_COLL collation collates in the order of the Japanese Industrial Standards.
  • MULTINATIONAL collation provides more culturally aware ordering of data.

The predefined multinational collation options are:

  • Teradata Standard Multinational (the initial default)
  • Swedish
  • Norwegian
  • Katakana_Standard
  • Kanji5026_Standard
  • Kanji5035_Standard

Further multinational collation options can be loaded using scripts. The database administrator can alter MULTINATIONAL collation.

When collation is set to MULTINATIONAL, the default sequence currently installed is used. This can be either one of the predefined sequences supplied with Teradata Database or a sequence you have defined and installed.

You can execute predefined macros to change the default to Swedish, Norwegian, or the appropriate Japanese standard collation. You can also define and install your own collation, as explained in “Defining Your Own Collation Sequence” on page 119.

If all the items being compared or collated are determined to be NOT CASESPECIFIC, the collation works as if all characters that have an uppercase counterpart were converted to uppercase before being processed through ASCII, EBCDIC, CHARSET_COLL, or JIS_COLL collation.
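The "convert, then order the bytes in binary" behavior described for ASCII, EBCDIC, and CHARSET_COLL, together with the NOT CASESPECIFIC rule above, can be approximated with a short Python sketch. The stand-ins are assumptions: Python's `latin-1` codec substitutes for Teradata extended ASCII and `cp037` for Teradata extended EBCDIC, and the actual Teradata translation tables may differ. The `collation_key` helper is hypothetical, and Python's `str.upper()` is only a rough proxy for the database's uppercase mapping.

```python
def collation_key(text: str, encoding: str, case_specific: bool = True) -> bytes:
    """Build a binary sort key: optionally fold to uppercase, then encode."""
    if not case_specific:
        # NOT CASESPECIFIC: behave as if characters were uppercased first.
        # str.upper() is an approximation of that mapping (assumption).
        text = text.upper()
    # Converting to the target character set and comparing the bytes
    # gives the binary ordering for that character set.
    return text.encode(encoding)


values = ["b", "A", "1", "a", "B"]

# "latin-1" stands in for Teradata extended ASCII (assumption).
ascii_order = sorted(values, key=lambda s: collation_key(s, "latin-1"))

# "cp037" stands in for Teradata extended EBCDIC (assumption).
ebcdic_order = sorted(values, key=lambda s: collation_key(s, "cp037"))

print(ascii_order)   # ['1', 'A', 'B', 'a', 'b']
print(ebcdic_order)  # ['a', 'b', 'A', 'B', '1']
```

The same data sorts differently under the two encodings: digits precede letters in the ASCII-style ordering, while lowercase letters come first and digits last in the EBCDIC-style ordering. With `case_specific=False`, "apple" and "APPLE" produce identical keys and therefore compare as equal.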

Collation can be set or changed in several ways:

  • At the user-definition level with the CREATE USER or MODIFY USER statements.
  • At the session level with the SQL SET SESSION COLLATION statement.
  • You can use predefined macros to change the collation default to Swedish, Norwegian, or the appropriate Japanese standard collation.

Note: Katakana_Standard, Kanji5026_Standard, and Kanji5035_Standard are designed for the KANJI1 server character set and should not be used with other server character sets. Similarly, the other predefined collation options should not be used for KANJI1 data.

You can also define and install your own collation sequences, as explained in “Defining Your Own Collation Sequence” on page 119.

Use the HELP SESSION statement to display the collation currently in effect for your session.

Selected collation and character mapping tables are described in the following text files, which are available only on CD-ROM or on the Teradata Information Products website at http://www.info.teradata.com/.

     

| File Name (on CD) | Title (on the Web) | Description |
|-------------------|--------------------|-------------|
| A6A0SUCD.txt | ARABIC1256_6A0 to Unicode | Maps ARABIC1256 to Unicode. |
| blinddef.txt | Multinational Case Blind Default Collation | Defines the default for Multinational Case Blind collation. |
| C1RMUNCD.txt | TCHBIG5_1R0 Multibyte to Unicode | Maps the multibyte character portion of TCHBIG5 to Unicode. |
| C1RSUNCD.txt | TCHBIG5_1R0 Single Byte to Unicode | Maps the single-byte character portion of TCHBIG5 to Unicode. |
| C1T0UNCD.txt | SCHGB2312_1T0 Code Set 0 to Unicode | Maps SCHGB2312 Code Set 0 to corresponding Latin letters of Unicode. |
| C1T1UNCD.txt | SCHGB2312_1T0 Code Set 1 to Unicode | Maps SCHGB2312 Code Set 1 to Unicode. |
| C2IMUNCD.txt | SCHEBCDIC935_2IJ Multibyte to Unicode | Maps the multibyte character portion of SCHEBCDIC935 to Unicode. |
| C2ISUNCD.txt | SCHEBCDIC935_2IJ Single Byte to Unicode | Maps the single-byte character portion of SCHEBCDIC935 to Unicode. |
| C3IMUNCD.txt | TCHEBCDIC937_3IB Multibyte to Unicode | Maps the multibyte character portion of TCHEBCDIC937 to Unicode. |
| C2A0SUCD.txt | CYRILLIC1251_2A0 to Unicode | Maps CYRILLIC1251 to Unicode. |
| EUC1UNCD.txt | KanjiEUC Code Set 1 to Unicode | Maps KanjiEUC Code Set 1 characters (JIS-x0208) to their Unicode equivalents. |
| EUC2UNCD.txt | KanjiEUC Code Set 2 to Unicode | Maps KanjiEUC Code Set 2 characters (JIS-x0201 Katakana) to their Unicode equivalents. |
| EUC3UNCD.txt | KanjiEUC Code Set 3 to Unicode | Maps KanjiEUC Code Set 3 characters (JIS-x0212) to their Unicode equivalents. |
| H1IMUNCD.txt | HANGULEBCDIC933_1II Multibyte to Unicode | Maps the multibyte character portion of HANGULEBCDIC933 to Unicode. |
| H1ISUNCD.txt | HANGULEBCDIC933_1II Single Byte to Unicode | Maps the single-byte character portion of HANGULEBCDIC933 to Unicode. |
| H2RMUNCD.txt | HANGULKSC5601_2R4 Multibyte to Unicode | Maps the multibyte character portion of HANGULKSC5601 to Unicode. |
| H2RSUNCD.txt | HANGULKSC5601_2R4 Single Byte to Unicode | Maps the single-byte character portion of HANGULKSC5601 to Unicode. |
| H5A0SUCD.txt | HEBREW1255_5A0 to Unicode | Maps HEBREW1255 to Unicode. |
| H7R0MUCD.txt | HANGUL949_7R0 Multibyte to Unicode | Maps the multibyte character portion of HANGUL949 to Unicode. |
| H7R0SUCD.txt | HANGUL949_7R0 Single Byte to Unicode | Maps the single-byte character portion of HANGUL949 to Unicode. |
| JIS_COLL.txt | JIS_COLL Case-Specific Collation | Defines the JIS_COLL Case-Specific collation. |
| JISCOLBL.txt | JIS_COLL Case Blind Collation | Defines the JIS_COLL Case Blind collation. |
| K1S0SUCD.txt | KANJI932_1S0 Single Byte to Unicode | Maps KANJI932 to Unicode. |
| K1S0MUCD.txt | KANJI932_1S0 Multibyte to Unicode | Maps the multibyte character portion of KANJI932 to Unicode. |
| L1A0SUCD.txt | LATIN1250_1A0 to Unicode | Maps LATIN1250 to Unicode. |
| L3A0SUCD.txt | LATIN1252_3A0 to Unicode | Maps LATIN1252 to Unicode. |
| L7A0SUCD.txt | LATIN1254_7A0 to Unicode | Maps LATIN1254 to Unicode. |
| L8A0SUCD.txt | LATIN1258_8A0 to Unicode | Maps LATIN1258 to Unicode. |
| multnatl.txt | Multinational Case-Specific Default Collation | Defines the default for Multinational Case-Specific collation. |
| S6R0MUCD.txt | SCHINESE936_6R0 Multibyte to Unicode | Maps the multibyte character portion of SCHINESE936 to Unicode. |
| S6R0SUCD.txt | SCHINESE936_6R0 Single Byte to Unicode | Maps SCHINESE936 to Unicode. |
| SJISSJIS.txt | KanjiSJIS to KanjiSJIS multibyte | Maps KanjiShiftJIS to KanjiShiftJIS multibyte characters. |
| SJISUNCD.txt | KanjiSJIS to Unicode multibyte | Maps KanjiShiftJIS characters to their multibyte Unicode equivalents. |
| SOSIUNCD.txt | KanjiEBCDIC (SO/SI) to Unicode | Maps multibyte character portion of KanjiEBCDIC (SO/SI) to Unicode. |
| T4A0SUCD.txt | THAI874_4A0 Single Byte to Unicode | Maps THAI874 to Unicode. |
| T8R0MUCD.txt | TCHINESE950_8R0 Multibyte to Unicode | Maps the multibyte character portion of TCHINESE950 to Unicode. |
| T8R0SUCD.txt | TCHINESE950_8R0 Single Byte to Unicode | Maps TCHINESE950 to Unicode. |
| UNCDUNCD.txt | Unicode to Unicode | Lists the characters supported by the UNICODE server character set. |
| UNCDVARG.txt | Unicode to Vargraphic | Used by the VARGRAPHIC function. Valid characters of Graphic are mapped to themselves. |
| UNCDSJIS.txt | Unicode to KanjiSJIS | Maps Unicode characters to their KanjiShiftJIS multibyte equivalents. |
| UNCDE123.txt | Unicode to KanjiEUC Sets 1, 2, 3 | Maps Unicode characters to KanjiEUC Code Set 1, 2, and 3 (JIS-x0208) as UNIX Process Code (UPC). |
| UNCDSOSI.txt | Unicode to KanjiEBCDIC (SO/SI) | Maps Unicode characters to the multibyte character portion of KanjiEBCDIC (SO/SI). |
| UOBJNSTD.txt | Unicode in Object Names on standard language support systems | Lists characters from the UNICODE server character set that are allowed in object names on standard language support systems. |
| UOBJNJAP.txt | Unicode in Object Names on Japanese language support systems | Lists characters from the UNICODE server character set that are allowed in object names on Japanese language support systems. |