About Collation Sequences - Teradata Database

Collations control character ordering and comparison operations during Teradata Database sessions.

A collation is either single-level or two-level. A two-level collation orders character strings according to a two-level comparison.

Characters are first partitioned into equivalence classes that have the same collating value. The relative ordering of classes and characters within a class is significant.

Comparisons obey the following rules:

All characters in a class have the same collation value.

A character from class i is less than any character from class i+1.

The process is as follows:

1. Convert characters in the strings to be compared into equivalence classes.

2. Compare the strings.

IF the strings are…	THEN processing…
not equal	stops.
equal	continues.

3. Order characters within each class using criteria defined for the collation sequence, and compare.

The MULTINATIONAL Norwegian standard collation sequence is an example of a two-level collation.
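
The three-step process above can be sketched in Python. This is a minimal illustration, not Teradata's implementation: the `TOY_COLLATION` table, its class numbers, and its within-class ranks are invented for the example and do not reproduce the Norwegian sequence or any other installed sequence.

```python
# Toy table (an assumption, not a real Teradata collation):
# character -> (equivalence class, rank within that class)
TOY_COLLATION = {
    "a": (0, 0), "A": (0, 1),
    "b": (1, 0), "B": (1, 1),
    "c": (2, 0), "C": (2, 1),
}


def two_level_key(s):
    """Return (class sequence, within-class rank sequence) for a string."""
    classes = tuple(TOY_COLLATION[ch][0] for ch in s)  # step 1
    ranks = tuple(TOY_COLLATION[ch][1] for ch in s)    # used only in step 3
    return (classes, ranks)


def compare(left, right):
    """Return -1, 0, or 1 following the two-level comparison."""
    left_classes, left_ranks = two_level_key(left)
    right_classes, right_ranks = two_level_key(right)
    # Step 2: compare by equivalence class; stop if the strings differ.
    if left_classes != right_classes:
        return -1 if left_classes < right_classes else 1
    # Step 3: the strings are equal by class, so order within each class.
    if left_ranks != right_ranks:
        return -1 if left_ranks < right_ranks else 1
    return 0


print(compare("B", "a"))    # 1: class of "B" is greater, so step 3 is never reached
print(compare("ab", "Ab"))  # -1: same classes, decided by within-class rank
print(sorted(["B", "a", "A", "b"], key=two_level_key))  # ['a', 'A', 'b', 'B']
```

Because Python compares tuples element by element, sorting on `two_level_key` gives the same result as calling `compare` pairwise: the within-class ranks matter only when the class sequences are identical.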

Teradata Database offers five standard collation sequences. Character data can be defined as CASESPECIFIC or NOT CASESPECIFIC, and that choice affects how each of the five sequences collates and compares the data.

The five collations, determined either by default or explicit use of the SET SESSION COLLATION statement, are:

ASCII

EBCDIC

CHARSET_COLL

JIS_COLL

MULTINATIONAL

CASESPECIFIC or NOT CASESPECIFIC can be chosen at table definition time, or specified as part of the SQL statement.

The default collation sequence is based upon the client type:

EBCDIC for mainframe clients

ASCII for all other clients

Collation sequence ordering is as follows:

ASCII collation orders data essentially as if the data were converted to Teradata extended ASCII (the LATIN server character set) and the resulting byte string were then ordered in binary.

EBCDIC collation orders data essentially as if the data were converted to Teradata extended EBCDIC and the resulting byte string were then ordered in binary.

CHARSET_COLL collation orders data essentially as if each string were converted to the current client character set and the resulting byte string were then ordered in binary.

JIS_COLL collation collates in the order of the Japanese Industrial Standards.

MULTINATIONAL collation provides more culturally aware ordering of data.
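
The "convert, then order the bytes in binary" behavior of the ASCII, EBCDIC, and CHARSET_COLL collations can be illustrated with a short Python sketch. The encodings used here are assumptions: Python's `latin-1` and `cp037` codecs stand in for Teradata extended ASCII and Teradata extended EBCDIC, which they approximate but do not exactly match.

```python
# Stand-in encodings (assumptions): these Python codecs approximate, but are
# not identical to, Teradata extended ASCII and Teradata extended EBCDIC.
ASCII_LIKE = "latin-1"
EBCDIC_LIKE = "cp037"


def binary_key(encoding):
    """Build a sort key that converts a string to bytes in `encoding`.

    Python compares bytes objects by unsigned byte value, which gives
    a binary ordering of the resulting byte string.
    """
    def key(s):
        return s.encode(encoding)
    return key


words = ["apple", "Zebra", "42"]
ascii_order = sorted(words, key=binary_key(ASCII_LIKE))
ebcdic_order = sorted(words, key=binary_key(EBCDIC_LIKE))

print(ascii_order)   # ['42', 'Zebra', 'apple']  digits < uppercase < lowercase
print(ebcdic_order)  # ['apple', 'Zebra', '42']  lowercase < uppercase < digits
```

The same helper loosely models CHARSET_COLL if it is given the codec that corresponds to the current client character set, for example `binary_key("utf-8")` for a UTF-8 client.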

The predefined multinational collation options are:

Teradata Standard Multinational (the initial default)

Swedish

Norwegian

Katakana_Standard

Kanji5026_Standard

Kanji5035_Standard

Further multinational collation options can be loaded using scripts. The database administrator can alter MULTINATIONAL collation.

When collation is set to MULTINATIONAL, the default sequence currently installed is used. This can be either one of the predefined sequences supplied with Teradata Database or a sequence you have defined and installed.

You can execute predefined macros to change the default to Swedish, Norwegian, or the appropriate Japanese standard collation. You can also define and install your own collation, as explained in “Defining Your Own Collation Sequence.”

If all the items being compared or collated are determined to be NOT CASESPECIFIC, the collation works as if all characters that have an uppercase counterpart were converted to uppercase before being processed through ASCII, EBCDIC, CHARSET_COLL or JIS_COLL collation.
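
The NOT CASESPECIFIC rule can be sketched as "fold to uppercase, then apply the binary collation." The Python below is an illustration under stated assumptions: the `latin-1` codec stands in for Teradata extended ASCII, Python's `str.upper()` stands in for Teradata's uppercase mapping, and the helper names are hypothetical.

```python
def fold_to_upper(s, encoding="latin-1"):
    """Uppercase each character that has a single-character uppercase
    counterpart representable in `encoding`; leave the rest unchanged."""
    out = []
    for ch in s:
        up = ch.upper()
        try:
            up.encode(encoding)
            usable = len(up) == 1
        except UnicodeEncodeError:
            usable = False
        out.append(up if usable else ch)
    return "".join(out)


def casespecific_key(s, encoding="latin-1"):
    """CASESPECIFIC: binary order of the encoded string as written."""
    return s.encode(encoding)


def not_casespecific_key(s, encoding="latin-1"):
    """NOT CASESPECIFIC: fold to uppercase first, then binary order."""
    return fold_to_upper(s, encoding).encode(encoding)


names = ["banana", "Apple", "apple", "Banana"]
print(sorted(names, key=casespecific_key))      # ['Apple', 'Banana', 'apple', 'banana']
print(sorted(names, key=not_casespecific_key))  # ['Apple', 'apple', 'banana', 'Banana']
```

Under the folded key, "apple" and "Apple" compare as equal, so Python's stable sort keeps them in their input order.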

Collation can be set or changed in several ways:

At the user-definition level with the CREATE USER or MODIFY USER statements.

At the session level with the SQL SET SESSION COLLATION statement.

You can use predefined macros to change the collation default to Swedish, Norwegian, or the appropriate Japanese standard collation.

Note: Katakana_Standard, Kanji5026_Standard, and Kanji5035_Standard are designed for the KANJI1 server character set and should not be used with other server character sets. Similarly, the other predefined collation options should not be used for KANJI1 data.

You can also define and install your own collation sequences, as explained in “Defining Your Own Collation Sequence.”

Use the HELP SESSION statement to display the collation currently in effect for your session.
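
As a small illustration of the session-level option, the Python sketch below builds the text of a SET SESSION COLLATION statement and picks the client-type default described earlier. The helper names are hypothetical, no database connection is made, and the helper accepts only the five collations named in this section.

```python
# The five standard collations named in this section.
STANDARD_COLLATIONS = ("ASCII", "EBCDIC", "CHARSET_COLL", "JIS_COLL", "MULTINATIONAL")

# Statement that displays the collation currently in effect for the session.
HELP_SESSION_SQL = "HELP SESSION;"


def default_collation(mainframe_client):
    """Default by client type: EBCDIC for mainframe clients, ASCII otherwise."""
    return "EBCDIC" if mainframe_client else "ASCII"


def set_session_collation_sql(name):
    """Return the statement text that selects `name` for the session.

    Hypothetical helper: it only validates the name and formats the text.
    """
    collation = name.strip().upper()
    if collation not in STANDARD_COLLATIONS:
        raise ValueError(f"unknown collation: {name!r}")
    return f"SET SESSION COLLATION {collation};"


print(set_session_collation_sql("multinational"))  # SET SESSION COLLATION MULTINATIONAL;
print(set_session_collation_sql(default_collation(mainframe_client=True)))  # SET SESSION COLLATION EBCDIC;
```

The resulting strings would be submitted through whatever client tool or driver the session already uses, followed by `HELP SESSION;` to confirm the change.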

Selected collation and character mapping tables are described in the following text files, which are available only on CD-ROM or on the Teradata Information Products website at http://www.info.teradata.com/.

File Name (on CD)	Title (on the Web)	Description
A6A0SUCD.txt	ARABIC1256_6A0 to Unicode	Maps ARABIC1256 to Unicode.
blinddef.txt	Multinational Case Blind Default Collation	Defines the default for Multinational Case Blind collation.
C1RMUNCD.txt	TCHBIG5_1R0 Multibyte to Unicode	Maps the multibyte character portion of TCHBIG5 to Unicode.
C1RSUNCD.txt	TCHBIG5_1R0 Single Byte to Unicode	Maps the single-byte character portion of TCHBIG5 to Unicode.
C1T0UNCD.txt	SCHGB2312_1T0 Code Set 0 to Unicode	Maps SCHGB2312 Code Set 0 to corresponding Latin letters of Unicode.
C1T1UNCD.txt	SCHGB2312_1T0 Code Set 1 to Unicode	Maps SCHGB2312 Code Set 1 to Unicode.
C2IMUNCD.txt	SCHEBCDIC935_2IJ Multibyte to Unicode	Maps the multibyte character portion of SCHEBCDIC935 to Unicode.
C2ISUNCD.txt	SCHEBCDIC935_2IJ Single Byte to Unicode	Maps the single-byte character portion of SCHEBCDIC935 to Unicode.
C3IMUNCD.txt	TCHEBCDIC937_3IB Multibyte to Unicode	Maps the multibyte character portion of TCHEBCDIC937 to Unicode.
C2A0SUCD.txt	CYRILLIC1251_2A0 to Unicode	Maps CYRILLIC1251 to Unicode.
EUC1UNCD.txt	KanjiEUC Code Set 1 to Unicode	Maps KanjiEUC Code Set 1 characters (JIS-x0208) to their Unicode equivalents.
EUC2UNCD.txt	KanjiEUC Code Set 2 to Unicode	Maps KanjiEUC Code Set 2 characters (JIS-x0201 Katakana) to their Unicode equivalents.
EUC3UNCD.txt	KanjiEUC Code Set 3 to Unicode	Maps KanjiEUC Code Set 3 characters (JIS-x0212) to their Unicode equivalents.
H1IMUNCD.txt	HANGULEBCDIC933_1II Multibyte to Unicode	Maps the multibyte character portion of HANGULEBCDIC933 to Unicode.
H1ISUNCD.txt	HANGULEBCDIC933_1II Single Byte to Unicode	Maps the single-byte character portion of HANGULEBCDIC933 to Unicode.
H2RMUNCD.txt	HANGULKSC5601_2R4 Multibyte to Unicode	Maps the multibyte character portion of HANGULKSC5601 to Unicode.
H2RSUNCD.txt	HANGULKSC5601_2R4 Single Byte to Unicode	Maps the single-byte character portion of HANGULKSC5601 to Unicode.
H5A0SUCD.txt	HEBREW1255_5A0 to Unicode	Maps HEBREW1255 to Unicode.
H7R0MUCD.txt	HANGUL949_7R0 Multibyte to Unicode	Maps the multibyte character portion of HANGUL949 to Unicode.
H7R0SUCD.txt	HANGUL949_7R0 Single Byte to Unicode	Maps the single-byte character portion of HANGUL949 to Unicode.
JIS_COLL.txt	JIS_COLL Case-Specific Collation	Defines the JIS_COLL Case-Specific collation.
JISCOLBL.txt	JIS_COLL Case Blind Collation	Defines the JIS_COLL Case Blind collation.
K1S0SUCD.txt	KANJI932_1S0 Single Byte to Unicode	Maps KANJI932 to Unicode.
K1S0MUCD.txt	KANJI932_1S0 Multibyte to Unicode	Maps the multibyte character portion of KANJI932 to Unicode.
L1A0SUCD.txt	LATIN1250_1A0 to Unicode	Maps LATIN1250 to Unicode.
L3A0SUCD.txt	LATIN1252_3A0 to Unicode	Maps LATIN1252 to Unicode.
L7A0SUCD.txt	LATIN1254_7A0 to Unicode	Maps LATIN1254 to Unicode.
L8A0SUCD.txt	LATIN1258_8A0 to Unicode	Maps LATIN1258 to Unicode.
multnatl.txt	Multinational Case-Specific Default Collation	Defines the default for Multinational Case-Specific collation.
S6R0MUCD.txt	SCHINESE936_6R0 Multibyte to Unicode	Maps the multibyte character portion of SCHINESE936 to Unicode.
S6R0SUCD.txt	SCHINESE936_6R0 Single Byte to Unicode	Maps SCHINESE936 to Unicode.
SJISSJIS.txt	KanjiSJIS to KanjiSJIS multibyte	Maps KanjiShiftJIS to KanjiShiftJIS multibyte characters.
SJISUNCD.txt	KanjiSJIS to Unicode multibyte	Maps KanjiShiftJIS characters to their multibyte Unicode equivalents.
SOSIUNCD.txt	KanjiEBCDIC (SO/SI) to Unicode	Maps multibyte character portion of KanjiEBCDIC (SO/SI) to Unicode.
T4A0SUCD.txt	THAI874_4A0 Single Byte to Unicode	Maps THAI874 to Unicode.
T8R0MUCD.txt	TCHINESE950_8R0 Multibyte to Unicode	Maps the multibyte character portion of TCHINESE950 to Unicode.
T8R0SUCD.txt	TCHINESE950_8R0 Single Byte to Unicode	Maps TCHINESE950 to Unicode.
UNCDUNCD.txt	Unicode to Unicode	Lists the characters supported by the UNICODE server character set.
UNCDVARG.txt	Unicode to Vargraphic	Used by the VARGRAPHIC function. Valid characters of Graphic are mapped to themselves.
UNCDSJIS.txt	Unicode to KanjiSJIS	Maps Unicode characters to their KanjiShiftJIS multibyte equivalents.
UNCDE123.txt	Unicode to KanjiEUC Sets 1, 2, 3	Maps Unicode characters to KanjiEUC Code Set 1, 2, and 3 (JIS-x0208) as UNIX Process Code (UPC).
UNCDSOSI.txt	Unicode to KanjiEBCDIC (SO/SI)	Maps Unicode characters to the multibyte character portion of KanjiEBCDIC (SO/SI).
UOBJNSTD.txt	Unicode in Object Names on standard language support systems	Lists characters from the UNICODE server character set that are allowed in object names on standard language support systems.
UOBJNJAP.txt	Unicode in Object Names on Japanese language support systems	Lists characters from the UNICODE server character set that are allowed in object names on Japanese language support systems.