MULTINATIONAL Collation for Extended Site-Defined Character Sets - Teradata Database

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
lifecycle: previous
Product Category: Teradata® Database

If the CHARSET_COLL collation and the supplied default MULTINATIONAL collation are not sufficient to support the desired language, the MULTINATIONAL collation can be modified to provide support for an extended site-defined character set.

Because modifying MULTINATIONAL collation is a complex task, avoid it if at all possible.

Modify the following files in the TPA etc or TPA cfg directory:

  • latin1multinationalcb.z
  • latin1multinationalcs.z
  • unicodemultinationalcb.z
  • unicodemultinationalcs.z

    Back up these files before you modify them so that you can restore them if required.
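
As a minimal sketch of this backup step, the Python function below copies each of the four files into a separate backup directory and refuses to overwrite an existing backup. Both directory arguments are placeholders (assumptions): substitute the actual location of the TPA etc or cfg directory on your system and a backup location of your choice. The function name and the `.orig` suffix are hypothetical conventions used only for illustration.

```python
import shutil
from pathlib import Path

# The four collation files named in this section.
COLLATION_FILES = [
    "latin1multinationalcb.z",
    "latin1multinationalcs.z",
    "unicodemultinationalcb.z",
    "unicodemultinationalcs.z",
]


def back_up_collation_files(config_dir, backup_dir):
    """Copy each collation file from config_dir into backup_dir.

    config_dir stands in for the TPA etc or cfg directory (assumption:
    its real path depends on your installation). Returns the list of
    backup paths written, each named <original name>.orig.
    """
    src_root = Path(config_dir)
    dst_root = Path(backup_dir)
    dst_root.mkdir(parents=True, exist_ok=True)
    written = []
    for name in COLLATION_FILES:
        src = src_root / name
        dst = dst_root / (name + ".orig")
        if dst.exists():
            # Never overwrite an earlier backup of the original file.
            raise FileExistsError(dst)
        # copy2 also preserves timestamps and permission bits.
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

Refusing to overwrite keeps a second run from replacing the pristine copies with already-modified files.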

    The latin1 files determine the Standard MULTINATIONAL default collation for LATIN and KANJI1 data, and the unicode files determine the MULTINATIONAL collation for Unicode data.

    Note: A site-defined MULTINATIONAL collation overrides the definition in the preceding files. For more information, see “Changing the Standard Multinational Default Collation” and “Defining Your Own Collation Sequence.”

    The files with suffix cb.z handle case-blind (NOT CASESPECIFIC) collation, and the files with suffix cs.z handle case-specific (CASESPECIFIC) collation.
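
The rules above (which file governs which data and which case setting) can be summarized as a small lookup. The function `collation_file_for` below is a hypothetical illustration, not part of Teradata. It covers only the character sets this section names (LATIN, KANJI1, and UNICODE), raises an error for anything else, and ignores any site-defined MULTINATIONAL collation, which would override these files.

```python
# Hypothetical helper: map a character set and case specificity to the
# Standard MULTINATIONAL default collation file described in this section.
LATIN1_BASED = {"LATIN", "KANJI1"}


def collation_file_for(charset: str, casespecific: bool) -> str:
    name = charset.upper()
    if name in LATIN1_BASED:
        prefix = "latin1"
    elif name == "UNICODE":
        prefix = "unicode"
    else:
        # This section does not say which file (if any) covers other sets.
        raise ValueError(f"no MULTINATIONAL file listed for {charset!r}")
    # cs.z = CASESPECIFIC, cb.z = NOT CASESPECIFIC (case blind)
    suffix = "cs.z" if casespecific else "cb.z"
    return prefix + "multinational" + suffix
```

For example, `collation_file_for("LATIN", False)` returns `"latin1multinationalcb.z"`.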

    Generate the cs.z files first, and then create the cb.z files from copies of them. Next, modify each cb.z file so that one character in each case pair has the same weights as the other character in that pair. If the cs.z and cb.z files are not properly synchronized, results are unpredictable.
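
To make the copy-then-modify step concrete, here is a minimal sketch that derives a case-blind table from a case-specific one. It works on a hypothetical in-memory form of a file, a list of `(character, primary, secondary)` tuples with each weight held as a character, rather than the actual .z encoding, which is not reproduced here. `derive_case_blind` and `case_pairs` are illustrative names. Each mapped character takes over the weights of its case partner, so map each pair in one direction only (for example, `a` to `A`).

```python
from typing import Dict, List, Tuple

# Hypothetical parsed form of one weighting line (not the .z encoding):
# (character being weighted, primary weight, secondary weight).
Entry = Tuple[str, str, str]


def derive_case_blind(cs_entries: List[Entry],
                      case_pairs: Dict[str, str]) -> List[Entry]:
    """Return a case-blind copy of a case-specific weight table.

    case_pairs maps a character to the partner whose weights it should
    share, e.g. {"a": "A"} gives "a" the same weights as "A". Weights are
    always taken from the original cs table, so mapping both directions
    of a pair would swap the weights instead of equalizing them.
    """
    cs_weights = {ch: (p, s) for ch, p, s in cs_entries}
    cb_entries = []
    for ch, primary, secondary in cs_entries:
        partner = case_pairs.get(ch)
        if partner is not None:
            if partner not in cs_weights:
                raise KeyError(f"case partner {partner!r} of {ch!r} is not defined")
            primary, secondary = cs_weights[partner]
        cb_entries.append((ch, primary, secondary))
    return cb_entries  # the input cs table is left unchanged
```

Because a character can inherit weights from a partner that appears later in the file, the derived table should still be checked against the rule that forward references are not allowed.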

    Each file contains one character-weighting definition per line. The first item on a line is the character to be weighted, followed by its primary and secondary weights, which are separated by the semicolon character. The character and both weights are expressed in a special hexadecimal Unicode-based format.

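    When a Unicode character is used as a weight, the line on which that character first occurs in the file determines its relative weight: characters that occur earlier have earlier weights. Forward references are not allowed; a character cannot be used as a weight on a line that precedes the line on which it first occurs.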
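
The ordering rule can be illustrated with a sketch that converts character-valued weights into ordinal ranks. It assumes a hypothetical parsed form of a file, a list of `(character, primary, secondary)` tuples in file order, and interprets "first occurs" as the line on which a character is itself weighted. `resolve_weights` is an illustrative name, not a Teradata routine. A weight naming a character that has not appeared on the current or an earlier line is reported as a forward reference.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed form: (character, primary weight, secondary weight).
Entry = Tuple[str, str, str]


def resolve_weights(entries: List[Entry]) -> List[Tuple[str, int, int]]:
    """Replace each character-valued weight with its ordinal rank.

    A weight character's rank is the index of the line on which that
    character first appears, so earlier lines give earlier weights.
    Using a character as a weight before that line raises ValueError.
    """
    rank: Dict[str, int] = {}
    resolved = []
    for line_no, (ch, primary, secondary) in enumerate(entries):
        # The weighted character may be referenced from its own line on.
        rank.setdefault(ch, line_no)
        ranks = []
        for w in (primary, secondary):
            if w not in rank:
                raise ValueError(f"line {line_no + 1}: forward reference to {w!r}")
            ranks.append(rank[w])
        resolved.append((ch, ranks[0], ranks[1]))
    return resolved
```

For example, `resolve_weights([("A", "A", "A"), ("a", "A", "a"), ("B", "B", "B")])` returns `[("A", 0, 0), ("a", 0, 1), ("B", 2, 2)]`: `a` shares the primary weight of `A` and differs only in its secondary weight.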

    Because Unicode is a very large character set, it is usually better to modify the existing files than to start from scratch. The startup routine that reads these files is very sensitive to format, so follow the existing formatting exactly.

    A % character starts a comment that continues to the end of the line. Blank lines and comment-only lines are not allowed.
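
A pre-check for these two rules might look like the following sketch. `check_lines` is hypothetical and assumes that `%` never appears inside a weighting definition itself. It strips comments, rejects blank and comment-only lines, and returns what remains of each line. It does not validate the hexadecimal weight format, so it is no substitute for matching the exact formatting the startup routine expects.

```python
from typing import List


def strip_comment(line: str) -> str:
    """Drop a % comment and everything after it on the line."""
    return line.split("%", 1)[0]


def check_lines(text: str) -> List[str]:
    """Return the definition part of each line.

    Raises ValueError on a blank line or a comment-only line, both of
    which this section says are illegal.
    """
    definitions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        body = strip_comment(line).strip()
        if not body:
            kind = "comment-only" if "%" in line else "blank"
            raise ValueError(f"line {line_no}: {kind} lines are not allowed")
        definitions.append(body)
    return definitions
```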

    The collation files are part of the system, and they are also available on CD and on the Web at http://www.info.teradata.com/.

    Note: If you use the file-modification approach described here, do not install a site-defined MULTINATIONAL collation. That way, MULTINATIONAL collation is defined exactly by the modified Standard Multinational Default collation files.

    After you set up the files and DBC.CollationsV, you must perform a tpareset for the MULTINATIONAL collation changes to take effect.