Creating a Custom Client Character Set - Teradata Database

International Character Set Support

Product: Teradata Database
Release Number: 15.10
Language: English (United States)
Last Update: 2018-09-25
dita:id: B035-1125
lifecycle: previous
Product Category: Teradata® Database

To create a custom character set, do the following:

1 Observe applicable restrictions. See “Restrictions on Redefining Characters.”

2 Define the character set translation codes. See “Defining the Custom Character Translation Codes.”

3 Name the character set. See “Naming a Custom Character Set.”

Restrictions on Redefining Characters

The restrictions on site-defined character sets differ for network-attached and mainframe-attached clients, and are described in the following sections.

Note: Network clients use ASCII-compatible character sets and mainframe clients are often EBCDIC-based.

ASCII

Site-defined character sets based on the ASCII character set must preserve the definitions of code points 0-127. This means that you can redefine only those characters whose code point value is 128 or greater.
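The ASCII restriction can be checked mechanically before a table is installed. The following is a minimal sketch only: it assumes the translation table is held as a 256-entry Python list indexed by client code point, which is a hypothetical in-memory form and not the format Teradata Database stores. The function name and the sample tables are illustrative.

```python
def changed_protected_code_points(base, custom, protected=range(128)):
    """Return the protected code points whose mapping differs from the base table."""
    if len(base) != 256 or len(custom) != 256:
        raise ValueError("expected 256-entry single-byte tables")
    return [cp for cp in protected if base[cp] != custom[cp]]


# Hypothetical tables: the base maps every code point to itself.
base = list(range(256))

# Allowed: only a code point at or above 128 is redefined.
allowed = base.copy()
allowed[0xA4] = 0x80

# Not allowed: code point 0x24 lies in the protected 0-127 range.
not_allowed = base.copy()
not_allowed[0x24] = 0xA4

print(changed_protected_code_points(base, allowed))      # []
print(changed_protected_code_points(base, not_allowed))  # [36]
```

An empty result means that the candidate table leaves code points 0-127 as defined in the base table.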

EBCDIC

When you create a custom character set based on the EBCDIC character set, you must retain a certain subset of the EBCDIC characters. The required characters are identified as shaded code points in “Required EBCDIC Characters.”

The shaded code points must remain as defined, and the site‑defined translation tables must map these code points into the correct internal Teradata Database representations.

Required EBCDIC Characters

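Defining the Custom Character Translation Codes

1 Copy the existing translation codes for the Teradata character set that is closest to the one you want to define.

2 Make the required changes to the copy.

3 Save the modified translation codes.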
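The copy-then-modify approach in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the Teradata procedure itself: the table is modeled as a plain Python list, `derive_table` and the sample values are hypothetical, and the step that saves the result to the database is not shown.

```python
def derive_table(base_table, overrides):
    """Copy base_table and apply overrides ({code_point: new_value}) to the copy."""
    custom_table = list(base_table)  # step 1: copy the closest existing table
    for code_point, value in overrides.items():
        if not 0 <= code_point < len(custom_table):
            raise ValueError(f"code point {code_point} is outside the table")
        custom_table[code_point] = value  # step 2: make changes to the copy
    return custom_table


# Hypothetical starting table in which every code point maps to itself.
closest_existing = list(range(256))

# Redefine two code points; the original table is left untouched.
site_defined = derive_table(closest_existing, {0xA4: 0x80, 0xB5: 0x81})

print(site_defined[0xA4], site_defined[0xB5])          # 128 129
print(closest_existing[0xA4], closest_existing[0xB5])  # 164 181
```

Working on a copy keeps the existing character set available for comparison while the site-defined set is being developed.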

Collation

When you create a custom character set and its translation codes, Teradata Database automatically generates a collation that matches the order of the site-defined character set. To use this collation with the site-defined character set, choose the CHARSET_COLL collation.

Naming a Custom Character Set

Each custom character set must have a unique name (CharSetName), 1-30 characters in length, in the form:

CHARACTERSETNAME_0x

where:

  • CHARACTERSETNAME identifies the character set.
      • The name must begin with an unaccented, uppercase Roman letter.
      • The rest of the name can use only unaccented uppercase Roman letters, digits, the underscore, the dollar sign, or the number sign.
  • _0x is the required suffix, composed of a _ (LOWLINE) character, followed by a numeric character and then an alpha character.
      • The numeric character must be zero. Other values are reserved for future use.
      • The alpha character indicates the type of encoding used in the character set, as shown in the following table:

    Code                     Description
    a                        Single-byte character ASCII.
    e                        Single-byte character EBCDIC.
    i                        IBM (SO/SI style) mixed character single- and multibyte characters.
    u                        EUC mixed character single-byte characters/multibyte characters.
    s                        Shift-JIS mixed character single-byte characters/multibyte characters and graphic multibyte characters.
    b-d, f-h, j-r, t, v-z    Single- and multibyte extended site-defined character sets.

For example, the suffix _0i in the name KANJIEBCDIC5026_0i implies IBM mixed single- and multibyte characters.

Common character set names, such as ASCII, EBCDIC, and UTF8, cannot be used for site-defined character sets. Teradata Database also recognizes the special character set name KATAKANAEBCDIC, which is given characteristics similar to those specified by the suffix “_0i”. Other site-defined character set names are given characteristics as if the name ends in either “_0a” or “_0e”.
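The naming rules above can be expressed as a small validator. This is a sketch under assumptions rather than the check Teradata Database performs: it treats the 1-30 character limit as covering the whole name including the suffix, accepts the suffix letter in either case, and rejects only the three common names listed above. The function name and the description strings are illustrative.

```python
import re

# Common names the text says cannot be used for site-defined sets.
RESERVED_NAMES = {"ASCII", "EBCDIC", "UTF8"}

# Encoding types for the suffix letters listed in the table.
ENCODING_TYPES = {
    "a": "single-byte ASCII",
    "e": "single-byte EBCDIC",
    "i": "IBM (SO/SI) mixed single- and multibyte",
    "u": "EUC mixed single- and multibyte",
    "s": "Shift-JIS mixed single- and multibyte",
}

# Uppercase letter first, then uppercase letters, digits, _, $, or #,
# ending in the suffix _0 followed by one letter (either case: an assumption).
NAME_PATTERN = re.compile(r"[A-Z][A-Z0-9_$#]*_0([A-Za-z])")


def classify_charset_name(name):
    """Return the encoding type implied by the suffix, or raise ValueError."""
    if name.upper() in RESERVED_NAMES:
        raise ValueError(f"{name!r} is a common character set name")
    if not 1 <= len(name) <= 30:
        raise ValueError("name must be 1-30 characters long")
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} is not in the form CHARACTERSETNAME_0x")
    return ENCODING_TYPES.get(match.group(1).lower(), "extended site-defined")


print(classify_charset_name("KANJIEBCDIC5026_0i"))
print(classify_charset_name("MYSET_0q"))
```

With these assumptions, the first call reports the IBM mixed type and the second reports an extended site-defined set. A name such as MYSET_1a is rejected because the numeric character in the suffix is not zero.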

Referencing a Client Character Set Name

The suffix is part of the client character set name. The complete name must be included in any references to the character set, for example, in a BTEQ SET SESSION CHARSET command.