CHARSET - Teradata Director Program

Teradata® Director Program Reference - 17.20

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Teradata Director Program
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2023-11-17
Product Category: Teradata Tools and Utilities

The CHARSET directive explicitly begins a character set definition and optionally identifies its encoding scheme.

Syntax



Usage Notes

NAME identifies the character set to which the description applies. The name can include a standard suffix that defines the encoding scheme. The standard suffix consists of an underscore, a number not relevant to CLIv2, the encoding character (A, E, I, R, S, T, or U), and an optional character not relevant to CLIv2. Each encoding character corresponds to an ENCODING operand value:
  • E – EBCDIC
  • I – IBMSOSI
  • A – ASCII
  • R – BIGFIVE
  • S – SJIS
  • T – EUC-CN or EUC-KR
  • U – EUC-JP
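
The suffix rule above can be sketched in Python. This is an illustration only, not TDP's own logic: the regular expression assumes that the number is one or more decimal digits and that the suffix ends the name, and the function name `encodings_from_suffix` is hypothetical. Because T maps to two ENCODING values, the sketch returns every candidate rather than guessing.

```python
import re

# Encoding characters and their ENCODING operand values, as listed above.
SUFFIX_ENCODINGS = {
    "E": ("EBCDIC",),
    "I": ("IBMSOSI",),
    "A": ("ASCII",),
    "R": ("BIGFIVE",),
    "S": ("SJIS",),
    "T": ("EUC-CN", "EUC-KR"),  # the letter alone does not say which
    "U": ("EUC-JP",),
}

# Assumed shape: "_" + decimal digits + encoding character + at most one
# trailing character, anchored at the end of the name.
_SUFFIX = re.compile(r"_(\d+)([AEIRSTU])(.?)$")


def encodings_from_suffix(name):
    """Return the candidate encodings implied by a standard suffix, or ()."""
    match = _SUFFIX.search(name)
    if match is None:
        return ()
    return SUFFIX_ENCODINGS[match.group(2)]
```

With this sketch, a name ending in `_0U` yields `("EUC-JP",)`, while a name without a standard suffix, such as `KOREAN_EBCDIC933`, yields an empty tuple.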

ENCODING optionally identifies the encoding scheme for the character set. If ENCODING is omitted, the character set name must contain a standard suffix that indicates the encoding. If the name contains such a suffix, the encoding cannot be overridden using this operand.

The following encodings are available in TDP.

ENCODING Meaning Characteristics
EBCDIC Extended Binary-Coded-Decimal Interchange Code
  • Single-byte (EBCDIC) codepoints:

    X'00' through X'FF'

IBMSOSI IBM Shift-out/Shift-in
  • Single-byte (EBCDIC) codepoints:

    X'00' through X'FF'

  • Double-byte (EBCDIC) codepoints:

    Shift-out (X'0E') through Shift-in (X'0F')

ASCII American Standard Code for Information Interchange
  • Single-byte (ASCII) codepoints:

    X'00' through X'FF'

BIGFIVE Big Five Plus
  • Single-byte (ASCII) codepoints:

    X'00' through X'80', and X'FF'

  • Double-byte (ASCII) codepoints:

    X'81' through X'FE'

EUC-CN Extended Unix Code - China
  • Single-byte (ASCII) codepoints:

    X'00' through X'7F'

  • Double-byte (ASCII) codepoints:

    X'80' through X'FF'

EUC-JP Extended Unix Code - Japan
  • Single-byte (ASCII) codepoints:

    X'00' through X'8D'

    X'90' through X'FF'

  • Double-byte (ASCII) codepoints:

    Single-shift1 (X'8E')

  • Triple-byte (ASCII) codepoints:

    Single-shift2 (X'8F')

EUC-KR Extended Unix Code - Korea
  • Single-byte (ASCII) codepoints:

    X'00' through X'7F'

  • Double-byte (ASCII) codepoints:

    X'80' through X'FF'

SJIS Shift-JIS (Japanese Industrial Standard)
  • Single-byte (ASCII) codepoints:

    X'00' through X'80'

    X'A0' through X'DF'

    X'FD' through X'FF'

  • Double-byte (ASCII) codepoints:

    X'81' through X'9F'

    X'E0' through X'FC'

UHC Unified Hangul Code
  • Single-byte (ASCII) codepoints:

    X'00' through X'80', and X'FF'

  • Double-byte (ASCII) codepoints:

    X'81' through X'FE'

UTF8 UCS (Universal Character Set) Transformation Format 8-bit
  • Single-byte (Unicode®) codepoints:

    X'00' through X'7F'

  • Double-byte (Unicode) codepoints:

    X'C0' through X'DF'

  • (Most) triple-byte (Unicode) codepoints:

    X'E0' through X'EF'

Most four-byte codepoints (X'F0' through X'F4') are not supported by the database.

UTF16 UCS (Universal Character Set) Transformation Format 16-bit
  • Double-byte (Unicode) codepoints:

    X'0000' through X'D7FF'

    X'E000' through X'FFFF'

Surrogates (four-byte codepoints that begin or end with the two-byte codepoints X'D800' through X'DFFF') are not supported by the database.
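
As an illustration of how the ranges in the table can be applied, the following Python sketch splits a byte string into single-byte and double-byte characters for the ASCII-based encodings whose width depends only on the first byte. It reads each double-byte range as a range of lead-byte values, which is an interpretation and not something the table states outright. The names `DOUBLE_BYTE_LEADS` and `char_widths` are hypothetical, and EUC-JP, IBMSOSI, UTF8, and UTF16 are left out because they follow different rules.

```python
# Lead-byte ranges that start a double-byte character, read from the table.
# Every other byte value is treated as a single-byte character.
DOUBLE_BYTE_LEADS = {
    "BIGFIVE": [(0x81, 0xFE)],
    "UHC": [(0x81, 0xFE)],
    "EUC-CN": [(0x80, 0xFF)],
    "EUC-KR": [(0x80, 0xFF)],
    "SJIS": [(0x81, 0x9F), (0xE0, 0xFC)],
}


def char_widths(data, encoding):
    """Return the byte width (1 or 2) of each character in data."""
    leads = DOUBLE_BYTE_LEADS[encoding]
    widths = []
    i = 0
    while i < len(data):
        is_lead = any(lo <= data[i] <= hi for lo, hi in leads)
        width = 2 if is_lead else 1
        if i + width > len(data):
            raise ValueError("truncated double-byte character at offset %d" % i)
        widths.append(width)
        i += width
    return widths
```

For example, under SJIS the byte X'B1' falls in the single-byte range X'A0' through X'DF', so it counts as one character of width 1, while X'83' starts a two-byte character.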

When the NAME operand is specified and the name does not match the character set name specified on the SET USERCS command, this directive and all directives up to the next CHARSET directive are ignored. When the NAME operand is not specified, this directive is always used, so any subsequent CHARSET directives in the file are never processed.
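
The selection rule can be pictured with a short Python sketch. It is a model of the behavior described above, not TDP code: the file is assumed to be already split into blocks, each holding the NAME operand (or `None` when NAME is absent) and the directives that follow it. The function name `select_charset_block`, the exact string comparison, and stopping at the first named match are all assumptions.

```python
def select_charset_block(blocks, usercs_name):
    """Return the directives of the first CHARSET block that applies.

    blocks: list of (name, directives) pairs in file order; name is None
    when the CHARSET directive has no NAME operand.
    usercs_name: the character set name given on SET USERCS.
    """
    for name, directives in blocks:
        # A block without NAME is always used; a named block is used only
        # when its name matches. Non-matching blocks are skipped whole.
        if name is None or name == usercs_name:
            return directives
    return None
```

In this model, a block without NAME that precedes a matching named block wins, which is why such a block is best placed last in the file.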

While all codepoints are reflected to and from the database, for character sets that allow mixtures of single-byte and multibyte characters, only the single-byte characters are meaningful in TDP command syntax.
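
For an IBMSOSI character set, the single-byte portion of a string can be isolated as in the following Python sketch. It assumes the standard EBCDIC control codes, X'0E' for shift-out and X'0F' for shift-in, and treats everything between them as double-byte data. The function name `single_byte_portion` is hypothetical, and the sketch does not validate that shift sections are properly closed.

```python
SHIFT_OUT = 0x0E  # starts a double-byte section
SHIFT_IN = 0x0F   # ends a double-byte section


def single_byte_portion(data):
    """Return the bytes of data that lie outside shift-out/shift-in sections."""
    result = bytearray()
    in_double_byte = False
    for byte in data:
        if byte == SHIFT_OUT:
            in_double_byte = True
        elif byte == SHIFT_IN:
            in_double_byte = False
        elif not in_double_byte:
            result.append(byte)
    return bytes(result)
```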

Example: CHARSET

Begin definition for IBM Code Page 833, the single-byte component for IBM CCSID 933.

CHARSET NAME KOREAN_EBCDIC933 ENCODING IBMSOSI
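
The example supplies ENCODING because the name KOREAN_EBCDIC933 carries no standard suffix. The Python sketch below shows one way to check that for a directive line. It assumes, from this example alone, that operands appear as whitespace-separated keyword and value pairs; the real directive syntax may allow other forms. The names `parse_charset` and `has_standard_suffix` are hypothetical.

```python
import re

# Assumed suffix shape: "_" + digits + encoding character + optional character.
_SUFFIX = re.compile(r"_\d+[AEIRSTU].?$")


def parse_charset(line):
    """Split a CHARSET directive into its NAME and ENCODING operands."""
    tokens = line.split()
    if not tokens or tokens[0] != "CHARSET":
        raise ValueError("not a CHARSET directive")
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("operand without a value")
    operands = {"NAME": None, "ENCODING": None}
    for keyword, value in zip(rest[0::2], rest[1::2]):
        if keyword not in operands or operands[keyword] is not None:
            raise ValueError("unexpected operand: " + keyword)
        operands[keyword] = value
    return operands


def has_standard_suffix(name):
    """True when the name ends in a standard encoding suffix."""
    return _SUFFIX.search(name) is not None
```

Applied to the example line, `parse_charset` returns the name KOREAN_EBCDIC933 and the encoding IBMSOSI, and `has_standard_suffix` reports that the name alone does not determine the encoding.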