15.10 - Character Set Specification - MultiLoad

Teradata MultiLoad Reference

prodname: MultiLoad
vrm_release: 15.10
category: Programming Reference
featnum: B035-2409-035K

Character Set Specification

Teradata Database for a UNIX system and Teradata for Windows allow you to establish a character set when invoking Teradata MultiLoad. Table 13 lists the character sets supported by Teradata MultiLoad. Character sets containing “EBCDIC” as part of the name are for mainframe-attached clients; all others are for network-attached clients, except UTF-8, which is listed for both configurations.

 

Table 13: Character Sets Supported by Teradata MultiLoad

| Character Set Name | Description | Configuration |
|---|---|---|
| ASCII | Latin | Network-attached |
| EBCDIC | Latin | Mainframe-attached |
| HANGULEBCDIC933_1II | Korean | Mainframe-attached |
| HANGULKSC5601_2R4 | Korean | Network-attached |
| SCHEBCDIC935_2IJ | Simplified Chinese | Mainframe-attached |
| SCHGB2312_1T0 | Simplified Chinese | Network-attached |
| TCHBIG5_1R0 | Traditional Chinese | Network-attached |
| TCHEBCDIC937_31B | Traditional Chinese | Mainframe-attached |
| KATAKANAEBCDIC | Japanese | Mainframe-attached |
| KANJIEBCDIC5026_01 | Japanese | Mainframe-attached |
| KANJIEBCDIC5035_01 | Japanese | Mainframe-attached |
| KANJIEUC_0U | Japanese | Network-attached |
| KANJISJIS_0S | Japanese | Network-attached |
| UTF-8 or UTF8 | Unicode® | Mainframe-attached, Network-attached |
| UTF-16 or UTF16 | Unicode | Network-attached |
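As an illustration only, the contents of Table 13 can be held in a small lookup so that a job wrapper can reject a character set that does not match the client configuration. This is a minimal sketch: the names `CHARSETS`, `ALIASES`, and `is_supported` are hypothetical, and treating names as case-insensitive is an assumption, not something Table 13 states.

```python
MAINFRAME = "mainframe-attached"
NETWORK = "network-attached"

# Character set names and configurations copied from Table 13.
CHARSETS = {
    "ASCII": {NETWORK},
    "EBCDIC": {MAINFRAME},
    "HANGULEBCDIC933_1II": {MAINFRAME},
    "HANGULKSC5601_2R4": {NETWORK},
    "SCHEBCDIC935_2IJ": {MAINFRAME},
    "SCHGB2312_1T0": {NETWORK},
    "TCHBIG5_1R0": {NETWORK},
    "TCHEBCDIC937_31B": {MAINFRAME},
    "KATAKANAEBCDIC": {MAINFRAME},
    "KANJIEBCDIC5026_01": {MAINFRAME},
    "KANJIEBCDIC5035_01": {MAINFRAME},
    "KANJIEUC_0U": {NETWORK},
    "KANJISJIS_0S": {NETWORK},
    "UTF8": {MAINFRAME, NETWORK},
    "UTF16": {NETWORK},
}

# Table 13 lists two spellings for each Unicode character set.
ALIASES = {"UTF-8": "UTF8", "UTF-16": "UTF16"}


def is_supported(name: str, configuration: str) -> bool:
    """Return True if Table 13 lists `name` for the given configuration."""
    key = name.strip().upper()  # assumption: names compare case-insensitively
    key = ALIASES.get(key, key)
    return configuration in CHARSETS.get(key, set())
```

For example, `is_supported("UTF-8", MAINFRAME)` is true, while `is_supported("UTF16", MAINFRAME)` is false because Table 13 lists UTF-16 for network-attached clients only.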

Table 14 describes five ways to specify the character set or accept a default specification.

 

Table 14: Methods for Specifying Character Sets

| Method | Description |
|---|---|
| Configuration File Specification | One of the best ways to specify the character set is with the character set specification in the Teradata MultiLoad configuration file, as described earlier in this chapter: CHARSET=character-set-name. This allows a standard default character set for several or all Teradata MultiLoad runs, without having to specify the character set explicitly for each run. |
| Run‑time Parameter Specification | Another good way to specify the character set is with the character set run‑time parameter when Teradata MultiLoad is invoked, as described in “Run‑time Parameters” on page 33: CHARSET=character-set-name for mainframe-attached z/OS client systems; -c character-set-name for network-attached UNIX and Windows client systems. |
| Client System Specification | Another way is to specify the character set for the client system before invoking Teradata MultiLoad by configuring the HSHSPB parameter for mainframe-attached z/OS client systems, or the clispb.dat file for network-attached UNIX and Windows client systems. Note: The character-set-name specification used to invoke Teradata MultiLoad always takes precedence over the current client system specification. |
| Teradata Database Default | If a character-set-name specification is not used when Teradata MultiLoad is invoked, and there is no character set specification for the client system, the utility uses the default specification in the Teradata Database system table DBC.Hosts. Note: If relying on the DBC.Hosts table specification for the default character set, ensure that the initial logon is in the default character set: EBCDIC for mainframe-attached z/OS client systems; ASCII for network-attached UNIX and Windows client systems. |
| Teradata MultiLoad Utility Default | If there is no character set specification in DBC.Hosts, then Teradata MultiLoad defaults to EBCDIC for mainframe-attached z/OS client systems and ASCII for network-attached UNIX and Windows client systems. |

When an AXSMOD is used, Teradata MultiLoad will pass the session character set as an attribute to the AXSMOD for its possible use (most AXSMODs will not make any use of this information). The attribute name will be CHARSET_NAME and the attribute value will be a variable length character string consisting of the character set name.
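The fallback order described in Table 14 can be sketched as a simple first-match search. This is only an illustration, not how Teradata MultiLoad is implemented: the function name and parameters are hypothetical, and placing the run-time parameter ahead of the configuration file is an assumption, since this section only states that the invocation-time specification overrides the client system specification.

```python
from typing import Optional


def resolve_charset(
    runtime_param: Optional[str],
    config_file: Optional[str],
    client_system: Optional[str],
    dbc_hosts_default: Optional[str],
    mainframe_attached: bool,
) -> str:
    """Pick the first character set that is specified, in fallback order."""
    # Assumption: the run-time parameter outranks the configuration file.
    # Both outrank the client system setting (HSHSPB or clispb.dat),
    # which in turn outranks the DBC.Hosts default.
    for value in (runtime_param, config_file, client_system, dbc_hosts_default):
        if value:
            return value
    # Utility default when nothing else is specified.
    return "EBCDIC" if mainframe_attached else "ASCII"
```

With nothing specified anywhere, `resolve_charset(None, None, None, None, mainframe_attached=False)` falls through to the utility default, `"ASCII"`.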

Rules for Using Chinese and Korean Character Sets

Follow these rules when using Chinese and Korean character sets on mainframe-attached and network-attached platforms.

  • Object Names – Object names are limited to A-Z, a-z, 0-9, and special characters such as $ and _.
  • Maximum String Length – Teradata Database requires two bytes to process each of the Chinese or Korean characters. This limits both request size and record size. For example, if a record consists of one string, the length of that string is limited to a maximum of 32,000 characters or 64,000 bytes.

For more information about Chinese or Korean character set restrictions for Teradata Database, see International Character Set Support.

For more information about alternate character sets, see SQL Data Definition Language.
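The two rules above reduce to a pattern check and a small piece of arithmetic: 64,000 bytes at two bytes per character gives 32,000 characters. The sketch below is illustrative only. The helper names are hypothetical, and the pattern allows only `$` and `_` as special characters, which is an assumption because the rule says "such as".

```python
import re

# Assumption: "$" and "_" are the only special characters allowed here;
# the actual set may be larger.
_OBJECT_NAME = re.compile(r"[A-Za-z0-9$_]+")

BYTES_PER_CHAR = 2         # two bytes per Chinese or Korean character
MAX_STRING_BYTES = 64_000  # byte limit for a record made of one string


def is_allowed_object_name(name: str) -> bool:
    """Check a name against the restricted object-name alphabet."""
    return _OBJECT_NAME.fullmatch(name) is not None


def max_chars(byte_limit: int = MAX_STRING_BYTES) -> int:
    """Number of two-byte characters that fit in the byte limit."""
    return byte_limit // BYTES_PER_CHAR


def fits_in_record(text: str) -> bool:
    """True if a string of two-byte characters stays within the limit."""
    return len(text) * BYTES_PER_CHAR <= MAX_STRING_BYTES
```

Here `max_chars()` evaluates to 32,000, matching the limit quoted in the rule.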

UTF-8 and UTF-16 Character Sets

UTF-8 and UTF-16 are two of the standard ways of encoding Unicode character data.

The UTF-8 client character set supports UTF-8 encoding, and the UTF-16 client character set supports UTF-16 encoding.

Teradata Database supports multi-byte characters in object names when the UTF-8 or UTF-16 client character set is used. If multi-byte characters are used in object names in a Teradata MultiLoad script, they must be enclosed in double quotes.

Do not use the TABLE command when using the UTF-8 or UTF-16 client character set. Instead, specify the layout of the input record.

Teradata Database imposes restrictions on using the UTF-8 or UTF-16 character set. See International Character Set Support for restriction details.
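A script generator can apply the double-quote rule for multi-byte object names mechanically. The following is a minimal sketch with a hypothetical helper name. Doubling any embedded double quote follows the usual SQL convention for quoted identifiers and is an assumption here, not something this section specifies.

```python
def quote_if_multibyte(name: str) -> str:
    """Wrap an object name in double quotes if it has non-ASCII characters."""
    if name.isascii():
        # Every character is single-byte in UTF-8, so no quoting is required
        # by the multi-byte rule.
        return name
    # Assumption: embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'
```

A plain name such as `Sales` is returned unchanged, while a name containing multi-byte characters comes back enclosed in double quotes.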

UTF-8 Character Sets

Teradata MultiLoad supports the UTF-8 character set on network-attached platforms and IBM z/OS. When using the UTF-8 client character set on IBM z/OS, the job script must be in Teradata EBCDIC. Teradata MultiLoad translates commands in the job script from Teradata EBCDIC to UTF-8 during the load.

Before using the UTF-8 client character set on a mainframe platform, check the character set definition to determine the code points and the Teradata EBCDIC and Unicode character mapping. Different versions of EBCDIC do not always agree as to the placement of any special characters required in the job script. See International Character Set Support for details. For more information on using the UTF-8 client character set on mainframe platforms, see:

  • nullexpr and fieldexpr command parameters for the FIELD command in “FIELD” on page 127
  • VARTEXT format delimiter and WHERE condition for the IMPORT command in “IMPORT” on page 146
  • CONTINUEIF condition for the LAYOUT command in “LAYOUT” on page 159
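The point that EBCDIC versions disagree on special characters can be seen with two EBCDIC code pages that ship with Python, `cp037` and `cp500`. They are used here purely as stand-ins: neither is Teradata EBCDIC, whose mapping must be checked in the character set definition, and the helper names are hypothetical.

```python
def differing_characters(chars: str, codec_a: str = "cp037",
                         codec_b: str = "cp500") -> list:
    """Return the characters whose byte values differ between two codecs."""
    return [ch for ch in chars if ch.encode(codec_a) != ch.encode(codec_b)]


def ebcdic_to_utf8(data: bytes, codec: str = "cp037") -> bytes:
    """Translate EBCDIC bytes to UTF-8 by way of Unicode."""
    return data.decode(codec).encode("utf-8")


if __name__ == "__main__":
    # Letters and digits agree; some punctuation used in scripts does not.
    print(differing_characters("AZaz09|[]!"))
```

A delimiter such as the vertical bar is a typical case: if the script is authored under one EBCDIC variant and interpreted under another, the byte for that character may map to a different symbol.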
UTF-16 Character Sets

Teradata MultiLoad supports the UTF-16 character set on network-attached platforms. In general, the command language and the job output are the same as the client character set used by the job. However, the command language and the job output are not required to be the same as the client character set when using a UTF-16 character set. When using a UTF-16 character set, the job script and the job output can be either UTF-8 or UTF-16 character set, which is specified by the run‑time parameters “-i” and “-u” when the job is invoked.

For more information on the run‑time parameters, see the parameters -i scriptencoding and -u outputencoding in Table 9 on page 35. For more information on using the UTF-16 client character set, see:

  • nullexpr and fieldexpr command parameters for the FIELD command in “FIELD” on page 127
  • WHERE condition for the IMPORT command in “IMPORT” on page 146
  • CONTINUEIF condition for the LAYOUT command in “LAYOUT” on page 159
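As a rough sketch of how the three run-time parameters fit together for a UTF-16 session, the helper below assembles an argument list. The flags `-c`, `-i`, and `-u` come from this section. The executable name `mload`, the function name, and the exact spellings accepted for the encoding values are assumptions to be checked against Table 9.

```python
def build_utf16_args(script_encoding: str = "UTF8",
                     output_encoding: str = "UTF8",
                     executable: str = "mload") -> list:
    """Build an argument list for a job that uses the UTF-16 client character set."""
    allowed = {"UTF8", "UTF16"}  # script and output may be UTF-8 or UTF-16
    for value in (script_encoding, output_encoding):
        if value not in allowed:
            raise ValueError(f"unsupported encoding: {value}")
    return [
        executable,              # assumption: executable name on the client
        "-c", "UTF16",           # client (session) character set
        "-i", script_encoding,   # encoding of the job script
        "-u", output_encoding,   # encoding of the job output
    ]
```

The resulting list could be passed to a process launcher, with the job script supplied however the site normally invokes Teradata MultiLoad.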
User-Defined Session Character Sets

Teradata MultiLoad also supports user-defined session character sets when the character sets defined for Teradata Database are not appropriate for the site. Use a session character set that matches the data, such as:

  • EBCDIC037_0E for mainframe-attached clients (for the United States or Canada)
  • LATIN1_0A, LATIN9_0A (for Western European languages) or UTF-8 for UNIX system clients
  • LATIN1252_0A for Western European Windows clients

Do not use ASCII or EBCDIC session character sets.

For information on defining a character set, see SQL Data Definition Language.

Backwards Compatibility Issues with Character Sets

Support for certain character sets depends on the version of Teradata Database in use. Chinese and Korean character sets are available for mainframe-attached and network-attached client systems.

Note: If you have used ASCII or EBCDIC for loading in the past, do not change the character sets in use without first considering compatibility with earlier versions and the character set support they provide.

Considerations for Multibyte Character Sets

Teradata Database supports multibyte characters in object names when the client session character set is UTF-8 or UTF-16. Refer to International Character Set Support for a list of valid characters used in object names. If multibyte characters are used in object names in a Teradata MultiLoad script, they must be enclosed in double quotes.

Table 15 shows how multibyte character sets impact the operation of certain Teradata MultiLoad commands, as well as object names in Teradata SQL statements.

     

Table 15: Commands Impacting Multibyte Character Sets

| Teradata MultiLoad Command | Affected Elements | Impact |
|---|---|---|
| ACCEPT | Utility variables | The utility variables can have multibyte characters. If the client does not allow multibyte character set names, then the file name must be in uppercase English. |
| BEGIN MLOAD | Table names: target tables, work tables, error tables | Target table names, work table names, and error table names can have multibyte characters. See “BEGIN MLOAD and BEGIN DELETE MLOAD” on page 104. |
| DML LABEL | DML label name | The label name in a DML LABEL command can have multibyte characters. The label name may be referenced in the APPLY clause of an IMPORT command. |
| FIELD | Field name | The field name specified can have multibyte characters. The name can be referenced in other FIELD commands, in NULLIF and field concatenation expressions, and in APPLY WHERE conditions in IMPORT commands. The FIELD command can also contain a NULLIF expression, which may use multibyte characters. |
| FILLER | Filler name | The name specified in a FILLER command can have multibyte characters. |
| IF | IF condition | The condition in an IF statement can compare multibyte character strings. See “IF, ELSE, and ENDIF” on page 144. |
| LAYOUT | Layout name | The layout name can have multibyte characters and can be used in the LAYOUT clause of an IMPORT command. |
| LAYOUT | CONTINUEIF condition | The CONTINUEIF condition can specify multibyte character set character comparisons. |
| LOGON | User name and password | The user name and password can have multibyte characters. |
| LOGTABLE | Table and database names | The restart log table name and database name can have multibyte characters. |
| SET | Utility variable | The utility variable can have multibyte characters and can be substituted wherever substitution is allowed. |
| TABLE | Table and database names | The table name (tableref, and the database name if the table name is fully qualified) specified in a TABLE command can have multibyte characters. Note: Do not use the TABLE command while using UTF-8 and UTF-16 character sets. |