Character Set Specification

Teradata Database for a UNIX system and Teradata for Windows allow you to establish a character set when invoking Teradata MultiLoad. Table 13 lists the character sets supported by Teradata MultiLoad. Character sets containing “EBCDIC” as part of the name are for mainframe-attached clients; all others are for network-attached clients.

Table 13: Character Sets Supported by Teradata MultiLoad
Character Set Name	Description	Configuration
ASCII	Latin	Network-attached
EBCDIC	Latin	Mainframe-attached
HANGULEBCDIC933_1II	Korean	Mainframe-attached
HANGULKSC5601_2R4	Korean	Network-attached
SCHEBCDIC935_2IJ	Simplified Chinese	Mainframe-attached
SCHGB2312_1T0	Simplified Chinese	Network-attached
TCHBIG5_1R0	Traditional Chinese	Network-attached
TCHEBCDIC937_31B	Traditional Chinese	Mainframe-attached
KATAKANAEBCDIC	Japanese	Mainframe-attached
KANJIEBCDIC5026_01	Japanese	Mainframe-attached
KANJIEBCDIC5035_01	Japanese	Mainframe-attached
KANJIEUC_0U	Japanese	Network-attached
KANJISJIS_0S	Japanese	Network-attached
UTF-8 or UTF8	Unicode®	Mainframe-attached, Network-attached
UTF-16 or UTF16	Unicode	Network-attached
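
The naming rule stated before Table 13 can be sketched as a small Python check. This is an illustration only and not part of Teradata MultiLoad: the function name attachment_for is hypothetical, and the UTF-8 special case is taken from the Table 13 row that lists it for both configurations.

```python
def attachment_for(charset_name: str) -> set:
    """Return the client configurations Table 13 lists for a character set name."""
    # Treat "UTF-8" and "UTF8" (and "UTF-16"/"UTF16") as the same name.
    name = charset_name.upper().replace("-", "")
    if name == "UTF8":
        # Table 13 lists UTF-8 for both configurations.
        return {"mainframe-attached", "network-attached"}
    if "EBCDIC" in name:
        # Names containing "EBCDIC" are for mainframe-attached clients.
        return {"mainframe-attached"}
    # All other names are for network-attached clients.
    return {"network-attached"}
```

For example, attachment_for("KANJIEBCDIC5026_01") yields only the mainframe-attached configuration, while attachment_for("KANJISJIS_0S") yields only the network-attached one.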

Table 14 describes five ways to specify the character set or accept a default specification.

Table 14: Methods for Specifying Character Sets
Method	Description
Configuration File Specification	One of the best ways to specify the character set is with the character set specification in the Teradata MultiLoad configuration file, as described earlier in this chapter: CHARSET=character-set-name. This allows a standard default character set for several or all Teradata MultiLoad runs, without having to specify the character set explicitly for each run.
Run‑time Parameter Specification	Another good way to specify the character set is with the character set run‑time parameter when Teradata MultiLoad is invoked, as described in “Run‑time Parameters” on page 33: CHARSET=character-set-name for mainframe-attached z/OS client systems; -c character-set-name for network-attached UNIX and Windows client systems.
Client System Specification	Another way is to specify the character set for the client system before invoking Teradata MultiLoad by configuring the HSHSPB parameter for mainframe-attached z/OS client systems, or the clispb.dat file for network-attached UNIX and Windows client systems. Note: The character-set-name specification used to invoke Teradata MultiLoad always takes precedence over the current client system specification.
Teradata Database Default	If a character-set-name specification is not used when Teradata MultiLoad is invoked, and there is no character set specification for the client system, the utility uses the default specification in the Teradata Database system table DBC.Hosts. Note: If relying on the DBC.Hosts table specification for the default character set, ensure that the initial logon is in the default character set: EBCDIC for mainframe-attached z/OS client systems; ASCII for network-attached UNIX and Windows client systems.
Teradata MultiLoad Utility Default	If there is no character set specification in DBC.Hosts, then Teradata MultiLoad defaults to EBCDIC for mainframe-attached z/OS client systems and to ASCII for network-attached UNIX and Windows client systems.
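
The fallback chain in Table 14 can be sketched in Python as follows. This is a minimal model, not MultiLoad's actual logic. The function and parameter names are hypothetical, and the relative order of the run-time parameter and the configuration file is an assumption, since Table 14 does not state which of the two wins. The rest of the order follows the table: an explicit specification overrides the client system setting, then DBC.Hosts applies, then the utility default.

```python
from typing import Optional


def resolve_charset(
    runtime_param: Optional[str],
    config_file: Optional[str],
    client_system: Optional[str],
    dbc_hosts: Optional[str],
    mainframe_attached: bool,
) -> str:
    """Return the first character set found, else the utility default."""
    # Assumption: the run-time parameter is checked before the configuration file.
    # The order of the remaining sources follows Table 14.
    for candidate in (runtime_param, config_file, client_system, dbc_hosts):
        if candidate:
            return candidate
    # Utility default from Table 14: EBCDIC on mainframe-attached clients,
    # ASCII on network-attached clients.
    return "EBCDIC" if mainframe_attached else "ASCII"
```

With no specification at any level, resolve_charset(None, None, None, None, mainframe_attached=False) falls through to "ASCII".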

When an AXSMOD is used, Teradata MultiLoad will pass the session character set as an attribute to the AXSMOD for its possible use (most AXSMODs will not make any use of this information). The attribute name will be CHARSET_NAME and the attribute value will be a variable length character string consisting of the character set name.

Rules for Using Chinese and Korean Character Sets

Follow these rules when using Chinese and Korean character sets on mainframe-attached and network-attached platforms.

Object Names. Object names are limited to A-Z, a-z, 0-9, and special characters such as $ and _.

Maximum String Length. Teradata Database requires two bytes to process each of the Chinese or Korean characters. This limits both request size and record size. For example, if a record consists of one string, the length of that string is limited to a maximum of 32,000 characters or 64,000 bytes.
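
The arithmetic behind this limit can be written out as a short Python sketch. The constant and function names are hypothetical, and the check assumes that every character in the string is a two-byte Chinese or Korean character, as in the example above.

```python
BYTES_PER_CHARACTER = 2    # each Chinese or Korean character needs two bytes
MAX_STRING_BYTES = 64_000  # byte limit from the single-string example above


def max_characters(max_bytes: int = MAX_STRING_BYTES) -> int:
    """Number of two-byte characters that fit in max_bytes."""
    return max_bytes // BYTES_PER_CHARACTER


def fits(text: str) -> bool:
    """True if text fits the limit, assuming two bytes per character."""
    return len(text) * BYTES_PER_CHARACTER <= MAX_STRING_BYTES
```

Here max_characters() evaluates to 32000, which matches the 32,000-character figure quoted in the rule.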

For more information about Chinese or Korean character set restrictions for Teradata Database, see International Character Set Support.

For more information about alternate character sets, see SQL Data Definition Language.

UTF-8 and UTF-16 Character Sets

Unicode character sets UTF-8 and UTF-16 are two of the standard ways of encoding Unicode character data.

The UTF-8 client character set supports UTF-8 encoding, and the UTF-16 client character set supports UTF-16 encoding.

Teradata Database supports multi-byte characters in object names when UTF-8 and UTF-16 client character sets are used. If multi-byte characters are used in object names in a Teradata MultiLoad script, they must be enclosed in double quotes.
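
When a script is generated by a program, the quoting rule can be applied with a helper along these lines. This is a hedged sketch: the function name is hypothetical, it treats any non-ASCII character as multi-byte (which holds for UTF-8), and it assumes that an embedded double quote is escaped by doubling it, as is usual for SQL delimited identifiers.

```python
def quote_if_multibyte(name: str) -> str:
    """Wrap an object name in double quotes if it has non-ASCII characters."""
    if name.isascii():
        # Plain ASCII names are returned unchanged.
        return name
    # Assumption: an embedded double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'
```

A name such as Sales is returned unchanged, while a name containing Japanese or Korean characters comes back enclosed in double quotes.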

Do not use the TABLE command when using the UTF-8 or UTF-16 client character set. Instead, specify the layout of the input record.

There are restrictions imposed by Teradata Database on using the UTF-8 or UTF-16 character set. See International Character Set Support for restriction details.

UTF-8 Character Sets

Teradata MultiLoad supports the UTF-8 character set on network-attached platforms and IBM z/OS. When using the UTF-8 client character set on IBM z/OS, the job script must be in Teradata EBCDIC. Teradata MultiLoad translates commands in the job script from Teradata EBCDIC to UTF-8 during the load.

Before using the UTF-8 client character set on a mainframe platform, check the character set definition to determine the code points and the Teradata EBCDIC and Unicode character mapping. Different versions of EBCDIC do not always agree as to the placement of any special characters required in the job script. See International Character Set Support for details. For more information on using the UTF-8 client character set on mainframe platforms, see:

nullexpr and fieldexpr command parameters for the FIELD command in “FIELD” on page 132

VARTEXT format delimiter and WHERE condition for the IMPORT command in “IMPORT” on page 152

CONTINUEIF condition for the LAYOUT command in “LAYOUT” on page 166
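
The warning above that EBCDIC versions place special characters differently can be illustrated in Python. The sketch below compares two EBCDIC code pages that ship with Python, cp037 and cp500. They are stand-ins chosen for illustration, and neither is claimed to match Teradata EBCDIC. The function name is hypothetical.

```python
def differing_characters(chars: str, codec_a: str = "cp037", codec_b: str = "cp500") -> list:
    """Return the characters whose encoded bytes differ between two codecs."""
    # Assumption: cp037 and cp500 stand in for "different versions of EBCDIC".
    return [c for c in chars if c.encode(codec_a) != c.encode(codec_b)]


# Special characters that a job script may need, versus plain letters and digits.
special_mismatches = differing_characters("[]!|^")
alnum_mismatches = differing_characters("ABCxyz019")
```

Letters and digits encode identically in both code pages, so alnum_mismatches is empty, whereas special_mismatches is not. This is why the code points of the special characters should be checked against the character set definition before a mainframe job script is written.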

UTF-16 Character Sets

Teradata MultiLoad supports the UTF-16 character set on network-attached platforms. In general, the command language and the job output use the same character set as the client character set of the job. With a UTF-16 client character set, however, this is not required: the job script and the job output can each be in either the UTF-8 or the UTF-16 character set, as specified by the run‑time parameters “-i” and “-u” when the job is invoked.
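
For jobs launched from a wrapper program, the character-set arguments can be assembled as in the Python sketch below. Only the option letters -c, -i, and -u come from this chapter. The function name is hypothetical, the spellings "UTF8" and "UTF16" for the -i and -u values are assumptions, and the executable name is deliberately left out.

```python
def charset_args(session_charset, script_encoding=None, output_encoding=None):
    """Build the character-set arguments for a network-attached invocation."""
    is_utf16 = session_charset.upper().replace("-", "") == "UTF16"
    if (script_encoding or output_encoding) and not is_utf16:
        # The text above describes -i and -u only for a UTF-16 client character set.
        raise ValueError("-i and -u apply only with a UTF-16 client character set")
    args = ["-c", session_charset]
    if script_encoding:
        args += ["-i", script_encoding]  # encoding of the job script
    if output_encoding:
        args += ["-u", output_encoding]  # encoding of the job output
    return args
```

Calling charset_args("UTF16", "UTF8", "UTF8") returns a list that can be appended to the command used to start the job. Calling it with a session character set other than UTF-16 together with -i or -u values raises an error rather than producing an invocation the text does not describe.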

For more information on the run‑time parameters, see the parameters -i scriptencoding and -u outputencoding in Table 9 on page 35. For more information on using the UTF-16 client character set, see:

nullexpr and fieldexpr command parameters for the FIELD command in “FIELD” on page 132

WHERE condition for the IMPORT command in “IMPORT” on page 152

CONTINUEIF condition for the LAYOUT command in “LAYOUT” on page 166

User-Defined Session Character Sets

Teradata MultiLoad also supports user-defined session character sets when the character sets defined for Teradata Database are not appropriate for the site. Use a session character set that matches the data, such as:

EBCDIC037_0E for mainframe-attached clients (for the United States or Canada)

LATIN1_0A, LATIN9_0A (for Western European languages) or UTF-8 for UNIX system clients

LATIN1252_0A for Western European Windows clients

Do not use ASCII or EBCDIC session character sets.

For information on defining a character set, see SQL Data Definition Language.

Backwards Compatibility Issues with Character Sets

Support for certain character sets depends on the version of Teradata Database in use. Chinese and Korean character sets are available for mainframe-attached and network-attached client systems.

Note: If ASCII or EBCDIC was used for loading in the past, do not change the character sets in use without first considering possible compatibility issues with earlier versions and their character-set support.

Considerations for Multibyte Character Sets

Teradata Database supports multibyte characters in object names when the client session character set is UTF-8 or UTF-16. Refer to International Character Set Support for a list of valid characters used in object names. If multibyte characters are used in object names in a Teradata MultiLoad script, they must be enclosed in double quotes.

Table 15 shows how multibyte character sets impact the operation of certain Teradata MultiLoad commands, as well as object names in Teradata SQL statements.

Table 15: Commands Impacting Multibyte Character Sets
Teradata MultiLoad Command	Affected Elements	Impact
ACCEPT	Utility variables	The utility variables can have multibyte characters. If the client does not allow multibyte character set names, then the file name must be in uppercase English.
BEGIN MLOAD	Table names: target tables, work tables, error tables	Target table names, work table names, and error table names can have multibyte characters. See “BEGIN MLOAD and BEGIN DELETE MLOAD” on page 106.
DML LABEL	DML label name	The label name in a DML LABEL command can have multibyte characters. The label name may be referenced in the APPLY clause of an IMPORT command.
FIELD	Field name	The field name specified can have multibyte characters. The name can be referenced in other FIELD commands, in NULLIF and field concatenation expressions, and in APPLY WHERE conditions in IMPORT commands. The FIELD command can also contain a NULLIF expression, which may use multibyte characters.
FILLER	Filler name	The name specified in a FILLER command can have multibyte characters.
IF	IF condition	The condition in an IF statement can compare multibyte character strings. See “IF, ELSE, and ENDIF” on page 150.
LAYOUT	Layout name	The layout name can have multibyte characters and can be used in the LAYOUT clause of an IMPORT command.
LAYOUT	CONTINUEIF condition	The CONTINUEIF condition can specify multibyte character set character comparisons.
LOGON	User name and password	The user name and password can have multibyte characters.
LOGTABLE	Table and database names	The restart log table name and database name can have multibyte characters.
SET	Utility variable	The utility variable can have multibyte characters and can be substituted wherever substitution is allowed.
TABLE	Table and database names	The table name (tableref, and the database name if the table name is fully qualified) specified in a TABLE command can have multibyte characters. Note: Do not use the TABLE command while using the UTF-8 or UTF-16 character set.

Teradata MultiLoad Reference