Character Set Specification
Teradata Database for UNIX systems and Teradata Database for Windows allow a character set to be established when Teradata MultiLoad is invoked. Table 13 lists the character sets supported by Teradata MultiLoad. Character sets whose names contain “EBCDIC” are for mainframe-attached clients; all others are for network-attached clients (UTF-8 is the exception and is listed for both).
Character Set Name | Description | Configuration
ASCII | Latin | Network-attached
EBCDIC | Latin | Mainframe-attached
HANGULEBCDIC933_1II | Korean | Mainframe-attached
HANGULKSC5601_2R4 | Korean | Network-attached
SCHEBCDIC935_2IJ | Simplified Chinese | Mainframe-attached
SCHGB2312_1T0 | Simplified Chinese | Network-attached
TCHBIG5_1R0 | Traditional Chinese | Network-attached
TCHEBCDIC937_31B | Traditional Chinese | Mainframe-attached
KATAKANAEBCDIC | Japanese | Mainframe-attached
KANJIEBCDIC5026_01 | Japanese | Mainframe-attached
KANJIEBCDIC5035_01 | Japanese | Mainframe-attached
KANJIEUC_0U | Japanese | Network-attached
KANJISJIS_0S | Japanese | Network-attached
UTF-8 or UTF8 | Unicode® | Mainframe-attached, Network-attached
UTF-16 or UTF16 | Unicode | Network-attached
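To check a planned job against Table 13, the table can be restated as a lookup. The following Python sketch does only that; SUPPORTED_CHARSETS, is_supported, and follows_ebcdic_rule are hypothetical helpers, not part of Teradata MultiLoad. It also confirms that every entry fits the EBCDIC naming rule, with UTF-8 as the one name listed for both configurations.

```python
# Hypothetical restatement of Table 13; not a Teradata API.
SUPPORTED_CHARSETS = {
    "ASCII": ("Latin", {"network"}),
    "EBCDIC": ("Latin", {"mainframe"}),
    "HANGULEBCDIC933_1II": ("Korean", {"mainframe"}),
    "HANGULKSC5601_2R4": ("Korean", {"network"}),
    "SCHEBCDIC935_2IJ": ("Simplified Chinese", {"mainframe"}),
    "SCHGB2312_1T0": ("Simplified Chinese", {"network"}),
    "TCHBIG5_1R0": ("Traditional Chinese", {"network"}),
    "TCHEBCDIC937_31B": ("Traditional Chinese", {"mainframe"}),
    "KATAKANAEBCDIC": ("Japanese", {"mainframe"}),
    "KANJIEBCDIC5026_01": ("Japanese", {"mainframe"}),
    "KANJIEBCDIC5035_01": ("Japanese", {"mainframe"}),
    "KANJIEUC_0U": ("Japanese", {"network"}),
    "KANJISJIS_0S": ("Japanese", {"network"}),
    "UTF8": ("Unicode", {"mainframe", "network"}),
    "UTF16": ("Unicode", {"network"}),
}


def normalize(name: str) -> str:
    """Map the alternate spellings in Table 13 (UTF-8, UTF-16) to one key."""
    name = name.strip()
    return {"UTF-8": "UTF8", "UTF-16": "UTF16"}.get(name, name)


def is_supported(name: str, client: str) -> bool:
    """True if the character set is listed for client 'mainframe' or 'network'."""
    entry = SUPPORTED_CHARSETS.get(normalize(name))
    return entry is not None and client in entry[1]


def follows_ebcdic_rule(name: str) -> bool:
    """Check the naming rule: 'EBCDIC' in the name means mainframe, otherwise network."""
    key = normalize(name)
    expected = "mainframe" if "EBCDIC" in key else "network"
    return expected in SUPPORTED_CHARSETS[key][1]
```

For example, is_supported("UTF-8", "mainframe") is true, while is_supported("UTF16", "mainframe") is false because UTF-16 is listed only for network-attached clients.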
Table 14 describes five ways to specify the character set or accept a default specification.
Method | Description
Configuration File Specification | One of the best ways to specify the character set is with the character set specification in the Teradata MultiLoad configuration file, as described earlier in this chapter: CHARSET=character-set-name. This allows a standard default character set for several or all Teradata MultiLoad runs, without having to specify the character set explicitly for each run.
Run-time Parameter Specification | Another good way to specify the character set is with the character set run-time parameter when Teradata MultiLoad is invoked, as described in “Run-time Parameters” on page 33:
Client System Specification | Another way is to specify the character set for the client system before invoking Teradata MultiLoad by configuring the: Note: The character-set-name specification used to invoke Teradata MultiLoad always takes precedence over the current client system specification.
Teradata Database Default | If a character-set-name specification is not used when Teradata MultiLoad is invoked, and there is no character set specification for the client system, the utility uses the default specification in the Teradata Database system table DBC.Hosts. Note: If relying on the DBC.Hosts table specification for the default character set, ensure that the initial logon is in the default character set:
Teradata MultiLoad Utility Default | If there is no character set specification in DBC.Hosts, then Teradata MultiLoad defaults to:
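The fallback order in Table 14 can be sketched as a small Python function. This is an illustration, not MultiLoad's implementation: the function and parameter names are hypothetical, the configuration file parser assumes only the CHARSET=character-set-name form shown above, and checking the run-time parameter before the configuration file is an assumption because the table does not state their relative order. The utility default is taken as an argument rather than hard-coded.

```python
from typing import Iterable, Optional


def read_config_charset(lines: Iterable[str]) -> Optional[str]:
    """Return the value of a CHARSET=character-set-name line, if present.

    Hypothetical parser: assumes one KEY=value entry per line.
    """
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip().upper() == "CHARSET":
            return value.strip()
    return None


def resolve_charset(
    runtime_param: Optional[str],
    config_file: Optional[str],
    client_system: Optional[str],
    dbc_hosts_default: Optional[str],
    utility_default: str,
) -> tuple[str, str]:
    """Return (character set, source) following the fallback order of Table 14."""
    # Assumption: the run-time parameter is checked before the configuration file.
    # Both are treated as invocation-time specifications, which per the
    # Client System note outrank the client system setting.
    candidates = [
        ("run-time parameter", runtime_param),
        ("configuration file", config_file),
        ("client system", client_system),
        ("DBC.Hosts", dbc_hosts_default),
    ]
    for source, value in candidates:
        if value:
            return value, source
    return utility_default, "utility default"
```

Returning the source alongside the name makes it easy to log why a job ended up in a given session character set.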
When an access module (AXSMOD) is used, Teradata MultiLoad passes the session character set to it as an attribute named CHARSET_NAME. The attribute value is a variable-length character string containing the character set name. The AXSMOD may use this information, although most AXSMODs do not.
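The following Python sketch models how an access module could act on the CHARSET_NAME attribute. Real access modules are not written this way, so decoder_for is hypothetical, and the mapping from Teradata character set names to Python codecs is an approximation: Teradata's definitions can differ from Python's codecs in individual code points. An incremental decoder is used because a multibyte character can be split across two input blocks.

```python
import codecs
from typing import Mapping, Optional

# Approximate Python codec for each Teradata session character set (assumption).
PY_CODEC_FOR = {
    "UTF8": "utf-8",
    "KANJISJIS_0S": "shift_jis",
    "KANJIEUC_0U": "euc_jp",
    "TCHBIG5_1R0": "big5",
    "SCHGB2312_1T0": "gb2312",
    "HANGULKSC5601_2R4": "euc_kr",
}


def decoder_for(attributes: Mapping[str, str]) -> Optional[codecs.IncrementalDecoder]:
    """Pick a decoder from the CHARSET_NAME attribute.

    Returns None when the attribute is absent or unmapped, i.e. the module
    ignores the character set, as most AXSMODs do.
    """
    name = attributes.get("CHARSET_NAME")
    codec = PY_CODEC_FOR.get(name) if name else None
    return codecs.getincrementaldecoder(codec)() if codec else None
```

Feeding the decoder the first byte of a two-byte Shift-JIS character returns an empty string; the character is emitted once the remaining byte arrives.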
Rules for Using Chinese and Korean Character Sets
Follow these rules when using Chinese and Korean character sets on mainframe-attached and network-attached platforms.
For more information about Chinese or Korean character set restrictions for Teradata Database, see International Character Set Support.
For more information about alternate character sets, see SQL Data Definition Language.
UTF-8 and UTF-16 Character Sets
UTF-8 and UTF-16 are two of the standard encodings for Unicode character data.
The UTF-8 client character set supports UTF-8 encoding, and the UTF-16 client character set supports UTF-16 encoding.
Teradata Database supports multibyte characters in object names when the UTF-8 or UTF-16 client character set is used. If multibyte characters are used in object names in a Teradata MultiLoad script, the names must be enclosed in double quotes.
Do not use the TABLE command with the UTF-8 or UTF-16 client character set. Instead, define the layout of the input record explicitly with the LAYOUT, FIELD, and FILLER commands.
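For UTF-8 and UTF-16 sessions, the two rules above combine: names with multibyte characters go in double quotes, and the record layout is spelled out field by field rather than taken from a TABLE command. The Python sketch below generates such a layout. The helper names are hypothetical, and three details are assumptions to verify against the LAYOUT and FIELD command descriptions: quoting only names that contain non-ASCII characters, doubling embedded double quotes as in standard SQL quoted identifiers, and using * as the start position to mean the byte after the previous field.

```python
def quote_name(name: str) -> str:
    """Enclose a name in double quotes if it contains non-ASCII (multibyte) characters."""
    if name.isascii():
        return name
    # Assumption: embedded double quotes are doubled, as in SQL quoted identifiers.
    return '"' + name.replace('"', '""') + '"'


def layout_block(layout: str, fields: list[tuple[str, str]]) -> str:
    """Build .LAYOUT/.FIELD lines describing the input record explicitly."""
    lines = [f".LAYOUT {quote_name(layout)};"]
    for name, datadesc in fields:
        # '*' places each field immediately after the previous one (assumption).
        lines.append(f".FIELD {quote_name(name)} * {datadesc};")
    return "\n".join(lines)


print(layout_block("InLayout", [("CustId", "INTEGER"), ("顧客名", "VARCHAR(30)")]))
```

Quoting every name, ASCII or not, would also be valid SQL style; the sketch quotes only where the text requires it so that the generated script stays close to hand-written ones.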
Teradata Database imposes restrictions on the use of the UTF-8 and UTF-16 character sets. See International Character Set Support for details.
UTF-8 Character Sets
Teradata MultiLoad supports the UTF-8 character set on network-attached platforms and on IBM z/OS. When the UTF-8 client character set is used on IBM z/OS, the job script must be in Teradata EBCDIC. Teradata MultiLoad translates the commands in the job script from Teradata EBCDIC to UTF-8 during the load.
Before using the UTF-8 client character set on a mainframe platform, check the character set definition to determine the code points and the mapping between Teradata EBCDIC and Unicode characters. Different versions of EBCDIC do not always agree on the code points of special characters that the job script may require. See International Character Set Support for details. For more information on using the UTF-8 client character set on mainframe platforms, see:
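The translation step and the EBCDIC caveat can be illustrated with Python's built-in EBCDIC codecs. Python does not ship Teradata EBCDIC, so cp037 (US/Canada) and cp500 (International) stand in here for two EBCDIC versions; this substitution is an assumption made only to show the effect. The function names are hypothetical.

```python
def to_utf8(script_bytes: bytes, ebcdic_codec: str) -> bytes:
    """Decode EBCDIC job-script bytes and re-encode them as UTF-8."""
    return script_bytes.decode(ebcdic_codec).encode("utf-8")


def differing_chars(chars: str, page_a: str, page_b: str) -> dict[str, tuple[bytes, bytes]]:
    """Map each character whose code point differs between two EBCDIC code pages
    to its encodings in both."""
    return {
        c: (c.encode(page_a), c.encode(page_b))
        for c in chars
        if c.encode(page_a) != c.encode(page_b)
    }


script = ".LOGTABLE restartlog;".encode("cp037")
print(to_utf8(script, "cp037"))                    # b'.LOGTABLE restartlog;'
print(differing_chars("[]!.;", "cp037", "cp500"))  # only '[', ']' and '!' differ
print("[".encode("cp500").decode("cp037"))         # a bracket read back as a cent sign
```

The last line shows why the character set definition matters: a bracket typed under one EBCDIC variant is read as a different character under another, while invariant characters such as the period and semicolon survive unchanged.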
UTF-16 Character Sets
Teradata MultiLoad supports the UTF-16 character set on network-attached platforms. In general, the command language and the job output use the same character set as the client character set of the job. With the UTF-16 client character set, however, they need not match it: the job script and the job output can each be in UTF-8 or UTF-16, as specified by the run-time parameters -i and -u when the job is invoked.
For more information on these run-time parameters, see -i scriptencoding and -u outputencoding in Table 9 on page 35. For more information on using the UTF-16 client character set, see:
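As a sketch of how a UTF-16 job might be invoked, the Python function below assembles an mload argument list without running it. Only -i and -u come from the text above; the -c flag for the client character set and the UTF8/UTF16 value spellings are assumptions to confirm in Table 9, and mload_argv is a hypothetical helper.

```python
from typing import Optional

SCRIPT_OUTPUT_ENCODINGS = {"UTF8", "UTF16"}


def mload_argv(charset: str, script_enc: Optional[str] = None,
               output_enc: Optional[str] = None) -> list[str]:
    """Return an mload argument list; -i/-u are added only for a UTF-16 session."""
    argv = ["mload", "-c", charset]  # "-c" is an assumed flag name
    if charset != "UTF16":
        # Outside UTF-16, the script and output follow the client character set,
        # so any requested encoding must match it and no -i/-u flag is added.
        for enc in (script_enc, output_enc):
            if enc is not None and enc != charset:
                raise ValueError("script/output encoding must match a non-UTF-16 client character set")
        return argv
    for flag, enc in (("-i", script_enc), ("-u", output_enc)):
        if enc is None:
            continue
        if enc not in SCRIPT_OUTPUT_ENCODINGS:
            raise ValueError(f"{flag} must be UTF8 or UTF16, not {enc!r}")
        argv += [flag, enc]
    return argv
```

Keeping the validation in one place means a mistyped encoding fails before the job starts rather than surfacing as unreadable script or output.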
User-Defined Session Character Sets
Teradata MultiLoad also supports user-defined session character sets when the character sets defined for Teradata Database are not appropriate for the site. Use a session character set that matches the data, such as:
Do not use ASCII or EBCDIC session character sets.
For information on defining a character set, see SQL Data Definition Language.
Backwards Compatibility Issues with Character Sets
Support for certain character sets depends on the version of Teradata Database in use. Chinese and Korean character sets are available for mainframe-attached and network-attached client systems.
Note: If ASCII or EBCDIC has been used for loading in the past, do not change the character sets in use without first considering compatibility with earlier versions and their character set support.
Considerations for Multibyte Character Sets
Teradata Database supports multibyte characters in object names when the client session character set is UTF-8 or UTF-16. Refer to International Character Set Support for a list of valid characters in object names. If multibyte characters are used in object names in a Teradata MultiLoad script, the names must be enclosed in double quotes.
Table 15 shows how multibyte character sets impact the operation of certain Teradata MultiLoad commands, as well as object names in Teradata SQL statements.
Teradata MultiLoad Command | Affected Elements | Impact
ACCEPT | Utility variables | The utility variables can have multibyte characters. If the client does not allow multibyte character set names, then the file name must be in uppercase English.
BEGIN MLOAD | Table names | Target table names, work table names, and error table names can have multibyte characters.
DML LABEL | DML label name | The label name in a DML LABEL command can have multibyte characters. The label name can be referenced in the APPLY clause of an IMPORT command.
FIELD | Field name | The field name can have multibyte characters. The name can be referenced in other FIELD commands, in NULLIF and field concatenation expressions, and in APPLY WHERE conditions in IMPORT commands. The FIELD command can also contain a NULLIF expression, which may use multibyte characters.
FILLER | Filler name | The name specified in a FILLER command can have multibyte characters.
IF | IF condition | The condition in an IF statement can compare multibyte character strings.
LAYOUT | Layout name | The layout name can:
LAYOUT | CONTINUEIF condition | The CONTINUEIF condition can specify comparisons of multibyte characters.
LOGON | User name and password | The user name and password can have multibyte characters.
LOGTABLE | Table and database names | The restart log table name and database name can have multibyte characters.
SET | Utility variable | The utility variable can:
TABLE | Table and database names | The table name (tableref, and the database name if the table name is fully qualified) specified in a TABLE command can have multibyte characters. Note: Do not use the TABLE command with the UTF-8 or UTF-16 character set.