UTF-8 and UTF-16 are two standard encodings of Unicode character data, and the database supports both as client character sets. The UTF-8 client character set supports UTF-8 encoding. Currently, the database supports UTF-8 characters that are one to three bytes long. The UTF-16 client character set supports UTF-16 encoding. Currently, the database supports the Unicode 5.0 standard, where each defined character requires exactly 16 bits.
The database imposes restrictions on the use of the UTF-8 and UTF-16 character sets. See International Character Set Support, B035-1125, for details.
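The size limits above can be checked against sample data before a load. The following is a minimal sketch, not part of Teradata TPump. It assumes the stated limits (one to three UTF-8 bytes, exactly 16 bits in UTF-16) correspond to code points up to U+FFFF. The function names `encoded_sizes` and `fits_stated_limits` are hypothetical.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    # "utf-16-be" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


def fits_stated_limits(text: str) -> bool:
    """Return True if every character has a code point of U+FFFF or below.

    Assumption: such characters need at most three UTF-8 bytes and exactly
    two bytes (16 bits) in UTF-16, matching the limits described above.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        utf8_len, utf16_len = encoded_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

Characters that need four UTF-8 bytes (or a UTF-16 surrogate pair) fail the check and may need to be reviewed against the restrictions in International Character Set Support, B035-1125.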
UTF-8 Character Sets
Teradata TPump supports the UTF-8 character set on workstation-attached platforms and on IBM z/OS.
On IBM z/OS, the job script must be in Teradata EBCDIC when the UTF-8 client character set is used. Teradata TPump translates commands in the job script from Teradata EBCDIC to UTF-8 during the load. Different versions of EBCDIC do not always agree on the placement of special characters, so check the mappings between Teradata EBCDIC and Unicode in International Character Set Support, B035-1125, to determine the code points of any special characters required in the job script.
Currently, the UTF-8 byte order mark (BOM) is not supported on the z/OS platform when using access modules or data files.
See the following for additional information on using the UTF-8 client character set on the mainframe:
- nullexpr parameter
- fieldexpr parameter
- VARTEXT format delimiter
- WHERE condition
- CONTINUEIF condition
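Because the UTF-8 BOM is not supported on z/OS when using access modules or data files, it can help to check data for a leading BOM before it is transferred. The following is a minimal sketch of such a check, not a Teradata-supplied tool. The function names `has_utf8_bom` and `strip_utf8_bom` are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "id|name\n".encode("utf-8")
    print(has_utf8_bom(sample))    # BOM detected at the start of the sample
    print(strip_utf8_bom(sample))  # same bytes with the BOM removed
```

Only a BOM at the very start of the data is removed. The rest of the content is returned unchanged.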
UTF-16 Character Sets
Teradata TPump supports the UTF-16 character set on workstation-attached platforms. In general, the command language and the job output should use the same character set as the client character set used by the job. However, for convenience and because of the special properties of Unicode, this is not required when the UTF-16 character set is used: the job script and the job output can each be in either UTF-8 or UTF-16. To select these encodings, specify the runtime parameters “-i” and “-u” when the job is invoked.
For more information on the runtime parameters “-i” and “-u”, see parameters -i scriptencoding and -u outputencoding in DATEFORM.
For additional information on using the UTF-16 client character set, also refer to the fieldexpr parameter (see IMPORT), the nullexpr parameter (see DATEFORM), and the WHERE condition and CONTINUEIF condition in BEGIN LOAD.
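Since a job script may be in either UTF-8 or UTF-16 when the UTF-16 client character set is used, it can be useful to confirm which encoding a script file actually uses before choosing the script encoding for “-i”. The following is a minimal sketch based on BOM detection with a UTF-8 decode as a fallback. It is not part of Teradata TPump, and the function name `sniff_script_encoding` and its return labels are hypothetical. The values accepted by “-i” are not shown here.

```python
import codecs


def sniff_script_encoding(data: bytes) -> str:
    """Guess the encoding of a job script from its leading bytes.

    Returns a descriptive label, not a value for any TPump parameter.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16 little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16 big-endian"
    try:
        # No BOM: accept the data as UTF-8 only if it decodes cleanly.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


if __name__ == "__main__":
    script = codecs.BOM_UTF16_LE + ".LOGON ...;".encode("utf-16-le")
    print(sniff_script_encoding(script))
```

A UTF-16 file without a BOM is reported as "unknown" or, if its bytes happen to be valid UTF-8, as "utf-8", so the result is a hint and not a guarantee.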
Client Character Set/Client Type Compatibility
The following table gives general guidelines for choosing the client character sets that are likely to work best for each client environment.
Client Type | Best Client Character Sets |
---|---|
Mainframe-attached | |
Workstation-attached running a UNIX operating system | |
Workstation-attached running Windows | |
Site-Defined Character Sets
When the character sets defined by the database are not appropriate for a site, custom character sets can be defined.
Refer to International Character Set Support, B035-1125 for information on defining a custom character set.