Unicode® Character Sets

Teradata Parallel Data Pump Reference

Product

Parallel Data Pump

Release Number

15.10

Language

English (United States)

Last Update

2018-10-07

dita:id

B035-3021

lifecycle

Product Category

Teradata Tools and Utilities

UTF‑8 and UTF‑16 are two standard encodings of Unicode character data, and Teradata Database supports both as client character sets. The UTF‑8 client character set supports UTF‑8 encoding; currently, Teradata Database supports UTF‑8 characters of one to three bytes. The UTF‑16 client character set supports UTF‑16 encoding; currently, Teradata Database supports the Unicode 5.0 standard, where each defined character requires exactly 16 bits.
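As a small illustration of the byte counts described above, the following Python sketch encodes a few sample characters in both encodings. The sample characters and the helper name encoded_lengths are assumptions chosen for this illustration and are not part of Teradata TPump; each sample needs at most three bytes in UTF‑8 and one 16‑bit unit in UTF‑16.

```python
from typing import Tuple

# Illustrative only: these sample characters are assumptions for this sketch.
# They need 1, 2, and 3 bytes in UTF-8, respectively.
SAMPLES = ["A", "\u00e9", "\u20ac"]


def encoded_lengths(ch: str) -> Tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that no byte order mark is included in the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in SAMPLES:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} byte(s), UTF-16 = {utf16_len} byte(s)")
```

A check of this kind can help estimate how much space a field may need under each client character set.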

Teradata Database imposes restrictions on the use of the UTF‑8 and UTF‑16 character sets. Refer to International Character Set Support (B035‑1125) for details.

Note: TPump supports all Unicode characters in object names.

UTF‑8 Character Sets

Teradata TPump supports the UTF‑8 character set on network‑attached platforms and on IBM z/OS.

On IBM z/OS, the job script must be in Teradata EBCDIC when the UTF‑8 client character set is used. Teradata TPump translates commands in the job script from Teradata EBCDIC to UTF‑8 during the load. Because different versions of EBCDIC do not always agree on the placement of special characters, examine the mappings between Teradata EBCDIC and Unicode in International Character Set Support (B035‑1125) to determine the code points of any special characters required in the job script.
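The point about EBCDIC versions disagreeing on special characters can be seen with a short Python sketch. It compares two EBCDIC codecs from the Python standard library, cp037 and cp500. These codecs are stand-ins chosen for illustration and are not Teradata EBCDIC, whose actual mappings are defined in International Character Set Support (B035‑1125). The function name differing_code_points is hypothetical.

```python
# cp037 and cp500 are EBCDIC codecs shipped with Python. They are used here
# only as examples of "different versions of EBCDIC"; neither is Teradata EBCDIC.
SPECIAL_CHARS = "[]!^|"


def differing_code_points(chars, codec_a="cp037", codec_b="cp500"):
    """Return {char: (hex in codec_a, hex in codec_b)} where the codecs disagree."""
    diffs = {}
    for ch in chars:
        a, b = ch.encode(codec_a), ch.encode(codec_b)
        if a != b:
            diffs[ch] = (a.hex(), b.hex())
    return diffs


for ch, (a, b) in differing_code_points(SPECIAL_CHARS).items():
    print(f"{ch!r}: cp037 = 0x{a}, cp500 = 0x{b}")
```

Letters and digits generally encode identically across such variants, which is why the differences tend to surface only in delimiters and other punctuation used in a job script.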

Currently, the UTF‑8 Byte Order Mark (BOM) is not supported on the z/OS platform when using access modules or data files.
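Because the BOM is not supported in this configuration, it can be useful to check a data file for one before it is transferred to the mainframe. The following Python sketch detects and removes a leading UTF‑8 BOM. The function names are hypothetical, and whether removing the BOM is the right preparation step for a particular job is an assumption; the documentation states only that the BOM is not supported.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data begins with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other data is unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


sample = codecs.BOM_UTF8 + "field1|field2\n".encode("utf-8")
print(has_utf8_bom(sample))                   # the sample starts with a BOM
print(has_utf8_bom(strip_utf8_bom(sample)))   # the BOM has been removed
```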

See Chapter 3 for complete information on Teradata TPump commands. Refer to the descriptions of the nullexpr, fieldexpr, VARTEXT format delimiter, WHERE condition, and CONTINUEIF condition command parameters for additional information on using the UTF‑8 client character set on the mainframe.

UTF‑16 Character Sets

Teradata TPump supports the UTF‑16 character set on network‑attached platforms. In general, the command language and the job output use the same character set as the client character set of the job. However, for the user's convenience and because of the special properties of Unicode, this is not required with the UTF‑16 character set: the job script and the job output can each be in either UTF‑8 or UTF‑16. The encodings are specified with the runtime parameters “‑i” and “‑u” when the job is invoked.

For more information on the runtime parameters “‑i” and “‑u”, see the ‑i scriptencoding parameter and “‑u outputencoding” on page 50.

Also refer to the command parameters fieldexpr (page 140), nullexpr (page 137), WHERE condition (page 113), and CONTINUEIF condition (page 170) for additional information on using the UTF‑16 client character set.
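Since a job script for a UTF‑16 session can be stored in either UTF‑8 or UTF‑16, a script may occasionally need to be converted from one encoding to the other. The following Python sketch shows one way to do that. The function name transcode_script and the sample text are hypothetical, and this section does not state which byte order or BOM conventions TPump expects, so consult the ‑i scriptencoding description before relying on a particular target encoding.

```python
def transcode_script(data: bytes, source: str = "utf-8", target: str = "utf-16") -> bytes:
    """Decode script bytes from the source encoding and re-encode them as target."""
    text = data.decode(source)
    # Python's "utf-16" codec writes a BOM followed by native-order code units;
    # "utf-16-le" or "utf-16-be" can be passed instead to fix the byte order.
    return text.encode(target)


# Placeholder text stands in for real job script content.
script_utf8 = "sample job script text\n".encode("utf-8")
script_utf16 = transcode_script(script_utf8)

# Converting back recovers the original bytes, so no characters were lost.
print(transcode_script(script_utf16, source="utf-16", target="utf-8") == script_utf8)
```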

Client Character Set/Client Type Compatibility

Table 6 gives general guidelines for choosing the client character sets that may work best for each client type.

Table 6: General Guidelines for Choosing Client Character Sets
Client Type	Best Client Character Sets
Mainframe‑attached	EBCDIC, EBCDIC037_0E, KANJIEBCDIC5026_0I, KANJIEBCDIC5035_0I, KATAKANAEBCDIC, SCHEBCDIC935_2IJ, TCHEBCDIC937_3IB, HANGULEBCDIC933_1II, UTF‑8
Network‑attached running a UNIX operating system	ASCII, KANJIEUC_0U, LATIN1_0A, LATIN9_0A, UTF‑8, UTF‑16, SCHGB2312_1T0, TCHBIG5_1R0, HANGULKSC5601_2R4
Network‑attached running Windows	ASCII, KANJISJIS_0S, LATIN1252_0A, UTF‑8, UTF‑16, SCHGB2312_1T0, TCHBIG5_1R0, HANGULKSC5601_2R4
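For scripts that validate a job configuration before submission, the guidelines in Table 6 can be captured as a simple lookup. The following Python sketch is one possible representation. The dictionary keys ("mainframe", "unix", "windows") and the function name is_suggested are hypothetical labels, and the character set names are written as they appear in the table, with ordinary hyphens.

```python
# Character set names are copied from Table 6; the keys are hypothetical labels.
GUIDELINES = {
    "mainframe": {
        "EBCDIC", "EBCDIC037_0E", "KANJIEBCDIC5026_0I", "KANJIEBCDIC5035_0I",
        "KATAKANAEBCDIC", "SCHEBCDIC935_2IJ", "TCHEBCDIC937_3IB",
        "HANGULEBCDIC933_1II", "UTF-8",
    },
    "unix": {
        "ASCII", "KANJIEUC_0U", "LATIN1_0A", "LATIN9_0A", "UTF-8", "UTF-16",
        "SCHGB2312_1T0", "TCHBIG5_1R0", "HANGULKSC5601_2R4",
    },
    "windows": {
        "ASCII", "KANJISJIS_0S", "LATIN1252_0A", "UTF-8", "UTF-16",
        "SCHGB2312_1T0", "TCHBIG5_1R0", "HANGULKSC5601_2R4",
    },
}


def is_suggested(client_type: str, charset: str) -> bool:
    """Return True if Table 6 lists the character set for the given client type."""
    return charset.upper() in GUIDELINES[client_type]


print(is_suggested("mainframe", "UTF-16"))  # UTF-16 is not listed for mainframe clients
print(is_suggested("windows", "UTF-16"))    # UTF-16 is listed for Windows clients
```

The table is a guideline rather than a hard rule, so a check like this is better suited to issuing a warning than to rejecting a job.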

Site‑Defined Character Sets

When the character sets defined by Teradata Database are not appropriate for a site, custom character sets can be defined.

Refer to International Character Set Support (B035‑1125) for information on defining a custom character set.