Specifying Unicode in the tbuild Command - Parallel Transporter

Teradata Parallel Transporter Reference

Product: Parallel Transporter
Release Number: 15.10
Language: English (United States)
Last Update: 2018-10-07
dita:id: B035-2436
lifecycle: previous
Product Category: Teradata Tools and Utilities

Specifying Unicode in the tbuild Command

As described above, the USING CHARACTER SET <characterSet> statement in the Teradata PT job script defines the session character set. The session character set must match both the data and the encoding of the job script.

When submitting a job script that is encoded in UTF-16, however, you must also specify the -e command line option for the tbuild command.

tbuild -f <filename> [-v jobVariableFile] -e UTF16

-e UTF16 indicates to Teradata PT that the job script is encoded in UTF-16. The file endianness is determined by the Byte Order Mark (BOM) at the beginning of the file.
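
To illustrate how a BOM identifies the byte order of a UTF-16 file, here is a minimal Python sketch. It is not Teradata PT code; the function name utf16_endianness_from_bom is hypothetical, and the BOM byte values are the standard Unicode ones exposed by Python's codecs module.

```python
import codecs
from typing import Optional


def utf16_endianness_from_bom(data: bytes) -> Optional[str]:
    """Return 'little' or 'big' based on a leading UTF-16 BOM, else None.

    Hypothetical helper for illustration only; it inspects just the
    first two bytes and does not validate the rest of the content.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    return None


# A big endian script starts with FE FF, so the helper reports "big".
script = codecs.BOM_UTF16_BE + "DEFINE JOB demo;".encode("utf-16-be")
print(utf16_endianness_from_bom(script))
```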

Note: The following -e options support the different encoding schemes:

1. UTF-16 / UTF16 and any upper/lower case variations with or without a hyphen. For UTF-16 scripts: if the script is not UTF-16, an error is reported. If the script endianness differs from the platform endianness, the script is converted to the platform endianness before execution.

2. UTF-16LE / UTF16LE and any upper/lower case variations with or without a hyphen. For UTF-16 little endian scripts: if the script is not little endian, an error is reported. If the platform is big endian, the script is converted to big endian before execution.

3. UTF-16BE / UTF16BE and any upper/lower case variations with or without a hyphen. For UTF-16 big endian scripts: if the script is not big endian, an error is reported. If the platform is little endian, the script is converted to little endian before execution.

4. UTF-8 / UTF8 and any upper/lower case variations with or without a hyphen. For UTF-8 scripts: if the script is not UTF-8, an error is reported.

Job variable files and include files in either big endian or little endian format can be used on either kind of platform.
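
The rules in the note above can be summarized in a short Python sketch. This is an illustration of the documented behavior, not how tbuild is implemented; the names normalize_e_option and utf16_action are hypothetical, and the script's byte order is assumed to have been read from its BOM already.

```python
import re
import sys

# Accepts UTF8, UTF16, UTF16LE, UTF16BE in any letter case,
# with or without a hyphen after "UTF".
_E_OPTION = re.compile(r"UTF-?(8|16|16LE|16BE)", re.IGNORECASE)


def normalize_e_option(value: str) -> str:
    """Map a -e value such as 'utf-16le' to a canonical name such as 'UTF16LE'."""
    match = _E_OPTION.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported -e value: {value!r}")
    return "UTF" + match.group(1).upper()


def utf16_action(option: str, script_endian: str,
                 platform_endian: str = sys.byteorder) -> str:
    """Return 'error', 'convert', or 'as-is' for a UTF-16 job script.

    script_endian and platform_endian are 'little' or 'big'.
    """
    opt = normalize_e_option(option)
    if opt == "UTF8":
        raise ValueError("UTF-8 scripts have no byte order to reconcile")
    # UTF16LE and UTF16BE require a specific byte order; plain UTF16 accepts either.
    required = {"UTF16LE": "little", "UTF16BE": "big"}.get(opt)
    if required is not None and script_endian != required:
        return "error"
    # A valid script is converted only when it differs from the platform.
    return "convert" if script_endian != platform_endian else "as-is"


print(utf16_action("utf-16le", "little", "big"))     # convert
print(utf16_action("UTF-16BE", "little", "little"))  # error
```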

Setting the Byte Order Mark

Use this feature to write a BOM at the beginning of a data file. The BOM written is determined by the character set in use and, for UTF-16, by the endianness of the active platform. BOMs are detected when the data is read back, to ensure correct processing.
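
As a rough illustration of how the BOM depends on the character set and the platform, the following Python sketch picks the standard Unicode BOM bytes. The function bom_for is hypothetical and only models the selection rule described above; it does not reproduce Teradata PT internals.

```python
import codecs
import sys


def bom_for(charset: str, platform_endian: str = sys.byteorder) -> bytes:
    """Return the BOM bytes for a Unicode session character set.

    platform_endian is 'little' or 'big' and matters only for UTF-16.
    """
    name = charset.upper().replace("-", "")
    if name == "UTF8":
        return codecs.BOM_UTF8  # bytes EF BB BF, independent of platform
    if name == "UTF16":
        if platform_endian == "little":
            return codecs.BOM_UTF16_LE  # bytes FF FE
        return codecs.BOM_UTF16_BE      # bytes FE FF
    raise ValueError(f"not a Unicode character set: {charset!r}")


# One delimited record prefixed with the BOM for a little endian platform.
record = bom_for("UTF16", "little") + "1|Smith".encode("utf-16-le")
print(record[:2].hex())  # fffe
```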

The following conditions must be met:

  • The operator must be a consumer
  • A Unicode character set must be specified in the script (for example, USING CHARACTER SET UTF8)
  • The data format must be either text or delimited (for example, VARCHAR Format = 'Delimited')
  • The WriteBOM attribute must be set to 'Yes' (for example, VARCHAR WriteBOM = 'Yes')
Note: If WriteBOM = 'Yes' and any of the other necessary conditions are not met, the job fails.

Usage Notes

Consider the following when working with varying session character sets:

When using the UTF-16 character set in Teradata PT scripts, the value of n in VARCHAR(n) and CHAR(n) in the SCHEMA definition must be a positive even number.
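
One way to read this rule is sketched below in Python. The sketch assumes that n counts bytes and that each UTF-16 code unit occupies 2 bytes, which would explain the even-number requirement; the text above does not state the reason, so treat that as an assumption. Both function names are hypothetical.

```python
def utf16_schema_length_ok(n: int) -> bool:
    """Check the documented rule: n must be positive and even."""
    return n > 0 and n % 2 == 0


def utf16_code_units(n: int) -> int:
    """Number of 2-byte UTF-16 code units that fit in n bytes.

    Assumes n is a byte count (an assumption, not stated in the source).
    """
    if not utf16_schema_length_ok(n):
        raise ValueError(f"VARCHAR/CHAR length must be positive and even, got {n}")
    return n // 2


print(utf16_schema_length_ok(20))  # True
print(utf16_schema_length_ok(7))   # False
print(utf16_code_units(20))        # 10
```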

LONG VARCHAR and LONG VARGRAPHIC are no longer supported as column types. LONG VARCHAR now corresponds to VARCHAR(64000). See “Using LONG VARCHAR with Unicode Character Sets” for a discussion of how Teradata PT handles this column type.