Rules for Unicode Character Sets - Basic Teradata Query

Basic Teradata Query Reference

Product

Basic Teradata Query

Release Number

16.10

Published

May 2017

Language

English (United States)

Last Update

2018-06-28

dita:id

B035-2414

Product Category

Teradata Tools and Utilities

BTEQ supports all Unicode characters in the range U+0000 to U+10FFFF. To send non-BMP characters (code points above U+FFFF) to the database, or to receive them from it, the “Unicode Pass Through” capability must be turned on with the following statement:

SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;
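
As an illustration of which characters count as non-BMP, the following Python sketch flags code points above U+FFFF. It is not part of BTEQ; the function name `non_bmp_characters` and the sample string are hypothetical and chosen only for this example.

```python
def non_bmp_characters(text):
    """Return the characters in text whose code point lies above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical sample: Latin text followed by one emoji (U+1F600).
sample = "Caf\u00e9 \U0001F600"

for ch in non_bmp_characters(sample):
    print(f"U+{ord(ch):04X} is outside the BMP")
```

Data containing any character reported by such a check would be a candidate for a session with Unicode Pass Through turned on.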

It is important to understand that a Unicode character may vary in encoded size: one to four bytes in UTF-8, and two or four bytes in UTF-16. Therefore, the byte size of an output or export file does not indicate the number of characters it contains.
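
The size variation can be seen with a short Python sketch that encodes a few sample characters in both encodings. The helper name `encoded_sizes` and the chosen characters are assumptions made for this example; nothing here calls BTEQ.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that no BOM is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# One sample per UTF-8 length: U+0041, U+00E9, U+20AC, U+1F600.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

A file of 400 bytes could therefore hold anywhere from 100 to 400 UTF-8 characters, so character counts have to come from decoding the data, not from the file size.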

It is the user's responsibility to ensure that the endianness of every UTF-16 input file matches the endianness of the platform BTEQ is running on. If it does not, or if an incorrect BOM is encountered, BTEQ will report an error.
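
One way to check a UTF-16 file before handing it to BTEQ is to compare its BOM with the byte order of the local machine. The Python sketch below assumes the check runs on the same platform as BTEQ; the function names are hypothetical, and a file without a BOM cannot be classified this way.

```python
import codecs
import sys


def utf16_bom_byte_order(data):
    """Return 'little', 'big', or None if data does not start with a UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    return None


def matches_platform(data):
    """Return True/False for a BOM-marked file, or None when there is no BOM."""
    order = utf16_bom_byte_order(data)
    if order is None:
        return None
    # sys.byteorder is 'little' or 'big' for the machine running this script.
    return order == sys.byteorder


# Hypothetical input: the first bytes of a little-endian UTF-16 file with a BOM.
header = codecs.BOM_UTF16_LE + "SELECT".encode("utf-16-le")
print(utf16_bom_byte_order(header), matches_platform(header))
```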

Workstation-Attached Systems

To start a UTF-8 or UTF-16 session, it is recommended that the -c option be used to define the session character set encoding and, where needed, the -e option (batch mode) or the -m option (interactive mode) to define the I/O encoding.

A BOM is optional for the following input files:

Files redirected through stdin
Files executed by way of RUN commands
REPORT format import files
VARTEXT format import files
SQL (internal) Stored Procedure source files
LDO text files
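
To show what an optional BOM means at the byte level, the following Python sketch encodes the same text with and without a UTF-8 BOM and decodes both forms to the same result. The sample statement and variable names are hypothetical; the sketch only illustrates the file content and does not describe how BTEQ itself reads input.

```python
import codecs

# Hypothetical one-line script content.
text = "SELECT DATE;\n"

without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # same bytes, prefixed with EF BB BF

print(with_bom[:3] == codecs.BOM_UTF8)

# The "utf-8-sig" decoder accepts both forms and drops the BOM if present.
decoded_plain = without_bom.decode("utf-8-sig")
decoded_marked = with_bom.decode("utf-8-sig")
print(decoded_plain == decoded_marked == text)
```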

An optional BOM can be written to the following output files:

Files generated by way of MESSAGEOUT command use
REPORT format export files
DIF format export files
LDO text files

BTEQ does not allow for a BOM to be written to stdout or stderr.

Mainframe-Attached Systems

z/OS BTEQ supports Unicode sessions in the following way:

Input data (defined as SYSIN or for execution by way of RUN commands) is read as EBCDIC.
Output data (defined as SYSOUT or generated by way of MESSAGEOUT command use) is written in EBCDIC.
VARTEXT format import files and LDO text import files must be in the session character set encoding (UTF-8 or UTF-16). A BOM is optional.
REPORT and DIF format export files and LDO text export files are written in the session character set encoding (UTF-8 or UTF-16).

The EBCDIC repertoire is much smaller than the Unicode repertoire. Writing Unicode characters that are not in the EBCDIC repertoire to SYSOUT (or to a MESSAGEOUT file) results in a translation error.
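
The repertoire gap can be illustrated by testing which characters of a string have no mapping in an EBCDIC code page. The Python sketch below uses the `cp037` codec purely as a stand-in; the code page actually in effect on a given z/OS system is an assumption here and may differ, and the function name is hypothetical.

```python
def outside_ebcdic(text, codec="cp037"):
    """Return the characters of text that cannot be encoded in the given codec."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            # No mapping exists for this character in the chosen code page.
            missing.append(ch)
    return missing


# Hypothetical sample: ASCII letters, a euro sign, and an emoji.
sample = "PRICE \u20ac \U0001F600"
for ch in outside_ebcdic(sample):
    print(f"U+{ord(ch):04X} has no cp037 mapping")
```

Characters reported by a check like this are the ones that would be expected to fail translation when written as EBCDIC, whereas export files written in the session character set are not limited in this way.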