Rules for Unicode Character Sets - Basic Teradata Query

Basic Teradata Query Reference

Product: Basic Teradata Query
Release Number: 16.20
Published: October 2018
Language: English (United States)
Last Update: 2020-02-20
Document ID (dita:id): B035-2414
Lifecycle: previous
Product Category: Teradata Tools and Utilities

BTEQ supports all Unicode characters in the range U+0000 to U+10FFFF. To send non-BMP characters (characters outside the Basic Multilingual Plane, that is, U+10000 and above) to the database or to receive them from it, the “Unicode Pass Through” capability must be turned on with the following statement:

SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;
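Whether a given string needs Unicode Pass Through can be checked by looking for code points above U+FFFF. The following is a minimal Python sketch of that check; the helper names non_bmp_chars and needs_unicode_pass_through are hypothetical and are not part of BTEQ.

```python
# Minimal sketch: find non-BMP (supplementary) characters, i.e. code points
# above U+FFFF. Hypothetical helpers for illustration; not part of BTEQ.
BMP_MAX = 0xFFFF


def non_bmp_chars(text: str) -> list[str]:
    """Return the characters in `text` whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


def needs_unicode_pass_through(text: str) -> bool:
    """True if `text` contains at least one supplementary character."""
    return any(ord(ch) > BMP_MAX for ch in text)


if __name__ == "__main__":
    sample = "caf\u00e9 \U0001F600"  # 'café' plus U+1F600, which is outside the BMP
    print([f"U+{ord(c):04X}" for c in non_bmp_chars(sample)])  # ['U+1F600']
    print(needs_unicode_pass_through("caf\u00e9"))  # False
```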

It is important to understand that a Unicode character can vary in size: one to four bytes in UTF-8, and two or four bytes in UTF-16. Therefore, the size of an output or export file does not indicate the number of characters it contains.
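The point can be illustrated by encoding a few characters and counting bytes. This Python sketch uses Python's standard codecs, not BTEQ; the helper byte_sizes is hypothetical, and the "utf-16-le" codec is chosen only so that no BOM is included in the count.

```python
def byte_sizes(text: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 bytes without a BOM)."""
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:  # A, é, €, and U+1F600
        _, utf8, utf16 = byte_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8 {utf8} byte(s), UTF-16 {utf16} byte(s)")
    # U+0041: UTF-8 1 byte(s), UTF-16 2 byte(s)
    # U+00E9: UTF-8 2 byte(s), UTF-16 2 byte(s)
    # U+20AC: UTF-8 3 byte(s), UTF-16 2 byte(s)
    # U+1F600: UTF-8 4 byte(s), UTF-16 4 byte(s)
    print(byte_sizes("A\u00e9\u20ac\U0001F600"))  # (4, 10, 10)
```

The four-character string here happens to take 10 bytes in both encodings, whereas four ASCII characters take 4 bytes in UTF-8 and 8 bytes in UTF-16.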

It is the user's responsibility to ensure that the endianness of any UTF-16 input file is the same as the endianness of the platform BTEQ is running on. If it is not, or if an incorrect BOM is encountered, BTEQ reports an error.
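A pre-flight check along these lines can catch an endianness mismatch before a file is given to BTEQ. This is a minimal Python sketch under stated assumptions: a BOM, if present, must match the native byte order reported by sys.byteorder, and a file without a BOM is assumed to already be in native order. check_utf16_input is a hypothetical helper, not BTEQ's own validation logic.

```python
import codecs
import sys


def check_utf16_input(data: bytes) -> str:
    """Decode UTF-16 bytes, rejecting a BOM for the non-native byte order.

    Assumption for this sketch: data without a BOM is in native byte order.
    """
    little = sys.byteorder == "little"
    native_bom = codecs.BOM_UTF16_LE if little else codecs.BOM_UTF16_BE
    foreign_bom = codecs.BOM_UTF16_BE if little else codecs.BOM_UTF16_LE
    native_codec = "utf-16-le" if little else "utf-16-be"

    if data.startswith(foreign_bom):
        raise ValueError("UTF-16 input has a BOM for the opposite byte order")
    if data.startswith(native_bom):
        data = data[len(native_bom):]  # drop the BOM before decoding
    return data.decode(native_codec)


if __name__ == "__main__":
    native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    print(check_utf16_input("hi".encode(native)))  # hi
```

Without a BOM, data in the wrong byte order generally cannot be detected this way; it usually decodes to the wrong characters rather than failing.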

Workstation-Attached Systems

To start a UTF-8 or UTF-16 session, the recommended approach is to use the -c option to define the session character set encoding and, where needed, the -e option (batch mode) or the -m option (interactive mode) to define the I/O encoding.
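The choice between -e and -m can be expressed as a small helper that assembles a BTEQ argument list, for example before launching BTEQ from a script. This Python sketch only builds the list and does not run BTEQ. build_bteq_args is hypothetical, and the charset and encoding names used below ("UTF8", "UTF16") and the space-separated option syntax are assumptions; check the BTEQ command-line reference for the exact values it accepts.

```python
from typing import Optional


def build_bteq_args(session_charset: str,
                    io_encoding: Optional[str] = None,
                    interactive: bool = False) -> list[str]:
    """Assemble a hypothetical BTEQ command line.

    -c sets the session character set; -e (batch mode) or -m (interactive
    mode) sets the I/O encoding. Value names are assumptions.
    """
    args = ["bteq", "-c", session_charset]
    if io_encoding is not None:
        args += ["-m" if interactive else "-e", io_encoding]
    return args


if __name__ == "__main__":
    print(build_bteq_args("UTF8", "UTF8"))  # ['bteq', '-c', 'UTF8', '-e', 'UTF8']
    print(build_bteq_args("UTF16", "UTF16", interactive=True))
```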

A BOM is optional for the following input files:

  • Files redirected through stdin
  • Files executed by way of RUN commands
  • REPORT format import files
  • VARTEXT format import files
  • SQL (internal) Stored Procedure source files
  • LDO text files
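For UTF-8 input, reading a file whose BOM is optional can be sketched in Python with the standard "utf-8-sig" codec, which skips a leading UTF-8 BOM (EF BB BF) when one is present and otherwise decodes as plain UTF-8. The helper name is hypothetical; this shows the general technique, not how BTEQ itself reads its input.

```python
import codecs


def read_utf8_optional_bom(data: bytes) -> str:
    """Decode UTF-8 bytes whether or not they start with a BOM.

    Plain "utf-8" would keep the BOM as a leading U+FEFF character;
    "utf-8-sig" drops it.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    body = "SELECT 'caf\u00e9';\n".encode("utf-8")
    with_bom = codecs.BOM_UTF8 + body
    print(read_utf8_optional_bom(body) == read_utf8_optional_bom(with_bom))  # True
```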

An optional BOM can be written to the following output files:

  • Stdout and stderr streams redirected to a file or pipe
  • Files generated by the MESSAGEOUT command
  • REPORT format export files
  • DIF format export files
  • LDO text files
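Producing output with or without a BOM can be sketched in the same way. In this Python sketch, UTF-16 output is written in the platform's native byte order (an assumption made here, not a documented BTEQ behavior); encode_export and its "utf-8"/"utf-16" labels are hypothetical.

```python
import codecs
import sys


def encode_export(text: str, encoding: str, write_bom: bool) -> bytes:
    """Encode text as UTF-8 or native-order UTF-16, optionally with a BOM."""
    if encoding == "utf-8":
        bom, codec = codecs.BOM_UTF8, "utf-8"
    elif encoding == "utf-16":
        little = sys.byteorder == "little"
        bom = codecs.BOM_UTF16_LE if little else codecs.BOM_UTF16_BE
        codec = "utf-16-le" if little else "utf-16-be"  # these codecs add no BOM
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    payload = text.encode(codec)
    return bom + payload if write_bom else payload


if __name__ == "__main__":
    print(encode_export("A", "utf-8", write_bom=True))  # b'\xef\xbb\xbfA'
```

The resulting bytes can be written to a file opened in binary mode ("wb").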

Mainframe-Attached Systems

z/OS BTEQ supports Unicode sessions in the following way:

  • Input data (read from SYSIN or from files executed by way of RUN commands) is read as EBCDIC.
  • Output data (written to SYSOUT or generated by the MESSAGEOUT command) is written in EBCDIC.
  • VARTEXT format import files and LDO text import files must be in the session character set encoding (UTF-8 or UTF-16). A BOM is optional.
  • REPORT and DIF format export files and LDO text export files are written in the session character set encoding (UTF-8 or UTF-16).

The EBCDIC repertoire is much smaller than that of Unicode. Attempting to write Unicode characters that are not in the EBCDIC repertoire to SYSOUT (or to a MESSAGEOUT file) results in a translation error.
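The kind of failure described above can be reproduced off the mainframe with Python's cp037 codec (IBM EBCDIC, US/Canada). This is only a stand-in: the EBCDIC code page used for SYSOUT on a given z/OS system may differ, and to_ebcdic and unmappable are hypothetical helpers.

```python
# cp037 is used here as a stand-in for the EBCDIC code page; the code page
# configured on an actual z/OS system may be different.


def to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode text to EBCDIC; raises UnicodeEncodeError for unmappable characters."""
    return text.encode(codepage)  # default "strict" error handling


def unmappable(text: str, codepage: str = "cp037") -> list[str]:
    """Return the characters of `text` that the code page cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


if __name__ == "__main__":
    print(to_ebcdic("A").hex())  # c1
    print(unmappable("caf\u00e9"))  # []
    print([f"U+{ord(c):04X}" for c in unmappable("A\u6f22\U0001F600")])  # ['U+6F22', 'U+1F600']
```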