Rules for Unicode Character Sets - Basic Teradata Query

Basic Teradata Query Reference

Product: Basic Teradata Query
May 2017
English (United States)
Product Category: Teradata Tools and Utilities

BTEQ supports all Unicode characters in the range U+0000 to U+10FFFF. To send non-BMP characters (code points above U+FFFF) to the database or receive them from it, the “Unicode Pass Through” capability must be turned on using the following statement:

    SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;

A Unicode character varies in encoded size: one to four bytes in UTF-8, and two or four bytes in UTF-16. Therefore, the size of an output or export file does not indicate the number of characters it contains.
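
The size differences can be checked outside BTEQ. The following Python sketch is only an illustration and is not part of BTEQ; the function name `encoded_sizes` and the sample string are hypothetical. It counts the bytes each character occupies in UTF-8 and UTF-16.

```python
def encoded_sizes(text: str) -> list[tuple[str, int, int]]:
    """Return (character, UTF-8 byte count, UTF-16 byte count) for each character."""
    sizes = []
    for ch in text:
        utf8_len = len(ch.encode("utf-8"))
        # "utf-16-le" omits the BOM, so only the character's own code units are counted.
        utf16_len = len(ch.encode("utf-16-le"))
        sizes.append((ch, utf8_len, utf16_len))
    return sizes


# Hypothetical sample: U+0041, U+0042, U+00E9, U+20AC, and the non-BMP U+1F600.
sample = "AB\u00e9\u20ac\U0001F600"
for ch, utf8_len, utf16_len in encoded_sizes(sample):
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# Same character count, different byte counts per encoding.
print(len(sample), len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
```

The last line shows why a byte count alone cannot be converted into a character count without decoding the file.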

It is the user's responsibility to ensure that the endianness of every UTF-16 input file matches the endianness of the platform BTEQ is running on. If it does not, or if an incorrect BOM is encountered, BTEQ reports an error.
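
A file can be checked before it is handed to BTEQ. The Python sketch below is one possible pre-flight check, not BTEQ's own logic; the helper names `utf16_bom_byte_order` and `matches_platform` are hypothetical. It assumes the input is UTF-16 and compares the BOM, when present, with the byte order of the machine running the script.

```python
import codecs
import sys


def utf16_bom_byte_order(data: bytes):
    """Return 'little' or 'big' if data starts with a UTF-16 BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    return None


def matches_platform(data: bytes, platform_order: str = sys.byteorder) -> bool:
    """Return True if the BOM is absent or agrees with the platform byte order."""
    bom_order = utf16_bom_byte_order(data)
    # Without a BOM the byte order cannot be read from the file itself,
    # so this sketch can only pass the file through as unverified.
    return bom_order is None or bom_order == platform_order


# Encode the same text in both byte orders, each with its own BOM.
little = codecs.BOM_UTF16_LE + "SELECT 1;".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "SELECT 1;".encode("utf-16-be")
print(matches_platform(little), matches_platform(big))
```

Exactly one of the two sample buffers matches any given platform. A file without a BOM passes this check unverified, so its byte order still has to be known from how it was produced.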

Workstation-Attached Systems

To start a UTF-8 or UTF-16 session, it is recommended that the -c option be used to define the session charset encoding, and possibly the -e option (batch mode) or -m option (interactive mode) to define the I/O encoding.

A BOM is optional for the following input files:

  • Files redirected through stdin
  • Files executed by way of RUN commands
  • REPORT format import files
  • VARTEXT format import files
  • SQL (internal) Stored Procedure source files
  • LDO text files

An optional BOM can be written to the following output files:

  • Files generated by the MESSAGEOUT command
  • REPORT format export files
  • DIF format export files
  • LDO text files

BTEQ does not allow a BOM to be written to stdout or stderr.

Mainframe-Attached Systems

z/OS BTEQ supports Unicode sessions in the following way:

  • Input data (defined as SYSIN or executed by way of RUN commands) is read as EBCDIC.
  • Output data (defined as SYSOUT or generated by the MESSAGEOUT command) is written in EBCDIC.
  • VARTEXT format import files and LDO text import files must be in the session character set encoding (UTF-8 or UTF-16). A BOM is optional.
  • REPORT and DIF format export files and LDO text export files are written in the session character set encoding (UTF-8 or UTF-16).

The EBCDIC repertoire is much smaller than the Unicode repertoire. Attempting to write Unicode characters that are not in the EBCDIC repertoire to SYSOUT (or a MESSAGEOUT file) results in a translation error.
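
The repertoire gap can be illustrated with a short Python sketch. It uses Python's `cp037` codec as a stand-in for an EBCDIC code page, which is an assumption; the code page and translation tables that z/OS BTEQ actually uses may differ. The function name `ebcdic_untranslatable` is hypothetical.

```python
def ebcdic_untranslatable(text: str, codec: str = "cp037") -> list[str]:
    """Return the characters in text that the given EBCDIC codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            # The character has no mapping in this code page.
            missing.append(ch)
    return missing


# "A" exists in cp037; the CJK character U+4E2D and the emoji U+1F600 do not.
print(ebcdic_untranslatable("A\u4e2d\U0001F600"))
```

Running such a check over the expected result data ahead of time can show which characters are likely to cause a translation error when written to SYSOUT.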