BTEQ supports all Unicode characters in the range U+0000 to U+10FFFF. To exchange non-BMP characters (code points above U+FFFF) with the database, the “Unicode Pass Through” capability must be turned on using the following statement:
SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;
It is important to understand that a Unicode character may vary in size: one to four bytes for a UTF-8 character, and two or four bytes for a UTF-16 character. Therefore, the size of an output or export file is not indicative of the number of characters it contains.
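The size variation described above can be checked with a short Python sketch. It is independent of BTEQ; the sample characters and the helper name `encoded_sizes` are chosen here purely for illustration.

```python
# Sample characters covering each UTF-8 width (1 to 4 bytes).
samples = ["A", "é", "€", "😀"]  # U+0041, U+00E9, U+20AC, U+1F600

def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for one character, without a BOM."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in samples:
    u8, u16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes")

# The same four characters occupy ten bytes in either encoding,
# so the byte count alone does not reveal the character count.
text = "".join(samples)
utf8_total = len(text.encode("utf-8"))
utf16_total = len(text.encode("utf-16-le"))
```

Only the non-BMP character (U+1F600) needs four bytes in UTF-16, because it is stored as a surrogate pair.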
It is the user's responsibility to ensure that the endianness of any UTF-16 input file is the same as the endianness of the platform BTEQ is running on. If the endianness differs, or if an incorrect BOM is encountered, BTEQ reports an error.
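One way to catch a mismatch before running BTEQ is to compare a file's UTF-16 BOM with the byte order of the local platform. The following Python sketch is a pre-check under stated assumptions, not a description of how BTEQ itself validates input. The function names are hypothetical, and a file without a BOM cannot be verified this way.

```python
import codecs
import sys

def bom_byte_order(prefix: bytes):
    """Return 'little', 'big', or None (no UTF-16 BOM) for a file's first bytes."""
    if prefix.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little"
    if prefix.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big"
    return None

def matches_platform(prefix: bytes) -> bool:
    """True when the BOM is absent or agrees with the native byte order."""
    order = bom_byte_order(prefix)
    return order is None or order == sys.byteorder

def check_file(path: str) -> bool:
    """Read the first two bytes of a file and compare them with the platform."""
    with open(path, "rb") as f:
        return matches_platform(f.read(2))
```

Here `sys.byteorder` reports the byte order of the machine running the check, so the script should run on the same platform as BTEQ.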
To start a UTF-8 or UTF-16 session, it is recommended that the -c option be used to define the session charset encoding, and possibly the -e option (batch mode) or -m option (interactive mode) to define the I/O encoding.
A BOM is optional for the following input files:
- Files redirected through stdin
- Files executed by way of RUN commands
- REPORT format import files
- VARTEXT format import files
- SQL (internal) Stored Procedure source files
- LDO text files
An optional BOM can be written to the following output files:
- Stdout and stderr streams redirected to a file or pipe
- Files generated by way of MESSAGEOUT command use
- REPORT format export files
- DIF format export files
- LDO text files
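When preparing or post-processing such files outside BTEQ, an optional BOM means a reader must accept input with or without it, and a writer may choose whether to emit it. The Python sketch below illustrates this for UTF-8 only. The helper names are hypothetical, and the sketch does not reflect BTEQ's internal handling.

```python
import codecs

def read_text_optional_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # The "utf-8-sig" codec accepts input both with and without a BOM.
    return data.decode("utf-8-sig")

def write_text(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixing the BOM (EF BB BF)."""
    return text.encode("utf-8-sig" if with_bom else "utf-8")

# Both variants decode to the same text.
with_bom = write_text("SELECT 1;", with_bom=True)
without_bom = write_text("SELECT 1;", with_bom=False)
same = read_text_optional_bom(with_bom) == read_text_optional_bom(without_bom)
```

The BOM adds three bytes to a UTF-8 file, which is one more reason file size does not map directly to character count.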
z/OS BTEQ supports Unicode sessions in the following way:
- Input data (defined as SYSIN or for execution by way of RUN commands) is read as EBCDIC.
- Output data (defined as SYSOUT or generated by way of MESSAGEOUT command use) is written in EBCDIC.
- VARTEXT format import files and LDO text import files must be in the session character set encoding (UTF-8 or UTF-16). A BOM is optional.
- REPORT and DIF format export files and LDO text export files are written in the session character set encoding (UTF-8 or UTF-16).