BTEQ supports all Unicode characters in the range U+0000 to U+10FFFF. To send non-BMP characters (code points above U+FFFF) to the database or receive them from it, the “Unicode Pass Through” capability must be turned on using the following statement:
SET SESSION CHARACTER SET UNICODE PASS THROUGH ON;
A Unicode character varies in size: one to four bytes in UTF-8, and two or four bytes in UTF-16. Therefore, the size of an output or export file does not indicate the number of characters it contains.
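The size difference can be seen with a short Python sketch. It is independent of BTEQ, and the function name `encoded_sizes` is chosen here for illustration only. It reports the character count next to the byte counts for the same text in both encodings.

```python
def encoded_sizes(text):
    """Return (characters, UTF-8 bytes, UTF-16 bytes without BOM) for text."""
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" is used so that no BOM is added to the byte count.
    utf16_bytes = len(text.encode("utf-16-le"))
    return len(text), utf8_bytes, utf16_bytes


if __name__ == "__main__":
    # One sample each for 1-, 2-, 3-, and 4-byte UTF-8 characters.
    for sample in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        chars, utf8_bytes, utf16_bytes = encoded_sizes(sample)
        print(f"U+{ord(sample):04X}: {chars} char, "
              f"{utf8_bytes} UTF-8 bytes, {utf16_bytes} UTF-16 bytes")
```

The last sample, U+1F600, is a non-BMP character: it takes four bytes in either encoding, so a file made up of such characters is four times as large in bytes as its character count.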
It is the user's responsibility to ensure that the endianness of any UTF-16 input file is the same as the endianness of the platform BTEQ is running on. If it is not, or if an incorrect BOM is encountered, BTEQ will report an error.
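One way to check a file before handing it to BTEQ is to compare its BOM with the byte order of the local platform. The following Python sketch is not part of BTEQ. The function names are hypothetical, and it assumes that the data is already known to be UTF-16 (a UTF-32 little-endian BOM begins with the same two bytes and would be misread).

```python
import codecs
import sys


def utf16_bom_order(data):
    """Return "little" or "big" according to a leading UTF-16 BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    return None


def bom_matches_platform(data, platform_order=sys.byteorder):
    """Return True or False when a BOM is present, None when there is none.

    Without a BOM the byte order cannot be determined from the first
    two bytes, so the caller has to know how the file was produced.
    """
    order = utf16_bom_order(data)
    if order is None:
        return None
    return order == platform_order
```

A typical use is to read the first two bytes of the file in binary mode, for example `open(path, "rb").read(2)`, and pass them to `bom_matches_platform`.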
Workstation-Attached Systems
To start a UTF-8 or UTF-16 session, use the -c option to define the session character set encoding and, where needed, the -e option (batch mode) or the -m option (interactive mode) to define the I/O encoding.
A BOM is optional for the following input files:
- Files redirected through stdin
- Files executed by way of RUN commands
- REPORT format import files
- VARTEXT format import files
- SQL (internal) Stored Procedure source files
- LDO text files
An optional BOM can be written to the following output files:
- Files generated by the MESSAGEOUT command
- REPORT format export files
- DIF format export files
- LDO text files
Mainframe-Attached Systems
z/OS BTEQ supports Unicode sessions in the following way:
- Input data (defined as SYSIN or for execution by way of RUN commands) is read as EBCDIC.
- Output data (defined as SYSOUT or generated by the MESSAGEOUT command) is written in EBCDIC.
- VARTEXT format import files and LDO text import files must be in the session character set encoding (UTF-8 or UTF-16). A BOM is optional.
- REPORT and DIF format export files and LDO text export files are written in the session character set encoding (UTF-8 or UTF-16).