15.10 - UNICODE - Call-Level Interface Version 2

Teradata Call-Level Interface Version 2 Reference for Mainframe-Attached Systems

The UNICODE directive defines all single-byte codepoints in the character set. The syntactic characters that must be defined are those with the Unicode® codepoints 0020 (Space), 0022 (Quotation Mark), 0027 (Apostrophe), 002F (Slash), and 001A (the Substitute control character). In addition, all characters that are valid in a TDP identifier must be defined.



The mappings themselves are specified in statements that immediately follow the UNICODE directive. Each such statement has the following syntax:

target_codepoint1<-target_codepoint2>: data_codepoint ...

target_codepoint1 specifies the first character in the user-defined character set that is defined on this statement, target_codepoint2 optionally specifies the last character defined on this statement, and data_codepoint defines the equivalent character in Unicode®.

A codepoint is the hexadecimal representation of a character. The number of hexadecimal digits needed to specify a target codepoint depends on the encoding scheme for the character set. For the characters of interest to CLIv2, the length is always two, except for UTF-16 encoding, for which the length is four. The length of a data codepoint is always four.

If the second target codepoint is specified, then exactly one data codepoint is required for each character in the range from the first target codepoint through the second, inclusive. If the second target codepoint is omitted, then any number of data codepoints may be specified; the first is associated with the target codepoint, and each subsequent one with the codepoint one greater than the previous.
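
The expansion rule above can be sketched in Python. This is a minimal illustration, not part of CLIv2: the function name parse_unicode_statement and its target_digits parameter are hypothetical, and the sketch assumes the target codepoints are written with no spaces around the hyphen.

```python
def parse_unicode_statement(statement, target_digits=2):
    """Expand one 'target1[-target2]: data ...' statement into {target: data}.

    Hypothetical helper for illustration only; not part of CLIv2.
    target_digits is 2 for most encodings and 4 for UTF-16.
    """
    targets, colon, data_part = statement.partition(":")
    if not colon:
        raise ValueError("statement has no colon")
    first, dash, last = targets.strip().partition("-")
    data = data_part.split()

    # Target codepoints have a fixed length; data codepoints are always 4 digits.
    for codepoint in ([first, last] if dash else [first]):
        if len(codepoint) != target_digits:
            raise ValueError(f"bad target codepoint length: {codepoint!r}")
    for codepoint in data:
        if len(codepoint) != 4:
            raise ValueError(f"bad data codepoint length: {codepoint!r}")

    start = int(first, 16)
    if dash:
        # A range needs exactly one data codepoint per target, both ends included.
        expected = int(last, 16) - start + 1
        if len(data) != expected:
            raise ValueError(f"expected {expected} data codepoints, got {len(data)}")

    # Each data codepoint maps to the target one greater than the previous one.
    return {start + offset: int(cp, 16) for offset, cp in enumerate(data)}
```

For instance, parse_unicode_statement("4B: 002E 003C") associates target 4B with 002E and target 4C with 003C, while a ranged statement such as "40-47: ..." is rejected unless it carries exactly eight data codepoints.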

All statements after the UNICODE directive that contain a colon are associated with the UNICODE directive. Lack of a colon indicates that the statement is a new directive and ends that UNICODE directive.
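
The colon rule can be illustrated with a short Python sketch. The function name take_unicode_statements is hypothetical, as is the directive name NEXTDIRECTIVE used in the usage comment; the sketch also assumes the input contains no blank or comment lines, which this section does not describe.

```python
def take_unicode_statements(lines):
    """Split the lines that follow a UNICODE directive.

    Returns (statements, remaining): statements are the leading lines that
    contain a colon; remaining starts at the first line without one, which
    is treated as the next directive. Hypothetical helper, not part of CLIv2.
    """
    statements = []
    for index, line in enumerate(lines):
        if ":" not in line:
            # No colon: this line is a new directive and ends the UNICODE block.
            return statements, lines[index:]
        statements.append(line)
    return statements, []


# Usage with a made-up follow-on directive name:
following = ["40-41: 0020 001A", "4B: 002E", "NEXTDIRECTIVE"]
statements, remaining = take_unicode_statements(following)
```

Here statements holds the two mapping lines and remaining begins with the line that lacks a colon.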

Statements may appear in any order; the order in which codepoints are defined across different statements is not significant.

The UNICODE directive may be specified only once for each character set.

If the same character is defined more than once for a character set, the last value is used.
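
The last-value rule corresponds to building the table in statement order and letting later definitions overwrite earlier ones. The sketch below assumes each statement has already been expanded into a dictionary from target codepoint to data codepoint; the name merge_definitions is hypothetical.

```python
def merge_definitions(parsed_statements):
    """Combine per-statement mappings; a later definition replaces an earlier one.

    parsed_statements is a list of {target_codepoint: data_codepoint} dicts,
    given in the order the statements appear. Hypothetical helper.
    """
    table = {}
    for mapping in parsed_statements:
        table.update(mapping)  # later statements win on duplicate targets
    return table


# Target 41 is defined twice; the second definition (0041) is the one kept.
table = merge_definitions([{0x40: 0x0020, 0x41: 0x001A}, {0x41: 0x0041}])
```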


The following example defines the Unicode® equivalents for IBM Code Page 833, the single-byte component of IBM CCSID 933.

40-47: 0020 001A 115F 1100 1101 1115 1102 11AC
48-4F: 11AD 1103 00A2 002E 003C 0028 002B 007C
50-57: 0026 001A 1104 1105 11B0 11B1 11B2 11B3
58-5F: 11B4 11B5 0021 0024 002A 0029 003B 00AC
60-67: 002D 002F 11B6 1106 1107 1108 1121 1109
68-6F: 110A 110B 00A6 002C 0025 005F 003E 003F
70-77: 005B 001A 110C 110D 110E 110F 1110 1111
78-7F: 1112 0060 003A 0023 0040 0027 003D 0022
80-87: 005D 0061 0062 0063 0064 0065 0066 0067
88-8F: 0068 0069 1161 1162 1163 1164 1165 1166
90-97: 001A 006A 006B 006C 006D 006E 006F 0070
98-9F: 0071 0072 1167 1168 1169 116A 116B 116C
A0-A7: 00AF 007E 0073 0074 0075 0076 0077 0078
A8-AF: 0079 007A 116D 116E 116F 1170 1171 1172
B0-B7: 005E 001A 005C 001A 001A 001A 001A 001A
B8-BF: 001A 001A 1173 1174 1175 001A 001A 001A
C0-C7: 007B 0041 0042 0043 0044 0045 0046 0047
C8-CF: 0048 0049 001A 001A 001A 001A 001A 001A
D0-D7: 007D 004A 004B 004C 004D 004E 004F 0050
D8-DF: 0051 0052 001A 001A 001A 001A 001A 001A
E0-E7: 20A9 001A 0053 0054 0055 0056 0057 0058
E8-EF: 0059 005A 001A 001A 001A 001A 001A 001A
F0-F7: 0030 0031 0032 0033 0034 0035 0036 0037
F8-FF: 0038 0039 001A 001A 001A 001A 001A 001A
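
As a rough cross-check of the example, the Python sketch below confirms that the five required syntactic characters appear as data codepoints in the table. Only the three rows that contain them are copied from the example; the names REQUIRED and missing_required are hypothetical, and the check does not cover the characters valid in a TDP identifier, since that set is not listed in this section.

```python
# Unicode codepoints that must be defined, as listed for the UNICODE directive.
REQUIRED = {
    0x0020: "Space",
    0x0022: "Quotation Mark",
    0x0027: "Apostrophe",
    0x002F: "Slash",
    0x001A: "Substitute",
}


def missing_required(table):
    """Return the names of required characters absent from the table's values."""
    defined = set(table.values())
    return {name for codepoint, name in REQUIRED.items() if codepoint not in defined}


# Three rows copied from the example, keyed by their first target codepoint.
rows = {
    0x40: "0020 001A 115F 1100 1101 1115 1102 11AC",
    0x60: "002D 002F 11B6 1106 1107 1108 1121 1109",
    0x78: "1112 0060 003A 0023 0040 0027 003D 0022",
}
table = {
    start + offset: int(codepoint, 16)
    for start, row in rows.items()
    for offset, codepoint in enumerate(row.split())
}
missing = missing_required(table)
```

With these rows, missing is empty: Space is at 40, Substitute at 41, Slash at 61, Apostrophe at 7D, and Quotation Mark at 7F.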