UNICODE - Teradata Director Program

Teradata® Director Program Reference - 17.20

Product: Teradata Director Program
Release Number: 17.20
Date: June 2022
Language: English (United States)
Product Category: Teradata Tools and Utilities

The UNICODE directive defines the syntactic characters and the characters that have both lower case and upper case forms. Depending on the character set, it can provide the same information as the CHAR, MONOCASE, and NUMERICS directives. Because UNICODE is required to add a user-defined character set to CLIv2, TDP also supports it, which can simplify the use of user-defined character sets.

The relevant syntactic characters in the character set are those that have the Unicode codepoints 0020 (Space), 0022 (Quotation Mark), 0025 (Percent), 0027 (Apostrophe), 002C (Comma), 002E (Period), 002F (Slash), 0030 through 0039 (Numerics Zero through Nine), 003A (Colon), 005B (Left Bracket), and 005D (Right Bracket). The relevant monocase characters are those that have the Unicode codepoints 0061 through 007A (lower case) and 0041 through 005A (upper case). Codepoints beyond those relevant to CHAR, MONOCASE, and NUMERICS are ignored.

If the character set does not have these characteristics, then CHAR, MONOCASE, and NUMERICS must be used instead of UNICODE.
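As an illustration only, the codepoints named above can be collected into sets and used to filter a mapping down to the entries that are not ignored. The following Python sketch is not part of TDP; the names `relevant_entries` and `missing_codepoints` are hypothetical, and the text does not say how TDP reacts when a relevant codepoint is absent, so the second function is just a convenience check.

```python
# Unicode codepoints that the text names as relevant to TDP.
SYNTACTIC = {0x0020, 0x0022, 0x0025, 0x0027, 0x002C, 0x002E,
             0x002F, 0x003A, 0x005B, 0x005D}
NUMERICS = set(range(0x0030, 0x003A))   # 0030 through 0039
LOWER = set(range(0x0061, 0x007B))      # 0061 through 007A
UPPER = set(range(0x0041, 0x005B))      # 0041 through 005A
RELEVANT = SYNTACTIC | NUMERICS | LOWER | UPPER


def relevant_entries(mapping):
    """Keep only target -> Unicode pairs whose Unicode codepoint is relevant."""
    return {t: u for t, u in mapping.items() if u in RELEVANT}


def missing_codepoints(mapping):
    """List relevant Unicode codepoints that no target maps to (hypothetical check)."""
    return sorted(RELEVANT - set(mapping.values()))
```

For example, in a mapping of `{0x40: 0x0020, 0x41: 0x001A, 0x4B: 0x002E}`, the entry for 001A would be dropped by `relevant_entries` because it is not one of the listed codepoints.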


Usage Notes

The mapping information is supplied on statements that immediately follow the UNICODE directive. Each such statement has the following syntax:

target_codepoint1<-target_codepoint2>: data_codepoint ...


target_codepoint1
Specifies the first character in the user-defined character set that is defined on this statement.

target_codepoint2
Optionally specifies the last character in the user-defined character set that is defined on this statement.

data_codepoint
Defines the equivalent character in Unicode.

A codepoint is the hexadecimal representation of a character. The number of hexadecimal digits needed to specify a target codepoint depends on the encoding scheme of the character set. For the characters of interest to TDP, a target codepoint is always two digits long, except for UTF16 encoding, for which it is four digits long. A data codepoint is always four digits long.
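The length rules can be expressed as a small check. This Python sketch is illustrative and not a TDP interface; the function name `valid_codepoints` and its `utf16` flag are hypothetical.

```python
import re

HEX_DIGITS = re.compile(r"[0-9A-Fa-f]+")


def valid_codepoints(target, data, utf16=False):
    """Check the lengths described above: target is 2 hex digits (4 for UTF16),
    data is always 4 hex digits."""
    target_len = 4 if utf16 else 2
    return (
        HEX_DIGITS.fullmatch(target) is not None
        and len(target) == target_len
        and HEX_DIGITS.fullmatch(data) is not None
        and len(data) == 4
    )
```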

If the second target codepoint is specified, then one data codepoint is required for each character in the range between the two target codepoints, inclusive. If the second target codepoint is omitted, then any number of data codepoints can be specified; the first is associated with the first target codepoint, and each subsequent one with the target codepoint one greater than the previous.
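To make the range and open-ended forms concrete, the following Python sketch parses a single statement into a mapping from target codepoint to Unicode codepoint. It is a reading of the rules above rather than TDP's own parser; `parse_statement` is a hypothetical name, and raising an error on a count mismatch is an assumption about how such input might be handled.

```python
def parse_statement(line):
    """Parse 'T1[-T2]: D ...' into {target_codepoint: unicode_codepoint}."""
    if ":" not in line:
        raise ValueError("not a UNICODE statement: no colon")
    targets, _, data = line.partition(":")
    first, _, last = targets.strip().partition("-")
    start = int(first, 16)
    values = [int(token, 16) for token in data.split()]
    if last:
        # With an explicit range, one data codepoint is needed per character.
        expected = int(last, 16) - start + 1
        if len(values) != expected:
            raise ValueError(
                f"range needs {expected} data codepoints, got {len(values)}"
            )
    # Each data codepoint maps to the target one greater than the previous.
    return {start + offset: value for offset, value in enumerate(values)}
```

With this sketch, `parse_statement("F0: 0030 0031 0032")` maps F0, F1, and F2 to 0030, 0031, and 0032 in turn.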

All statements after the UNICODE directive that contain a colon are associated with that UNICODE directive. A statement without a colon is treated as a new directive and ends the UNICODE directive.
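The colon rule can be sketched as a simple scan over the statements that follow a UNICODE directive. This Python fragment is illustrative only; `split_unicode_block` is a hypothetical name, the operand on the `CHARSET` line in the test data is invented, and treating a blank line like any other line without a colon is an assumption that the text does not address.

```python
def split_unicode_block(lines):
    """Split the lines after a UNICODE directive into (statements, remainder).

    Lines containing a colon belong to the UNICODE directive; the first line
    without a colon starts a new directive and ends the block.
    """
    statements = []
    for index, line in enumerate(lines):
        if ":" not in line:
            return statements, lines[index:]
        statements.append(line)
    return statements, []
```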

The order of data codepoints among different statements is not significant.

The UNICODE directive can be specified only once for each character set.

If the same character is defined for the same purpose more than once for a character set (using a CHAR, MONOCASE, NUMERICS, or UNICODE directive), the last value is used.

If no CHARSET directive precedes UNICODE, then a character set description is implicitly begun; in effect, a CHARSET directive with no operands is assumed.

Example: UNICODE

Define the Unicode equivalents for IBM Code Page 833, the single-byte component of IBM CCSID 933.

40-47: 0020 001A 115F 1100 1101 1115 1102 11AC
48-4F: 11AD 1103 00A2 002E 003C 0028 002B 007C
50-57: 0026 001A 1104 1105 11B0 11B1 11B2 11B3
58-5F: 11B4 11B5 0021 0024 002A 0029 003B 00AC
60-67: 002D 002F 11B6 1106 1107 1108 1121 1109
68-6F: 110A 110B 00A6 002C 0025 005F 003E 003F
70-77: 005B 001A 110C 110D 110E 110F 1110 1111
78-7F: 1112 0060 003A 0023 0040 0027 003D 0022
80-87: 005D 0061 0062 0063 0064 0065 0066 0067
88-8F: 0068 0069 1161 1162 1163 1164 1165 1166
90-97: 001A 006A 006B 006C 006D 006E 006F 0070
98-9F: 0071 0072 1167 1168 1169 116A 116B 116C
A0-A7: 00AF 007E 0073 0074 0075 0076 0077 0078
A8-AF: 0079 007A 116D 116E 116F 1170 1171 1172
B0-B7: 005E 001A 005C 001A 001A 001A 001A 001A
B8-BF: 001A 001A 1173 1174 1175 001A 001A 001A
C0-C7: 007B 0041 0042 0043 0044 0045 0046 0047
C8-CF: 0048 0049 001A 001A 001A 001A 001A 001A
D0-D7: 007D 004A 004B 004C 004D 004E 004F 0050
D8-DF: 0051 0052 001A 001A 001A 001A 001A 001A
E0-E7: 20A9 001A 0053 0054 0055 0056 0057 0058
E8-EF: 0059 005A 001A 001A 001A 001A 001A 001A
F0-F7: 0030 0031 0032 0033 0034 0035 0036 0037
F8-FF: 0038 0039 001A 001A 001A 001A 001A 001A
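As a cross-check on the example, the following Python sketch loads a few of the rows above and looks up which target codepoint carries each syntactic character. It is illustrative and not part of TDP; `load`, `TABLE`, and `BYTE_FOR` are hypothetical names. The value 001A appears many times in the example, so it is skipped when inverting the table.

```python
# A subset of the example rows, copied verbatim.
ROWS = """\
40-47: 0020 001A 115F 1100 1101 1115 1102 11AC
48-4F: 11AD 1103 00A2 002E 003C 0028 002B 007C
68-6F: 110A 110B 00A6 002C 0025 005F 003E 003F
78-7F: 1112 0060 003A 0023 0040 0027 003D 0022
F0-F7: 0030 0031 0032 0033 0034 0035 0036 0037
F8-FF: 0038 0039 001A 001A 001A 001A 001A 001A
"""


def load(text):
    """Build {target_codepoint: unicode_codepoint} from UNICODE statements."""
    table = {}
    for line in text.splitlines():
        targets, _, data = line.partition(":")
        start = int(targets.split("-")[0], 16)
        for offset, token in enumerate(data.split()):
            table[start + offset] = int(token, 16)
    return table


TABLE = load(ROWS)
# Invert the table, ignoring the repeated 001A entries.
BYTE_FOR = {u: t for t, u in TABLE.items() if u != 0x001A}

print(f"Space is target codepoint {BYTE_FOR[0x0020]:02X}")
print(f"Period is target codepoint {BYTE_FOR[0x002E]:02X}")
```

In these rows, Space (0020) is defined at target codepoint 40 and the numerics 0030 through 0039 at F0 through F9.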