The character mappings themselves are contained in the statements that immediately follow the UNICODE directive. Each such statement has the following syntax:
target_codepoint1<-target_codepoint2>: data_codepoint ...
In this syntax, target_codepoint1 specifies the first character of the user-defined character set that this statement defines, the optional target_codepoint2 (the part in angle brackets) specifies the last character that this statement defines, and each data_codepoint specifies the equivalent Unicode character.
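As an illustration, a statement can be split into its parts as follows. This is a minimal sketch, not CLIv2's own parser. It assumes the optional range is written as "first-last", that the colon separates the targets from the data, and that data codepoints are separated by whitespace; the function name parse_statement is hypothetical.

```python
from typing import List, Optional, Tuple

def parse_statement(line: str) -> Tuple[str, Optional[str], List[str]]:
    """Split one UNICODE statement into (first target, last target or None, data codepoints).

    Assumed form (hypothetical sketch): "first[-last]: data data ...".
    """
    targets, sep, data = line.partition(":")
    if not sep:
        raise ValueError(f"not a UNICODE statement (no colon): {line!r}")
    # The optional range end follows a hyphen after the first target codepoint.
    first, dash, last = targets.strip().partition("-")
    return first.strip(), (last.strip() if dash else None), data.split()
```

For example, "41-43: 0041 0042 0043" yields ("41", "43", ["0041", "0042", "0043"]), while "80: 20AC" yields ("80", None, ["20AC"]).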
A codepoint is the hexadecimal representation of a character. The number of hexadecimal digits needed to specify a target codepoint depends on the encoding scheme of the character set. For the character sets of interest to CLIv2, a target codepoint is always two digits long, except for UTF-16 encoding, for which it is four. A data codepoint is always four digits long.
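The length rules above can be checked mechanically. The sketch below assumes the encoding is identified by a plain string label such as "UTF-16"; the labels and function names are hypothetical, and only the digit counts come from the text.

```python
import string

# Hexadecimal digit characters, upper- and lower-case.
HEX_DIGITS = frozenset(string.hexdigits)

def target_width(encoding: str) -> int:
    """Digits in a target codepoint: 4 for UTF-16, otherwise 2."""
    return 4 if encoding.upper() == "UTF-16" else 2

def is_hex(cp: str) -> bool:
    return len(cp) > 0 and all(c in HEX_DIGITS for c in cp)

def is_valid_target(cp: str, encoding: str) -> bool:
    return len(cp) == target_width(encoding) and is_hex(cp)

def is_valid_data(cp: str) -> bool:
    # Data codepoints are always four hexadecimal digits.
    return len(cp) == 4 and is_hex(cp)
```

With these helpers, "41" is a valid target for a single-byte encoding, "0041" is a valid target for UTF-16, and "20AC" is a valid data codepoint.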
If target_codepoint2 is specified, one data codepoint is required for each character in the range from target_codepoint1 through target_codepoint2, inclusive. If target_codepoint2 is omitted, any number of data codepoints may be specified: the first is associated with target_codepoint1, and each subsequent one with the target codepoint one greater than the previous one.
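The two forms can be expanded into (target, Unicode) pairs as sketched below. The function name is hypothetical, and treating a count mismatch in the range form as an error is an assumption; the text says only that one data codepoint is required for each character in the range.

```python
from typing import List, Optional, Tuple

def expand_statement(first: str, last: Optional[str],
                     data: List[str]) -> List[Tuple[int, int]]:
    """Map each data codepoint to its target codepoint (hypothetical sketch)."""
    start = int(first, 16)
    if last is not None:
        # Range form: the range first..last is inclusive.
        expected = int(last, 16) - start + 1
        if expected < 1:
            raise ValueError(f"{last} precedes {first}")
        if len(data) != expected:
            raise ValueError(
                f"range {first}-{last} needs {expected} data codepoints, got {len(data)}")
    # Each data codepoint belongs to the target one greater than the previous one.
    return [(start + i, int(cp, 16)) for i, cp in enumerate(data)]
```

Without a range, "80: 20AC FFFD" would map 0x80 to U+20AC and 0x81 to U+FFFD.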
All statements after the UNICODE directive that contain a colon are associated with the UNICODE directive. A statement without a colon is treated as a new directive and ends the UNICODE directive.
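The colon rule means a reader can find the end of the UNICODE statements by scanning for the first line without a colon. This sketch takes the lines that follow the directive and applies the rule literally; it assumes the input contains no blank or comment lines, whose handling is not described here. The function name and the directive name NEXTDIR in the example are hypothetical.

```python
from typing import List, Tuple

def split_unicode_block(lines_after_directive: List[str]) -> Tuple[List[str], List[str]]:
    """Return (statements belonging to UNICODE, remaining lines starting at the next directive)."""
    for i, line in enumerate(lines_after_directive):
        if ":" not in line:
            # No colon: this line is a new directive, which ends the UNICODE directive.
            return lines_after_directive[:i], lines_after_directive[i:]
    return lines_after_directive, []
```

For ["41: 0041", "42-43: 0042 0043", "NEXTDIR"], the first two lines belong to the UNICODE directive and "NEXTDIR" starts the next one.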
The order of data codepoints among different statements is not significant.
The UNICODE directive may be specified only once for each character set.
If the same character is defined more than once for a character set, the last value is used.
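The last two rules can be modeled by building the final table with plain dictionary assignment: statements may appear in any order, and a later definition of the same target character replaces an earlier one. This is a hypothetical sketch that takes each statement as a list of (target, Unicode) integer pairs.

```python
from typing import Dict, Iterable, List, Tuple

def build_table(statements: Iterable[List[Tuple[int, int]]]) -> Dict[int, int]:
    """Combine expanded statements into one target-to-Unicode table (hypothetical sketch)."""
    table: Dict[int, int] = {}
    for pairs in statements:
        for target, data in pairs:
            # A later definition of the same target overwrites the earlier one.
            table[target] = data
    return table
```

Only duplicate definitions make the order matter: if 0x41 is first mapped to U+0041 and later to U+0391, the table keeps U+0391.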