Ill-formed code unit sequence errors occur when the byte values in a character string do not follow the encoding rules. These exceptions are detected while importing characters from a UTF8 or UTF16 session to a Unicode character string, or exporting from a Unicode character string to a UTF16 or UTF8 session.
UTF8 and Unicode
In a session where Unicode Pass Through (UPT) is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.
In a Pass Through session, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each ill-formed byte in the source string.
The following table shows how UTF8 source strings are converted to UTF16 destination strings with replacement characters. The replacement stops once a well-formed byte sequence is encountered.
Ill-formed UTF8 Source String | UTF16 Destination String With Replacement Characters |
---|---|
80 | FFFD |
C261 | FFFD0061 |
E18065 | FFFDFFFD0065 |
F1808062 | FFFDFFFDFFFD0062 |
63F48164 | 0063FFFDFFFD0064 |
C3679068A0BF69 | FFFD0067FFFD0068FFFDFFFD0069 |
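The per-byte replacement shown in the table can be approximated with a short Python sketch. This is only an illustration of the rule as described here, not the database's implementation: the function names `expected_length`, `decode_utf8_per_byte`, and `utf16_hex` are hypothetical, and the UTF16 output is assumed to be big-endian. Python's built-in `errors="replace"` handler is deliberately not used, because it can emit one replacement character for several ill-formed bytes, which would not match the table.

```python
REPLACEMENT = "\ufffd"  # UNICODE REPLACEMENT CHARACTER


def expected_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte that can never start a sequence


def decode_utf8_per_byte(data: bytes) -> str:
    """Decode UTF-8, storing one U+FFFD for each ill-formed byte."""
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        chunk = data[i:i + n]
        if n and len(chunk) == n:
            try:
                # The strict decoder rejects bad continuations, overlong
                # forms, and encoded surrogates.
                out.append(chunk.decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Replace only the current byte, then resume at the next byte.
        out.append(REPLACEMENT)
        i += 1
    return "".join(out)


def utf16_hex(text: str) -> str:
    """Render a string as big-endian UTF-16 hex (byte order is an assumption)."""
    return text.encode("utf-16-be").hex().upper()


print(utf16_hex(decode_utf8_per_byte(bytes.fromhex("E18065"))))
```

Because the decoder advances by a single byte after each failure, a truncated multi-byte sequence such as `E180` yields one replacement character per byte, and decoding resumes normally at the next well-formed sequence.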
Ill-formed sequence exceptions during export to UTF8 are rare and can only occur if the source Unicode string is not well-formed.
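As a rough illustration of why export can fail, the following Python sketch uses Python's own UTF-8 codec as a stand-in for the export step; it is not the database's export routine. A Unicode string that contains an unpaired surrogate is not well-formed, so it cannot be encoded as UTF-8. The function name `export_utf8` is hypothetical.

```python
def export_utf8(text: str) -> bytes:
    """Encode a string as UTF-8, reporting where an ill-formed source is found."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        # A lone surrogate has no valid UTF-8 representation.
        raise ValueError(
            f"ill-formed source string at index {exc.start}"
        ) from exc


try:
    export_utf8("a\ud800b")  # contains an unpaired high surrogate
except ValueError as err:
    print(err)
```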
UTF16 and Unicode
In a session where UPT is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.
In a Pass Through session:
- If a High Surrogate is not followed by a Low Surrogate, or a Low Surrogate is not preceded by a High Surrogate, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each unpaired surrogate. For example, the ill-formed UTF16 source string ‘D800D800’ is converted to the UTF16 destination string ‘FFFDFFFD’.
- If the length of the source UTF16 character string is odd on import, processing continues without error and a trailing UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for the partial source byte. For example, the ill-formed UTF16 source string ‘D800’ is converted to the UTF16 destination string ‘FFFD’.
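The two replacement rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the database's implementation: the function name `repair_utf16` is hypothetical, and the input is assumed to be big-endian UTF16 bytes.

```python
def repair_utf16(data: bytes) -> bytes:
    """Replace unpaired surrogates and a trailing partial byte with U+FFFD.

    Assumes big-endian UTF-16 input (an assumption for this sketch).
    """
    # Split into complete 16-bit code units; a trailing odd byte is left out.
    units = [
        int.from_bytes(data[i:i + 2], "big")
        for i in range(0, len(data) - 1, 2)
    ]
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out += [unit, units[i + 1]]  # well-formed pair, keep both
                i += 2
                continue
            out.append(0xFFFD)  # high surrogate without a low surrogate
        elif 0xDC00 <= unit <= 0xDFFF:
            out.append(0xFFFD)  # low surrogate without a preceding high
        else:
            out.append(unit)
        i += 1
    if len(data) % 2:
        out.append(0xFFFD)  # trailing partial byte
    return b"".join(u.to_bytes(2, "big") for u in out)


print(repair_utf16(bytes.fromhex("D800D800")).hex().upper())
```

A well-formed surrogate pair such as `D83DDE00` passes through unchanged, while a hypothetical odd-length input such as `0041D8` gains a single trailing replacement character.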