16.10 - Ill-formed Code Unit Sequences - Teradata Database

Teradata Database International Character Set Support

prodname: Teradata Database
vrm_release: 16.10
created_date: June 2017
category: Configuration, User Guide
featnum: B035-1125-161K

Ill-formed code unit sequence errors occur when the byte values in a character string do not follow the rules of the string's encoding. These errors are detected while importing characters from a UTF8 or UTF16 session to a Unicode character string, or while exporting from a Unicode character string to a UTF8 or UTF16 session.

UTF8 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.
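As a rough analogy only (this is Python's built-in strict UTF-8 decoder, not Teradata code), the stop-on-error behavior can be sketched as follows. The function names `import_utf8_strict` and `describe_import` are hypothetical and exist only for this illustration.

```python
def import_utf8_strict(data: bytes) -> str:
    """Decode UTF-8 strictly: an ill-formed sequence raises and no text is returned."""
    return data.decode("utf-8", errors="strict")


def describe_import(hex_source: str) -> str:
    """Return the decoded text, or a short note with the offset of the failing byte."""
    try:
        return import_utf8_strict(bytes.fromhex(hex_source))
    except UnicodeDecodeError as exc:
        # Processing stops here; nothing after the bad byte is converted.
        return f"error at byte offset {exc.start}"


print(describe_import("6162"))  # ab
print(describe_import("C261"))  # error at byte offset 0
```

The exact error reported by the database is not shown here; the sketch only illustrates that a strict conversion produces no partial result.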

In a Pass Through session, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each ill-formed byte in the source string sequence.

The following table shows how UTF8 source strings are converted to UTF16 destination strings with replacement characters. Note that the replacement stops once a well-formed byte sequence is encountered.

Ill-formed UTF8 Source String    UTF16 Destination String With Replacement Characters
80                               FFFD
C261                             FFFD0061
E18065                           FFFDFFFD0065
F1808062                         FFFDFFFDFFFD0062
63F48164                         0063FFFDFFFD0064
C3679068A0BF69                   FFFD0067FFFD0068FFFDFFFD0069
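The rows in this table are consistent with a one-replacement-per-ill-formed-byte rule. The following is a minimal sketch of that rule in Python, written to reproduce the table and not taken from the product. The helper names `decode_utf8_per_byte` and `to_utf16_hex` are hypothetical, and the UTF16 output is assumed to be big-endian hex, matching how the table is printed. Python's own `errors="replace"` handler is deliberately not used, because it can emit a single U+FFFD for a multi-byte truncated sequence.

```python
REPLACEMENT = "\ufffd"  # UNICODE REPLACEMENT CHARACTER


def decode_utf8_per_byte(data: bytes) -> str:
    """Decode UTF-8, storing one U+FFFD for every byte that is not part of
    a well-formed sequence, then resuming at the next byte."""
    out = []
    i = 0
    while i < len(data):
        # Try the shortest well-formed sequence (1 to 4 bytes) starting at i.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                out.append(chunk.decode("utf-8"))  # strict decoding
                i += len(chunk)
                break
            except UnicodeDecodeError:
                continue
        else:
            # No well-formed sequence starts here: replace this single byte.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


def to_utf16_hex(text: str) -> str:
    """Render a string as upper-case big-endian UTF-16 hex (assumed byte order)."""
    return text.encode("utf-16-be").hex().upper()


for source in ("80", "C261", "E18065", "F1808062", "63F48164", "C3679068A0BF69"):
    print(source, to_utf16_hex(decode_utf8_per_byte(bytes.fromhex(source))))
```

Run against the six source strings, the sketch prints the same destination strings as the table, for example `FFFDFFFD0065` for `E18065`.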

Ill-formed sequence exceptions during export to UTF8 are rare and can occur only if the source Unicode string is itself not well-formed.
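To illustrate what a Unicode string that is not well-formed looks like, the sketch below uses Python strings, which can hold an unpaired surrogate. This is an analogy using Python's strict UTF-8 encoder, not the database's export path, and `can_export_utf8` is a hypothetical helper name.

```python
def can_export_utf8(text: str) -> bool:
    """Return True if the string can be encoded as well-formed UTF-8."""
    try:
        text.encode("utf-8")  # strict by default
        return True
    except UnicodeEncodeError:
        # An unpaired surrogate has no well-formed UTF-8 representation.
        return False


well_formed = "a\U0001F600"  # 'a' followed by a supplementary character
ill_formed = "a\ud800"       # 'a' followed by an unpaired high surrogate

print(can_export_utf8(well_formed))  # True
print(can_export_utf8(ill_formed))   # False
```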

UTF16 and Unicode

In a session where UPT is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.

In a Pass Through session, the following occurs for ill-formed sequence errors:
  • If a High Surrogate is not followed by a Low Surrogate, or a Low Surrogate is not preceded by a High Surrogate, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each unpaired surrogate. For example, the ill-formed UTF16 source string ‘D800D800’ is converted to the UTF16 destination string ‘FFFDFFFD’.
  • If the length in bytes of the source UTF16 character string is odd on import, processing continues without error and a trailing UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for the partial source byte. For example, the ill-formed UTF16 source string ‘D800’ is converted to the UTF16 destination string ‘FFFD’.
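The two Pass Through rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the database's implementation: the function name `repair_utf16` is hypothetical, and code units are assumed to be big-endian, matching the hex strings in the examples.

```python
REPLACEMENT = 0xFFFD  # UNICODE REPLACEMENT CHARACTER


def repair_utf16(data: bytes) -> bytes:
    """Replace each unpaired surrogate with U+FFFD, and append one U+FFFD
    for a trailing partial byte when the byte length is odd."""
    whole = len(data) - (len(data) % 2)  # bytes that form complete code units
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, whole, 2)]

    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        is_low = 0xDC00 <= unit <= 0xDFFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed surrogate pair: copy both code units unchanged.
            out.extend((unit, units[i + 1]))
            i += 2
            continue
        # Unpaired surrogates are replaced; all other code units are copied.
        out.append(REPLACEMENT if (is_high or is_low) else unit)
        i += 1

    if len(data) % 2:
        # Odd byte length: one trailing replacement for the partial byte.
        out.append(REPLACEMENT)
    return b"".join(unit.to_bytes(2, "big") for unit in out)


for source in ("D800D800", "D800", "D83DDE00", "006100"):
    print(source, repair_utf16(bytes.fromhex(source)).hex().upper())
```

With these inputs the sketch yields `FFFDFFFD` and `FFFD` for the two examples in the text, leaves the well-formed pair `D83DDE00` unchanged, and turns the three-byte input `006100` into `0061FFFD`.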