Ill-formed Code Unit Sequences

Teradata Vantage™ - Advanced SQL Engine International Character Set Support

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.10
Release Date: July 2021
Content Type: Configuration, User Guide
Publication ID: B035-1125-171K
Language: English (United States)

Ill-formed code unit sequence errors occur when the bytes in a character string do not follow the encoding rules of its character set (UTF8 or UTF16). These errors are detected when characters are imported from a UTF8 or UTF16 session into a Unicode character string, or exported from a Unicode character string to a UTF8 or UTF16 session.
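To make "do not follow the encoding rules" concrete, the sketch below uses Python's strict codecs to test whether a byte string is well-formed UTF8 or UTF16. It is an analogy, not Teradata's implementation: it assumes big-endian UTF16 (`utf-16-be`), which matches how code units are written in the examples on this page, and `is_well_formed` is a hypothetical helper name.

```python
def is_well_formed(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a well-formed byte string in `encoding`.

    `encoding` is a Python codec name, e.g. "utf-8" or "utf-16-be".
    Strict decoding raises UnicodeDecodeError on any ill-formed code
    unit sequence: a bad lead or continuation byte, an unpaired
    surrogate, or a truncated final code unit.
    """
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# UTF8: 0x80 cannot start a sequence; C2 needs a continuation byte
# (80..BF) next, but 0x61 is not one.
print(is_well_formed(b"\x61\x62", "utf-8"))   # True
print(is_well_formed(b"\x80", "utf-8"))       # False
print(is_well_formed(b"\xc2\x61", "utf-8"))   # False

# UTF16 (big-endian): D800 D800 is two unpaired high surrogates, and
# three bytes cannot be split into whole 16-bit code units.
print(is_well_formed(b"\x00\x41", "utf-16-be"))          # True
print(is_well_formed(b"\xd8\x00\xd8\x00", "utf-16-be"))  # False
print(is_well_formed(b"\x00\x41\x62", "utf-16-be"))      # False
```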

UTF8 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, an error is returned and processing stops when an ill-formed sequence is encountered.
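With UPT disabled the conversion is strict. As a rough analogy only, Python's strict UTF-8 decoder likewise rejects the whole input at the first ill-formed byte, and the exception records that byte's offset. The helper `import_utf8_strict` and its error message are hypothetical and do not reproduce Teradata's actual error text.

```python
def import_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, failing at the first ill-formed sequence.

    Models the UPT-disabled behavior: no partial result is returned
    for a string that contains an ill-formed sequence.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset where the ill-formed sequence begins.
        raise ValueError(
            f"ill-formed UTF8 sequence at byte offset {exc.start}"
        ) from exc


print(import_utf8_strict(b"\x63\x64"))  # cd
try:
    # F4 81 is a valid prefix, but 0x64 is not a continuation byte.
    import_utf8_strict(b"\x63\xf4\x81\x64")
except ValueError as err:
    print(err)  # ill-formed UTF8 sequence at byte offset 1
```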

In a Pass Through session (UPT enabled), processing continues without error, and one UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each ill-formed byte in the source string.

The following table shows how ill-formed UTF8 source strings are converted to UTF16 destination strings containing replacement characters. Each ill-formed byte becomes one U+FFFD, and normal conversion resumes as soon as a well-formed byte sequence is encountered.

| Ill-formed UTF8 source string (bytes) | UTF16 destination string with replacement characters (code units) |
|---|---|
| 80 | FFFD |
| C2 61 | FFFD 0061 |
| E1 80 65 | FFFD FFFD 0065 |
| F1 80 80 62 | FFFD FFFD FFFD 0062 |
| 63 F4 81 64 | 0063 FFFD FFFD 0064 |
| C3 67 90 68 A0 BF 69 | FFFD 0067 FFFD 0068 FFFD FFFD 0069 |
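Every row of the table can be reproduced by a decoder that, whenever the bytes at the current position do not start a well-formed UTF8 sequence, emits one U+FFFD for that single byte and retries at the next byte. The Python sketch below assumes that policy, inferred from the table rather than taken from Teradata's implementation; the function names are hypothetical, and the valid lead and continuation byte ranges follow the well-formed UTF-8 byte sequence table in the Unicode Standard.

```python
REPLACEMENT = 0xFFFD

# (lead-byte range, sequence length, allowed range for the 2nd byte).
# Any 3rd and 4th bytes must always be continuation bytes 80..BF.
_LEADS = [
    (range(0xC2, 0xE0), 2, range(0x80, 0xC0)),
    (range(0xE0, 0xE1), 3, range(0xA0, 0xC0)),
    (range(0xE1, 0xED), 3, range(0x80, 0xC0)),
    (range(0xED, 0xEE), 3, range(0x80, 0xA0)),
    (range(0xEE, 0xF0), 3, range(0x80, 0xC0)),
    (range(0xF0, 0xF1), 4, range(0x90, 0xC0)),
    (range(0xF1, 0xF4), 4, range(0x80, 0xC0)),
    (range(0xF4, 0xF5), 4, range(0x80, 0x90)),
]


def _sequence_length(data: bytes, i: int) -> int:
    """Length of the well-formed sequence starting at data[i], or 0."""
    lead = data[i]
    if lead < 0x80:
        return 1  # ASCII
    for leads, length, second in _LEADS:
        if lead in leads:
            tail = data[i + 1:i + length]
            if len(tail) != length - 1 or tail[0] not in second:
                return 0  # truncated, or bad 2nd byte
            if all(0x80 <= b <= 0xBF for b in tail[1:]):
                return length
            return 0
    return 0  # 80..C1 and F5..FF can never start a sequence


def utf8_to_utf16_pass_through(data: bytes) -> list[int]:
    """Convert UTF-8 bytes to UTF-16 code units, one U+FFFD per bad byte."""
    units: list[int] = []
    i = 0
    while i < len(data):
        n = _sequence_length(data, i)
        if n == 0:
            units.append(REPLACEMENT)  # replace only this byte, then resync
            i += 1
            continue
        code = ord(data[i:i + n].decode("utf-8"))  # well-formed by check
        if code >= 0x10000:  # supplementary character: surrogate pair
            code -= 0x10000
            units += [0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)]
        else:
            units.append(code)
        i += n
    return units


def to_hex(units: list[int]) -> str:
    return "".join(f"{u:04X}" for u in units)


for src in ["80", "C261", "E18065", "F1808062", "63F48164", "C3679068A0BF69"]:
    print(src, "->", to_hex(utf8_to_utf16_pass_through(bytes.fromhex(src))))
```

This per-byte policy differs from the "maximal subpart" practice recommended by the Unicode Standard, under which the prefix E1 80 in the third row would produce a single U+FFFD, so a general-purpose decoder may not match the table.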

Ill-formed sequence exceptions during export to UTF8 are rare and can only occur if the source Unicode string is not well-formed.
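The same holds in Python, where a `str` can hold an unpaired surrogate code point even though such a string is not well-formed Unicode: encoding to UTF-8 succeeds for any well-formed string and fails only in that case. This is an analogy, with Python's `str` standing in for a Teradata Unicode string; `export_to_utf8` is a hypothetical helper.

```python
def export_to_utf8(text: str) -> bytes:
    """Encode text as UTF-8; only an ill-formed source string can fail."""
    return text.encode("utf-8", errors="strict")


print(export_to_utf8("caf\u00e9").hex())  # 636166c3a9

try:
    export_to_utf8("a\ud800b")  # lone high surrogate: not well-formed
except UnicodeEncodeError as err:
    print(err.start)  # 1 (index of the offending code point)
```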

UTF16 and Unicode

In a session where UPT is disabled, an error is returned and processing stops when an ill-formed sequence is encountered.

In a Pass Through session (UPT enabled), ill-formed sequences are handled as follows:
  • If a High Surrogate is not followed by a Low Surrogate, or a Low Surrogate is not preceded by a High Surrogate, processing continues without error, and one UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each unpaired surrogate. For example, the ill-formed UTF16 source string ‘D800D800’ is converted to the UTF16 destination string ‘FFFDFFFD’.
  • If the source UTF16 character string has an odd number of bytes on import, processing continues without error, and one trailing UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for the final partial byte. For example, the 3-byte source string ‘004162’ is converted to the UTF16 destination string ‘0041FFFD’.
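Both rules can be sketched in Python as follows. The sketch assumes the session bytes are big-endian UTF16, which matches how code units are written in the examples above; Teradata's actual byte order and implementation may differ, and `utf16_pass_through` is a hypothetical name. The input `004162` is an illustrative odd-length string: the code unit 0041 followed by one stray byte.

```python
REPLACEMENT = 0xFFFD


def utf16_pass_through(data: bytes) -> list[int]:
    """Repair big-endian UTF-16 bytes following the two rules above.

    - an unpaired high or low surrogate becomes one U+FFFD;
    - an odd trailing byte becomes one trailing U+FFFD.
    """
    # Whole 16-bit code units only; a leftover odd byte is handled last.
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data) - 1, 2)]
    out: list[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out += [u, units[i + 1]]  # well-formed pair: keep both
                i += 2
                continue
            out.append(REPLACEMENT)  # high surrogate without a low one
        elif 0xDC00 <= u <= 0xDFFF:
            out.append(REPLACEMENT)  # low surrogate without a high one
        else:
            out.append(u)
        i += 1
    if len(data) % 2:
        out.append(REPLACEMENT)  # partial code unit at the end
    return out


def to_hex(units: list[int]) -> str:
    return "".join(f"{u:04X}" for u in units)


for src in ["D800D800", "D800", "004162", "D83DDE00"]:
    print(src, "->", to_hex(utf16_pass_through(bytes.fromhex(src))))
# D800D800 -> FFFDFFFD
# D800 -> FFFD
# 004162 -> 0041FFFD
# D83DDE00 -> D83DDE00
```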