Ill-formed Code Unit Sequences

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

Ill-formed code unit sequence errors occur when the byte values in a character string do not follow the rules of its encoding. These errors are detected when importing characters from a UTF8 or UTF16 session into a Unicode character string, or when exporting from a Unicode character string to a UTF8 or UTF16 session.

UTF8 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.
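As a rough client-side analogy for this strict behavior, data can be checked before it is loaded. The sketch below uses Python's strict UTF-8 decoder as a stand-in; it is not Teradata's implementation, and the helper name `first_ill_formed_offset` is hypothetical.

```python
from typing import Optional


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first ill-formed UTF-8 sequence, or None."""
    try:
        # Strict decoding raises on the first ill-formed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# One of the ill-formed strings from this page, and a well-formed string.
print(first_ill_formed_offset(bytes.fromhex("63F48164")))
print(first_ill_formed_offset("abc".encode("utf-8")))
```

A check like this only reports where the first problem starts. Whether the database rejects or repairs the data still depends on the session's UPT setting.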

In a Pass Through session (UPT enabled), processing continues without error, and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each ill-formed byte in the source string.

The following table shows how UTF8 source strings are converted to UTF16 destination strings with replacement characters. The replacement stops once a well-formed byte sequence is encountered.

Ill-formed UTF8 Source String    UTF16 Destination String With Replacement Characters
80                               FFFD
C261                             FFFD0061
E18065                           FFFDFFFD0065
F1808062                         FFFDFFFDFFFD0062
63F48164                         0063FFFDFFFD0064
C3679068A0BF69                   FFFD0067FFFD0068FFFDFFFD0069
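The per-byte replacement rule shown in the table can be sketched in Python. This is an illustration of the rule as described on this page, not Teradata's code. The function names are hypothetical, and big-endian UTF16 is assumed for the hexadecimal output. Python's built-in `errors="replace"` handler is not used because it may emit a single U+FFFD for a truncated multi-byte sequence rather than one per byte.

```python
REPLACEMENT = "\ufffd"


def well_formed_length(data: bytes, start: int) -> int:
    """Length (1-4) of the well-formed UTF-8 sequence at start, or 0 if none."""
    for length in (1, 2, 3, 4):
        chunk = data[start:start + length]
        if len(chunk) < length:
            break  # ran out of input before completing a sequence
        try:
            chunk.decode("utf-8")  # strict decoding of a single candidate
            return length
        except UnicodeDecodeError:
            continue
    return 0


def utf8_to_utf16_hex(source_hex: str) -> str:
    """Convert UTF-8 bytes (as hex) to UTF-16BE hex, one U+FFFD per bad byte."""
    data = bytes.fromhex(source_hex)
    chars = []
    pos = 0
    while pos < len(data):
        length = well_formed_length(data, pos)
        if length:
            chars.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            chars.append(REPLACEMENT)  # replace exactly one ill-formed byte
            pos += 1
    return "".join(chars).encode("utf-16-be").hex().upper()


# Source strings taken from the table above.
for row in ("80", "C261", "E18065", "F1808062", "63F48164", "C3679068A0BF69"):
    print(row, utf8_to_utf16_hex(row))
```

Trying candidate lengths from shortest to longest means a successful decode always corresponds to exactly one character, so replacement resumes normal conversion at the first well-formed sequence.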

Ill-formed sequence exceptions during export to UTF8 are rare and can only occur if the source Unicode string is not well-formed.

UTF16 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.

In a Pass Through session, the following occurs for ill-formed sequence errors:
  • In cases where a High Surrogate is not followed by a Low Surrogate or a Low Surrogate is not preceded by a High Surrogate, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each inappropriate surrogate. For example, the ill-formed UTF16 source string of ‘D800D800’ is converted to the UTF16 destination string with replacement characters of ‘FFFDFFFD’.
  • If the source UTF16 character string contains an odd number of bytes on import, processing continues without error and a trailing UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for the partial trailing code unit. For example, the ill-formed UTF16 source string of ‘D800’ is converted to the UTF16 destination string with replacement characters of ‘FFFD’.
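The two Pass Through rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not Teradata's implementation: the function name `repair_utf16_hex` is hypothetical, and the input is assumed to be big-endian UTF16 written as hexadecimal, matching the examples on this page.

```python
def repair_utf16_hex(source_hex: str) -> str:
    """Replace unpaired surrogates and a trailing odd byte with U+FFFD."""
    data = bytes.fromhex(source_hex)
    whole = len(data) - len(data) % 2  # number of bytes in complete code units
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, whole, 2)]

    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.extend((unit, units[i + 1]))  # well-formed pair, keep both
                i += 2
                continue
            out.append(0xFFFD)  # high surrogate without a low surrogate
        elif 0xDC00 <= unit <= 0xDFFF:
            out.append(0xFFFD)  # low surrogate without a preceding high one
        else:
            out.append(unit)
        i += 1

    if len(data) % 2:
        out.append(0xFFFD)  # trailing partial code unit

    return b"".join(u.to_bytes(2, "big") for u in out).hex().upper()


print(repair_utf16_hex("D800D800"))
print(repair_utf16_hex("D83DDE00"))
```

Each unpaired surrogate yields its own replacement character, while a correctly paired high and low surrogate passes through unchanged.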