Ill-formed Code Unit Sequences | Teradata Vantage - Ill-formed Code Unit Sequences - Advanced SQL Engine

International Character Set Support

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

17.00

Published

June 2020

Language

English (United States)

Last Update

2021-01-23

dita:id

B035-1125

Product Category

Teradata Vantage™

Ill-formed code unit sequence errors occur when the byte values in a character string do not follow the rules of its encoding. They are detected when importing characters from a UTF8 or UTF16 session into a Unicode character string, or when exporting from a Unicode character string to a UTF8 or UTF16 session.

UTF8 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, an error is returned and processing stops when an ill-formed sequence is encountered.

In a Pass Through session, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each byte of an ill-formed sequence in the source string.

The following table shows how ill-formed UTF8 source strings are converted to UTF16 destination strings with replacement characters. All values are hexadecimal. Replacement stops as soon as a well-formed byte sequence is encountered, and that sequence is converted normally.

Ill-formed UTF8 Source String	UTF16 Destination String With Replacement Characters
80	FFFD
C261	FFFD0061
E18065	FFFDFFFD0065
F1808062	FFFDFFFDFFFD0062
63F48164	0063FFFDFFFD0064
C3679068A0BF69	FFFD0067FFFD0068FFFDFFFD0069
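The per-byte replacement shown in the table can be sketched in Python. This is only an illustration of the rule described above, not the database's implementation. The function names `decode_utf8_per_byte` and `utf8_hex_to_utf16_hex` are hypothetical, and the sketch assumes big-endian UTF16 output to match the hexadecimal notation in the table. Python's built-in `errors="replace"` mode is deliberately avoided because it may emit a single U+FFFD for a multi-byte ill-formed prefix, which does not match the one-replacement-per-byte behavior documented here.

```python
REPLACEMENT = "\ufffd"  # UNICODE REPLACEMENT CHARACTER


def decode_utf8_per_byte(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD for each byte that cannot
    start a well-formed sequence (assumed Pass Through behavior)."""
    out = []
    i = 0
    while i < len(data):
        # Well-formed UTF-8 sequences are 1 to 4 bytes long; try the
        # shortest candidate first so a match is exactly one character.
        for n in (1, 2, 3, 4):
            chunk = data[i:i + n]
            try:
                ch = chunk.decode("utf-8")  # strict mode raises on ill-formed input
            except UnicodeDecodeError:
                continue
            out.append(ch)
            i += len(chunk)
            break
        else:
            # No well-formed sequence starts at this byte: replace it alone.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


def utf8_hex_to_utf16_hex(hex_string: str) -> str:
    """Convert a hex-encoded UTF-8 string to hex-encoded big-endian UTF-16."""
    text = decode_utf8_per_byte(bytes.fromhex(hex_string))
    return text.encode("utf-16-be").hex().upper()


if __name__ == "__main__":
    for source in ("80", "C261", "E18065", "F1808062",
                   "63F48164", "C3679068A0BF69"):
        print(source, utf8_hex_to_utf16_hex(source))
```

Run against the six source strings in the table, this sketch yields the same destination strings as the right-hand column.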

Ill-formed sequence exceptions during export to UTF8 are rare and can occur only if the source Unicode string is not well-formed.

UTF16 and Unicode

In a session where UPT is disabled, an error is returned and processing stops when an ill-formed sequence is encountered.

In a Pass Through session, the following occurs for ill-formed sequence errors:

If a High Surrogate is not followed by a Low Surrogate, or a Low Surrogate is not preceded by a High Surrogate, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each unpaired surrogate. For example, the ill-formed UTF16 source string ‘D800D800’ is converted to the UTF16 destination string ‘FFFDFFFD’.
If the byte length of the source UTF16 character string is odd on import, processing continues without error and a trailing UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for the partial source byte. For example, the ill-formed UTF16 source string ‘D800’ is converted to the UTF16 destination string ‘FFFD’.
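The two Pass Through rules for UTF16 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the database's implementation: the function names `repair_utf16_units` and `utf16_hex_repair` are hypothetical, code units are assumed to be big-endian to match the hexadecimal notation used above, and the odd-length input `0041D8` in the usage lines is an invented example rather than one taken from this documentation.

```python
REPLACEMENT = 0xFFFD  # UNICODE REPLACEMENT CHARACTER


def repair_utf16_units(data: bytes) -> list:
    """Return UTF-16 code units with unpaired surrogates replaced by U+FFFD.

    A trailing odd byte (partial code unit) also becomes one U+FFFD.
    """
    even_length = len(data) - (len(data) % 2)
    # Assumption: the source bytes are big-endian 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, even_length, 2)]

    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        is_low = 0xDC00 <= unit <= 0xDFFF
        next_is_low = (i + 1 < len(units)
                       and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if is_high and next_is_low:
            # Well-formed surrogate pair: keep both code units.
            out.extend((unit, units[i + 1]))
            i += 2
        elif is_high or is_low:
            # Unpaired surrogate: one replacement character per unit.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(unit)
            i += 1

    if len(data) % 2:
        # Odd byte length: the partial trailing byte becomes U+FFFD.
        out.append(REPLACEMENT)
    return out


def utf16_hex_repair(hex_string: str) -> str:
    """Hex in, hex out, using four hex digits per code unit."""
    units = repair_utf16_units(bytes.fromhex(hex_string))
    return "".join(f"{unit:04X}" for unit in units)


if __name__ == "__main__":
    print(utf16_hex_repair("D800D800"))  # two unpaired High Surrogates
    print(utf16_hex_repair("0041D8"))    # hypothetical odd-length input
```

For the documented inputs `D800D800` and `D800`, the sketch produces `FFFDFFFD` and `FFFD`, matching the examples above. A well-formed pair such as `D83DDE00` passes through unchanged.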