Ill-formed Code Unit Sequences

Teradata Vantage™ - Analytics Database International Character Set Support - 17.20

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2023-01-27

Ill-formed code unit sequence errors occur when the byte values in a character string do not follow the encoding rules. These exceptions are detected while importing characters from a UTF8 or UTF16 session to a Unicode character string, or exporting from a Unicode character string to a UTF16 or UTF8 session.

UTF8 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.

In a Pass Through session, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each ill-formed byte in the source string sequence.

The following table shows how UTF8 source strings are converted to UTF16 destination strings with replacement characters. Note that the replacement stops once a well-formed byte sequence is encountered.

Ill-formed UTF8 Source String    UTF16 Destination String With Replacement Characters
80                               FFFD
C261                             FFFD0061
E18065                           FFFDFFFD0065
F1808062                         FFFDFFFDFFFD0062
63F48164                         0063FFFDFFFD0064
C3679068A0BF69                   FFFD0067FFFD0068FFFDFFFD0069
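The per-byte replacement shown in the table can be approximated outside the database. The following Python sketch is an illustration only, not the Analytics Database implementation; the function name utf8_to_utf16_hex is hypothetical. It uses Python's strict UTF-8 decoder to recognize a well-formed sequence at the current position and otherwise emits one U+FFFD per offending byte. The replacement is done by hand because a built-in replace handler may group ill-formed bytes differently from the table.

```python
def utf8_to_utf16_hex(source_hex: str) -> str:
    """Convert UTF-8 bytes (given as hex) to UTF-16 code units (as hex),
    storing U+FFFD for each byte that is not part of a well-formed sequence.

    Hypothetical helper for illustration; it mirrors the table's behavior.
    """
    data = bytes.fromhex(source_hex)
    out = []
    i = 0
    while i < len(data):
        decoded = None
        # A well-formed UTF-8 sequence is 1 to 4 bytes long.
        for n in range(1, 5):
            chunk = data[i:i + n]
            if len(chunk) < n:
                break  # ran out of input
            try:
                decoded = chunk.decode("utf-8")  # strict decoding
                break
            except UnicodeDecodeError:
                continue
        if decoded is None:
            out.append("\ufffd")  # replace exactly one ill-formed byte
            i += 1
        else:
            out.append(decoded)
            i += len(decoded.encode("utf-8"))
    # Render the result as big-endian UTF-16 code units in hex.
    return "".join(out).encode("utf-16-be").hex().upper()


if __name__ == "__main__":
    for src in ("80", "C261", "E18065", "F1808062",
                "63F48164", "C3679068A0BF69"):
        print(src, utf8_to_utf16_hex(src))
```

Running the script prints each source string from the table next to its converted form, which makes it easy to compare the sketch against the documented results.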

Ill-formed sequence exceptions during export to UTF8 are rare and can only occur if the source Unicode string is not well-formed.
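As a rough analogy in Python (not the database's own export path), a Unicode string is not well-formed when it contains an unpaired surrogate, and a strict UTF-8 encoder rejects it. The helper name can_export_utf8 below is hypothetical and serves only to illustrate the condition.

```python
def can_export_utf8(text: str) -> bool:
    """Return True if the string can be encoded as well-formed UTF-8.

    Hypothetical helper: Python's strict UTF-8 encoder refuses lone
    surrogates, which is one way a Unicode string can be ill-formed.
    """
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(can_export_utf8("abc"))        # well-formed text
    print(can_export_utf8("a\ud800b"))   # contains a lone high surrogate
```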

UTF16 and Unicode

In a session where Unicode Pass Through (UPT) is disabled, processing stops and an error is returned when an ill-formed sequence is encountered.

In a Pass Through session, the following occurs for ill-formed sequence errors:
  • In cases where a High Surrogate is not followed by a Low Surrogate or a Low Surrogate is not preceded by a High Surrogate, processing continues without error and a UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for each inappropriate surrogate. For example, the ill-formed UTF16 source string of ‘D800D800’ is converted to the UTF16 destination string with replacement characters of ‘FFFDFFFD’.
  • If the length in bytes of the source UTF16 character string is odd on import, processing continues without error and a trailing UNICODE REPLACEMENT CHARACTER (U+FFFD) is stored in the destination string for the partial source byte. For example, the ill-formed UTF16 source string of ‘D800’ is converted to the UTF16 destination string with replacement characters of ‘FFFD’.
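
The two Pass Through rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Analytics Database implementation: the function name replace_ill_formed_utf16 is hypothetical, and the source bytes are assumed to be big-endian so that the hex strings read the same way as the examples in the text.

```python
def replace_ill_formed_utf16(source_hex: str) -> str:
    """Return the destination UTF-16 code units (as hex) after replacing
    unpaired surrogates and a trailing partial code unit with U+FFFD.

    Hypothetical helper; assumes big-endian byte order for the input.
    """
    data = bytes.fromhex(source_hex)
    even_len = len(data) - (len(data) % 2)
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, even_len, 2)]

    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        is_low = 0xDC00 <= u <= 0xDFFF
        next_is_low = (i + 1 < len(units)
                       and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if is_high and next_is_low:
            out.extend([u, units[i + 1]])  # well-formed surrogate pair
            i += 2
        elif is_high or is_low:
            out.append(0xFFFD)             # unpaired surrogate
            i += 1
        else:
            out.append(u)                  # ordinary BMP code unit
            i += 1

    if len(data) % 2:
        out.append(0xFFFD)                 # trailing partial code unit
    return "".join(f"{u:04X}" for u in out)


if __name__ == "__main__":
    print(replace_ill_formed_utf16("D800D800"))
    print(replace_ill_formed_utf16("0041D8"))  # hypothetical odd-length input
```

The odd-length input 0041D8 is an invented example, chosen only to show a source string whose final byte cannot form a complete code unit.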