UTF-8 - FastExport

Teradata FastExport Reference

Product
FastExport
Release Number
16.10
Published
May 2017
Language
English (United States)
Last Update
2018-05-22
dita:id
B035-2410
lifecycle
previous
Product Category
Teradata Tools and Utilities

In simple terms, UTF-8 is a variable-length encoding of Unicode that uses 8-bit units (octets) to represent international characters.

In more technical terms, UTF-8 encodes each character as a sequence of one or more octets. The current standard (RFC 3629) allows sequences of 1 to 4 octets; the original definition allowed up to 6. In a one-octet sequence, the high-order bit is 0 and the remaining 7 bits encode the character value, so UTF-8 uses all 8 bits of an octet while preserving the full US-ASCII range unchanged. In a multi-octet sequence, every octet has the most significant bit (0x80) set. As a result, UTF-8 avoids a problem of fixed-width Unicode encodings: an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and no octet of a non-ASCII character can be mistaken for an ASCII character. This means that normal tools for searching ASCII text work as expected on UTF-8 data.
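The properties described above can be checked with a short script. The following is a minimal sketch in Python, not part of FastExport; the helper names `utf8_octets` and `has_high_bit` are hypothetical and chosen only for illustration. It relies on Python's built-in `str.encode` to produce the UTF-8 octets.

```python
def utf8_octets(text: str) -> list[int]:
    """Return the UTF-8 octets of text as a list of integers."""
    return list(text.encode("utf-8"))


def has_high_bit(octet: int) -> bool:
    """True if the most significant bit (0x80) of the octet is set."""
    return (octet & 0x80) != 0


# Sample characters, written as escapes so the script itself stays ASCII:
# "A" (U+0041), e-acute (U+00E9), euro sign (U+20AC), grinning face (U+1F600).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    octets = utf8_octets(ch)
    hex_octets = " ".join(f"{o:02X}" for o in octets)
    print(f"U+{ord(ch):04X}: {len(octets)} octet(s): {hex_octets}")

# An ASCII-only string has identical bytes in ASCII and in UTF-8.
ascii_text = "plain ASCII"
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))
```

Each sample uses one more octet than the previous one (1, 2, 3, then 4), and only the ASCII character produces an octet with the high bit clear.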