UTF-8 - Basic Teradata Query

Basic Teradata Query Reference

Product
Basic Teradata Query
Release Number
16.00
Published
November 2016
Language
English (United States)
Last Update
2018-04-25
dita:mapPath
hyz1479325149183.ditamap
dita:ditavalPath
Audience_PDF_include.ditaval
dita:id
B035-2414
lifecycle
previous
Product Category
Teradata Tools and Utilities

In simple terms, UTF-8 is an 8 bit encoding of 16 bit Unicode to achieve an international character representation.

In more technical terms, in UTF-8, characters are encoded using sequences of 1 to 6 octets. The only octet of a sequence of one has the higher-order bit set to 0, the remaining 7 bits are used to encode the character value. UTF-8 uses all bits of an octet, but has the quality of preserving the full US-ASCII range. The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF is exactly same as the original ASCII file and all non-ASCII characters are guaranteed to have the most significant bit set (bit 0x80). This means that normal tools for text searching work as expected.