UTF-8 - MultiLoad

Teradata MultiLoad Reference

Product

MultiLoad

Release Number

16.10

Published

May 2017

Language

English (United States)

Last Update

2018-07-11

dita:id

B035-2409

Product Category

Teradata Tools and Utilities

In simple terms, UTF-8 is an encoding of Unicode in 8-bit units (octets) that provides an international character representation.

In more technical terms, UTF-8 encodes each character as a sequence of 1 to 6 octets. (The current definition of UTF-8, RFC 3629, limits sequences to 4 octets, which covers all Unicode code points; the 5- and 6-octet forms belong to the earlier definition.) In a one-octet sequence, the high-order bit is set to 0 and the remaining 7 bits encode the character value. UTF-8 uses all bits of an octet but preserves the full US-ASCII range. UTF-8 avoids a problem of fixed-length encodings of Unicode and UCS: an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and every octet of a non-ASCII character is guaranteed to have the most significant bit set (bit 0x80). As a result, ordinary text-searching tools work as expected.
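
The properties described above can be checked with a short script. The following is a minimal sketch in Python, which is not part of MultiLoad; it uses Python's built-in UTF-8 codec, and the helper names `utf8_octets` and `all_high_bits_set`, as well as the sample strings, are assumptions chosen only for illustration.

```python
def utf8_octets(text: str) -> bytes:
    """Encode text as UTF-8 and return the raw octets."""
    return text.encode("utf-8")


def all_high_bits_set(octets: bytes) -> bool:
    """Return True when every octet has its most significant bit (0x80) set."""
    return all(b & 0x80 for b in octets)


# An ASCII string encodes to the same octets in UTF-8 as in US-ASCII,
# and each octet has its high-order bit set to 0.
ascii_text = "MultiLoad"
assert utf8_octets(ascii_text) == ascii_text.encode("ascii")
assert all(b < 0x80 for b in utf8_octets(ascii_text))

# Non-ASCII characters use multi-octet sequences, and every octet
# in those sequences has bit 0x80 set.
for ch in ["é", "€", "𝄞"]:
    octets = utf8_octets(ch)
    print(f"U+{ord(ch):04X}: {len(octets)} octets, {octets.hex(' ')}")
    assert all_high_bits_set(octets)

# Because no octet of a non-ASCII character falls in the ASCII range,
# a byte-level search for an ASCII delimiter never matches inside a
# multi-octet sequence.
record = utf8_octets("naïve|café|日本語")
fields = record.split(b"|")
assert [f.decode("utf-8") for f in fields] == ["naïve", "café", "日本語"]
```

The last check illustrates why byte-oriented tools keep working on UTF-8 data: splitting on the ASCII delimiter `|` yields intact fields even though the fields contain multi-octet characters.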