- You may have existing processes in place to cleanse Unicode data before loading it into VantageCloud Lake, such as access modules and UDFs. To take full advantage of Unicode Pass Through, you must change or eliminate prior methods of cleansing Unicode data before loading. See the Teradata Parallel Transporter (TPT) documentation for details on configuring how TPT uses access modules. For details about configuring access modules to allow pass through characters, see Teradata® Tools and Utilities Access Module Reference, B035-2425, especially the sections regarding automatic character conversions.
- PTCs have default character properties; therefore, case sensitivity, object name support, collation, and inequality comparisons are not supported.
- Unicode delimited character literals support PTCs.
For example: SELECT feed FROM twitter WHERE feed LIKE U&'%#+01F602%' UESCAPE '#';
- PTCs in SMP require two 16-bit UTF-16 code units, or 4 bytes. Therefore, VARCHAR(2) or CHAR(2) are the minimum sizes required to store a PTC.
- CHARACTER(4) is the minimum size required to import a PTC using a USING clause.
- PTCs do not affect compression. The TransUnicodeToUTF8 compression function compresses surrogate pairs to a single 4-byte UTF8 encoded value. TransUTF8ToUnicode decompresses the value to surrogate pairs.
Truncation
In sessions where Unicode Pass Through is enabled, truncation of a PTC occurs when the destination buffer does not have 4 bytes to contain the entire surrogate pair. If only 2 bytes remain in a fixed CHAR string, then the string is padded with a SPACE (U+0020) in Teradata mode. In ANSI mode, the truncation mode error is returned.
Format Strings
The length of an ‘X’ format string is in UTF-16 code units. So, FORMAT ‘X(2)’ is the minimum size format string required to display a PTC.
Export Width
The representation of supplementary code points is four bytes long in the UTF8 and UTF16 character sets. This may suggest that the export width for the session must be four bytes per character to correctly process these characters. However, the representation internally is UTF-16 as two code units. Export width sees this as two characters, so unless a site-defined export width is specified, export width reserves four bytes for two characters exported to UTF16 and six bytes when exported to UTF8.
Therefore, there is no export width issue for supplementary code points unless you use a site-tailored export width. For sessions with tailored export width that may cause truncation of strings with supplementary code points, truncation was already possible unless the repertoire is known. If the repertoire of characters is being managed, pass-through sessions are not an issue.
Collation
For Unicode data, PTCs sort at the position of the error character but in Unicode code point order. Characters off the BMP, such as those represented as surrogate pairs, collate based on their UTF-16 code point order. This is true even for CHARSET_COLL collation for UTF8 sessions.
UNICODE PASS THROUGH does not provide full support for PTCs. With respect to collation, pass through surrogate pairs sort as if composed of two characters. In the current system, for all ASCII, EBCDIC, JIS_COLL, CHARSET_COLL, and MULTINATIONAL collations, code points in the range 0xD800 to 0xDFFF sort before U+FFFF. For example, under ASCII case-specific collation, a pass through surrogate pair 0xD800 0xDC00 is treated as two Unicode UTF-16 characters, U+D800 and U+DC00 and are sorted before U+FFFF.
Comparisons
- Equality and the LIKE operators work correctly with PTCs.
- Inequality predicates may not work correctly with PTCs.
String Functions
- Indexing and counting functions operate on UTF-16 code units.
- Substring can split surrogate pairs resulting in U+FFFD.
- UNICODE_TO_UNICODE_NF(K)C and UNICODE_TO_UNICODE_NF(K)D TRANSLATE functions support the normalization of PTCs producing a normalized result string.
- The TRANSLATE_CHK function used with Unicode-to-Unicode translations stops processing only when encountering a U+FFFD.
- CASE translations may not work correctly.
Stored Procedures
In a session where UPT is disabled, you cannot create a stored procedure that references a PTC inside the stored procedure body. Such a procedure can only be created within a Pass Through session.
- In a session where UPT is disabled, you cannot call a stored procedure that is defined with a PTC parameter.
- In a Pass Through session, you can successfully call a stored procedure that is defined with a PTC parameter and pass a PTC literal to it as an argument.
- In all sessions (UPT enabled or disabled), you can call a stored procedure that references a PTC in the body of the the stored procedure.
Stored Procedure Creation or Execution | Current Session Type | Expected Result |
---|---|---|
Creating a stored procedure that references a PTC in the procedure body | Pass Through session | Successful |
UPT is disabled | Error | |
Calling a stored procedure that is defined with a PTC parameter | Pass Through session | Successful |
UPT is disabled | Error | |
Calling a stored procedure that references a PTC in the procedure body | Pass Through session | Successful |
UPT is disabled | Successful | |
Calling a stored procedure that is defined with a PTC parameter and also references a PTC in the procedure body | Pass Through session | Successful |
UPT is disabled | Error |