Setting the TD_CHARSET optional attribute to UTF16 allows users to load and export UTF-16 data to and from database systems. The TD_CHARSET attribute is available for each driver. Teradata PT applications that use the UTF16 session character set must be aware of the differences in the expected sizes for CHAR and VARCHAR columns in the Schema object. The Schema object’s AddColumn method expects different sizes for CHAR and VARCHAR columns when the session character set is UTF16 or UTF8 than it does when the session character set is ASCII.
For example, a table is created with a column defined for data in Unicode® format:
create table utf16_tbl (c1 VARCHAR(100) character set unicode);
If a Teradata PT application loads ASCII data into this table, or exports ASCII data from it, column c1 is added to the Schema object in the following way:
schema->AddColumn("c1", TD_VARCHAR, 100);
However, if the application loads or exports UTF-16 data, column c1 must be added to the Schema object in this way:
schema->AddColumn("c1", TD_VARCHAR, 200);
The size parameter of the AddColumn method is the size of the column in bytes. Valid UTF-16 data in the database contains two bytes per character, which is why 200 bytes is specified instead of 100 bytes. The same principle applies when the session character set is UTF8, because UTF-8 characters can be up to three bytes in length. Therefore, if a Teradata PT application loads or exports UTF-8 data, column c1 is added to the Schema object in the following way:
schema->AddColumn("c1", TD_VARCHAR, 300);
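The sizing rule above is simple arithmetic: the declared character count multiplied by the maximum bytes per character for the session character set. The following is a minimal sketch of that calculation. The SessionCharset enum and both helper functions are hypothetical names introduced for illustration; they are not part of the Teradata PT API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enum for this sketch; it is not part of the Teradata PT API.
enum class SessionCharset { Ascii, Utf8, Utf16 };

// Maximum bytes one character can occupy, following the sizing rules
// described in the text: 1 for ASCII, 2 for UTF16, 3 for UTF8.
std::size_t maxBytesPerChar(SessionCharset cs) {
    switch (cs) {
        case SessionCharset::Utf16: return 2;
        case SessionCharset::Utf8:  return 3;
        case SessionCharset::Ascii: return 1;
    }
    return 1;  // not reached for valid enumerators
}

// Byte size to use as the AddColumn size argument for a CHAR or VARCHAR
// column declared with charCount characters.
std::size_t schemaColumnBytes(std::size_t charCount, SessionCharset cs) {
    const std::size_t perChar = maxBytesPerChar(cs);
    assert(charCount <= SIZE_MAX / perChar);  // guard against overflow
    return charCount * perChar;
}
```

For the VARCHAR(100) column c1, schemaColumnBytes(100, SessionCharset::Utf16) yields the 200 bytes used in the UTF-16 example, and SessionCharset::Utf8 yields the 300 bytes used in the UTF-8 example.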
Using the UTF16 Encoding for Teradata PT Objects and Messages
User attributes, DML statements, and column names can be in the UTF-16 encoding when the session character set is UTF16. Teradata PT applications have the choice of passing in these values in UTF-16 or in UTF-8 when the session character set is UTF16. To accommodate this option, the Connection, Schema, and DML Group objects can be instantiated with an encoding parameter instead of the default void parameter. UTF-8, which is an ASCII-based encoding for Unicode characters, is the default encoding for all Connection, Schema, and DML Group objects.
If users specify all Connection attributes in the UTF-16 encoding, then their Connection objects must be instantiated in the following way:
Connection *conn = new Connection(TD_UTF16_ENCODING);
If users specify all column names passed to the Schema object methods in the UTF-16 encoding, then their Schema objects must be instantiated in the following way:
Schema *schema = new Schema(TD_UTF16_ENCODING);
If users specify all column names and DML statements passed to their DMLGroup object methods in the UTF-16 encoding, then their DMLGroup objects must be instantiated in the following way:
DMLGroup *dmlgr = new DMLGroup(TD_UTF16_ENCODING);
If an object is instantiated with TD_UTF16_ENCODING, then Teradata PT will treat all character strings passed in to any of that object’s methods as being in the UTF-16 encoding. Also, all UTF-16 character strings passed to the Connection, Schema, or DMLGroup object methods must be NULL-terminated. This means that the last two bytes of the UTF-16 string must be zero. This is important so Teradata PT can accurately determine the length of the UTF-16 string passed to any of the Connection, Schema, or DMLGroup object methods.
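The termination requirement can be illustrated with standard C++ alone. The sketch below shows how the length of a UTF-16 string is recovered by scanning for the zero code unit, and how a raw byte buffer can be checked for the two trailing zero bytes. All three function names are hypothetical helpers for illustration. How the resulting pointer is handed to a Teradata PT method (for example, through a cast) is not covered here and would depend on the method's actual signature.

```cpp
#include <cassert>
#include <cstddef>

// Number of UTF-16 code units before the terminating 0x0000 unit.
// This scan is only safe when the string really is terminated.
std::size_t utf16Length(const char16_t* s) {
    assert(s != nullptr);
    std::size_t units = 0;
    while (s[units] != 0) ++units;
    return units;
}

// Byte length of the string, excluding the two-byte terminator.
std::size_t utf16ByteLength(const char16_t* s) {
    return utf16Length(s) * sizeof(char16_t);
}

// True when a raw byte buffer ends with the two zero bytes that make up
// a UTF-16 terminator. Rejects empty and odd-sized buffers.
bool endsWithUtf16Terminator(const unsigned char* buf, std::size_t byteCount) {
    if (buf == nullptr || byteCount < 2 || byteCount % 2 != 0) return false;
    return buf[byteCount - 2] == 0 && buf[byteCount - 1] == 0;
}
```

A C++11 literal such as u"c1" is terminated by the compiler, so utf16Length(u"c1") counts two code units. A buffer assembled by hand, for example from a file or a conversion routine, carries no such guarantee and is worth checking with something like endsWithUtf16Terminator before use.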
When the session character set is UTF16, a Teradata PT application can also choose whether the messages exchanged between itself and Teradata PT are in UTF-8 or UTF-16. The optional attribute TD_MSG_ENCODING controls this behavior. These messages include Teradata CLIv2 errors, database errors, and the table, macro, or database names found in the buffer for the Stream driver’s ApplyCount event. The value of the TD_MSG_ENCODING attribute can only be TD_UTF8_ENCODING (the default) or TD_UTF16_ENCODING. Use this optional attribute only when the session character set attribute, TD_CHARSET, is set to UTF16.
Each of the following conditions is an error:
- The TD_MSG_ENCODING attribute is set to TD_UTF16_ENCODING but the TD_CHARSET attribute is not set to UTF16.
- A Connection object is instantiated with TD_UTF16_ENCODING but the TD_CHARSET attribute is not set to UTF16.
- A Schema object is instantiated with TD_UTF16_ENCODING but the TD_CHARSET attribute is not set to UTF16.
- A DMLGroup object is instantiated with TD_UTF16_ENCODING but the TD_CHARSET attribute is not set to UTF16.
Additionally, all invalid character errors detected in any UTF-16 attribute, column name, or DML statement will be treated as terminating errors.
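The four conditions listed above share one rule: any use of TD_UTF16_ENCODING requires TD_CHARSET to be UTF16. An application could check its own settings against that rule before creating any Teradata PT objects. The following is a minimal sketch of such a check. The Encoding enum, the JobSettings struct, and findEncodingMismatches are all hypothetical stand-ins for illustration and are not part of the Teradata PT API.

```cpp
#include <string>
#include <vector>

// Stand-ins for TD_UTF8_ENCODING and TD_UTF16_ENCODING (hypothetical).
enum class Encoding { Utf8, Utf16 };

// Hypothetical record of the settings an application intends to use.
struct JobSettings {
    std::string charset;                           // value intended for TD_CHARSET
    Encoding msgEncoding = Encoding::Utf8;         // value intended for TD_MSG_ENCODING
    Encoding connectionEncoding = Encoding::Utf8;  // Connection constructor argument
    Encoding schemaEncoding = Encoding::Utf8;      // Schema constructor argument
    Encoding dmlGroupEncoding = Encoding::Utf8;    // DMLGroup constructor argument
};

// Returns the name of every setting that uses UTF-16 encoding while the
// session character set is not UTF16. An empty result means no mismatch.
std::vector<std::string> findEncodingMismatches(const JobSettings& s) {
    std::vector<std::string> problems;
    if (s.charset == "UTF16") return problems;  // UTF-16 encoding is allowed
    if (s.msgEncoding == Encoding::Utf16) problems.push_back("TD_MSG_ENCODING");
    if (s.connectionEncoding == Encoding::Utf16) problems.push_back("Connection");
    if (s.schemaEncoding == Encoding::Utf16) problems.push_back("Schema");
    if (s.dmlGroupEncoding == Encoding::Utf16) problems.push_back("DMLGroup");
    return problems;
}
```

Collecting every mismatch, instead of stopping at the first, lets the application report all conflicting settings in one pass.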