Using ODBC Operator with UTF8 Character Set - Parallel Transporter

Teradata® Parallel Transporter Reference - 20.00

Deployment: VantageCloud (Lake), VantageCore (Enterprise, VMware, IntelliFlex)
Product: Parallel Transporter
Release Number: 20.00
Published: October 2023
Product Category: Teradata Tools and Utilities

When using the ODBC operator in TPT to extract Unicode data from external sources such as Oracle, MySQL, PostgreSQL, SQL Server, or DB2 with the UTF8 session character set, take extra care when defining VARCHAR/CHAR lengths in the DEFINE SCHEMA block.

You can use ADJUST UNICODE in the DEFINE SCHEMA statement, which automatically triples the length of Unicode VARCHAR columns to account for multibyte character storage. However, tripling is not always sufficient for UTF8, which can use up to 4 bytes per character. This mismatch can result in runtime errors such as:
  • Data length mismatch
  • Invalid data translation or truncation
  • Row discarded due to data conversion error
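As a sketch of why tripling can fall short (the schema name, column name, and the placement of the ADJUST UNICODE clause shown here are illustrative): a source column holding up to 100 characters is expanded to 300 bytes, but 100 supplementary-plane characters at 4 bytes each require 400 bytes.

DEFINE SCHEMA Source_Schema ADJUST UNICODE
(
    Column_Name VARCHAR(100)  /* adjusted to 300 bytes, but 100 four-byte */
);                            /* characters require 400 bytes            */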

Set the 'TruncateData' attribute to 'Yes' to circumvent these runtime errors. See the "TruncateData" attribute description in Required and Optional Attributes.
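A minimal sketch of an ODBC operator definition with this attribute set; the operator name, schema name, job variables, and SELECT statement are illustrative:

DEFINE OPERATOR ODBC_Operator
DESCRIPTION 'Read Unicode data from an external source'
TYPE ODBC
SCHEMA Source_Schema
ATTRIBUTES
(
    VARCHAR UserName     = @SourceUser,
    VARCHAR UserPassword = @SourcePassword,
    VARCHAR DSNName      = @SourceDSN,
    VARCHAR SelectStmt   = 'SELECT Column_Name FROM Source_Table;',
    VARCHAR TruncateData = 'Yes'  /* truncate over-length data rather than discard rows */
);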

Teradata recommends manually defining VARCHAR/CHAR lengths at up to 4 times the original character length to safely accommodate the full UTF8 encoding range (e.g., CJK characters, emojis). This is particularly important when dealing with multilingual content or characters outside the Basic Multilingual Plane (BMP).

Example:

Source column (in Oracle, MySQL, etc.): Column_Name VARCHAR(100)
TPT DEFINE SCHEMA: Column_Name VARCHAR(400)
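Putting the recommendation together, a hedged sketch of a schema sized at 4 bytes per source character (the schema and column names are illustrative; lengths in the schema are byte counts under UTF8):

DEFINE SCHEMA Source_Schema
(
    Column_Name  VARCHAR(400),  /* source VARCHAR(100): 100 chars x 4 bytes */
    Country_Code CHAR(8)        /* source CHAR(2):        2 chars x 4 bytes */
);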

Be aware that the total row width in the DEFINE SCHEMA must still respect TPT's maximum row size limit (typically 1 MB), and that no single VARCHAR/CHAR column in the schema may exceed 64,000 bytes.