When you use the ODBC operator in TPT to extract Unicode data from external sources such as Oracle, MySQL, PostgreSQL, SQL Server, or DB2 with the UTF8 session character set, take extra care when defining VARCHAR/CHAR lengths in the DEFINE SCHEMA block. If the declared lengths are too small for the UTF-8 encoded data, the job can fail with runtime errors such as:
- Data length mismatch
- Invalid data translation or truncation
- Row discarded due to data conversion error
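These errors typically arise because several source databases (for example MySQL and PostgreSQL) count VARCHAR(n) in characters, while UTF-8 encodes each character in 1 to 4 bytes. The sketch below assumes that the schema length is interpreted as a byte count. It shows how three values that each fit a 100-character source column need very different numbers of bytes once encoded. The helpers `utf8_length` and `fits_in_bytes` are hypothetical and exist only for illustration.

```python
def utf8_length(value: str) -> int:
    """Return the number of bytes needed to encode value in UTF-8."""
    return len(value.encode("utf-8"))


def fits_in_bytes(value: str, max_bytes: int) -> bool:
    """Hypothetical check: does value fit a column declared as max_bytes bytes?"""
    return utf8_length(value) <= max_bytes


samples = {
    "ascii": "a" * 100,            # 1 byte per character
    "cjk": "\u6f22" * 100,         # U+6F22, 3 bytes per character
    "emoji": "\U0001F600" * 100,   # U+1F600, outside the BMP, 4 bytes per character
}

for name, text in samples.items():
    # Each sample is exactly 100 characters, so it fits a character-counted
    # VARCHAR(100) in the source, but not necessarily 100 bytes.
    print(f"{name}: {len(text)} chars, {utf8_length(text)} bytes, "
          f"fits 100 bytes: {fits_in_bytes(text, 100)}")
```

The ASCII sample needs 100 bytes, the CJK sample 300, and the emoji sample 400. Only the ASCII value fits a 100-byte column.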
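Setting the 'TruncateData' attribute to 'Yes' avoids these runtime errors. However, any value that exceeds the defined column length is then truncated rather than loaded in full. See the "TruncateData" attribute description in Required and Optional Attributes.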
Teradata recommends manually setting VARCHAR/CHAR lengths to as much as 4 times the source character length. A UTF-8 character occupies 1 to 4 bytes (most CJK characters take 3 bytes, and emoji take 4), so a factor of 4 safely accommodates the full encoding range. This is particularly important for multilingual content and for characters outside the Basic Multilingual Plane (BMP), each of which takes 4 bytes in UTF-8.
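The sizing rule can be expressed as a small helper that scales each source character length by 4 and renders the corresponding DEFINE SCHEMA entry. This is a minimal sketch that assumes schema lengths under a UTF8 session are byte counts. The functions `schema_length`, `schema_column`, and `define_schema` and the schema name `Source_Schema` are hypothetical and not part of TPT.

```python
# Assumption: DEFINE SCHEMA lengths for a UTF8 session are byte counts.
UTF8_MAX_BYTES_PER_CHAR = 4  # UTF-8 encodes any code point in 1 to 4 bytes
MAX_COLUMN_BYTES = 64_000    # per-column limit stated in this section


def schema_length(source_chars: int) -> int:
    """Scale a source column's character length to a DEFINE SCHEMA byte length."""
    return source_chars * UTF8_MAX_BYTES_PER_CHAR


def schema_column(name: str, source_chars: int, type_name: str = "VARCHAR") -> str:
    """Render one column entry such as 'Column_Name VARCHAR(400)'.

    Raises ValueError instead of silently clamping when the scaled length
    would exceed the per-column limit.
    """
    length = schema_length(source_chars)
    if length > MAX_COLUMN_BYTES:
        raise ValueError(f"{name}: {length} bytes exceeds {MAX_COLUMN_BYTES}")
    return f"{name} {type_name}({length})"


def define_schema(schema_name: str, columns: list[tuple[str, int, str]]) -> str:
    """Build a DEFINE SCHEMA block from (name, source_chars, type_name) tuples."""
    body = ",\n".join(f"  {schema_column(n, c, t)}" for n, c, t in columns)
    return f"DEFINE SCHEMA {schema_name}\n(\n{body}\n);"


print(define_schema("Source_Schema", [("Column_Name", 100, "VARCHAR"),
                                      ("Country_Code", 2, "CHAR")]))
```

Running it prints a block containing `Column_Name VARCHAR(400)` and `Country_Code CHAR(8)`. The helper raises an error rather than clamping an oversized column, because clamping to 64,000 bytes would bring back the truncation risk that the larger length is meant to avoid.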
Example:
| Definition | Column Name | Data Type |
|---|---|---|
| Source column (in Oracle, MySQL, etc.) | Column_Name | VARCHAR(100) |
| TPT DEFINE SCHEMA | Column_Name | VARCHAR(400) |
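Be aware that the total row width in the DEFINE SCHEMA must still fit within TPT's maximum row size limit (typically 1MB), and that no single VARCHAR/CHAR column in the schema can exceed 64,000 bytes. As a result, a source column longer than 16,000 characters cannot be scaled by the full factor of 4.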
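Both limits can be checked before the job runs. The sketch below is a rough pre-flight check. It approximates row width as the sum of the declared column lengths and does not model any additional per-row overhead, so treat the total as a lower bound. It assumes "1MB" means 1,048,576 bytes, and the function `check_schema` is hypothetical.

```python
MAX_COLUMN_BYTES = 64_000            # per-column limit stated in this section
ASSUMED_ROW_LIMIT_BYTES = 1_048_576  # assumption: "1MB" read as 2**20 bytes


def check_schema(columns: dict[str, int],
                 row_limit: int = ASSUMED_ROW_LIMIT_BYTES) -> list[str]:
    """Return problems found in a mapping of column name -> declared byte length."""
    problems = []
    for name, length in columns.items():
        if length > MAX_COLUMN_BYTES:
            problems.append(
                f"{name}: {length} bytes exceeds per-column limit of {MAX_COLUMN_BYTES}")
    # Approximation: ignores any per-row overhead beyond the declared lengths.
    total = sum(columns.values())
    if total > row_limit:
        problems.append(f"row width {total} bytes exceeds assumed limit of {row_limit}")
    return problems


schema = {"Column_Name": 400, "Notes": 64_000, "Payload": 80_000}
for problem in check_schema(schema):
    print(problem)
```

Here only `Payload` is reported. The total of 144,400 bytes is well under the assumed row limit, but a schema with twenty 60,000-byte columns would pass the per-column check and still fail the row-width check.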