15.10 - Application Considerations - ODBC Driver for Teradata

ODBC Driver for Teradata User Guide

The following subsections discuss the use of Unicode in applications accessing ODBC Driver for Teradata.

UNICODE Symbol Definition

If an application is compiled with the UNICODE symbol defined, then calls to ODBC API functions are mapped to their corresponding W-functions through macro substitution in the sqlucode.h header file. For example, a call to SQLExecDirect is mapped to a call to SQLExecDirectW.

If the UNICODE symbol is not defined, the application can still pass Unicode string arguments, but only by calling the W-functions explicitly (for example, SQLExecDirectW).
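The mapping described above is ordinary C preprocessor substitution. The following sketch imitates that mechanism with hypothetical stub functions (demo_ExecDirectA, demo_ExecDirectW) and a hypothetical DEMO_TEXT literal macro. It does not include sqlucode.h or call the real ODBC API; it only shows how the same source line resolves to different functions depending on whether UNICODE is defined.

```c
#include <wchar.h>

/* Hypothetical stand-ins for an ANSI/Unicode function pair.
   Each returns a marker so the caller can see which one ran. */
static int demo_ExecDirectA(const char *sql)    { (void)sql; return 'A'; }
static int demo_ExecDirectW(const wchar_t *sql) { (void)sql; return 'W'; }

/* Same pattern as the ODBC headers: the unsuffixed name expands to the
   W-function only when UNICODE is defined. DEMO_TEXT picks the matching
   literal type (narrow or wide). */
#ifdef UNICODE
#define demo_ExecDirect demo_ExecDirectW
#define DEMO_TEXT(s) L##s
#else
#define demo_ExecDirect demo_ExecDirectA
#define DEMO_TEXT(s) s
#endif

/* Resolves to the W-stub with -DUNICODE, otherwise to the ANSI stub. */
static int demo_run_query(void)
{
    return demo_ExecDirect(DEMO_TEXT("SELECT 1"));
}

/* Calling the W-function explicitly works with or without UNICODE. */
static int demo_run_query_wide(void)
{
    return demo_ExecDirectW(L"SELECT 1");
}
```

Built without UNICODE, demo_run_query returns 'A'; built with -DUNICODE it returns 'W'. demo_run_query_wide corresponds to the explicit W-function call and returns 'W' either way.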

Applications can also be written so that they compile as either Unicode or ANSI applications. In that case, declare the C character data type as SQL_C_TCHAR. This macro expands to SQL_C_WCHAR if the application is compiled as a Unicode application (with the UNICODE symbol defined), or to SQL_C_CHAR if it is compiled as an ANSI application. Take particular care with functions that take an SQLPOINTER argument, because the compiler cannot check whether the buffer behind the pointer holds ANSI or Unicode characters. In addition, length arguments for string data must be calculated differently depending on whether the application is built as ANSI or Unicode.
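As a minimal sketch of the SQL_C_TCHAR pattern and the SQLPOINTER pitfall, the code below selects a character type and C type code from the UNICODE symbol and computes a byte-count buffer length. The names app_tchar, APP_C_TCHAR, and app_buffer_length are hypothetical; the numeric codes are those of SQL_C_CHAR (1) and SQL_C_WCHAR (-8), and the Unicode branch assumes a 16-bit SQLWCHAR as on Windows. A real application would include the ODBC headers and use SQL_C_TCHAR directly.

```c
#include <stddef.h>

/* Hypothetical TCHAR-style switch modeled on SQL_C_TCHAR. */
#ifdef UNICODE
typedef unsigned short app_tchar;   /* assumes a 16-bit SQLWCHAR */
#define APP_C_TCHAR (-8)            /* value of SQL_C_WCHAR */
#else
typedef char app_tchar;             /* SQLCHAR-sized */
#define APP_C_TCHAR 1               /* value of SQL_C_CHAR */
#endif

/* SQLBindCol and SQLGetData receive the buffer as an SQLPOINTER and its
   size in BufferLength as a byte count, so a buffer of n_chars elements
   must be described as n_chars * sizeof(app_tchar) bytes, not n_chars. */
static size_t app_buffer_length(size_t n_chars)
{
    return n_chars * sizeof(app_tchar);
}
```

In an ANSI build, app_buffer_length(33) is 33; in a Unicode build with a 16-bit character type it is 66. Passing the element count instead of the byte count in a Unicode build would make the driver assume a buffer half its real size.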

On Windows, the definitions in the tchar.h header file are useful for applications that are built as either Unicode or ANSI. The Unicode definitions in tchar.h are controlled by the _UNICODE preprocessor symbol (note the leading underscore), which is separate from the UNICODE symbol used by the ODBC headers.

See the Microsoft ODBC Programmer's Reference on MSDN for additional information.

Unicode Character Types

On Microsoft Windows, the Unicode character type is a distinct C/C++ type called wchar_t. Strings of this type use UTF-16 Unicode encoding, and many string support functions can be applied to them.

On a UNIX system, there is no distinct C/C++ type for UTF-8 encoded Unicode strings. Strings of type char are commonly used to hold UTF-8 encoded text, but care must be taken when applying string manipulation functions to them, because a string's length in bytes can differ from its length in characters.
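The difference between byte length and character length can be seen with a small helper that counts UTF-8 code points by skipping continuation bytes. This is a sketch that assumes well-formed UTF-8 input; the function names are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes: what strlen reports for a UTF-8 string. */
static size_t utf8_byte_count(const char *s)
{
    return strlen(s);
}

/* Length in characters (code points): count every byte that is not a
   continuation byte (bit pattern 10xxxxxx). Assumes well-formed UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "café" (bytes 63 61 66 C3 A9), utf8_byte_count returns 5 while utf8_char_count returns 4.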

Most UNIX systems also provide a wchar_t type, but it is usually a 32-bit type used for fixed-length encodings such as UTF-32, not for UTF-8. On such systems, another approach is to use wchar_t internally within the application and to convert strings of that type to UTF-8 and back whenever they are passed to external interfaces such as ODBC Driver for Teradata.
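For the wchar_t-internally approach, the outbound conversion could look like the following sketch. It assumes a 32-bit wchar_t holding UTF-32 code points (typical on Linux and other UNIX systems) and encodes UTF-8 by hand rather than relying on locale-dependent functions such as wcstombs. wide_to_utf8 is a hypothetical helper name; the reverse conversion follows the same bit layout in the other direction.

```c
#include <stddef.h>
#include <wchar.h>

/* Encodes a NUL-terminated wchar_t string (assumed UTF-32) as UTF-8.
   Returns the number of bytes written, excluding the terminating NUL,
   or (size_t)-1 if out is too small or a code point is invalid. */
static size_t wide_to_utf8(const wchar_t *in, char *out, size_t out_size)
{
    size_t n = 0;
    for (; *in != L'\0'; ++in) {
        unsigned long cp = (unsigned long)*in;
        unsigned char buf[4];
        size_t len;
        if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {            /* 3 bytes, surrogates rejected */
            if (cp >= 0xD800 && cp <= 0xDFFF)
                return (size_t)-1;
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else if (cp <= 0x10FFFF) {          /* 4 bytes */
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        } else {
            return (size_t)-1;                /* beyond U+10FFFF */
        }
        if (n + len + 1 > out_size)           /* keep room for the NUL */
            return (size_t)-1;
        for (size_t i = 0; i < len; ++i)
            out[n++] = (char)buf[i];
    }
    if (n + 1 > out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

The returned byte count is also the value a UTF-8 application on UNIX would pass as a string length argument.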

Whenever possible, the SQLWCHAR ODBC character type should be used for Unicode strings instead of the wchar_t type, since SQLWCHAR and wchar_t are not the same on all operating systems.
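Because a 32-bit wchar_t and a 16-bit SQLWCHAR have different element sizes, a wchar_t* cannot simply be cast to SQLWCHAR*; the string must be converted. The sketch below converts UTF-32 wchar_t text to UTF-16, emitting surrogate pairs above U+FFFF. app_sqlwchar16 is a hypothetical stand-in for SQLWCHAR when it is 16 bits wide, for example on a UNIX system configured for UTF-16 as described below.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for a 16-bit SQLWCHAR. */
typedef uint16_t app_sqlwchar16;

/* Converts a NUL-terminated wchar_t string (assumed UTF-32) to UTF-16.
   Returns the number of 16-bit units written, excluding the terminator,
   or (size_t)-1 on overflow or an invalid code point. */
static size_t wide_to_utf16(const wchar_t *in, app_sqlwchar16 *out,
                            size_t out_units)
{
    size_t n = 0;
    for (; *in != L'\0'; ++in) {
        unsigned long cp = (unsigned long)*in;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {
            if (n + 2 > out_units)            /* one unit plus terminator */
                return (size_t)-1;
            out[n++] = (app_sqlwchar16)cp;
        } else {
            if (n + 3 > out_units)            /* surrogate pair plus terminator */
                return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (app_sqlwchar16)(0xD800 | (cp >> 10));   /* high */
            out[n++] = (app_sqlwchar16)(0xDC00 | (cp & 0x3FF)); /* low */
        }
    }
    if (n + 1 > out_units)
        return (size_t)-1;
    out[n] = 0;
    return n;
}
```

For L"A\U0001F600" the result is three 16-bit units (0x0041, 0xD83D, 0xDE00), while the wchar_t source holds only two elements, which is why the two types cannot share a buffer.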

On a UNIX system, Unicode encoding for strings and data passed to ODBC Driver for Teradata can be changed from the default UTF-8 to UTF-16 as follows:

1 Define SQLWCHARSHORT. For example, add the following to your code:

 #define SQLWCHARSHORT

Note: SQLWCHARSHORT changes the definition of SQLWCHAR from char* to short*, so it must be defined before any ODBC header files are included.

2 Set the SQL_ATTR_APP_UNICODE_TYPE environment attribute to SQL_DD_CP_UTF16. For example, add the following to your code:

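 // Specify the Unicode encoding for the application. SQL calls and
 // data are both affected. No other environment variables or
 // connection options (including DSN options) are needed.
 // henv: environment handle from SQLAllocHandle(SQL_HANDLE_ENV, ...);
 // the variable name is illustrative.
 rc = SQLSetEnvAttr(henv, SQL_ATTR_APP_UNICODE_TYPE,
                    (void *) SQL_DD_CP_UTF16, SQL_IS_INTEGER);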

On Apple OS X, the wchar_t type is available and is a 32-bit type. Because the ODBC Driver Manager on OS X expects Unicode strings in UTF-32 encoding, the wchar_t type can be used to represent Unicode strings.
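When code relies on wchar_t being UTF-32, as on OS X, a compile-time check can make that assumption explicit. In this sketch (C11 _Static_assert; the helper names are hypothetical), the character count of a UTF-32 string equals its wchar_t element count, and its byte count is four times that.

```c
#include <stddef.h>
#include <wchar.h>

/* Fail the build if wchar_t is not 32 bits, since the helpers below
   treat one wchar_t element as one UTF-32 code point. */
_Static_assert(sizeof(wchar_t) == 4, "expected a 32-bit wchar_t (UTF-32)");

/* Character count of a UTF-32 string: one element per code point. */
static size_t utf32_char_count(const wchar_t *s)
{
    return wcslen(s);
}

/* Byte count of the same string, excluding the terminating NUL. */
static size_t utf32_byte_count(const wchar_t *s)
{
    return wcslen(s) * sizeof(wchar_t);
}
```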

Length Arguments for Unicode ODBC Functions

Many ODBC interface functions take arguments that specify the length of character string input and output values. For Unicode string arguments, some functions expect these lengths in bytes, while others expect them as character counts. Which convention applies varies by platform and Unicode encoding.

  • UTF-16/UTF-32 Encoded Unicode Strings: The following paragraph from the Unicode section of Chapter 17, “Programming Considerations,” of the Microsoft ODBC 3.0 Programmer's Reference states the general rule for specifying length arguments in Unicode functions:
    “Unicode functions that always return or take strings or length arguments are passed as count-of-characters. For functions that return length information for server data, the display size and precision are described in number of characters. When a length (transfer size of the data) could refer to string or non-string data, the length is described in octet lengths. For example, SQLGetInfoW will still take the length as count-of-bytes, but SQLExecDirectW will use count-of-characters.”

  • UTF-8 Encoded Unicode Strings on a UNIX System: UTF-8 is the default Unicode encoding for ODBC applications running on a UNIX system. All string length arguments to ODBC interface functions should be specified as byte counts.
  • For more information about specifying length arguments in Unicode functions, go to: http://support.microsoft.com/kb/294169/en-us.
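The rules above can be summarized as a small helper that turns a string's size into the value of a length argument. This is a sketch: app_len_unit and app_length_arg are hypothetical names, and which unit a given function expects must still be taken from the ODBC reference for that function (for example, character counts for the TextLength argument of SQLExecDirectW and byte counts for the BufferLength argument of SQLGetInfoW). For UTF-16 and UTF-32, a character in this sense is one SQLWCHAR element; for UTF-8 on a UNIX system every length is a byte count, which the helper yields when the code-unit size is 1.

```c
#include <stddef.h>

/* Hypothetical: which unit a given ODBC length argument is measured in. */
enum app_len_unit {
    APP_LEN_CHARS,  /* count-of-characters, e.g. SQLExecDirectW TextLength */
    APP_LEN_BYTES   /* count-of-bytes, e.g. SQLGetInfoW BufferLength */
};

/* units:     number of code units in the string, excluding the terminator
              (SQLWCHAR elements for UTF-16/UTF-32, bytes for UTF-8)
   unit_size: size of one code unit in bytes (2, 4, or 1)
   With UTF-8 (unit_size 1) both branches return the byte count, matching
   the UNIX rule that all string lengths are byte counts. */
static size_t app_length_arg(enum app_len_unit kind, size_t units,
                             size_t unit_size)
{
    return kind == APP_LEN_BYTES ? units * unit_size : units;
}
```

For a 10-element UTF-16 string, app_length_arg returns 10 when the function expects characters and 20 when it expects bytes; for a 5-byte UTF-8 string it returns 5 either way. A real call would cast the result to the integer type the function declares (SQLINTEGER or SQLSMALLINT).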