Character Set Specification - FastExport

Teradata FastExport Reference

Product: FastExport
Release Number: 15.00
Language: English (United States)
Last Update: 2018-09-28
dita:id: B035-2410
lifecycle: previous
Product Category: Teradata Tools and Utilities

Character Set Specification

Teradata Database allows a character set to be established when invoking FastExport. For example, if table or database names contain kanji double-byte characters, or a mix of single-byte and multibyte characters, the appropriate character set can be chosen.

Table 9 lists the standard character sets supported by FastExport.

Table 9: Standard Character Sets Supported by FastExport

Name                  Description             System Configuration
--------------------  ----------------------  ------------------------------------
EBCDIC                Latin                   Mainframe-attached
ASCII                 Latin                   Network-attached
HANGULEBCDIC933_1II   Korean                  Mainframe-attached
HANGULKSC5601_2R4     Korean                  Network-attached
KATAKANAEBCDIC        Japanese                Mainframe-attached
KANJIEBCDIC5026_0I    Japanese                Mainframe-attached
KANJIEBCDIC5035_0I    Japanese                Mainframe-attached
KANJIEUC_0U           Japanese                Network-attached
KANJISJIS_0S          Japanese                Network-attached
SCHEBCDIC935_2IJ      Simplified Chinese      Mainframe-attached
SCHGB2312_1T0         Simplified Chinese      Network-attached
TCHBIG5_1R0           Traditional Chinese     Network-attached
TCHEBCDIC937_3IB      Traditional Chinese     Mainframe-attached
UTF8                  Unicode® character set  Network-attached
UTF8                  Unicode character set   Mainframe-attached, Network-attached
UTF16                 Unicode character set   Network-attached
UTF16                 Unicode character set   Network-attached

Site-Defined Character Sets

When the standard character sets are not appropriate for a site, define one or more of the site-defined character sets shown in Table 10.

Table 10: Site-Defined Character Sets

Name                    Description                       System Configuration
----------------------  --------------------------------  --------------------
SDKATAKANAEBCDIC_4IF    Site-defined Japanese             Mainframe-attached
SDKANJIEBCDIC5026_4IG   Site-defined Japanese             Mainframe-attached
SDKANJIEBCDIC5035_4IH   Site-defined Japanese             Mainframe-attached
SDKANJIEUC_1U3          Site-defined Japanese             Network-attached
SDKANJISJIS_1S3         Site-defined Japanese             Network-attached
SDSCHEBCDIC935_6IJ      Site-defined Simplified Chinese   Mainframe-attached
SDTCHEBCDIC937_7IB      Site-defined Traditional Chinese  Mainframe-attached
SDSCHGB2312_2T0         Site-defined Simplified Chinese   Network-attached
SDTCHBIG5_3R0           Site-defined Traditional Chinese  Network-attached
SDHANGULEBCDIC933_5II   Site-defined Korean               Mainframe-attached
SDHANGULKSC5601_4R4     Site-defined Korean               Network-attached

Note: For information about defining a character set appropriate for a site, see International Character Set Support (B035‑1132).

Rules for Using Chinese and Korean Character Sets

Observe the following rules when using Chinese and Korean character sets on mainframe-attached and network-attached platforms:

  • Object Names
    Object names are limited to A-Z, a-z, 0-9, and special characters such as $ and _.

  • Maximum String Length
    Teradata Database requires two bytes to process each Chinese or Korean character. This limits both request size and record size. For example, if a record consists of one string, the length of that string is limited to a maximum of 32,000 characters, or 64,000 bytes.

Note: For more information about Chinese or Korean character set restrictions for Teradata Database, or for more information about alternate character sets, see International Character Set Support (B035-1132).

If Japanese language support is not required, specify EBCDIC or ASCII as the character set parameter.
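The Maximum String Length rule above is simple arithmetic, and a short sketch may make it easier to check record sizes before an export. The function names below are illustrative only and are not part of FastExport. The sketch counts every character as two bytes, as the rule states, so it is only an approximation for strings that mix in single-byte characters.

```python
BYTES_PER_CHAR = 2         # per the rule above: two bytes per Chinese or Korean character
MAX_STRING_BYTES = 64_000  # byte limit quoted above for a record made of one string


def max_chars(byte_limit=MAX_STRING_BYTES, bytes_per_char=BYTES_PER_CHAR):
    """Return how many two-byte characters fit in the byte limit."""
    return byte_limit // bytes_per_char


def fits_in_record(text):
    """Return True if text stays within the limit, counting two bytes per character."""
    return len(text) * BYTES_PER_CHAR <= MAX_STRING_BYTES


print(max_chars())  # 32000, matching the 32,000-character figure above
```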

Unicode® Character Sets

UTF8 and UTF16 are two of the standard ways of encoding Unicode character data. The UTF-8 client character set supports UTF8 encoding. Currently, Teradata Database supports UTF8 characters that consist of one to three bytes. The UTF-16 client character set supports UTF16 encoding. Currently, Teradata Database supports the Unicode 2.1 standard, where each defined character requires exactly 16 bits.

Teradata Database imposes restrictions on the use of the UTF-8 and UTF-16 character sets. For details, see International Character Set Support (B035-1132).
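To illustrate the sizes mentioned above, the following sketch uses Python's standard codecs (not any Teradata component) to measure how many bytes a character needs in UTF-8 and how many bits it needs in UTF-16. The helper names are hypothetical. Characters that need four bytes in UTF-8, or a surrogate pair in UTF-16, fall outside the one-to-three-byte and 16-bit limits described above.

```python
def utf8_len(ch):
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_bits(ch):
    """Number of bits the character occupies in UTF-16.

    "utf-16-le" is used so that no byte-order mark is added to the count.
    """
    return len(ch.encode("utf-16-le")) * 8


def within_documented_limits(ch):
    """True if ch needs at most 3 UTF-8 bytes and exactly 16 bits in UTF-16."""
    return utf8_len(ch) <= 3 and utf16_bits(ch) == 16


# Latin letter, accented Latin letter, kanji, hangul syllable
for ch in ("A", "\u00e9", "\u6f22", "\ud55c"):
    print(repr(ch), utf8_len(ch), utf16_bits(ch), within_documented_limits(ch))
```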

UTF-8 Character Sets

FastExport supports the UTF-8 character set on network-attached platforms and IBM z/OS. When the UTF-8 client character set is used on IBM z/OS, the job script must be in Teradata EBCDIC. FastExport translates commands in the job script from Teradata EBCDIC to UTF-8 during the export.

Check the definition in International Character Set Support (B035-1132) to determine the code points of any special characters required in the job script. Different versions of EBCDIC do not always agree on the placement of these characters. See International Character Set Support (B035-1132) for details on the mapping between Teradata EBCDIC and Unicode.
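The point about EBCDIC versions disagreeing can be seen with two EBCDIC code pages that ship with Python, cp037 and cp500. Neither is Teradata EBCDIC, whose definition is in the manual cited above, so this is only an illustration of the general problem. The helper names are hypothetical.

```python
def ebcdic_byte(ch, codec):
    """Return the single byte value that codec assigns to the character ch."""
    return ch.encode(codec)[0]


def code_pages_agree(ch, codecs=("cp037", "cp500")):
    """True if every listed code page places ch at the same code point."""
    return len({ebcdic_byte(ch, codec) for codec in codecs}) == 1


# Letters usually agree across EBCDIC variants; some special characters do not.
for ch in ("A", "[", "]", "!"):
    points = [hex(ebcdic_byte(ch, codec)) for codec in ("cp037", "cp500")]
    print(repr(ch), points, "agree" if code_pages_agree(ch) else "differ")
```

A check like this on the special characters used in a job script shows which ones need to be verified against the Teradata EBCDIC definition.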

UTF-16 Character Sets

FastExport supports the UTF-16 character set on network-attached platforms. In general, the command language and the job output should be in the same character set as the client character set used by the job. However, for convenience and because of the special properties of Unicode, this is not required when the UTF16 character set is used: the job script and the job output can be in either the UTF-8 or the UTF-16 character set. This is specified with the runtime parameters "-i" and "-u" when the job is invoked.

For more information on the runtime parameters "-i" and "-u", see Table 6 on page 29.

Table 11 describes four ways to either specify the character set or accept a default specification.

Table 11: Methods for Specifying Character Sets

Method: Client System Specification
Description:
    Another way is to specify the character set for a client system before invoking FastExport by configuring the:

  • HSHSPB parameter for mainframe-attached z/OS client systems
  • clispb.dat file for network-attached UNIX and Windows client systems

    Note: The charactersetname specification used when invoking FastExport always takes precedence over the current client system specification.

Method: FastExport Utility Default
Description:
    If there is no character set specification in DBC.Hosts, then FastExport defaults to:

  • EBCDIC for mainframe-attached z/OS client systems
  • ASCII for network-attached UNIX client systems

Method: Runtime Parameter Specification
Description:
    The best way to specify the character set is with the character set runtime parameter when invoking FastExport, as described earlier in this chapter:

  • CHARSET=charactersetname for mainframe-attached z/OS client systems
  • -c charactersetname for network-attached UNIX and Windows client systems

    For a list of valid character set names, see “Character Set Specification” on page 43.

Method: Teradata Database Default
Description:
    If a charactersetname specification is not used when FastExport is invoked, and there is no character set specification for the client system, then the utility uses the default specification in the Teradata Database system table DBC.Hosts.

    Note: If the DBC.Hosts table specification is relied upon for the default character set, make sure that the initial logon is in the default character set:

  • EBCDIC for mainframe-attached z/OS client systems
  • ASCII for network-attached UNIX and Windows client systems
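Read together, the four methods in Table 11 form a precedence chain: runtime parameter first, then client system specification, then the DBC.Hosts default, then the utility default. The sketch below models that chain as described in the table. It is not FastExport code, and the function and parameter names are hypothetical.

```python
def resolve_charset(runtime_param=None, client_system_spec=None,
                    dbc_hosts_default=None, mainframe_attached=False):
    """Return (charset, source) following the precedence described in Table 11.

    All parameter names are illustrative; each stands for one method in the table.
    """
    # 1. The runtime parameter (CHARSET= or -c) always takes precedence.
    if runtime_param:
        return runtime_param, "runtime parameter"
    # 2. Otherwise the client system specification (HSHSPB or clispb.dat) applies.
    if client_system_spec:
        return client_system_spec, "client system specification"
    # 3. Otherwise the default recorded in DBC.Hosts is used.
    if dbc_hosts_default:
        return dbc_hosts_default, "DBC.Hosts default"
    # 4. With nothing specified anywhere, the utility default applies.
    #    (The table names UNIX client systems for the ASCII default.)
    default = "EBCDIC" if mainframe_attached else "ASCII"
    return default, "FastExport utility default"


print(resolve_charset(runtime_param="UTF8", client_system_spec="KANJISJIS_0S"))
print(resolve_charset(mainframe_attached=True))
```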
Using AXSMOD

When an AXSMOD is used, FastExport passes the session character set to the AXSMOD as an attribute for its possible use (most AXSMODs make no use of this information). The attribute name is CHARSET_NAME, and its value is a variable-length character string.

After FastExport has successfully passed the session character set to the AXSMOD, it passes the export width information for the current session character set as a second attribute, also for the AXSMOD's possible use. The attribute name is EXPORT_WIDTHS. FastExport extracts the export width information from the data parcel returned by the HELP SESSION command.

The export width information is passed to the AXSMOD as an array, which the AXSMOD uses to calculate the size in bytes of exported fixed-length character columns. This size depends not only on the number of characters in the data type (the n in CHAR(n)), but also on the selected session character set and the server character type (specified in the CHARACTER SET clause of the CREATE TABLE statement). Each structure in the array holds the information for one server character type. The export width information structure is defined as follows:

    typedef struct pmExpWidth
    {
        pmUInt16 CharType;    /* Server character type code. */
        pmUInt16 ExpWidth;    /* Export width. */
        pmUInt16 ExpWidthAdj; /* Export width adjustment. */
    } pmExpWidth_t;

For more information about export width rules, see Utilities (B035-1102).
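As a rough illustration of how an AXSMOD might use the array, the sketch below looks up the entry for a server character type and computes the exported size of a CHAR(n) column. It rests on an assumption that this section does not state: that the size is n multiplied by the export width, plus the export width adjustment. The actual rules are in the Utilities manual cited above. The Python class mirrors the three fields of pmExpWidth, and the character type codes and widths in the sample data are placeholders, not real Teradata values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpWidth:
    """Python mirror of the three pmExpWidth fields."""
    char_type: int      # CharType: server character type code
    exp_width: int      # ExpWidth: export width
    exp_width_adj: int  # ExpWidthAdj: export width adjustment


def export_size(n_chars, char_type, widths):
    """Return the assumed exported size in bytes of a CHAR(n_chars) column.

    ASSUMPTION: size = n_chars * ExpWidth + ExpWidthAdj. Verify this against
    the export width rules before relying on it.
    """
    for entry in widths:
        if entry.char_type == char_type:
            return n_chars * entry.exp_width + entry.exp_width_adj
    raise KeyError(f"no export width entry for character type {char_type}")


# Placeholder array: one entry per server character type (codes are made up).
SAMPLE_WIDTHS = [ExpWidth(1, 1, 0), ExpWidth(2, 2, 2)]

print(export_size(10, 2, SAMPLE_WIDTHS))  # 10 * 2 + 2 = 22 under the assumption
```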

Multibyte Character Sets

Teradata Database supports multibyte characters in object names when the client session character set is UTF8 or UTF16. Refer to International Character Set Support (B035-1132) for a list of valid characters used in object names. If multibyte characters are used in object names in a Teradata FastExport script, the names must be enclosed in double quotes.

To log on with the UTF8 character set or another supported multibyte character set (Chinese, Japanese, or Korean), create object names shorter than 30 bytes. This limitation applies to the userid, password, and account. The logon might fail if any of these names exceeds 30 bytes.
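Because the limit above is expressed in bytes rather than characters, a name that looks short can still be too long. The sketch below checks a logon name's encoded length and wraps names that contain multibyte characters in double quotes. It assumes a UTF-8 session by default and takes the conservative reading of the rule (strictly shorter than 30 bytes). The function names are hypothetical, and the quoting helper does not handle names that already contain a double quote.

```python
LOGON_NAME_LIMIT = 30  # bytes, per the rule above


def byte_length(name, encoding="utf-8"):
    """Length of the name in bytes in the given session encoding."""
    return len(name.encode(encoding))


def logon_name_ok(name, encoding="utf-8"):
    """True if the userid, password, or account is shorter than 30 bytes."""
    return byte_length(name, encoding) < LOGON_NAME_LIMIT


def quote_if_multibyte(name):
    """Enclose the name in double quotes if it has any multibyte character.

    Non-ASCII characters are the ones that take more than one byte in UTF-8.
    """
    has_multibyte = any(ord(ch) > 127 for ch in name)
    return f'"{name}"' if has_multibyte else name


name = "\u58f2\u4e0a"  # two kanji characters, six bytes in UTF-8
print(byte_length(name), logon_name_ok(name), quote_if_multibyte(name))
```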

Multibyte character sets affect the operation of certain FastExport commands, as well as object names in Teradata SQL statements, as shown in Table 12.

Table 12: Commands Impacting Multibyte Character Sets

FastExport Command: FIELD
Affected Element: Field name
Impact:
    The field name specified can have multibyte characters. In addition, it can be referenced in:

  • Other FIELD commands
  • NULLIF and field concatenation expressions
  • APPLY WHERE conditions in IMPORT commands

    A FIELD command can also contain a NULLIF expression, which may use multibyte characters.

FastExport Command: FILLER
Affected Element: Filler name
Impact:
    The name specified in a FILLER command can have multibyte characters.

FastExport Command: LAYOUT
Affected Element: Layout name
Impact:
    The layout name can:

  • Have multibyte characters
  • Be used in the LAYOUT clause of an IMPORT command

Affected Element: CONTINUEIF condition
Impact:
    The CONTINUEIF condition can specify multibyte character set character comparisons.

FastExport Command: LOGON
Affected Element: User name and password
Impact:
    The user name and password can have multibyte characters.

FastExport Command: LOGTABLE
Affected Element: Table and database names
Impact:
    The restart log table name and database name can have multibyte characters.