Directives - Director Program

Teradata Director Program Reference

Product: Director Program
Release Number: 15.10
Language: English (United States)
Last Update: 2018-10-07
dita:id: B035-2416
lifecycle: previous
Product Category: Teradata Tools and Utilities

Each statement either contains a directive or is associated with one; the directive identifies the purpose of the statement. A directive is analogous to a command, but a different term is used to prevent confusion with true TDP commands.

The following directives are supported:

 

Table 5: Directives

Directive   Function
CHAR        Defines the syntactic characters.
CHARSET     Explicitly begins a definition and optionally identifies the encoding scheme.
END         Ends processing of records in the file.
MONOCASE    Defines characters that have both lower and upper case.
NUMERICS    Defines the numeric characters.
SANITIZE    Defines valid characters for TDP messages sent using operating system facilities.
UNICODE     Defines the syntactic characters and characters that have both lower and upper case.

A file describes one or more character sets, although only one description is used by each SET USERCS command.

When multiple descriptions are present, each begins with a CHARSET directive and ends with the next CHARSET directive, the END directive, or the last record in the file.

The CHAR, MONOCASE, NUMERICS, SANITIZE, and UNICODE directives can appear in any order within a description. If any of these directives appears before a CHARSET directive, then a character set description is implicitly begun -- in effect, a CHARSET directive with no operands is assumed.
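The grouping rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not TDP's implementation: the function name `split_descriptions` is hypothetical, records are assumed to be plain strings, and skipping blank records is an assumption.

```python
def split_descriptions(records):
    """Group file records into character set descriptions.

    A CHARSET record begins a new description, END stops reading, and any
    other record seen before the first CHARSET implicitly begins a
    description, as if a CHARSET directive with no operands were present.
    """
    descriptions = []
    current = None
    for record in records:
        words = record.split()
        if not words:
            continue  # assumption: blank records are skipped
        keyword = words[0].upper()
        if keyword == "END":
            break  # any remaining records are not read
        if keyword == "CHARSET":
            current = [record]
            descriptions.append(current)
            continue
        if current is None:
            current = ["CHARSET"]  # implicit CHARSET with no operands
            descriptions.append(current)
        current.append(record)
    return descriptions


records = ["CHAR SPACE 40", "CHARSET NAME A", "NUMERICS", "END", "CHARSET NAME B"]
groups = split_descriptions(records)
```

In this sample the leading CHAR record lands in an implicit description, and the CHARSET record after END is never seen.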

The following sections provide information and syntax diagrams for each directive. Refer to Appendix D: “How to Read the Syntax Diagrams,” for additional information on syntax diagrams.

The CHAR directive defines the syntactic characters of importance to TDP.

Syntax  

Usage Notes  

The length of each value is determined by the encoding scheme for the character set. For the characters of interest to TDP, the length is always two except for UTF16 encoding, for which the length is four. The CHAR directive can be specified more than once for each character set.

If the same character is defined more than once for a character set (either on a single CHAR directive, on multiple CHAR directives, or on a CHAR and a UNICODE directive), the last value is used. All four characters must be defined to form a complete character set description.

If no CHARSET directive precedes CHAR, then a character set description is implicitly begun -- in effect, a CHARSET directive with no operands is assumed.

Example  

Define the relevant syntactic characters for IBM Code Page 833.

CHAR SPACE 40 COMMA 6B APOSTROP 7D DBLQUOTE 7F
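As a rough illustration of the usage notes, the following Python sketch folds the operands of one or more CHAR directives into a table, with later definitions replacing earlier ones. The function names are hypothetical, and it is an assumption that the four required characters are exactly the four keywords shown in the example above.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
# Assumption: the four required characters are the keywords in the example.
REQUIRED = ("SPACE", "COMMA", "APOSTROP", "DBLQUOTE")


def parse_char(operands, table=None, utf16=False):
    """Fold the operands of one CHAR directive into table (keyword -> int)."""
    table = {} if table is None else table
    width = 4 if utf16 else 2  # two hex digits, or four for UTF16
    words = operands.split()
    if len(words) % 2:
        raise ValueError("expected keyword/value pairs")
    for name, value in zip(words[0::2], words[1::2]):
        if len(value) != width or not set(value) <= HEX_DIGITS:
            raise ValueError(f"{name}: expected {width} hex digits, got {value!r}")
        table[name.upper()] = int(value, 16)  # the last value is used
    return table


def is_complete(table):
    """True when all four characters have been defined."""
    return all(name in table for name in REQUIRED)


table = parse_char("SPACE 40 COMMA 6B APOSTROP 7D DBLQUOTE 7F")
```

Passing an existing table as the second argument models a second CHAR directive for the same character set.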

The CHARSET directive explicitly begins a definition and optionally identifies the encoding scheme.

Syntax  

Usage Notes  

NAME identifies the character set to which the description applies. The name might include a standard suffix that defines the encoding scheme. The standard suffix consists of an underscore, a number not relevant to CLIv2, the encoding character (A, E, I, R, S, T, or U), and an optional character not relevant to CLIv2. Each suffix corresponds to an ENCODING operand value:

  • E - EBCDIC
  • I - IBMSOSI
  • A - ASCII
  • R - BIGFIVE
  • S - SJIS
  • T - EUC-CN or EUC-KR
  • U - EUC-JP
ENCODING optionally identifies the encoding scheme for the character set. If omitted, the character set must contain a standard suffix that indicates the encoding. If such a suffix exists, then the encoding cannot be overridden using this operand. The following character sets are available in TDP.

     

    ENCODING: Meaning
      • Characteristics

    EBCDIC: Extended Binary-Coded-Decimal Interchange Code
      • Single-byte (EBCDIC) codepoints: X'00' through X'FF'

    IBMSOSI: IBM Shift-out/Shift-in
      • Single-byte (EBCDIC) codepoints: X'00' through X'FF'
      • Double-byte (EBCDIC) codepoints: Shift-out (X'0E') through Shift-in (X'0F')

    ASCII: American Standard Code for Information Interchange
      • Single-byte (ASCII) codepoints: X'00' through X'FF'

    BIGFIVE: Big Five Plus
      • Single-byte (ASCII) codepoints: X'00' through X'80', and X'FF'
      • Double-byte (ASCII) codepoints: X'81' through X'FE'

    EUC-CN: Extended Unix Code - China
      • Single-byte (ASCII) codepoints: X'00' through X'7F'
      • Double-byte (ASCII) codepoints: X'80' through X'FF'

    EUC-JP: Extended Unix Code - Japan
      • Single-byte (ASCII) codepoints: X'00' through X'8D', X'90' through X'FF'
      • Double-byte (ASCII) codepoints: Single-shift1 (X'8E')
      • Triple-byte (ASCII) codepoints: Single-shift2 (X'8F')

    EUC-KR: Extended Unix Code - Korea
      • Single-byte (ASCII) codepoints: X'00' through X'7F'
      • Double-byte (ASCII) codepoints: X'80' through X'FF'

    SJIS: Shift-JIS (Japanese Industrial Standard)
      • Single-byte (ASCII) codepoints: X'00' through X'80', X'A0' through X'DF', X'FD' through X'FF'
      • Double-byte (ASCII) codepoints: X'81' through X'9F', X'E0' through X'FC'

    UHC: Unified Hangul Code
      • Single-byte (ASCII) codepoints: X'00' through X'80', and X'FF'
      • Double-byte (ASCII) codepoints: X'81' through X'FE'

    UTF8: UCS (Universal Character Set) Transformation Format 8-bit
      • Single-byte (Unicode®) codepoints: X'00' through X'7F'
      • Double-byte (Unicode®) codepoints: X'C0' through X'DF'
      • (Most) triple-byte (Unicode®) codepoints: X'E0' through X'EF'
      • Most four-byte codepoints (X'F0' through X'F4') are not supported by Teradata Database.

    UTF16: UCS (Universal Character Set) Transformation Format 16-bit
      • Double-byte (Unicode®) codepoints: X'0000' through X'D7FF', X'E000' through X'FFFF'
      • Surrogates (four-byte codepoints that begin or end with the two-byte codepoints X'D800' through X'DFFF') are not supported by Teradata Database.

    When the NAME operand is specified and does not match the character set name specified on the SET USERCS command, this directive and all directives up to the next CHARSET directive are ignored. When the NAME operand is not specified, this directive is always used, so any subsequent CHARSET directives in the file are never processed.
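The selection rule can be illustrated with a small Python sketch. The function name `select_description` is hypothetical, and two details are assumptions rather than documented behavior: names are compared exactly, and the first applicable CHARSET directive is the one used.

```python
def select_description(charset_names, requested):
    """Return the index of the CHARSET directive that is used, or None.

    charset_names holds one entry per CHARSET directive, in file order:
    the NAME operand, or None when NAME was omitted.
    """
    for index, name in enumerate(charset_names):
        # An unnamed CHARSET is always used; a named one must match.
        if name is None or name == requested:  # assumption: exact comparison
            return index
    return None


# The unnamed directive at index 1 is used, so "B" at index 2 is never reached.
chosen = select_description(["A", None, "B"], "B")
```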

    While all codepoints are reflected to and from Teradata Database, for character sets that allow mixtures of single and multi-byte characters, only the single-byte characters are meaningful in TDP command syntax.

    Example  

    Begin definition for IBM Code Page 833, the single-byte component for IBM CCSID 933.

    CHARSET NAME KOREAN_EBCDIC933 ENCODING IBMSOSI
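The standard-suffix rule described in the usage notes can be sketched as a lookup. This Python fragment is an illustration only: the function name is hypothetical, the regular expression is one reading of "an underscore, a number, the encoding character, and an optional character", and the sample names other than the one in the example above are used purely for illustration.

```python
import re

# Encoding character -> possible ENCODING values (T is ambiguous).
SUFFIX_ENCODINGS = {
    "E": ("EBCDIC",),
    "I": ("IBMSOSI",),
    "A": ("ASCII",),
    "R": ("BIGFIVE",),
    "S": ("SJIS",),
    "T": ("EUC-CN", "EUC-KR"),
    "U": ("EUC-JP",),
}

# Underscore, a number, the encoding character, one optional trailing character.
_SUFFIX = re.compile(r"_\d+([AEIRSTU]).?$")


def encodings_from_name(name):
    """Return the encodings implied by a standard suffix, or None if absent."""
    match = _SUFFIX.search(name.upper())
    return SUFFIX_ENCODINGS[match.group(1)] if match else None


implied = encodings_from_name("KOREAN_EBCDIC933")
```

Under this reading, the name in the example above carries no standard suffix, which is consistent with the example supplying the ENCODING operand explicitly.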

    The END directive ends processing of records in the file.

    Syntax  

    Usage Notes  

    Any remaining records in the file are not read.

    Example  

    END

    The MONOCASE directive optionally defines characters that have both lower and upper case. If this information is not supplied, then no monocasing is performed.

    Syntax  

    Usage Notes  

    The actual monocase information is contained on statements that immediately follow the MONOCASE directive. Each such statement has the following syntax:

    target_codepoint1<-target_codepoint2>: data_codepoint ...

    where:

     

    Syntax Element      Function
    target_codepoint1   Specifies the first character defined on this statement.
    target_codepoint2   Optionally specifies the last character defined on this statement.
    data_codepoint      Defines the upper case equivalent for the associated target_codepoint character.

    A codepoint is the hexadecimal representation of a character. The number of characters needed to specify a codepoint is dependent on the encoding scheme for the character set. With the current TDP support, the length is always two except for UTF16 encoding, for which the length is four.

    If the second target codepoint is specified, then exactly one data codepoint is required for each character in the range from the first target codepoint through the second. If the second target codepoint is omitted, then any number of data codepoints can be specified: the first is associated with the first target codepoint, and each subsequent one with the codepoint one greater than the previous.

    All statements after the MONOCASE directive that contain a colon are associated with the MONOCASE directive. Lack of a colon indicates that the statement is a new directive and ends that MONOCASE directive.

    The only codepoints that need be specified are those for which upper case equivalents exist.

    The MONOCASE directive can be specified only once for each character set.

    The order of data codepoints among different statements is not significant.

    If the same character is defined more than once for a character set (either on a MONOCASE directive, or on a MONOCASE and a UNICODE directive), the last value is used.

    If no CHARSET directive precedes MONOCASE, then a character set description is implicitly begun -- in effect, a CHARSET directive with no operands is assumed.

    Example  

    Define the monocase information for IBM Code Page 833, the single-byte component for IBM CCSID 933.

    MONOCASE
    81-89: C1 C2 C3 C4 C5 C6 C7 C8 C9
    91-99: D1 D2 D3 D4 D5 D6 D7 D8 D9
    A2-A9: E2 E3 E4 E5 E6 E7 E8 E9
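The statement format can be expanded mechanically. The Python sketch below is an illustration under the rules stated in the usage notes, with hypothetical function names; it handles single-byte codepoints only and is not TDP's implementation.

```python
def parse_statement(statement):
    """Expand 'target1[-target2]: data ...' into {target: data} (as ints)."""
    targets, colon, data = statement.partition(":")
    if not colon:
        raise ValueError("no colon: the record is a new directive")
    values = [int(value, 16) for value in data.split()]
    first, dash, last = targets.strip().partition("-")
    start = int(first, 16)
    if dash:
        # A range needs exactly one data codepoint per target codepoint.
        expected = int(last, 16) - start + 1
        if len(values) != expected:
            raise ValueError(f"expected {expected} data codepoints, got {len(values)}")
    # Each data codepoint belongs to the next consecutive target codepoint.
    return {start + offset: value for offset, value in enumerate(values)}


def to_upper(data, table):
    """Monocase a byte string; codepoints without an entry are unchanged."""
    return bytes(table.get(byte, byte) for byte in data)


monocase = {}
for line in (
    "81-89: C1 C2 C3 C4 C5 C6 C7 C8 C9",
    "91-99: D1 D2 D3 D4 D5 D6 D7 D8 D9",
    "A2-A9: E2 E3 E4 E5 E6 E7 E8 E9",
):
    monocase.update(parse_statement(line))

upper = to_upper(bytes([0x81, 0x40, 0xA9]), monocase)
```

Because later statements update the same dictionary, a codepoint defined twice keeps its last value.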

    The NUMERICS directive defines codepoints for the ten numeric characters, zero through nine.

    Syntax  

    Each ‘xxn’ specifies a codepoint for one of the ten numeric characters. The first codepoint is for the number zero, and each subsequent codepoint is for the next ascending number, up to the number nine.

    Usage Notes  

    The NUMERICS directive can be specified only once for each character set.

    If the numerics are defined both by a NUMERICS and a UNICODE directive, the last is used.

    If no CHARSET directive precedes NUMERICS, then a character set description is implicitly begun -- in effect, a CHARSET directive with no operands is assumed.
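To illustrate how the ten ordered codepoints might be used, the following Python sketch builds a digit table and decodes a numeric byte string. The function names are hypothetical, and the sample codepoints F0 through F9 are the usual EBCDIC digits, chosen here only as sample input.

```python
def numerics_table(codepoints):
    """Map ten codepoints (hex strings, zero first) to the digits '0'..'9'."""
    if len(codepoints) != 10:
        raise ValueError("exactly ten codepoints are required")
    return {int(cp, 16): str(digit) for digit, cp in enumerate(codepoints)}


def decode_number(data, table):
    """Convert a byte string of numeric characters to an int."""
    return int("".join(table[byte] for byte in data))


digits = numerics_table(["F0", "F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9"])
value = decode_number(bytes([0xF1, 0xF2, 0xF3]), digits)
```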

    The SANITIZE directive optionally defines valid characters for TDP messages sent using operating system facilities. Since all such facilities support only EBCDIC, the sanitizing process ensures that unsupported or non-EBCDIC characters are replaced by an acceptable character. By TDP convention, this is the hyphen character (hexadecimal 60). If this information is not supplied, then a default is chosen based on the encoding scheme.

    Syntax  

    Usage Notes  

    The actual sanitize information is contained on statements that immediately follow the SANITIZE directive. Each such statement has the following syntax:

    target_codepoint1<-target_codepoint2>: data_codepoint ...

    where:

     

    Syntax Element      Function
    target_codepoint1   Specifies the first character defined on this statement.
    target_codepoint2   Optionally specifies the last character defined on this statement.
    data_codepoint      Defines the replacement character for the associated target_codepoint character.

    A codepoint is the hexadecimal representation of a character. The number of characters needed to specify a codepoint is dependent on the encoding scheme for the character set. For the characters of interest to TDP, the length is always two except for UTF16 encoding, for which the length is four.

    If the second target codepoint is specified, then exactly one data codepoint is required for each character in the range from the first target codepoint through the second. If the second target codepoint is omitted, then any number of data codepoints can be specified: the first is associated with the first target codepoint, and each subsequent one with the codepoint one greater than the previous.

    All statements after the SANITIZE directive that contain a colon are associated with the SANITIZE directive. Lack of a colon indicates that the statement is a new directive and ends that SANITIZE directive.

    The SANITIZE directive can be specified only once for each character set.

    The order of data codepoints among different statements is not significant. If the same character is defined more than once for a character set, the last value is used.

    If no CHARSET directive precedes SANITIZE, then a character set description is implicitly begun -- in effect, a CHARSET directive with no operands is assumed.

    Example  

    Provide the sanitize information for IBM Code Page 833, the single-byte component for IBM CCSID 933. Valid characters that do not correspond to standard EBCDIC are converted to hyphens.

    SANITIZE
    0E-0F: 4C 6E
    42-49: 60 60 60 60 60 60 60 60
    52-59: 60 60 60 60 60 60 60 60
    62-69: 60 60 60 60 60 60 60 60
    72-78: 60 60 60 60 60 60 60
    8A-8F: 60 60 60 60 60 60
    9A-9F: 60 60 60 60 60 60
    AA-AF: 60 60 60 60 60 60
    B2: 60
    BA-BC: 60 60 60
    E0: 60
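Once the replacements are known, applying them to a single-byte message is a plain table lookup. The Python sketch below shows one way to do that with `bytes.translate`; the function names are hypothetical, and only a few of the replacements from the example above are used as sample data.

```python
def build_table(replacements):
    """Build a 256-entry translation table from {target: replacement}."""
    table = bytearray(range(256))  # identity: unlisted codepoints are kept
    for target, data in replacements.items():
        table[target] = data
    return bytes(table)


def sanitize(message, table):
    """Replace each byte of a single-byte message according to table."""
    return message.translate(table)


# A few entries taken from the example: 0E and 0F get their own replacement
# characters, while 42 and E0 become the hyphen (hexadecimal 60).
table = build_table({0x0E: 0x4C, 0x0F: 0x6E, 0x42: 0x60, 0xE0: 0x60})
clean = sanitize(bytes([0xC1, 0x42, 0x0E, 0xE0, 0x0F]), table)
```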

    The UNICODE directive defines the syntactic characters and characters that have both lower and upper case. It can be used to provide the same information as the CHAR, MONOCASE, and NUMERICS directives. Since UNICODE is required to add a user-defined character set to CLIv2, it is also supported by TDP to potentially simplify use of user-defined character sets.

    The relevant syntactic characters in the character set are those that have the Unicode® codepoints of 0020 (Space), 0022 (Quotation Mark), 0025 (Percent), 0027 (Apostrophe), 002C (Comma), 002E (Period), 002F (Slash), 0030 through 0039 (Numerics Zero through Nine), 003A (Colon), 005B (Left Bracket), and 005D (Right Bracket). The relevant monocase characters are those that have the Unicode® codepoints of 0061 through 007A (lower case) and 0041 through 005A (upper case).

    Codepoints beyond those relevant to CHAR, MONOCASE, and NUMERICS are ignored. If these are not the characteristics of the character set, then CHAR, MONOCASE, and NUMERICS must be used instead of UNICODE.

    Syntax  

    Usage Notes  

    The actual information is contained on statements that immediately follow the UNICODE directive. Each such statement has the following syntax:

    target_codepoint1<-target_codepoint2>: data_codepoint ...

    where:

     

    Syntax Element      Function
    target_codepoint1   Specifies the first character in the user-defined character set that is defined on this statement.
    target_codepoint2   Optionally specifies the last character defined on this statement.
    data_codepoint      Defines the equivalent character in Unicode®.

    A codepoint is the hexadecimal representation of a character. The number of characters needed to specify a target codepoint is dependent on the encoding scheme for the character set. For the characters of interest to TDP, the length is always two except for UTF16 encoding, for which the length is four. The length of a data codepoint is always four.

    If the second target codepoint is specified, then exactly one data codepoint is required for each character in the range from the first target codepoint through the second. If the second target codepoint is omitted, then any number of data codepoints can be specified: the first is associated with the first target codepoint, and each subsequent one with the codepoint one greater than the previous.

    All statements after the UNICODE directive that contain a colon are associated with the UNICODE directive. Lack of a colon indicates that the statement is a new directive and ends that UNICODE directive.

    The order of data codepoints among different statements is not significant.

    The UNICODE directive can be specified only once for each character set.

    If the same character is defined for the same purpose more than once for a character set (using a CHAR, MONOCASE, NUMERICS, or UNICODE directive), the last value is used.

    If no CHARSET directive precedes UNICODE, then a character set description is implicitly begun -- in effect, a CHARSET directive with no operands is assumed.

    Example  

    Define the Unicode® equivalents for IBM Code Page 833, the single-byte component for IBM CCSID 933.

    UNICODE
    40-47: 0020 001A 115F 1100 1101 1115 1102 11AC
    48-4F: 11AD 1103 00A2 002E 003C 0028 002B 007C
    50-57: 0026 001A 1104 1105 11B0 11B1 11B2 11B3
    58-5F: 11B4 11B5 0021 0024 002A 0029 003B 00AC
    60-67: 002D 002F 11B6 1106 1107 1108 1121 1109
    68-6F: 110A 110B 00A6 002C 0025 005F 003E 003F
    70-77: 005B 001A 110C 110D 110E 110F 1110 1111
    78-7F: 1112 0060 003A 0023 0040 0027 003D 0022
    80-87: 005D 0061 0062 0063 0064 0065 0066 0067
    88-8F: 0068 0069 1161 1162 1163 1164 1165 1166
    90-97: 001A 006A 006B 006C 006D 006E 006F 0070
    98-9F: 0071 0072 1167 1168 1169 116A 116B 116C
    A0-A7: 00AF 007E 0073 0074 0075 0076 0077 0078
    A8-AF: 0079 007A 116D 116E 116F 1170 1171 1172
    B0-B7: 005E 001A 005C 001A 001A 001A 001A 001A
    B8-BF: 001A 001A 1173 1174 1175 001A 001A 001A
    C0-C7: 007B 0041 0042 0043 0044 0045 0046 0047
    C8-CF: 0048 0049 001A 001A 001A 001A 001A 001A
    D0-D7: 007D 004A 004B 004C 004D 004E 004F 0050
    D8-DF: 0051 0052 001A 001A 001A 001A 001A 001A
    E0-E7: 20A9 001A 0053 0054 0055 0056 0057 0058
    E8-EF: 0059 005A 001A 001A 001A 001A 001A 001A
    F0-F7: 0030 0031 0032 0033 0034 0035 0036 0037
    F8-FF: 0038 0039 001A 001A 001A 001A 001A 001A
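As an illustration of how a UNICODE mapping can stand in for CHAR, MONOCASE, and NUMERICS, the Python sketch below inverts a mapping and looks up the relevant Unicode® codepoints. The function name is hypothetical, and pairing the four CHAR keywords with 0020, 0022, 0027, and 002C is an assumption based on their names. The sample data is a small subset of the example above.

```python
# Assumption: CHAR keywords paired with Unicode codepoints by their names.
SYNTACTIC = {0x0020: "SPACE", 0x0022: "DBLQUOTE", 0x0027: "APOSTROP", 0x002C: "COMMA"}


def derive(unicode_map):
    """Derive CHAR, NUMERICS and MONOCASE data from {target: Unicode}."""
    by_unicode = {uni: target for target, uni in unicode_map.items()}
    chars = {name: by_unicode[uni] for uni, name in SYNTACTIC.items() if uni in by_unicode}
    # Codepoints for zero through nine (None where a digit is not mapped).
    numerics = [by_unicode.get(0x0030 + digit) for digit in range(10)]
    # Lower case 0061-007A pairs with upper case 0041-005A, 0x20 lower.
    monocase = {
        by_unicode[lower]: by_unicode[lower - 0x20]
        for lower in range(0x0061, 0x007B)
        if lower in by_unicode and lower - 0x20 in by_unicode
    }
    return chars, numerics, monocase


# Subset of the example: space, comma, apostrophe, quote, 'a', 'A', digits.
sample = {0x40: 0x0020, 0x6B: 0x002C, 0x7D: 0x0027, 0x7F: 0x0022, 0x81: 0x0061, 0xC1: 0x0041}
sample.update({0xF0 + digit: 0x0030 + digit for digit in range(10)})
chars, numerics, monocase = derive(sample)
```

For this subset, the derived values agree with the CHAR and MONOCASE examples given earlier for the same code page.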