SANITIZE - Teradata Director Program

Teradata® Director Program Reference

Product
Teradata Director Program
Release Number
17.00
Published
June 2020
Language
English (United States)
Last Update
2020-06-18
dita:id
B035-2416
Product Category
Teradata Tools and Utilities

The SANITIZE directive optionally defines the valid characters for TDP messages sent using operating system facilities. Because all such facilities support only EBCDIC, the sanitizing process replaces unsupported or non-EBCDIC characters with an acceptable character; by TDP convention this is the hyphen (hexadecimal 60). If no sanitize information is supplied, a default is chosen based on the encoding scheme.

Syntax



Usage Notes

The sanitize information itself is contained in the statements that immediately follow the SANITIZE directive. Each such statement has the following syntax:

target_codepoint1<-target_codepoint2>: data_codepoint ...

where:

target_codepoint1
Specifies the first character defined on this statement.
target_codepoint2
Optionally specifies the last character defined on this statement.
data_codepoint
Specifies the replacement character for the associated target_codepoint character.

A codepoint is the hexadecimal representation of a character. The number of hexadecimal digits needed to specify a codepoint depends on the encoding scheme of the character set. For the characters of interest to TDP, the length is always two digits, except for UTF16 encoding, for which it is four.

If the second target codepoint is specified, then exactly one data codepoint is required for each character in the range between the two target codepoints, inclusive. If the second target codepoint is omitted, then any number of data codepoints can be specified: the first is associated with target_codepoint1, and each subsequent one with the codepoint one greater than the previous.
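The two forms of a sanitize statement can be illustrated with a small parser. This is only a sketch of the rules described above, not TDP's implementation; the function name `parse_sanitize_statement` and the `width` parameter (2 hexadecimal digits by default, 4 for UTF16) are hypothetical names introduced for illustration.

```python
import string


def parse_sanitize_statement(statement, width=2):
    """Parse 'target1[-target2]: data ...' into {target codepoint: replacement}.

    Hypothetical helper: `width` is the number of hex digits per codepoint.
    """
    targets, colon, data = statement.partition(":")
    if not colon:
        raise ValueError("a sanitize statement must contain a colon")

    first_text, dash, last_text = (part.strip() for part in targets.partition("-"))
    data_texts = data.split()
    if not data_texts:
        raise ValueError("at least one data codepoint is required")

    # Every codepoint must be exactly `width` hexadecimal digits long.
    tokens = [first_text, *data_texts] + ([last_text] if dash else [])
    for token in tokens:
        if len(token) != width or any(ch not in string.hexdigits for ch in token):
            raise ValueError(f"invalid codepoint: {token!r}")

    first = int(first_text, 16)
    replacements = [int(text, 16) for text in data_texts]

    if dash:
        # Range form: one data codepoint per character in the inclusive range.
        expected = int(last_text, 16) - first + 1
        if len(replacements) != expected:
            raise ValueError(
                f"range needs {expected} data codepoints, got {len(replacements)}"
            )

    # Sequence form (and a validated range): consecutive targets from `first`.
    return {first + offset: value for offset, value in enumerate(replacements)}
```

For instance, `parse_sanitize_statement("0E-0F: 4C 6E")` maps 0E to 4C and 0F to 6E, while `parse_sanitize_statement("B2: 60 61")` uses the open-ended form and maps B2 to 60 and B3 to 61.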

All statements after the SANITIZE directive that contain a colon are associated with the SANITIZE directive. Lack of a colon indicates that the statement is a new directive and ends that SANITIZE directive.
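The colon rule can be sketched as a simple scan over the lines that follow the directive. The helper name `collect_sanitize_statements` and its list-of-strings input are assumptions made for illustration; how TDP actually reads its input is not described here.

```python
def collect_sanitize_statements(lines, start):
    """Return (statements, next_index) for the SANITIZE directive at lines[start].

    Hypothetical helper: statements are the consecutive lines after the
    directive that contain a colon; next_index is the first line without one,
    which is treated as the start of a new directive.
    """
    index = start + 1
    statements = []
    while index < len(lines) and ":" in lines[index]:
        statements.append(lines[index].strip())
        index += 1
    return statements, index


lines = ["SANITIZE", "0E-0F: 4C 6E", "B2: 60", "CHARSET", "E0: 60"]
statements, next_index = collect_sanitize_statements(lines, 0)
print(statements)         # ['0E-0F: 4C 6E', 'B2: 60']
print(lines[next_index])  # CHARSET
```

In this sketch, the final `E0: 60` line is not collected, because the `CHARSET` line without a colon has already ended the SANITIZE directive.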

The SANITIZE directive can be specified only once for each character set.

The order of data codepoints among different statements is not significant. If the same character is defined more than once for a character set, the last value is used.

If no CHARSET directive precedes SANITIZE, then a character set description is implicitly begun; in effect, a CHARSET directive with no operands is assumed.

Example: SANITIZE

Provide the sanitize information for IBM Code Page 833, the single-byte component of IBM CCSID 933. Valid characters that do not correspond to standard EBCDIC are converted to hyphens.

SANITIZE
0E-0F: 4C 6E
42-49: 60 60 60 60 60 60 60 60
52-59: 60 60 60 60 60 60 60 60
62-69: 60 60 60 60 60 60 60 60
72-78: 60 60 60 60 60 60 60
8A-8F: 60 60 60 60 60 60
9A-9F: 60 60 60 60 60 60
AA-AF: 60 60 60 60 60 60
B2: 60
BA-BC: 60 60 60
E0: 60
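To show the effect of such statements on a message, the following sketch builds a 256-entry translation table from a few of the statements above and applies it to a byte string. It is an illustration only: the function name `build_sanitize_table` is hypothetical, and the choice to pass undefined characters through unchanged is an assumption of this sketch, since the real default depends on the encoding scheme. Statements are applied in order, so a later definition of the same character replaces an earlier one.

```python
def build_sanitize_table(statements):
    """Build a 256-byte translation table from single-byte sanitize statements.

    Hypothetical helper. Assumption: characters not mentioned in any
    statement map to themselves.
    """
    table = bytearray(range(256))
    for statement in statements:
        targets, _, data = statement.partition(":")
        first_text, _, _ = targets.partition("-")
        first = int(first_text, 16)
        # Data codepoints apply to consecutive targets starting at `first`;
        # a later statement overwrites an earlier entry for the same target.
        for offset, value in enumerate(data.split()):
            table[first + offset] = int(value, 16)
    return bytes(table)


statements = [
    "0E-0F: 4C 6E",
    "42-49: 60 60 60 60 60 60 60 60",
    "B2: 60",
    "E0: 60",
]
table = build_sanitize_table(statements)

message = bytes([0xC1, 0x0E, 0x42, 0xE0, 0xC2, 0x0F])
sanitized = message.translate(table)
print(sanitized.hex())  # c14c6060c26e
```

Here 0E and 0F become 4C and 6E, 42 and E0 become the hyphen (60), and C1 and C2 are left untouched.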