15.00 - STRING_CS - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)
Last Update
2018-09-24

STRING_CS

Purpose  

Returns a heuristically derived integer value that you can use to help determine which KANJI1-compatible client character set was used to encode string_expression.

Note: The result is not guaranteed correct, but should work for most strings likely to be encountered.

Syntax  

where:

 

Syntax element …

Specifies …

string_expression

a CHAR or VARCHAR character string or expression.

ANSI Compliance

This is a Teradata extension to the ANSI SQL:2011 standard.

Argument Types

Use STRING_CS on character strings or character string expressions that use the KANJI1 server character set. (Non-KANJI1 character strings will be coerced to KANJI1, but the results are unlikely to be useful.)

STRING_CS does not accept CLOB or UDT types.

Result Value

STRING_CS returns a heuristically derived INTEGER value that you can use to help determine the client character set that was used to encode the KANJI1 character string or expression. The result value can also help determine which client character set to use to interpret the character data.

 

IF the result value is …

THEN the heuristic found that string_expression …

-1

most likely uses a single-byte client character set encoding, but it may also contain a mix of encodings.

0

does not contain anything distinguishable from any particular character set, so any character set that you use to interpret string_expression provides the same result.

Not all translations use the same interpretation for the characters represented by 0x5C and 0x7E, however.

If string_expression contains:

0x5C and you want it to be interpreted as REVERSE SOLIDUS, use a single-byte character set.

0x7E and you want it to be interpreted as TILDE, use a single-byte character set.

0x5C and you want it to be interpreted as YEN SIGN, or

0x7E and you want it to be interpreted as OVERLINE, use any of the following:

  • KANJISJIS_0S
  • KANJIEBCDIC5026_0I
  • KANJIEBCDIC5035_0I
  • KATAKANAEBCDIC
  • KANJIEUC_0U
  • 1

     

     

    uses the encoding of one of the following:

  • KANJIEBCDIC5026_0I
  • KANJIEBCDIC5035_0I
  • KATAKANAEBCDIC
  • 2

    uses the encoding of KANJIEUC_0U.

    3

    uses the encoding of KANJISJIS_0S.

    Usage Notes

    STRING_CS helps determine which encoding to use when using the TRANSLATE function to translate a string from the KANJI1 server character set to the UNICODE server character set.

     

    IF the result value is …

    THEN substitute the following value for source_TO_target in TRANSLATE(string_expression USING source_to_target) …

    -1

    KANJI1_SBC_TO_UNICODE.

    0

    KANJI1_SBC_TO_UNICODE.

    1

    KANJI1_KANJIEBCDIC_TO_UNICODE.

    2

    KANJI1_KANJIEUC_TO_UNICODE.

    3

    KANJI1_KANJISJIS_TO_UNICODE.

    For more information on TRANSLATE, see “TRANSLATE” on page 1256.

    Example : Using STRING_CS to Determine the Client Character Set

    Consider the following table definition:

       CREATE TABLE SysNames
          (SysID INTEGER
          ,SysName VARCHAR(30) CHARACTER SET KANJI1);
     

    Suppose the session character set is KANJIEBCDIC5026_0I. The following statement inserts the mixed single-byte/multibyte character string '<TEST>Q' into the SysName column of the SysNames table:

       INSERT SysNames (101, '0E42E342C542E242E30FD8'XC);
     

    Using STRING_CS to determine the client character set that was used to encode the string produces the results that follow:

       SELECT STRING_CS(SysName) FROM SysNames WHERE SysID = 101;
     
       String_CS(SysName)
       ------------------
                        1

    Example : Using STRING_CS to Translate a KANJI1 String to UNICODE

    Consider the SysNames table from the preceding example, “Example 1: Using STRING_CS to Determine the Client Character Set.”

    The following statement uses STRING_CS to determine which encoding to use to translate strings in the SysName column from the KANJI1 server character set to the UNICODE server character set:

       SELECT CASE STRING_CS(SysName)
          WHEN 0 THEN TRANSLATE(SysName USING KANJI1_SBC_TO_UNICODE)
          WHEN 1 THEN TRANSLATE(SysName USING KANJI1_KANJIEBCDIC_TO_UNICODE)
          WHEN 2 THEN TRANSLATE(SysName USING KANJI1_KANJIEUC_TO_UNICODE)
          WHEN 3 THEN TRANSLATE(SysName USING KANJI1_KANJISJIS_TO_UNICODE)
          ELSE TRANSLATE(SysName USING KANJI1_SBC_TO_UNICODE)
          END
       FROM SysNames;