Unicode Delimited Character Literals | Teradata Vantage - Unicode Delimited Character Literals - Advanced SQL Engine - Teradata Database

SQL Data Types and Literals

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-27
dita:mapPath
tpf1598412463935.ditamap
dita:ditavalPath
tpf1598412463935.ditaval
dita:id
B035-1143
lifecycle
previous
Product Category
Teradata Vantageā„¢

Declares a Unicode delimited character literal value in an expression.

The data type of Unicode delimited character literals is VARCHAR(n) CHARACTER SET UNICODE, where n is the resolved length of the literal in Unicode characters.

For details on the VARCHAR(n) type, see VARCHAR Data Type.

ANSI Compliance

Unicode delimited character literals are partially ANSI SQL:2011 compliant. The ANSI SQL:2011 standard does not require the UESCAPE clause and allows more possibilities for the Unicode_esc_char.

Syntax

[ _Latin | _Unicode | _KanjiSJIS | _Graphic ] U& 'Unicode_string_body'
  [ 'Unicode_string_body' [...] ] UESCAPE 'Unicode_esc_char'

Syntax Elements

'Unicode_string_body'
A string of zero or more characters, enclosed in apostrophes, consisting of a combination of the following:
  • Any character other than an apostrophe ( ' ) or the Unicode escape character
  • Two consecutive apostrophes

    Use two consecutive apostrophes to include an apostrophe character in the Unicode delimited character literal.

  • Two consecutive Unicode escape characters

    Use two consecutive escape characters to include the escape character in the delimited character literal.

  • One Unicode escape character followed by 4 hexadecimal digits, where a hexadecimal digit is a character from 0 to 9, a to f, or A to F

    The 4 hexadecimal digits represent a Unicode BMP (Basic Multilingual Plane) character. Leading zeroes are required.

  • One Unicode escape character followed by a plus sign (+) followed by 6 hexadecimal digits, where a hexadecimal digit is a character from 0 to 9, a to f, or A to F

    The 6 hexadecimal digits represent a Unicode BMP character or a supplementary character. Leading zeroes are required.

The Unicode escape character is specified by the UESCAPE clause.
If 'Unicode_string_body' immediately follows the U& key letters, no pad characters can appear before the first apostrophe.
Use successive occurrences of 'Unicode_string_body' to concatenate multiple strings, separated by a SPACE (U+0020).
A Unicode delimited character literal can consist of a maximum of 31000 Unicode_string_body characters.
Unicode_esc_char
A single character from the session character set to use as the Unicode escape character in the delimited character literal.
The character must be within the printable ASCII range of Unicode characters (U+0021 through U+007E), with the following exceptions:
  • SPACE (U+0020)
  • QUOTATION MARK (U+0022)
  • APOSTROPHE (U+0027)
  • PLUS SIGN (U+002B)
  • hexadecimal digit (0-9, a-f, or A-F):
    • U+0030 to U+0039
    • U+0041 to U+0046
    • U+0061 to U+0066
You can also specify the YEN SIGN (U+00A5) and WON SIGN (U+20A9) as the Unicode escape character.

Usage Notes

A Unicode delimited character literal is useful for inserting a character string containing characters that cannot generally be entered directly on the terminal keyboard or is not available in the current session character set.

In a session where Unicode Pass Through is enabled, noncharacters in a Unicode delimited character literal are converted to an FFFD replacement character. In sessions where Unicode Pass Through is not enabled, an error is returned. Noncharacters are Unicode code points that are permanently reserved for internal use.

Examples:
SELECT U&'#FFFE' UESCAPE '#';
SELECT U&'#+01FFFF' UESCAPE '#';

Surrogate pairs and inappropriate surrogates in Unicode delimited character literals result in an error whether Unicode Pass Through is enabled or not.

Examples:
SELECT U&'#D800#DC00' UESCAPE '#';
SELECT U&'#+00D800#+00DC00' UESCAPE '#';
SELECT U&'#+00D800#+00D801' UESCAPE '#';
SELECT U&'#+00D800' UESCAPE '#';
Unicode supplementary characters are allowed with any character set when Unicode Pass Through is enabled:
.set session charset 'ascii'
set session character set unicode pass through on;
sel u&'#+010000' uescape '#';
Unicode supplementary characters are not allowed with any character set when Unicode Pass Through is disabled. The following SELECT statement will return an error:
.set session charset 'utf8'
set session character set unicode pass through off;
sel u&'#+010000' uescape '#';

For more information about Unicode Pass Through, see Teradata Vantageā„¢ - Advanced SQL Engine International Character Set Support, B035-1125.

If your application is intended to be ANSI-compliant and portable, you can replace existing hexadecimal character literals of the form 'hexadecimal digits'XC with Unicode character literals.

Examples: Unicode Delimited Character Literal

Consider this table:

CREATE TABLE TextTable
   (IDNum INTEGER
   ,Ustring VARCHAR(10) CHARACTER SET UNICODE);

This statement uses a Unicode delimited character literal to insert the character string into the Ustring column, using the number sign (#) as the Unicode escape character:

INSERT TextTable (10, _Unicode U&'#8CC7#6599#5009#5132' UESCAPE '#');

This statement concatenates two strings to insert '855-34-9729' into the Ustring column:

INSERT TextTable (11, U&'855-34-' '9729' UESCAPE '%');

This statement selects all rows from the TextTable table where the string in the Ustring column is an empty string:

SELECT * FROM TextTable WHERE Ustring = U&'' UESCAPE '&';

Examples: Literals Containing Unicode BMP and Supplementary Characters

You can specify Unicode BMP characters using either the 4 or 6 digit syntax. Note that leading zeroes are required.

SELECT U&'#0041' UESCAPE '#'
SELECT U&'#+000041' UESCAPE '#'

You can specify Unicode supplementary characters using the 6 digit syntax. Note that leading zeroes are required.

SELECT U&'#+020000' UESCAPE '#'
SELECT U&'#+010000' UESCAPE '#'

Example: Error: Literal Containing Surrogate Code Points

The character LINEAR B SYLLABLE B008 A (U+010000) can be represented as follows:
SELECT U&'#+010000' UESCAPE '#'
This character requires two 16 bit surrogate code points in internal UNICODE format (which is based on UTF-16). The query SELECT CHAR2HEXINT(U&'#+010000' UESCAPE '#'); produces the following string:
'D800DC00'
However, note that the following query returns an error because surrogate code points are not allowed in a Unicode delimited character literal:
SELECT CHAR2HEXINT(U&'#D800#DC00' UESCAPE '#');