Unicode Delimited Character Literals | VantageCloud Lake - Unicode Delimited Character Literals - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Declares a Unicode delimited character literal value in an expression.

The data type of Unicode delimited character literals is VARCHAR(n) CHARACTER SET UNICODE, where n is the resolved length of the literal in Unicode characters.

For details on the VARCHAR(n) type, see VARCHAR Data Type.

ANSI Compliance

Unicode delimited character literals are partially ANSI SQL:2011 compliant. The ANSI SQL:2011 standard does not require the UESCAPE clause and allows more possibilities for the Unicode_esc_char.

Syntax

[ _Latin | _Unicode | _KanjiSJIS | _Graphic ] U& 'Unicode_string_body'
  [ 'Unicode_string_body' [...] ] UESCAPE 'Unicode_esc_char'

Syntax Elements

'Unicode_string_body'
A string of zero or more characters, enclosed in apostrophes, consisting of a combination of the following:
  • Any character other than an apostrophe ( ' ) or the Unicode escape character
  • Two consecutive apostrophes

    Use two consecutive apostrophes to include an apostrophe character in the Unicode delimited character literal.

  • Two consecutive Unicode escape characters

    Use two consecutive escape characters to include the escape character in the delimited character literal.

  • One Unicode escape character followed by 4 hexadecimal digits, where a hexadecimal digit is a character from 0 to 9, a to f, or A to F

    The 4 hexadecimal digits represent a Unicode BMP (Basic Multilingual Plane) character. Leading zeroes are required.

  • One Unicode escape character followed by a plus sign (+) followed by 6 hexadecimal digits, where a hexadecimal digit is a character from 0 to 9, a to f, or A to F

    The 6 hexadecimal digits represent a Unicode BMP character or a supplementary character. Leading zeroes are required.

The Unicode escape character is specified by the UESCAPE clause.
If 'Unicode_string_body' immediately follows the U& key letters, no pad characters can appear before the first apostrophe.
Use successive occurrences of 'Unicode_string_body' to concatenate multiple strings, separated by a SPACE (U+0020).
A Unicode delimited character literal can consist of a maximum of 31000 Unicode_string_body characters.
Unicode_esc_char
A single character from the session character set to use as the Unicode escape character in the delimited character literal.
The character must be within the printable ASCII range of Unicode characters (U+0021 through U+007E), with the following exceptions:
  • SPACE (U+0020)
  • QUOTATION MARK (U+0022)
  • APOSTROPHE (U+0027)
  • PLUS SIGN (U+002B)
  • Hexadecimal digit (0-9, a-f, or A-F):
    • U+0030 to U+0039
    • U+0041 to U+0046
    • U+0061 to U+0066
You can also specify the YEN SIGN (U+00A5) and WON SIGN (U+20A9) as the Unicode escape character.

Usage Notes

A Unicode delimited character literal is useful for inserting a character string containing characters that cannot be entered directly on the terminal keyboard or is not available in the current session character set.

In a session where Unicode Pass Through is enabled, noncharacters in a Unicode delimited character literal are converted to an FFFD replacement character. In sessions where Unicode Pass Through is not enabled, an error is returned. Noncharacters are Unicode code points that are permanently reserved for internal use.

Examples:
SELECT U&'#FFFE' UESCAPE '#';
SELECT U&'#+01FFFF' UESCAPE '#';

Surrogate pairs and inappropriate surrogates in Unicode delimited character literals cause an error whether Unicode Pass Through is enabled or not.

Examples:
SELECT U&'#D800#DC00' UESCAPE '#';
SELECT U&'#+00D800#+00DC00' UESCAPE '#';
SELECT U&'#+00D800#+00D801' UESCAPE '#';
SELECT U&'#+00D800' UESCAPE '#';
Unicode supplementary characters are allowed with any character set when Unicode Pass Through is enabled:
.set session charset 'ascii'
set session character set unicode pass through on;
sel u&'#+010000' uescape '#';
Unicode supplementary characters are not allowed with any character set when Unicode Pass Through is disabled. The following SELECT statement returns an error:
.set session charset 'utf8'
set session character set unicode pass through off;
sel u&'#+010000' uescape '#';

See UNICODE PASS THROUGH.

If your application is intended to be ANSI-compliant and portable, you can replace existing hexadecimal character literals of the form 'hexadecimal digits'XC with Unicode character literals.

Examples: Unicode Delimited Character Literal

Consider this table:

CREATE TABLE TextTable
   (IDNum INTEGER
   ,Ustring VARCHAR(10) CHARACTER SET UNICODE);

This statement uses a Unicode delimited character literal to insert the character string "" into the Ustring column, using the number sign (#) as the Unicode escape character:

INSERT TextTable (10, _Unicode U&'#8CC7#6599#5009#5132' UESCAPE '#');

This statement concatenates two strings to insert '855-34-9729' into the Ustring column:

INSERT TextTable (11, U&'855-34-' '9729' UESCAPE '%');

This statement selects all rows from the TextTable table where the string in the Ustring column is an empty string:

SELECT * FROM TextTable WHERE Ustring = U&'' UESCAPE '&';

Examples: Literals Containing Unicode BMP and Supplementary Characters

You can specify Unicode BMP characters using either the 4 or 6 digit syntax. Leading zeroes are required.

SELECT U&'#0041' UESCAPE '#'
SELECT U&'#+000041' UESCAPE '#'

You can specify Unicode supplementary characters using the 6 digit syntax. Leading zeroes are required.

SELECT U&'#+020000' UESCAPE '#'
SELECT U&'#+010000' UESCAPE '#'

Example: Error: Literal Containing Surrogate Code Points

The character LINEAR B SYLLABLE B008 A (U+010000) can be represented as follows:
SELECT U&'#+010000' UESCAPE '#'

This character requires two 16 bit surrogate code points in internal UNICODE format (which is based on UTF-16). The query:

SELECT CHAR2HEXINT(U&'#+010000' UESCAPE '#');
produces the following string:
'D800DC00'

However, the following query returns an error because surrogate code points are not allowed in a Unicode delimited character literal:

SELECT CHAR2HEXINT(U&'#D800#DC00' UESCAPE '#');