Usage Notes - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
16.10
Published
June 2017
Language
English (United States)
Last Update
2018-04-25
dita:mapPath
psg1480972718197.ditamap
dita:ditavalPath
changebar_rev_16_10_exclude_audience_ie.ditaval
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Rules and Restrictions for the ORDER BY Clause

The rules and restrictions for the use of ORDER BY are:

  • Each column_name you specify in an ORDER BY clause must be the name of a column in a relation referenced in the SELECT expression list. The columns named do not have to match the columns in the SELECT expression list.

    You can also order on a column_name_alias.

    The maximum number of column names you can specify is 64.

    You cannot specify BLOB or CLOB columns in the column name list because you cannot sort on LOB values.

  • You cannot specify case sensitivity for a CLOB column.

    For details on defining case sensitivity for a column, see SQL Data Types and Literals .

  • The column position you specify in an ORDER BY clause must be an unsigned integer that refers to the sequential position of a column in the SELECT expression list. The column position cannot be passed as a parameter; attempts to do so cause it to be interpreted as an expression, and the data to be sorted by a constant.

    You cannot specify BLOB or CLOB columns in the column position list because you cannot sort on LOB values.

  • If an ORDER BY specification references a UDT column, then Teradata Database uses the defined ordering functionality for that UDT to order the result set.
  • You can specify a scalar subquery as a column expression in the ORDER BY clause of a query.
  • You can specify an ORDER BY clause before or after any WITH clause. Also see “ORDER BY Clause”.

    When you specify both WITH and ORDER BY clauses in the same SELECT request, they function together as follows:

    • The WITH clause defines the major sort key.
    • The ORDER BY clause defines the minor sort key.

      This is true regardless of the structure of the query or the number of WITH clauses. See Example: Combining WITH and ORDER BY Clauses.

  • If you have multiple ON clauses, you cannot specify ORDER BY as the only option in an ON clause. You must combine it with a PARTITION BY (ANY), HASH BY, or DIMENSION clause.
  • If you specify an ORDER BY or LOCAL ORDER BY clause with PARTITION BY or HASH BY input, then the following is required:
    • All inputs must have the same number of ORDER BY columns.
    • The data types of the columns must be the same type or matched using implicit cast.

Unexpected Results

This section describes some unexpected results that can occur when using the ORDER BY clause. They are:

Unexpected Row Length Errors: Sorting and Default Sort Order

Before performing the sort operation that orders the rows to be returned, Teradata Database creates a sort key and appends it to the rows. If the length of the sort key exceeds the system row maximum length of 1 MB, the operation returns an error. Depending on the situation, the error message text is one of the following.

  • A data row is too long.
  • Maximum row length exceeded in database_object_name.

For explanations of these messages, see Messages .

There are several possible reasons why a data row plus BYNET sort key might unexpectedly exceed the spool row size limit of 1 MB, even without including any updates.

  • The value for the MaxDecimal DBS Control field for your system might have been changed to a larger value, which could increase the number of bytes stored for each value in a DECIMAL column to as much as 38 bytes where it might previously have been only 1, 2, 4, or 8 bytes. See Database Design and SQL Data Types and Literals .
  • Your site has upgraded from a system with a 32-bit aligned row architecture to a system with a 64-bit architecture. This upgrade expands the size of data, both on disk and in memory, for a number of data types. The expansion is due to byte alignment restrictions that require padding to ensure alignment on 8-byte boundaries. These restrictions do not exist for 32-bit architectures. Therefore, the same row can be within the spool row 1 MB size boundary on a 32-bit system, but exceed that limit on a 64-bit system.

    Note that this issue is not relevant if your system uses the packed64 row architecture rather than the aligned row format architecture. For details, see Database Design .

By default, nulls sort before all other data in a column. In other words, nulls sort low. For example, suppose you want to sort the following set of values: 5, 3, null, 1, 4, and 2.

The values sort as follows:

     null
     1
     2
     3
     4
     5

By default, the result values of a character expression are sorted in ascending order, using the collating sequence in effect for the session.

Unexpected Sort Order When Querying DATE Data Type Columns

Using the ORDER BY clause to query columns with the DATE data type may yield results in which the order of the returned values seem to be out of order.

The factors that contribute to this type of unexpected result are:

  • DATE time values are stored by the system on disk in UTC time, not local time.
  • UTC time values are used to sort the column values, not the local time values.
  • DATE time values are affected by the system time zone setting, which can be set using TimeZoneString or the DBSControl general field 16.
  • DATE time values that cross the date boundary are adjusted based on the time zone offset for the system time zone (the time zone offset varies depending on the time zone).

For possible workarounds, see Workarounds for Unexpected Sort Order When Querying DATE.

Example of Unexpected Sort Order

This example shows a result set of TIME data type column values that do not seem to be sorted in the order specified in the ORDER BY clause. The local time values inserted into the column are followed by the values returned by the query.

The time zone in this example is Pacific Time, which has a +8 hour time zone offset.

CREATE TABLE table_1
(column_1 INTEGER, 
column_2 TIME);
INSERT INTO table_1 VALUEs ( 0, TIME'00:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 1, TIME'01:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 2, TIME'02:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 3, TIME'03:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 4, TIME'04:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 5, TIME'05:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 6, TIME'06:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 7, TIME'07:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 8, TIME'08:00:00.000000'); 
INSERT INTO table_1 VALUEs ( 9, TIME'09:00:00.000000'); 
INSERT INTO table_1 VALUEs (10, TIME'10:00:00.000000'); 
INSERT INTO table_1 VALUEs (11, TIME'11:00:00.000000'); 
INSERT INTO table_1 VALUEs (12, TIME'12:00:00.000000'); 
INSERT INTO table_1 VALUEs (13, TIME'13:00:00.000000'); 
INSERT INTO table_1 VALUEs (14, TIME'14:00:00.000000'); 
INSERT INTO table_1 VALUEs (15, TIME'15:00:00.000000'); 
INSERT INTO table_1 VALUEs (16, TIME'16:00:00.000000'); 
INSERT INTO table_1 VALUEs (17, TIME'17:00:00.000000'); 
INSERT INTO table_1 VALUEs (18, TIME'18:00:00.000000'); 
INSERT INTO table_1 VALUEs (19, TIME'19:00:00.000000'); 
INSERT INTO table_1 VALUEs (20, TIME'20:00:00.000000'); 
INSERT INTO table_1 VALUEs (21, TIME'21:00:00.000000'); 
INSERT INTO table_1 VALUEs (22, TIME'22:00:00.000000'); 
INSERT INTO table_1 VALUEs (23, TIME'23:00:00.000000');

This SELECT statement is used to query the time values in the TIME data type column (column_2).

SELECT * FROM table_1
ORDER BY column_2;

column_1  column_2
--------  ---------------
16        16:00:00.000000
17        17:00:00.000000
18        18:00:00.000000
19        19:00:00.000000
20        20:00:00.000000
21        21:00:00.000000
22        22:00:00.000000
23        23:00:00.000000
0         00:00:00.000000
1         01:00:00.000000
2         02:00:00.000000
3         03:00:00.000000
4         04:00:00.000000
5         05:00:00.000000
6         06:00:00.000000
7         07:00:00.000000
8         08:00:00.000000
9         09:00:00.000000
10        10:00:00.000000
11        11:00:00.000000
12        12:00:00.000000
13        13:00:00.000000
14        14:00:00.000000
15        15:00:00.000000


Because the sort order is based on UTC time (which is not visible in the column), the time values do not agree with the resulting sort order. Rows 16 through 23 appear at the top of the list because they belong at the top of the list in terms of UTC time.

The UTC equivalents for the returned time values are:

Returned Values                                      UTC Equivalents (local time + 8 hours)

column_1  column_2
--------  -------------
16      16:00:00.000000           00:00:00.000000
17      17:00:00.000000           01:00:00.000000
18      18:00:00.000000           02:00:00.000000
19      19:00:00.000000           03:00:00.000000
20      20:00:00.000000           04:00:00.000000
21      21:00:00.000000           05:00:00.000000
22      22:00:00.000000           06:00:00.000000
23      23:00:00.000000           07:00:00.000000

0       00:00:00.000000           08:00:00.000000
1       01:00:00.000000           09:00:00.000000
2       02:00:00.000000           10:00:00.000000
3       03:00:00.000000           11:00:00.000000
4       04:00:00.000000           12:00:00.000000
5       05:00:00.000000           13:00:00.000000
6       06:00:00.000000           14:00:00.000000
7       07:00:00.000000           15:00:00.000000
8       08:00:00.000000           16:00:00.000000
9       09:00:00.000000           17:00:00.000000
10      10:00:00.000000           18:00:00.000000
11      11:00:00.000000           19:00:00.000000
12      12:00:00.000000           20:00:00.000000
13      13:00:00.000000           21:00:00.000000
14      14:00:00.000000           22:00:00.000000
15      15:00:00.000000           23:00:00.000000

Workarounds for Unexpected Sort Order When Querying DATE

These workarounds can be used to get expected results:

  • Use INTERVAL
  • Use CAST to CHAR or VARCHAR
  • Use TIMESTAMP instead of TIME

Use INTERVAL

SELECT column_1, column_2, column_2 - interval '8' hour
FROM table_1
ORDER BY 3;

column_1     column_2            (column_2- 8)
--------     --------------      --------------
       0     00:00:00.000000     16:00:00.000000
       1     01:00:00.000000     17:00:00.000000
       2     02:00:00.000000     18:00:00.000000
       3     03:00:00.000000     19:00:00.000000
       4     04:00:00.000000     20:00:00.000000
       5     05:00:00.000000     21:00:00.000000
       6     06:00:00.000000     22:00:00.000000
       7     07:00:00.000000     23:00:00.000000
       8     08:00:00.000000     00:00:00.000000
       9     09:00:00.000000     01:00:00.000000
      10     10:00:00.000000     02:00:00.000000
      11     11:00:00.000000     03:00:00.000000
      12     12:00:00.000000     04:00:00.000000
      13     13:00:00.000000     05:00:00.000000
      14     14:00:00.000000     06:00:00.000000
      15     15:00:00.000000     07:00:00.000000
      16     16:00:00.000000     08:00:00.000000
      17     17:00:00.000000     09:00:00.000000
      18     18:00:00.000000     10:00:00.000000
      19     19:00:00.000000     11:00:00.000000
      20     20:00:00.000000     12:00:00.000000
      21     21:00:00.000000     13:00:00.000000
      22     22:00:00.000000     14:00:00.000000
      23     23:00:00.000000     15:00:00.000000


Use CAST to CHAR or VARCHAR

SELECT column_1, column_2, CAST(column_2 AS CHAR(15))
FROM table_1
ORDER BY 3;

column_1      column_2     column_2
--------  ------------     ---------------
       0  00:00:00.000000  00:00:00.000000
       1  01:00:00.000000  01:00:00.000000
       2  02:00:00.000000  02:00:00.000000
       3  03:00:00.000000  03:00:00.000000
       4  04:00:00.000000  04:00:00.000000
       5  05:00:00.000000  05:00:00.000000
       6  06:00:00.000000  06:00:00.000000
       7  07:00:00.000000  07:00:00.000000
       8  08:00:00.000000  08:00:00.000000
       9  09:00:00.000000  09:00:00.000000
      10  10:00:00.000000  10:00:00.000000
      11  11:00:00.000000  11:00:00.000000
      12  12:00:00.000000  12:00:00.000000
      13  13:00:00.000000  13:00:00.000000
      14  14:00:00.000000  14:00:00.000000
      15  15:00:00.000000  15:00:00.000000
      16  16:00:00.000000  16:00:00.000000
      17  17:00:00.000000  17:00:00.000000
      18  18:00:00.000000  18:00:00.000000
      19  19:00:00.000000  19:00:00.000000
      20  20:00:00.000000  20:00:00.000000
      21  21:00:00.000000  21:00:00.000000
      22  22:00:00.000000  22:00:00.000000
      23  23:00:00.000000  23:00:00.000000


Use TIMESTAMP instead of TIME

CREATE TABLE table_2
(column_1 INTEGER,
column_2 TIMESTAMP);

INSERT INTO table_2 SELECT * FROM table_1;

SELECT * FROM table_2
ORDER BY column_2;

column_1   column_2
--------   --------------------------
       0   2011-11-07 00:00:00.000000
       1   2011-11-07 01:00:00.000000
       2   2011-11-07 02:00:00.000000
       3   2011-11-07 03:00:00.000000
       4   2011-11-07 04:00:00.000000
       5   2011-11-07 05:00:00.000000
       6   2011-11-07 06:00:00.000000
       7   2011-11-07 07:00:00.000000
       8   2011-11-07 08:00:00.000000
       9   2011-11-07 09:00:00.000000
      10   2011-11-07 10:00:00.000000
      11   2011-11-07 11:00:00.000000
      12   2011-11-07 12:00:00.000000
      13   2011-11-07 13:00:00.000000
      14   2011-11-07 14:00:00.000000
      15   2011-11-07 15:00:00.000000
      16   2011-11-07 16:00:00.000000
      17   2011-11-07 17:00:00.000000
      18   2011-11-07 18:00:00.000000
      19   2011-11-07 19:00:00.000000
      20   2011-11-07 20:00:00.000000
      21   2011-11-07 21:00:00.000000
      22   2011-11-07 22:00:00.000000
      23   2011-11-07 23:00:00.000000

Specifying Collation

Different languages use different collation sequences, and you might need to alter your system collation to ensure that the ORDER BY clauses you specify in your queries sort correctly.

You can specify the default collation for a user using the CREATE USER or MODIFY USER statement, or you can specify collation for the duration of a session using the SET SESSION COLLATION statement. Otherwise, collation for a session is determined by the logon client system. The direction of the sort can be controlled by including the DESC (descending) or ASC (ascending) sort option in the SQL request.

Teradata Database supports ASCII, EBCDIC, and MULTINATIONAL collating sequences. If MULTINATIONAL is in effect, your collation will be one of the European (diacritical) or Kanji sort sequences described in International Sort Orders.

The following topics explain the results of ORDER BY as affected by whether character string expressions have the CASESPECIFIC or NOTCASESPECIFIC attribute.

Japanese Character Sort Order Considerations

If character strings are to be sorted NOT CASESPECIFIC, which is the default in Teradata session mode, only lowercase simple letters (for example, the characters a … z in the Latin alphabet) are converted to uppercase before a comparison or sorting operation is done.

The important thing to understood is that case differences are ignored in NOT CASESPECIFIC collations. In other words, the case of the characters is not critical for NOT CASESPECIFIC collation.

Any non-Latin single-byte character, any multibyte character, and any byte indicating a transition between single-byte characters and multibyte characters is excluded from this function.

If the character strings are to be sorted CASESPECIFIC, which is the default in ANSI session mode, then the case of the characters is critical for collation.

For Kanji1 character data in CASESPECIFIC mode, the letters in any alphabet that uses casing, such as Latin, Greek, or Cyrillic, are considered to be matched only if they are the identical letters and have the same case.

The system does not consider FULLWIDTH and HALFWIDTH characters to be matched whether in CASESPECIFIC or NOT CASESPECIFIC mode.

Teradata Database uses one of four server character sets to support Japanese characters:

  • Unicode
  • KanjiSJIS
  • Kanji1
  • Graphic

KANJI1 support is deprecated. KANJI1 is not allowed as a default character set. The system changes the KANJI1 default character set to the UNICODE character set. Creation of new KANJI1 objects is highly restricted. Although many KANJI1 queries and applications may continue to operate, sites using KANJI1 should convert to another character set as soon as possible.

If you are at a Japanese character site and you want to ensure that sorting of character data is as close as possible to that of the client, you should set your session collation (see “SET SESSION COLLATION” in SQL Data Definition Language ) as follows:

  • For character data stored on the Teradata platform as KanjiSJIS or Unicode, the best way to order the session character set is to use the CHARSET_COLL collation.

    For character data stored on the Teradata platform as either KanjiSJIS or Unicode, the CHARSET_COLL collation provides the collation that is closest to sorting on the client.

  • The JIS_COLL collation also provides an adequate collation, and has the added virtue of providing the same collation no matter what the session character set is.
  • The CHARSET_COLL and JIS_COLL collation sequences are not designed to support Kanji1 character data.

    For Kanji1 character data, the ASCII collation provides a collation similar to that of the client, assuming the session character set is KanjiSJIS_0S, KanjiEUC_0U, or something very similar. However, ASCII collation does not sort like the client if the data is stored on the Teradata platform using the ShiftJIS or Unicode server character sets.

  • Users under the KATAKANAEBCDIC, KANJIEBCDIC5026_0I, or KANJIEBCDIC5035_0I character sets who want to store their character data on the Teradata platform as Kanji1, and who want to collate in the session character set should install either KATAKANAEBCDIC, KANJIEBCDIC5026_0I, or KANJIEBCDIC5035_0I, respectively, at start-up time, and use MULTINATIONAL collation.

    Note: Each character set requires a different definition for its MULTINATIONAL collation to collate properly.

Teradata Database handles character data collation for the different Japanese server character sets as follows:

  • Under the KanjiEUC character set, the ss3 0x8F is converted to 0xFF. This means that a user-defined KanjiEUC codeset 3 are not ordered properly with respect to other KanjiEUC code sets. The order of other KanjiEUC code sets is proper (that is, ordering is the same as the binary ordering on the client system).

    ASCII collation collates Kanji1 data in binary order, but handles case matching of single-byte or HALFWIDTH Latin characters (see International Character Set Support for details). This matches collation on the client for KanjiSJIS_0S sessions, is close for KanjiEUC_0U, and is only reasonably close for double-byte data for KanjiEBCDIC sessions. Note that KanjiEUC_0U puts code set 3 after code set 1 rather than before code set 1 as KanjiEBCDIC does.

  • For Kanji1 data, characters identified as multibyte characters remain in the client encoding and are collated based on their binary values. This explains why ASCII collation works for double-byte characters in KanjiEBCDIC sessions.

    Note that it is not true that multibyte Kanji1 characters remain in the client encoding for KanjiEUC_0U sessions.

For details, see International Character Set Support and SQL Data Types and Literals.

International Sort Orders

MULTINATIONAL collation is enabled at the user level via the COLLATION option of the CREATE USER or MODIFY USER statement. If a collation is not specified, the default is HOST; that is, the standard sort ordering of the client system (EBCDIC for IBM clients, ASCII for all others).

You can override the default at the session level by issuing the SET SESSION COLLATION statement.

When MULTINATIONAL collation is in effect, the default collation sequence is determined by the collation setting installed at system start-up. Also see International Character Set Support .

Each international sort sequence is defined by program constants and no check is made to ensure that collation is compatible with the character set of the current session. The choice of sequence is controlled by your database administrator. The programmed character sequences cannot be changed.

European Sort Order

The sort order uses a two-level comparison that involves the following rules.

  • All lowercase letters are first mapped to their uppercase equivalents unless CASESPECIFIC is specified in the ORDER BY clause or was defined for the column being accessed.

    In ANSI session mode, the default is CASESPECIFIC and you must explicitly specify NOT CASESPECIFIC to change this.

  • All diacritical forms of the same letter are given the value of the base letter; that is, Ä is given the value of A, Ö is given the value of O, and so forth.
  • If two strings produce the same value, the characters are further ordered according to their sort position in their diacritical equivalence class (see the table below).
  • Unless the query specifies the DESC sort option, collation is in ascending order.

For details, see SQL Data Types and Literals .

When these rules are applied, the words “abbey,” “Active,” and “adage” are returned in this order,

     abbey       Active       adage

and the names Muller, Handl, Böckh, Mueller, Händl, Bohr, Bock, and Müller are ordered as:

     Bock        Böckh        Bohr       Handl
     Händl       Mueller      Muller     Müller

Equivalence classes and the ordering of diacritical characters in each class are shown in this table. The listed classes are those with characters that have diacritical forms.

European Sort Order
A C E I N O
a     A

à     À

á     Á

â     Â

ã     Ã

ä     Ä

c     C

ç     Ç

e     E

è     È

é     É

ê     Ê

ë     Ë

i     I

í     Ì

ì     Í

î     Î

ï     Ï

n     N

ñ     Ñ

o     O

ò     Ò

ó     Ó

ô     Ô

õ     Õ

ö     Ö

o     O

S U Y AE O slash A ring
s     S

β

u     U

ù     Ù

ú     Ú

û     Û

(U tilde)

ü     Ü

y     Y

ÿ     Ÿ

æ     Æ ø     Ø å     Å