DISTINCT, ALL, and NORMALIZE Operators - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.00
Language
English (United States)
Last Update
2018-09-28
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

DISTINCT, ALL, and NORMALIZE Operators

Purpose

DISTINCT specifies that duplicate values are not to be returned when an expression is processed.

ALL specifies that duplicate values are to be returned when an expression is processed.

NORMALIZE specifies that period values in the first period column that meet or overlap are combined to form a period that encompasses the individual period values.

Syntax  

where:

 

Syntax Element …

Specifies …

DISTINCT

only one row is to be returned from any set of duplicates that might result from a given expression list.

Two rows are considered duplicates only if each value in one is equal to the corresponding value in the other.

ALL

all rows, including duplicates, are to be returned in the results of the expression list.

This is the default value.

NORMALIZE

that the result of the select is normalized on the first period column in the select list. Period values that meet or overlap are coalesced, that is, combined to form a period that encompasses the individual period values.

Any period columns that are specified after the first period column are treated as regular column values. Normalize is the last operation performed on the result of a SELECT statement. You can use a SELECT statement with the normalize clause on a normalized or non-normalized table.

A SELECT statement cannot normalize a derived period column.

ON MEETS OR OVERLAPS

period values that meet or overlap are to be coalesced, that is, combined to form a period that encompasses the individual period values.

ON OVERLAPS

period values that overlap are to be coalesced, that is, combined to form a period that encompasses the individual period values.

ON OVERLAPS OR MEETS

period values that overlap or meet are to be coalesced, that is, combined to form a period that encompasses the individual period values.

ANSI Compliance

DISTINCT and ALL are ANSI SQL:2011-compliant.

NORMALIZE is aTeradata extension to the ANSI SQL:2011 SQL standard.

DISTINCT Operator and UDTs

If you specify the DISTINCT operator and the select list specifies a UDT column, Teradata Database applies the ordering functionality of that UDT to achieve distinctness.

Notice:

Depending on the ordering definition, the same query can return different results, similar to how the results of performing case-insensitive DISTINCT processing on character strings can vary. Issuing such a query twice can return different results.

For a UDT example, consider a structured UDT named CircleUdt composed of these attributes:

  • x INTEGER
  • y INTEGER
  • r INTEGER
  • Suppose that the external type for CircleUdt is a character string containing the ASCII form of the x, y, and r components. The ordering functionality is defined to map and compare the r components of two instance of CircleUdt.

    These circles are considered to be equal because their r values are both 9:

  • NEW CircleUdt('1,1,9'), where x=1, y=1, and r=9
  • NEW CircleUdt('6,10,9')
  • The system sometimes returns the '1,1,9' result and at other times returns the '6,10,9' result as part of the DISTINCT result set.

    DISTINCT Operator and Large Objects

    You cannot include LOB columns in the select list of a SELECT request if you also specify DISTINCT.

    You can specify LOB columns with DISTINCT if you CAST them to an appropriate data type, as documented in this table:

     

    IF the LOB column has this data type …

    THEN you can reference it with DISTINCT if you CAST it to this data type …

    BLOB

  • BYTE
  • VARBYTE
  • CLOB

  • CHARACTER
  • VARCHAR
  • SQL Elements That Cannot Be Used With a DISTINCT Operator

    The following SQL elements cannot be specified with a request that also specifies a DISTINCT operator.

  • WITH request modifier
  • TOP n operator
  • The recursive statement of a recursive query
  • However, you can specify DISTINCT within a nonrecursive seed statement in a recursive query.

    Unexpected Row Length Errors With the DISTINCT Operator

    Before performing the sort operation required to eliminate duplicates, Teradata Database creates a sort key and appends it to the rows to be sorted. If the length of this temporary data structure exceeds the system row length limit of 64 KB, the operation fails and the system returns an error. Depending on the situation, the error message text states that:

  • A data row is too long.
  • Maximum row length exceeded in database_object_name.
  • For explanations of these messages, see Messages.

    The BYNET only looks at the first 4,096 bytes of the sort key created to sort the specified fields, so if the column the sort key is based on is greater than 4,096 bytes, the key is truncated and rows in the set can either be identified as duplicates when they are not or identified as unique when they are duplicates.

    Using SELECT with NORMALIZE

    Following are the rules and restrictions for using SELECT with NORMALIZE. For information about temporal tables, see Temporal Table Support.

    The NORMALIZE operation is performed on the first period column in the select list.

    For NORMALIZE, at least one column in the select list must be of period data type.

    You can use NORMALIZE:

  • on normalized or non-normalized tables
  • in subqueries
  • in the SELECT INTO statement
  • When the first period value specified in the SELECT list is a USING value, a constant value, or DEFAULT function on the period column, NORMALIZE operates like SELECT with DISTINCT.

    When the NORMALIZE clause is specified with EXPAND, the rows are expanded and then the result is normalized.

    When a query includes NORMALIZE and ORDER BY, the rows are normalized first and the result is ordered on the normalized result.

    NORMALIZE is the last operation performed, except when the query includes ORDER BY or INTO. The other operations in a query are performed before NORMALIZE, except ORDER BY and INTO, and then the rows are normalized. If NORMALIZE is specified in a query with aggregation or OLAP, normalize is performed on the final result after the aggregate or OLAP operation.

    When SELECT NORMALIZE includes a view list, views are not updatable.

    A SELECT statement on a derived period column cannot include NORMALIZE.

    You cannot include LOB columns in the select column list for NORMALIZE.

    You cannot use NORMALIZE in a SELECT statement that includes the WITH clause or the TOP n operator.

    You can use NORMALIZE in a non-recursive seed statement in a recursive query. However, NORMALIZE is not allowed in a recursive statement of a recursive query.

    When NORMALIZE is specified in the CURSOR select statement, the CURSOR is not updatable.

    Example : Simple Uses of DISTINCT

    The following statement returns the number of unique job titles.

         SELECT COUNT(DISTINCT JobTitle) 
         FROM …

    The following statement lists the unique department numbers in the employee table.

         SELECT DISTINCT dept_no 
         FROM employee;

    The result returned is:

         dept_no 
         ‑‑‑-‑‑‑
             100 
             300 
             500 
             600 
             700

    Example : Using DISTINCT With Aggregates

    The following statement uses the DISTINCT operator to eliminate duplicate values before applying SUM and AVG aggregate operators, respectively, to column x:

         SELECT SUM(DISTINCT x), AVG(DISTINCT x) 
         FROM test_table;

    You can only perform DISTINCT aggregations at the first, or highest level of aggregation if you specify subqueries.

    Note that the result data type for a SUM operation on a DECIMAL(m,n) column is DECIMAL(p,n), as described in this table:

     

    IF DBS Control field MaxDecimal is set to …

    AND …

    THEN p is …

    0

    15

    m  15

    15

    15 < m  18

    18

    m > 18

    38

    18

    m  18

    18

    m > 18

    38

    31

    m  31

    31

    m > 31

    38

    38

    m = any value

    38

    Example : Using ALL

    In contrast to the second statement in “Example 1: Simple Uses of DISTINCT” on page 84, this statement returns the department number for every employee:

         SELECT ALL dept_no 
         FROM employee;

    ALL is the default option, except in query expressions using set operators, where ALL is an option, but not the default. See SQL Functions, Operators, Expressions, and Predicates.

    Example Using NORMALIZE

    Following is the table definition for the NORMALIZE examples;

         CREATE TABLE project
         (    emp_id        INTEGER,
              project_name  VARCHAR(20),
              dept_id       INTEGER,
              duration      PERIOD(DATE)
         );

    The table contains the following rows:

     

    Emp_ID

    Project_Name

    Dept_ID

    Duration

    10

    First Phase

    1000

    10 Jan 2010 - 20 Mar 2010

    10

    First Phase

    2000

    20 Mar 2010 - 15 July 2010

    10

    Second Phase

    2000

    15 June 2010 - 18 Aug 2010

    20

    First Phase

    2000

    10 Mar 2010 - 20 July 2010

    The following select statement performs a normalize operation on emp_id. Note that the select list contains only one period column.

         SELECT NORMALIZE ON MEETS OR OVERLAPS emp_id, duration 
         FROM project;

    The query returns the following result:

     

    Emp_ID

    Duration

    10

    10 Jan 2010 - 18 Aug 2010

    20

    10 Mar 2010 - 20 July 2010

    The following select statement performs a normalize operation on project_name. Note that the select list contains only one period column.

    SELECT NORMALIZE project_name, duration 
    FROM project;

    The query returns the following result:

     

    Project_Name

    Duration

    First Phase

    10 Jan 2010 - 20 July 2010

    Second Phase

    15 June 2010 - 18 Aug 2010

    The following select statement performs a normalize operation on project_name and dept_id. Note that the select list contains only one period column.

    SELECT NORMALIZE project_name, dept_id, duration 
    FROM project;

    The query returns the following result:

     

    Project_Name

    Dept_ID

    Duration

    First Phase

    1000

    10 Jan 2010 - 20 Mar 2010

    First Phase

    2000

    20 Mar 2010 - 20 July 2010

    Second Phase

    2000

    15 June 2010 - 18 Aug 2010