DISTINCT, ALL, and NORMALIZE Operators
Purpose
DISTINCT specifies that duplicate values are not to be returned when an expression is processed.
ALL specifies that duplicate values are to be returned when an expression is processed.
NORMALIZE specifies that period values in the first period column that meet or overlap are combined to form a period that encompasses the individual period values.
Syntax
where:
Syntax Element … |
Specifies … |
DISTINCT |
only one row is to be returned from any set of duplicates that might result from a given expression list. Two rows are considered duplicates only if each value in one is equal to the corresponding value in the other. |
ALL |
all rows, including duplicates, are to be returned in the results of the expression list. This is the default value. |
NORMALIZE |
that the result of the select is normalized on the first period column in the select list. Period values that meet or overlap are coalesced, that is, combined to form a period that encompasses the individual period values. Any period columns that are specified after the first period column are treated as regular column values. Normalize is the last operation performed on the result of a SELECT statement. You can use a SELECT statement with the normalize clause on a normalized or non-normalized table. A SELECT statement cannot normalize a derived period column. |
ON MEETS OR OVERLAPS |
period values that meet or overlap are to be coalesced, that is, combined to form a period that encompasses the individual period values. |
ON OVERLAPS |
period values that overlap are to be coalesced, that is, combined to form a period that encompasses the individual period values. |
ON OVERLAPS OR MEETS |
period values that overlap or meet are to be coalesced, that is, combined to form a period that encompasses the individual period values. |
ANSI Compliance
DISTINCT and ALL are ANSI SQL:2011-compliant.
NORMALIZE is aTeradata extension to the ANSI SQL:2011 SQL standard.
DISTINCT Operator and UDTs
If you specify the DISTINCT operator and the select list specifies a UDT column, Teradata Database applies the ordering functionality of that UDT to achieve distinctness.
Notice:
Depending on the ordering definition, the same query can return different results, similar to how the results of performing case-insensitive DISTINCT processing on character strings can vary. Issuing such a query twice can return different results.
For a UDT example, consider a structured UDT named CircleUdt composed of these attributes:
Suppose that the external type for CircleUdt is a character string containing the ASCII form of the x, y, and r components. The ordering functionality is defined to map and compare the r components of two instance of CircleUdt.
These circles are considered to be equal because their r values are both 9:
The system sometimes returns the '1,1,9' result and at other times returns the '6,10,9' result as part of the DISTINCT result set.
DISTINCT Operator and Large Objects
You cannot include LOB columns in the select list of a SELECT request if you also specify DISTINCT.
You can specify LOB columns with DISTINCT if you CAST them to an appropriate data type, as documented in this table:
IF the LOB column has this data type … |
THEN you can reference it with DISTINCT if you CAST it to this data type … |
BLOB |
|
CLOB |
|
SQL Elements That Cannot Be Used With a DISTINCT Operator
The following SQL elements cannot be specified with a request that also specifies a DISTINCT operator.
However, you can specify DISTINCT within a nonrecursive seed statement in a recursive query.
Unexpected Row Length Errors With the DISTINCT Operator
Before performing the sort operation required to eliminate duplicates, Teradata Database creates a sort key and appends it to the rows to be sorted. If the length of this temporary data structure exceeds the system row length limit of 64 KB, the operation fails and the system returns an error. Depending on the situation, the error message text states that:
For explanations of these messages, see Messages.
The BYNET only looks at the first 4,096 bytes of the sort key created to sort the specified fields, so if the column the sort key is based on is greater than 4,096 bytes, the key is truncated and rows in the set can either be identified as duplicates when they are not or identified as unique when they are duplicates.
Using SELECT with NORMALIZE
Following are the rules and restrictions for using SELECT with NORMALIZE. For information about temporal tables, see Temporal Table Support.
The NORMALIZE operation is performed on the first period column in the select list.
For NORMALIZE, at least one column in the select list must be of period data type.
You can use NORMALIZE:
When the first period value specified in the SELECT list is a USING value, a constant value, or DEFAULT function on the period column, NORMALIZE operates like SELECT with DISTINCT.
When the NORMALIZE clause is specified with EXPAND, the rows are expanded and then the result is normalized.
When a query includes NORMALIZE and ORDER BY, the rows are normalized first and the result is ordered on the normalized result.
NORMALIZE is the last operation performed, except when the query includes ORDER BY or INTO. The other operations in a query are performed before NORMALIZE, except ORDER BY and INTO, and then the rows are normalized. If NORMALIZE is specified in a query with aggregation or OLAP, normalize is performed on the final result after the aggregate or OLAP operation.
When SELECT NORMALIZE includes a view list, views are not updatable.
A SELECT statement on a derived period column cannot include NORMALIZE.
You cannot include LOB columns in the select column list for NORMALIZE.
You cannot use NORMALIZE in a SELECT statement that includes the WITH clause or the TOP n operator.
You can use NORMALIZE in a non-recursive seed statement in a recursive query. However, NORMALIZE is not allowed in a recursive statement of a recursive query.
When NORMALIZE is specified in the CURSOR select statement, the CURSOR is not updatable.
Example : Simple Uses of DISTINCT
The following statement returns the number of unique job titles.
SELECT COUNT(DISTINCT JobTitle)
FROM …
The following statement lists the unique department numbers in the employee table.
SELECT DISTINCT dept_no
FROM employee;
The result returned is:
dept_no
‑‑‑-‑‑‑
100
300
500
600
700
Example : Using DISTINCT With Aggregates
The following statement uses the DISTINCT operator to eliminate duplicate values before applying SUM and AVG aggregate operators, respectively, to column x:
SELECT SUM(DISTINCT x), AVG(DISTINCT x)
FROM test_table;
You can only perform DISTINCT aggregations at the first, or highest level of aggregation if you specify subqueries.
Note that the result data type for a SUM operation on a DECIMAL(m,n) column is DECIMAL(p,n), as described in this table:
IF DBS Control field MaxDecimal is set to … |
AND … |
THEN p is … |
0 15 |
m ≤ 15 |
15 |
15 < m ≤ 18 |
18 |
|
m > 18 |
38 |
|
18 |
m ≤ 18 |
18 |
m > 18 |
38 |
|
31 |
m ≤ 31 |
31 |
m > 31 |
38 |
|
38 |
m = any value |
38 |
Example : Using ALL
In contrast to the second statement in “Example 1: Simple Uses of DISTINCT” on page 84, this statement returns the department number for every employee:
SELECT ALL dept_no
FROM employee;
ALL is the default option, except in query expressions using set operators, where ALL is an option, but not the default. See SQL Functions, Operators, Expressions, and Predicates.
Example Using NORMALIZE
Following is the table definition for the NORMALIZE examples;
CREATE TABLE project
( emp_id INTEGER,
project_name VARCHAR(20),
dept_id INTEGER,
duration PERIOD(DATE)
);
The table contains the following rows:
Emp_ID |
Project_Name |
Dept_ID |
Duration |
10 |
First Phase |
1000 |
10 Jan 2010 - 20 Mar 2010 |
10 |
First Phase |
2000 |
20 Mar 2010 - 15 July 2010 |
10 |
Second Phase |
2000 |
15 June 2010 - 18 Aug 2010 |
20 |
First Phase |
2000 |
10 Mar 2010 - 20 July 2010 |
The following select statement performs a normalize operation on emp_id. Note that the select list contains only one period column.
SELECT NORMALIZE ON MEETS OR OVERLAPS emp_id, duration
FROM project;
The query returns the following result:
Emp_ID |
Duration |
10 |
10 Jan 2010 - 18 Aug 2010 |
20 |
10 Mar 2010 - 20 July 2010 |
The following select statement performs a normalize operation on project_name. Note that the select list contains only one period column.
SELECT NORMALIZE project_name, duration
FROM project;
The query returns the following result:
Project_Name |
Duration |
First Phase |
10 Jan 2010 - 20 July 2010 |
Second Phase |
15 June 2010 - 18 Aug 2010 |
The following select statement performs a normalize operation on project_name and dept_id. Note that the select list contains only one period column.
SELECT NORMALIZE project_name, dept_id, duration
FROM project;
The query returns the following result:
Project_Name |
Dept_ID |
Duration |
First Phase |
1000 |
10 Jan 2010 - 20 Mar 2010 |
First Phase |
2000 |
20 Mar 2010 - 20 July 2010 |
Second Phase |
2000 |
15 June 2010 - 18 Aug 2010 |