Characteristics of Ordered Analytical Functions
The Function Value
The function value for a column in a row considers that row (and a subset of all other rows in the group) and produces a new value.
The generic function describing this operation is as follows:
new_column_value = FUNCTION(column_value,rows_defined_by_window)
Use of QUALIFY Clause
Rows can be eliminated by applying conditions on the new column value. The QUALIFY clause is analogous to the HAVING clause of aggregate functions. The QUALIFY clause eliminates rows based on the function value, returning a new value for each of the participating rows. For example:
SELECT StoreID, SUM(profit) OVER (PARTITION BY StoreID)
FROM facts
QUALIFY SUM(profit) OVER (PARTITION BY StoreID) > 2;
An SQL query that contains both ordered analytical functions and aggregate functions can have both a QUALIFY clause and a HAVING clause, as in the following example:
SELECT StoreID, SUM(sale),
SUM(profit) OVER (PARTITION BY StoreID)
FROM facts
GROUP BY StoreID, sale, profit
HAVING SUM(sale) > 15
QUALIFY SUM(profit) OVER (PARTITION BY StoreID) > 2;
For details on the QUALIFY clause, see SQL Data Manipulation Language.
DISTINCT Clause Restriction
The DISTINCT clause is not permitted in window aggregate functions.
Permitted Query Objects
Ordered analytical functions are permitted in the following database query objects:
Where Ordered Analytical Functions are Not Permitted
Ordered analytical functions are not permitted in:
Use of Standard SQL Features
You can use standard SQL features within the same query to make your statements more sophisticated.
For example, you can use ordered analytical functions in the following ways:
Use an analytical function in this operation … |
To … |
INSERT … SELECT |
populate a new column. |
derived table |
create a new table to participate in a complex query. |
Ordered analytical functions having different sort expressions are evaluated one after another, reusing the same spool file. Different functions having the same sort expression are evaluated simultaneously.
Unsupported Data Types
Ordered analytical functions do not operate on the following data types:
Note that CLOB, BLOB, or UDT data types are usable inside an expression if the result is a supported data type. For example:
SELECT
RANK() OVER
(PARTITION BY(CASE WHEN b IS NULL THEN 1 ELSE 0 END) ORDER BY id)
FROM btab;
However, the following example results in an error because the function cannot sort by BLOB:
SELECT
RANK() OVER
(PARTITION BY b ORDER BY id)
FROM btab;
Ordered Analytical Functions and Period Data Types
Expressions that evaluate to Period data types can be specified for any expression within the following ordered analytical functions: QUANTILE, RANK (Teradata-specific function), and RANK(ANSI SQL Window function).
Ordered Analytical Functions and Recursive Queries
Ordered analytical functions cannot appear in a recursive statement of a recursive query. However, a non-recursive seed statement in a recursive query can specify an ordered analytical function.
Ordered Analytical Functions and Hash or Join Indexes
When a single table query specifies an ordered analytical function on columns that are also defined for a single table compressed hash or join index, the Optimizer does not select the hash or join index to process the query.
Ordered Analytical Functions and Row Level Security Tables
When a request that includes an ordered analytical function, such as MAVG, CSUM, or RANK, references a table protected by row level security, the operation is based on only the rows that are accessible to the requesting user. In order to apply all rows of the table to the function, the user must have one of the following:
For more information about row level security, see Security Administration.
Computation Sort Order and Result Order
The sort order that you specify in the window specification defines the sort order of the rows over which the function is applied; it does not define the ordering of the results.
For example, to compute the average sales for the months following the current month, order the rows by month:
SELECT StoreID, SMonth, ProdID, Sales,
AVG(Sales) OVER (PARTITION BY StoreID ORDER BY SMonth
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
FROM sales_tbl;
StoreID SMonth ProdID Sales Remaining Avg(Sales)
------- ------ ------ --------- --------------------
1001 6 C 30000.00 ?
1001 5 C 30000.00 30000.00
1001 4 C 25000.00 30000.00
1001 3 C 40000.00 28333.33
1001 2 C 25000.00 31250.00
1001 1 C 35000.00 30000.00
The default sort order is ASC for the computation. However, the results are returned in the reverse order.
To order the results, use an ORDER BY phrase in the SELECT statement. For example:
SELECT StoreID, SMonth, ProdID, Sales,
AVG(Sales) OVER (PARTITION BY StoreID ORDER BY SMonth
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
FROM sales_tbl
ORDER BY SMonth;
StoreID SMonth ProdID Sales Remaining Avg(Sales)
------- ------ ------ --------- --------------------
1001 1 C 35000.00 30000.00
1001 2 C 25000.00 31250.00
1001 3 C 40000.00 28333.33
1001 4 C 25000.00 30000.00
1001 5 C 30000.00 30000.00
1001 6 C 30000.00 ?
Data in Partitioning Column of Window Specification and Resource Impact
The columns specified in the PARTITION BY clause of a window specification determine the partitions over which the ordered analytical function executes. For example, the following query specifies the StoreID column in the PARTITION BY clause to compute the group sales sum for each store:
SELECT StoreID, SMonth, ProdID, Sales,
SUM(Sales) OVER (PARTITION BY StoreID)
FROM sales_tbl;
At execution time, Teradata Database moves all of the rows that fall into a partition to the same AMP. If a very large number of rows fall into the same partition, the AMP can run out of spool space. For example, if the sales_tbl table in the preceding query has millions or billions of rows, and the StoreID column contains only a few distinct values, an enormous number of rows are going to fall into the same partition, potentially resulting in out-of-spool errors.
To avoid this problem, examine the data in the columns of the PARTITION BY clause. If necessary, rewrite the query to include additional columns in the PARTITION BY clause to create smaller partitions that Teradata Database can distribute more evenly among the AMPs. For example, the preceding query can be rewritten to compute the group sales sum for each store for each month:
SELECT StoreID, SMonth, ProdID, Sales,
SUM(Sales) OVER (PARTITION BY StoreID, SMonth)
FROM sales_tbl;
Using Ordered Analytical Functions
Example : Using RANK and AVG
Consider the result of the following SELECT statement using the following ordered analytical functions, RANK and AVG.
SELECT item, smonth, sales,
RANK() OVER (PARTITION BY item ORDER BY sales DESC),
AVG(sales) OVER (PARTITION BY item
ORDER BY smonth
ROWS 3 PRECEDING)
FROM sales_tbl
ORDER BY item, smonth;
The results table might look like the following:
Item |
SMonth |
Sales |
Rank(Sales) |
Moving Avg(Sales) |
A |
1996-01 |
110 |
13 |
110 |
A |
1996-02 |
130 |
10 |
120 |
A |
1996-03 |
170 |
6 |
137 |
A |
1996-04 |
210 |
3 |
155 |
A |
1996-05 |
270 |
1 |
195 |
A |
1996-06 |
250 |
2 |
225 |
A |
1996-07 |
190 |
4 |
230 |
A |
1996-08 |
180 |
5 |
222 |
A |
1996-09 |
160 |
7 |
195 |
A |
1996-10 |
140 |
9 |
168 |
A |
1996-11 |
150 |
8 |
158 |
A |
1996-12 |
120 |
11 |
142 |
A |
1997-01 |
120 |
11 |
132 |
B |
1996-02 |
30 |
5 |
30 |
... |
... |
... |
... |
... |
Example : Using QUALIFY With RANK
Adding a QUALIFY clause to a query eliminates rows from an unqualified table.
For example, if you wanted to see whether the high sales months were unusual, you could add a QUALIFY clause to the previous query.
SELECT item, smonth, sales,
RANK() OVER (PARTITION BY item ORDER BY sales DESC),
AVG(sales) OVER (PARTITION BY item ORDER BY smonth ROWS 3 PRECEDING)
FROM sales_tbl
ORDER BY item, smonth
QUALIFY RANK() OVER(PARTITION BY item ORDER BY sales DESC) <=5;
This additional qualifier produces a results table that might look like the following:
Item |
SMonth |
Sales |
Rank(Sales) |
Moving Avg(Sales) |
A |
1996-04 |
210 |
3 |
155 |
A |
1996-05 |
270 |
1 |
195 |
A |
1996-06 |
250 |
2 |
225 |
A |
1996-07 |
190 |
4 |
230 |
A |
1996-08 |
180 |
5 |
222 |
B |
1996-02 |
30 |
1 |
30 |
... |
... |
... |
... |
... |
The result indicates that sales had probably been fairly low prior to the start of the current sales season.
Example : Using QUALIFY With RANK
Consider the following sales table named sales_tbl.
Store |
ProdID |
Sales |
1003 |
C |
20000.00 |
1003 |
D |
50000.00 |
1003 |
A |
30000.00 |
1002 |
C |
35000.00 |
1002 |
D |
25000.00 |
1002 |
A |
40000.00 |
1001 |
C |
60000.00 |
1001 |
D |
35000.00 |
1001 |
A |
100000.00 |
1001 |
B |
10000.00 |
Now perform the following simple SELECT statement against this table, qualifying answer rows by rank.
SELECT store, prodID, sales,
RANK() OVER (PARTITION BY store ORDER BY sales DESC)
FROM sales_tbl
QUALIFY RANK() OVER (PARTITION BY store ORDER BY sales DESC) <=3;
The result appears in the following typical output table.
Store |
ProdID |
Sales |
Rank(Sales) |
1001 |
A |
100000.00 |
1 |
1001 |
C |
60000.00 |
2 |
1001 |
D |
35000.00 |
3 |
1002 |
A |
40000.00 |
1 |
1002 |
C |
35000.00 |
2 |
1002 |
D |
25000.00 |
3 |
1003 |
D |
50000.00 |
1 |
1003 |
A |
30000.00 |
2 |
1003 |
C |
20000.00 |
3 |
Note that every row in the table is returned with the computed value for RANK except those that do not meet the QUALIFY clause (sales rank is less than third within the store).