15.00 - Characteristics of Ordered Analytical Functions - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)

Characteristics of Ordered Analytical Functions

The Function Value

The function value for a column in a row considers that row (and a subset of all other rows in the group) and produces a new value.

The generic function describing this operation is as follows:

   new_column_value = FUNCTION(column_value,rows_defined_by_window)

Use of QUALIFY Clause

Rows can be eliminated by applying conditions on the new column value. The QUALIFY clause is analogous to the HAVING clause of aggregate functions. The QUALIFY clause eliminates rows based on the function value, returning a new value for each of the participating rows. For example:

   SELECT StoreID, SUM(profit) OVER (PARTITION BY StoreID)
   FROM facts
   QUALIFY SUM(profit) OVER (PARTITION BY StoreID) > 2;

An SQL query that contains both ordered analytical functions and aggregate functions can have both a QUALIFY clause and a HAVING clause, as in the following example:

   SELECT StoreID, SUM(sale), 
   SUM(profit) OVER (PARTITION BY StoreID)
   FROM facts
   GROUP BY StoreID, sale, profit
   HAVING SUM(sale) > 15
   QUALIFY SUM(profit) OVER (PARTITION BY StoreID) > 2;

For details on the QUALIFY clause, see SQL Data Manipulation Language.

DISTINCT Clause Restriction

The DISTINCT clause is not permitted in window aggregate functions.

Permitted Query Objects

Ordered analytical functions are permitted in the following database query objects:

  • Views
  • Macros
  • Derived tables
  • INSERT ... SELECT
  • Where Ordered Analytical Functions are Not Permitted

    Ordered analytical functions are not permitted in:

  • Subqueries
  • WHERE clauses
  • SELECT AND CONSUME statements
  • Use of Standard SQL Features

    You can use standard SQL features within the same query to make your statements more sophisticated.

    For example, you can use ordered analytical functions in the following ways:

     

    Use an analytical function in this operation …

    To …

    INSERT … SELECT

    populate a new column.

    derived table

    create a new table to participate in a complex query.

    Ordered analytical functions having different sort expressions are evaluated one after another, reusing the same spool file. Different functions having the same sort expression are evaluated simultaneously.

    Unsupported Data Types

    Ordered analytical functions do not operate on the following data types:

  • CLOB or BLOB data types
  • UDT data types
  • Note that CLOB, BLOB, or UDT data types are usable inside an expression if the result is a supported data type. For example:

       SELECT
       RANK() OVER 
       (PARTITION BY(CASE WHEN b IS NULL THEN 1 ELSE 0 END) ORDER BY id) 
       FROM btab;

    However, the following example results in an error because the function cannot sort by BLOB:

       SELECT
       RANK() OVER 
       (PARTITION BY b ORDER BY id) 
       FROM btab;

    Ordered Analytical Functions and Period Data Types

    Expressions that evaluate to Period data types can be specified for any expression within the following ordered analytical functions: QUANTILE, RANK (Teradata-specific function), and RANK(ANSI SQL Window function).

    Ordered Analytical Functions and Recursive Queries

    Ordered analytical functions cannot appear in a recursive statement of a recursive query. However, a non-recursive seed statement in a recursive query can specify an ordered analytical function.

    Ordered Analytical Functions and Hash or Join Indexes

    When a single table query specifies an ordered analytical function on columns that are also defined for a single table compressed hash or join index, the Optimizer does not select the hash or join index to process the query.

    Ordered Analytical Functions and Row Level Security Tables

    When a request that includes an ordered analytical function, such as MAVG, CSUM, or RANK, references a table protected by row level security, the operation is based on only the rows that are accessible to the requesting user. In order to apply all rows of the table to the function, the user must have one of the following:

  • The required security credentials to access all rows of the table.
  • The required OVERRIDE privileges on the security constraints in the table.
  • For more information about row level security, see Security Administration.

    Computation Sort Order and Result Order

    The sort order that you specify in the window specification defines the sort order of the rows over which the function is applied; it does not define the ordering of the results.

    For example, to compute the average sales for the months following the current month, order the rows by month:

       SELECT StoreID, SMonth, ProdID, Sales,
       AVG(Sales) OVER (PARTITION BY StoreID ORDER BY SMonth
                        ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
       FROM sales_tbl;
       
       StoreID  SMonth  ProdID      Sales  Remaining Avg(Sales)
       -------  ------  ------  ---------  --------------------
          1001       6  C        30000.00                     ?
          1001       5  C        30000.00              30000.00
          1001       4  C        25000.00              30000.00
          1001       3  C        40000.00              28333.33
          1001       2  C        25000.00              31250.00
          1001       1  C        35000.00              30000.00
       

    The default sort order is ASC for the computation. However, the results are returned in the reverse order.

    To order the results, use an ORDER BY phrase in the SELECT statement. For example:

       SELECT StoreID, SMonth, ProdID, Sales,
       AVG(Sales) OVER (PARTITION BY StoreID ORDER BY SMonth
                        ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
       FROM sales_tbl
       ORDER BY SMonth;
       
       StoreID  SMonth  ProdID      Sales  Remaining Avg(Sales)
       -------  ------  ------  ---------  --------------------
          1001       1  C        35000.00              30000.00
          1001       2  C        25000.00              31250.00
          1001       3  C        40000.00              28333.33
          1001       4  C        25000.00              30000.00
          1001       5  C        30000.00              30000.00
          1001       6  C        30000.00                     ?
       

    Data in Partitioning Column of Window Specification and Resource Impact

    The columns specified in the PARTITION BY clause of a window specification determine the partitions over which the ordered analytical function executes. For example, the following query specifies the StoreID column in the PARTITION BY clause to compute the group sales sum for each store:

       SELECT StoreID, SMonth, ProdID, Sales, 
       SUM(Sales) OVER (PARTITION BY StoreID)
       FROM sales_tbl;
       

    At execution time, Teradata Database moves all of the rows that fall into a partition to the same AMP. If a very large number of rows fall into the same partition, the AMP can run out of spool space. For example, if the sales_tbl table in the preceding query has millions or billions of rows, and the StoreID column contains only a few distinct values, an enormous number of rows are going to fall into the same partition, potentially resulting in out-of-spool errors.

    To avoid this problem, examine the data in the columns of the PARTITION BY clause. If necessary, rewrite the query to include additional columns in the PARTITION BY clause to create smaller partitions that Teradata Database can distribute more evenly among the AMPs. For example, the preceding query can be rewritten to compute the group sales sum for each store for each month:

       SELECT StoreID, SMonth, ProdID, Sales, 
       SUM(Sales) OVER (PARTITION BY StoreID, SMonth)
       FROM sales_tbl;

    Using Ordered Analytical Functions

    Example : Using RANK and AVG

    Consider the result of the following SELECT statement using the following ordered analytical functions, RANK and AVG.

       SELECT item, smonth, sales, 
       RANK() OVER (PARTITION BY item ORDER BY sales DESC), 
       AVG(sales) OVER (PARTITION BY item 
                        ORDER BY smonth 
                        ROWS 3 PRECEDING)
       FROM sales_tbl
       ORDER BY item, smonth;

    The results table might look like the following:

     

    Item

    SMonth

    Sales

    Rank(Sales)

    Moving Avg(Sales)

    A

    1996-01

    110

    13

    110

    A

    1996-02

    130

    10

    120

    A

    1996-03

    170

    6

    137

    A

    1996-04

    210

    3

    155

    A

    1996-05

    270

    1

    195

    A

    1996-06

    250

    2

    225

    A

    1996-07

    190

    4

    230

    A

    1996-08

    180

    5

    222

    A

    1996-09

    160

    7

    195

    A

    1996-10

    140

    9

    168

    A

    1996-11

    150

    8

    158

    A

    1996-12

    120

    11

    142

    A

    1997-01

    120

    11

    132

    B

    1996-02

    30

    5

    30

    ...

    ...

    ...

    ...

    ...

    Example : Using QUALIFY With RANK

    Adding a QUALIFY clause to a query eliminates rows from an unqualified table.

    For example, if you wanted to see whether the high sales months were unusual, you could add a QUALIFY clause to the previous query.

       SELECT item, smonth, sales, 
       RANK() OVER (PARTITION BY item ORDER BY sales DESC), 
       AVG(sales) OVER (PARTITION BY item ORDER BY smonth ROWS 3 PRECEDING)
       FROM sales_tbl
       ORDER BY item, smonth
       QUALIFY RANK() OVER(PARTITION BY item ORDER BY sales DESC) <=5;

    This additional qualifier produces a results table that might look like the following:

     

    Item

    SMonth

    Sales

    Rank(Sales)

    Moving Avg(Sales)

    A

    1996-04

    210

    3

    155

    A

    1996-05

    270

    1

    195

    A

    1996-06

    250

    2

    225

    A

    1996-07

    190

    4

    230

    A

    1996-08

    180

    5

    222

    B

    1996-02

    30

    1

    30

    ...

    ...

    ...

    ...

    ...

    The result indicates that sales had probably been fairly low prior to the start of the current sales season.

    Example : Using QUALIFY With RANK

    Consider the following sales table named sales_tbl.

     

    Store

    ProdID

    Sales

    1003

    C

    20000.00

    1003

    D

    50000.00

    1003

    A

    30000.00

    1002

    C

    35000.00

    1002

    D

    25000.00

    1002

    A

    40000.00

    1001

    C

    60000.00

    1001

    D

    35000.00

    1001

    A

    100000.00

    1001

    B

    10000.00

    Now perform the following simple SELECT statement against this table, qualifying answer rows by rank.

    SELECT store, prodID, sales, 
    RANK() OVER (PARTITION BY store ORDER BY sales DESC)
    FROM sales_tbl
    QUALIFY RANK() OVER (PARTITION BY store ORDER BY sales DESC) <=3;

    The result appears in the following typical output table.

     

    Store

    ProdID

    Sales

    Rank(Sales)

    1001

    A

    100000.00

    1

    1001

    C

    60000.00

    2

    1001

    D

    35000.00

    3

    1002

    A

    40000.00

    1

    1002

    C

    35000.00

    2

    1002

    D

    25000.00

    3

    1003

    D

    50000.00

    1

    1003

    A

    30000.00

    2

    1003

    C

    20000.00

    3

    Note that every row in the table is returned with the computed value for RANK except those that do not meet the QUALIFY clause (sales rank is less than third within the store).