15.00 - Working with Analytical Functions - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)
Last Update
2018-09-24

Working with Analytical Functions

Teradata Database support for analytical functions allows you to perform computations at the SQL level rather than through a higher-level calculation engine.

Teradata Database supports ordered analytical syntax.

For complete information, see Chapter 22: “Ordered Analytical / Window Aggregate Functions.”

Analytical Functions and Performance

Analytical functions, which are extremely helpful for decision support, speed up order-based analytical queries.

Using analytical functions, you can target the data analysis within the data warehouse itself. This provides several advantages, including:

  • Improved processing performance.
  • Less data movement across the network.
  • Faster analysis than that performed by external tools and sort routines, but full access to ordered analytical functions by external tools such as Teradata Warehouse Miner.
  • For example, Teradata Warehouse Miner FREQ function uses CSUM, RANK, and QUALIFY in determining frequencies.

  • Support for ANSI version of existing aggregate functions, enabling you to use RANK, SUM, AVG, and COUNT on multiple partitions within a statement select list.
  • Simpler SQL programming, particularly because you can use:
  • Nested aggregates with the HAVING clause
  • Window functions
  • QUALIFY, RANK, and ORDER BY clauses
  • For example, Teradata Database permits this query structure:

    SELECT state, city, SUM(sale),
    RANK() OVER
    (PARTITION BY state ORDER BY SUM(sale))
    FROM Tbl1, Tbl2
    WHERE Tbl1.cityid = Tbl2.cityid
    GROUP BY state, city
    HAVING MAX(sale) > 10
    QUALIFY RANK() OVER
    (PARTITION BY state ORDER BY MIN(sale)) > 10;

    Example : Using Teradata RANK

    RANK (sort_expression_list) returns the rank (1..n) of all rows in a group by values of sort_expression_list.

    For example, assume you enter this query:

    SELECT ProdId, Month, Sales, RANK(Sales)
    FROM SalesHistory
    GROUP BY ProdId
    QUALIFY RANK(Sales) <=3;

    The rows of the response table are ranked as follows:

    ProdId      Month      Sales      RANK
    1234        0607       500        1
    1234        0609       300        2
    1234        0608       250        3
    ...
    5678        0609       450        1
    5678        0608       150        2
    5678        0607       100        3

    This opens up possibilities for applying RANK to non-analytical processing that may be cumbersome otherwise.

    For example, RANK can:

  • Process data sequentially.
  • Generate unique sequential numbers on columns that uniquely define the row.
  • Process consecutive rows in a predefined order, when you define a self-join on a ranked table.
  • For example:

    1 Create a copy of the table with a new column containing the rank, based on some ordering criteria; for example: term_eff_date or load_event_id.

    2 Define a self-join on the table similar to the following:

  • WHERE A.rankvalue = B.rankvalue - 1
  • AND A.policy_id = B.policy_id
  • 3 Use the self-joined table to process all table rows in a single pass (proceeding from row number n to row number n+1). This offers significant performance improvement over making multiple passes to process just two rows at a time.

    Data in Partition By Column and System Resources

    The columns specified in the PARTITION BY clause of a window specification determine the partitions over which the ordered analytical function executes.

    For example, the following query specifies the StoreID column in the PARTITION BY clause to compute the group sales sum for each store:

        SELECT StoreID, SMonth, ProdID, Sales,
        SUM(Sales) OVER (PARTITION BY StoreID)
        FROM sales_tbl;

    At execution time, Teradata Database moves all of the rows that fall into a partition to the same AMP. If a very large number of rows fall into the same partition, the AMP can run out of spool space.

    For example, if the sales_tbl table in the preceding query has millions or billions of rows, and the StoreID column contains only a few distinct values, an enormous number of rows are going to fall into the same partition, potentially resulting in out-of-spool errors.

    To avoid this problem, examine the data in the columns of the PARTITION BY clause. If necessary, rewrite the query to include additional columns in the PARTITION BY clause to create smaller partitions that Teradata Database can distribute more evenly among the AMPs.

    For example, the preceding query can be rewritten to compute the group sales sum for each store for each month:

        SELECT StoreID, SMonth, ProdID, Sales,
        SUM(Sales) OVER (PARTITION BY StoreID, SMonth)
        FROM sales_tbl;