15.00 - Related Strategies - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Related Strategies

Other functionally similar strategies for solving this problem can also be used. In general, only prototyping can determine which among the possible choices is best for a particular application environment and hardware configuration.

  • You can create a global temporary table definition and then populate a materialized instance of it with aggregated result sets. This is not so much an alternate strategy as it is an entirely different strategy designed for an entirely different application.
  • Global temporary tables are private to the session that materializes them and their data does not persist beyond the end of that session. Unless you specify the ON COMMIT PRESERVE option when the definition for the temporary table is created, its contents do not persist even across individual database transactions. Unless provisions are made to write their contents to a persistent base table when a session ends, their data is not saved. Without numerous safeguards built into the process, this is not a method that provides any assurances about the integrity of the data it produces because while the contents of any global temporary table are private to the session in which it is created, the definition of the table is global within a database and any number of different sessions and users could materialize a different version of the table, populate it, and write the results to the same base table as any other session.

    Note that the containing database or user for a global temporary table must have a minimum of 512 bytes of available PERM space to contain the table header for the GTT.

  • You can create a volatile table with aggregate expressions defined on some or all its columns. The drawbacks of global temporary tables for this application apply equally to volatile tables.
  • You can create a denormalized base table and populate it with an aggregated result set.
  • Denormalization always reduces the generality of the database for any ad hoc queries or data mining operations you might want to undertake as well as introducing various problematic update anomalies. While a relatively mild degree of denormalization is standard in physically implemented databases, the sort of denormalization called for by this solution is probably beyond what most DBAs would find acceptable, the enthusiasm of dimensional modeling theorists notwithstanding.

    Because there is no mechanism for keeping such a table synchronized with its base table, it can become quickly outdated.

    Nonetheless, for some applications this approach might be the only high-performing solution.

    Example

    This example shows how a simple aggregate join index can be used to create prejoins with aggregation on one or more of its columns while retaining full normalization of the physical database. Strictly speaking, the term normalization applies only to the logical schema for a database, not to its physical schema; however, the term has unfortunately come to be used equally for both logical and physical database schemas. See Chapter 5: “The Normalization Process” for details.

    Table Definitions

    This example set uses the following table definitions.

         CREATE TABLE customer (
           c_custkey    INTEGER NOT NULL,
           c_name       CHARACTER(26) CASESPECIFIC NOT NULL,
           c_address    VARCHAR(41),
           c_nationkey  INTEGER,
           c_phone      CHARACTER(16),
           c_acctbal    DECIMAL(13,2),
           c_mktsegment CHARACTER(21),
           c_comment    VARCHAR(127))
         UNIQUE PRIMARY INDEX (c_custkey);
     
         CREATE TABLE orders (
           o_orderkey      INTEGER NOT NULL,
           o_custkey       INTEGER,
           o_orderstatus   CHARACTER(1) CASESPECIFIC,       
           o_totalprice    DECIMAL(13,2) NOT NULL,
           o_orderdate     DATE FORMAT 'YYYY-MM-DD' NOT NULL,
           o_orderpriority CHARACTER(21),
           o_clerk         CHARACTER(16),
           o_shippriority  INTEGER,
           o_comment       VARCHAR(79))
         UNIQUE PRIMARY INDEX (o_orderkey);

    Example Query Statement

    Consider the following aggregate join query.

         SELECT COUNT(*), SUM(o_totalprice) 
         FROM orders, customer 
         WHERE o_custkey = c_custkey 
         AND   o_orderdate > DATE ‘2004-09-20’ 
         AND   o_orderdate < DATE ‘2004-10-15’ 
         GROUP BY c_nationkey;

    Query Plan: No Aggregate Join Index Defined

    Without an aggregate join index, a typical execution plan for this query might involve the following stages:

    1 Redistribute orders into a spool file.

    2 Sort the spool file on o_custkey.

    3 Merge join the sorted spool file and the customer file.

    4 Aggregate the result of the merge join.

    Aggregate Join Index Definition

    Suppose you define the following aggregate join index, which aggregates o_totalprice over a join of orders and customer.

         CREATE JOIN INDEX ord_cust_idx AS
          SELECT c_nationkey, SUM(o_totalprice(FLOAT)) 
                 AS  price, o_orderdate 
          FROM orders, customer 
          WHERE o_custkey = c_custkey 
          GROUP BY c_nationkey, o_orderdate
          ORDER BY o_orderdate;

    Query Plan for Aggregate Join Index

    The execution plan produced by the Optimizer for this query includes an aggregate step on the aggregate join index, which is much smaller than either one of the join tables. You can confirm this by performing an EXPLAIN on the query.

        EXPLAIN SELECT COUNT(*), SUM(o_totalprice) 
                FROM orders, customer 
                WHERE o_custkey = c_custkey 
                AND   o_orderdate > DATE ‘2005-09-20’ 
                AND   o_orderdate < DATE ‘2005-10-15’ 
                GROUP BY c_nationkey;
     
        *** Help information returned. 18 rows. 
         *** Total elapsed time was 3 seconds.
     
       Explanation
       ---------------------------------------------------------------------
         1) First, we lock a distinct TPCD."pseudo table" for read on a
            RowHash to prevent global deadlock for TPCD.ord_cust_idx.
         2) Next, we lock TPCD.ord_cust_idx for read.
         3) We do a SUM step to aggregate from join index table
            TPCD.ordcustidx by way of an all-rows scan with a condition of (
            "(TPCD.ord_cust_idx.O_ORDERDATE > DATE '2005-09-20') AND
            (TPCD.ord_cust_idx.O_ORDERDATE < DATE '2005-10-15')"), and the
            grouping identifier in field 1.  Aggregate Intermediate Results
            are computed globally, then placed in Spool 2.  The size of Spool
            2 is estimated to be 1 row.
         4) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
            an all-rows scan into Spool 1, which is built locally on the AMPs.
            The size of Spool 1 is estimated with no confidence to be 1 row.
            The estimated time for this step is 0.17 seconds.
         5) Finally, we send out an END TRANSACTION step to all AMPs involved
            in processing the request.
       -> The contents of Spool 1 are sent back to the user as the result of
          statement 1.