Related Strategies
Other, functionally similar strategies can also be used to solve this problem. In general, only prototyping can determine which of the possible choices is best for a particular application environment and hardware configuration.
Global temporary tables are private to the session that materializes them, and their contents do not persist beyond the end of that session. Unless you specify the ON COMMIT PRESERVE ROWS option when you create the temporary table definition, their contents do not even persist across individual transactions within the session. Unless provisions are made to write their contents to a persistent base table before a session ends, their data is lost. Without numerous safeguards built into the process, this method provides no assurance about the integrity of the data it produces: while the contents of a global temporary table are private to the session that materializes it, its definition is global within the database, so any number of sessions and users can materialize their own instance of the table, populate it, and write the results to the same base table as any other session.
Note that the containing database or user for a global temporary table must have a minimum of 512 bytes of available PERM space to contain the table header for the GTT.
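As a sketch of this approach, a session-local staging table might be defined as follows; the table and column names are hypothetical.

```sql
-- Hypothetical session-local staging table.
-- ON COMMIT PRESERVE ROWS keeps the rows across transactions within
-- the session; they are still discarded when the session ends.
CREATE GLOBAL TEMPORARY TABLE nation_totals (
  nationkey   INTEGER NOT NULL,
  total_price DECIMAL(15,2))
ON COMMIT PRESERVE ROWS
PRIMARY INDEX (nationkey);
```

Each session that references nation_totals materializes its own private instance. To retain its results, a session must explicitly copy them, for example with an INSERT ... SELECT into a permanent base table, before it ends.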
Denormalization reduces the generality of the database for any ad hoc queries or data mining operations you might want to undertake, and it introduces various update anomalies. While a relatively mild degree of denormalization is standard in physically implemented databases, the degree of denormalization called for by this solution is probably beyond what most DBAs would find acceptable, the enthusiasm of dimensional modeling theorists notwithstanding.
Because there is no mechanism for keeping such a table synchronized with its base table, it can become quickly outdated.
Nonetheless, for some applications this approach might be the only high-performing solution.
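As a sketch of such a denormalized structure, a prejoined, preaggregated summary table over the orders and customer tables defined in the following example might be created like this; the table name is hypothetical.

```sql
-- Hypothetical denormalized summary table that prejoins and
-- preaggregates orders and customer. Because nothing keeps it
-- synchronized with the base tables, it must be rebuilt or
-- incrementally refreshed whenever they change.
CREATE TABLE nation_order_summary AS
 (SELECT c_nationkey, o_orderdate,
         SUM(o_totalprice) AS total_price
  FROM orders, customer
  WHERE o_custkey = c_custkey
  GROUP BY c_nationkey, o_orderdate)
WITH DATA
PRIMARY INDEX (c_nationkey, o_orderdate);
```

Queries can then read nation_order_summary directly instead of joining the base tables, which is the performance benefit this approach trades against the synchronization burden.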
Example
This example shows how a simple aggregate join index can be used to create prejoins with aggregation on one or more of its columns while retaining full normalization of the physical database. Strictly speaking, the term normalization applies only to the logical schema for a database, not to its physical schema; however, the term has unfortunately come to be used equally for both logical and physical database schemas. See Chapter 5: “The Normalization Process” for details.
Table Definitions
This example set uses the following table definitions.
CREATE TABLE customer (
c_custkey INTEGER NOT NULL,
c_name CHARACTER(26) CASESPECIFIC NOT NULL,
c_address VARCHAR(41),
c_nationkey INTEGER,
c_phone CHARACTER(16),
c_acctbal DECIMAL(13,2),
c_mktsegment CHARACTER(21),
c_comment VARCHAR(127))
UNIQUE PRIMARY INDEX (c_custkey);
CREATE TABLE orders (
o_orderkey INTEGER NOT NULL,
o_custkey INTEGER,
o_orderstatus CHARACTER(1) CASESPECIFIC,
o_totalprice DECIMAL(13,2) NOT NULL,
o_orderdate DATE FORMAT 'YYYY-MM-DD' NOT NULL,
o_orderpriority CHARACTER(21),
o_clerk CHARACTER(16),
o_shippriority INTEGER,
o_comment VARCHAR(79))
UNIQUE PRIMARY INDEX (o_orderkey);
Example Query Statement
Consider the following aggregate join query.
SELECT COUNT(*), SUM(o_totalprice)
FROM orders, customer
WHERE o_custkey = c_custkey
AND o_orderdate > DATE '2005-09-20'
AND o_orderdate < DATE '2005-10-15'
GROUP BY c_nationkey;
Query Plan: No Aggregate Join Index Defined
Without an aggregate join index, a typical execution plan for this query might involve the following stages:
1 Redistribute orders into spool.
2 Sort the spool on o_custkey.
3 Merge join the sorted spool and the customer file.
4 Aggregate the result of the merge join.
Aggregate Join Index Definition
Suppose you define the following aggregate join index, which aggregates o_totalprice over a join of orders and customer.
CREATE JOIN INDEX ord_cust_idx AS
SELECT c_nationkey, SUM(o_totalprice(FLOAT))
AS price, o_orderdate
FROM orders, customer
WHERE o_custkey = c_custkey
GROUP BY c_nationkey, o_orderdate
ORDER BY o_orderdate;
Query Plan for Aggregate Join Index
The execution plan the Optimizer produces for this query includes an aggregate step on the aggregate join index, which is much smaller than either of the joined tables. You can confirm this by performing an EXPLAIN on the query.
EXPLAIN SELECT COUNT(*), SUM(o_totalprice)
FROM orders, customer
WHERE o_custkey = c_custkey
AND o_orderdate > DATE '2005-09-20'
AND o_orderdate < DATE '2005-10-15'
GROUP BY c_nationkey;
*** Help information returned. 18 rows.
*** Total elapsed time was 3 seconds.
Explanation
---------------------------------------------------------------------
1) First, we lock a distinct TPCD."pseudo table" for read on a
RowHash to prevent global deadlock for TPCD.ord_cust_idx.
2) Next, we lock TPCD.ord_cust_idx for read.
3) We do a SUM step to aggregate from join index table
TPCD.ord_cust_idx by way of an all-rows scan with a condition of (
"(TPCD.ord_cust_idx.O_ORDERDATE > DATE '2005-09-20') AND
(TPCD.ord_cust_idx.O_ORDERDATE < DATE '2005-10-15')"), and the
grouping identifier in field 1. Aggregate Intermediate Results
are computed globally, then placed in Spool 2. The size of Spool
2 is estimated to be 1 row.
4) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 1, which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 1 row.
The estimated time for this step is 0.17 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.