Creating Join Indexes Using Outer Joins | Teradata Vantage - Creating Join Indexes Using Outer Joins - Advanced SQL Engine

Creating Join Indexes Using Outer Joins | Teradata Vantage - Creating Join Indexes Using Outer Joins - Advanced SQL Engine - Teradata Database

Database Design

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

17.00

Published

June 2020

Language

English (United States)

Last Update

2021-01-22

dita:mapPath

qby1588121512748.ditamap

dita:ditavalPath

lze1555437562152.ditaval

dita:id

B035-1094

lifecycle

Product Category

Teradata Vantage™

This example examines the EXPLAIN reports for a different outer join-based join index in some detail.

Table Definitions

     CREATE TABLE customer (
       c_custkey    INTEGER NOT NULL,
       c_name       CHARACTER(26),
       c_address    VARCHAR(41),
       c_nationkey  INTEGER,
       c_phone      CHARACTER(16),
       c_acctbal    DECIMAL(13,2),
       c_mktsegment CHARACTER(21),
       c_comment    VARCHAR(127))
     UNIQUE PRIMARY INDEX(c_custkey);
     CREATE TABLE orders (
       o_orderkey      INTEGER NOT NULL,
       o_custkey       INTEGER,
       o_orderstatus   CHARACTER(1),
       o_totalprice    DECIMAL(13,2) NOT NULL,
       o_orderdate     DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       o_orderpriority CHARACTER(21),
       o_clerk         CHARACTER(16),
       o_shippriority  INTEGER,
       o_commment      VARCHAR(79))
     UNIQUE PRIMARY INDEX (o_orderkey);
     CREATE TABLE lineitem (
       l_orderkey      INTEGER NOT NULL,
       l_partkey       INTEGER NOT NULL,
       l_suppkey       INTEGER,
       l_linenumber    INTEGER,
       l_quantity      INTEGER NOT NULL,
       l_extendedprice DECIMAL(13,2) NOT NULL,
       l_discount      DECIMAL(13,2),
       l_tax           DECIMAL(13,2),
       l_returnflag    CHARACTER(1),
       l_linestatus    CHARACTER(1),
       l_shipdate      DATE FORMAT 'yyyy-mm-dd',
       l_commitdate    DATE FORMAT 'yyyy-mm-dd',
       l_receiptdate   DATE FORMAT 'yyyy-mm-dd',
       l_shipinstruct  VARCHAR(25),
       l_shipmode      VARCHAR(10),
       l_comment       VARCHAR(44))
     PRIMARY INDEX (l_orderkey);

Join Index Definition

The following statement defines a join index on these tables. Subsequent examples demonstrate the effect of this join index on how the Optimizer processes various queries.

     CREATE JOIN INDEX order_join_line
      AS SELECT (l_orderkey, o_orderdate, o_custkey,
                 o_totalprice),
                (l_partkey, l_quantity, l_extendedprice, l_shipdate)
      FROM lineitem LEFT JOIN orders ON l_orderkey = o_orderkey
      ORDER BY o_orderdate
      PRIMARY INDEX (l_orderkey);
      *** Index has been created.
      *** Total elapsed time was 15 seconds.

EXPLAIN for Query With Simple Predicate

The following EXPLAIN report shows how the newly created join index, order_join_line, might be used by the Optimizer.

     EXPLAIN SELECT o_orderdate, o_custkey, l_partkey, l_quantity,                     l_extendedprice
             FROM lineitem, orders
             WHERE l_orderkey = o_orderkey;
Explanation
---------------------------------------------------------------------------
  1) First, we lock LOUISB.order_join_line for read on a
     reserved Row Hash to prevent a global deadlock.
  2) Next, we do an all-AMPs RETRIEVE step from join index table
     LOUISB.order_join_line by way of an all-rows scan with a condition
     of ("NOT (LOUISB.order_join_line.o_orderdate IS NULL)") into Spool 1,
     which is built locally on the AMPs. The input table will not be
     cached in memory, but it is eligible for synchronized scanning.
     The result spool file will not be cached in memory. The size of
     Spool 1 is estimated to be 1,000,000 rows. The estimated time for
     this step is 4 minutes and 27 seconds.
  3) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.

EXPLAIN for Query With More Complicated Predicate

The following EXPLAIN report shows how the join index might be used in a query when an additional search condition is added on the join indexed rows.

     EXPLAIN SELECT o_orderdate, o_custkey, l_partkey,
                    l_quantity, l_extendedprice
             FROM lineitem, orders
             WHERE l_orderkey = o_orderkey
             AND o_orderdate > '2003-11-01';
   Explanation
   --------------------------------------------------------------------------
     1) First, we lock LOUISB.order_join_line for read on a
     reserved Row Hash to prevent a global deadlock.
     2) Next, we do an all-AMPs RETRIEVE step from join index table
        LOUISB.order_join_line with a range constraint of (
        "LOUISB.order_join_line.Field_1026 > 971101") with a residual
        condition of ("(NOT (LOUISB.order_join_line.o_orderdate IS NULL ))
        AND (LOUISB.order_join_line.Field_1026 > 971101)") into Spool 1,
        which is built locally on the AMPs. The input table will not be
        cached in memory, but it is eligible for synchronized scanning.
        The size of Spool 1 is estimated to be 1000 rows. The estimated time
        for this step is 0.32 seconds.
     4) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.

EXPLAIN for Query With Aggregation

The following EXPLAIN shows how the join index might be used in a query when aggregation is performed on the join indexed rows.

     EXPLAIN SELECT l_partkey, AVG(l_quantity),
                    AVG(l_extendedprice)
             FROM lineitem , orders
             WHERE l_orderkey = o_orderkey
             AND   o_orderdate > '2003-11-01'
             GROUP BY l_partkey;
   Explanation
   --------------------------------------------------------------------------
     1) First, we lock LOUISB.order_join_line for read on a
     reserved Row Hash to prevent a global deadlock.
     2) Next, we do a SUM step to aggregate from join index table
        LOUISB.order_join_line with a range constraint of (
        "LOUISB.order_join_line.Field_1026 > 971101") with a residual
        condition of ("(LOUISB.order_join_line.Field_1026 > 971101) AND (NOT
        (LOUISB.order_join_line.o_orderdate IS NULL ))"), and the grouping
        identifier in field 1. Aggregate Intermediate Results are
        computed globally, then placed in Spool 3. The input table will
        not be cached in memory, but it is eligible for synchronized
        scanning.
     3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
        an all-rows scan into Spool 1, which is built locally on the AMPs.
        The size of Spool 1 is estimated to be 10 rows. The estimated time
        for this step is 0.32 seconds.
     4) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.

EXPLAIN for Query With Base Table-Join Index Table Join

The following EXPLAIN shows how the join index might be used in a query when join indexed rows are used to join with another base table.

     EXPLAIN SELECT o_orderdate, c_name, c_phone, l_partkey, l_quantity,
                    l_extendedprice
             FROM lineitem, orders, customer
             WHERE l_orderkey = o_orderkey
             AND   o_custkey = c_custkey;
   Explanation
   --------------------------------------------------------------------------
     1) First, we lock LOUISB.order_join_line for read on a
     reserved Row Hash to prevent a global deadlock.
     2) Next, we lock LOUISB.customer for read.
     3) We do an all-AMPs RETRIEVE step from join index table
        LOUISB.order_join_line by way of an all-rows scan with a condition
        of ("NOT (LOUISB.order_join_line.o_orderdate IS NULL)") into Spool 2,
        which is redistributed by hash code to all AMPs. Then we do a
        SORT to order Spool 2 by row hash. The size of Spool 2 is
        estimated to be 1,000,000 rows. The estimated time for this step
        is 1 minute and 53 seconds.
     4) We do an all-AMPs JOIN step from LOUISB.customer by way of a
        Row Hash match scan with no residual conditions, which is joined to
        Spool 2 (Last Use). LOUISB.customer and Spool 2 are joined using
        a merge join, with a join condition of ("Spool_2.o_custkey =
        LOUISB.customer.c_custkey"). The result goes into Spool 1, which
        is built locally on the AMPs. The size of Spool 1 is estimated to
        be 1,000,000 rows. The estimated time for this step is 32.14
        seconds.
     5) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.

EXPLAIN for Query Against Single Table

The following EXPLAIN report shows how the join index might be used in a query of a single table.

    EXPLAIN SELECT l_orderkey, l_partkey, l_quantity, l_extendedprice
            FROM lineitem
            WHERE l_partkey = 1001;
   Explanation
   --------------------------------------------------------------------------
     1) First, we lock LOUISB.order_join_line for read on a
     reserved Row Hash to prevent a global deadlock.
     2) Next, we do an all-AMPs RETRIEVE step from join index table
        LOUISB.order_join_line by way of an all-rows scan with a condition
        of ("LOUISB.order_join_line.l_partkey = 1001") into Spool 1, which
        is built locally on the AMPs. The input table will not be cached
        in memory, but it is eligible for synchronized scanning. The
        result spool file will not be cached in memory. The size of Spool
        1 is estimated to be 100 rows. The estimated time for this step
        is 59.60 seconds.
     3) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.