16.10 - Join Indexes - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

The following sections demonstrate the performance optimization achieved on table selects, deletes, inserts, and updates resulting from the use of join indexes.

For additional information about join indexes, see Database Design and SQL Data Definition Language Detailed Topics.

A Simple Join Query

The following is an example of a simple join query:

     EXPLAIN
     SELECT o_orderdate, o_custkey, 1_partkey, 1_quantity,
            1_extendedprice
     FROM lineitem, ordertbl
     WHERE 1_orderkey=o_orderkey;
   Explanation
   ----------------------------------------------------------------
   1) First, we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.OrderJoinLine.
   2) Next, we lock df2.OrderJoinLine for read.
   3) We do an all-AMPs RETRIEVE step from join index table
      df2.OrderJoinLine by way of an all-rows scan with a condition
      of (“NOT(df2.OrderJoinLine.o_orderdate IS NULL)”) into Spool 1,
      which is built locally on the AMPs. The input table will not be
      cached in memory, but it is eligible for synchronized scanning.
      The result spool file will not be cached in memory. The size of
      Spool 1 is estimated to be 1,000,000 rows. The estimated time for
      this step is 4 minutes and 27 seconds.
   4) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.

A Search on a Join Index

The following is an example of a search condition on the join index:

     EXPLAIN
     SELECT o_orderdate, o_custkey, 1_partkey, 1_quantity,
            1_extendedprice
     FROM lineitem, ordertbl
     WHERE 1_orderkey=o_orderkey
     AND   o_orderdate>‘1997-11-01’;
   Explanation
   ----------------------------------------------------
   1) First, we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.OrderJoinLine.
   2) Next, we lock df2.OrderJoinLine for read.
   3) We do an all-AMPs RETRIEVE step from join index table
      df2.OrderJoinLine with a range constraint of
      (“df2.OrderJoinLine.Field_1026>971101” with a residual
      condition of (“(df2.OrderJoinLine.Field_1026>971101) AND (NOT
      (df2.OrderJoinLine.o_orderdate IS NULL))”), and the grouping
      identifier in Field 1. Aggregate Intermediate Results are
      computed globally, then placed in Spool 3. The input table will
      not be cached in memory, but it is eligible for synchronized
      scanning.
   4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
      an all-rows scan into Spool 1, which is built locally on the
      AMPs. The size of Spool 1 is estimated to be 10 rows. The 
      estimated time for this step is 0.32 seconds.
   5) Finally, we send out an END TRANSACTION to all AMPs involved
      in processing the request.

Aggregation on a Join Index

The following is an example of an aggregation on a join index.

     EXPLAIN
     SELECT 1_partkey, AVG(1_quantity), AVG(1_extendedprice)
     FROM lineitem, ordertbl
     WHERE 1_ordkey=o_orderkey
     AND   o_orderdate>‘1997-11-01’
     GROUP BY 1_partkey;
   Explanation
   -----------------------------------------------------------------
   1) First, we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.OrderJoinLine.
   2) Next, we lock df2.OrderJoinLine for read.
   3) We do a SUM step to aggregate from join index table
      df2.OrderJoinLine with a range constant of
      (“df2.OrderJoinLine.Field_1026>971101”) with a residual
      condition of(“(df2.OrderJoinLine.Field_1026>971101)AND(NOT
      (df2.OrderJoinLine.o_orderdate IS NULL))”), and the grouping
      identifier in field 1. Aggregate Intermediate Results are
      computed globally, then placed in Spool 3. The input table will
      not be cached in memory, but it is eligible for synchronized
      scanning.
   4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
      an all-rows scan into Spool 1, which is built locally on the 
      AMPs. The size of Spool 1 is estimated to be 10 rows. The 
      estimated time for this step is 0.32 seconds.
   5) Finally, we sent out an END TRANSACTION step to all AMPs involved
      in processing the request.

Join Index Used to Join With Another Base Table

The following is an example of a join index used to join with another base table:

     EXPLAIN
     SELECT o_orderdate, c_name, c_phone, 1_partkey,1_quantity,
            1_extendedprice
     FROM lineitem, ordertbl, customer
     WHERE 1_orderkey=o_orderkey
     AND   o_custkey=c_custkey;
   Explanation
   -----------------------------------------------------------------
   1) First, we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.OrderJoinLine.
   2) Next, we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.customer.
   3) We lock df2.OrderJoinLine for read, and we lock df2.customer 
      for read.
   4) We do an all-AMPs RETRIEVE step from join index table
      df2.OrderJoinLine by way of an all-rows scan with a condition
      of (“NOT(df2.OrderJoinLine.o_orderdate IS NULL)”) into Spool 2,
      which is redistributed by has code to all AMPs. Then we do a
      SORT to order Spool 2 by row hash. The size of Spool 2 is
      estimated to be 1,000,000 rows. The estimated time for this step
      is 1 minute and 53 seconds.
   5) We do an all-AMPs JOIN step from df2.customer by way of a
      RowHash match scan with no residual conditions, which is joined 
      to Spool 2 (Last Use). df2.customer and Spool 2 are joined using
      a merge join, with a join condition of (“Spool_2.o_key=
      df2.customer.c_custkey”). The result goes into Spool 1, which
      is built locally on the AMPs. The size of Spool 1 is estimated to
      be 1,000,000 rows. The estimated time for this step is 32.14
      seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.

Join Index Used to Resolve Single-Table Query

The following is an example of a join index used to resolve single table query:

     EXPLAIN
     SELECT 1_orderkey, 1_partkey, 1_quantity, 1_extendedprice
     FROM lineitem
     WHERE 1_partkey = 1001;
   Explanation
   -----------------------------------------------------------------
   1) First we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.OrderJoinLine.
   2) Next, we lock df2.OrderJoinLine for read.
   3) We do an all-AMPs RETRIEVE step from join index table
      df2.OrderJoinLine by way of an all-rows scan with a condition
      of (“df2.OrderJoinLine.1_partkey=1001”) into Spool 1, which
      is built locally on the AMPs. The input table will not be cached
      in memory, but it is eligible for synchronized scanning. The
      result spool file will not be cached in memory. The size of Spool
      1 is estimated to be 100 rows. The estimated time for this step
      is 59.60 seconds.
   4) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.

Creating and Using a Secondary Index on a Join Index

The following is an example creating and using secondary index on a join index:

     CREATE INDEX shipidx(1_shipdate) ON OrderJoinLine;
   
     ***Index has been created.
     ***Total elapsed time was 5 seconds.
     EXPLAIN
     SELECT o_orderdate, o_custkey, 1_partkey, 1_quantity,  
            1_extendedprice
     FROM lineitem, ordertbl
     WHERE 1_ordkey=o_orderkey
     AND   l_shipdate=’1997-09-18’;
Explanation
-----------------------------------------------------------------
1) First, we lock a distinct df2.”pseudo table” for read on a
   RowHash to prevent global deadlock for df2.OrderJoinLine.
2) Next, we lock df2.OrderJoinLine for read.
3) We do an all-AMPs RETRIEVE step from join index table
   df2.OrderJoinLine by way of index # 12
   “df2.OrderJoinLine.1_shipdate=970918” with a residual
   condition of (“(NOT(df2.OrderJoinLine.1_shipdate=970918)”) into
   Spool 1, which is built locally on the AMPs. The input table will
   not be cached in memory, but it is eligible for synchronized 
   scanning. The result spool file will not be cached in memory. 
   The size of Spool 1 is estimated to be 500 rows. The estimated 
   time for this step is 0.37seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in
   processing the request.

Join Index Left Outer Joined on 6 Tables

The following join index definition left outer joins table t1 with, in succession, tables t2, t3, t4, t5, and t6 on a series of equality conditions made on foreign key-primary key relationships among the underlying base tables:

     CREATE JOIN INDEX jiout AS
     SELECT a1,b1,c1,c2,d1,d3,e1,e4,f1,g1,h1,i1,j1,j5,k1,k6, x1
     FROM t1
     LEFT OUTER JOIN t2 ON  a1=a2
                        AND b1=b2
                        AND c1=c2
     LEFT OUTER JOIN t3 ON  d1=d3
     LEFT OUTER JOIN t4 ON  e1=e4
                        AND f1=f4
     LEFT OUTER JOIN t5 ON  g1=g5
                        AND h1=h5
                        AND i1=i5
                        AND j1=j5
     LEFT OUTER JOIN t6 ON  k1=k6;

Even though the following query references fewer tables than are defined in the join index, you would expect the Optimizer to include join index ji_out in its access plan because all the extra outer joins are defined on unique columns and the extra tables are the inner tables in the outer joins. A covering join index whose definition includes one or more tables that is not specified in the query it covers is referred to as a broad join index. A wide range of queries can make use of a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables.

The bold EXPLAIN report text indicates that the Optimizer does select ji_outfor the query plan. This is an example of a broad join index being used to cover a query.

     EXPLAIN SELECT a1, b1, c1, SUM(x1)
     FROM t1,t2
     WHERE a1=a2
     AND   b1=b2
     AND   c1=c2
     GROUP BY 1, 2, 3;
      *** Help information returned. 18 rows.
      *** Total elapsed time was 1 second.
    Explanation
    -------------------------------------------------------------------
      1) First, we lock a distinct HONG_JI."pseudo table" for read on a
         RowHash to prevent global deadlock for  HONG_JI.jiout.
      2) Next, we lock  HONG_JI.jiout  for read.
      3) We do an all-AMPs SUM step to aggregate from  HONG_JI.jiout  
         by way of an all-rows scan with no residual conditions, and 
         the grouping identifier in field 1. Aggregate Intermediate 
         Results are computed locally, then placed in Spool 3. The 
         size of Spool 3 is estimated with high confidence to be 2
         rows. The estimated time for this step is 0.03 seconds.
      4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by 
         way of an all-rows scan into Spool 1 (group_amps), which is 
         built locally on the AMPs. The size of Spool 1 is estimated 
         with high confidence to be 2 rows. The estimated time for 
         this step is 0.04 seconds.
      5) Finally, we send out an END TRANSACTION step to all AMPs
         involved in processing the request.
      -> The contents of Spool 1 are sent back to the user as the 
         result of statement 1.

Many More Tables Referenced by a Join Index Definition Than Referenced by Query

The following join index definition specifies all inner joins on tables t1, t2, t3, t4,t5and t6 and specifies equality conditions on all the foreign key-primary key relationships among those tables:

     CREATE JOIN INDEX ji_in AS
     SELECT a1,b1,c1,c2,d1,d3,e1,e4,f1,g1,g5,h1,i1,j1,k1,k6, x1
     FROM t1,t2,t3,t4,t5,t6
     WHERE a1=a2
     AND   b1=b2
     AND   c1=c2
     AND   d1=d3
     AND   e1=e4
     AND   f1=f4
     AND   g1=g5
     AND   h1=h5
     AND   i1=i5
     AND   j1=j5
     AND   k1=k6;

Even though 6 tables are referenced in the join index definition, and all its join conditions are inner joins, you would expect the Optimizer to include join index ji_inin its query plan for the following query, which only references 2 of the 6 tables, because all the conditions in the join index definition are based on foreign key-primary key relationships among the underlying base tables. This is also an example of a broad join index being used to cover a query.

The bold EXPLAIN report text indicates that the Optimizer does select ji_infor the query plan.

     EXPLAIN SELECT a1,b1,c1,SUM(x1)
     FROM t1,t2
     WHERE a1=a2
     AND   b1=b2
     AND   c1=c2
     GROUP BY 1, 2, 3;
      *** Help information returned. 18 rows.
      *** Total elapsed time was 1 second.
    Explanation
    -------------------------------------------------------------------
      1) First, we lock a distinct HONG_JI."pseudo table" for read on a
         RowHash to prevent global deadlock for  HONG_JI.ji_in.
      2) Next, we lock  HONG_JI.ji_in  for read.
      3) We do an all-AMPs SUM step to aggregate from  HONG_JI.ji_in  
         by way of an all-rows scan with no residual conditions, 
         and the grouping identifier in field 1. Aggregate Intermediate
         Results are computed locally, then placed in Spool 3. The size
         of Spool 3 is estimated with high confidence to be 2 rows.
         The estimated time for this step is 0.03 seconds.
      4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by 
         way of an all-rows scan into Spool 1 (group_amps), which
         is built locally on the AMPs. The size of Spool 1 is estimated
         with high confidence to be 2 rows. The estimated time for
         this step is 0.04 seconds.
      5) Finally, we send out an END TRANSACTION step to all AMPs
         involved in processing the request.
      -> The contents of Spool 1 are sent back to the user as the 
         result of statement 1.

Using a Join Index Defined With a Multiway Join Result

The following is an example defining and using a join index defined with a multiway join result:

     CREATE JOIN INDEX CustOrderJoinLine
     AS SELECT (1_orderkey,o_orderdate,c_nationkey,o_totalprice),
               (1_partkey,1_quantity,1_extendedprice,1_shipdate)
     FROM (lineitem
     LEFT OUTER JOIN ordertbl ON 1_orderkey=o_orderkey)
     INNER JOIN customer ON o_custkey=c_custkey
     PRIMARY INDEX (1_orderkey);
   
     *** Index has been created.
     *** Total elapsed time was 20 seconds.
     EXPLAIN
     SELECT (1_orderkey,o_orderdate,c_nationkey,o_totalprice),
            (1_partkey,1_quantity,1_extendedprice,1_shipdate)
     FROM lineitem,ordertbl,customer
     WHERE 1_orderkey=o_custkey
     AND o_custkey=c_custkey
     AND c_nationkey=10;
   
     *** Help information returned. 16 rows.
     *** Total elapsed time was 1 second.
   Explanation
   ----------------------------------------------------------------
   1) First, we lock a distinct df2.”pseudo table” for read on a
      RowHash to prevent global deadlock for df2.CustOrderJoinLine.
   2) Next, we lock df2.CustOrderJoinLine for read.
   3) We do an all-AMPs RETRIEVE step from join index table
      df2.CustOrderJoinLine by way of an all-rows scan with a
      condition of (“df2.CustOrderJoinLine.c_nationkey=10”) into
      Spool 1, which is built locally on the AMPs. The input table will
      not be cached in memory, but it is eligible for synchronized
      scanning. The result spool file will not be cached in memory.
      The size of Spool 1 is estimated to be 200 rows. The estimated 
      time for this step is 3 minutes and 57 seconds.
   4) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.