Join Index Examples | Join Index Optimizations | Teradata Vantage - Join Indexes - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
ykx1561500561173.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantage™

The following sections demonstrate the performance optimization achieved on table selects, deletes, inserts, and updates resulting from the use of join indexes.

For additional information about join indexes, see Teradata Vantage™ - Database Design, B035-1094 and Teradata Vantage™ - SQL Data Definition Language Detailed Topics , B035-1184 .

A Simple Join Query

The following is an example of a simple join query:

     EXPLAIN
     SELECT o_orderdate, o_custkey, l_partkey, l_quantity,
            1_extendedprice
     FROM lineitem, ordertbl
     WHERE l_orderkey=o_orderkey;

Part of the EXPLAIN output is shown below.

  4) We do an all-AMPs JOIN step in TD_MAP1 from DB1.ordertbl by way
     of a RowHash match scan with no residual conditions, which is
     joined to DB1.lineitem by way of a RowHash match scan with no
     residual conditions.  DB1.ordertbl and DB1.lineitem are joined
     using a sliding-window merge join, with a join condition of (
     "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey").  The
     result goes into Spool 1 (group_amps), which is built locally on
     the AMPs.  The size of Spool 1 is estimated with low confidence to
     be 11 rows (869 bytes).  The estimated time for this step is 0.21
     seconds.

A Search on a Join Index

The following is an example of a search condition on the join index:

     EXPLAIN
     SELECT o_orderdate, o_custkey, l_partkey, l_quantity,
            l_extendedprice
     FROM lineitem, ordertbl
     WHERE l_orderkey=o_orderkey
     AND   o_orderdate>'1997-11-01';

Part of the EXPLAIN output is shown below.

  4) We do an all-AMPs JOIN step in TD_MAP1 from 500 partitions of
     DB1.ordertbl by way of a RowHash match scan with a condition of (
     "DB1.ordertbl.o_orderdate > DATE '1997-11-01'"), which is joined
     to DB1.lineitem by way of a RowHash match scan with no residual
     conditions.  DB1.ordertbl and DB1.lineitem are joined using a
     sliding-window merge join, with a join condition of (
     "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey").  The
     result goes into Spool 1 (group_amps), which is built locally on
     the AMPs.  The size of Spool 1 is estimated with low confidence to
     be 2 rows (158 bytes).  The estimated time for this step is 0.21
     seconds.

Aggregation on a Join Index

The following is an example of an aggregation on a join index.

     EXPLAIN
     SELECT l_partkey, AVG(l_quantity), AVG(l_extendedprice)
     FROM lineitem, ordertbl
     WHERE l_orderkey=o_orderkey
     AND   o_orderdate >'1997-11-01'
     GROUP BY l_partkey;

Part of the EXPLAIN output is shown below.

  4) We do an all-AMPs JOIN step in TD_MAP1 from DB1.ordertbl by way
     of a RowHash match scan with no residual conditions, which is
     joined to DB1.lineitem by way of a RowHash match scan with no
     residual conditions.  DB1.ordertbl and DB1.lineitem are joined
     using a sliding-window merge join, with a join condition of (
     "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey").  The
     result goes into Spool 2 (all_amps), which is built locally on the
     AMPs.  The size of Spool 2 is estimated with low confidence to be
     11 rows (341 bytes).  The estimated time for this step is 0.21
     seconds.
  5) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 2
     (Last Use) by way of an all-rows scan, grouping by field1 (
     DB1.lineitem.l_partkey).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 4 in TD_Map1.  The size of
     Spool 4 is estimated with no confidence to be 9 rows (369 bytes).
     The estimated time for this step is 0.23 seconds.
  6) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 4 (Last Use)
     by way of an all-rows scan into Spool 1 (group_amps), which is
     built locally on the AMPs.  The size of Spool 1 is estimated with
     no confidence to be 9 rows (522 bytes).  The estimated time for
     this step is 0.16 seconds.

Join Index Used to Join With Another Base Table

The following is an example of a join index used to join with another base table:

     EXPLAIN
     SELECT o_orderdate, c_name, c_phone, l_partkey,l_quantity,
            l_extendedprice
     FROM lineitem, ordertbl, customer
     WHERE l_orderkey=o_orderkey
     AND   o_custkey=c_custkey;

Part of the EXPLAIN output is shown below.

  5) We do an all-AMPs JOIN step in TD_MAP1 from DB1.ordertbl by way
     of a RowHash match scan with no residual conditions, which is
     joined to DB1.lineitem by way of a RowHash match scan with no
     residual conditions.  DB1.ordertbl and DB1.lineitem are joined
     using a sliding-window merge join, with a join condition of (
     "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey").  The
     result goes into Spool 2 (all_amps), which is redistributed by the
     hash code of (DB1.ordertbl.o_custkey) to all AMPs in TD_Map1.
     Then we do a SORT to order Spool 2 by row hash.  The size of Spool
     2 is estimated with low confidence to be 11 rows (407 bytes).  The
     estimated time for this step is 0.11 seconds.
  6) We do an all-AMPs JOIN step in TD_MAP1 from DB1.customer by way
     of a RowHash match scan with a condition of (
     "(DB1.customer.c_custkey <= 49999) AND (DB1.customer.c_custkey
     >= 0)"), which is joined to Spool 2 (Last Use) by way of a RowHash
     match scan.  DB1.customer and Spool 2 are joined using a merge
     join, with a join condition of ("o_custkey =
     DB1.customer.c_custkey").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with low confidence to be 11 rows (1,188
     bytes).  The estimated time for this step is 0.21 seconds.

Join Index Used to Resolve Single-Table Query

The following is an example of a join index used to resolve single table query:

     EXPLAIN
     SELECT l_orderkey, l_partkey, l_quantity, l_extendedprice
     FROM lineitem
     WHERE l_partkey = 1001;

Part of the EXPLAIN output is shown below.

  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DB1.ORDERJOINLINE
     by way of an all-rows scan with a condition of (
     "DB1.ORDERJOINLINE.l_partkey = 1001") into Spool 1 (group_amps),
     which is built locally on the AMPs.  The size of Spool 1 is
     estimated with no confidence to be 1 row (69 bytes).  The
     estimated time for this step is 0.15 seconds.

Creating and Using a Secondary Index on a Join Index

The following is an example creating and using secondary index on a join index:

     CREATE INDEX shipidx(l_shipdate) ON OrderJoinLine;
   
     ***Index has been created.
     ***Total elapsed time was 5 seconds.

     EXPLAIN
     SELECT o_orderdate, o_custkey, l_partkey, l_quantity,  
            l_extendedprice
     FROM lineitem, ordertbl
     WHERE l_orderkey=o_orderkey
     AND   l_shipdate='1997-09-18';

Part of the EXPLAIN output is shown below.

  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DB1.ORDERJOINLINE
     by way of an all-rows scan with a condition of ("(NOT
     (DB1.ORDERJOINLINE.o_orderdate IS NULL )) AND
     (DB1.ORDERJOINLINE.l_shipdate = DATE '1997-09-18')") into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with no confidence to be 8 rows (632 bytes).
     The estimated time for this step is 0.15 seconds.

Join Index Left Outer Joined on 6 Tables

The following join index definition left outer joins table t1 with, in succession, tables t2, t3, t4, t5, and t6 on a series of equality conditions made on foreign key-primary key relationships among the underlying base tables:

     CREATE JOIN INDEX jiout AS
     SELECT a1,b1,c1,c2,d1,d3,e1,e4,f1,g1,h1,i1,j1,j5,k1,k6, x1
     FROM t1
     LEFT OUTER JOIN t2 ON  a1=a2
                        AND b1=b2
                        AND c1=c2
     LEFT OUTER JOIN t3 ON  d1=d3
     LEFT OUTER JOIN t4 ON  e1=e4
                        AND f1=f4
     LEFT OUTER JOIN t5 ON  g1=g5
                        AND h1=h5
                        AND i1=i5
                        AND j1=j5
     LEFT OUTER JOIN t6 ON  k1=k6;

Even though the following query references fewer tables than are defined in the join index, you would expect the Optimizer to include join index ji_out in its access plan because all the extra outer joins are defined on unique columns and the extra tables are the inner tables in the outer joins. A covering join index whose definition includes one or more tables that is not specified in the query it covers is referred to as a broad join index. A wide range of queries can make use of a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables.

The bold EXPLAIN report text indicates that the Optimizer does select ji_out for the query plan. This is an example of a broad join index being used to cover a query.

     EXPLAIN SELECT a1, b1, c1, SUM(x1)
     FROM t1,t2
     WHERE a1=a2
     AND   b1=b2
     AND   c1=c2
     GROUP BY 1, 2, 3;

Part of the EXPLAIN output is shown below.

      3) We do an all-AMPs SUM step to aggregate from  HONG_JI.jiout  
         by way of an all-rows scan with no residual conditions, and 
         the grouping identifier in field 1. Aggregate Intermediate 
         Results are computed locally, then placed in Spool 3. The 
         size of Spool 3 is estimated with high confidence to be 2
         rows. The estimated time for this step is 0.03 seconds.
      4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by 
         way of an all-rows scan into Spool 1 (group_amps), which is 
         built locally on the AMPs. The size of Spool 1 is estimated 
         with high confidence to be 2 rows. The estimated time for 
         this step is 0.04 seconds.

Many More Tables Referenced by a Join Index Definition Than Referenced by Query

The following join index definition specifies all inner joins on tables t1, t2, t3, t4, t5and t6 and specifies equality conditions on all the foreign key-primary key relationships among those tables:

     CREATE JOIN INDEX ji_in AS
     SELECT a1,b1,c1,c2,d1,d3,e1,e4,f1,g1,g5,h1,i1,j1,k1,k6, x1
     FROM t1,t2,t3,t4,t5,t6
     WHERE a1=a2
     AND   b1=b2
     AND   c1=c2
     AND   d1=d3
     AND   e1=e4
     AND   f1=f4
     AND   g1=g5
     AND   h1=h5
     AND   i1=i5
     AND   j1=j5
     AND   k1=k6;

Even though 6 tables are referenced in the join index definition, and all its join conditions are inner joins, you would expect the Optimizer to include join index ji_inin its query plan for the following query, which only references 2 of the 6 tables, because all the conditions in the join index definition are based on foreign key-primary key relationships among the underlying base tables. This is also an example of a broad join index being used to cover a query.

The bold EXPLAIN report text indicates that the Optimizer does select ji_infor the query plan.

     EXPLAIN SELECT a1,b1,c1,SUM(x1)
     FROM t1,t2
     WHERE a1=a2
     AND   b1=b2
     AND   c1=c2
     GROUP BY 1, 2, 3;

Part of the EXPLAIN output is shown below.

      3) We do an all-AMPs SUM step to aggregate from  HONG_JI.ji_in  
         by way of an all-rows scan with no residual conditions, 
         and the grouping identifier in field 1. Aggregate Intermediate
         Results are computed locally, then placed in Spool 3. The size
         of Spool 3 is estimated with high confidence to be 2 rows.
         The estimated time for this step is 0.03 seconds.
      4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by 
         way of an all-rows scan into Spool 1 (group_amps), which
         is built locally on the AMPs. The size of Spool 1 is estimated
         with high confidence to be 2 rows. The estimated time for
         this step is 0.04 seconds.

Using a Join Index Defined With a Multiway Join Result

The following is an example defining and using a join index defined with a multiway join result:

     CREATE JOIN INDEX CustOrderJoinLine
     AS SELECT (1_orderkey,o_orderdate,c_nationkey,o_totalprice),
               (1_partkey,1_quantity,1_extendedprice,1_shipdate)
     FROM (lineitem
     LEFT OUTER JOIN ordertbl ON 1_orderkey=o_orderkey)
     INNER JOIN customer ON o_custkey=c_custkey
     PRIMARY INDEX (1_orderkey);
   
     *** Index has been created.
     *** Total elapsed time was 20 seconds.


     EXPLAIN
     SELECT (1_orderkey,o_orderdate,c_nationkey,o_totalprice),
            (1_partkey,1_quantity,1_extendedprice,1_shipdate)
     FROM lineitem,ordertbl,customer
     WHERE 1_orderkey=o_custkey
     AND o_custkey=c_custkey
     AND c_nationkey=10;

Part of the EXPLAIN output is shown below.

   3) We do an all-AMPs RETRIEVE step from join index table
      df2.CustOrderJoinLine by way of an all-rows scan with a
      condition of (“df2.CustOrderJoinLine.c_nationkey=10”) into
      Spool 1, which is built locally on the AMPs. The input table will
      not be cached in memory, but it is eligible for synchronized
      scanning. The result spool file will not be cached in memory.
      The size of Spool 1 is estimated to be 200 rows. The estimated 
      time for this step is 3 minutes and 57 seconds.