The following sections demonstrate the performance optimization achieved on table selects, deletes, inserts, and updates resulting from the use of join indexes.
For additional information about join indexes, see Teradata Vantage™ - Database Design, B035-1094 and Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184.
A Simple Join Query
The following is an example of a simple join query:
EXPLAIN SELECT o_orderdate, o_custkey, l_partkey, l_quantity, 1_extendedprice FROM lineitem, ordertbl WHERE l_orderkey=o_orderkey;
Part of the EXPLAIN output follows.
4) We do an all-AMPs JOIN step in TD_MAP1 from DB1.ordertbl by way of a RowHash match scan with no residual conditions, which is joined to DB1.lineitem by way of a RowHash match scan with no residual conditions. DB1.ordertbl and DB1.lineitem are joined using a sliding-window merge join, with a join condition of ( "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 11 rows (869 bytes). The estimated time for this step is 0.21 seconds.
A Search on a Join Index
The following is an example of a search condition on the join index:
EXPLAIN SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice FROM lineitem, ordertbl WHERE l_orderkey=o_orderkey AND o_orderdate>'1997-11-01';
Part of the EXPLAIN output follows.
4) We do an all-AMPs JOIN step in TD_MAP1 from 500 partitions of DB1.ordertbl by way of a RowHash match scan with a condition of ( "DB1.ordertbl.o_orderdate > DATE '1997-11-01'"), which is joined to DB1.lineitem by way of a RowHash match scan with no residual conditions. DB1.ordertbl and DB1.lineitem are joined using a sliding-window merge join, with a join condition of ( "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (158 bytes). The estimated time for this step is 0.21 seconds.
Aggregation on a Join Index
The following is an example of an aggregation on a join index.
EXPLAIN SELECT l_partkey, AVG(l_quantity), AVG(l_extendedprice) FROM lineitem, ordertbl WHERE l_orderkey=o_orderkey AND o_orderdate >'1997-11-01' GROUP BY l_partkey;
Part of the EXPLAIN output follows.
4) We do an all-AMPs JOIN step in TD_MAP1 from DB1.ordertbl by way of a RowHash match scan with no residual conditions, which is joined to DB1.lineitem by way of a RowHash match scan with no residual conditions. DB1.ordertbl and DB1.lineitem are joined using a sliding-window merge join, with a join condition of ( "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey"). The result goes into Spool 2 (all_amps), which is built locally on the AMPs. The size of Spool 2 is estimated with low confidence to be 11 rows (341 bytes). The estimated time for this step is 0.21 seconds. 5) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 2 (Last Use) by way of an all-rows scan, grouping by field1 ( DB1.lineitem.l_partkey). Aggregate Intermediate Results are computed globally, then placed in Spool 4 in TD_Map1. The size of Spool 4 is estimated with no confidence to be 9 rows (369 bytes). The estimated time for this step is 0.23 seconds. 6) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 4 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 9 rows (522 bytes). The estimated time for this step is 0.16 seconds.
Join Index Used to Join with Another Base Table
The following is an example of a join index used to join with another base table:
EXPLAIN SELECT o_orderdate, c_name, c_phone, l_partkey,l_quantity, l_extendedprice FROM lineitem, ordertbl, customer WHERE l_orderkey=o_orderkey AND o_custkey=c_custkey;
Part of the EXPLAIN output follows.
5) We do an all-AMPs JOIN step in TD_MAP1 from DB1.ordertbl by way of a RowHash match scan with no residual conditions, which is joined to DB1.lineitem by way of a RowHash match scan with no residual conditions. DB1.ordertbl and DB1.lineitem are joined using a sliding-window merge join, with a join condition of ( "DB1.lineitem.l_orderkey = DB1.ordertbl.o_orderkey"). The result goes into Spool 2 (all_amps), which is redistributed by the hash code of (DB1.ordertbl.o_custkey) to all AMPs in TD_Map1. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 11 rows (407 bytes). The estimated time for this step is 0.11 seconds. 6) We do an all-AMPs JOIN step in TD_MAP1 from DB1.customer by way of a RowHash match scan with a condition of ( "(DB1.customer.c_custkey <= 49999) AND (DB1.customer.c_custkey >= 0)"), which is joined to Spool 2 (Last Use) by way of a RowHash match scan. DB1.customer and Spool 2 are joined using a merge join, with a join condition of ("o_custkey = DB1.customer.c_custkey"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 11 rows (1,188 bytes). The estimated time for this step is 0.21 seconds.
Join Index Used to Resolve Single-Table Query
The following is an example of a join index used to resolve single table query:
EXPLAIN SELECT l_orderkey, l_partkey, l_quantity, l_extendedprice FROM lineitem WHERE l_partkey = 1001;
Part of the EXPLAIN output follows.
3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DB1.ORDERJOINLINE by way of an all-rows scan with a condition of ( "DB1.ORDERJOINLINE.l_partkey = 1001") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (69 bytes). The estimated time for this step is 0.15 seconds.
Creating and Using a Secondary Index on a Join Index
The following is an example creating and using secondary index on a join index:
CREATE INDEX shipidx(l_shipdate) ON OrderJoinLine;
Result:
***Index has been created. ***Total elapsed time was 5 seconds.
EXPLAIN SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice FROM lineitem, ordertbl WHERE l_orderkey=o_orderkey AND l_shipdate='1997-09-18';
Part of the EXPLAIN output follows.
3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DB1.ORDERJOINLINE by way of an all-rows scan with a condition of ("(NOT (DB1.ORDERJOINLINE.o_orderdate IS NULL )) AND (DB1.ORDERJOINLINE.l_shipdate = DATE '1997-09-18')") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 8 rows (632 bytes). The estimated time for this step is 0.15 seconds.
Join Index Left Outer Joined on 6 Tables
The following join index definition left outer joins table t1 with, in succession, tables t2, t3, t4, t5, and t6 on a series of equality conditions made on foreign key-primary key relationships among the underlying base tables:
CREATE JOIN INDEX jiout AS SELECT a1,b1,c1,c2,d1,d3,e1,e4,f1,g1,h1,i1,j1,j5,k1,k6, x1 FROM t1 LEFT OUTER JOIN t2 ON a1=a2 AND b1=b2 AND c1=c2 LEFT OUTER JOIN t3 ON d1=d3 LEFT OUTER JOIN t4 ON e1=e4 AND f1=f4 LEFT OUTER JOIN t5 ON g1=g5 AND h1=h5 AND i1=i5 AND j1=j5 LEFT OUTER JOIN t6 ON k1=k6;
Even though the following query references fewer tables than are defined in the join index, you expect the Optimizer to include join index ji_out in its access plan because all the extra outer joins are defined on unique columns and the extra tables are the inner tables in the outer joins.
A covering join index whose definition includes one or more tables that is not specified in its covered query is called a broad join index. A wide range of queries can make use of a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables.
The bold EXPLAIN report text indicates that the Optimizer does select ji_out for the query plan. This is an example of a broad join index being used to cover a query.
EXPLAIN SELECT a1, b1, c1, SUM(x1) FROM t1,t2 WHERE a1=a2 AND b1=b2 AND c1=c2 GROUP BY 1, 2, 3;
Part of the EXPLAIN output follows.
3) We do an all-AMPs SUM step to aggregate from HONG_JI.jiout
by way of an all-rows scan with no residual conditions, and
the grouping identifier in field 1. Aggregate Intermediate
Results are computed locally, then placed in Spool 3. The
size of Spool 3 is estimated with high confidence to be 2
rows. The estimated time for this step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan into Spool 1 (group_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated
with high confidence to be 2 rows. The estimated time for
this step is 0.04 seconds.
More Tables Referenced by a Join Index Definition than Referenced by Query
The following join index definition specifies all inner joins on tables t1, t2, t3, t4, t5and t6 and specifies equality conditions on all the foreign key-primary key relationships among those tables:
CREATE JOIN INDEX ji_in AS SELECT a1,b1,c1,c2,d1,d3,e1,e4,f1,g1,g5,h1,i1,j1,k1,k6, x1 FROM t1,t2,t3,t4,t5,t6 WHERE a1=a2 AND b1=b2 AND c1=c2 AND d1=d3 AND e1=e4 AND f1=f4 AND g1=g5 AND h1=h5 AND i1=i5 AND j1=j5 AND k1=k6;
Even though 6 tables are referenced in the join index definition, and all its join conditions are inner joins, you expect the Optimizer to include join index ji_inin its query plan for the following query, which only references 2 of the 6 tables, because all the conditions in the join index definition are based on foreign key-primary key relationships among the underlying base tables. This is also an example of a broad join index being used to cover a query.
The bold EXPLAIN report text indicates that the Optimizer does select ji_infor the query plan.
EXPLAIN SELECT a1,b1,c1,SUM(x1) FROM t1,t2 WHERE a1=a2 AND b1=b2 AND c1=c2 GROUP BY 1, 2, 3;
Part of the EXPLAIN output follows.
3) We do an all-AMPs SUM step to aggregate from HONG_JI.ji_in
by way of an all-rows scan with no residual conditions,
and the grouping identifier in field 1. Aggregate Intermediate
Results are computed locally, then placed in Spool 3. The size
of Spool 3 is estimated with high confidence to be 2 rows.
The estimated time for this step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan into Spool 1 (group_amps), which
is built locally on the AMPs. The size of Spool 1 is estimated
with high confidence to be 2 rows. The estimated time for
this step is 0.04 seconds.
Using a Join Index Defined with a Multiway Join Result
The following is an example defining and using a join index defined with a multiway join result:
CREATE JOIN INDEX CustOrderJoinLine AS SELECT (1_orderkey,o_orderdate,c_nationkey,o_totalprice), (1_partkey,1_quantity,1_extendedprice,1_shipdate) FROM (lineitem LEFT OUTER JOIN ordertbl ON 1_orderkey=o_orderkey) INNER JOIN customer ON o_custkey=c_custkey PRIMARY INDEX (1_orderkey);
Result:
*** Index has been created. *** Total elapsed time was 20 seconds.
EXPLAIN SELECT (1_orderkey,o_orderdate,c_nationkey,o_totalprice), (1_partkey,1_quantity,1_extendedprice,1_shipdate) FROM lineitem,ordertbl,customer WHERE 1_orderkey=o_custkey AND o_custkey=c_custkey AND c_nationkey=10;
Part of the EXPLAIN output follows.
3) We do an all-AMPs RETRIEVE step from join index table df2.CustOrderJoinLine by way of an all-rows scan with a condition of (“df2.CustOrderJoinLine.c_nationkey=10”) into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated to be 200 rows. The estimated time for this step is 3 minutes and 57 seconds.