This example examines the EXPLAIN reports for a different outer join-based join index in some detail.
Table Definitions
CREATE TABLE customer ( c_custkey INTEGER NOT NULL, c_name CHARACTER(26), c_address VARCHAR(41), c_nationkey INTEGER, c_phone CHARACTER(16), c_acctbal DECIMAL(13,2), c_mktsegment CHARACTER(21), c_comment VARCHAR(127)) UNIQUE PRIMARY INDEX(c_custkey); CREATE TABLE orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHARACTER(1), o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_orderpriority CHARACTER(21), o_clerk CHARACTER(16), o_shippriority INTEGER, o_commment VARCHAR(79)) UNIQUE PRIMARY INDEX (o_orderkey); CREATE TABLE lineitem ( l_orderkey INTEGER NOT NULL, l_partkey INTEGER NOT NULL, l_suppkey INTEGER, l_linenumber INTEGER, l_quantity INTEGER NOT NULL, l_extendedprice DECIMAL(13,2) NOT NULL, l_discount DECIMAL(13,2), l_tax DECIMAL(13,2), l_returnflag CHARACTER(1), l_linestatus CHARACTER(1), l_shipdate DATE FORMAT 'yyyy-mm-dd', l_commitdate DATE FORMAT 'yyyy-mm-dd', l_receiptdate DATE FORMAT 'yyyy-mm-dd', l_shipinstruct VARCHAR(25), l_shipmode VARCHAR(10), l_comment VARCHAR(44)) PRIMARY INDEX (l_orderkey);
Join Index Definition
The following statement defines a join index on these tables. Subsequent examples demonstrate the effect of this join index on how the Optimizer processes various queries.
CREATE JOIN INDEX order_join_line AS SELECT (l_orderkey, o_orderdate, o_custkey, o_totalprice), (l_partkey, l_quantity, l_extendedprice, l_shipdate) FROM lineitem LEFT JOIN orders ON l_orderkey = o_orderkey ORDER BY o_orderdate PRIMARY INDEX (l_orderkey); *** Index has been created. *** Total elapsed time was 15 seconds.
EXPLAIN for Query With Simple Predicate
The following EXPLAIN report shows how the newly created join index, order_join_line, might be used by the Optimizer.
EXPLAIN SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice FROM lineitem, orders WHERE l_orderkey = o_orderkey; Explanation --------------------------------------------------------------------------- 1) First, we lock LOUISB.order_join_line for read on a reserved Row Hash to prevent a global deadlock. 2) Next, we do an all-AMPs RETRIEVE step from join index table LOUISB.order_join_line by way of an all-rows scan with a condition of ("NOT (LOUISB.order_join_line.o_orderdate IS NULL)") into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated to be 1,000,000 rows. The estimated time for this step is 4 minutes and 27 seconds. 3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
EXPLAIN for Query With More Complicated Predicate
The following EXPLAIN report shows how the join index might be used in a query when an additional search condition is added on the join indexed rows.
EXPLAIN SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice FROM lineitem, orders WHERE l_orderkey = o_orderkey AND o_orderdate > '2003-11-01'; Explanation -------------------------------------------------------------------------- 1) First, we lock LOUISB.order_join_line for read on a reserved Row Hash to prevent a global deadlock. 2) Next, we do an all-AMPs RETRIEVE step from join index table LOUISB.order_join_line with a range constraint of ( "LOUISB.order_join_line.Field_1026 > 971101") with a residual condition of ("(NOT (LOUISB.order_join_line.o_orderdate IS NULL )) AND (LOUISB.order_join_line.Field_1026 > 971101)") into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated to be 1000 rows. The estimated time for this step is 0.32 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
EXPLAIN for Query With Aggregation
The following EXPLAIN shows how the join index might be used in a query when aggregation is performed on the join indexed rows.
EXPLAIN SELECT l_partkey, AVG(l_quantity), AVG(l_extendedprice) FROM lineitem , orders WHERE l_orderkey = o_orderkey AND o_orderdate > '2003-11-01' GROUP BY l_partkey; Explanation -------------------------------------------------------------------------- 1) First, we lock LOUISB.order_join_line for read on a reserved Row Hash to prevent a global deadlock. 2) Next, we do a SUM step to aggregate from join index table LOUISB.order_join_line with a range constraint of ( "LOUISB.order_join_line.Field_1026 > 971101") with a residual condition of ("(LOUISB.order_join_line.Field_1026 > 971101) AND (NOT (LOUISB.order_join_line.o_orderdate IS NULL ))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 10 rows. The estimated time for this step is 0.32 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
EXPLAIN for Query With Base Table-Join Index Table Join
The following EXPLAIN shows how the join index might be used in a query when join indexed rows are used to join with another base table.
EXPLAIN SELECT o_orderdate, c_name, c_phone, l_partkey, l_quantity, l_extendedprice FROM lineitem, orders, customer WHERE l_orderkey = o_orderkey AND o_custkey = c_custkey; Explanation -------------------------------------------------------------------------- 1) First, we lock LOUISB.order_join_line for read on a reserved Row Hash to prevent a global deadlock. 2) Next, we lock LOUISB.customer for read. 3) We do an all-AMPs RETRIEVE step from join index table LOUISB.order_join_line by way of an all-rows scan with a condition of ("NOT (LOUISB.order_join_line.o_orderdate IS NULL)") into Spool 2, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated to be 1,000,000 rows. The estimated time for this step is 1 minute and 53 seconds. 4) We do an all-AMPs JOIN step from LOUISB.customer by way of a Row Hash match scan with no residual conditions, which is joined to Spool 2 (Last Use). LOUISB.customer and Spool 2 are joined using a merge join, with a join condition of ("Spool_2.o_custkey = LOUISB.customer.c_custkey"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 1,000,000 rows. The estimated time for this step is 32.14 seconds. 5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
EXPLAIN for Query Against Single Table
The following EXPLAIN report shows how the join index might be used in a query of a single table.
EXPLAIN SELECT l_orderkey, l_partkey, l_quantity, l_extendedprice FROM lineitem WHERE l_partkey = 1001; Explanation -------------------------------------------------------------------------- 1) First, we lock LOUISB.order_join_line for read on a reserved Row Hash to prevent a global deadlock. 2) Next, we do an all-AMPs RETRIEVE step from join index table LOUISB.order_join_line by way of an all-rows scan with a condition of ("LOUISB.order_join_line.l_partkey = 1001") into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated to be 100 rows. The estimated time for this step is 59.60 seconds. 3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.