Product Joins with Dynamic Row Partition Elimination
After determining that the least costly method for making a join between a row-partitioned table and another relation on an IN or NOT IN condition, the Optimizer can choose to use an inclusion or exclusion product join with DPE, respectively. Whether the Optimizer uses an inclusion or exclusion product join with DPE depends on the condition on which the relations are joined.
Join Condition Term | Chosen Product Join |
---|---|
IN | Inclusion product join with DPE. |
NOT IN | Exclusion product join with DPE. |
- The number of comparisons between left relation and right relation rows is reduced, which in turn reduces the I/O on the left and right relations and the CPU costs to perform the join.
- The following is true for the case when one relation in a join is a table and the other is a spool:
When the number of values in the spool is small, resulting in fewer populated spool partitions than populated row partitions from the row-partitioned table, those table partitions that are not in spool need not be either read or joined, reducing both the I/O and CPU costs of the join.
Because the performance improvement gained from DPE joins is highly dependent on the number of rows in each row partition (and so the number of row comparisons that must be made), the Optimizer does not choose a DPE join if single-column PARTITION statistics have not been collected on the outer (row-partitioned) table in the join. PARTITION statistics improve the ability of the Optimizer to estimate the number of comparisons that must be made. The Optimizer does not choose a DPE join when there is a no confidence estimate on the cardinality of the subquery spool.
For example, suppose you create the following tables and submit the given SELECT request against those tables:
CREATE SET TABLE MWS.t1, FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( a INTEGER, b INTEGER) PRIMARY INDEX (a); CREATE SET TABLE MWS.t2, FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( a INTEGER, b INTEGER, c INTEGER, d INTEGER) PRIMARY INDEX (a) PARTITION BY (RANGE_N(b BETWEEN 1 /* Partitioning level one. */ AND 100 EACH 7, NO RANGE OR UNKNOWN), RANGE_N(c BETWEEN 1 /* Partitioning level two. */ AND 100 EACH 10, NO RANGE OR UNKNOWN), RANGE_N(d BETWEEN 1 /* Partitioning level three. */ AND 100 EACH 20, NO RANGE OR UNKNOWN)); SELECT * FROM t2 WHERE (b,c) IN (SELECT a,b FROM t1);
For this request to be eligible for an inclusion or exclusion product join with DPE, you must collect the following statistics:
COLLECT STATISTICS t1 COLUMN(a); COLLECT STATISTICS t2 COLUMN(PARTITION);
The following rules apply to collecting these statistics:
Join Type to Enable | On What You Must Collect Statistics |
---|---|
Product (all kinds) with DPE |
|
Inclusion or exclusion product with DPE | system-derived PARTITION column set of the row-partitioned table. |
Another name for the type of join being done with inclusion and exclusion product joins is semijoin. As a result, the inclusion and exclusion product joins with DPE can also be referred to by the name product semijoin with DPE.
When a product semijoin with DPE is applied by the Optimizer, the EXPLAIN text for the query indicates enhanced by dynamic row partition elimination for the corresponding AMP step. In the following partial EXPLAIN output, this text is highlighted in boldface type:
EXPLAIN SELECT COUNT(*) FROM t55 WHERE (b,c) NOT IN (SELECT 1, 1);
Result:
3) We do an INSERT into Spool 5. 4) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 4 (group_amps), which is redistributed by the hash code of (1, 1) to all AMPs. The size of Spool 4 is estimated with high confidence to be 1 row (22 bytes). The estimated time for this step is 0.03 seconds. 5) We do a group-AMP RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan into Spool 7 (all_amps), which is duplicated on all AMPs. Then we do a SORT to partition by rowkey. The size of Spool 7 is estimated with high confidence to be 2 rows (30 bytes). The estimated time for this step is 0.03 seconds. 6) We do an all-AMPs JOIN step from MWS.t55 by way of an all-rows scan with no residual conditions, which is joined to Spool 7 (Last Use) by way of an all-rows scan. MWS.t55 and Spool 7 are joined using an exclusion product join, with a join condition of ( "(MWS.t55.b = Field_2) AND (MWS.t55.c = Field_3)") enhanced by dynamic row partition elimination. The input table MWS.t55 will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 3 is estimated with index join confidence to be 2,370,668 rows (35,560,020 bytes). The estimated time for this step is 39.38 seconds. 7) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan. Aggregate Intermediate Results are computed globally, then placed in Spool 8. The size of Spool 8 is estimated with high confidence to be 1 row (23 bytes). The estimated time for this step is 3.06 seconds. 8) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row (25 bytes). The estimated time for this step is 0.02 seconds.
The Optimizer applies an inclusion or exclusion product join with DPE when there are equality join terms between the partitioning columns at one or more partitioning levels of a row-partitioned table, which is the outer table of a semijoin, and another relation.
Inclusion and exclusion product joins with DPE enhance the performance of such queries when the row-partitioned table is large and a costly sort in preparation for a regular inclusion or exclusion merge join can be avoided.
Such queries further benefit when the number of rows for the join column selected by the subquery is small because in that case, the number of values in the spool is small, resulting in fewer populated spool row partitions than populated table partitions of the row-partitioned table. As a result, the table row partitions that are not in spool neither need to be read nor joined, reducing both the I/O and CPU costs of the join.
For example, assume you have the following table definitions and cardinalities:
CREATE SET TABLE t1 ( a INTEGER, b INTEGER, c INTEGER) PRIMARY INDEX ( a ); CREATE SET TABLE t8 ( a INTEGER, b INTEGER, c INTEGER) PRIMARY INDEX (a) PARTITION BY (RANGE_N(c BETWEEN 1 AND 1200 EACH 30, NO RANGE OR UNKNOWN), RANGE_N(b BETWEEN 1 AND 11000 EACH 7, NO RANGE OR UNKNOWN));
Table Name | Table Cardinality (rows) |
---|---|
t1 | 1,000 |
t8 | 9,000,000 |
For the following query, the number of rows returned by the subquery is small. Therefore, multiple row partitions in t8 can be eliminated dynamically, making a product join cost effective, and the t8 rows need not be sorted to make the join.
SELECT COUNT(*) FROM t8 WHERE (b,c) IN (SELECT a,b FROM t1 WHERE c = 1);
Result:
*** Query completed. One row found. One column returned. *** Total elapsed time was 1 second. Count(*) ----------- 65
If the Optimizer uses a merge join instead of an inclusion product join with DPE, the t8 rows need to be sorted, which takes the majority of the processing time, and the query takes 57 seconds to complete rather than 1 second.
SELECT COUNT(*) FROM t8 WHERE (b,c) IN (SELECT a,b FROM t1 WHERE c = 1);
Result:
*** Query completed. One row found. One column returned. *** Total elapsed time was 57 seconds. Count(*) ----------- 65
- The number of rows per row partition of the row-partitioned table
- The number of columns projected by the subquery
Costing takes into account those cases for which the row-partitioned table has a highly variable number of rows per row partition, and avoids applying an inclusion or exclusion product join with DPE for those tables.
Inclusion Product Join with Dynamic Row Partition Elimination
Inclusion product join with dynamic row partition elimination (DPE), which is designed for use when the outer relation in a join based on an IN term is a row-partitioned table, differ from inclusion product joins without DPE in that the DPE versions of the join are driven from the inner table instead of the outer, or row-partitioned, table, while the non-DPE versions are driven from the outer table.
For example, suppose you have the following table definitions and query against those tables:
CREATE SET TABLE MWS.t1, FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL,CHECKSUM = DEFAULT ( a INTEGER, b INTEGER) PRIMARY INDEX (a); CREATE SET TABLE MWS.t2, FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( a INTEGER, b INTEGER) PRIMARY INDEX (a) PARTITION BY (RANGE_N(a BETWEEN 1 /* Partitioning level 1 */ AND 60000 EACH 60000, NO RANGE OR UNKNOWN), RANGE_N(b BETWEEN -3 /* Partitioning level 2 */ AND 31580 EACH 1, NO RANGE, UNKNOWN) ); SELECT * FROM t2 WHERE (b) IN (SELECT a FROM t1);
- Sort t1 to remove duplicate values of column a.
- Duplicate the sorted values of t1 in a spool.
- Read the first row in t2.
If there are no more rows to read, the inclusion product join process is done.
- Search for a match based on the connecting term t2.b=t1.a until a match is found or all rows in spool have been read.
- Go to step 3 and read the next row in t2.
The database reads all rows in t2 sequentially and joins those rows with the spool in this manner.
For an inclusion product join with DPE, the system does not read t2 sequentially or compare each row in t2 with all rows in the spool.
When a bind term is specified on all, or on a subset of, the partitioning levels of a row-partitioned table, the system can use the information to build a spool having the same row partitioning as the row-partitioned table. The system can then join only the equivalent partitioning ranges between the base table row partitions and the spool row partitions.
The setup for the inclusion product join with DPE is the same as that for a standard inclusion product join, except that when duplicated, the spool is built with the same partitioning as is defined for the base table, and then the spool is sorted by partition number. In the previous example, the spool is row-partitioned using the partitioning expression defined for row-partitioned table t2:
PARTITION BY (RANGE_N(a BETWEEN -3 AND 31580 EACH 1, NO RANGE, UNKNOWN) );
Note that the row partitioning is done on column a of the t1 spool because the bind term t2.b=t1.a links partitioning on t2.b with t1.a.
The system performs the inclusion product join with DPE as follows:
- Sort t1 to remove duplicate values of column a.
- Duplicate the sorted values of t1 and row-partition the duplicated rows in a spool by the partitioning ranges defined for t1.
- Read the first row in the first row partition of the spool.
- Build the list of row partitions dynamically in the row-partitioned table to which the current spool partition is to join.
- Read the first row in the first DPE row partition of the row-partitioned table.
- Compare the row with all rows in the current spool row partition until either a match is found or there are no more rows in the spool row partition.
If a match is found, return the row-partitioned table row. Otherwise, go to step 7.
- Read the next row of the row-partitioned table in the participating DPE partitions.
If there are no more rows, go to step 6. Otherwise, go to step 4.
- Read the first row in the spool with a partition number greater than the current spool row partition.
If there are no such rows, the join is complete. If there is such a row, go to step 2.
Exclusion Product Join with Dynamic Row Partition Elimination
The algorithm for an exclusion product join with DPE is similar the algorithm for an inclusion product join with DPE, but with a restriction and other differences.
The restriction is that exclusion product join with DPE is only enabled with row-partitioned tables where the partitioning expression at each partitioning level consists solely of a RANGE_N function, and the test expression is a simple column. The reason for this restriction is that rows with null partitioning column values at any bound partitioning level must be joined with all partitions at that level.
Because that may eliminate much of the performance improvement gained from DPE, exclusion product join with DPE avoids reading the extra rows of the row-partitioned table by placing all spool rows with null partitioning column values into the error partition (meaning the NO RANGE partition in this case). Because expressions on columns containing nulls may produce non-nulls, there is no simple method of grouping all rows with nulls into one partition other than using the RANGE_N function with a simple column test expression.
To make sure rows with null partitioning columns are placed in the error partition, the system modifies qualifying RANGE_N expressions to remove both the UNKNOWN and NO RANGE partitions from their definition when grouped together as NO RANGE OR UNKNOWN. The join algorithm then joins rows in the error partition with all of the rows of the row-partitioned table.
All connecting terms must reference partitioning columns for the Optimizer to choose an exclusion product join with DPE. If additional non-partitioning columns are components of the connecting conditions, then the Optimizer does not select the exclusion product join with DPE algorithm.
- Build a list of row partitions that includes all uneliminated partitions in the row-partitioned table.
This is the unvisited row partition list, which must be joined with the error partition after all DPE partitions have been joined.
- Read the first row in the first row partition of the spool.
- Dynamically build the list of row partitions in t2 that this current spool row partition is to join.
Remove this set of row partitions from the unvisited row partition list.
- Read the first row in the first DPE partition of the row-partitioned table.
- Compare the row with all rows in the current spool row partition and with all rows in the error partition.
If there is a match, go to step 6. Otherwise, return the row of the row-partitioned table.
- Read the next row of the row-partitioned table in the participating DPE partitions.
If there are no more rows, go to step 7. Otherwise, go to step 5.
- Read the first row in the spool with a partition number greater than the current spool partition.
If there are no such rows, go to step 8. Otherwise, go to step 5.
- Read the first row in the row-partitioned table not eliminated by the unvisited row partition list.
If all of the connecting term columns are null, do not return the row, and go to step 12.
If at least one connecting column is null, but not all connecting columns are null, go to step 11.
If no connecting column is null, go to step 10.
- Join this row with all rows in the error partition of the spool.
- If no match is found, return the row.
- Read the next row in the row-partitioned table that was not eliminated by the unvisited partition list.
If there is such a row, go to step 9.