Definition of the Product Join
The product join compares every qualifying row from one relation to every qualifying row from the other relation and saves the rows that match the WHERE predicate filter. Because all rows of the left relation in the join must be compared with all rows of the right relation, the system always duplicates the smaller relation on all AMPs, and if the entire spool does not fit into available memory, the system is required to read the same data blocks more than one time. Reading the same data block multiple times is a costly operation.
This operation is called a product join because the number of comparisons needed is the algebraic product of the number of qualifying rows in the two relations.
- No WHERE clause is specified in the query.
- The join is on an inequality condition.
- There are ORed join conditions.
- A referenced relation is not specified in any join condition.
- The product join is the least costly join method available in the situation.
Product Join Types
- Inner product join
- Left outer product join
- Right outer product join
- Full outer product join
- Piggybacked product join
- Inclusion product join
Used only for a product join made on an IN term.
- Inner inclusion product join
- Outer inclusion product join
- Exclusion product join
Used only for a product join made on a NOT IN term.
Exclusion product joins with dynamic column partition elimination are not supported.
- Inner exclusion product join
- Outer exclusion product join
Processing a Product Join
The following list outlines the product join process:
- Cache the left relation rows.
- Join each row of the right relation with each row from the cached left relation.
An overall join condition is a WHERE constraint that links relations to be joined on a column that is common to each, as shown in the following example:
WHERE employee.deptno=department.deptno
The following graphic illustrates the generic product join process.
The following graphic illustrates a product join where rows from the Project table are duplicated to all AMPs.
Product Join Costs
Product joins are relatively more time consuming than other types of joins because of the number of comparisons that must be made.
The product join is typically the most costly join method available in terms of system resources, and is used only when there is not a more efficient method, such as a merge join, hash join, or a nested join. However, a product join can resolve any combination of join conditions.
Product Joins with Dynamic Row Partition Elimination
The database performs dynamic row partition elimination, or DPE, for product joins in the AMPs after a query has already been optimized, so in this case, the row partition elimination undertaken is dependent on the actual data on disk, not a statistical estimate of that data.
The term direct product join describes a join in which the table or join index of interest is not spooled in preparation for a product join, but is instead joined directly. The Optimizer may choose a direct product join when all the partitioning columns of one or more partitioning expressions are specified in equality join terms.
As is the case for static partition row elimination (see Static Row Partition Elimination), the database applies the product join DPE optimization for single-level partitioning independently for each partitioning expression.
Row partition elimination methods can be mixed within the same query. For example, static row partition elimination can be used for partitioning levels, DPE can be used for other levels, and there may be levels that do not evoke partition elimination. Partition levels may benefit from multiple forms of partition elimination.
The system combines the result of the partition elimination process to determine which internal partitions need to be accessed. Combining can occur for Static Partition Elimination as part of generating the query plan.
DPE occurs within the AMPs as row sets are processed, and is combined with any static row partition elimination from the query plan at that point.
When performing join steps, qualifying row partitions of the row-partitioned table or join index are determined dynamically, based on the values of rows selected from the other relation in the join. Instead of a product join against all rows in the row-partitioned table, the database does a product join only for each set of rows of the other relation of a binary join that match a single row partition. The database chooses this optimization for this type of query unless other join plans have a better estimated cost, depending on data demographics and partitioning form.
DPE for product joins can occur when there is an equality constraint between a partitioning column of one table and a column of another relation in the join. This is useful when the system uses a product join and must look up rows in one table and match those rows to rows in corresponding row partitions instead of performing a product join to the entire table. In this case, only those row partitions that are required to answer the request take part in the join.
For a product join with DPE, the left relation is sorted by RowKey using the same partitioning expression as the right relation.
Rows from the left relation are loaded into the cache one row partition at a time. This partition is then set as the partition for reading the right row-partitioned table. When the end of the partition for the right table is reached, the system reloads the left cache with the next row partition, and the process is repeated.
For example, if there are 100 row partitions in the row-partitioned table and only 5 are needed to answer the join request, the system does not join the other 95 row partitions, providing a 95% resource saving for the operation.
- The primary indexes
- The system-derived PARTITION columns
- The partitioning columns of the row-partitioned table
- The column in the other table being equated with the partitioning column of the row-partitioned table
Product Join with Dynamic Row Partition Elimination for Character Partitioning
- The estimated cost of the join is less than the estimated cost of all other join types.
- There must no more than one character partitioning column at the partitioning level being considered.
- If the session collation or the collation of the row-partitioned table is either MULTINATIONAL or CHARSET_COLL, and if any comparison or string function involving any non-constant expressions in the partitioning expression at the partitioning level being considered is case insensitive, the session collation must match the collation of the row-partitioned table.The following functions and attribute are case-insensitive.
- LOWER function
- SOUNDEX function
- UPPER function
- UPPERCASE attribute
The following functions are case-sensitive.- CHAR2HEXINT
- TRANSLATE
- TRANSLATE_CHK
- TRIM
- VARGRAPHIC
Presence of the concatenation operator (||) marks the expression as having the property of being both case sensitive and case insensitive.
The following functions follow the same rules as comparison operators, function input arguments, and the default case sensitivity for the session mode that was in effect when the CPPI was created or modified, all of which the system examines to determine case sensitivity:- INDEX
- MINDEX
- POSITION
Specifying the SUBSTRING function does not affect case sensitivity.
- If the WHERE clause predicate that qualifies a row partitioning level for a product join with DPE is case insensitive, then all comparisons and all string functions in any non-constant expressions in the partitioning expression for that level must also be case-insensitive.
The WHERE clause predicate is considered to be case-insensitive if any of the comparisons or string functions involving non-constant expressions in the condition is case-insensitive.
For the Optimizer to specify a product join with DPE for a given row partitioning level, the collation for the current session need not match the collation for the row-partitioned table when all equality join terms on character partitioning columns are case-sensitive.