Definition of the Product Join
The product join compares every qualifying row from one relation to every qualifying row from the other relation and saves the rows that match the WHERE predicate filter. Because all rows of the left relation in the join must be compared with all rows of the right relation, the system always duplicates the smaller relation on all AMPs, and if the entire spool does not fit into available memory, the system is required to read the same data blocks more than one time. Reading the same data block multiple times is a very costly operation.
This operation is called a product join because the number of comparisons needed is the algebraic product of the number of qualifying rows in the two relations.
Any of the following conditions can cause the Optimizer to apply a product join over other join methods:
- No WHERE clause is specified in the query.
- The join is on an inequality condition.
If you specify a connecting, or bind term between the relations, then the system does not specify a product join. Bind terms are those that are bound together by the equality operator.
- There are ORed join conditions.
- A referenced relation is not specified in any join condition.
- The product join is the least costly join method available in the situation.
Depending on the estimated cost, and whether the condition on which the join is being made is an equality or an inequality (hash joins are used only for equality conditions), the Optimizer might substitute a form of hash join (see Hash Join and Dynamic Hash Joins) in place of an equality product join.
Product Join Types
The following are the families of product join methods:
- Inner product join
- Left outer product join
- Right outer product join
- Full outer product join
- Piggybacked product join
- Inclusion product join
Used only for a product join made on an IN term.
- Inner inclusion product join
- Outer inclusion product join
- Exclusion product join
Used only for a product join made on a NOT IN term.
Exclusion product joins with dynamic column partition elimination are not supported.
- Inner exclusion product join
- Outer exclusion product join
Processing a Product Join
The following list outlines the product join process:
- Cache the left relation rows.
- Join each row of the right relation with each row from the cached left relation.
An overall join condition is a WHERE constraint that links relations to be joined on a column that is common to each, as shown in the following example:
The following graphic illustrates the generic product join process:
The following graphic illustrates a concrete product join:
Product Join Costs
Product joins are relatively more time consuming than other types of joins because of the number of comparisons that must be made.
The product join is usually the most costly join method available in terms of system resources, and is used only when there is not a more efficient method, such as a merge join or a nested join. However, a product join is useful because it can resolve any combination of join conditions.
Product Joins With Dynamic Row Partition Elimination
Teradata Database performs dynamic row partition elimination, or DPE, for product joins in the AMPs after a query has already been optimized, so in this case, the row partition elimination undertaken is dependent on the actual data on disk, not a statistical estimate of that data.
The term direct product join describes a join in which the table or join index of interest is not spooled in preparation for a product join, but is instead joined directly. The Optimizer might choose a direct product join when all the partitioning columns of one or more partitioning expressions are specified in equality join terms.
As is the case for static partition row elimination (see Static Row Partition Elimination), Teradata Database applies the product join DPE optimization for single-level partitioning independently for each partitioning expression.
Row partition elimination methods can be mixed within the same query. For example, static row partition elimination can be used for some partitioning levels, while DPE can be used for other levels, and some levels might not evoke any partition elimination. Some partition levels might even benefit from multiple forms of partition elimination.
The system combines the result of the partition elimination process to determine which internal partitions need to be accessed. Combining can occur for Static Partition Elimination as part of generating the query plan.
DPE occurs within the AMPs as row sets are processed, and is combined with any static row partition elimination from the query plan at that point.
The same restrictions that apply globally for single-level partitioning apply to multilevel partitioning at each level. The system does not support other forms of DPE, such as those based on conditions against subqueries.
When performing join steps, the qualifying row partitions of the row-partitioned table or join index are determined dynamically based on the values of rows selected from the other relation in the join. Instead of a product join against all the rows in the row-partitioned table, Teradata Database does a product join only for each set of rows of the other relation of a binary join that match with a single row partition. Teradata Database does not necessarily choose to use this optimization for this type of query because other join plans might have a better estimated cost depending on the demographics of the data and the form of partitioning that is used.
This extends to multilevel partitioning for cases where the partitioning column for one or more levels is in an equality condition with a column in the other relation.
For single-level row partitioning, DPE for product joins can occur when there is an equality constraint between the partitioning column of one table and a column of another relation in the join. This is useful when the system uses a product join and must look up rows in one table and match those rows to rows in corresponding row partitions instead of performing a product join to the entire table. In this case, only those row partitions that are required to answer the request take part in the join.
For a product join with DPE, the left relation is sorted by RowKey using the same partitioning expression as the right relation.
Rows from the left relation are loaded into the cache one row partition at a time. This partition is then set as the static partition for reading the right row-partitioned table. In other words, the file system returns data blocks for rows belonging to only this row partition from the right table. Once the EOF marker for the right table is reached, the system reloads the left cache with the subsequent row partition, and the process is repeated.
The AMP software does not always select this optimization for this type of query because other join plans might have a lower estimated cost, depending on the demographics of the data and the form of partitioning specified by the table definition.
DPE can occur when there is an equality constraint between the partitioning column of one table and a column of the other table in the join. This is useful when looking up a row in one table and matching those rows to corresponding row partitions using a product join instead of applying a product join on the entire table. When row partition elimination is invoked, only the partitions required to answer the request are involved in the join. While the AMP software is performing join steps, the qualifying partitions of the row-partitioned table are dynamically determined based on the values of rows from the other table. Instead of a product join against all the rows in the row-partitioned table, the system does a product join only for a row of the other table against a single partition.
For example, if there are 100 row partitions in the row-partitioned table and only 5 of them are needed to answer the join request, the system does not join the other 95 row partitions, providing a 95% resource saving for the operation.
Be sure to collect statistics on the following column sets from each table being joined:
- The primary indexes
- The system-derived PARTITION columns
- The partitioning columns of the row-partitioned table
- The column in the other table being equated with the partitioning column of the row-partitioned table
Teradata Database can also apply DPE to inner merge joins (see Slow Path Inner Merge Join and Fast Path Inner Merge Join) for single-level row partitioning if the right relation in the join is row-partitioned with a direct geography. This usually means there are equality join conditions on the primary index. The right relation must also be row partitioned by a single column with either of the following characteristics:
- A partitioning expression based on a RANGE_N function and a single range with its granularity defined by an EACH clause.
- A standalone partitioning column.
For example, assuming the partitioning column of the row-partitioned table is x, the following are all acceptable partitioning expressions:
- PARTITION BY x
PARTITION BY RANGE_N(x
With the same assumption for the partitioning column, the following are some unacceptable partitioning expressions:
- PARTITION BY (x MOD 65535) + 1
- PARTITION BY CASE_N(x<10, x<=50, NO CASE OR UNKNOWN)
- PARTITION BY RANGE_N(x BETWEEN *, 1, 3, 6
AND 10, 11
The join conditions must be conjunctive (ANDed) and there must be at least one binding on the right partitioning column of the right relation such as one of the following:
|Equality||Join on a single row partition|
|Inequality||Join on an open row partition range|
|Range||Join on a closed row partition range|
A bind term is one that connects two tables on an equality condition. For special partitions such as NO RANGE and UNKNOWN, the sliding window join algorithm is used, and the system does not apply DPE.
The Optimizer can consider a product join with dynamic row partition elimination between a row-partitioned table and another relation when a join condition is specified on the partitioning expression of the row-partitioned table. In this case, a spool is built from the other relation based on the RowKey of its join condition column, and it is duplicated across the AMPs. The product join is then made within the corresponding row partitions for the join.
Product Join With Dynamic Row Partition Elimination for Character Partitioning
For the Optimizer to apply dynamic row partition elimination for a character partitioning level with an equality join term qualifying that level, the following conditions must all be met:
- The estimated cost of the join is less than the estimated cost of all other join types.
- There must no more than one character partitioning column at the partitioning level being considered.
- If the session collation or the collation of the row-partitioned table is either MULTINATIONAL or CHARSET_COLL, and if any comparison or string function involving any non-constant expressions in the partitioning expression at the partitioning level being considered is case blind, the session collation must match the collation of the row-partitioned table.
The following functions and attribute are case blind.
- LOWER function
- SOUNDEX function
- UPPER function
- UPPERCASE attribute
The following functions are case sensitive.
Presence of the concatenation operator (||) marks the expression as having the property of being both case sensitive and case blind.
The following functions follow the same rules as comparison operators, function input arguments, and the default case sensitivity for the session mode that was in effect when the CPPI was created or modified, and the system examines them all to determine case sensitivity:
Specifying the SUBSTRING function does not affect case sensitivity.
- If the WHERE clause predicate that qualifies a row partitioning level for a product join with DPE is case blind, then all comparisons and all string functions in any non-constant expressions in the partitioning expression for that level must also be case blind.
The WHERE clause predicate is considered to be case blind if any of the comparisons or string functions involving non-constant expressions in the condition is case blind.
For the Optimizer to specify a product join with DPE for a given row partitioning level, the collation for the current session need not match the collation for the row-partitioned table when all equality join terms on character partitioning columns are case sensitive.