15.10 - Hash Join - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname: Teradata Database
vrm_release: 15.10
category: Programming Reference, User Guide
featnum: B035-1142-151K

Hash join is a method that performs better than merge join (see “Merge Join” on page 405) and equality product join (see “Product Join” on page 398) under certain conditions. Hash join is applicable only to equijoins. The performance enhancements gained with the hybrid hash join come mainly from eliminating the need to sort the tables to be joined before performing the actual join operation. In some circumstances, an in-memory hash join provides better performance by making better use of the cache, by evaluating join conditions in bulk, and by using SIMD instructions.

Note: Unless otherwise noted, the general information about hash joins in the hash join sections also applies to in-memory hash joins.

Teradata Database can use the various forms of hash join to handle the following join types.

  • Hash inner joins (classic, dynamic hash join, and dynamic hash join with dynamic row partition elimination)
  • Hash inclusion semijoins (dynamic hash join only; does not apply to in-memory hash joins)
  • Hash exclusion semijoins (dynamic hash join only; does not apply to in-memory hash joins)
  • Left, right, and full outer hash joins (classical and dynamic hash joins only; does not apply to in-memory hash joins)
  • Hash joins with dynamic row partition elimination (dynamic hash join only)
    See “Classic Hash Join” on page 436 and “Dynamic Hash Joins” on page 443 for further information about these hash join types.

    Depending on whether the large table in the join must be spooled, the Optimizer might substitute a dynamic hash join (see “Dynamic Hash Joins” on page 443) or an in-memory dynamic hash join (see “Dynamic In-memory Hash Joins” on page 443) in place of a standard hash join. If join conditions are specified on a column set that is not the primary index, then Teradata Database must do some relocation of rows prior to the sort operation.

    Hash joins, like other join methods, perform optimally when the statistics on the join columns are current. This is particularly important for hash join costing to assist the Optimizer in detecting skew in the data (see “Effects of Data Skew on Hash Join Processing” on page 439 for details about the negative effect of skewed data on hash join performance).

    Like other join methods, hash joins are costed, and the Optimizer compares their cost to the cost of other eligible join methods and plans to determine the least costly plan for a query.

    The Optimizer also decides whether or not to create in-memory optimized spools.

    In-memory optimized spools are column-partitioned spools with a maximum of 4 column partitions, with the following contents:

  • A partition for the hash values.
  • A partition for the fixed-length join column with the best selectivity that is used in an equality join condition.
  • A partition for all the fixed length columns used in the equality join condition.
  • A partition for all the variable-length columns, residual join condition columns, and projected columns.
    Describing the hash join method requires several new terms, which are defined in the following table.

     

    Build Table: The smaller of the two join tables. So named because it is used to build the hash table.

    Fanout: The maximum number of hash join partitions to be created. When rows are fanned out, they are partitioned into several hash partitions.

    Hash join or in-memory hash join: A join algorithm in which a hash table is built for the smaller of the two tables being joined, based on the join column set. The larger table is then used to probe the hash table to perform the join.

    Hash table: A memory-resident table composed of several hash buckets, each of which holds an ordered linked list of the rows belonging to a particular hash range.

    In-memory hash table: A memory-resident table built from the left input, designed to increase cache efficiency during join comparisons and to enable SIMD instruction execution.

    Probe table: The larger of the two join tables in the hash join. Rows from this table are used to probe the hash table for a hash match before the join conditions are evaluated.

    Partition: A segment of a table in a hash join. Tables are segmented into a number of hash join partitions using the equation described in “Assigning Rows to a Hash Join Partition” on page 439. Hash join partitions can be memory-resident, in which case they also constitute the hash table, or spooled. The limit on the number of hash join partitions for a hash join operation is 50. Note that hash join partitions have no relationship to the partitions of a row-partitioned or column-partitioned table or join index (see SQL Data Definition Language and Database Design); they are entirely different, unrelated structures.

    Spillover: During the hash table building process, Teradata Database allocates one 64KB segment at a time for storing the hash table rows. When the maximum number of allocated segments (MaxHTSegs) is reached, spillover occurs: the hash join must begin using the right-table row hashes to probe the hash table for matching rows before the remaining build rows can be loaded.
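The build/probe mechanics that these terms describe can be sketched as follows (Python; the `row_hash` function and row layout are illustrative stand-ins, not Teradata internals):

```python
from collections import defaultdict

def row_hash(value, buckets=1024):
    # Stand-in for Teradata's row hash; any deterministic hash works here.
    return hash(value) % buckets

def classic_hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two row lists on an equality condition.

    build_rows: the smaller table, loaded into a memory-resident hash table.
    probe_rows: the larger table, used to probe the hash table.
    """
    # Build phase: each hash bucket holds the build rows in its hash range.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[row_hash(row[build_key])].append(row)

    # Probe phase: each probe row visits only its own bucket, then the
    # actual join condition is evaluated (a hash match alone is not enough).
    result = []
    for row in probe_rows:
        for match in hash_table[row_hash(row[probe_key])]:
            if match[build_key] == row[probe_key]:
                result.append({**match, **row})
    return result

dept = [{"location": "NYC", "deptname": "Sales"}]
emp = [{"location": "NYC", "empnum": 1}, {"location": "LA", "empnum": 2}]
joined = classic_hash_join(dept, emp, "location", "location")
```

Because probing touches only one bucket per probe row, the larger table never needs to be sorted, which is the source of the performance gain over merge join.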

    The Optimizer might use a hash join instead of a merge join for better performance in the following situations.

  • At least one join key is not indexed.
  • To provide a 10-40% performance improvement for the join step.

    Note: Because a single hash join is only part of a request, you might not see a 40% performance improvement for the entire request.

    By using a hash table, hash joins eliminate the sort used prior to the merge join.

    See “Controlling the Size of Hash Tables and In-memory Hash Tables” on page 440 for more information about optimizing the performance of hash joins.

    The Optimizer costs hash joins and in-memory hash joins. That is, the Optimizer evaluates the relative costs of available join methods to determine the least expensive method of joining two tables.

    In addition, Teradata Database supports dynamic hash join. In this variation of the hash join, the row hash code is computed dynamically instead of the join creating a spool with the row hash code based on the join conditions. Also see “Dynamic Hash Joins” on page 443.

    If the small table is small enough to fit into one hash partition and it is being duplicated to all AMPs, the redistribution of the large table can be eliminated by this type of dynamic hash join. In such a case, the large table is read directly, without spooling, redistributing, or sorting, and the hash join is performed between the duplicated small table spool and the directly read large table.

    Expected performance improvements come from, but are not limited to, the following:

  • Allowing hash joins and dynamic hash joins to be considered as a join option by costing.
  • Using dynamic hash joins, which eliminate large table spooling.

    Classic hash join is applicable when the entire build table fits into available memory. The system reads rows from the build relation directly into a memory-resident hash table. The process then reads rows from the probe table and uses them to individually probe the hash table for a match. A result row is returned for each hash match that satisfies all the join conditions. No partitioning, and thus no partitioning-related I/O, is incurred in this case. Consequently, it is the fastest form of hash join.

    Teradata Database uses a variant of classic hash join commonly referred to as hybrid hash join (developed by DeWitt et al., 1984 as a refinement of the GRACE hash join of Kitsuregawa et al., 1983; see Yu and Meng, 1998, for a concise description of the differences between the classic hash join and the hybrid hash join) as well as a variant of the hybrid hash join referred to as dynamic hash join (see “Dynamic Hash Joins” on page 443). The method deals with build tables that are too large to fit into available memory by dividing them into chunks called hash join partitions that are small enough to fit into memory (see “Partitioning the Smaller Table of a Hash Join Pair” on page 438 for details about how the system does this).

    The classic hash join can be applied to left, right, and full outer joins as well as inner joins. Geospatial column terms are not permitted for outer join conditions.

    The hybrid hash join algorithm follows this process:

    1 Read a row from the right table, which is an unsorted spool containing the row hash value for each row as well as its row data.

    2 Match each right row with all the left table rows having the same row hash.

    3 Join the rows.

    The following graphic illustrates the hybrid hash join process.

    Hash bucket entries cluster the entries with the same rowhash value together. This facilitates faster searches for a row using its row hash value because each hash bucket entry only needs to be visited once.

    All of the considerations described for classic hash joins also apply to in-memory hash joins. There are a few differences, one of which is that the amount of memory in an in-memory hash join can go up to a default value of 100 MB.

    The in-memory hash join algorithm is designed for bulk processing; that is, each step of the process described in “Classic Hash Join” on page 436 is enhanced to operate on multiple rows. This algorithm improves the memory access pattern and thus cache efficiency.

    Another difference is in the hash table data structure. The hash table structure for in-memory hash joins follows the in-memory spool partition format and stores values for each partition together.

    A hash table is derived from the smaller table in a hash join pair. It is a single‑partition, memory‑resident data structure that contains a hash array as well as the rows of the smaller table that the hash array points to. The system organizes the smaller table in a hash join operation as an ordered linked list of rows that belong to a particular range of hash values. Depending on its size, this table can be decomposed into several smaller partitions.

    When the smaller table is too large to fit into the memory available for hash join processing, the system splits it into several smaller, range‑bounded partitions. Each partition is small enough to fit into the available space. Partition size is controlled by the settings of several parameters in the DBS Control record (see “Hash Join Control Variables” on page 440). The default partition size for a non‑duplicated (larger) table is roughly 51 KB, while the default partition size for a duplicated (smaller) table is roughly 204 KB. In-memory hash join partition size is dynamically determined by the Optimizer, but the maximum number of partitions remains 50.

    A table can be divided into a maximum of 50 partitions. Partitioning avoids the maintenance overhead that would otherwise be required for virtual memory management whenever a hash bucket overflow condition occurs.

    The system segments the smaller table using an algorithm that uses the join columns to hash rows into the number of hash join partitions required to make the join (see “Assigning Rows to a Hash Join Partition” on page 439).

    For example, suppose 6 hash join partitions are needed to make the join. That is, the qualifying rows of the smaller table occupy roughly six times the space of the largest single hash join partition that can fit into the available memory. The system then hashes each row into one of the 6 hash join partitions.

    The system spools and partitions the larger table in the same way. Although the partitions for the large table are also larger, they need not fit into memory. When the system makes the join, it brings a hash join partition of the smaller table, which is copied to a spool, into memory. Rows from the matching hash join partition in the other table, which is also in a spool, can then be matched to the rows in memory.

    Note that a row from one hash join partition cannot match a row in the other table that is in a different hash join partition because their respective hash values are different.

    Each left table hash join partition is then hash-joined in turn with its corresponding right table partition. The graphic in the section on “Classic Hash Join” shows partition 2 of the triple‑partitioned left table being hash-joined with hash join partition 2 of the triple‑partitioned right table.

    Hash join partitions are created by hashing the left and right table rows on their join columns in such a way that rows from a given left table hash join partition can only match with rows in the corresponding right table hash join partition, which is also illustrated in the graphic in the section on “Classic Hash Join.” The process of creating the hash join partitions is referred to as fanning out in EXPLAIN reports (see “Hash Join Examples” on page 441, where the phrases that describe the hash join and the partitioning of the hash join tables are highlighted in boldface).

    When the number of qualifying rows in a hash join table is too large to fit into the available memory, the system assigns groups of its rows to smaller table hash join partitions using the following equation:

        partition_number = row_hash_value MOD fanout

    where:

     

    partition_number: the number of the memory‑resident hash join partition to which a row from the table is assigned.

    row_hash_value: the row hash value for the row in question. See Database Design for details.

    MOD: the modulo function.

    fanout: the number of hash join partitions to be created for the hash join operation. fanout is the minimum value for which each of the resulting hash join partitions is small enough to fit into the available hash table memory.

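The assignment equation can be sketched directly; the fanout of 6 echoes the earlier example, and the hash values are illustrative:

```python
def assign_partition(row_hash_value, fanout):
    # partition_number = row_hash_value MOD fanout
    return row_hash_value % fanout

# Rows with equal join-column hashes always land in the same partition
# number on both sides of the join, so cross-partition matches are
# impossible and each partition pair can be joined independently.
fanout = 6
left_hashes = [10, 11, 27, 99]
right_hashes = [10, 27]
left_parts = [assign_partition(h, fanout) for h in left_hashes]
right_parts = [assign_partition(h, fanout) for h in right_hashes]
```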
    Data skew in the build table can seriously degrade the performance of hash joins. One of the premises of hash join is that the hashing algorithm is good enough to ensure that the build relation can be reduced into relatively equivalent-sized partitions. When there is a large quantity of duplicate row values in the build table, the hash partitioning algorithm might not partition it optimally. Skew in the probe table can also degrade performance if it results in the probe table being smaller than the build table.

    To make allowances for possible skew in either table in a hash join operation, you can use the DBS Control utility to size build table partitions proportionately to their parent hash table (see “SkewAllowance” on page 441).

    If the specified skew allowance is insufficient to correct for data skew, and hash table bucket overflow occurs, then the system matches rows from the corresponding probe table against build table rows that are already loaded into the memory‑resident hash table. After all the probe partition rows have been processed, the system clears the hash table and moves more rows from the oversized build table partition into memory. The system then re-reads rows from the probe partition and matches them against the freshly loaded build table rows. This procedure repeats until the entire build partition has been processed.
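The overflow handling described above amounts to processing an oversized build partition in memory-sized chunks, re-reading the probe partition once per chunk. A minimal sketch, assuming a row-count limit in place of the real byte-based hash table limit:

```python
def join_oversized_partition(build_partition, probe_partition, key, memory_rows):
    """Join a build partition too large for the memory-resident hash table.

    memory_rows: how many build rows fit in the hash table at once
    (an illustrative stand-in for the real byte-based limit).
    """
    result = []
    # Load the build partition chunk by chunk; the probe partition is
    # re-read once per chunk, as the document describes.
    for start in range(0, len(build_partition), memory_rows):
        hash_table = {}
        for row in build_partition[start:start + memory_rows]:
            hash_table.setdefault(row[key], []).append(row)
        for probe_row in probe_partition:
            for match in hash_table.get(probe_row[key], []):
                result.append({**match, **probe_row})
    return result

build = [{"k": 1, "v": i} for i in range(5)]   # 5 build rows, same key
probe = [{"k": 1, "p": "x"}]
rows = join_oversized_partition(build, probe, "k", memory_rows=2)
```

With 5 build rows and room for 2 at a time, the probe partition is scanned three times, which is why an insufficient skew allowance degrades performance.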

    For in-memory hash joins, data skew in the build table can also degrade join performance. Skew is more problematic for in-memory hash joins due to the higher default size of 100 MB for an in-memory hash table. With higher hash table sizes, the Optimizer may choose direct access for the right table. But if there is hash table overflow, then a larger right table must be scanned multiple times. When using smaller hash table sizes, the planner may choose to partition the two inputs if the available memory is deemed insufficient to hold the entire left input. During in-memory hash join planning, the Optimizer adjusts the cost more conservatively for skew by inflating the number of rows.

    You can control the size of the hash table using the HTMemAlloc and HTMemAllocBase DBS Control fields (see “HTMemAlloc” on page 441). If you specify a value of 0, the system cannot build a hash table. This effectively turns off hash join, and the Optimizer does not consider the method when it is doing its join plan evaluations.

    If you need to increase the size of the in-memory hash table, contact your Teradata Global Support Center representative.

    You can access the hash join‑related parameters HTMemAlloc and SkewAllowance using the DBS Control utility (see Utilities). Use them to optimize the performance of your hash joins. HTMemAllocBase is an internal DBS Control field that can only be accessed by Teradata support personnel. Contact your Teradata technical support team if you suspect the value of this field needs to be changed.

    Note: HTMemAlloc does not apply to in-memory hash joins; SkewAllowance is used for in-memory hash joins.

     

    HTMemAlloc: Varies the hash table size by calculating a percentage of the HTMemAllocBase value. The larger the specified percentage, the larger the hash table. The default is 10. A setting of zero prevents hash joins from being used as optimizations. This setting should be changed only under the direction of the Teradata Global Support Center.

    HJ2IMHJ: Determines whether the Optimizer preferentially chooses the in-memory hash join method for performing hash joins and dynamic hash joins. The default value is FALSE (not enabled). When set to FALSE, the Optimizer costs an in-memory hash join like any other join method and picks it based on cost. When set to TRUE, the Optimizer uses the in-memory hash join method whenever possible, in preference to any other hash join method, regardless of cost. A change in the setting becomes effective after the DBS Control Record is written.

    SkewAllowance: Provides for data skew by making the size of each partition smaller than the hash table. Valid input values range from 20 to 80 (percent). The default is 75 (percent), which sets the partition size to 25 percent of the hash table so that the actual partition can grow to four times the Optimizer's estimate before it becomes too large to fit into the hash table.

    For more information about DBS Control fields, see Utilities.
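The relationship between HTMemAlloc, HTMemAllocBase, and SkewAllowance reduces to simple percentage arithmetic. The HTMemAllocBase value below is purely illustrative, since the real value is an internal setting accessible only to Teradata support:

```python
def hash_table_size(ht_mem_alloc_percent, ht_mem_alloc_base_bytes):
    # HTMemAlloc is a percentage of HTMemAllocBase.
    return ht_mem_alloc_base_bytes * ht_mem_alloc_percent / 100

def target_partition_size(hash_table_bytes, skew_allowance_percent):
    # SkewAllowance of 75 leaves partitions at 25% of the hash table, so a
    # partition can grow 4x beyond the estimate and still fit in memory.
    return hash_table_bytes * (100 - skew_allowance_percent) / 100

ht = hash_table_size(10, 10_000_000)    # default HTMemAlloc, illustrative base
part = target_partition_size(ht, 75)    # default SkewAllowance
```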

    The Optimizer decides to hash join the Employee and Department tables on the equality condition Employee.Location = Department.Location in this query. The EXPLAIN text indicates that the hash tables in Spool 2 (step 4) and Spool 3 (step 5) are segmented (fanned out) into 22 hash join partitions (see “Partitioning the Smaller Table of a Hash Join Pair” on page 438).

    Hash table memory allocation is set at 5% and skew allowance is set at 75% (see “Effects of Data Skew on Hash Join Processing” on page 439).

         EXPLAIN 
         SELECT employee.empnum, department.deptname, employee.salary
         FROM employee, department
         WHERE employee.location = department.location;
     
         ***Help information returned. 30 rows.
          ***Total elapsed time was 1 second. 
    Explanation 
    ---------------------------------------------------
    1) First, we lock a distinct PERSONNEL."pseudo table" for read on a RowHash to prevent global deadlock for PERSONNEL.Department.
    2) Next, we lock a distinct PERSONNEL."pseudo table" for read on a RowHash to prevent global deadlock for PERSONNEL.Employee.
    3) We lock PERSONNEL.Department for read, and we lock PERSONNEL.Employee for read.
    4) We do an all-AMPs RETRIEVE step from PERSONNEL.Employee by way of an all-rows scan with no residual conditions into Spool 2 fanned out into 22 hash join partitions, which is redistributed by hash code to all AMPs. The size of Spool 2 is estimated to be 3,995,664 rows. The estimated time for this step is 3 minutes and 54 seconds.
    5) We do an all-AMPs RETRIEVE step from PERSONNEL.Department by way of an all-rows scan with no residual conditions into Spool 3 fanned out into 22 hash join partitions, which is redistributed by hash code to all AMPs. The size of Spool 3 is estimated to be 4,000,256 rows. The estimated time for this step is 3 minutes and 54 seconds.
    6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to Spool 3 (Last Use). Spool 2 and Spool 3 are joined using a hash join of 22 partitions, with a join condition of (“Spool_2.Location = Spool_3.Location”). The result goes into Spool 1, which is built locally on the AMPs. The result spool field will not be cached in memory. The size of Spool 1 is estimated to be 1,997,895,930 rows. The estimated time for this step is 4 hours and 42 minutes.
    7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 4 hours and 49 minutes.
     
    DBS Control Record - Performance Fields:
    HTMemAlloc       =  5%
    SkewAllowance    = 75%

    The partial EXPLAIN from the following example shows how the Optimizer handles an in-memory hash join.

    EXPLAIN 
    SELECT pit103.c2,pit104.c3,cpt1.c2 
    FROM pit104,pit103,cpt1 
    WHERE cpt1.c4 = pit103.c4 AND pit103.c3 = pit104.c4 AND cpt1.c2 > 100;
     
      5) We execute the following steps in parallel.
           1) We do an all-AMPs RETRIEVE step from TEST.pit104 by way of an
              all-rows scan with a condition of ("NOT (TEST.pit104.c4 IS
              NULL)") into Spool 2 (all_amps), which is built locally on
              the AMPs.  Spool 2 is built as in-memory optimized spool with
              3 column partitions.  The size of Spool 2 is estimated with
              low confidence to be 20 rows (500 bytes).  The estimated time
              for this step is 0.37 seconds.
           2) We do an all-AMPs RETRIEVE step from TEST.pit103 by way of an
              all-rows scan with a condition of ("(NOT (TEST.pit103.c4 IS
              NULL )) AND (NOT (TEST.pit103.c3 IS NULL ))") into Spool 3
              (all_amps), which is duplicated on all AMPs.  Spool 3 is
              built as in-memory optimized spool with 3 column partitions.
              The size of Spool 3 is estimated with low confidence to be
              400 rows (11,600 bytes).  The estimated time for this step is
              0.37 seconds.
      6) We do an all-AMPs JOIN step from Spool 2 (Last Use), which is
         joined to Spool 3 (Last Use).  Spool 2 and Spool 3 are joined
         using a single partition in-memory hash join, with a join
         condition of ("c3 = c4").  The result goes into Spool 4 (all_amps),
         which is built locally on the AMPs.  Then we do a SORT to order
         Spool 4 by the hash code of (TEST.pit103.c4).  The size of Spool 4
         is estimated with no confidence to be 90 rows (2,250 bytes).  The
         estimated time for this step is 0.33 seconds.
      7) We do an all-AMPs BULK RETRIEVE step from 3 column partitions of
         TEST.cpt1 with a condition of ("(TEST.cpt1.c2 >= 101) AND (NOT
         (TEST.cpt1.c4 IS NULL ))") into Spool 5 (all_amps), which is
         duplicated on all AMPs.  Then we do a SORT to order Spool 5 by the
         hash code of (TEST.cpt1.c4).  The size of Spool 5 is estimated
         with no confidence to be 140 rows (2,940 bytes).  The estimated
         time for this step is 0.11 seconds.
      8) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a
         RowHash match scan, which is joined to Spool 5 (Last Use) by way
         of a RowHash match scan.  Spool 4 and Spool 5 are joined using a
         merge join, with a join condition of ("c4 = c4").  The result goes
         into Spool 1 (group_amps), which is built locally on the AMPs.
         The size of Spool 1 is estimated with no confidence to be 67 rows
         (3,618 bytes).  The estimated time for this step is 0.56 seconds. 

    Dynamic hash join provides the ability to do an equality join directly between a small table and a large table on non‑primary index columns without placing the large table into a spool. For dynamic hash join to be used, the left table must be small enough to fit in a single hash join partition.

    Dynamic hash join can be used only when two tables are joined on non‑primary index columns and one table, referred to as the left table, is very small compared to the other.

    The process is as follows:

    1 Duplicate the smaller table.

    2 Place the smaller table in a hash array.

    3 Read a row from the right table.

    4 Compute the row hash code.

    5 Match each right row with all rows in the hash array having the same row hash.

    6 Join the matching rows.

    This is faster than redistributing the small table, putting it into the hash array, reading the large table, building the row hash code, writing the data out to spool, and then reading the table in again to do the hash join.
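The six steps above can be sketched as follows (Python; duplication to all AMPs is represented simply by passing the whole small table, and Python's built-in `hash` stands in for the row hash function):

```python
def dynamic_hash_join(small_rows, large_rows, small_key, large_key):
    # Steps 1-2: the duplicated small table is placed into a hash array.
    hash_array = {}
    for row in small_rows:
        hash_array.setdefault(hash(row[small_key]), []).append(row)

    # Steps 3-6: the large table is read directly -- no spooling,
    # redistribution, or sorting -- with each row hash computed on the fly.
    result = []
    for row in large_rows:
        h = hash(row[large_key])                    # step 4: dynamic row hash
        for match in hash_array.get(h, []):
            if match[small_key] == row[large_key]:  # step 5: confirm the match
                result.append({**match, **row})     # step 6: join the rows
    return result

small = [{"loc": "A", "deptname": "Sales"}]
large = [{"loc": "A", "empnum": 10}, {"loc": "B", "empnum": 11}]
res = dynamic_hash_join(small, large, "loc", "loc")
```

The saving relative to a standard hash join is visible in the loop structure: the large table is consumed in a single pass with no intermediate spool.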

    The dynamic hash join can be applied to left, right, and full outer joins as well as inner joins, and can also take advantage of equality conditions for dynamic partition elimination. Geospatial column terms are not permitted for outer join conditions.

    The inclusion hash join and the exclusion hash join can also use dynamic partition elimination when you specify an equality condition.

    All of the considerations described for dynamic hash joins also apply to in-memory hash joins. There are a few differences, one of which is that the amount of memory in an in-memory hash join can go up to the default value of 100 MB or up to the value set (see “Controlling the Size of Hash Tables and In-memory Hash Tables” on page 440).

    The in-memory dynamic hash join algorithm is designed for bulk processing; after the first 2 steps described in “Dynamic Hash Joins” on page 443, each subsequent step is enhanced to operate on multiple rows. This algorithm improves the memory access pattern and thus cache efficiency.

    The hash table data structure that is used for an in-memory dynamic hash join is the same as that used for the classic in-memory hash join.


    You can optimize system usage of hash joins with the following DBS Control fields.

     

    DBS Control Field    Best Initial Setting
    HTMemAlloc           2
    SkewAllowance        75

    Note: Changing HTMemAlloc does not affect in-memory hash joins.

     

    For information on setting hash join-related DBS Control fields, see the DBS Control fields “HTMemAlloc,” “HJ2IMHJ,” and “SkewAllowance” in Utilities.

    For information on strategies for using hash join-related DBS Control fields, see “Adjusting Skew Allowance” in Database Administration.

    The following are recommendations when using in-memory hash joins:

  • Make the most selective join column a fixed-length column, and make that fixed-length column either a column partition by itself or part of a narrow partition.
  • Collect multicolumn statistics if join conditions involve multiple columns, because incomplete statistics can make the estimated cardinality of the table incorrect. This can lead the Optimizer to assume that the table fits in the available memory (by default, 100 MB), so it does not cost hash table overflows, which results in an artificially low cost estimate. The same situation may not arise with a standard hash join, because it uses less memory for building the hash table and also uses fanout, which performs better. For these reasons, collect multicolumn statistics on the join columns of the left table.
  • Keep your statistics updated to avoid the problem described in the previous bullet.

    Teradata Database supports left outer hash joins, right outer hash joins, and full outer hash joins, but they are not supported for in-memory hash join methods. The processing for each of these join types is described in the following topics.

    The process followed in doing a left outer hash join, where the outer relation is on the left, is as follows.

    1 Read all rows from the left relation to build the hash table.

    Determine if there is spillover.

    2 Sequentially read a row from the right relation.

    3 Use the row hash for the right relation row to probe the hash table for a left matching row.

    4 If a left relation matching row is found, join the left relation row with the right relation row and return it.

    Set the bit flag of the row header in memory to mark the left row as having been matched.

    5 Repeat step 2 until there are no more right relation rows to read.

    6 Reread the hash table entries to find all of the rows whose bit flags are not marked as matched.

    7 Join the left relation non‑qualifying row with a row of nulls and return the row.

    8 If there is spillover, repeat the whole process starting from step 1 to build the hash table using spillover rows.

    If there are no spillover rows, the outer hash join process is complete.
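The left outer process above can be sketched as follows (Python; a boolean stands in for the bit flag in the row header, and the no-spillover path is assumed):

```python
def left_outer_hash_join(left_rows, right_rows, key):
    """Return (left, right) pairs; unmatched left rows pair with None."""
    # Step 1: build the hash table from the left (outer) relation.
    table = {}
    for row in left_rows:
        table.setdefault(row[key], []).append([row, False])  # row, matched flag

    result = []
    # Steps 2-5: probe with each right relation row in turn.
    for right in right_rows:
        for entry in table.get(right[key], []):
            result.append((entry[0], right))
            entry[1] = True  # bit flag: this left row has matched

    # Steps 6-7: unmatched left rows are joined with a row of nulls.
    for entries in table.values():
        for row, matched in entries:
            if not matched:
                result.append((row, None))
    return result

left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 1, "b": "p"}]
res = left_outer_hash_join(left, right, "k")
```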

    The process followed in doing a right outer hash join, where the outer relation is on the right, is as follows.

    1 Read all rows from the left relation to build the hash table and determine if there is spillover.

    2 Sequentially read a right relation row.

    3 Use the rowhash for the right relation row to probe the hash table.

  • If the right relation row matches a left relation row:

    i Join the right relation row with the left relation row and return it.

    ii If there is spillover and the bit spool flag for the right row is not already set, set it to mark the right relation row as matched.

  • If the right relation row does not match any left relation row and there is no spillover, join the right non‑qualifying row with a row of nulls.

    4 Repeat step 2 until there are no more right relation rows to read.

    5 If there is spillover, repeat the whole process starting from step 1 to build a hash table using the spillover rows.

    After all spillover rows have been processed, re‑read the right relation rows to find the non‑matching rows and return them.

    The processing logic for the full outer hash join is as follows:

    1 Read all rows from the left relation to build the hash table and determine if there is spillover.

    2 Read a row from the right relation.

    3 Use the rowhash for the right relation row to probe the hash table.

  • If there is a matching left relation row, join it with its matching right relation row and return the resulting row.
  • If there is spillover, set the right relation bitmap to mark the right relation row as matched.
  • Set the bit flag of the left relation row header in memory as matched.
    4 Repeat steps 2 and 3 until there are no more right relation rows to read.

    5 Reread the hash table entries, find all rows whose bit flags are not marked as matched, and join those left relation rows with a row of nulls, then return the resulting rows.

    6 If there is spillover, repeat the process starting at step 1 to build a new hash table using the spillover rows.

    7 After all spillover rows are processed, reread the right relation rows and use the bitmap to find the non‑matching rows.

    Join those right relation rows with a row of nulls, then return the resulting rows.
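The full outer sequence above can be sketched for the no‑spillover case. This is a simplified sketch, not Teradata code: `full_outer_hash_join` is a hypothetical name, the `"matched"` flag stands in for the bit flag in the in‑memory row header, and the spillover bitmap (stages 6 and 7) is omitted.

```python
from collections import defaultdict

def full_outer_hash_join(left_rows, right_rows, key):
    # Stage 1: build the hash table over the left relation; each entry
    # carries a "matched" flag, mirroring the row-header bit flag.
    table = defaultdict(list)
    for row in left_rows:
        table[row[key]].append({"row": row, "matched": False})
    out = []
    # Stages 2-4: probe with each right relation row.
    for r in right_rows:
        entries = table.get(r[key])
        if entries:
            for e in entries:
                e["matched"] = True          # mark the left row as matched
                out.append((e["row"], r))
        else:
            out.append((None, r))            # unmatched right row joined with nulls
    # Stage 5: reread the hash table; join unmatched left rows with nulls.
    for entries in table.values():
        for e in entries:
            if not e["matched"]:
                out.append((e["row"], None))
    return out
```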

    An exclusion hash join is a hash join where only the rows that do not satisfy (meaning that they are NOT IN) any condition specified in the request are joined.

    In other words, exclusion hash join finds rows in the first table that do not have a matching row in the second table.

    Exclusion hash join is an implicit form of outer join. Geospatial column terms are not permitted for outer join conditions, and exclusion hash joins are not supported for in-memory hash joins.

    The high‑level process for executing an exclusion hash join is as follows.

  • If the first join column of any inner table row is null, no rows from the outer table can qualify.
  • If the first column of the outer row is null, do not return the outer row unless the inner table is empty.
  • If a column other than the first column of either the outer row or the inner row is null, the result is determined according to the row comparison rule defined by the ANSI SQL:2011 specification.

    The rule is as follows.

  • Two row value expressions are equal if all their respective values are equal.
  • If two row value expressions are not equal, their relation is determined by the comparison of the first pair of respective values from the left end of the row value expressions for which the equal comparison yields either an UNKNOWN or FALSE result.
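The row comparison rule above can be sketched under three‑valued logic. The helper name `row_not_equal` is hypothetical: it scans the value pairs from the left and lets the first pair whose equality comparison is FALSE or UNKNOWN (here, `True` for not‑equal and `None` for UNKNOWN) decide the result.

```python
def row_not_equal(left, right):
    # Compare two row value expressions of equal degree under
    # three-valued logic: True (not equal), False (equal), None (UNKNOWN).
    for a, b in zip(left, right):
        if a is None or b is None:
            return None    # a null makes this pair's equality UNKNOWN
        if a != b:
            return True    # first FALSE equality comparison decides: not equal
    return False           # all respective values equal -> rows are equal
```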
    The Optimizer undertakes the following, more detailed, process for an exclusion hash join when the outer relation is on the right:

    1 Read the rows in the left, or inner, relation for a null in the first join column.

    If a null is found, quit the exclusion hash join and return no rows.

    2 Read the left relation rows to build the hash table. Continue to read the rows until either the end of the file is reached or spillover has occurred.

    3 Read a right relation row.

  • If the first column of the right relation row is null, do not join this row.
  • If there is spillover, mark the bitspool flag for this row as matched so that it is not returned.
  • Go to stage 5.

    4 Use the rowhash value for the right relation row to probe the hash table.

  • If the right relation row matches the left relation row, stop probing for this right relation row.
  • If there is spillover, mark the bitspool as matched for this right relation row.

    Repeat stage 3 to read the next right relation row.

  • If a column other than the first is null for either row, compare the columns according to the ANSI SQL:2011 rule defined in the third bullet under “Exclusion Hash Join” on page 446.
  • If the comparison result is undefined, do not return the outer row; if there is spillover, mark the bitspool as matched so the outer row is not returned later.

    If the comparison result is not a match and there is no spillover, return the outer row.

    5 Repeat stage 3 until there are no more right relation rows to read.

    6 If there is spillover, repeat stage 2 to build the hash table using the remaining left relation rows.

    If there are no more left relation rows to build the hash table with and there was spillover, reread the right relation rows to find the non‑matching rows by checking the bitspool flags for those rows.

    Return any right relation rows that do not have their corresponding bitspool flag marked as matched.
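The NOT IN semantics of stages 1 through 6 can be sketched for the no‑spillover, single‑join‑column case. This is a simplified illustration, not Teradata code: `exclusion_hash_join_right_outer` is a hypothetical name, and the bitspool and ANSI multi‑column comparison are omitted.

```python
def exclusion_hash_join_right_outer(inner_rows, outer_rows, key):
    # Stage 1: a null in the inner relation's join column disqualifies
    # every outer row under NOT IN semantics, so return no rows.
    inner_keys = set()
    for row in inner_rows:
        if row[key] is None:
            return []
        inner_keys.add(row[key])
    out = []
    # Stages 3-5: return outer rows with no matching inner row. An outer
    # row with a null join column is returned only if the inner relation
    # is empty (NOT IN over an empty set is TRUE).
    for r in outer_rows:
        if r[key] is None:
            if not inner_keys:
                out.append(r)
        elif r[key] not in inner_keys:
            out.append(r)
    return out
```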

    The following process occurs for an exclusion hash join when the outer relation is on the left:

    1 Read the rows in the right, or inner, relation for nulls in the first join column.

    If a null is found in the first join column, quit the exclusion hash join and return no rows.

    2 Read all rows from the left, or outer, relation to build the hash table.

    Read the first join column of each left relation row for nulls.

    If a null is found, skip that left relation row because it cannot be used to join rows from the two relations.

    3 Sequentially read a right relation row.

    4 Use the rowhash value for the right relation row to probe the hash table to find a hash bucket entry.

    If the flag for the hash bucket entry is marked as matched, go on to the next entry whose matched flag is not set.

  • If a matching row is found in the left relation, set the bit flag of the left relation row header in memory as matched.
  • If a column other than the first is null for either row, compare the columns according to the ANSI SQL:2011 rule defined in the third bullet under “Exclusion Hash Join” on page 446.
  • If the comparison result is undefined or a match, use the bit flag in the row header to mark it as matched so that the outer row is not returned.

    If the comparison result is not a match, do nothing.

    5 Repeat stage 3 until there are no more right relation rows to read.

    6 Reread the hash table and return any outer rows that do not have their bit flag set as matched in the row header.

    7 If there is spillover, repeat the whole process starting from stage 1 to build the hash table using spillover rows.

    If there is no spillover, the exclusion hash join is completed.
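The outer‑on‑left variant above builds the hash table over the outer relation instead and marks matched entries while probing with inner rows. A minimal no‑spillover sketch, with the hypothetical name `exclusion_hash_join_left_outer` and a `"matched"` flag standing in for the row‑header bit flag:

```python
def exclusion_hash_join_left_outer(outer_rows, inner_rows, key):
    # Stage 1: a null join column anywhere in the inner relation means
    # no outer row can qualify.
    if any(row[key] is None for row in inner_rows):
        return []
    # Stage 2: build the hash table over the left (outer) relation,
    # skipping rows with a null join column.
    table = {}
    for row in outer_rows:
        if row[key] is not None:
            table.setdefault(row[key], []).append({"row": row, "matched": False})
    # Stages 3-5: probe with each inner row; mark matching entries.
    for r in inner_rows:
        for e in table.get(r[key], []):
            e["matched"] = True
    # Stage 6: reread the hash table and return unmatched outer rows.
    return [e["row"] for entries in table.values()
            for e in entries if not e["matched"]]
```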

    An inclusion hash join is a hash join where the first right relation row that matches the left relation row is joined to it.

    Note: Inclusion hash joins are not supported for in-memory hash join methods.

    The following process describes an inclusion hash join when the outer relation is on the right:

    1 Read all rows from the left relation to build the hash table and to determine if spillover occurs.

    2 Read a row from the right relation.

    If the bitspool is set to mark the right row as matched, skip the row, and repeat stage 2 of this process.

    3 Use the rowhash for the right relation row to probe the hash table entry.

    If the right relation row matches the left relation row, return the right relation row.

    If there is spillover, mark the bitspool flag for the row as matched.

    4 Repeat stage 2 until all of the right relation rows have been read.

    5 If there is spillover, repeat stage 1 to build the hash table using spillover rows.

    If there is no spillover, the inclusion hash join is complete.
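The semijoin behavior of stages 1 through 4 reduces, in the no‑spillover case, to returning each right relation row that has at least one match in the left relation; probing stops at the first match, so a right row is returned at most once. A sketch under those assumptions (the name `inclusion_hash_join_right_outer` is hypothetical):

```python
def inclusion_hash_join_right_outer(left_rows, right_rows, key):
    # Stage 1: build the hash table over the left relation; a set of join
    # keys suffices because only the first match per right row matters.
    left_keys = {row[key] for row in left_rows}
    # Stages 2-4: probe with each right row; return it on the first match.
    return [r for r in right_rows if r[key] in left_keys]
```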

    The following process describes an inclusion hash join when the outer relation is on the left:

    1 Read all rows from the left relation to build the hash table and determine if there is spillover.

    2 Read a row from the right relation.

    3 Use the rowhash for the right relation row to probe the hash table entry.

    Skip this hash table entry if the row header bit flag is set as matched and continue to the next hash table entry.

    If the right relation row matches the left relation row, return the left relation row and set the left relation row header bit flag as matched.

    4 Repeat stage 2 until there are no more right relation rows to read.

    5 If there is spillover, repeat stage 1 to build a hash table using spillover rows.

    If there is no spillover, the inclusion hash join is complete.
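When the outer relation is on the left, the matched bit flag in the row header ensures each left row is returned at most once. A no‑spillover sketch of the stages above, with the hypothetical name `inclusion_hash_join_left_outer` and a `"matched"` flag standing in for the row‑header bit:

```python
def inclusion_hash_join_left_outer(left_rows, right_rows, key):
    # Stage 1: build the hash table over the left (outer) relation, with
    # a "matched" flag per entry playing the role of the row-header bit.
    table = {}
    for row in left_rows:
        table.setdefault(row[key], []).append({"row": row, "matched": False})
    out = []
    # Stages 2-4: probe with each right row; skip entries already marked
    # as matched so each left row is returned at most once.
    for r in right_rows:
        for e in table.get(r[key], []):
            if not e["matched"]:
                e["matched"] = True
                out.append(e["row"])
    return out
```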