15.10 - Direct Row-partitioned PI Merge Join - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
15.10
category
Programming Reference
User Guide
featnum
B035-1142-151K

The term direct merge join describes a join method in which the table or join index of interest is not spooled in preparation for a merge join, but instead is done directly. The Optimizer might choose a direct merge join when at minimum all columns of the primary index are specified in equality join terms.

To qualify for a direct row-partitioned PI merge join, there must be equality conditions on all the primary index columns of the two relations. This applies equally to character and non‑character row-partitioned PIs. There are several forms of this optimization. The particular form selected by the Optimizer depends on factors such as the following.

  • Any additional conditions in the query
  • The total number of row partitions
  • The number of populated row partitions
  • In the following example, the Optimizer can choose to do a direct merge join of markets and market_penetration instead of redistributing both tables to spool, sorting the spool in hash order of the primary index, and then doing a row hash merge join.

    The example uses the following table definitions.

         CREATE TABLE markets (
           productid       INTEGER NOT NULL,
           region          BYTEINT NOT NULL,
           activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
           revenue_code    BYTEINT NOT NULL,
           business_sector BYTEINT NOT NULL,
           note            VARCHAR(256))
         PRIMARY INDEX (productid, region)
         PARTITION BY (
         RANGE_N(region          BETWEEN 1 
                                 AND     9 
                                 EACH    3),
         RANGE_N(business_sector BETWEEN 0 
                                 AND    49 
                                 EACH   10),
         RANGE_N(revenue_code    BETWEEN 1 
                                 AND    34 
                                 EACH    2),
         RANGE_N(activity_date   BETWEEN DATE '1986-01-01' 
                                 AND     DATE '2007-05-31' 
                                 EACH INTERVAL '1' MONTH));
     
         CREATE TABLE market_penetration (
           productid       INTEGER NOT NULL,
           region          BYTEINT NOT NULL,
           activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
           revenue_code    BYTEINT NOT NULL,
           business_sector BYTEINT NOT NULL,
           saturation      FLOAT)
         PRIMARY INDEX (productid, region)
         PARTITION BY (
         RANGE_N(region BETWEEN 1 
                        AND     9 
                        EACH    3),
         RANGE_N(business_sector BETWEEN 0 
                                 AND    49 
                                 EACH   10),
         RANGE_N(revenue_code    BETWEEN 1 
                                 AND    34 
                                 EACH    2),
         RANGE_N(activity_date   BETWEEN DATE '1986-01-01' 
                                 AND    DATE '2007-05-31' 
                                 EACH INTERVAL '1' MONTH));

    The example request joins markets and market_penetration. Because of the specified conditions, the Optimizer is able to select a direct row-partitioned PI-to-row-partitioned PI merge join to join the relations.

         SELECT a.*, b.saturation 
         FROM   markets AS a INNER JOIN market_penetration AS b
         WHERE  a.productid       = b.productid
         AND    a.region          = b.region
         AND    a.business_sector = b.business_sector
         AND    a.revenue_code    = b.revenue_code
         AND    a.activity_code   = b.activity_code;