16.10 - Direct Row-partitioned PI Merge Join - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

About the Direct Row-partitioned PI Merge Join

The term direct merge join describes a join method in which the table or join index of interest is not spooled in preparation for a merge join, but instead is done directly. The Optimizer might choose a direct merge join when at minimum all columns of the primary index are specified in equality join terms.

To qualify for a direct row-partitioned PI merge join, there must be equality conditions on all the primary index columns of the two relations. This applies equally to character and non-character row-partitioned PIs. There are several forms of this optimization. The particular form selected by the Optimizer depends on factors such as the following:

  • Any additional conditions in the query
  • The total number of row partitions
  • The number of populated row partitions

In the following example, the Optimizer can choose to do a direct merge join of markets and market_penetration instead of redistributing both tables to spool, sorting the spool in hash order of the primary index, and then doing a row hash merge join.

The example uses the following table definitions:

     CREATE TABLE markets (
       productid       INTEGER NOT NULL,
       region          BYTEINT NOT NULL,
       activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       revenue_code    BYTEINT NOT NULL,
       business_sector BYTEINT NOT NULL,
       note            VARCHAR(256))
     PRIMARY INDEX (productid, region)
     PARTITION BY (
     RANGE_N(region          BETWEEN 1
                             AND     9
                             EACH    3),
     RANGE_N(business_sector BETWEEN 0
                             AND    49
                             EACH   10),
     RANGE_N(revenue_code    BETWEEN 1
                             AND    34
                             EACH    2),
     RANGE_N(activity_date   BETWEEN DATE '1986-01-01'
                             AND     DATE '2007-05-31'
                             EACH INTERVAL '1' MONTH));
     CREATE TABLE market_penetration (
       productid       INTEGER NOT NULL,
       region          BYTEINT NOT NULL,
       activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       revenue_code    BYTEINT NOT NULL,
       business_sector BYTEINT NOT NULL,
       saturation      FLOAT)
     PRIMARY INDEX (productid, region)
     PARTITION BY (
     RANGE_N(region BETWEEN 1
                    AND     9
                    EACH    3),
     RANGE_N(business_sector BETWEEN 0
                             AND    49
                             EACH   10),
     RANGE_N(revenue_code    BETWEEN 1
                             AND    34
                             EACH    2),
     RANGE_N(activity_date   BETWEEN DATE '1986-01-01'
                             AND    DATE '2007-05-31'
                             EACH INTERVAL '1' MONTH));

The example request joins markets and market_penetration. Because of the specified conditions, the Optimizer is able to select a direct row-partitioned PI-to-row-partitioned PI merge join to join the relations.

     SELECT a.*, b.saturation
     FROM   markets AS a INNER JOIN market_penetration AS b
     WHERE  a.productid       = b.productid
     AND    a.region          = b.region
     AND    a.business_sector = b.business_sector
     AND    a.revenue_code    = b.revenue_code
     AND    a.activity_code   = b.activity_code;