Direct Row-partitioned PI Merge Join | VantageCloud Lake - Direct Row-partitioned PI Merge Join - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

The term direct merge join describes a join method in which the table or join index of interest is not spooled in preparation for a merge join, but instead is done directly. The Optimizer may choose a direct merge join when at minimum all columns of the primary index are specified in equality join terms.

To qualify for a direct row-partitioned PI merge join, there must be equality conditions on the primary index columns of the two relations. This applies to character and noncharacter row-partitioned PIs. There are multiple forms of this optimization. The form selected by the Optimizer depends on factors such as the following:
  • Any additional conditions in the query
  • The total number of row partitions
  • The number of populated row partitions

In the following example, the Optimizer can choose to do a direct merge join of markets and market_penetration instead of redistributing both tables to spool, sorting the spool in hash order of the primary index, and then doing a row hash merge join.

The example uses the following table definitions:

CREATE TABLE markets (
  productid       INTEGER NOT NULL,
  region          BYTEINT NOT NULL,
  activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
  revenue_code    BYTEINT NOT NULL,
  business_sector BYTEINT NOT NULL,
  note            VARCHAR(256))
PRIMARY INDEX (productid, region)
PARTITION BY (
RANGE_N(region          BETWEEN 1
                        AND     9
                        EACH    3),
RANGE_N(business_sector BETWEEN 0
                        AND    49
                        EACH   10),
RANGE_N(revenue_code    BETWEEN 1
                        AND    34
                        EACH    2),
RANGE_N(activity_date   BETWEEN DATE '1986-01-01'
                        AND     DATE '2007-05-31'
                        EACH INTERVAL '1' MONTH));
CREATE TABLE market_penetration (
  productid       INTEGER NOT NULL,
  region          BYTEINT NOT NULL,
  activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
  revenue_code    BYTEINT NOT NULL,
  business_sector BYTEINT NOT NULL,
  saturation      FLOAT)
PRIMARY INDEX (productid, region)
PARTITION BY (
RANGE_N(region BETWEEN 1
               AND     9
               EACH    3),
RANGE_N(business_sector BETWEEN 0
                        AND    49
                        EACH   10),
RANGE_N(revenue_code    BETWEEN 1
                        AND    34
                        EACH    2),
RANGE_N(activity_date   BETWEEN DATE '1986-01-01'
                        AND    DATE '2007-05-31'
                        EACH INTERVAL '1' MONTH));

The example request joins markets and market_penetration. Because of the specified conditions, the Optimizer is able to select a direct row-partitioned PI-to-row-partitioned PI merge join to join the relations.

SELECT a.*, b.saturation
FROM   markets AS a INNER JOIN market_penetration AS b
WHERE  a.productid       = b.productid
AND    a.region          = b.region
AND    a.business_sector = b.business_sector
AND    a.revenue_code    = b.revenue_code
AND    a.activity_code   = b.activity_code;