Rowkey-Based Merge Join | Join Planning/Optimization | Teradata Vantage - Rowkey-Based Merge Join - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
ykx1561500561173.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantageā„¢

A rowkey-based merge join for single-level and multilevel partitioning requires equality conditions on all the primary index columns and partitioning columns of the two relations.

To be eligible for a rowkey-based merge join, both relations must also have the same partitioning. Otherwise, one of the relations must be spooled and partitioned to impose equal partitioning between the two.

Vantage does not support full outer rowkey-based merge joins with partition remapping for row-partitioned PI tables having 8-byte partitioning.

Also see Necessary Conditions for a Rowkey-Based Merge Join With a Character-Partitioned Relation below for additional restrictions that apply for character PPIs.

Supported Join Methods for Rowkey-Based Merge Join

The following join types are supported for a rowkey-based merge join, where fast path designates row hash match scan within left and right matching partitions and slow path designates index access on one relation followed by a lookup in the other table for all rows with matching row hashes within the corresponding partition:
  • Fast path inner, left and full outer merge join.
  • Slow path inner and left outer merge join.
  • Fast path inner and outer inclusion/exclusion merge join.
  • Slow path inner and outer inclusion/exclusion merge join.
  • Fast path correlated inclusion/exclusion merge join.
  • Slow path correlated inclusion/exclusion merge join.

    Note that this type of join can only occur between a table and a spool, and not directly between two tables.

Right outer merge join is not supported for a rowkey-based merge join. Note that the Optimizer can switch the relations so that the join type is a left outer merge join.

If the internal partition mapping is not the same for both relations, a rowkey-based merge join is not eligible for synchronized scanning. If the internal partition mapping is the same for both relations, a rowkey-based merge join is eligible for synchronized scanning only for the following join types:
  • Slow path left outer merge join.
  • Slow path inner merge join.

A rowkey-based merge join can be considered for a join between a primary-indexed table and another table whose primary index is not a join condition. To make such a join, the system builds the rowkey for the other table in spool based on the partitioning expressions of the row-partitioned PI table on the other side of the join.

Unsupported Rowkey-Based Merge Join Partition Elimination Feature

Vantage does not support dynamic row partition elimination for full outer rowkey-based merge joins with partition remapping for 8-byte partitioning.

Making a Row-partitioned PI-to-Spool Join

In this example, the database can select rows from the dimension tables and join the intermediate results set to form a spool that is sorted by a rowkey matching that of the row-partitioned PI table. Then the system can do a rowkey-based merge join from the PPI table to the spool.

This method replaces having to redistribute the spool and then sorting it on the join columns, with the other tables joined and spool also sorted on the join columns, and then joining the two spools using a rowhash merge join or some other join plan. Ultimately, the Optimizer selects the plan that it estimates to be the least costly of its available options.

Definition DDL request text for the markets table.

CREATE TABLE markets (
  productid       INTEGER NOT NULL,
  region          BYTEINT NOT NULL,
  activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
  revenue_code    BYTEINT NOT NULL,
  business_sector BYTEINT NOT NULL,
  note            VARCHAR(256))
PRIMARY INDEX (productid, region)
PARTITION BY (
RANGE_N(region          BETWEEN 1
                        AND     9
                        EACH    3),
RANGE_N(business_sector BETWEEN 0
                        AND    49
                        EACH   10),
RANGE_N(revenue_code    BETWEEN 1
                        AND    34
                        EACH    2),
RANGE_N(activity_date   BETWEEN DATE '1986-01-01'
                        AND     DATE '2007-05-31'
                        EACH INTERVAL '1' MONTH));

Definition DDL request text for the products table is as follows:

CREATE TABLE products (
  productid    INTEGER NOT NULL,
  product_name CHARACTER(30),
  description  VARCHAR(256))
PRIMARY INDEX (productid);

Definition DDL request text for the regions table is as follows:

CREATE TABLE regions (
  region      INTEGER NOT NULL,
  region_name CHARACTER(30),
  description VARCHAR(256))
PRIMARY INDEX (region_name);

Definition DDL statement text for the business_sectors table is as follows:

CREATE TABLE business_sectors (
  productid            INTEGER NOT NULL,
  business_sector_name CHARACTER(30),
  description          VARCHAR(256))
PRIMARY INDEX (business_sector_name);

Definition DDL statement text for the revenue_codes table is as follows:

CREATE TABLE revenue_codes (
  revenue_code      INTEGER NOT NULL,
  revenue_code_name CHARACTER(30),
  description       VARCHAR(256))
PRIMARY INDEX (revenue_code_name);

Definition DDL statement text for the activity_calendar table is as follows:

CREATE TABLE activity_calendar (
  quarter       CHARACTER(6),
  activity_date DATE FORMAT 'yyyy-mm-dd' NOT NULL,
  description   VARCHAR(256))
PRIMARY INDEX (quarter);

Example query that joins all these tables is as follows:

SELECT p.product_name, r.region_name, b.business_sector_name,
       rc.revenue_code_name, a.quarter, m.activity_date, m.note
FROM   markets AS m, products AS p, regions AS r,
       business_sectors AS b, revenue_codes AS rc,
       activity_calendar AS a
WHERE  p.productid BETWEEN 4000
                   AND     4999
  AND r.region_name = 'West'
  AND b.business_sector_name IN ('Cosmetics','Snack Food','Hats')
  AND rc.revenue_code_name IN ('Web', 'Catalog')
  AND a.quarter IN ('2006Q1', '2005Q1')
  AND m.productid       = p.productid
  AND m.region          = r.region
  AND m.business_sector = b.business_sector
  AND m.revenue_code    = rc.revenue_code
  AND m.activity_code   = a.activity_code;

Necessary Conditions for a Rowkey-Based Merge Join With a Character-Partitioned Relation

The following rules apply to direct rowkey-based merge joins for character-partitioned relations. In each case, the attributes must be the same for both relations in the join for it to be eligible for the rowkey-based merge join method.
A Direct Join is a binary join operation for which the relation of interest is not spooled in preparation for the join. For example, a direct merge join is a join in which the relation of interest is not spooled in preparation for a merge join. Similarly, a direct product join is a product join in which the relation of interest is not spooled in preparation for a product join.
  • The relations must have the same collation.
  • The relations must have the same server character set for their character partitioning columns.
  • The character partitioning columns of the relations must have the same data type, including column length and case specific attribute.

Unless all of these conditions are true for both of the relations to be joined, one of them must be spooled and then repartitioned to match the partitioning of the other partitioned relation in the join.

Whether the Optimizer selects a direct rowkey-based merge join, or whether one relation is spooled and then partitioned to match the partitioning of the character-partitioned relation, the following rules must be met for the Optimizer to specify a rowkey-based merge join:
  • The estimated cost of the join must be less than the estimated cost of all other join types.
  • There must be no more than one character partitioning column at each partitioning level.
  • If the session collation or table collation is MULTINATIONAL or CHARSET_COLL, and if any comparison or string function involving any non-constant expression in the partitioning expression at any partitioning level is case-insensitive, the session collation must match the table collation.
    The following functions and attribute are case-insensitive:
    • LOWER function
    • SOUNDEX function
    • UPPER function
    • UPPERCASE attribute
    The following functions are case-sensitive:
    • CHAR2HEXINT
    • TRANSLATE
    • TRANSLATE_CHK
    • TRIM
    • VARGRAPHIC

    Presence of the concatenation operator (||) marks the expression as having the property of being both case sensitive and case-insensitive.

    The following functions follow the same rules as comparison operators, function input arguments, and the default case sensitivity for the session mode that was in effect when the character partitioning was created or modified, and the system examines them all to determine case sensitivity.
    • INDEX
    • MINDEX
    • POSITION

    Specifying the SUBSTRING function does not affect case sensitivity.

  • If the WHERE clause predicate that qualifies a partitioning level for a rowkey-based merge join is case insensitive, then all comparisons and all string functions in any non-constant expressions in the partitioning expression for that level must also be case-insensitive.

    The WHERE clause predicate is considered to be case insensitive if any of the comparisons or string functions involving non-constant expressions in the condition is case-insensitive.

  • If the character set of the character partitioning column in an equality join condition is different than character set of the expression involving that column in the other relation in the join condition, then you must store the partitioning column data using the Unicode server character set.

    This is because the system handles comparisons between non-constant character expressions by implicitly converting both expressions to Unicode before making the comparison.

For the Optimizer to specify a rowkey-based merge join, the collation for the current session need not match the collation for the table when all equality join terms on character partitioning columns are case sensitive.

If you create a character-partitioned table in an ANSI mode session, the system implicitly casts the partitioning columns compared in CASE_N-based partitioning expressions or generalized partitioning expressions to CASESPECIFIC unless the constant expression in the comparison is explicitly CAST as NOT CASESPECIFIC.

This has an effect on the types of join terms that can be evaluated using rowkey-based merge join with such a partitioned primary index (see the example of a case where a rowkey-based merge join cannot be used later in this topic).

In this example, 2 character-partitioned tables are joined on their primary indexes and partitioning expressions. Assume the following table definitions, both of which were created under the same session collation:

CREATE SET TABLE test1, NO FALLBACK, NO BEFORE JOURNAL,
                        NO AFTER JOURNAL,CHECKSUM = DEFAULT (
  i INTEGER,
  j CHARACTER(4) CHARACTER SET UNICODE CASESPECIFIC)
PRIMARY INDEX (i)
PARTITION BY RANGE_N(j BETWEEN 'AAAA','ZZZZ','aaaa','yyyy'
                       AND     'zzzz');
CREATE SET TABLE test11, NO FALLBACK, NO BEFORE JOURNAL,
                         NO AFTER JOURNAL, CHECKSUM = DEFAULT (
  i INTEGER,
  j CHARACTER(4) CHARACTER SET UNICODE CASESPECIFIC)
PRIMARY INDEX (i)
PARTITION BY RANGE_N(j BETWEEN 'AAAA','ZZZZ','aaaa','yyyy'
                       AND     'zzzz');

Assume you submit the following SELECT request that joins these tables:

SELECT *
FROM test1 INNER JOIN test11
WHERE test1.i = test11.i
AND   test1.j = test11.j;
The database can join these tables using a direct rowkey-based merge join instead of having to redistribute them both to spool, sorting the spool by the hash of the primary index, and then making a row hash merge join because the relations being joined conform to the following requirements for a direct rowkey-based merge join.
  • The relations have the same session collation for their character partitioning columns.
  • The relations have the same server character set for their character partitioning columns.

    In this case, the matching server character set is Unicode for character partitioning column j in both relations.

  • The relations have the same data type, column length, and case specific attribute for character partitioning column j.

    In this case, the data types (CHARACTER), their lengths (4 characters), and their case specificity (CASESPECIFIC) are the same for character partitioning column j in both relations.

In this example, 2 tables, only one of which has character partitioning, are joined on their primary indexes. Assume the following table definitions, both of which were created under the same session collation:

CREATE TABLE markets (
  product_name    VARCHAR(50) CHARACTER SET UNICODE CASESPECIFIC,
  region          BYTEINT NOT NULL,
  activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
  revenue_code    BYTEINT NOT NULL,
  business_sector BYTEINT NOT NULL,
  note            VARCHAR(256))
PRIMARY INDEX (productname)
PARTITION BY (RANGE_N(productname BETWEEN 'A','B','C','D','E','F',
                                          'G','H','I','J','K','L',
                                          'M','N','O','P','Q','R',
                                          'S','T','U','V','W','X',
                                          'Y','Z'
                                  AND
            'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ',
                 NO RANGE,
                 UNKNOWN));
CREATE TABLE products (
  product_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
  description VARCHAR(256))
PRIMARY INDEX (productname);

Note that markets is a character-partitioned relation, while products is not.

Assume you submit the following SELECT request that joins these tables:

SELECT *
FROM products INNER JOIN markets
WHERE products.product_name = markets.product_name;
The database cannot join these tables using a direct merge join, so it must instead redistribute the products table to spool, sort the spool by the hash of the primary index, and then make a rowkey-based merge join because the relations being joined do not conform to the requirements for a direct rowkey-based merge join in the following ways:
  • Only one of the relations in the join, markets, has partitioning.
  • The relations do not have the same case-specific attribute for the join column product_name.

    In this case, the case specificity is CASESPECIFIC for column product_name in relation markets, but NOT CASESPECIFIC for character partitioning column product_name in relation products.

Instead of using a rowkey-based merge join, the system selects rows from the products table to form a spool that it then sorts by a rowkey that is the same as that for the character-partitioned relation markets, and it then makes a rowkey-based merge join from the character-partitioned relation markets to the spooled relation products.

The session collation need not match the character-partitioned table collation because all comparisons in the partitioned table are case specific, but if the table collation definition changes after the table was created (this is applicable to the MULTINATIONAL and CHARSET_COLL collations only), then the Optimizer does not specify a rowkey-based merge join.

The following CREATE TABLE requests are submitted in ANSI session mode:

CREATE MULTISET TABLE tt1(
  a INTEGER,
  b CHARACTER(30) NOT CASESPECIFIC)
PRIMARY INDEX (a)
PARTITION BY CASE_N(b BETWEEN 'A' AND 'Z',
                    b BETWEEN 'a' AND 'z',
             NO CASE OR UNKNOWN);
CREATE MULTISET TABLE tt2(
  a INTEGER,
  b CHARACTER(30) NOT CASESPECIFIC)
PRIMARY INDEX (a);

The following SELECT request joining these tables is submitted in Teradata session mode:

SELECT *
FROM tt1 INNER JOIN tt2
WHERE tt1.a = tt2.a
AND   tt1.b = tt2.b;

The system cannot use a rowkey merge join for this request because the comparisons in the partitioning expression are case-sensitive, but the join conditions in the SELECT request are case-insensitive.

The following CREATE TABLE requests are submitted in Teradata session mode:

CREATE MULTISET TABLE tt3(
  a INTEGER,
  b CHARACTER(30) NOT CASESPECIFIC)
PRIMARY INDEX (a)
PARTITION BY CASE_N(TRIM(LEADING 'a' FROM b) BETWEEN 'A'
                                             AND     'Z',
                    TRIM(LEADING 'a' FROM b) BETWEEN 'a'
                                             AND     'z',
                    NO CASE OR UNKNOWN);
CREATE MULTISET TABLE tt2(
  a INTEGER,
  b CHARACTER(30) NOT CASESPECIFIC)
PRIMARY INDEX (a);

The following SELECT request joining these tables is submitted in Teradata session mode:

SELECT *
FROM tt3 INNER JOIN tt2
WHERE tt3.a = tt2.a
AND   tt3.b = tt2.b;

The Optimizer cannot select a rowkey merge join for this case because the comparisons in the partitioning expression for tt3 are case-sensitive because of the TRIM function, but the join conditions in the SELECT request are case insensitive.