16.10 - Eliminating Redundant Joins - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

Joins are among the most frequently specified operations in SQL queries. They are also among the most demanding resource consumers in the palette of an SQL coder. Join elimination removes unnecessary tables from queries.

Join elimination is most commonly used in the following situations:

  • Inner joins based on any form of referential integrity between 2 tables.
  • Left and right outer joins can be eliminated if the join is based on unique columns from the right table in the join specification.

In both cases, a join and a table (the table eliminated is the parent table in inner joins and the inner table in outer joins) are removed from the query, assuming that no projections are needed from the eliminated table to keep the query whole.

Because of its many optimized join algorithms, Teradata Database processes joins with relative ease. Nevertheless, when joins can be eliminated or made less complex, the result is always better performance. An excellent example of how the Optimizer recognizes and eliminates unnecessary or redundant joins is provided in the topic “Outer Join Case Study” and its subordinate topics in SQL Data Manipulation Language. The topics “First Attempt” and “Third Attempt” are particularly good examples of how the Optimizer can streamline poorly written joins.

Primary key-foreign key relationships between tables are very common in normalized data models and their corresponding physical databases. Define the parent tables in such a relationship as PK-tables and the child tables as FK-tables. In Teradata Database, you can define these relationships explicitly by specifying referential integrity constraints when you create or alter tables (see “CREATE TABLE” and “ALTER TABLE” in SQL Data Definition Language. Also see the chapter “Designing for Database Integrity” in Database Design).

It also common to query PK-tables and FK-tables with joins based on their primary key and foreign key column sets, which are referred to as PK-FK joins. Sometimes the PK-FK joins are redundant, as, for example, when a query does not reference columns from the PK-table other than the PK-column set itself. Recognizing and eliminating such redundant joins can significantly reduce query execution times.

Join Elimination and Normalized Tables

The following rules apply to join elimination and normalized tables:

  • There is no impact for 2 tables being joined when one of the two tables is temporal except for SEQUENCED temporal joins. In this case, join elimination is enabled if either of the following conditions is true.
    • The SELECT request specifies NORMALIZE.
    • The parent table in the join is normalized and the normalize Period column is a VALIDTIME column.
  • There is no impact when neither the 2 tables being joined is a temporal table when the SELECT request specifies NORMALIZE in its select list.

Vertical Partitioning

An obvious way to partition tables vertically is to define them as column-partitioned tables. This should be your first choice for vertical partitioning, and you should only use vertical table partitioning for those cases where column-partitioning is not possible, for example when partitions require different attributes.

Another application where you might want to implement referential integrity constraints between 2 tables is the vertical partitioning of a single wide table. Vertical partitioning means that a table with a very large number of columns, many of which are seldom queried, is split into 2 tables where one table retains the frequently accessed columns and the other table has the less frequently accessed columns. To ensure consistency between the tables, thereby maintaining a robustly consistent virtual table that contains correct data in all of the columns, a referential integrity constraint is needed between the tables. The choice of the PK-table and the FK-table in this case is arbitrary, but the best practice is to define the table with the frequently accessed columns as the FK-table.

To hide this artificial partitioning, you can define a view that selects all of the columns from both tables using a PK-FK join. When a request is submitted against the view that only references columns from the FK-table, its join is obviously redundant and can be eliminated. Note that without the intervention of Query Rewrite, you have no way of removing such a redundant join from a query.

Example of Join Elimination for Redundant Joins

The following request illustrates redundant joins:

     SELECT s.suppkey, s.address, n.nationkey
     FROM supplier AS s, nation AS n
     WHERE s.nation_key=n.nation_key
     ORDER BY n.nation_key;

The join is redundant in this request because every row in supplier has exactly one match in nation based on the following referential integrity definition for the supplier table.

     FOREIGN KEY (nation_key) REFERENCES nation (nation_key).

Also, only n_nationkey is referenced in the request, and it can be replaced by s.nation_key. The modified query after Join elimination looks as follows:

     SELECT s.supp_key, s.address, s.nation_key
     FROM supplier
     ORDER BY s.nation_key;

Example of Join Elimination for SEQUENCED Inner Join Views

It is not possible to eliminate SEQUENCED inner joins unless the parent table is normalized and the normalize column is a VALIDTIME column. However, SEQUENCED inner join semantics can be eliminated from a view or derived table if the qualification from the view or derived table is qualified by either a CURRENT or AS OF qualifier. SEQUENCED inner join semantics can be eliminated by pushing the CURRENT or AS OF qualifier to individual valid-time, transaction-time, or bitemporal tables in the view. Join elimination is facilitated still further if there is a CURRENT or SEQUENCED referential integrity relationship between the tables.

For example, assume you define the following SEQUENCED VALIDTIME view v1.

     CREATE VIEW v1 AS
     SEQUENCED VALIDTIME SELECT vt1.c2, vt2.c3
                         FROM vt1, vt2
                         WHERE vt1.c1=vt2.c2;

Now explain a simple AS OF-qualified SELECT request on v1 to illustrate how Teradata Database can use join elimination to simplify the SELECT request.

     EXPLAIN VALIDTIME AS OF DATE '2005-01-01' SELECT c2, c3
                                               FROM v1;

The following EXPLAIN text indicates that to solve this request, Query Rewrite joins with the SEQUENCED VALIDTIME qualifier with overlapping join condition between vt1 and vt2 VALIDTIME columns.

4) We do an all-AMPs RETRIEVE step from TEST.vt2 in view v1 (with
   temporal qualifier as "SEQUENCED VALIDTIME") by way of an all-rows
   scan with a condition of ("(NOT (TEST.vt2 in view v1.vt1 IS NULL
   )) AND (NOT (TEST.vt2 in view v1.c2 IS NULL))") into Spool 2
   (all_amps), which is redistributed by hash code to all AMPs. Then
   we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
   estimated with no confidence to be 8 rows (232 bytes). The
   estimated time for this step is 0.04 seconds.
5) We do an all-AMPs JOIN step from TEST.vt1 in view v1 (with
   temporal qualifier as "SEQUENCED VALIDTIME") by way of a RowHash
   match scan with a condition of ("NOT (TEST.vt1 in view v1.vt1 IS
   NULL)"), which is joined to Spool 2 (Last Use) by way of a RowHash
   match scan. TEST.vt1 and Spool 2 are joined using a merge join,
   with a join condition of ("(TEST.vt1.c1 = c2) AND
   (((BEGIN({LeftTable}.vt1 P_INTERSECT {RightTable}.vt1 ))<= DATE
   '2005-01-01') AND (((END({LeftTable}.vt1 P_INTERSECT
   {RightTable}.vt1 ))> DATE '2005-01-01') AND
   (((BEGIN({LeftTable}.vt1 ))< (END({RightTable}.vt1 )))AND
   ((END({LeftTable}.vt1 ))> (BEGIN({RightTable}.vt1 ))))))"). The
   result goes into Spool 1 (used to materialize view, derived table
   or table function v1) (all_amps), which is built locally on the
   AMPs. The size of Spool 1 is estimated with no confidence to be
   11 rows (407 bytes). The estimated time for this step is 0.20
   seconds.
6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
   an all-rows scan with a condition of ("((END(v1."VALIDTIME" ))>
   DATE '2005-01-01') AND ((BEGIN(v1."VALIDTIME" ))<= DATE
   '2005-01-01')") into Spool 3 (group_amps), which is built locally
   on the AMPs. The size of Spool 3 is estimated with no confidence
   to be 11 rows (473 bytes). The estimated time for this step is
   0.16 seconds.

You can rewrite the definition for v1 as indicated by pushing the VALIDTIME AS OF qualifier into the view.

     REPLACE VIEW v1 AS
     VALIDTIME AS OF DATE '2005-01-01' SELECT vt1.c2, vt2.c3
                                       FROM vt1, vt2
                                       WHERE vt1.c1=vt2.c2;

Because of pushing the VALIDTIME AS OF qualifier into the view, Query Rewrite is then able to do the following.

  • Eliminate SEQUENCED semantics from requests made on the view
  • Eliminate inner joins when a CURRENT or AS OF qualifier is pushed inside the view
  • Reduce the spool size when the view is not folded

For these reasons, Query Rewrite rewrites the query by pushing the CURRENT or AS OF qualifier into a SEQUENCED view or derived table when the following conditions are true.

  • A query in a view or derived table has SEQUENCED qualifier.
  • The view or derived must be qualified with a CURRENT or AS OF qualifier, so the qualifier on the spool must be either CURRENT or AS OF.
  • The system-projected VALIDTIME or TRANSACTIONTIME column must not be referenced in the parent query block of the view or derived table.
Only VALIDTIME, TRANSACTIONTIME, or both qualifiers from an outer query can be pushed into a view or derived table. Furthermore, join elimination does not occur if a request specifies conditions on the system-projected VALIDTIME column of the view.

Conditions for Eliminating a PK-FK Join

The following conditions are sufficient to eliminate a PK-FK join and, of course, the PK-table, and all are checked by this rewrite method.

  • There is a referential integrity constraint defined between the 2 tables.
  • The query conditions are conjunctive.
  • No columns from the PK-table other than the PK-columns are referenced in the request. This includes the SELECT, WHERE, GROUP BY, HAVING, ORDER BY, and other clauses.
  • The PK-columns in the WHERE clause can only appear in PK=FK (equality) joins.

If these four conditions are met, then the PK-table and PK-FK join can be removed from the request and all references to the PK columns in the query are mapped to the corresponding FK columns. Also, Query Rewrite adds a NOT NULL condition on the FK columns if they are nullable. This optimization is applied as a rule, so no costing is required. Also, Query Rewrite does these optimizations automatically. To trigger this optimization, you must define a referential integrity constraint between the child and parent tables.

Be aware that the cost of referential integrity maintenance can exceed the benefit of join elimination. To mitigate this possibility, you can substitute either a batch referential integrity constraint or a Referential Constraint (see “CREATE TABLE” in SQL Data Definition Language and the chapter “Designing for Database Integrity” in Database Design for details of these referential integrity types).

The following table outlines the differences between these 2 referential integrity mechanisms.

Referential Integrity Type Description
Batch Constraint Referential integrity is enforced between the tables in the relationship.

The cost of maintaining a Batch RI constraint is normally significantly less than that of maintaining the same relationship with a standard Referential Integrity constraint.

Referential Constraint Referential integrity is not enforced between the tables in the relationship.

The constraint is used mainly as a hint for join elimination.

Generally speaking, you should only specify a Referential Constraint when you are confident that the parent-child relationship is consistent (meaning that there is little or no possibility for referential integrity violations to occur).

Consider this SELECT request.

     SELECT sales1.*
     FROM sales1, product
     WHERE sales_product_key=product_key;

The table product can safely be removed from the query because it is a dimension table of sales1 and the only reference to it is by means of a PK-FK join with sales1. This logic is also applied to remove the inner table of an outer join if the join condition is an equality condition on a unique column of the inner table.

The following example illustrates outer join elimination:

     SELECT sales1.*
     FROM sales1 LEFT OUTER JOIN product
     ON sales_product_key=product_key;