Eliminating Redundant Joins - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
ykx1561500561173.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantage™

Joins are among the most frequently specified operations in SQL queries. They are also among the most demanding resource consumers in the palette of an SQL coder. Join elimination removes unnecessary tables from queries.

Join elimination is most commonly used in the following situations:
  • Inner joins based on any form of referential integrity between 2 tables.
  • Left and right outer joins can be eliminated if the join is based on unique columns from the right table in the join specification.

In both cases, a join and a table (the table eliminated is the parent table in inner joins and the inner table in outer joins) are removed from the query, assuming that no projections are needed from the eliminated table to keep the query whole.

Because of its many optimized join algorithms, Teradata Database processes joins with relative ease. Nevertheless, when joins can be eliminated or made less complex, the result is always better performance. An excellent example of how the Optimizer recognizes and eliminates unnecessary or redundant joins is provided in the topic “Outer Join Case Study” and its subordinate topics in Teradata Vantage™ - SQL Data Manipulation Language, B035-1146. The topics “First Attempt” and “Third Attempt” are particularly good examples of how the Optimizer can streamline poorly written joins.

Primary key-foreign key relationships between tables are very common in normalized data models and their corresponding physical databases. Define the parent tables in such a relationship as PK-tables and the child tables as FK-tables. In Teradata Database, you can define these relationships explicitly by specifying referential integrity constraints when you create or alter tables (see “CREATE TABLE” and “ALTER TABLE” in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144. Also see “Designing for Database Integrity” in Teradata Vantage™ - Database Design, B035-1094).

It also common to query PK-tables and FK-tables with joins based on their primary key and foreign key column sets, which are referred to as PK-FK joins. Sometimes the PK-FK joins are redundant, as, for example, when a query does not reference columns from the PK-table other than the PK-column set itself. Recognizing and eliminating such redundant joins can significantly reduce query execution times.

Vertical Partitioning

An obvious way to partition tables vertically is to define them as column-partitioned tables. This should be your first choice for vertical partitioning, and you should only use vertical table partitioning for those cases where column-partitioning is not possible, for example when partitions require different attributes.

Another application where you might want to implement referential integrity constraints between 2 tables is the vertical partitioning of a single wide table. Vertical partitioning means that a table with a very large number of columns, many of which are seldom queried, is split into 2 tables where one table retains the frequently accessed columns and the other table has the less frequently accessed columns. To ensure consistency between the tables, thereby maintaining a robustly consistent virtual table that contains correct data in all of the columns, a referential integrity constraint is needed between the tables. The choice of the PK-table and the FK-table in this case is arbitrary, but the best practice is to define the table with the frequently accessed columns as the FK-table.

To hide this artificial partitioning, you can define a view that selects all of the columns from both tables using a PK-FK join. When a request is submitted against the view that only references columns from the FK-table, its join is obviously redundant and can be eliminated. Note that without the intervention of Query Rewrite, you have no way of removing such a redundant join from a query.

Example of Join Elimination for Redundant Joins

The following request illustrates redundant joins:

     SELECT s.suppkey, s.address, n.nationkey
     FROM supplier AS s, nation AS n
     WHERE s.nation_key=n.nation_key
     ORDER BY n.nation_key;

The join is redundant in this request because every row in supplier has exactly one match in nation based on the following referential integrity definition for the supplier table.

     FOREIGN KEY (nation_key) REFERENCES nation (nation_key).

Also, only n_nationkey is referenced in the request, and it can be replaced by s.nation_key. The modified query after Join elimination looks as follows:

     SELECT s.supp_key, s.address, s.nation_key
     FROM supplier
     ORDER BY s.nation_key;

Conditions for Eliminating a PK-FK Join

The following conditions are sufficient to eliminate a PK-FK join and, of course, the PK-table, and all are checked by this rewrite method.
  • There is a referential integrity constraint defined between the 2 tables.
  • The query conditions are conjunctive.
  • No columns from the PK-table other than the PK-columns are referenced in the request. This includes the SELECT, WHERE, GROUP BY, HAVING, ORDER BY, and other clauses.
  • The PK-columns in the WHERE clause can only appear in PK=FK (equality) joins.

If these four conditions are met, then the PK-table and PK-FK join can be removed from the request and all references to the PK columns in the query are mapped to the corresponding FK columns. Also, Query Rewrite adds a NOT NULL condition on the FK columns if they are nullable. This optimization is applied as a rule, so no costing is required. Also, Query Rewrite does these optimizations automatically. To trigger this optimization, you must define a referential integrity constraint between the child and parent tables.

Be aware that the cost of referential integrity maintenance can exceed the benefit of join elimination. To mitigate this possibility, you can substitute either a batch referential integrity constraint or a Referential Constraint (see “CREATE TABLE” in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 and “Designing for Database Integrity” in Teradata Vantage™ - Database Design, B035-1094 for details of these referential integrity types).

The following table outlines the differences between these 2 referential integrity mechanisms.

Referential Integrity Type Description
Batch Constraint Referential integrity is enforced between the tables in the relationship.

The cost of maintaining a Batch RI constraint is normally significantly less than that of maintaining the same relationship with a standard Referential Integrity constraint.

Referential Constraint Referential integrity is not enforced between the tables in the relationship.

The constraint is used mainly as a hint for join elimination.

Generally speaking, you should only specify a Referential Constraint when you are confident that the parent-child relationship is consistent (meaning that there is little or no possibility for referential integrity violations to occur).

Consider this SELECT request.

     SELECT sales1.*
     FROM sales1, product
     WHERE sales_product_key=product_key;

The table product can safely be removed from the query because it is a dimension table of sales1 and the only reference to it is by means of a PK-FK join with sales1. This logic is also applied to remove the inner table of an outer join if the join condition is an equality condition on a unique column of the inner table.

The following example illustrates outer join elimination:

     SELECT sales1.*
     FROM sales1 LEFT OUTER JOIN product
     ON sales_product_key=product_key;