Joins are among the most frequently specified operations in SQL queries. They are also among the most demanding resource consumers in the palette of an SQL coder. Join elimination removes unnecessary tables from queries.
- Inner joins based on any form of referential integrity between 2 tables.
- Left and right outer joins can be eliminated if the join is based on unique columns from the right table in the join specification.
In both cases, a join and a table (the table eliminated is the parent table in inner joins and the inner table in outer joins) are removed from the query, assuming that no projections are needed from the eliminated table to keep the query whole.
Because of its many optimized join algorithms, Vantage processes joins with relative ease. Nevertheless, when joins can be eliminated or made less complex, the result is always better performance. For an example of how the Optimizer recognizes and eliminates unnecessary or redundant joins, see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146.
Primary key-foreign key relationships between tables are very common in normalized data models and their corresponding physical databases. Define the parent tables in such a relationship as PK-tables and the child tables as FK-tables. You can define these relationships explicitly by specifying referential integrity constraints using CREATE TABLE and ALTER TABLE (see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 and Teradata Vantage™ - Database Design, B035-1094).
It also common to query PK-tables and FK-tables with joins based on their primary key and foreign key column sets, which are referred to as PK-FK joins. Sometimes the PK-FK joins are redundant, as, for example, when a query does not reference columns from the PK-table other than the PK-column set itself. Recognizing and eliminating such redundant joins can significantly reduce query execution times.
An obvious way to partition tables vertically is to define them as column-partitioned tables. This should be your first choice for vertical partitioning, and you should only use vertical table partitioning for those cases where column-partitioning is not possible, for example when partitions require different attributes.
Another application where you might want to implement referential integrity constraints between 2 tables is the vertical partitioning of a single wide table. Vertical partitioning means that a table with a very large number of columns, many of which are seldom queried, is split into 2 tables where one table retains the frequently accessed columns and the other table has the less frequently accessed columns. To ensure consistency between the tables, thereby maintaining a robustly consistent virtual table that contains correct data in all of the columns, a referential integrity constraint is needed between the tables. The choice of the PK-table and the FK-table in this case is arbitrary, but the best practice is to define the table with the frequently accessed columns as the FK-table.
To hide this artificial partitioning, you can define a view that selects all of the columns from both tables using a PK-FK join. When a request is submitted against the view that only references columns from the FK-table, its join is obviously redundant and can be eliminated. Note that without the intervention of Query Rewrite, you have no way of removing such a redundant join from a query.
Example of Join Elimination for Redundant Joins
The following request illustrates redundant joins:
SELECT s.suppkey, s.address, n.nationkey FROM supplier AS s, nation AS n WHERE s.nation_key=n.nation_key ORDER BY n.nation_key;
The join is redundant in this request because every row in supplier has exactly one match in nation based on the following referential integrity definition for the supplier table.
FOREIGN KEY (nation_key) REFERENCES nation (nation_key).
Also, only n_nationkey is referenced in the request, and it can be replaced by s.nation_key. The modified query after Join elimination looks as follows:
SELECT s.supp_key, s.address, s.nation_key FROM supplier ORDER BY s.nation_key;
Conditions for Eliminating a PK-FK Join
- There is a referential integrity constraint defined between the 2 tables.
- The query conditions are conjunctive.
- No columns from the PK-table other than the PK-columns are referenced in the request. This includes the SELECT, WHERE, GROUP BY, HAVING, ORDER BY, and other clauses.
- The PK-columns in the WHERE clause can only appear in PK=FK (equality) joins.
If these four conditions are met, then the PK-table and PK-FK join can be removed from the request and all references to the PK columns in the query are mapped to the corresponding FK columns. Also, Query Rewrite adds a NOT NULL condition on the FK columns if they are nullable. This optimization is applied as a rule, so no costing is required. Also, Query Rewrite does these optimizations automatically. To trigger this optimization, you must define a referential integrity constraint between the child and parent tables.
Be aware that the cost of referential integrity maintenance can exceed the benefit of join elimination. To mitigate this possibility, you can substitute either a batch referential integrity constraint or a Referential Constraint. For details of these referential integrity types, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 and Teradata Vantage™ - Database Design, B035-1094.
The following table outlines the differences between these 2 referential integrity mechanisms.
|Referential Integrity Type||Description|
|Batch Constraint||Referential integrity is enforced between the tables in the relationship.
The cost of maintaining a Batch RI constraint is normally significantly less than that of maintaining the same relationship with a standard Referential Integrity constraint.
|Referential Constraint||Referential integrity is not enforced between the tables in the relationship.
The constraint is used mainly as a hint for join elimination.
Generally speaking, you should only specify a Referential Constraint when you are confident that the parent-child relationship is consistent (meaning that there is little or no possibility for referential integrity violations to occur).
Consider this SELECT request.
SELECT sales1.* FROM sales1, product WHERE sales_product_key=product_key;
The table product can safely be removed from the query because it is a dimension table of sales1 and the only reference to it is by means of a PK-FK join with sales1. This logic is also applied to remove the inner table of an outer join if the join condition is an equality condition on a unique column of the inner table.
The following example illustrates outer join elimination:
SELECT sales1.* FROM sales1 LEFT OUTER JOIN product ON sales_product_key=product_key;