Eliminating Redundant Joins - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

Joins are frequently specified operations in SQL queries, and are also demanding resource consumers in the palette of an SQL coder. Join elimination removes unnecessary tables from queries.

Join elimination is most commonly used in the following situations:

Inner joins based on any form of referential integrity between 2 tables.
Left and right outer joins can be eliminated if the join is based on unique columns from the right table in the join specification.

In both cases, a join and a table (the table eliminated is the parent table in inner joins and the inner table in outer joins) are removed from the query, assuming that no projections are needed from the eliminated table to keep the query whole.

Although Vantage has optimized join algorithms, eliminating or simplifying joins improves performance. See Outer Join Case Study.

Primary key-foreign key relationships between tables are common in normalized data models and their corresponding physical databases. Define the parent tables in such a relationship as PK-tables and the child tables as FK-tables. You can define these relationships explicitly by specifying referential integrity constraints using CREATE TABLE and ALTER TABLE, and Semantic Constraint Specifications).

Referential integrity is supported only on the Block File System on the primary cluster, not on the Object File System.

Also common is querying PK-tables and FK-tables with joins based on their primary key and foreign key column sets, which are called PK-FK joins. PK-FK joins can be redundant, as, for example, when a query does not reference columns from the PK-table other than the PK-column set itself. Recognizing and eliminating redundant joins can significantly reduce query run times.

Vertical Partitioning

Your first choice for vertical partitioning is to define tables as column-partitioned. When that is not possible (for example, when partitions require different attributes), use vertical table partitioning.

Another application where you may want to implement referential integrity constraints between 2 tables is the vertical partitioning of a single wide table. Split the table into two tables, putting the frequently accessed columns in one table and the less frequently accessed columns in the other table. To keep the tables consistent, use a referential integrity constraint. The choice of the PK-table and the FK-table is arbitrary, but the best practice is to define the table with the frequently accessed columns as the FK-table.

To hide this artificial partitioning, you can define a view that selects all of the columns from both tables using a PK-FK join. When a request is submitted against the view that only references columns from the FK-table, its join is obviously redundant and can be eliminated. Without the intervention of Query Rewrite, you have no way of removing such a redundant join from a query.

Example: Join Elimination for Redundant Joins

The following request shows redundant joins:

SELECT s.suppkey, s.address, n.nationkey
FROM supplier AS s, nation AS n
WHERE s.nation_key=n.nation_key
ORDER BY n.nation_key;

The join is redundant in this request because every row in supplier has exactly one match in nation based on the following referential integrity definition for the supplier table.

FOREIGN KEY (nation_key) REFERENCES nation (nation_key).

Also, the request references only n_nationkey, which can be replaced by s.nation_key. The modified query after Join elimination looks as follows:

SELECT s.supp_key, s.address, s.nation_key
FROM supplier
ORDER BY s.nation_key;

Conditions for Eliminating a PK-FK Join

The following conditions are sufficient to eliminate a PK-FK join and the PK-table, and are checked by this rewrite method.

There is a referential integrity constraint defined between the 2 tables.
The query conditions are conjunctive.
No columns from the PK-table other than the PK-columns are referenced in the request. This includes the SELECT, WHERE, GROUP BY, HAVING, ORDER BY, and other clauses.
The PK-columns in the WHERE clause can only appear in PK=FK (equality) joins.

If these four conditions are met, the PK-table and PK-FK join can be removed from the request and references to the PK columns in the query are mapped to the corresponding FK columns. Query Rewrite adds a NOT NULL condition on FK columns that are nullable. This optimization is applied as a rule, so no costing is required. Also, Query Rewrite does these optimizations automatically. To trigger this optimization, you must define a referential integrity constraint between the child and parent tables.

The cost of referential integrity maintenance can exceed the benefit of join elimination. To mitigate this possibility, you can substitute either a batch referential integrity constraint or a Referential Constraint. For details of these referential integrity types, see WITH CHECK OPTION in REFERENCES.

The following table outlines the differences between these 2 referential integrity mechanisms.

Referential Integrity Type	Description
Batch Constraint	Referential integrity is enforced between the tables in the relationship. The cost of maintaining a Batch RI constraint is typically significantly less than that of maintaining the same relationship with a standard Referential Integrity constraint.
Referential Constraint	Referential integrity is not enforced between the tables in the relationship. The constraint is used mainly as a hint for join elimination. Specify a Referential Constraint when you are confident that the parent-child relationship is consistent (meaning that there is little or no possibility for referential integrity violations to occur).

Referential Integrity Type

Description

Batch Constraint

Referential integrity is enforced between the tables in the relationship.

The cost of maintaining a Batch RI constraint is typically significantly less than that of maintaining the same relationship with a standard Referential Integrity constraint.

Referential Constraint

Referential integrity is not enforced between the tables in the relationship.

The constraint is used mainly as a hint for join elimination.

Specify a Referential Constraint when you are confident that the parent-child relationship is consistent (meaning that there is little or no possibility for referential integrity violations to occur).

Consider this SELECT request.

SELECT sales1.*
FROM sales1, product
WHERE sales_product_key=product_key;

The table product is a dimension table of sales1, referenced only by means of a PK-FK join with sales1. Therefore, you can remove the table product from the query. This logic is also applied to remove the inner table of an outer join if the join condition is an equality condition on a unique column of the inner table.

The following example shows outer join elimination:

SELECT sales1.*
FROM sales1 LEFT OUTER JOIN product
ON sales_product_key=product_key;