Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query - Advanced SQL Engine

Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

imq1591724555718.ditamap

dita:ditavalPath

imq1591724555718.ditaval

dita:id

B035-1184

lifecycle

Product Category

Teradata Vantage™

Whether the Optimizer decides to include a join index in its query plan is a more complicated choice than simply determining if the index contains all the table columns specified in the query. The columns on which the base tables in the index definition are joined and their respective referential constraints also play an important role. See Rules for Whether Join Indexes With Extra Tables Cover Queries.

In many cases, the Optimizer does not consider using a join index in its access plan if that index is defined on more tables than the query references.

The join index is more likely to be used in the query access plan if the extra inner joins are defined on primary key-foreign key relationships in the underlying base tables that ensure proper row preservation. The referential integrity relationship between the base table primary and foreign keys can be specified using any of the three available methods for establishing referential constraints between tables.

In the case of join index outer joins, outer table rows are always preserved automatically, so there is no requirement for a referential integrity constraint to exist to enable their preservation.

In the case of foreign key-primary key inner joins, the same preservation of rows follows from a declarative referential integrity constraint. In this case, the Optimizer does consider a join index with extra inner joins in its definition to cover a query. The following paragraphs explain why a referential constraint preserves logical integrity.

Assume that the base tables in a join index definition can be divided into two distinct sets, s1 and s2.

s1 contains the base tables referenced in the query, while s2 contains the extra base tables the query does not reference. The base tables in s1 are joined to the base tables in s2 on foreign key-primary key columns, with the tables in s2 being the primary keys in the relationships. The foreign key values cannot be null because if the foreign key column set contains nulls, the losslessness of the foreign key table cannot be guaranteed.

The following assertions about these base tables are true:

The extra joins do not eliminate valid rows from the join result among the base tables in s1 because FOREIGN KEY and NOT NULL constraints ensure that every row in the foreign key table finds its match in the primary key table.
The extra joins do not introduce duplicate rows in the join result among the base tables in s1 because the primary key is, by definition, unique and not nullable.

These assertions are also true for extra joins made between base tables that are both in s2.

Therefore, the extra joins in a join index definition, if made on base tables that are defined in a way that observes these assertions, preserve all the rows resulting from the joins among the base tables in s1 and do not add spurious rows.

This result permits the Optimizer to use the join index to cover a query that references fewer tables than the index definition inner joins together.

A covering join index whose definition includes one or more tables that is not specified in the query it covers is referred to as a broad join index. A wide range of queries can make use of a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables. For more information about broad join indexes, see Teradata Vantage™ - Database Design, B035-1094 and Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142.

The Optimizer can select a join index for a query plan if the index contains either the same set, or a subset, of the tables referenced by the query. If more tables are referenced in the join index definition than are referenced by the query, the Optimizer generally does not consider that index as a candidate for coverage because the extra joins can either eliminate rows or produce duplicate rows or both.

Because a referential integrity relationship guarantees the losslessness of the foreign key table in the join with its primary key table, extra tables in a join index definition do not disqualify it from query plan consideration if the extra joins allow the join index to preserve all the rows for the join result of the subset of tables in the query.

For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (foreign key table followed by its parent primary key table):

t1 t2 t3 t4 t5

Queries that reference the following table subsets can be covered by this join index because the extra joins, either between two tables where one is in the query and the other is not, or between two tables that are both not in the query, do not cause any loss of rows for the join result of the subset of tables in the query:

t1
t1, t2
t1, t2, t3
t1, t2, t3, t4

As a result of this property, the following conditions that reference extra joins can be exploited by the Optimizer when the number of tables referenced by the join index definition exceeds the number of tables referenced by the query. In each case, x1 is a unique RI-related column in table t1 and x2 is a unique RI-related column in table t2:

Join Index Extra Join Condition	Qualifications
x1 = x2	x1 is the foreign key in the relationship. t1 is referenced by the query. t2 is not referenced by the query.
x1 = x2	x1 is the primary key in the relationship. t2 is referenced by the query. t1 is not referenced by the query.
x1 = x2	x1 is the foreign key in the relationship. x2 is the primary key in the relationship. Neither t1 nor t2 is referenced by the query.
x1 = x2	x1 is the primary key in the relationship. x2 is the foreign key in the relationship. Neither t1 nor t2 is referenced by the query.

One restriction with two critical exceptions must be added to the above optimization to make the coverage safe: when one table referenced in the join index definition is the parent table of more than one FK table, the join index is generally disqualified from covering any query that references fewer tables than are referenced in the join index.

t1 t2 t3 t4 t5

For these RI relationships, table t4 is the parent table of both tables t3 and t5. The losslessness of the foreign key table depends on the fact that the parent table has all the primary keys available. Joining the parent table with another child table can render this premise false. Therefore, the final join result cannot be viewed as lossless for any arbitrary subset of tables.

The following two points are exceptions to this restriction:

When the foreign key tables are joined on the foreign key columns, there is no loss on any of the foreign key tables because they all reference the same primary key values in their common parent table.
All foreign key tables reference the same primary key column in the primary key table. By transitive closure, all these foreign key tables are related by equijoins on the foreign key columns.

By specifying extra joins in the join index definition, you can greatly enhance its flexibility.

For example, suppose you have a star schema based on a sales fact table and the following dimension tables:

customer
product
location
time

You decide it is desirable to define a join index that joins the fact table sales to its various dimension tables to avoid the relatively expensive join processing between the fact table and its dimension tables whenever ad hoc join queries are made against them.

If there are foreign key-primary key relationships between the join columns, which is often the case, the join index can also be used to optimize queries that only reference a subset of the dimension tables.

Without taking advantage of this optimization, you must either create a different join index for each category of query, incurring the greater cost of maintaining multiple join indexes, or you lose the benefit of join indexes for optimizing the join queries on these tables altogether. By exploiting the foreign key-primary key join properties, the same join index can be selected by the Optimizer to generate access plans for a wide variety of queries.