Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query
Whether the Optimizer decides to include a join index in its query plan is a more complicated choice than simply determining if the index contains all the table columns specified in the query. The columns on which the base tables in the index definition are joined and their respective referential constraints also play an important role. See “Rules for Whether Join Indexes With Extra Tables Cover Queries” on page 390.
In many cases, the Optimizer does not consider using a join index in its access plan if that index is defined on more tables than the query references. This is because the so‑called extra inner joins involving the tables not referenced by the query can cause both spurious row loss and spurious duplicate row creation during the optimization process, and either outcome produces incorrect results.
This outcome can be avoided, and the join index is more likely to be used in the query access plan, if the extra inner joins are defined on primary key‑foreign key relationships in the underlying base tables that ensure proper row preservation. The referential integrity relationship between the base table primary and foreign keys can be specified using any of the three available methods for establishing referential constraints between tables. See SQL Data Definition Language and SQL Data Definition Language Detailed Topics for further information.
In the case of join index outer joins, outer table rows are always preserved automatically, so there is no requirement for a referential integrity constraint to exist to enable their preservation.
In the case of foreign key-primary key inner joins, the same preservation of rows follows from a declarative referential integrity constraint. In this case, the Optimizer does consider a join index with extra inner joins in its definition to cover a query. The following paragraphs explain why a referential constraint preserves logical integrity.
Assume that the base tables in a join index definition can be divided into two distinct sets, s1 and s2.
s1 contains the base tables referenced in the query, while s2 contains the extra base tables the query does not reference. The base tables in s1 are joined to the base tables in s2 on foreign key‑primary key columns, with the tables in s2 being the primary keys in the relationships. The foreign key values cannot be null because if the foreign key column set contains nulls, the losslessness of the foreign key table cannot be guaranteed.
The following assertions about these base tables are true:
These assertions are also true for extra joins made between base tables that are both in s2.
Therefore, the extra joins in a join index definition, if made on base tables that are defined in a way that observes these assertions, preserve all the rows resulting from the joins among the base tables in s1 and do not add spurious rows.
This result permits the Optimizer to use the join index to cover a query that references fewer tables than the index definition inner joins together.
A covering join index whose definition includes one or more tables that is not specified in the query it covers is referred to as a broad join index. A wide range of queries can make use of a broad join index, especially when there are foreign key‑primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables. See Database Design and SQL Request and Transaction Processing for more information about broad join indexes.
The Optimizer can select a join index for a query plan if the index contains either the same set, or a subset, of the tables referenced by the query. If more tables are referenced in the join index definition than are referenced by the query, the Optimizer generally does not consider that index as a candidate for coverage because the extra joins can either eliminate rows or produce duplicate rows or both.
Because a referential integrity relationship guarantees the losslessness of the foreign key table in the join with its primary key table, extra tables in a join index definition do not disqualify it from query plan consideration if the extra joins allow the join index to preserve all the rows for the join result of the subset of tables in the query.
For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key‑primary key joins in the directions indicated (arrows point from a foreign key table to its parent primary key table) by the following diagram:
t1 → t2 → t3 → t4 → t5
Queries that reference the following table subsets can be covered by this join index because the extra joins, either between two tables where one is in the query and the other is not, or between two tables that are both not in the query, do not cause any loss of rows for the join result of the subset of tables in the query:
As a result of this property, the following conditions that reference extra joins can be exploited by the Optimizer when the number of tables referenced by the join index definition exceeds the number of tables referenced by the query. In each case, x1 is a unique RI‑related column in table t1 and x2 is a unique RI‑related column in table t2:
Join Index Extra Join Condition
One restriction with two critical exceptions must be added to the above optimization to make the coverage safe: when one table referenced in the join index definition is the parent table of more than one FK table, the join index is generally disqualified from covering any query that references fewer tables than are referenced in the join index.
For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key‑primary key joins in the directions indicated (arrows point from a foreign key table to its parent primary key table) by the following diagram.
t1 → t2 → t3 → t4 ← t5
For these RI relationships, table t4 is the parent table of both tables t3 and t5. The losslessness of the foreign key table depends on the fact that the parent table has all the primary keys available. Joining the parent table with another child table can render this premise false. Therefore, the final join result cannot be viewed as lossless for any arbitrary subset of tables.
The following two points are exceptions to this restriction:
By specifying extra joins in the join index definition, you can greatly enhance its flexibility.
For example, suppose you have a star schema based on a sales fact table and the following dimension tables:
You decide it is desirable to define a join index that joins the fact table sales to its various dimension tables to avoid the relatively expensive join processing between the fact table and its dimension tables whenever ad hoc join queries are made against them.
If there are foreign key‑primary key relationships between the join columns, which is often the case, the join index can also be used to optimize queries that only reference a subset of the dimension tables.
Without taking advantage of this optimization, you must either create a different join index for each category of query, incurring the greater cost of maintaining multiple join indexes, or you lose the benefit of join indexes for optimizing the join queries on these tables altogether. By exploiting the foreign key‑primary key join properties, the same join index can be selected by the Optimizer to generate access plans for a wide variety of queries.