Rules and Restrictions for Query Coverage in Join Indexes - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Restriction on Partial Coverage of Queries Containing a TOP n or TOP m PERCENT Clause

A join index cannot be used to partially cover a query that specifies the TOP n or TOP m PERCENT option.

Rules for Whether Join Indexes with Extra Tables Cover Queries

The following rules explain how to design a set of underlying base tables for a join index definition to make sure the Optimizer selects the index for an access plan if it inner joins more tables than the query references. Such join indexes are called broad join indexes. For more information on broad join indexes, see Join Index Definition Restrictions and Join Indexes.

All of the Following Conditions Are True Result
  • There are more inner-joined tables in a join index definition than the number of tables referenced in a query.
  • The extra joins are not made on foreign key-primary key columns in the underlying base tables
The Optimizer does not consider the join index for the query plan.

Reason: the extra joins in the definition can eliminate existing rows from the query evaluation or produce duplicate rows during optimization.

  • There are more inner-joined tables in a join index definition than the number of tables referenced in a query.
  • The extra joins are made on foreign key-primary key columns in the underlying base tables.
The Optimizer considers the join index for use in for the query plan.
  • There are more inner-joined tables in a join index definition than the number of tables referenced in a query.
  • The join column set of the inner table in the extra outer join is unique.
  • The inner table or the inner and outer tables involved in the extra outer join are extra tables.

Restriction on Join Index Coverage When Join Index Definition References More Tables than Query

Whether the Optimizer decides to include a join index in its query plan is a more complicated choice than simply determining if the index contains all the table columns specified in the query. As noted in the previous section, the columns on which the base tables in the index definition are joined and their respective referential constraints also play an important role.

The Optimizer may not consider using a join index in its access plan if that index is defined on more tables than the query references.

The join index is more likely to be used in the query access plan if the extra inner joins are defined on primary key-foreign key relationships in the underlying base tables that preserve rows. The referential integrity relationship between the base table primary and foreign keys can be specified using any of the three available methods for establishing referential constraints between tables.

Referential integrity is supported only on the Block File System on the primary cluster, not on the Object File System.

In the case of join index outer joins, outer table rows are preserved automatically, so there is no requirement for a referential integrity constraint to exist to enable their preservation.

In the case of foreign key-primary key inner joins, the same preservation of rows follows from a declarative referential integrity constraint. The Optimizer does consider a join index with extra inner joins in its definition to cover a query. The following paragraphs explain why a referential constraint preserves logical integrity.

Assume that the base tables in a join index definition can be divided into two distinct sets, s1 and s2.

s1 contains the base tables referenced in the query, while s2 contains the extra base tables the query does not reference. The base tables in s1 are joined to the base tables in s2 on foreign key-primary key columns, with the tables in s2 being the primary keys in the relationships. The foreign key values cannot be null because if the foreign key column set contains nulls, the losslessness of the foreign key table cannot be guaranteed.

The following assertions about these base tables are true:
  • The extra joins do not eliminate valid rows from the join result among the base tables in s1 because FOREIGN KEY and NOT NULL constraints make sure that every row in the foreign key table finds its match in the primary key table.
  • The extra joins do not introduce duplicate rows in the join result among the base tables in s1 because the primary key is, by definition, unique and not nullable.

These assertions are also true for extra joins made between base tables that are both in s2.

Therefore, the extra joins in a join index definition, if made on base tables that are defined in a way that observes these assertions, preserve all the rows resulting from the joins among the base tables in s1 and do not add spurious rows.

This result permits the Optimizer to use the join index to cover a query that references fewer tables than the index definition inner joins together.

A covering join index whose definition includes one or more tables that is not specified in the query it covers is called a broad join index. Many queries can use a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables. See Join Index Definition Restrictions and Join Indexes.

The Optimizer can select a join index for a query plan if the index contains either the same set, or a subset, of the tables referenced by the query. If more tables are referenced in the join index definition than are referenced by the query, the Optimizer rarely considers that index as a candidate for coverage because the extra joins can either eliminate rows or produce duplicate rows or both.

Because a referential integrity relationship guarantees the losslessness of the foreign key table in the join with its primary key table, extra tables in a join index definition do not disqualify it from query plan consideration if the extra joins allow the join index to preserve all the rows for the join result of the subset of tables in the query.

For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (foreign key table followed by its parent primary key table): t1 t2 t3 t4 t5

Queries that reference the following table subsets can be covered by this join index because the extra joins, either between two tables where one is in the query and the other is not, or between two tables that are both not in the query, do not cause any loss of rows for the join result of the subset of tables in the query:
  • t1
  • t1, t2
  • t1, t2, t3
  • t1, t2, t3, t4

As a result of this property, the following conditions that reference extra joins can be exploited by the Optimizer when the number of tables referenced by the join index definition exceeds the number of tables referenced by the query. In each case, x1 is a unique RI-related column in table t1 and x2 is a unique RI-related column in table t2:

Join Index Extra Join Condition Qualifications
x1 = x2
  • x1 is the foreign key in the relationship.
  • t1 is referenced by the query.
  • t2 is not referenced by the query.
x1 = x2
  • x1 is the primary key in the relationship.
  • t2 is referenced by the query.
  • t1 is not referenced by the query.
x1 = x2
  • x1 is the foreign key in the relationship.
  • x2 is the primary key in the relationship.
  • Neither t1 nor t2 is referenced by the query.
x1 = x2
  • x1 is the primary key in the relationship.
  • x2 is the foreign key in the relationship.
  • Neither t1 nor t2 is referenced by the query.

To make the coverage safe, add this restriction to the preceding optimization: when one table referenced in the join index definition is the parent table of more than one FK table, the join index is disqualified from covering any query that references fewer tables than are referenced in the join index.

For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (foreign key table followed by its parent primary key table).

For these RI relationships, table t4 is the parent table of both tables t3 and t5. The losslessness of the foreign key table depends on the fact that the parent table has all the primary keys available. Joining the parent table with another child table can render this premise false. Therefore, the final join result cannot be viewed as lossless for any arbitrary subset of tables.

The following two points are exceptions to this restriction:
  • When the foreign key tables are joined on the foreign key columns, there is no loss on the foreign key tables, because they reference the same primary key values in their common parent table.
  • All foreign key tables reference the same primary key column in the primary key table. By transitive closure, all these foreign key tables are related by equijoins on the foreign key columns.

By specifying extra joins in the join index definition, you can greatly enhance its flexibility.

For example, suppose you have a star schema based on a sales fact table and the following dimension tables:
  • Customer
  • Product
  • Location
  • Time

You define a join index that joins the fact table sales to its dimension tables to avoid the relatively expensive join processing between the fact table and its dimension tables whenever ad hoc join queries are made against them.

If there are foreign key-primary key relationships between the join columns, which is often the case, the join index can also be used to optimize queries that only reference a subset of the dimension tables.

Without taking advantage of this optimization, you must either create a different join index for each category of query, incurring the greater cost of maintaining multiple join indexes, or you lose the benefit of join indexes for optimizing the join queries on these tables altogether. By exploiting the foreign key-primary key join properties, the same join index can be selected by the Optimizer to generate access plans for a wide variety of queries.