Query Coverage by Join Indexes | CREATE JOIN INDEX | Teradata Vantage - 17.10 - Query Coverage by Join Indexes - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Data Definition Language Detailed Topics

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B035-1184-171K
Language
English (United States)

When the columns requested by a query can be retrieved from an index instead of scanning the base table that index supports, the index is said to cover the query. Some vendors refer to this as index-only access. Although secondary indexes can sometimes cover a minor query, only hash and join indexes are generally called upon to respond to queries of any consequence. If a join index is defined correctly, it can also cover a query that requests columns it does not carry. This situation is called partial query coverage, and when a join index partially covers the query, the Optimizer can weigh the cost of using it against the cost of scanning the base table set it supports. The criteria for defining a join index to make it eligible for consideration as a partial query cover are described in the following paragraphs.

Other, more complicated, criteria for determining whether a join index can cover a query are described in Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query, and Rules for Whether Join Indexes With Extra Tables Cover Queries.

Be aware that a join index defined with an expression in its select list has less coverage than a join index that is specified using base columns in its select list.

For example, the Query Rewrite Subsystem can use the following join index, ji_f1, to rewrite a query that specifies any character function on column f1,

CREATE JOIN INDEX ji_f1 AS
  SELECT b1, f1
  FROM t1
  WHERE a1 > 0;

However, the Query Rewrite Subsystem can only use join index ji_substr_f1 to rewrite a query that specifies the same SUBSTR function on column f1 as join index ji_substr_f1 specifies in its select list.

CREATE JOIN INDEX ji_substr_f1 AS
  SELECT b1, SUBSTR(f1,1,10) AS s
  FROM t1
  WHERE a1 > 0;

See Teradata Vantage™ - Database Design, B035-1094 for information about using global join indexes with tactical queries.

A join index that has the capability of accessing base table columns via a prime access key specified in its definition DDL is eligible for partial query covering. A join index defined in this way is sometimes referred to as a global index or a global join index.

Support for global join indexes is provided by the following substitution logic:

IF the definition for a single-table join index … THEN that join index …
specifies any one or more of the following items:
  • ROWID keyword as a column name in the column_name list

    You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement.

  • Column set that defines the UPI of the underlying base table
  • Column set that defines the NUPI of the underlying base table plus the ROWID keyword
  • Column set that defines the NUPI of the underlying base table plus a column set that defines a USI on the underlying base table
qualifies as a partial covering index if it also contains a subset of the columns specified by a query.

Partial coverage is not restricted to single-table join indexes.

Join indexes defined with one of these column sets are sometimes referred to as global indexes or global join indexes. See Teradata Vantage™ - Database Design, B035-1094 for some specific applications of global join indexes.

The Optimizer specifies a join from the join index to the base table to pick up columns that are requested by the query but not included in the single-table join index definition.

does not specify the ROWID keyword, base table UPI columns, or base table NUPI columns plus either the ROWID keyword or a base table USI column set cannot be used to partially cover a query.

The Optimizer specifies a join from the join index to the base table to pick up columns that are requested by the query but not included in the join index.

IF this item is specified in the join index definition … THEN the join from the single-table join index to the base table is made using this join type …
the base table ROWID only

You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement.

Row ID
the base table primary index Merge

In this case, both the primary index column set and the ROWID, if present, are used to make the join.

Even though you do not explicitly specify this join when you write your query, it counts against the 128 tables and views per query block restriction on joins.

IF this item is specified in the join index definition … THEN it is …
UPI not necessary to include ROWID in the index definition to enable it as a partial cover because a UPI is sufficient to identify any base table row.

You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement.

NUPI necessary to include either of the following specifications to make it eligible for partial coverage of a query.
  • The base table NUPI column set and ROWID in the index definition.
  • The base table NUPI column set and a base table USI column set in the index definition.

The NUPI plus ROWID option is the preferable choice.

The Optimizer specifies a partially covering join index in a join plan only if the cost of using it is less than the cost of not using it, just as it does for an index that fully covers the query.

For example, consider the following table and join index definitions:

CREATE TABLE t1 (
  x1 INTEGER,
  y1 INTEGER,
  z1 INTEGER)
PRIMARY INDEX (x1);
    
CREATE TABLE t2 (
  x2 INTEGER,
  y2 INTEGER,
  z2 INTEGER)
PRIMARY INDEX (x2);
    
CREATE JOIN INDEX j1 AS
  SELECT y1
  FROM t1
PRIMARY INDEX (y1);

The Optimizer selects single-table join index j1 as a partial covering index for its join plan for the following query.

EXPLAIN
SELECT x1, y1, z2
FROM t1, t2
WHERE y1 = x2
AND   y2 = 1;
    
Explanation
---------------------------------------------------------------
1) First, we lock MyDB.t2 for read on a reserved RowHash to prevent
   global deadlock.
2) Next, we lock MyDB.t1 for read on a reserved RowHash to prevent
   global deadlock.
3) We lock MyDB.t2 for read, and we lock MyDB.t1 for read.
4) We do an all-AMPs RETRIEVE step from MyDB.t1 by way of an all-rows
   scan with a condition of ("NOT (MyDB.t1.y1 IS NULL)") into Spool 2
   (all_amps), which is redistributed by the hash code of (MyDB.t1.y1)
   to all AMPs.  Then we do a SORT to order Spool 2 by row hash. The
   size of Spool 2 is estimated with low confidence to be 4 rows (84
   bytes). The estimated time for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from MyDB.t2 by way of a RowHash match
   scan with a condition of (""MyDB.t2.y2 = 1""), which is joined to
   Spool 2 (Last Use) by way of a RowHash match scan. MyDB.t2 and
   Spool 2 are joined using a merge join, with a join condition of
   (""y1 = MyDB.t2.x2""). The result goes into Spool 1 (group_amps),
   which is built locally on the AMPs.  The size of Spool 1 is
   estimated with no confidence to be 4 rows (132 bytes).  The
   estimated time for this step is 0.11 seconds.
   -> The contents of Spool 1 are sent back to the user as the result
   of statement 1.  The total estimated time is 0.13 seconds.

In this query, joining tables t1 and t2 would require redistributing all the t1 rows by t1.y1. This is an expensive operation if t1 is very large.

An alternative, more cost-effective solution is to perform a local join between join index j1 and base table t2 and then to redistribute the intermediate join result to join with base table t1. Since there is a precise constraint condition on base table t2 that t2.y2 = 1, redistributing the join result of j1 and t2 and then joining the result to t1 is less expensive than the redistribution cost of t1. Therefore, the Optimizer selects j1 to process the t1-t2 join.

By contrast, consider a similar query without the t2.y2 = 1 constraint. In this case, the Optimizer does not select j1 to cover the query even though it qualifies as a partially covering index:

EXPLAIN SELECT x1, y1, z2
FROM t1,t2
WHERE y1 = x2;
    
 *** Help information returned. 22 rows.
 *** Total elapsed time was 1 second.
    
Explanation
-----------------------------------------------------------
1) First, we lock MyDB.t2 for read on a reserved RowHash to prevent
   global deadlock.
2) Next, we lock MyDB.t1 for read on a reserved RowHash to prevent
   global deadlock.
3) We lock MyDB.t2 for read, and we lock MyDB.t1 for read.
4) We do an all-AMPs RETRIEVE step from MyDB.t1 by way of an all-rows
   scan with a condition of ("NOT (MyDB.t1.y1 IS NULL)") into Spool 2
   (all_amps), which is redistributed by the hash code of(MyDB.t1.y1)
   to all AMPs.  Then we do a SORT to order Spool 2 by row hash. The
   size of Spool 2 is estimated with low confidence to be 4 rows (84
   bytes).  The estimated time for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from MyDB.t2 by way of a RowHash match
   scan, which is joined to Spool 2 (Last Use) by way of a RowHash
   match scan.  MyDB.t2 and Spool 2 are joined using a merge join,
   with a join condition of ("y1 = MyDB.t2.x2"). The result goes into
   Spool 1 (group_amps), which is built locally on the AMPs. The
   size of Spool 1 is estimated with index join confidence to be 6
   rows (198 bytes). The estimated time for this step is 0.11
   seconds.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1.  The total estimated time is 0.13 seconds.