15.00 - Query Coverage by Join Indexes - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

prodname
Teradata Database
vrm_release
15.00
category
Programming Reference
featnum
B035-1184-015K

Query Coverage by Join Indexes

When the columns requested by a query can be retrieved from an index instead of scanning the base table that index supports, the index is said to cover the query. Some vendors refer to this as index‑only access. Although secondary indexes can sometimes cover a minor query, only hash and join indexes are generally called upon to respond to queries of any consequence. If a join index is defined correctly, it can also cover a query that requests columns it does not carry. This situation is called partial query coverage, and when a join index partially covers the query, the Optimizer can weigh the cost of using it against the cost of scanning the base table set it supports. The criteria for defining a join index to make it eligible for consideration as a partial query cover are described in the following paragraphs.

Other, more complicated, criteria for determining whether a join index can cover a query are described in “Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query” on page 386, and “Rules for Whether Join Indexes With Extra Tables Cover Queries” on page 390.

Be aware that a join index defined with an expression in its select list has less coverage than a join index that is specified using base columns in its select list.

For example, the Query Rewrite Subsystem can use the following join index, ji_f1, to rewrite a query that specifies any character function on column f1,

     CREATE JOIN INDEX ji_f1 AS 
       SELECT b1, f1 
       FROM t1 
       WHERE a1 > 0;

However, the Query Rewrite Subsystem can only use join index ji_substr_f1 to rewrite a query that specifies the same SUBSTR function on column f1 as join index ji_substr_f1 specifies in its select list.

     CREATE JOIN INDEX ji_substr_f1 AS 
       SELECT b1, SUBSTR(f1,1,10) AS s 
       FROM t1 
       WHERE a1 > 0;

See Database Design for information about using global join indexes with tactical queries.

A join index that has the capability of accessing base table columns via a prime access key specified in its definition DDL is eligible for partial query covering. A join index defined in this way is sometimes referred to as a global index or a global join index.

Support for global join indexes is provided by the following substitution logic:

 

IF the definition for a single-table join index …

THEN that join index …

specifies any one or more of the following items:

  • ROWID keyword as a column name in the
    column_name list
  • You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement.

  • Column set that defines the UPI of the underlying base table
  • Column set that defines the NUPI of the underlying base table plus the ROWID keyword
  • Column set that defines the NUPI of the underlying base table plus a column set that defines a USI on the underlying base table
  • qualifies as a partial covering index if it also contains a subset of the columns specified by a query.

    Partial coverage is not restricted to single-table join indexes.

    Join indexes defined with one of these column sets are sometimes referred to as global indexes or global join indexes. See Database Design for some specific applications of global join indexes.

    The Optimizer specifies a join from the join index to the base table to pick up columns that are requested by the query but not included in the single-table join index definition.

    does not specify the ROWID keyword, base table UPI columns, or base table NUPI columns plus either the ROWID keyword or a base table USI column set

    cannot be used to partially cover a query.

    The Optimizer specifies a join from the join index to the base table to pick up columns that are requested by the query but not included in the join index.

     

    IF this item is specified in the join index definition …

    THEN the join from the single-table join index to the base table is made using this join type …

    the base table ROWID only

    You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement.

    Row ID

    the base table primary index

    Merge

    In this case, both the primary index column set and the ROWID, if present, are used to make the join.

    Even though you do not explicitly specify this join when you write your query, it counts against the 128 tables and views per query block restriction on joins.

     

    IF this item is specified in the join index definition …

    THEN it is …

    UPI

    not necessary to include ROWID in the index definition to enable it as a partial cover because a UPI is sufficient to identify any base table row.

    You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement.

    NUPI

    necessary to include either of the following specifications to make it eligible for partial coverage of a query.

  • The base table NUPI column set and ROWID in the index definition.
  • The base table NUPI column set and a base table USI column set in the index definition.
  • The NUPI plus ROWID option is the preferable choice.

    The Optimizer specifies a partially covering join index in a join plan only if the cost of using it is less than the cost of not using it, just as it does for an index that fully covers the query.

    For example, consider the following table and join index definitions:

        CREATE TABLE t1 (
          x1 INTEGER, 
          y1 INTEGER, 
          z1 INTEGER) 
        PRIMARY INDEX (x1);
        
        CREATE TABLE t2 (
          x2 INTEGER, 
          y2 INTEGER, 
          z2 INTEGER) 
        PRIMARY INDEX (x2);
        
        CREATE JOIN INDEX j1 AS 
          SELECT y1
          FROM t1 
        PRIMARY INDEX (y1);

    The Optimizer selects single-table join index j1 as a partial covering index for its join plan for the following query.

        EXPLAIN 
        SELECT x1, y1, z2 
        FROM t1, t2 
        WHERE y1 = x2 
        AND   y2 = 1;
         
         *** Help information returned. 29 rows.
         *** Total elapsed time was 1 second.
    Explanation
    ---------------------------------------------------------------------------
     1) First, we lock a distinct STJI."pseudo table" for read on a
        RowHash to prevent global deadlock for STJI.j1.
     2) Next, we lock a distinct STJI."pseudo table" for read on a RowHash
        to prevent global deadlock for STJI.t2.
     3) We lock a distinct STJI."pseudo table" for read on a RowHash to
        prevent global deadlock for STJI.t1.
     4) We lock STJI.j1 for read, we lock STJI.t2 for read, and we lock
        STJI.t1 for read.
     5) We do an all-AMPs JOIN step from STJI.t2 by way of a RowHash match
        scan. with a condition of ("STJI.t2.y2 = 1"), which is joined to
        STJI.j1.  STJI.t2 and STJI.j1 are joined using a merge join, with
        a join condition of ("STJI.j1.y1 = STJI.t2.x2").  The input table
        STJI.j1 will not be cached in memory.  The result goes into Spool
        2, which is redistributed by hash code to all AMPs.  Then we do a
        SORT to order Spool 2 by field Id 1.  The size of Spool 2 is
        estimated with no confidence to be 1 row.  The estimated time for
        this step is 0.06 seconds.
     6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
        all-rows scan, which is joined to STJI.t1.  Spool 2 and STJI.t1
        are joined using a row id join, with a join condition of (
        "Field_1 = STJI.t1.RowID").  The input table STJI.t1 will not be
        cached in memory.  The result goes into Spool 1, which is built
        locally on the AMPs.  The size of Spool 1 is estimated with no
        confidence to be 1 row.  The estimated time for this step is 0.20
        seconds.
     7) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1.  The total estimated time is 0.26 seconds. 

    In this query, joining tables t1 and t2 would require redistributing all the t1 rows by t1.y1. This is an expensive operation if t1 is very large.

    An alternative, more cost-effective solution is to perform a local join between join index j1 and base table t2 and then to redistribute the intermediate join result to join with base table t1. Since there is a precise constraint condition on base table t2 that t2.y2 = 1, redistributing the join result of j1 and t2 and then joining the result to t1 is less expensive than the redistribution cost of t1. Therefore, the Optimizer selects j1 to process the t1-t2 join.

    By contrast, consider a similar query without the t2.y2 = 1 constraint. In this case, the Optimizer does not select j1 to cover the query even though it qualifies as a partially covering index:

        INSERT SELECT x1, y1, z2 
        FROM t1,t2 
        WHERE y1 = x2;
        
         *** Help information returned. 22 rows. 
         *** Total elapsed time was 1 second.
        
        Explanation
        ---------------------------------------------------------------------------
          1) First, we lock a distinct STJI."pseudo table" for read on a
             RowHash to prevent global deadlock for STJI.t2.
          2) Next, we lock a distinct STJI."pseudo table" for read on a RowHash
             to prevent global deadlock for STJI.t1.
          3) We lock STJI.t2 for read, and we lock STJI.t1 for read.
          4) We do an all-AMPs RETRIEVE step from STJI.t1 by way of an all-rows
             scan with a condition of ("NOT (STJI.t1.y1 IS NULL)") into Spool 2,
             which is redistributed by hash code to all AMPs.  Then we do a
             SORT to order Spool 2 by row hash.  The size of Spool 2 is
             estimated with low confidence to be 101 rows.  The estimated time
             for this step is 3 minutes and 9 seconds.
          5) We do an all-AMPs JOIN step from STJI.t2 by way of a RowHash match
             scan., which is joined to Spool 2 (Last Use).  STJI.t2 and Spool 2
             are joined using a merge join, with a join condition of ("y1 =
             STJI.t2.x2").  The result goes into Spool 1, which is built
             locally on the AMPs.  The size of Spool 1 is estimated with no
             confidence to be 3 rows.  The estimated time for this step is 0.18
             seconds.
          6) Finally, we send out an END TRANSACTION step to all AMPs involved
             in processing the request.
          -> The contents of Spool 1 are sent back to the user as the result of
             statement 1.  The total estimated time is 3 minutes and 9 seconds.