16.10 - Join Index Definition Restrictions - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

This topic lists several restrictions on join index definitions. For complete details about join index syntax, see the documentation for CREATE JOIN INDEX in SQL Data Definition Language.

Restrictions on Number of Join Indexes Defined Per Base Table

The maximum number of secondary, hash, and join indexes that can be defined for a table, in any combination, is 32. This includes the system-defined secondary indexes used to implement PRIMARY KEY and UNIQUE constraints. Each composite NUSI that specifies an ORDER BY clause counts as 2 consecutive indexes in this calculation (see Importance of Consecutive Indexes for Value-Ordered NUSIs). You cannot define join, or any other, indexes on global temporary trace tables. See “CREATE GLOBAL TEMPORARY TRACE TABLE” in SQL Data Definition Language Detailed Topics.

Suppose you have four tables, each with multiple secondary, hash, and join indexes defined on them:

  • Table_1 has 32 secondary indexes and no hash or join indexes.
  • Table_2 has 16 secondary indexes, no hash indexes, and 16 join indexes.
  • Table_3 has 10 secondary indexes, 10 hash indexes, and 12 join indexes.
  • Table_4 has no secondary or hash indexes, but has 32 join indexes.

Each of these combinations is valid, but they all operate at the boundaries of the defined limits.

Note that if any of the secondary indexes defined on tables 1, 2, or 3 is a composite NUSI defined with an ORDER BY clause, the defined limits are exceeded, and the last index you attempt to create on the table will fail. Because each composite NUSI defined with an ORDER BY clause counts as 2 consecutive indexes in the count against the maximum of 32 per table, you could define only 8 of them on table_2, for example, if you also defined 16 join indexes on the table.

Restrictions on the Number of Columns Per Referenced Base Table

The following restrictions apply to the number of columns per table that can be specified in a join index definition.

  • The maximum number of columns that can be specified in a join index per referenced base table is 64.
  • The maximum number of columns that can be specified in a multitable join index definition is 2,048.
  • When you specify join index row compression, then each of the column_1 and column_2 sets is limited to 64 common base table references.

Restrictions on the Data Type of Join Index Columns

You cannot specify columns having any of the following data types in the definition of a join index:

  • XML columns
  • BLOB columns
  • CLOB columns
  • BLOB-based UDT columns
  • CLOB-based UDT columns
  • XM-based UDT columns
  • ARRAY/VARRAY columns
  • Geospatial columns

    This also means that a join index cannot have a geospatial NUSI.

Restrictions on the Use of the System-Derived PARTITION[#L n] Column

You cannot use a system-derived PARTITION[#L n] column in a join index definition.

Restrictions on Outer Join Definitions

The following restrictions apply to outer joins when used to define a join index.

  • FULL OUTER JOIN is not valid.
  • When you specify a LEFT or RIGHT outer join, the following rules apply:
    • The outer table joining column for each condition must be contained in either column_1_name or column_2_name.
    • The inner table of each join condition must have at least one non-nullable column in either column_1_name or column_2_name.

Restrictions on Secondary Index Definitions

You cannot define unique secondary indexes on a join index.

You can define the primary index of a join index to be row partitioned if and only if the join index is not also defined with row compression. See the documentation for CREATE JOIN INDEX in SQL Data Definition Language for more information about specifying partitioned primary indexes for join indexes.

Restrictions on Built-In Functions and Join Index Definitions

When you create a join index that specifies a built-in, or system, function in its WHERE clause (see SQL Functions, Operators, Expressions, and Predicates for more information about built-in functions), the system resolves the function at the time the join index is created and then stores the result as part of the index definition rather than evaluating it dynamically at the time the Optimizer would use the index to build a query plan.

As a result, the Optimizer does not use a join index in the access plan for a query that qualifies its WHERE clause with the same built-in function used to define that join index because it cannot determine whether the index covers the query or not. The Optimizer does use the join index if the query specifies an explicit value in its WHERE clause that matches the resolved value stored in the index definition.

(The exception to this is the CURRENT_TIME and CURRENT_TIMESTAMP functions, which are resolved for a query. The resolved value is used to check if the query can be covered by a join index.)

For example, suppose you decide to define a join index using the CURRENT_DATE built-in function on January 4, 2010 as follows:

     CREATE JOIN INDEX curr_date AS
       SELECT *
       FROM orders
       WHERE order_date = CURRENT_DATE;

On January 7, 2010, you perform the following SELECT statement:

     SELECT *
     FROM orders
     WHERE order_date = CURRENT_DATE;

When you EXPLAIN this query, you find that the Optimizer does not use join index curr_date because the date stored in the index definition is the explicit value ‘2010-01-04’, not the current system date ‘2010-01-07’.

Note that if you had defined curr_date with a predicate of order_date >= CURRENT_DATE instead of order_date = CURRENT_DATE, then the Optimizer could use curr_date to cover the query if it were the least costly way to process the request.

On the other hand, if you were to perform the following SELECT statement on January 7, 2010, or any other date, retaining the original curr_date predicate, the Optimizer does use join index curr_date for the query plan because the statement explicitly specifies the same date that was stored with the join index definition when it was created:

     
SELECT *
     FROM orders
     WHERE order_date = DATE ‘2010-01-04’;

Restriction on Number of Join Indexes Selected Per Query

The Optimizer can use several join indexes for a single query, selecting one multitable join index as well as additional single-table join indexes for its join plan. The join indexes selected depend on the structure of the query, and the Optimizer might not choose all applicable join indexes for the plan. Always examine your EXPLAIN reports to determine which join indexes are used for the join plans generated for your queries. If a join index you think should have been used by a query was not included in the join plan, try restructuring the query and then EXPLAIN it once again.

The limit on the number of join indexes considered per query is enforced to limit the number of possible combinations and permutations of table joins in the Optimizer search space during its join planning phase. The rule helps to ensure that the optimization is worth the effort. In other words, that the time spent generating the query plan does not exceed the accrued performance enhancement.

Restrictions on Partial Covering by Join Indexes

The Optimizer can use a join index that partially covers a query in the following cases:

  • One of the columns in the index definition is the keyword ROWID.

    You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement. See “CREATE JOIN INDEX” in SQL Data Definition Language Detailed Topics.

  • The column set defining the UPI of the underlying base table is also carried in the definition.
  • The column set defining the NUPI of the underlying base table plus either of the following is also carried in the definition:
    • One of the columns in the definition of that index is the keyword ROWID.

      You can only specify ROWID in the outermost SELECT of a CREATE JOIN INDEX request.

    • The column set defining a USI on the underlying base table.

      The ROWID option is the preferable choice.

  • Partial covering is not supported for the inner table of an outer join.
  • Partial covering is not supported for queries that contain a TOP n or TOP m PERCENT clause.

If statistics indicate that it would be cost-effective, the Optimizer can specify that the partially covering single-table join index be joined to one of its underlying base tables using either the ROWID or the UPI or USI to join to the column data not defined for the index itself.

Even though you do not explicitly specify this join when you write your query, it counts against the 128 table restriction on joins.

For example, suppose you define the tables t1, t2, and t3 and the join indexes j1 and j2 as follows:

     CREATE TABLE t1 (
       a1 INTEGER,
       b1 INTEGER,
       c1 INTEGER);

     CREATE TABLE t2 (
       a2 INTEGER,
       b2 INTEGER,
       c2 INTEGER);

     CREATE TABLE t3 (
       a3 INTEGER,
       b3 INTEGER,
       c3 INTEGER);

     CREATE JOIN INDEX j1 AS
      SELECT b1, b2, t1.ROWID AS t1rowid
      FROM t1,t2
      WHERE a1=a2;

     CREATE JOIN INDEX j2 AS
      SELECT b1, b2, t1.ROWID t1rowid, t2.rowid t2rowid
      FROM t1,t2
      WHERE a1=b2;

Join index j1 partially covers the following queries:

     SELECT a1, b1, c1, b2
     FROM t1,t2
     WHERE t1.a1=t2.b2
     AND   t1.b1=10;

     SELECT a1, b1, c1, b2, b3, c3
     FROM t1,t2,t3
     WHERE t1.a1=t2.b2
     AND   t1.b1=t3.a3
     AND   t3.b3 > 0;

The same join index does not partially cover the following queries:

     SELECT *
     FROM t1,t2
     WHERE t1.a1=t2.b2
     AND   t1.b1=10;

     SELECT *
     FROM t1,t2,t3
     WHERE t1.a1=t2.b2
     AND   t1.b1=t3.a3
     AND   t3.b3 > 0;

Join index j2, on the other hand, partially covers all of these queries.

Even though you do not explicitly specify the join back to the base table when you write your query, it counts against the 64 tables per query block restriction on joins.

Be aware that a join index defined with an expression in its select list provides less coverage than a join index that is defined using base columns.

For example, the Optimizer can use join index ji_f1 to rewrite a query that specifies any character function on f1.

     CREATE JOIN INDEX ji_f1 AS
      SELECT b1, f1
      FROM t1
      WHERE a1 > 0;

At the same time, because it is defined with a SUBSTR expression in its select list, the Optimizer can only use join index ji_substr_f1 to rewrite a query that specifies the same SUBSTR function on f1.

     CREATE JOIN INDEX ji_substr_f1 AS
      SELECT b1, SUBSTR(f1,1,10) AS s
      FROM t1
      WHERE a1 > 0;

Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query

Whether the Optimizer decides to include a join index in its query plan is a more complicated choice than simply determining if the index contains all the table columns specified in the query. The columns on which the base tables in the index definition are joined and their respective referential constraints also play an important role. See Rules for Whether Join Indexes With Extra Tables Cover Queries below.

In many cases, the Optimizer does not consider using a join index in its access plan if that index is defined on more tables than the query references. This is because the so-called extra inner joins involving the tables not referenced by the query can cause both spurious row loss and spurious duplicate row creation during the optimization process, and either outcome produces incorrect results.

This outcome can be avoided, and the join index used in the query access plan, if the extra inner joins are defined on primary key-foreign key relationships in the underlying base tables that ensure proper row preservation. Such a join index is referred to as a broad join index. The referential integrity relationship between the base table primary and foreign keys can be specified using any of the three available methods for establishing referential constraints between tables. See SQL Data Definition Language for further information.

A broad join index is a covering join index whose definition includes one or more tables that is not specified in the query it covers. A wide range of queries can make use of a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables.

In the case of join index outer joins, outer table rows are always preserved automatically, so there is no requirement for a referential integrity constraint to exist in order to preserve them.

In the case of foreign key-primary key inner joins, the same preservation of rows follows from a declarative referential integrity constraint. In this case, the Optimizer does consider a join index with extra inner joins in its definition to cover a query. The following paragraphs explain why a referential constraint preserves logical integrity:

Assume that the base tables in a join index definition can be divided into two distinct sets, s1 and s2.

s1 contains the base tables referenced in the query, while s2 contains the extra base tables the query does not reference. The base tables in s1 are joined to the base tables in s2 on foreign key-primary key columns, with the tables in s2 being the primary keys in the relationships. The foreign key values cannot be null because if the foreign key column set contains nulls, the losslessness of the foreign key table cannot be guaranteed.

The following assertions about these base tables are true:

  • The extra joins do not eliminate valid rows from the join result among the base tables in s1 because FOREIGN KEY and NOT NULL constraints ensure that every row in the foreign key table finds its match in the primary key table.
  • The extra joins do not introduce duplicate rows in the join result among the base tables in s1 because the primary key is, by definition, unique and not nullable.

These assertions are also true for extra joins made between base tables that are both in s2.

Therefore, the extra joins in a join index definition, if made on base tables that are defined in a way that observes these assertions, preserve all the rows resulting from the joins among the base tables in s1 and do not add spurious rows. This result permits the Optimizer to use the join index to cover a query that references fewer tables than the index definition inner joins together.

The Optimizer can select a join index for a query plan if the index contains either the same set, or a subset, of the tables referenced by the query. If more tables are referenced in the join index definition than are referenced by the query, the Optimizer generally does not consider that index as a candidate for coverage because the extra joins can either eliminate rows or produce duplicate rows or both. Because a referential integrity relationship guarantees the losslessness of the foreign key table in the join with its primary key table, extra tables in a join index definition do not disqualify it from query plan consideration if the extra joins allow the join index to preserve all the rows for the join result of the subset of tables in the query.

For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (arrows point from a foreign key table to its parent primary key table) by the following diagram:

        t1   t2    t3   t4   t5

Queries that reference the following table subsets can be covered by this join index because the extra joins, either between two tables where one is in the query and the other is not, or between two tables that are both not in the query, do not cause any loss of rows for the join result of the subset of tables in the query:

  • t1
  • t1, t2
  • t1, t2, t3
  • t1, t2, t3, t4

As a result of this property, the following conditions that reference extra joins can be exploited by the Optimizer when the number of tables referenced by the join index definition exceeds the number of tables referenced by the query. In each case, x1 is a unique RI-related column in table t1 and x2 is a unique RI-related column in table t2:

Join Index Extra Join Condition Qualifications
x1 = x2
  • x1 is the foreign key in the relationship.
  • t1 is referenced by the query.
  • t2 is not referenced by the query.
x1 = x2
  • x1 is the primary key in the relationship.
  • t2 is referenced by the query.
  • t1 is not referenced by the query.
x1 = x2
  • x1 is the foreign key in the relationship.
  • x2 is the primary key in the relationship.
  • Neither t1 nor t2 is referenced by the query.
x1 = x2
  • x1 is the primary key in the relationship.
  • x2 is the foreign key in the relationship.
  • Neither t1 nor t2 is referenced by the query.

One restriction with two critical exceptions must be added to the above optimization to make the coverage safe: when one table referenced in the join index definition is the parent table of more than one FK table, the join index is generally disqualified from covering any query that references fewer tables than are referenced in the join index.

For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (arrows point from a foreign key table to its parent primary key table) by the following diagram:

        t1   t2   t3   t4   t5

For these RI relationships, table t4 is the parent table of both tables t3 and t5. The losslessness of the foreign key table depends on the fact that the parent table has all the primary keys available. Joining the parent table with another child table can render this premise false. Therefore, the final join result cannot be viewed as lossless for any arbitrary subset of tables.

The following two points are exceptions to this restriction:

  • When the foreign key tables are joined on the foreign key columns, there is no loss on any of the foreign key tables because they all reference the same primary key values in their common parent table.
  • All foreign key tables reference the same primary key column in the primary key table. By transitive closure, all these foreign key tables are related by equijoins on the foreign key columns.

By specifying extra joins in the join index definition, you can greatly enhance its flexibility.

For example, suppose you have a star schema based on a Sales fact table and the following dimension tables:

  • Customer
  • Product
  • Location
  • Time

You decide it is desirable to define a join index that joins the fact table Sales to its various dimension tables in order to avoid the relatively expensive join processing between the fact table and its dimension tables whenever ad hoc join queries are made against them.

If there are foreign key-primary key relationships between the join columns, which is often the case, the join index can also be used to optimize queries that only reference a subset of the dimension tables.

Without taking advantage of this optimization, you must either create a different join index for each category of query, incurring the greater cost of maintaining multiple join indexes, or you lose the benefit of join indexes for optimizing the join queries on these tables altogether. By exploiting the foreign key-primary key join properties, the same join index can be selected by the Optimizer to generate access plans for a wide variety of queries.

As is always true, even if this optimization can be used for a given situation, the plan produced using it is compared with other generated plans, and the least costly plan from the generated plan set is the one that the system uses.

Rules for Whether Join Indexes With Extra Tables Cover Queries

The following rules explain how to design a set of underlying base tables for a join index definition in such a way to ensure that the Optimizer selects the index for an access plan if it inner joins more tables than the query references.

IF there are more inner-joined tables in a join index definition than the number of tables referenced in a query and … THEN the Optimizer …
the extra joins are on not made on foreign key-primary key columns in the underlying base tables does not consider the join index for the query plan.

This is because the presence of extra joins in the definition can either eliminate existing rows from the query evaluation or produce duplicate rows during optimization.

the extra joins are made on foreign key-primary key columns in the underlying base tables considers the join index for use in for the query plan.
both of the following conditions are true:
  • The join column set of the inner table in the extra outer join is unique
  • Either the inner table or both the inner and outer tables involved in the extra outer join are extra tables

Examples That Obey the General Covering Rules for Extra Tables in the Join Index Definition

The following set of base tables, join indexes, queries, and EXPLAIN reports demonstrate how the referential integrity relationships among the underlying base tables in a join index definition influence whether the Optimizer selects the index for queries that reference fewer base tables than are referenced by the join index.

    CREATE SET TABLE t1, NO FALLBACK, NO BEFORE JOURNAL,
                         NO AFTER JOURNAL (
      x1 INTEGER NOT NULL,
      a1 INTEGER NOT NULL,
      b1 INTEGER NOT NULL,
      c1 INTEGER NOT NULL,
      d1 INTEGER NOT NULL,
      e1 INTEGER NOT NULL,
      f1 INTEGER NOT NULL,
      g1 INTEGER NOT NULL,
      h1 INTEGER NOT NULL,
      i1 INTEGER NOT NULL,
      j1 INTEGER NOT NULL,
      k1 INTEGER NOT NULL,
    CONSTRAINT ri1 FOREIGN KEY (a1, b1, c1) REFERENCES t2 (a2,b2,c2),
    CONSTRAINT ri2 FOREIGN KEY (d1) REFERENCES t3(d3),
    CONSTRAINT ri3 FOREIGN KEY (e1,f1) REFERENCES t4 (e4,f4),
    CONSTRAINT ri4 FOREIGN KEY (g1,h1,i1,j1) REFERENCES t5(g5,h5,i5,j5),
    CONSTRAINT ri5 FOREIGN KEY (k1) REFERENCES t6(k6));

     CREATE SET TABLE t2, NO FALLBACK, NO BEFORE JOURNAL,
                          NO AFTER JOURNAL (
       a2 INTEGER NOT NULL,
       b2 INTEGER NOT NULL,
       c2 INTEGER NOT NULL,
       x2 INTEGER)
     UNIQUE PRIMARY INDEX(a2, b2, c2);

     CREATE SET TABLE t3, NO FALLBACK, NO BEFORE JOURNAL,
                          NO AFTER JOURNAL (
       d3 INTEGER NOT NULL,
       x3 INTEGER)
     UNIQUE PRIMARY INDEX(d3);

     CREATE SET TABLE t4, NO FALLBACK, NO BEFORE JOURNAL,
                          NO AFTER JOURNAL (
       e4 INTEGER NOT NULL,
       f4 INTEGER NOT NULL,
       x4 INTEGER)
     UNIQUE PRIMARY INDEX(e4, f4);

     CREATE SET TABLE t5, NO FALLBACK, NO BEFORE JOURNAL,
                          NO AFTER JOURNAL (
       g5 INTEGER NOT NULL,
       h5 INTEGER NOT NULL,
       i5 INTEGER NOT NULL,
       j5 INTEGER NOT NULL,
       x5 INTEGER)
     UNIQUE PRIMARY INDEX(g5, h5, i5, j5);

     CREATE SET TABLE t6, NO FALLBACK, NO BEFORE JOURNAL,
                          NO AFTER JOURNAL (
       k6 INTEGER not null,
       x6 INTEGER)
     UNIQUE PRIMARY INDEX(k6);

Example: All Outer Joins in Join Index Definition

The following join index definition left outer joins t1 to t3 on da=d3 and then left outer joins that result to t6 on k1=k6.

     CREATE JOIN INDEX jiout AS
      SELECT d1, d3, k1, k6, x1
      FROM t1 LEFT OUTER JOIN t3 ON d1=d3
              LEFT OUTER JOIN t6 ON k1=k6;

You would expect the Optimizer to use jiout with the following query, because all the outer joins in the join index are inner joined to the query tables on unique columns (d1 and k1 are declared foreign keys in t1 and d3 and k6, the primary keys for t3 and t6, respectively, are declared as the unique primary index for those tables).

The bold EXPLAIN report text indicates that the Optimizer does use jiout in its query plan.

     EXPLAIN SELECT d1, SUM(x1)
             FROM t1, t3
             WHERE d1=d3
             GROUP BY 1;
     *** Help information returned. 17 rows.
     *** Total elapsed time was 1 second.
Explanation
------------------------------------------------------------------------
1) First, we lock a distinct HONG_JI."pseudo table" for read on a
    RowHash to prevent global deadlock for HONG_JI.jiout.
2) Next, we lock  HONG_JI.jiout  for read.
3) We do an all-AMPs SUM step to aggregate from  HONG_JI.jiout  by way
    of an all-rows scan with no residual conditions, and the grouping
    identifier in field 1. Aggregate Intermediate Results are
    computed locally, then placed in Spool 3. The size of Spool 3 is
    estimated with low confidence to be 1 row. The estimated time for
    this step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
    an all-rows scan into Spool 1 (group_amps), which is built locally
    on the AMPs. The size of Spool 1 is estimated with low confidence
    to be 1 row. The estimated time for this step is 0.04 seconds.
5)  Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.

Example: All Inner Joins Without Aggregation in Join Index Definition

The following simple join index definition specifies inner joins on tables t1, t3 and t6.

     CREATE JOIN INDEX jiin AS
      SELECT d1, d3, k1, k6, x1
      FROM t1, t3, t6
      WHERE d1=d3
      AND   k1=k6;

You would expect the Optimizer to use jiin with the following query, and the bold EXPLAIN report text indicates that it does.

     EXPLAIN SELECT d1, SUM(x1)
             FROM t1, t3
             WHERE d1=d3
             GROUP BY 1;

     *** Help information returned. 17 rows.
     *** Total elapsed time was 1 second.
   
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for read on a
        RowHash to prevent global deadlock for  HONG_JI.jiin.
     2) Next, we lock HONG_JI.jiin for read.
     3) We do an all-AMPs SUM step to aggregate from  HONG_JI.jiin  by way
        of an all-rows scan with no residual conditions, and the grouping
        identifier in field 1. Aggregate Intermediate Results are
        computed locally, then placed in Spool 3. The size of Spool 3 is
        estimated with low confidence to be 1 row. The estimated time for
        this step is 0.03 seconds.
     4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
        an all-rows scan into Spool 1 (group_amps), which is built locally
        on the AMPs. The size of Spool 1 is estimated with low confidence
        to be 1 row. The estimated time for this step is 0.04 seconds.
     5) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1.

Example: All Inner Joins With Aggregation in Join Index Definition

The following aggregate join index definition specifies inner joins on tables t1, t2, and t4.

     CREATE JOIN INDEX jiin_aggr AS
      SELECT a1, e1, SUM(x1) AS total
      FROM t1, t2, t4
      WHERE a1=a2
      AND   e1=e4
      AND   a1>1
      GROUP BY 1, 2;

You would expect the Optimizer to use join index jiin_aggr in its plan for the following query because it has the same join term as the query.

The bold EXPLAIN report text indicates that the Optimizer does use jiin_aggr in its plan:

     EXPLAIN SELECT a1, e1, SUM(x1) AS total
             FROM t1, t2, t4
             WHERE a1=a2
             AND e1=e4
             AND a1>2
             GROUP BY 1, 2;

     *** Help information returned. 13 rows.
     *** Total elapsed time was 1 second.
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for read on a
        RowHash to prevent global deadlock for  HONG_JI.jiin_aggr.
     2) Next, we lock  HONG_JI.jiin_aggr  for read.
     3) We do an all-AMPs RETRIEVE step from HONG_JI.jiin_aggr by way of
        an all-rows scan with a condition of ("(HONG_JI.jiin_aggr.a1  > 2)
        AND (HONG_JI.jiin_aggr.a1 >= 3)") into Spool 1 (group_amps), which
        is built locally on the AMPs. The size of Spool 1 is estimated
        with no confidence to be 1 row. The estimated time for this step
        is 0.03 seconds.
     4) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1. The total estimated time is 0.03 seconds.

Example: All Inner Joins With Aggregation in Join Index Definition

You would not expect the Optimizer to use join index jiin_aggr (see Example: All Inner Joins With Aggregation in Join Index Definition above) in its plan for the following query because the condition b1=b2 AND f1=f4 is not covered by the join index defined by jiin_aggr. As a result, the Optimizer specifies a full-table scan to retrieve the specified rows.

The EXPLAIN report text indicates that the Optimizer does not choose jiin_aggr to cover the query:

     EXPLAIN SELECT a1, e1, SUM(x1) AS total
             FROM t1, t2, t4
             WHERE b1=b2
             AND   f1=f4
             AND   a1>2
             GROUP BY 1, 2;

   Explanation
   ---------------------------------------------------------------------
    1) First, we lock a distinct HONG_JI."pseudo table" for read on a
       RowHash to prevent global deadlock for HONG_JI.t4.
    2) Next, we lock a distinct HONG_JI."pseudo table" for read on a
       RowHash to prevent global deadlock for HONG_JI.t2.
    3) We lock a distinct HONG_JI."pseudo table" for read on a RowHash to
       prevent global deadlock for HONG_JI.t1.
    4) We lock HONG_JI.t4 for read, we lock HONG_JI.t2 for read, and we
       lock HONG_JI.t1 for read.
    5) We do an all-AMPs RETRIEVE step from HONG_JI.t1 by way of an
       all-rows scan with a condition of ("HONG_JI.t1.a1 > 2") into Spool
       4 (all_amps), which is duplicated on all AMPs. The size of Spool
       4 is estimated with no confidence to be 2 rows. The estimated
       time for this step is 0.03 seconds.
    6) We do an all-AMPs JOIN step from HONG_JI.t2 by way of an all-rows
       scan with no residual conditions, which is joined to Spool 4 (Last
       Use). HONG_JI.t2 and Spool 4 are joined using a product join,
       with a join condition of ("b1 = HONG_JI.t2.b2"). The result goes
       into Spool 5 (all_amps), which is duplicated on all AMPs. The
       size of Spool 5 is estimated with no confidence to be 3 rows. The
       estimated time for this step is 0.04 seconds.
    7) We do an all-AMPs JOIN step from HONG_JI.t4 by way of an all-rows
       scan with no residual conditions, which is joined to Spool 5 (Last
       Use). HONG_JI.t4 and Spool 5 are joined using a product join,
       with a join condition of ("f1 = HONG_JI.t4.f4"). The result goes
       into Spool 3 (all_amps), which is built locally on the AMPs. The
       size of Spool 3 is estimated with no confidence to be 2 rows. The
       estimated time for this step is 0.04 seconds.
    8) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
       way of an all-rows scan, and the grouping identifier in field 1.
       Aggregate Intermediate Results are computed globally, then placed
       in Spool 6. The size of Spool 6 is estimated with no confidence
       to be 2 rows. The estimated time for this step is 0.05 seconds.
    9) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of
       an all-rows scan into Spool 1 (group_amps), which is built locally
       on the AMPs. The size of Spool 1 is estimated with no confidence
       to be 2 rows. The estimated time for this step is 0.04 seconds.
   10) Finally, we send out an END TRANSACTION step to all AMPs involved
       in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of
       statement 1.

Example: More Inner Joined Tables in Aggregate Join Index Definition Than in Query

The following aggregate join index definition specifies inner joins on tables t1, t3, and t6 using conditions that exploit a foreign key-primary key relationship between table t1 and tables t3 and t6, respectively.

     CREATE JOIN INDEX jiin_aggr AS
      SELECT d1, k1, SUM(x1) AS total
      FROM t1, t3, t6
      WHERE d1=d3
      AND   k1=k6
      GROUP BY 1, 2;

You would expect the Optimizer to include join index jiin_aggr in its access plan for the following query even though jiin_aggr is defined with an inner joined table, t6, that the query does not reference. This is acceptable to the Optimizer because of the foreign key-primary key relationship between t1 and the extra table, t6, on columns k1 and k6,which have a foreign key-primary key relationship and are explicitly defined as a foreign key and as the unique primary index for their respective tables.

The bold EXPLAIN report text indicates that the Optimizer does select jiin_aggr for the query plan:

     EXPLAIN SELECT d1, SUM(x1)
             FROM t1, t3
             WHERE d1=d3
             GROUP BY 1;

    *** Help information returned. 17 rows.
    *** Total elapsed time was 1 second.
   
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for read on a
        RowHash to prevent global deadlock for  HONG_JI.jiin_aggr.
     2) Next, we lock  HONG_JI.jiin_aggr  for read.
     3) We do an all-AMPs SUM step to aggregate from  HONG_JI.jiin_aggr  by
        way of an all-rows scan with no residual conditions, and the
        grouping identifier in field 1. Aggregate Intermediate Results
        are computed locally, then placed in Spool 3. The size of Spool 3
        is estimated with low confidence to be 1 row. The estimated time
        for this step is 0.03 seconds.
     4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
        an all-rows scan into Spool 1 (group_amps), which is built locally
        on the AMPs. The size of Spool 1 is estimated with low confidence
        to be 1 row. The estimated time for this step is 0.04 seconds.
     5) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1.

Example: Join Index Left Outer Joined on Six Tables

The following join index definition left outer joins table t1 with, in succession, tables t2, t3, t4, t5, and t6 on a series of equality conditions made on foreign key-primary key relationships among the underlying base tables.

     CREATE JOIN INDEX jiout AS
      SELECT a1, b1, c1, c2, d1, d3, e1, e4, f1, g1, h1, i1, j1, j5,
             k1, k6, x1
      FROM t1
      LEFT OUTER JOIN t2 ON a1=a2 AND b1=b2 AND c1=c2
      LEFT OUTER JOIN t3 ON d1=d3
      LEFT OUTER JOIN t4 ON e1=e4 AND f1=f4
      LEFT OUTER JOIN t5 ON g1=g5 AND h1=h5 AND i1=i5 AND j1=j5
      LEFT OUTER JOIN t6 ON k1=k6;

Even though the following query references fewer tables than are defined in the join index, you would expect the Optimizer to include join index ji_out in its access plan because all the extra outer joins are defined on unique columns and the extra tables are the inner tables in the outer joins.

The bold EXPLAIN report text indicates that the Optimizer does select ji_out for the query plan:

     EXPLAIN SELECT a1, b1, c1, SUM(x1)
             FROM t1, t2
             WHERE a1=a2
             AND   b1=b2
             AND   c1=c2
             GROUP BY 1, 2, 3;

    *** Help information returned. 18 rows.
    *** Total elapsed time was 1 second.
   
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for read on a
        RowHash to prevent global deadlock for  HONG_JI.jiout.
     2) Next, we lock  HONG_JI.jiout  for read.
     3) We do an all-AMPs SUM step to aggregate from  HONG_JI.jiout  by way
        of an all-rows scan with no residual conditions, and the grouping
        identifier in field 1. Aggregate Intermediate Results are
        computed locally, then placed in Spool 3. The size of Spool 3 is
        estimated with high confidence to be 2 rows. The estimated time
        for this step is 0.03 seconds.
     4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
        an all-rows scan into Spool 1 (group_amps), which is built locally
        on the AMPs. The size of Spool 1 is estimated with high
        confidence to be 2 rows. The estimated time for this step is 0.04
        seconds.
     5) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1.

Example: Many More Tables Referenced by Join Index Definition Than Referenced by Query

The following join index definition specifies all inner joins on tables t1, t2, t3, t4, t5 and t6 and specifies equality conditions on all the foreign key-primary key relationships among those tables.

     CREATE JOIN INDEX ji_in AS
      SELECT a1, b1, c1, c2, d1, d3, e1, e4, f1, g1, g5, h1, i1, j1,
             k1, k6, x1
      FROM t1, t2, t3, t4, t5, t6
      WHERE a1=a2
      AND   b1=b2
      AND   c1=c2
      AND   d1=d3
      AND   e1=e4
      AND   f1=f4
      AND   g1=g5
      AND   h1=h5
      AND   i1=i5
      AND   j1=j5
      AND   k1=k6;

Even though 6 tables are referenced in the join index definition, and all its join conditions are inner joins, you would expect the Optimizer to include join index ji_in in its query plan for the following query, which only references 2 of the 6 tables, because all the conditions in the join index definition are based on foreign key-primary key relationships among the underlying base tables.

The bold EXPLAIN report text indicates that the Optimizer does select ji_in for the query plan:

     EXPLAIN SELECT a1, b1, c1, SUM(x1)
             FROM t1, t2
             WHERE a1=a2
             AND   b1=b2
             AND   c1=c2
             GROUP BY 1, 2, 3;
    *** Help information returned. 18 rows.
    *** Total elapsed time was 1 second.
   
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for read on a
        RowHash to prevent global deadlock for  HONG_JI.ji_in.
     2) Next, we lock  HONG_JI.ji_in  for read.
     3) We do an all-AMPs SUM step to aggregate from  HONG_JI.ji_in  by way
        of an all-rows scan with no residual conditions, and the grouping
        identifier in field 1. Aggregate Intermediate Results are
        computed locally, then placed in Spool 3. The size of Spool 3 is
        estimated with high confidence to be 2 rows. The estimated time
        for this step is 0.03 seconds.
     4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
        an all-rows scan into Spool 1 (group_amps), which is built locally
        on the AMPs. The size of Spool 1 is estimated with high
        confidence to be 2 rows. The estimated time for this step is 0.04
        seconds.
     5) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1.

Example: Using a Join Index That Has an Extra Inner Join In Its Definition

The following example illustrates how the Optimizer uses a join index with an extra inner join when the connections among the tables in its definition are appropriately defined.

Suppose you have the following table definitions:

     CREATE SET TABLE fact (
       f_d1 INTEGER NOT NULL,
       f_d2 INTEGER NOT NULL,
     FOREIGN KEY (f_d1) REFERENCES WITH NO CHECK OPTION dim1 (d1),
     FOREIGN KEY (f_d2) REFERENCES WITH NO CHECK OPTION dim2 (d2))
     UNIQUE PRIMARY INDEX (f_d1,f_d2);

     CREATE SET TABLE dim1 (
       a1 INTEGER NOT NULL,
       d1 INTEGER NOT NULL,
     FOREIGN KEY (a1) REFERENCES WITH NO CHECK OPTION dim1_1 (d11))
     UNIQUE PRIMARY INDEX (d1);

     CREATE SET TABLE dim2 (
       d1 INTEGER NOT NULL,
       d2 INTEGER NOT NULL)
     UNIQUE PRIMARY INDEX (d2);

     CREATE SET TABLE dim1_1 (
       d11 INTEGER NOT NULL,
       d22 INTEGER NOT NULL)
     UNIQUE PRIMARY INDEX (d11);

The following graphic sketches the dimensional relationships among these tables:



In the following join index definition, the fact table is joined with dimension tables dim1 and dim2 by foreign key-primary key joins on fact.f_d1=dim1.d1 and fact.f_d2=dim2.d2. Dimension table dim1 is also joined with its dimension subtable dim1_1 by a foreign key-primary key join on dim1.a1=dim1_1.d11. A query on the fact and dim2 tables uses the join index ji_all because of its use of foreign key-primary key relationships among the tables:

     CREATE JOIN INDEX ji_all AS
      SELECT (COUNT(*)(FLOAT)) AS countstar, dim1.d1, dim1_1.d11,
              dim2.d2, (SUM(fact.f_d1)(FLOAT)) AS sum_f1
      FROM fact, dim1, dim2, dim1_1
      WHERE ((fact.f_d1 = dim1.d1 )
      AND   (fact.f_d2 = dim2.d2 ))
      AND   (dim1.a1 = dim1_1.d11 )
      GROUP BY dim1.d1, dim1_1.d11, dim2.d2
      PRIMARY INDEX (d1);

Note that the results of the aggregate operations COUNT and SUM are both typed as FLOAT (see Restrictions on Join Index Aggregate Functions below).

As the bold text in the following EXPLAIN report indicates, the Optimizer uses the join index ji_all for its query plan in this situation because even though table fact is the parent table of both tables dim1 and dim2, those tables are joined with fact on foreign key columns in the join index definition. Similarly, dim1 is joined to dim1_1 in the join index definition on a foreign key column.

     EXPLAIN SELECT d2, SUM(f_d1)
             FROM fact, dim2
             WHERE f_d2=d2
             GROUP BY 1;
    *** Help information returned. 18 rows.
    *** Total elapsed time was 1 second.
   
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for
        read on a RowHash to prevent global deadlock for
        HONG_JI.ji_all.
     2) Next, we lock  HONG_JI.ji_all  for read.
     3) We do an all-AMPs SUM step to aggregate from
        HONG_JI.ji_all  by way of an all-rows scan with no
        residual conditions, and the grouping identifier in field 1.
        Aggregate Intermediate Results are computed globally, then placed
        in Spool 3. The size of Spool 3 is estimated with no confidence
        to be 3 rows. The estimated time for this step is 0.08 seconds.
     4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
        an all-rows scan into Spool 1 (group_amps), which is built locally
        on the AMPs. The size of Spool 1 is estimated with no confidence
        to be 3 rows. The estimated time for this step is 0.07 seconds.
     5) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1.

Examples of Exceptions to the General Rules for Extra Tables in the Join Index Definition

The following examples illustrate cases that are exceptions to the general coverage rules for extra tables in a join index definition (see also Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query).

Example: Table T9 is the Parent Table of Tables T7 and T8.

In the following example, table t9 is the parent table of tables t7 and t8. Generally, this relationship disqualifies a join index from covering any query with fewer tables than are referenced in the definition for that index. However, because t7 and t8 are joined on the FK columns (y7=x8) in the join index definition, the Optimizer uses the index ji to cover the query, as you can see by looking at the bold text in the EXPLAIN report:

     CREATE SET TABLE t7(
       x7 INTEGER NOT NULL,
       y7 INTEGER NOT NULL,
       z7 INTEGER NOT NULL,
     CONSTRAINT r7 FOREIGN KEY (y7)
       REFERENCES WITH NO CHECK OPTION t9 (y9))
     PRIMARY INDEX (x7);

     CREATE SET TABLE t8(
       x8 INTEGER NOT NULL,
       y8 INTEGER NOT NULL,
       z8 INTEGER NOT NULL,
     CONSTRAINT r8 FOREIGN KEY (x8)
       REFERENCES WITH NO CHECK OPTION t9 (x9));

     CREATE SET TABLE t9(
       x9 INTEGER NOT NULL UNIQUE,
       y9 INTEGER NOT NULL,
       z9 INTEGER NOT NULL)
     UNIQUE PRIMARY INDEX(y9);

     CREATE JOIN INDEX ji AS
       SELECT x7, y7, x8, y8, x9, y9
       FROM t7, t8, t9
       WHERE y7=x8
       AND   y7=y9
       AND   x8=x9;

     EXPLAIN SELECT x7, y7, x8, y8
             FROM t7, t8
             WHERE y7=x8
             AND   x7>1;

    *** Help information returned. 14 rows.
    *** Total elapsed time was 1 second.
   
Explanation
---------------------------------------------------------------------------
  1) First, we lock a distinct HONG_JI."pseudo table" for
     read on a RowHash to prevent global deadlock for  HONG_JI.ji.
  2) Next, we lock  HONG_JI.ji  for read.
  3) We do an all-AMPs RETRIEVE step from  HONG_JI.ji  by way
     of an all-rows scan with a condition of (
     "HONG_JI.ji.x1 > 1") into Spool 1 (group_amps), which
     is built locally on the AMPs. The size of Spool 1 is estimated
     with no confidence to be 3 rows. The estimated time for this step
     is 0.06 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.06 seconds.

Example: SELECT Join Index Covers a Query

As with Example: Table T9 is the Parent Table of Tables T7 and T8, this example demonstrates how a join index can be used to cover a query when one table in the join index definition is a parent of two others if the tables are joined on foreign key-primary key relationships. t9 is the parent table of both t7 and t10. But because t7 and t10 are joined with t9 on the same PK column, by transitive closure, t7 and t10 are joined on y7=x10. The Optimizer does select join index ji to cover the query, as the bold text in the EXPLAIN report for the example query demonstrates:

     CREATE SET TABLE t10(
       x10 INTEGER NOT NULL,
       y10 INTEGER NOT NULL,
       z10 INTEGER NOT NULL,
     CONSTRAINT r10 FOREIGN KEY ( x10 )
       REFERENCES WITH NO CHECK OPTION t9 ( y9 ))
     PRIMARY INDEX x10;
     CREATE JOIN INDEX ji AS
       SELECT x7, y7, x10, y10, x9, y9
       FROM t7, t10, t9
       WHERE y7=y9
       AND   x10=y9;
     EXPLAIN SELECT x7, y7, x10, y10
             FROM t7, t10
             WHERE y7=x10
             AND   x7>1;
    *** Help information returned. 14 rows.
    *** Total elapsed time was 1 second.
   
   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock a distinct HONG_JI."pseudo table" for
        read on a RowHash to prevent global deadlock for  HONG_JI.ji.
     2) Next, we lock  HONG_JI.ji  for read.
     3) We do an all-AMPs RETRIEVE step from  HONG_JI.ji  by way
        of an all-rows scan with a condition of ("HONG_JI.ji.x1 > 1")
        into Spool 1 (group_amps), which is built locally on the AMPs.
        The size of Spool 1 is estimated with no confidence to be 3 rows.
        The estimated time for this step is 0.06 seconds.
     4) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1. The total estimated time is 0.06 seconds.

Restrictions on Join Index Aggregate Functions

The following restrictions apply to aggregate functions used to define aggregate join indexes:

  • Only the COUNT and SUM functions, in any combination, are valid.
  • COUNT DISTINCT and SUM DISTINCT are not valid.
  • To avoid overflow problems, always type the COUNT and SUM columns in a join index definition as FLOAT or DECIMAL(38,0) when the data type of the base table column is INTEGER or FLOAT, and as DECIMAL(38,xx) when the data type of the base table column is DECIMAL(NN,xx).
IF you … THEN the system …
do not define an explicit data type for a COUNT or SUM column that has a data type of INTEGER or FLOAT assigns the FLOAT data type to it.
do not define an explicit data type for a COUNT or SUM column that has a data type of DECIMAL assigns the DECIMAL(38,xx) data type to it
define a COUNT or SUM column that has a data type of INTEGER as anything other than FLOAT and DECIMAL(38,0) returns an error and does not create the aggregate join index.
define a COUNT or SUM column that has a data type of DECIMAL as anything other than DECIMAL(38,xx) returns an error and does not create the aggregate join index.
define a COUNT or SUM column that has a data type of FLOAT as anything other than FLOAT returns an error and does not create the aggregate join index.

Restrictions on Sparse Join Index WHERE Clause Predicates

The following restrictions apply to WHERE clause join conditions specified in a join index definition:

  • Data types for any columns used in a join condition must be drawn from the same domain because neither explicit nor implicit data type conversions are permitted.
  • Multiple join conditions must be connected using the AND logical operator.

    The OR operator is not a valid way to connect multiple join conditions in a join index definition.

  • You cannot specify independent inequality WHERE clause join conditions in a join index definition.

    The following rules apply to the use of inequality join conditions in a join index definition WHERE clause:

    • Inequality join conditions are supported only if they are ANDed to at least one equality join condition.
    • Inequality join conditions can be specified only for columns having the same data type in order to enforce domain integrity.
    • The only valid comparison operators for an inequality join condition are the following:

      <

      <=

      >

      >=

      The following join index definition is valid because the WHERE clause inequality join condition on o_totalprice and c_acctbal it specifies is ANDed with the previous equality join condition on o_custkey and c_custkey:

           CREATE JOIN INDEX ord_cust_idx AS
            SELECT c_name, o_orderkey, o_orderdate
            FROM orders, customer
            WHERE o_custkey = c_custkey
            AND   o_totalprice > c_acctbal;

      The following join index definition is not valid because the WHERE clause inequality join condition has no logical AND relationship with an equality join condition:

           CREATE JOIN INDEX ord_cust_idx AS
            SELECT c_name, o_orderkey, o_orderdate
            FROM orders, customer
            WHERE o_totalprice > c_acctbal;

Restrictions on Join Index ORDER BY Clauses

The following restrictions apply to ORDER BY clauses specified in a join index definition:

  • Sort order is restricted to ASC.

    DESC is not valid.

  • Aggregate columns and expressions are not permitted as a column_name or column_position specification.
  • Supported data types for column_name are restricted to the following:
    • DATE
    • BYTEINT
    • DECIMAL

      The DECIMAL type is valid only for columns of four or fewer bytes.

    • INTEGER
    • SMALLINT

Restrictions on Load Utilities

You cannot use FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE to load data into base tables that have join indexes because those indexes are not maintained during the execution of these utilities. If you attempt to load data into base tables with join indexes using these utilities, the load operation aborts and the system returns an error message to the requestor.

To load data into a join-indexed base table, you must drop all defined join indexes on that base table before you can run FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE.

Be aware that you cannot drop a join index to enable MultiLoad or FastLoad batch loads until any requests that access that index complete processing. Requests place locks on any join indexes they access, and the system defers processing of any DROP JOIN INDEX requests against locked indexes until their locks have all been released.

Load utilities like Teradata Parallel Data Pump, BTEQ, and the Teradata Parallel Transporter operators INSERT and STREAM, which perform standard SQL row inserts and updates, are supported for join-indexed tables, as are multirow INSERT … SELECT and MERGE requests.