This topic lists several restrictions on join index definitions. For complete details about join index syntax, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
Restrictions on Number of Join Indexes Defined Per Base Table
The maximum number of secondary, hash, and join indexes that can be defined for a table, in any combination, is 32. This includes the system-defined secondary indexes used to implement PRIMARY KEY and UNIQUE constraints. Each composite NUSI that specifies an ORDER BY clause counts as 2 consecutive indexes in this calculation (see Importance of Consecutive Indexes for Value-Ordered NUSIs). You cannot define join, or any other, indexes on global temporary trace tables (see Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184).
- table_1 has 32 secondary indexes and no hash or join indexes.
- table_2 has 16 secondary indexes, no hash indexes, and 16 join indexes.
- table_3 has 10 secondary indexes, 10 hash indexes, and 12 join indexes.
- table_4 has no secondary or hash indexes, but has 32 join indexes.
Each of these combinations is valid, but they all operate at the boundaries of the defined limits.
Note that if any of the secondary indexes defined on tables 1, 2, or 3 is a composite NUSI defined with an ORDER BY clause, the defined limits are exceeded, and the last index you attempt to create on the table will fail. Because each composite NUSI defined with an ORDER BY clause counts as 2 consecutive indexes in the count against the maximum of 32 per table, you could define only 8 of them on table_2, for example, if you also defined 16 join indexes on the table.
Restrictions on the Number of Columns Per Referenced Base Table
- The maximum number of columns that can be specified in a join index per referenced base table is 64.
- The maximum number of columns that can be specified in a multitable join index definition is 2,048.
- When you specify join index row compression, then each of the column_1 and column_2 sets is limited to 64 common base table references.
Restrictions on the Use of the System-Derived PARTITION[#L n] Column
You cannot use a system-derived PARTITION[#L n] column in a join index definition.
Restrictions on Outer Join Definitions
- FULL OUTER JOIN is not valid.
- When you specify a LEFT or RIGHT outer join, the following rules apply:
- The outer table joining column for each condition must be contained in either column_1_name or column_2_name.
- The inner table of each join condition must have at least one non-nullable column in either column_1_name or column_2_name.
Restrictions on Secondary Index Definitions
You cannot define unique secondary indexes on a join index.
You can define the primary index of a join index to be row partitioned if and only if the join index is not also defined with row compression. For more information about specifying partitioned primary indexes for join indexes, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
Restrictions on Built-In Functions and Join Index Definitions
When you create a join index that specifies a built-in, or system, function in its WHERE clause (see Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145 for more information about built-in functions), the system resolves the function at the time the join index is created and then stores the result as part of the index definition rather than evaluating it dynamically at the time the Optimizer would use the index to build a query plan.
As a result, the Optimizer does not use a join index in the access plan for a query that qualifies its WHERE clause with the same built-in function used to define that join index because it cannot determine whether the index covers the query or not. The Optimizer does use the join index if the query specifies an explicit value in its WHERE clause that matches the resolved value stored in the index definition.
(The exception to this is the CURRENT_TIME and CURRENT_TIMESTAMP functions, which are resolved for a query. The resolved value is used to check if the query can be covered by a join index.)
For example, suppose you decide to define a join index using the CURRENT_DATE built-in function on January 4, 2010 as follows:
CREATE JOIN INDEX curr_date AS SELECT * FROM orders WHERE order_date = CURRENT_DATE;
On January 7, 2010, you perform the following SELECT statement:
SELECT * FROM orders WHERE order_date = CURRENT_DATE;
When you EXPLAIN this query, you find that the Optimizer does not use join index curr_date because the date stored in the index definition is the explicit value ‘2010-01-04’, not the current system date ‘2010-01-07’.
Note that if you had defined curr_date with a predicate of order_date >= CURRENT_DATE instead of order_date = CURRENT_DATE, then the Optimizer could use curr_date to cover the query if it were the least costly way to process the request.
On the other hand, if you were to perform the following SELECT statement on January 7, 2010, or any other date, retaining the original curr_date predicate, the Optimizer does use join index curr_date for the query plan because the statement explicitly specifies the same date that was stored with the join index definition when it was created:
SELECT * FROM orders WHERE order_date = DATE ‘2010-01-04’;
Restriction on Number of Join Indexes Selected Per Query
The Optimizer can use several join indexes for a single query, selecting one multitable join index as well as additional single-table join indexes for its join plan. The join indexes selected depend on the structure of the query, and the Optimizer might not choose all applicable join indexes for the plan. Always examine your EXPLAIN reports to determine which join indexes are used for the join plans generated for your queries. If a join index you think should have been used by a query was not included in the join plan, try restructuring the query and then EXPLAIN it once again.
The limit on the number of join indexes considered per query is enforced to limit the number of possible combinations and permutations of table joins in the Optimizer search space during its join planning phase. The rule helps to ensure that the optimization is worth the effort. In other words, that the time spent generating the query plan does not exceed the accrued performance enhancement.
Restrictions on Partial Covering by Join Indexes
- One of the columns in the index definition is the keyword ROWID.
You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement (see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144).
- The column set defining the UPI of the underlying base table is also carried in the definition.
- The column set defining the NUPI of the underlying base table plus either of the following is also carried in the definition:
- One of the columns in the definition of that index is the keyword ROWID.
You can only specify ROWID in the outermost SELECT of a CREATE JOIN INDEX request.
- The column set defining a USI on the underlying base table.
The ROWID option is the preferable choice.
- One of the columns in the definition of that index is the keyword ROWID.
- Partial covering is not supported for the inner table of an outer join.
- Partial covering is not supported for queries that contain a TOP n or TOP m PERCENT clause.
If statistics indicate that it would be cost-effective, the Optimizer can specify that the partially covering single-table join index be joined to one of its underlying base tables using either the ROWID or the UPI or USI to join to the column data not defined for the index itself.
Even though you do not explicitly specify this join when you write your query, it counts against the 128 table restriction on joins.
For example, suppose you define the tables t1, t2, and t3 and the join indexes j1 and j2 as follows:
CREATE TABLE t1 ( a1 INTEGER, b1 INTEGER, c1 INTEGER); CREATE TABLE t2 ( a2 INTEGER, b2 INTEGER, c2 INTEGER); CREATE TABLE t3 ( a3 INTEGER, b3 INTEGER, c3 INTEGER); CREATE JOIN INDEX j1 AS SELECT b1, b2, t1.ROWID AS t1rowid FROM t1,t2 WHERE a1=a2; CREATE JOIN INDEX j2 AS SELECT b1, b2, t1.ROWID t1rowid, t2.rowid t2rowid FROM t1,t2 WHERE a1=b2;
Join index j1 partially covers the following queries:
SELECT a1, b1, c1, b2 FROM t1,t2 WHERE t1.a1=t2.b2 AND t1.b1=10; SELECT a1, b1, c1, b2, b3, c3 FROM t1,t2,t3 WHERE t1.a1=t2.b2 AND t1.b1=t3.a3 AND t3.b3 > 0;
The same join index does not partially cover the following queries:
SELECT * FROM t1,t2 WHERE t1.a1=t2.b2 AND t1.b1=10; SELECT * FROM t1,t2,t3 WHERE t1.a1=t2.b2 AND t1.b1=t3.a3 AND t3.b3 > 0;
Join index j2, on the other hand, partially covers all of these queries.
Even though you do not explicitly specify the join back to the base table when you write your query, it counts against the 64 tables per query block restriction on joins.
Be aware that a join index defined with an expression in its select list provides less coverage than a join index that is defined using base columns.
For example, the Optimizer can use join index ji_f1 to rewrite a query that specifies any character function on f1.
CREATE JOIN INDEX ji_f1 AS SELECT b1, f1 FROM t1 WHERE a1 > 0;
At the same time, because it is defined with a SUBSTR expression in its select list, the Optimizer can only use join index ji_substr_f1 to rewrite a query that specifies the same SUBSTR function on f1.
CREATE JOIN INDEX ji_substr_f1 AS SELECT b1, SUBSTR(f1,1,10) AS s FROM t1 WHERE a1 > 0;
Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query
Whether the Optimizer decides to include a join index in its query plan is a more complicated choice than simply determining if the index contains all the table columns specified in the query. The columns on which the base tables in the index definition are joined and their respective referential constraints also play an important role. See Rules for Whether Join Indexes With Extra Tables Cover Queries.
In many cases, the Optimizer does not consider using a join index in its access plan if that index is defined on more tables than the query references. This is because the so-called extra inner joins involving the tables not referenced by the query can cause both spurious row loss and spurious duplicate row creation during the optimization process, and either outcome produces incorrect results.
This outcome can be avoided, and the join index used in the query access plan, if the extra inner joins are defined on primary key-foreign key relationships in the underlying base tables that ensure proper row preservation. Such a join index is referred to as a broad join index. The referential integrity relationship between the base table primary and foreign keys can be specified using any of the three available methods for establishing referential constraints between tables. For more information, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
A broad join index is a covering join index whose definition includes one or more tables that is not specified in the query it covers. A wide range of queries can make use of a broad join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables.
In the case of join index outer joins, outer table rows are always preserved automatically, so there is no requirement for a referential integrity constraint to exist in order to preserve them.
In the case of foreign key-primary key inner joins, the same preservation of rows follows from a declarative referential integrity constraint. In this case, the Optimizer does consider a join index with extra inner joins in its definition to cover a query. The following paragraphs explain why a referential constraint preserves logical integrity:
Assume that the base tables in a join index definition can be divided into two distinct sets, s1 and s2.
s1 contains the base tables referenced in the query, while s2 contains the extra base tables the query does not reference. The base tables in s1 are joined to the base tables in s2 on foreign key-primary key columns, with the tables in s2 being the primary keys in the relationships. The foreign key values cannot be null because if the foreign key column set contains nulls, the losslessness of the foreign key table cannot be guaranteed.
- The extra joins do not eliminate valid rows from the join result among the base tables in s1 because FOREIGN KEY and NOT NULL constraints ensure that every row in the foreign key table finds its match in the primary key table.
- The extra joins do not introduce duplicate rows in the join result among the base tables in s1 because the primary key is, by definition, unique and not nullable.
These assertions are also true for extra joins made between base tables that are both in s2.
Therefore, the extra joins in a join index definition, if made on base tables that are defined in a way that observes these assertions, preserve all the rows resulting from the joins among the base tables in s1 and do not add spurious rows. This result permits the Optimizer to use the join index to cover a query that references fewer tables than the index definition inner joins together.
The Optimizer can select a join index for a query plan if the index contains either the same set, or a subset, of the tables referenced by the query. If more tables are referenced in the join index definition than are referenced by the query, the Optimizer generally does not consider that index as a candidate for coverage because the extra joins can either eliminate rows or produce duplicate rows or both. Because a referential integrity relationship guarantees the losslessness of the foreign key table in the join with its primary key table, extra tables in a join index definition do not disqualify it from query plan consideration if the extra joins allow the join index to preserve all the rows for the join result of the subset of tables in the query.
For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (arrows point from a foreign key table to its parent primary key table) by the following diagram:
t1 → t2 → t3 → t4 → t5
- t1
- t1, t2
- t1, t2, t3
- t1, t2, t3, t4
As a result of this property, the following conditions that reference extra joins can be exploited by the Optimizer when the number of tables referenced by the join index definition exceeds the number of tables referenced by the query. In each case, x1 is a unique RI-related column in table t1 and x2 is a unique RI-related column in table t2:
Join Index Extra Join Condition | Qualifications |
---|---|
x1 = x2 |
|
x1 = x2 |
|
x1 = x2 |
|
x1 = x2 |
|
One restriction with two critical exceptions must be added to the above optimization to make the coverage safe: when one table referenced in the join index definition is the parent table of more than one FK table, the join index is generally disqualified from covering any query that references fewer tables than are referenced in the join index.
For example, suppose you define a join index on a set of 5 tables, t1 - t5, respectively, with foreign key-primary key joins in the directions indicated (arrows point from a foreign key table to its parent primary key table) by the following diagram:
t1 → t2 → t3 → t4 ← t5
For these RI relationships, table t4 is the parent table of both tables t3 and t5. The losslessness of the foreign key table depends on the fact that the parent table has all the primary keys available. Joining the parent table with another child table can render this premise false. Therefore, the final join result cannot be viewed as lossless for any arbitrary subset of tables.
- When the foreign key tables are joined on the foreign key columns, there is no loss on any of the foreign key tables because they all reference the same primary key values in their common parent table.
- All foreign key tables reference the same primary key column in the primary key table. By transitive closure, all these foreign key tables are related by equijoins on the foreign key columns.
By specifying extra joins in the join index definition, you can greatly enhance its flexibility.
- Customer
- Product
- Location
- Time
You decide it is desirable to define a join index that joins the fact table Sales to its various dimension tables in order to avoid the relatively expensive join processing between the fact table and its dimension tables whenever ad hoc join queries are made against them.
If there are foreign key-primary key relationships between the join columns, which is often the case, the join index can also be used to optimize queries that only reference a subset of the dimension tables.
Without taking advantage of this optimization, you must either create a different join index for each category of query, incurring the greater cost of maintaining multiple join indexes, or you lose the benefit of join indexes for optimizing the join queries on these tables altogether. By exploiting the foreign key-primary key join properties, the same join index can be selected by the Optimizer to generate access plans for a wide variety of queries.
As is always true, even if this optimization can be used for a given situation, the plan produced using it is compared with other generated plans, and the least costly plan from the generated plan set is the one that the system uses.
Rules for Whether Join Indexes With Extra Tables Cover Queries
The following rules explain how to design a set of underlying base tables for a join index definition in such a way to ensure that the Optimizer selects the index for an access plan if it inner joins more tables than the query references.
IF there are more inner-joined tables in a join index definition than the number of tables referenced in a query and … | THEN the Optimizer … |
---|---|
the extra joins are on not made on foreign key-primary key columns in the underlying base tables | does not consider the join index for the query plan. This is because the presence of extra joins in the definition can either eliminate existing rows from the query evaluation or produce duplicate rows during optimization. |
the extra joins are made on foreign key-primary key columns in the underlying base tables | considers the join index for use in for the query plan. |
both of the following conditions
are true:
|
Examples That Obey the General Covering Rules for Extra Tables in the Join Index Definition
The following set of base tables, join indexes, queries, and EXPLAIN reports demonstrate how the referential integrity relationships among the underlying base tables in a join index definition influence whether the Optimizer selects the index for queries that reference fewer base tables than are referenced by the join index.
CREATE SET TABLE t1, NO BEFORE JOURNAL, NO AFTER JOURNAL ( x1 INTEGER NOT NULL, a1 INTEGER NOT NULL, b1 INTEGER NOT NULL, c1 INTEGER NOT NULL, d1 INTEGER NOT NULL, e1 INTEGER NOT NULL, f1 INTEGER NOT NULL, g1 INTEGER NOT NULL, h1 INTEGER NOT NULL, i1 INTEGER NOT NULL, j1 INTEGER NOT NULL, k1 INTEGER NOT NULL, CONSTRAINT ri1 FOREIGN KEY (a1, b1, c1) REFERENCES t2 (a2,b2,c2), CONSTRAINT ri2 FOREIGN KEY (d1) REFERENCES t3(d3), CONSTRAINT ri3 FOREIGN KEY (e1,f1) REFERENCES t4 (e4,f4), CONSTRAINT ri4 FOREIGN KEY (g1,h1,i1,j1) REFERENCES t5(g5,h5,i5,j5), CONSTRAINT ri5 FOREIGN KEY (k1) REFERENCES t6(k6)); CREATE SET TABLE t2, NO BEFORE JOURNAL, NO AFTER JOURNAL ( a2 INTEGER NOT NULL, b2 INTEGER NOT NULL, c2 INTEGER NOT NULL, x2 INTEGER) UNIQUE PRIMARY INDEX(a2, b2, c2); CREATE SET TABLE t3, NO BEFORE JOURNAL, NO AFTER JOURNAL ( d3 INTEGER NOT NULL, x3 INTEGER) UNIQUE PRIMARY INDEX(d3); CREATE SET TABLE t4, NO BEFORE JOURNAL, NO AFTER JOURNAL ( e4 INTEGER NOT NULL, f4 INTEGER NOT NULL, x4 INTEGER) UNIQUE PRIMARY INDEX(e4, f4); CREATE SET TABLE t5, NO BEFORE JOURNAL, NO AFTER JOURNAL ( g5 INTEGER NOT NULL, h5 INTEGER NOT NULL, i5 INTEGER NOT NULL, j5 INTEGER NOT NULL, x5 INTEGER) UNIQUE PRIMARY INDEX(g5, h5, i5, j5); CREATE SET TABLE t6, NO BEFORE JOURNAL, NO AFTER JOURNAL ( k6 INTEGER not null, x6 INTEGER) UNIQUE PRIMARY INDEX(k6);
Example: All Outer Joins in Join Index Definition
The following join index definition left outer joins t1 to t3 on da=d3 and then left outer joins that result to t6 on k1=k6.
CREATE JOIN INDEX jiout AS SELECT d1, d3, k1, k6, x1 FROM t1 LEFT OUTER JOIN t3 ON d1=d3 LEFT OUTER JOIN t6 ON k1=k6;
You would expect the Optimizer to use jiout with the following query, because all the outer joins in the join index are inner joined to the query tables on unique columns (d1 and k1 are declared foreign keys in t1 and d3 and k6, the primary keys for t3 and t6, respectively, are declared as the unique primary index for those tables).
The bold EXPLAIN report text indicates that the Optimizer does use jiout in its query plan.
EXPLAIN SELECT d1, SUM(x1) FROM t1, t3 WHERE d1=d3 GROUP BY 1; *** Help information returned. 17 rows. *** Total elapsed time was 1 second. Explanation ------------------------------------------------------------------------ 1) First, we lock HONG_JI.jiout for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs SUM step to aggregate from HONG_JI.jiout by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with low confidence to be 1 row. The estimated time for this step is 0.03 seconds. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 1 row. The estimated time for this step is 0.04 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Example: All Inner Joins Without Aggregation in Join Index Definition
The following simple join index definition specifies inner joins on tables t1, t3 and t6.
CREATE JOIN INDEX jiin AS SELECT d1, d3, k1, k6, x1 FROM t1, t3, t6 WHERE d1=d3 AND k1=k6;
You would expect the Optimizer to use jiin with the following query, and the bold EXPLAIN report text indicates that it does.
EXPLAIN SELECT d1, SUM(x1) FROM t1, t3 WHERE d1=d3 GROUP BY 1; *** Help information returned. 17 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.jiin for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs SUM step to aggregate from HONG_JI.jiin by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with low confidence to be 1 row. The estimated time for this step is 0.03 seconds. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 1 row. The estimated time for this step is 0.04 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Example: All Inner Joins With Aggregation in Join Index Definition
The following aggregate join index definition specifies inner joins on tables t1, t2, and t4.
CREATE JOIN INDEX jiin_aggr AS SELECT a1, e1, SUM(x1) AS total FROM t1, t2, t4 WHERE a1=a2 AND e1=e4 AND a1>1 GROUP BY 1, 2;
You would expect the Optimizer to use join index jiin_aggr in its plan for the following query because it has the same join term as the query.
The bold EXPLAIN report text indicates that the Optimizer does use jiin_aggr in its plan:
EXPLAIN SELECT a1, e1, SUM(x1) AS total FROM t1, t2, t4 WHERE a1=a2 AND e1=e4 AND a1>2 GROUP BY 1, 2; *** Help information returned. 13 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.jiin_aggr for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs RETRIEVE step from HONG_JI.jiin_aggr by way of an all-rows scan with a condition of ("(HONG_JI.jiin_aggr.a1 > 2) AND (HONG_JI.jiin_aggr.a1 >= 3)") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row. The estimated time for this step is 0.03 seconds. 3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
Example: All Inner Joins With Aggregation in Join Index Definition
You would not expect the Optimizer to use join index jiin_aggr (see Example: All Inner Joins With Aggregation in Join Index Definition above) in its plan for the following query because the condition b1=b2 AND f1=f4 is not covered by the join index defined by jiin_aggr. As a result, the Optimizer specifies a full-table scan to retrieve the specified rows.
The EXPLAIN report text indicates that the Optimizer does not choose jiin_aggr to cover the query:
EXPLAIN SELECT a1, e1, SUM(x1) AS total FROM t1, t2, t4 WHERE b1=b2 AND f1=f4 AND a1>2 GROUP BY 1, 2; Explanation --------------------------------------------------------------------- 1) First, we lock HONG_JI.t4 for read on a reserved RowHash to prevent a global deadlock. 2) Next, we lock HONG_JI.t2 for read on a reserved RowHash to prevent a global deadlock. 3) We lock HONG_JI.t1 for read on a reserved RowHash to prevent a global deadlock. 4) We do an all-AMPs RETRIEVE step from HONG_JI.t1 by way of an all-rows scan with a condition of ("HONG_JI.t1.a1 > 2") into Spool 4 (all_amps), which is duplicated on all AMPs. The size of Spool 4 is estimated with no confidence to be 2 rows. The estimated time for this step is 0.03 seconds. 5) We do an all-AMPs JOIN step from HONG_JI.t2 by way of an all-rows scan with no residual conditions, which is joined to Spool 4 (Last Use). HONG_JI.t2 and Spool 4 are joined using a product join, with a join condition of ("b1 = HONG_JI.t2.b2"). The result goes into Spool 5 (all_amps), which is duplicated on all AMPs. The size of Spool 5 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.04 seconds. 6) We do an all-AMPs JOIN step from HONG_JI.t4 by way of an all-rows scan with no residual conditions, which is joined to Spool 5 (Last Use). HONG_JI.t4 and Spool 5 are joined using a product join, with a join condition of ("f1 = HONG_JI.t4.f4"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with no confidence to be 2 rows. The estimated time for this step is 0.04 seconds. 7) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 6. The size of Spool 6 is estimated with no confidence to be 2 rows. The estimated time for this step is 0.05 seconds. 8) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 2 rows. The estimated time for this step is 0.04 seconds. 9) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Example: More Inner Joined Tables in Aggregate Join Index Definition Than in Query
The following aggregate join index definition specifies inner joins on tables t1, t3, and t6 using conditions that exploit a foreign key-primary key relationship between table t1 and tables t3 and t6, respectively.
CREATE JOIN INDEX jiin_aggr AS SELECT d1, k1, SUM(x1) AS total FROM t1, t3, t6 WHERE d1=d3 AND k1=k6 GROUP BY 1, 2;
You would expect the Optimizer to include join index jiin_aggr in its access plan for the following query even though jiin_aggr is defined with an inner joined table, t6, that the query does not reference. This is acceptable to the Optimizer because of the foreign key-primary key relationship between t1 and the extra table, t6, on columns k1 and k6,which have a foreign key-primary key relationship and are explicitly defined as a foreign key and as the unique primary index for their respective tables.
The bold EXPLAIN report text indicates that the Optimizer does select jiin_aggr for the query plan:
EXPLAIN SELECT d1, SUM(x1) FROM t1, t3 WHERE d1=d3 GROUP BY 1; *** Help information returned. 17 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.jiin_aggr for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs SUM step to aggregate from HONG_JI.jiin_aggr by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with low confidence to be 1 row. The estimated time for this step is 0.03 seconds. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 1 row. The estimated time for this step is 0.04 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Example: Join Index Left Outer Joined on Six Tables
The following join index definition left outer joins table t1 with, in succession, tables t2, t3, t4, t5, and t6 on a series of equality conditions made on foreign key-primary key relationships among the underlying base tables.
CREATE JOIN INDEX jiout AS SELECT a1, b1, c1, c2, d1, d3, e1, e4, f1, g1, h1, i1, j1, j5, k1, k6, x1 FROM t1 LEFT OUTER JOIN t2 ON a1=a2 AND b1=b2 AND c1=c2 LEFT OUTER JOIN t3 ON d1=d3 LEFT OUTER JOIN t4 ON e1=e4 AND f1=f4 LEFT OUTER JOIN t5 ON g1=g5 AND h1=h5 AND i1=i5 AND j1=j5 LEFT OUTER JOIN t6 ON k1=k6;
Even though the following query references fewer tables than are defined in the join index, you would expect the Optimizer to include join index ji_out in its access plan because all the extra outer joins are defined on unique columns and the extra tables are the inner tables in the outer joins.
The bold EXPLAIN report text indicates that the Optimizer does select ji_out for the query plan:
EXPLAIN SELECT a1, b1, c1, SUM(x1) FROM t1, t2 WHERE a1=a2 AND b1=b2 AND c1=c2 GROUP BY 1, 2, 3; *** Help information returned. 18 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.jiout for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs SUM step to aggregate from HONG_JI.jiout by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.03 seconds. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.04 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Example: Many More Tables Referenced by Join Index Definition Than Referenced by Query
The following join index definition specifies all inner joins on tables t1, t2, t3, t4, t5 and t6 and specifies equality conditions on all the foreign key-primary key relationships among those tables.
CREATE JOIN INDEX ji_in AS SELECT a1, b1, c1, c2, d1, d3, e1, e4, f1, g1, g5, h1, i1, j1, k1, k6, x1 FROM t1, t2, t3, t4, t5, t6 WHERE a1=a2 AND b1=b2 AND c1=c2 AND d1=d3 AND e1=e4 AND f1=f4 AND g1=g5 AND h1=h5 AND i1=i5 AND j1=j5 AND k1=k6;
Even though 6 tables are referenced in the join index definition, and all its join conditions are inner joins, you would expect the Optimizer to include join index ji_in in its query plan for the following query, which only references 2 of the 6 tables, because all the conditions in the join index definition are based on foreign key-primary key relationships among the underlying base tables.
The bold EXPLAIN report text indicates that the Optimizer does select ji_in for the query plan:
EXPLAIN SELECT a1, b1, c1, SUM(x1) FROM t1, t2 WHERE a1=a2 AND b1=b2 AND c1=c2 GROUP BY 1, 2, 3; *** Help information returned. 18 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.ji_in for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs SUM step to aggregate from HONG_JI.ji_in by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.03 seconds. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.04 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Example: Using a Join Index That Has an Extra Inner Join In Its Definition
The following example illustrates how the Optimizer uses a join index with an extra inner join when the connections among the tables in its definition are appropriately defined.
Suppose you have the following table definitions:
CREATE SET TABLE fact ( f_d1 INTEGER NOT NULL, f_d2 INTEGER NOT NULL, FOREIGN KEY (f_d1) REFERENCES WITH NO CHECK OPTION dim1 (d1), FOREIGN KEY (f_d2) REFERENCES WITH NO CHECK OPTION dim2 (d2)) UNIQUE PRIMARY INDEX (f_d1,f_d2); CREATE SET TABLE dim1 ( a1 INTEGER NOT NULL, d1 INTEGER NOT NULL, FOREIGN KEY (a1) REFERENCES WITH NO CHECK OPTION dim1_1 (d11)) UNIQUE PRIMARY INDEX (d1); CREATE SET TABLE dim2 ( d1 INTEGER NOT NULL, d2 INTEGER NOT NULL) UNIQUE PRIMARY INDEX (d2); CREATE SET TABLE dim1_1 ( d11 INTEGER NOT NULL, d22 INTEGER NOT NULL) UNIQUE PRIMARY INDEX (d11);
The following graphic sketches the dimensional relationships among these tables:
In the following join index definition, the fact table is joined with dimension tables dim1 and dim2 by foreign key-primary key joins on fact.f_d1=dim1.d1 and fact.f_d2=dim2.d2. Dimension table dim1 is also joined with its dimension subtable dim1_1 by a foreign key-primary key join on dim1.a1=dim1_1.d11. A query on the fact and dim2 tables uses the join index ji_all because of its use of foreign key-primary key relationships among the tables:
CREATE JOIN INDEX ji_all AS SELECT (COUNT(*)(FLOAT)) AS countstar, dim1.d1, dim1_1.d11, dim2.d2, (SUM(fact.f_d1)(FLOAT)) AS sum_f1 FROM fact, dim1, dim2, dim1_1 WHERE ((fact.f_d1 = dim1.d1 ) AND (fact.f_d2 = dim2.d2 )) AND (dim1.a1 = dim1_1.d11 ) GROUP BY dim1.d1, dim1_1.d11, dim2.d2 PRIMARY INDEX (d1);
Note that the results of the aggregate operations COUNT and SUM are both typed as FLOAT (see Restrictions on Join Index Aggregate Functions below).
As the bold text in the following EXPLAIN report indicates, the Optimizer uses the join index ji_all for its query plan in this situation because even though table fact is the parent table of both tables dim1 and dim2, those tables are joined with fact on foreign key columns in the join index definition. Similarly, dim1 is joined to dim1_1 in the join index definition on a foreign key column.
EXPLAIN SELECT d2, SUM(f_d1) FROM fact, dim2 WHERE f_d2=d2 GROUP BY 1; *** Help information returned. 18 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.ji_all for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs SUM step to aggregate from HONG_JI.ji_all by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The size of Spool 3 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.08 seconds. 3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.07 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.
Examples of Exceptions to the General Rules for Extra Tables in the Join Index Definition
The following examples illustrate cases that are exceptions to the general coverage rules for extra tables in a join index definition.
Example: Table T9 is the Parent Table of Tables T7 and T8.
In the following example, table t9 is the parent table of tables t7 and t8. Generally, this relationship disqualifies a join index from covering any query with fewer tables than are referenced in the definition for that index. However, because t7 and t8 are joined on the FK columns (y7=x8) in the join index definition, the Optimizer uses the index ji to cover the query, as you can see by looking at the bold text in the EXPLAIN report:
CREATE SET TABLE t7( x7 INTEGER NOT NULL, y7 INTEGER NOT NULL, z7 INTEGER NOT NULL, CONSTRAINT r7 FOREIGN KEY (y7) REFERENCES WITH NO CHECK OPTION t9 (y9)) PRIMARY INDEX (x7); CREATE SET TABLE t8( x8 INTEGER NOT NULL, y8 INTEGER NOT NULL, z8 INTEGER NOT NULL, CONSTRAINT r8 FOREIGN KEY (x8) REFERENCES WITH NO CHECK OPTION t9 (x9)); CREATE SET TABLE t9( x9 INTEGER NOT NULL UNIQUE, y9 INTEGER NOT NULL, z9 INTEGER NOT NULL) UNIQUE PRIMARY INDEX(y9); CREATE JOIN INDEX ji AS SELECT x7, y7, x8, y8, x9, y9 FROM t7, t8, t9 WHERE y7=x8 AND y7=y9 AND x8=x9; EXPLAIN SELECT x7, y7, x8, y8 FROM t7, t8 WHERE y7=x8 AND x7>1; *** Help information returned. 14 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.ji for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs RETRIEVE step from HONG_JI.ji by way of an all-rows scan with a condition of ( "HONG_JI.ji.x1 > 1") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.06 seconds. 3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.06 seconds.
Example: SELECT Join Index Covers a Query
As with Example: Table T9 is the Parent Table of Tables T7 and T8, this example demonstrates how a join index can be used to cover a query when one table in the join index definition is a parent of two others if the tables are joined on foreign key-primary key relationships. t9 is the parent table of both t7 and t10. But because t7 and t10 are joined with t9 on the same PK column, by transitive closure, t7 and t10 are joined on y7=x10. The Optimizer does select join index ji to cover the query, as the bold text in the EXPLAIN report for the example query demonstrates:
CREATE SET TABLE t10( x10 INTEGER NOT NULL, y10 INTEGER NOT NULL, z10 INTEGER NOT NULL, CONSTRAINT r10 FOREIGN KEY ( x10 ) REFERENCES WITH NO CHECK OPTION t9 ( y9 )) PRIMARY INDEX x10; CREATE JOIN INDEX ji AS SELECT x7, y7, x10, y10, x9, y9 FROM t7, t10, t9 WHERE y7=y9 AND x10=y9; EXPLAIN SELECT x7, y7, x10, y10 FROM t7, t10 WHERE y7=x10 AND x7>1; *** Help information returned. 14 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock HONG_JI.ji for read on a reserved RowHash to prevent a global deadlock. 2) Next, we do an all-AMPs RETRIEVE step from HONG_JI.ji by way of an all-rows scan with a condition of ("HONG_JI.ji.x1 > 1") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.06 seconds. 3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.06 seconds.
Restrictions on Join Index Aggregate Functions
- Only the COUNT and SUM functions, in any combination, are valid.
- COUNT DISTINCT and SUM DISTINCT are not valid.
- To avoid overflow problems, always type the COUNT and SUM columns in a join index definition as FLOAT or DECIMAL(38,0) when the data type of the base table column is INTEGER or FLOAT, and as DECIMAL(38,xx) when the data type of the base table column is DECIMAL(NN,xx).
IF you … | THEN the system … |
---|---|
do not define an explicit data type for a COUNT or SUM column that has a data type of INTEGER or FLOAT | assigns the FLOAT data type to it. |
do not define an explicit data type for a COUNT or SUM column that has a data type of DECIMAL | assigns the DECIMAL(38,xx) data type to it |
define a COUNT or SUM column that has a data type of INTEGER as anything other than FLOAT and DECIMAL(38,0) | returns an error and does not create the aggregate join index. |
define a COUNT or SUM column that has a data type of DECIMAL as anything other than DECIMAL(38,xx) | returns an error and does not create the aggregate join index. |
define a COUNT or SUM column that has a data type of FLOAT as anything other than FLOAT | returns an error and does not create the aggregate join index. |
Restrictions on Sparse Join Index WHERE Clause Predicates
- Data types for any columns used in a join condition must be drawn from the same domain because neither explicit nor implicit data type conversions are permitted.
- Multiple join conditions must be connected using the AND logical operator.
The OR operator is not a valid way to connect multiple join conditions in a join index definition.
- You cannot specify independent inequality WHERE clause join conditions in a join index definition.
The following rules apply to the use of inequality join conditions in a join index definition WHERE clause:
- Inequality join conditions are supported only if they are ANDed to at least one equality join condition.
- Inequality join conditions can be specified only for columns having the same data type in order to enforce domain integrity.
- The only valid comparison operators for an inequality join condition are the following:
<
<=
>
>=
The following join index definition is valid because the WHERE clause inequality join condition on o_totalprice and c_acctbal it specifies is ANDed with the previous equality join condition on o_custkey and c_custkey:
CREATE JOIN INDEX ord_cust_idx AS SELECT c_name, o_orderkey, o_orderdate FROM orders, customer WHERE o_custkey = c_custkey AND o_totalprice > c_acctbal;
The following join index definition is not valid because the WHERE clause inequality join condition has no logical AND relationship with an equality join condition:
CREATE JOIN INDEX ord_cust_idx AS SELECT c_name, o_orderkey, o_orderdate FROM orders, customer WHERE o_totalprice > c_acctbal;
Restrictions on Join Index ORDER BY Clauses
- Sort order is restricted to ASC.
DESC is not valid.
- Aggregate columns and expressions are not permitted as a column_name or column_position specification.
- Supported data types for column_name are restricted to the following:
- DATE
- BYTEINT
- DECIMAL
The DECIMAL type is valid only for columns of four or fewer bytes.
- INTEGER
- SMALLINT
Restrictions on Load Utilities
You cannot use FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE to load data into base tables that have join indexes because those indexes are not maintained during the execution of these utilities. If you attempt to load data into base tables with join indexes using these utilities, the load operation aborts and the system returns an error message to the requestor.
To load data into a join-indexed base table, you must drop all defined join indexes on that base table before you can run FastLoad, MultiLoad, or the Teradata Parallel Transporter operators LOAD and UPDATE.
Be aware that you cannot drop a join index to enable MultiLoad or FastLoad batch loads until any requests that access that index complete processing. Requests place locks on any join indexes they access, and the system defers processing of any DROP JOIN INDEX requests against locked indexes until their locks have all been released.
Load utilities like Teradata Parallel Data Pump, BTEQ, and the Teradata Parallel Transporter operators INSERT and STREAM, which perform standard SQL row inserts and updates, are supported for join-indexed tables, as are multirow INSERT SELECT and MERGE requests.