This topic describes various tips that you can use to optimize the functionality of your join indexes.
When You Should Consider Defining a Join Index
Join indexes are not suited for all applications and situations. The usefulness of a join index, like that of any other index, depends on the type of work it is designed to perform. Always prototype any join index and evaluate its usefulness to the applications it is designed to support before adding it to your production environment. The overhead of updating join index tables can outweigh their benefit in some situations.
The following situations all make a join index a likely performance enhancer:
- Frequent joins of large tables with other large or moderately-sized tables that result in a significant number of the rows from both tables being joined.
- Frequent joins of tables of high degree (having many columns) for which the same relatively small set of columns is repeatedly requested.
- An alternate partitioning sequence for a vertical subset of data in one of the base tables (a so-called single-table join index) would remove the necessity of redistributing rows for a frequently made join.
- The overhead in time and storage capacity for the creation and maintenance of a join index does not outweigh its retrieval benefits.
- Frequent range queries require joins of large tables with other large or moderately-sized tables that result in a significant number of the rows from both tables being joined.
- A row-partitioned join index can enhance the performance of queries that specify an equality or range constraint on its partitioning column set. For example, a single-table row-partitioned join index can use row partition elimination to improve the performance of queries that retrieve rows from the join index itself.
Be aware that you cannot define row partitioning for a row-compressed join index.
- If a frequently run query specifies a complex expression in its predicate, consider creating a single-table join index or a hash index on the table that includes that expression in the select list (join index) or the column list (hash index) of its definition. Although you cannot collect statistics on complex base table expressions, defining a single-table join index on the expression transforms it into a simple column on which you can collect statistics. The Optimizer can then use those statistics to estimate the single-table cardinality of a query predicate that specifies the same expression on the base table column. See SQL Request and Transaction Processing for more information.
- Most queries against a column-partitioned table or join index are expected to be highly selective on a variable subset of columns and to project a variable subset of columns, with the accessed columns making up less than 10% of the column partitions for any particular query.
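As a sketch of the single-table case in the list above, the following statement defines a vertical subset of Orders that is hash-distributed on o_custkey. It assumes, hypothetically, that the Orders base table has a different primary index (for example, o_orderkey) and that Customer uses c_custkey as its primary index; under those assumptions, the join index rows are already co-located with the matching Customer rows for an o_custkey = c_custkey join, so no redistribution would be needed. The index name ord_by_cust_idx is illustrative only.

```sql
-- Hypothetical single-table join index: a vertical subset of Orders
-- distributed on the join column o_custkey rather than on the
-- base table's own primary index.
CREATE JOIN INDEX ord_by_cust_idx AS
SELECT o_custkey, o_orderkey, o_status, o_date
FROM Orders
PRIMARY INDEX (o_custkey);
```

Whether the Optimizer actually uses such an index for a given join is something to confirm with EXPLAIN while prototyping.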
Sometimes you just need to experiment.
For example, you would typically apply a row-partitioned join index to queries that involve row-partitioned base tables. However, even when the base table is not row-partitioned and is instead designed to handle efficient joins on its primary index, you might define a row-partitioned join index to provide an alternative organization of the data that is optimal for access by row partition. This is an option only if the join index is not row-compressed, because you cannot define partitioning for a row-compressed join index.
See SQL Data Definition Language for further information about PPI join indexes.
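The scenario above might look like the following sketch, which row-partitions a single-table join index by month on o_date. The index name, the date range, and the monthly interval are assumptions chosen for illustration. The select list has no parenthesized repeating group, so the index is not row-compressed and partitioning is permitted.

```sql
-- Hypothetical row-partitioned (PPI) single-table join index.
-- Not row-compressed: the select list has no (fixed), (repeating) groups.
CREATE JOIN INDEX ord_date_ppi_idx AS
SELECT o_orderkey, o_custkey, o_status, o_date
FROM Orders
PRIMARY INDEX (o_orderkey)
PARTITION BY RANGE_N(o_date BETWEEN DATE '2010-01-01'
                            AND     DATE '2019-12-31'
                            EACH INTERVAL '1' MONTH);

-- A range constraint on the partitioning column. If the Optimizer
-- chooses the join index, it can eliminate all but three monthly partitions.
SELECT o_orderkey, o_status
FROM Orders
WHERE o_date BETWEEN DATE '2015-01-01' AND DATE '2015-03-31';
```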
Using Outer Joins to Define Simple Join Indexes
Because join indexes generated from inner joins do not preserve unmatched rows, you should consider using outer joins to define simple join indexes, noting the following restrictions.
- Inequality conditions are not supported under any circumstances for ON clauses in join index definitions.
- Outer joins are not supported under any circumstances for aggregate join indexes.
Defining join indexes on outer joins enables them to satisfy queries with fewer join conditions than those used to generate the index. See Defining a Simple Join Index on a Binary Join Result and Using Outer Joins to Define Join Indexes for a demonstration of this property.
Also see Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query and related examples.
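The following sketch shows a simple join index defined on an outer join; the index name and column list are hypothetical. Because Customer is the outer table, customers with no orders are preserved with NULLs in the Orders columns, whereas an inner-join definition would drop them. The ON clause uses an equality condition and the select list contains no aggregates, in keeping with the restrictions listed above.

```sql
-- Hypothetical simple join index on an outer join.
-- Unmatched Customer rows are kept, with NULL o_orderkey and o_date.
CREATE JOIN INDEX cust_ord_oj_idx AS
SELECT c_custkey, c_name, o_orderkey, o_date
FROM Customer LEFT OUTER JOIN Orders
  ON c_custkey = o_custkey      -- equality only; inequalities are not allowed
PRIMARY INDEX (c_custkey);
```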
Collecting Statistics on the Columns and Indexes for Join Indexes
Keeping statistics current is just as critical for join indexes as it is for base tables if the Optimizer is to generate optimal query plans.
This is particularly true if you have created a single-table join index whose select list includes a complex expression so that its statistics can provide more accurate single-table cardinality estimates for queries that specify a matching expression in their predicates against the base table. The Optimizer can also use expression mapping when it detects a matching expression within an otherwise non-matching predicate. In this case, the Optimizer maps the expression to the join index column that is defined using that expression. See SQL Request and Transaction Processing for details.
You cannot collect statistics on the PARTITION column of a PPI join index.
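A minimal sketch of the expression case, assuming a release that accepts named expressions in the select list of a single-table join index: the index materializes EXTRACT(YEAR FROM o_date) as a column, and statistics are then collected on that column. The names ord_year_idx and o_year are hypothetical, and EXTRACT merely stands in for whatever complex expression your queries actually use.

```sql
-- Hypothetical single-table join index that turns an expression
-- into a simple column (o_year) of the join index.
CREATE JOIN INDEX ord_year_idx AS
SELECT o_orderkey, EXTRACT(YEAR FROM o_date) AS o_year
FROM Orders
PRIMARY INDEX (o_orderkey);

-- Statistics on the materialized expression column.
COLLECT STATISTICS COLUMN (o_year) ON ord_year_idx;

-- A predicate that specifies the same expression on the base table column;
-- the Optimizer may use the o_year statistics for its cardinality estimate.
SELECT o_orderkey
FROM Orders
WHERE EXTRACT(YEAR FROM o_date) = 2015;
```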
Basing a Join Index on Foreign Key-Primary Key Equality Conditions
To avoid storing redundant data, base join index definitions on foreign key = primary key predicates. Add either the primary key or the foreign key column to the index definition, but not both: because the two values are equal in every joined row, the Optimizer can derive either one from the other.
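For example, in the following sketch (TPC-H-style table and column names, assumed for illustration), the join index carries the primary key o_orderkey but not the matching foreign key l_orderkey, because the two values are equal in every row the index stores.

```sql
-- Hypothetical join index based on a foreign key = primary key predicate.
-- l_orderkey (FK) is deliberately omitted: it always equals o_orderkey (PK).
CREATE JOIN INDEX ord_line_idx AS
SELECT o_orderkey, o_status, o_date, l_quantity
FROM Orders INNER JOIN Lineitem
  ON o_orderkey = l_orderkey
PRIMARY INDEX (o_orderkey);
```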
Adding Join Constraints That Facilitate Joining to Other Tables
Include the join columns that support joining the join index to tables that are not part of its definition. It makes no difference whether you assign these columns to the fixed part or to the repeating part of the join index.
For example, you might carry the join columns c_nationkey and l_partkey in the join index to facilitate joining the nation and parts tables to the join index.
When the join index is defined using an outer join, use the outer table join column rather than the inner table join column to enhance performance.
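The guidance above can be sketched as a row-compressed outer-join index that carries c_nationkey so that the Nation table can later be joined to it, and that stores the outer table join column c_custkey rather than the inner table column o_custkey. The index name is hypothetical, and Nation is assumed to join on a column such as n_nationkey.

```sql
-- Hypothetical row-compressed join index:
--   fixed part:     one copy of the customer columns, including c_nationkey
--                   for a later join to Nation
--   repeating part: the orders for that customer
-- c_custkey (outer table) is carried instead of o_custkey (inner table).
CREATE JOIN INDEX cust_ord_nat_idx AS
SELECT (c_custkey, c_name, c_nationkey),
       (o_orderkey, o_status, o_date)
FROM Customer LEFT OUTER JOIN Orders
  ON c_custkey = o_custkey
PRIMARY INDEX (c_custkey);
```

If the Optimizer chooses this index, a query joining Customer, Orders, and Nation on c_nationkey = n_nationkey could read the customer and order columns from the join index and join only Nation to it.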
Specifying a Row-Partitioned or a Value-Ordered Sort Key for Range Queries
Sorting a join index by data values, as opposed to row hash values, is especially useful for range queries that involve the sort key. Row partitioning a join index on the column used in range queries can also significantly improve their performance, because row partition elimination restricts the scan to the qualifying partitions.
Alternatively, in some circumstances you might obtain better performance by defining the join index without row partitioning and using a value-ordered NUSI instead.
Note the following limitations on value ordering:
- Value ordering is limited to a single numeric or DATE column with a maximum length of four bytes.
- The column you specify in the ORDER BY clause must be drawn from the set of fixed columns. You cannot order by a column from the set of repeating columns.
In the following example, the join index rows are hash-distributed across the AMPs on the primary index column, c_name. Once assigned to an AMP, the join index rows are value-ordered using o_custkey as the sort key. For matched rows, o_custkey has the same value as c_custkey.
CREATE JOIN INDEX ord_cust_idx AS
SELECT (o_custkey, c_name), (o_status, o_date, o_comment)
FROM Orders LEFT JOIN Customer ON o_custkey = c_custkey
ORDER BY o_custkey
PRIMARY INDEX (c_name);
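To follow the earlier advice to prototype join indexes, you can ask the Optimizer whether it would use ord_cust_idx for a range query on the sort key. In this sketch the key range is hypothetical, and o_custkey is assumed to be an INTEGER column so that it meets the four-byte limit on value ordering. The query selects only columns that the join index stores, so the index could cover it.

```sql
-- Inspect the plan for a range query on the value-ordered sort key.
-- The EXPLAIN text shows whether ord_cust_idx is actually chosen.
EXPLAIN
SELECT o_custkey, c_name, o_status, o_date
FROM Orders LEFT JOIN Customer
  ON o_custkey = c_custkey
WHERE o_custkey BETWEEN 1000 AND 1999;
```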