Join Indexes | Database Design | Teradata Vantage - 17.10 - Join Indexes - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
User Guide
Publication ID
B035-1094-171K
Language
English (United States)

Join indexes are designed to permit queries (join queries in the case of multitable join indexes) to be resolved by accessing the index instead of accessing, and possibly joining, their underlying base tables.

Multitable join indexes are useful for queries where the index contains all the columns referenced by one or more joins, thereby allowing the index to cover that part of the query, making it possible to retrieve the requested data from the index rather than accessing its underlying base tables. For obvious reasons, an index with this property is often referred to as a covering index (some vendors refer to this as index-only access). Note that a join index that is defined with an expression in its select list provides less coverage than a join index that is defined using a base column (see Restrictions on Partial Covering by Join Indexes).

Even if a join index does not completely cover a query, the Optimizer can use it to join to its underlying base tables in a way that provides better query optimization than scanning the base tables for all the columns specified in the request (see Partial Query Coverage).

From the point of view of a database designer, the multitable join index permits great adaptability because it provides the flexibility of normalization while at the same time offering the opportunity to create alternative, denormalized virtual data models, providing what might be called materialized views of the database.

Depending on how the index is defined, single-table join indexes can also be useful for queries where the index contains only some of the columns referenced in the statement. This situation is referred to as a partial covering of the query. Multitable join indexes can also be used to partially cover a query for one or more of the tables defined in the join index.

Join indexes are also useful for queries that aggregate columns from tables with large cardinalities. These indexes play the role of prejoin and summary tables without denormalizing the logical design of the database and without incurring the update anomalies and performance problems presented by denormalized tables. While it is true that denormalization often enhances the performance of a particular query or family of queries, it can and often does make other queries perform more poorly.

Unlike traditional indexes, join indexes are not required to store pointers to their associated base table rows. Instead, they are generally used as a fast path final access point that eliminates the need to access and join the base tables they represent. They substitute for, rather than point to, base table rows. The only exception to this is the case where an index partially covers a query. If the index is defined using either the ROWID keyword, the UPI of its base table, or a USI on the base table as one of its columns, then it can be used to join with the base table to cover the query. A join index defined in this way is sometimes called a global index or global join index. Note that you can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement. See Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184.

In this case, you might create a join index to contain only the join column and the ROWID or the UPI column set of the table. The process is as follows.

  1. The join index is joined with the other table and any selection conditions that can be evaluated during the join to eliminate disqualified rows are applied.
  2. The result of the join is stored in a temporary table and redistributed based on the ROWID, UPI, or USI of the base table to perform a join with the base table.

This join can happen at different points in the query plan, depending on estimated costs. For example, the Optimizer might determine that it is cheaper to perform a join with another table involved in the query before joining to the base table.

Depending on the workloads they are designed to support, you can create join indexes that have either partitioned or nonpartitioned primary indexes or are column-partitioned. You can also create column-partitioned join indexes that have no primary index. See Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184 for the usage rules for column-partitioned join indexes.

Rules for Using the ROWID Keyword in a Join Index Definition

  • The ROWID keyword can only be used in a join index definition.
  • You can optionally specify ROWID for a base table in the select list of a join index definition. For a column-partitioned join index, the ROWID for the base table in the select list of the join index definition is required.

    If you reference multiple tables in the join index definition, then you must fully qualify each ROWID specification.

  • You can reference an alias name for ROWID, or the keyword ROWID itself if no correlation name has been specified for it, in the primary index definition or in a secondary index defined for the join index in its index clause.

    This does not mean that you can reference a ROWID or its alias in a secondary index defined separately with a CREATE INDEX statement (see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144) after the join index has been created.

  • An alias is required for ROWID if the join index is column partitioned.
  • If you reference a ROWID alias in the select list of a join index definition, then you can also reference that alias in a CREATE INDEX statement that creates a secondary index on the join index.
  • Aliases are required to resolve any column name or ROWID ambiguities in the select list of a join index definition.

    An example is the situation where you specify ROWID for more than one base table in the index definition.

Also see the description of the CREATE JOIN INDEX statement in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

Rules for Using the System-Derived PARTITION Column in a Hash or Join Index Definition

You cannot specify the system-derived PARTITION column in any hash or join index definition.

You can specify a user-named column named "partition" in an index definition.

Rules for Using Join Indexes on Load-Isolated Tables

When you create a join index, if any of the base tables are load isolated, then the JI is marked as a table with load isolation. This means that the load-isolated join index is 8 bytes wider because each row contains a RowLoadID. Modification operations on a join index are driven by modifications to the base table and the JI definition.

A load operation on the base table implicitly starts a load operation on the JI. A load commit on the load-isolated base table commits the load on the underlying JI. A compressed join index is not supported on a load-isolated table.

Default Column Multivalue Compression for Join Index Columns When the Referenced Base Table Column Is Compressed

When you create a join index, Vantage automatically transfers any column multivalue compression defined on the base table, with a few exceptions, to the join index definition. In contrast, hash indexes do not inherit the multivalue compression defined for the columns of their parent base table.

The following rules and restrictions apply to automatically transferring the column compression characteristics of a base table to its underlying join index columns. All of these rules and restrictions must be met for base table column multivalue compression to transfer to the join index definition:
  • Base table column multivalue compression transfers to a join index definition even if there is an alias name specified for the columns in the join index definition.
  • Base table column compression transfers to a multivalue join index definition only up to the point that the maximum table header length of the join index is exceeded.

    The CREATE JOIN INDEX request does not abort at that point, but the transfer of column multivalue compression from the base table to the join index definition stops.

  • Base table column multivalue compression does not transfer to a join index definition if the column is a component of the primary index for the join index.
  • Base table column multivalue compression does not transfer to a join index definition for any of the columns that are components of a partitioned primary index, a partitioning expression for a PPI join index, or a partitioning expression for a column-partitioned join index.
  • Base table column multivalue compression does not transfer to a join index column if the column is specified as the argument for any of the following functions.
    • COUNT
    • EXTRACT
    • MIN
    • MAX
    • SUM
  • Base table column multivalue compression does not transfer to a column in a compressed join index that has indexes defined on its column_1 and column_2 sets.
  • Base table column multivalue compression does not transfer to a join index definition if the column is a component of an ORDER BY clause in the join index definition.

Compression of Join Indexes at the Block Level

Because join indexes are primary tables, they can be compressed at the block level. See Compressing Data Loaded into Empty Subtables Set to AUTOTEMP, and Compression Types Supported by Vantage for more information about data block compression.

Summary of Join Index Functions

A join index always has at least one of the following functions.
  • Replicates all, or a vertical subset, of a single base table and partitions its rows using a different primary index (or a different column partitioning if the base table is a NoPI column-partitioned table) than the base table, such as a foreign key column to facilitate joins of very large tables by hashing them to the same AMP.
    A partitioning expression for a row-partitioned join index cannot contain a row-level security constraint column.
  • Joins multiple tables (optionally with aggregation) in a prejoin table.
  • Aggregates one or more columns of a single table as a summary table.

Join indexes are updated automatically, so the only administrative task a DBA must perform is to keep the statistics on multitable join index columns and their indexes current. For non-sparse single-table join indexes, the best policy is to use base table statistics rather than to collect statistics directly on the columns of the index.

The recommended practice for recollecting statistics is to set appropriate thresholds for recollection using the THRESHOLD options of the COLLECT STATISTICS statement. For details, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

You cannot collect statistics on complex expressions specified in a base table definition. However, if you frequently submit queries that specify complex base table predicate expressions, you can create a single-table join index or hash index that specifies those frequently used predicate expressions in its select list or column list, respectively, and then collect statistics on the expression defined as a simple column in your index.

There are several specific cases where join index statistics can provide more accurate cardinality estimates than are otherwise available for base table predicates written using complex date expressions.
  • The case where an EXTRACT expression specified in a query predicate can be matched with a join index predicate.
  • The case where an EXTRACT/DATE expression specified in a query predicate condition can be mapped to an expression specified in the select list of a join index.

    The Optimizer uses expression mapping when it detects an identical query expression or a matching query expression subset within a non-matching predicate. When this occurs, the Optimizer maps the predicate to the identical column of the join index, which enables it to use the statistics collected on the join index column to estimate the cardinality of the expression result.

When you create a single-table join index that specifies a complex expression, Vantage transforms the expression into a simple join index column. This enables the Optimizer to map the statistics collected on those complex expressions to the base table to facilitate single-table cardinality estimates or to match the predicates (see Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142 for details).

Note that the derived statistics framework supports bidirectional inheritance of statistics between a non-sparse single-table join index and the base table it supports (see Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142 for details), so the entity on which statistics are collected is no longer as important as it once was.

Similarities of Join Indexes to User Data Tables

In many respects, hash and join indexes are identical to base user data tables.

For example, you can do the following things with a join index.
  • Create a unique or nonunique primary index, either a PPI or an nonpartitioned primary index, on its columns.

    UPIs are supported only for uncompressed single-table join indexes without an ORDER BY clause. For more information, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

  • Create a column-partitioned, single-table uncompressed, non-aggregate join index. Such a join index can be sparse join index.
  • Create nonunique secondary indexes on its columns.
  • Perform any of the following statements against it.
    • COLLECT STATISTICS (Optimizer Form)
    • DROP JOIN INDEX
    • DROP STATISTICS (Optimizer Form)
    • HELP JOIN INDEX
    • SHOW JOIN INDEX
  • Specify UDT and BEGIN and END bound functions on Period or derived Period columns in its definition.
  • Specify row-level security constraint columns in its definition.
    You can do this only if both of the following criteria are true:
    • The index references a maximum of one row-level security-protected base table.
    • All of the row-level security constraint columns defined in the base table are included in the join index definition.
      You cannot specify row-level security constraint columns in a partitioning expression for a row-partitioned join index.
  • Specify any valid expressions in the select list and WHERE clause when the expressions reference at least one column.
    The following expression types are not valid in a join index definition:
    • OLAP expressions
    • UDF expressions
    • Built-in functions that are explicitly not valid such as DEFAULT and PARTITION.

      Note that a join index defined with an expression in its select list provides less coverage than a join index that is defined using a base column (see Restrictions on Partial Covering by Join Indexes).

Unlike base tables, you cannot do the following things with join indexes:
  • Create a join index on a join index.
  • Query or update join index rows.

    For example, if ordCustIdx is a join index, then the following query is not legal:

       SELECT o_status, o_date, o_comment
           FROM ordCustIdx;
  • Create a USI on its columns.
  • Define multivalue or algorithmic compression on its columns.

    If multivalue or algorithmic compression are defined on any columns of its parent base table set, however, a join index does inherit that compression under most circumstances (see Default Column Multivalue Compression for Join Index Columns When the Referenced Base Table Column Is Compressed).

Join Index Applications

Join indexes are useful for queries where the index table contains all the columns referenced by one or more joins, thereby allowing the Optimizer to cover all or part of the query by planning to access the index rather than its underlying base tables. An index that supplies all the columns requested by a query is said to cover that query and, for obvious reasons, is often referred to as a covering index. Some vendors refer to this as index-only access. A join index can be particularly useful for queries that access both nonpartitioned and column-partitioned tables (see NoPI Tables, Column-Partitioned NoPI Tables, and Column-Partitioned NoPI Join Indexes). If either an nonpartitioned NoPI table or a column-partitioned NoPI table has no secondary indexes, a covering join index is the only way to access its rows without using a full-table scan. Be aware that join indexes can slow the loading of rows into a table using Teradata Parallel Data Load array INSERT operations.

The Optimizer can also use join indexes that only cover a query partially if the index is defined properly (see Partial Query Coverage). Note that query covering is not restricted to join indexes: other indexes can also cover queries either in whole or in part.

Join indexes are also useful for queries that aggregate columns from tables with large cardinalities. For these applications, join indexes play the role of prejoin and summary tables without denormalizing the logical design of the database and without incurring the update anomalies and ad hoc query performance issues frequently presented by denormalized tables.

You can create join indexes that limit the number of rows in the index to only those that are accessed when a frequently run query references a small, well-known subset of the rows of a large base table. You create this type of join index by specifying a constant expression as the RHS of the WHERE clause, which narrowly filters the rows included in the join index. This is known as a sparse join index.

You can also create join indexes that have a partitioned primary index (you can only define a PPI for a join index if the index is not row-compressed) or that are column-partitioned join indexes. Note that you cannot create a nonpartitioned NoPI join index. PPI join indexes are useful for covering range queries (see Designing for Range Queries: Guidelines for Choosing Between a PPI and a Value-Ordered NUSI), providing excellent performance by means of row partition elimination (see Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142).

See Sparse Join Indexes and Tactical Queries for specific design issues related to join index support for tactical queries.

Partial Query Coverage

Partial query coverage allows join indexes whose columns do not match an entire query to be used to cover a subset of it. For example, one or two tables specified by the query might be covered by the join index, but the entire request is not. In some situations, there might be a number of commonly performed queries that join several tables where each of the queries joins two tables, say t1 and t2, on the same columns. For this situation, you can create a join index to join t1 and t2, and the Optimizer can use that join index for any queries that need to perform that join.

Partial query coverage also allows join indexes that contain only a subset of the columns of a base table referenced in the query to cover the query if that join index can be joined to the base table to retrieve additional referenced columns (this form of partial coverage is also used to implement hash indexes: see Hash Indexes).

For example, suppose there is a large table that needs to be joined frequently with another table on a column that is not the distributing column of the table. You can define a join index that redistributes the base table by the join column. However, because of the large number of rows and columns that need to be projected into the join index, the extra disk storage required does not allow the creation of such a join index.

You can also define a join index in such a way that its partial coverage of a query can be extended further by joining with a parent base table to pick up any columns requested by the query but not referenced in the join index definition.

Such a join index, sometimes called a global index or global join index, is defined with one of the following elements, which the Optimizer can use to join it with a parent base table to extend its coverage:

  • One of the columns in the index definition is the keyword ROWID. You can only specify ROWID in the outermost SELECT of the CREATE JOIN INDEX statement. See Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184.
  • The column set defining the UPI of the underlying base table.
  • The underlying base table.

See Restrictions on Partial Covering by Join Indexes for an example of a global join index.

Designing for Range Queries: Guidelines for Choosing Between a PPI and a Value-Ordered NUSI

Row-partitioned primary indexes and value-ordered NUSIs are both designed to optimize the performance of range queries. Because you cannot define both a PPI and a value-ordered primary index on the same join index (nor can you define a PPI for a row-compressed join index), you must determine which is more likely to enhance the performance of a given query workload for those situations that exclude using a join index with a value-ordered primary index.

In general, a value-ordered primary index is the preferred choice if it meets the restriction of a 4-byte maximum column size. Note that this cannot be used for an MLPPI because MLPPIs do not support value-ordered NUSIs by definition.

You might want to consider creating a multilevel partitioning on the base table as an alternative if the usage would be roughly equivalent to a value-ordered NUSI on a join index plus a non-value-ordered NUSI.

In nearly all cases, a join index with a value-ordered primary index is the preferred choice over a value-ordered NUSI.

You should consider the following guidelines when determining whether to design a join index for a particular workload using row partitioning on the join index or using an nonpartitioned primary index and adding a value-ordered NUSI to create a value-ordered access path to the rows:
  • It is better to use row partitioning on the join index than to use a value-ordered NUSI and an nonpartitioned primary index on the join index.
  • If row compression is defined on the join index, then you cannot define partitioning for it.

    For this scenario, a value-ordered primary index or value-ordered NUSI are your only join index design choice possibilities for optimizing range queries.

  • Each time an underlying base table for a join index is updated, the join index also must be updated.

    If there is a value-ordered NUSI defined on a join index, then it, too, must be updated each time the base table (and join index) rows are updated.

    You can avoid this additional maintenance when you define the join index with row partitioning. Row partition elimination makes updating a join index row even faster than the equivalent update operation against an nonpartitioned primary index join index. "Update" is used in a generic sense to include the delete and update operations performed by the DELETE, MERGE, and UPDATE SQL statements, but excluding insert operations performed by the SQL INSERT and MERGE statements, where row partition elimination is not a factor.

    Row partitioning can also improve insert operations if the inserted rows are clustered in the data blocks corresponding to the partitions. In this case, the number of data blocks read and written is reduced compared with the nonpartitioned primary index join index case where the inserted rows are scattered among all the data blocks. Because of the way that the rows of a column-partitioned join index are distributed to the AMPs, you should not expect to see the positive effects of data block clustering for singleton INSERT requests. However, you should see very positive effects for large INSERT ... SELECT loads into column-partitioned join indexes when the base table is loaded with rows using an INSERT ... SELECT request.

  • Row-partitioned join indexes offer the benefit of providing direct access to join index rows, while a value-ordered NUSI does not.

    Using a NUSI to access rows is always an indirect operation, touching the NUSI subtable before being able to go back to the join index to access a row set.

    Besides offering a direct path for row access, row partitioning provides a means for attaining better join and aggregation strategies on the primary index of the join index.

  • If you specify the primary index column set in the query, row partitioning and join indexes offer the additional benefit of enhanced direct join index row access using row partition elimination.

    The comparative row access times ultimately depend on the selectivity of a particular value-ordered NUSI, the join index row size, and whether a value-ordered NUSI covers a given query or not.

  • If a join index partitioning column has more than 65,535 unique values, and the query workload you are designing the index for probes for a specific value, a value-ordered NUSI is likely to be more selective than a join index partitioned on that column.

    Note that because NUSI access is indirect, the system might read entire data blocks to retrieve one or a few rows using a NUSI, while row-partitioned index access is direct, with like rows being clustered together, so row-partitioned join index read operations are usually far more efficient than value-ordered NUSI read operations.

Collecting Statistics on a Join Index

Issues concerning collecting statistics on the indexes of a join index are documented in Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184. Also see Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142 for information about using single-table join index statistics to make cardinality estimates that cannot otherwise be made with the same level of confidence.

Fallback With Join Indexes

You can define fallback for join indexes. The criteria for deciding whether to define a join index with fallback are similar to those used for deciding whether to define fallback on base tables.

If you do not define fallback for a join index and an AMP is down, then the following additional criteria become critical:
  • The join index cannot be used by the Optimizer to solve any queries that it covers.
  • The base tables on which the join index is defined cannot be updated.
You cannot use the NO FALLBACK option and the NO FALLBACK default on platforms optimized for fallback.

Limits for Hash and Join Indexes

  • Tables can have up to 32 secondary, hash, and join indexes.

    These 32 indexes can be any combination of secondary, hash, and join indexes, including the system-defined secondary indexes used to implement PRIMARY KEY and UNIQUE constraints.

    Each multicolumn NUSI that specifies an ORDER BY clause counts as two consecutive indexes in this calculation.

  • Index columns cannot have XML, BLOB, CLOB, BLOB-based UDT, CLOB-based UDT, XML-based UDT, JSON, ARRAY, or VARRAY. Hash indexed columns cannot have Period data types.
  • Join index columns can have Period data types and can include expressions composed of system and user-defined functions and methods, including the use of the BEGIN, END, and P_INTERSECT built-in functions on a Period data type column to compose an expression in the projection list and a single-table condition in its WHERE or ON clauses.
  • Both hash and join indexes can be created on a row-level security-protected table only if all of the following criteria are met.
    • The hash or join index references a maximum of one row-level security-protected table.
    • All of the row-level security constraint columns in the indexed table are included in the hash or join index definition.

Further Information

Consult the following documents for more detailed information on creating and using join indexes to enhance the performance of your database applications:
  • Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144
  • Teradata Vantage™ - Temporal Table Support, B035-1182
  • Teradata Vantage™ - Advanced SQL Engine Security Administration, B035-1100