Single-Table Join Indexes

You can define a simple join index on a single table. A single-table join index is a database object that you create using the CREATE JOIN INDEX statement, specifying only one table in its FROM clause. This permits you to hash some or all of the columns of a large replicated base table on a foreign key that hashes rows to the same AMP as another large table. In some situations, this performs better than building a multitable join index on the same columns. In effect, you are using a join index to redistribute an entire base table or a frequently accessed subset of its columns. The main advantage comes from the lower internal update maintenance required for the single-table form of the index.

Single-table join indexes are the only type of join index that can be defined with a unique primary index.

The term single-table join index might seem to be a contradiction because there are no joins in a single-table join index. However, the Optimizer can use single-table join indexes to facilitate joins. The single-table join index came about because an observant software architect had the insight that the join index mechanism could be applied to a single table to horizontally partition all or a subset of a very large base table. The resulting join index is defined on a different primary index than the one used by the original base table, so that its rows hash to the same AMPs as another very large base table with which it is frequently joined. In this respect, a single-table join index is essentially a hashed NUSI.

Because of the way the rows of a column-partitioned join index are distributed to the AMPs, this advantage does not generalize to them or by the Fast Load utility. However, column-partitioned join indexes are useful as an alternative method to partition a base table in an entirely different way when such an option provides an appropriate choice for the Optimizer to consider for some queries.

This application is analogous to how NUPIs are often used in database design to hash the base table rows of a minor entity to the same AMP as rows from another table they are likely to be joined with in a well-known query workload (see Nonunique Primary Indexes). The difference is that you cannot explicitly specify a join to a join index in a DML request. Instead, the Optimizer must determine whether joining base table rows with join index rows would be less costly than other methods.

Functions of Single-Table Join Indexes

Even though each single-table join index you create partly or entirely replicates its base table, you cannot query or update it directly, just as you cannot directly query or update any other join index.

When you have an application for which join queries against a base table would benefit from replicating some or all of its columns in a different table hashed on the join key (usually the primary index of the table to which it is to be joined) rather than the primary index of the original base table, then you should consider creating one or more single-table join indexes on that table.

For example, you might want to create a single-table join index to avoid redistributing a large base table or to avoid the sometimes prohibitive storage requirements of a multitable join index. A single-table join index might be useful for commonly made joins having low predicate selectivity but high join selectivity, for example.

This strategy substitutes the join index for the underlying base table and defines a primary index that ensures that rows containing only the columns to be joined are hashed to the same AMPs, eliminating the need to redistribute rows when the database manager joins the tables.
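
The following is a minimal sketch of this strategy, not a recommended design. The tables customer and orders, their columns, and the index name orders_cust_ji are hypothetical names introduced only for illustration. The join index carries the join columns of orders and is hashed on the foreign key o_custkey, which is also the primary index of customer.

     -- Hypothetical base tables: customer is hashed on c_custkey,
     -- orders is hashed on o_orderkey.
     CREATE TABLE customer (
      c_custkey INTEGER NOT NULL,
      c_name    VARCHAR(32))
     UNIQUE PRIMARY INDEX (c_custkey);

     CREATE TABLE orders (
      o_orderkey   INTEGER NOT NULL,
      o_custkey    INTEGER,
      o_totalprice DECIMAL(12,2))
     UNIQUE PRIMARY INDEX (o_orderkey);

     -- Single-table join index: a subset of the orders columns,
     -- rehashed on the foreign key o_custkey.
     CREATE JOIN INDEX orders_cust_ji AS
      SELECT o_custkey, o_orderkey, o_totalprice
      FROM orders
     PRIMARY INDEX (o_custkey);

     -- The request names only the base tables. EXPLAIN shows whether
     -- the Optimizer chose orders_cust_ji in place of orders.
     EXPLAIN
     SELECT c_name, o_orderkey, o_totalprice
     FROM customer, orders
     WHERE c_custkey = o_custkey;

Because the request never names the join index, the EXPLAIN report is the way to confirm whether the Optimizer judged the index to be the less costly path for this join.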

As another example, suppose you have a primary index defined on a major entity column that joins with many foreign key subentity columns. The cost of the maintenance required to update a multitable join index defined on this table is many times greater than the cost of maintaining the underlying base table.

The Optimizer can use unique single-table join indexes to access base table rows.
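
As a hedged illustration, a unique single-table join index might be defined as follows. The table employee, its columns, and the index name emp_badge_uji are hypothetical, and the sketch assumes that badge_no is unique within the table.

     -- Hypothetical base table, hashed on emp_id.
     CREATE TABLE employee (
      emp_id   INTEGER NOT NULL,
      badge_no INTEGER NOT NULL,
      emp_name VARCHAR(32))
     UNIQUE PRIMARY INDEX (emp_id);

     -- Unique single-table join index on badge_no. ROWID is carried
     -- so that the matching base table row can be located.
     CREATE JOIN INDEX emp_badge_uji AS
      SELECT badge_no, ROWID
      FROM employee
     UNIQUE PRIMARY INDEX (badge_no);

     -- An equality condition on badge_no is the kind of request for
     -- which the Optimizer might consider emp_badge_uji.
     SELECT emp_id, emp_name
     FROM employee
     WHERE badge_no = 1001;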

When you have a table with a large number of columns that is queried frequently, but only on a small subset of those columns, you can create either a hash index or a single-table join index to effectively partition the table vertically. Partitioning the rows of a table, as Teradata Database does to distribute rows to the AMPs, is often called horizontal partitioning. This is not what a single-table join index or hash index does. Instead, those indexes effectively partition tables on their columns, a method referred to as vertical partitioning.

For example, for a table with 1,500 columns, only 25 of which are frequently queried, you could create a hash or single-table join index on those 25 frequently queried columns, which has the same effect as vertically partitioning the base table itself into two sets of columns: one set of 25 frequently queried columns and another set of 1,475 infrequently queried columns.

Note that neither horizontal partitioning nor vertical partitioning is related in any way to how Teradata Database partitions the rows of a table having a partitioned primary index on an AMP, and that is why the terms horizontal partitioning and vertical partitioning are generally avoided in this document.

With a hash or single-table join index available that contains all of the frequently queried columns from the base table (and in the case of a join index, either the ROWID key word, the unique primary index of the base table, or a USI from the base table), the Optimizer can use that index to cover queries on that column subset, and then join to the base table to pick up any additional columns from the table that a query might specify in its select list.
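
A small sketch of this covering pattern follows. The table acct, its columns, and the index name acct_hot_ji are hypothetical, and the table is shown with only a few columns to stand in for a much wider one.

     -- Hypothetical wide base table (most columns omitted).
     CREATE TABLE acct (
      acct_id     INTEGER NOT NULL,
      acct_status CHAR(1),
      balance     DECIMAL(12,2),
      notes       VARCHAR(2000))
     UNIQUE PRIMARY INDEX (acct_id);

     -- Join index on the frequently queried columns only. ROWID lets
     -- the Optimizer join back to acct for any other column.
     CREATE JOIN INDEX acct_hot_ji AS
      SELECT acct_id, acct_status, balance, ROWID
      FROM acct
     PRIMARY INDEX (acct_id);

     -- Fully covered by the index columns.
     SELECT acct_id, balance
     FROM acct
     WHERE acct_status = 'O';

     -- Partially covered: notes is not in the index, so a join back
     -- to the base table would be needed to pick it up.
     SELECT acct_id, balance, notes
     FROM acct
     WHERE acct_status = 'O';

Whether the Optimizer actually uses the index for either request depends on its cost estimates, so the EXPLAIN report for each request is worth checking.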

You can also use single-table join indexes as a mechanism to collect statistics on complex expressions that are defined in their select list. The Optimizer can then either use mapping to exploit a matched expression that it finds in a non-matching predicate by mapping to the join index column statistics, or it can use matching when it detects identical predicates in both the join index definition and in a query made against the base table on which the join index is defined. For more information, see Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142.
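
As a sketch of the statistics use case, the following defines an expression in the select list of a single-table join index and collects statistics on it. The table sales, its columns, and the index name sales_yr_ji are hypothetical.

     -- Hypothetical base table.
     CREATE TABLE sales (
      sale_id   INTEGER NOT NULL,
      sale_date DATE,
      sale_amt  DECIMAL(12,2))
     PRIMARY INDEX (sale_id);

     -- The expression is named sale_yr in the join index.
     CREATE JOIN INDEX sales_yr_ji AS
      SELECT sale_id, EXTRACT(YEAR FROM sale_date) AS sale_yr
      FROM sales
     PRIMARY INDEX (sale_id);

     -- Statistics are collected on the join index column that holds
     -- the expression values.
     COLLECT STATISTICS COLUMN (sale_yr) ON sales_yr_ji;

     -- A base table predicate on the same expression is a candidate
     -- for estimation from the sale_yr statistics.
     SELECT COUNT(*)
     FROM sales
     WHERE EXTRACT(YEAR FROM sale_date) = 2020;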

Column-Partitioned Single-Table Join Indexes

You can create column-partitioned single-table join indexes with the following restrictions.

The index cannot compute aggregates.
The index cannot be row-compressed.
The index cannot have value ordering (but it can be row-partitioned).

Column-partitioned join indexes are designed to support requests that very selectively access a small, varying subset of the columns and rows, whether those columns are referenced in predicates or projected in the select list.

The Optimizer can also use column-partitioned join indexes to support direct access to a column-partitioned table using a RowID join.

See “CREATE JOIN INDEX” in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 for more information about column partitioning and single-table join indexes.

Maintenance Costs of Single-Table Join Indexes

For a single-table join index, the maintenance cost is roughly double the cost of maintaining the base table.

When you design a schema, there are often some tables that are queried in such a way that for some frequently run workloads, the table is joined on one column, but for another important query, the table is joined on another column. The usual design solution is to distribute the rows of this table on the column that is most frequently used in a join. If there is more than one column, then a join index might be a good design choice. A join index can be used to redistribute the table on the secondary join attribute so that joins can be done without a redistribution step.

Join indexes can also be used to evaluate parameterized queries. For the Optimizer to use a join index in this situation, the query must also contain a non-parameterized condition in its WHERE clause that the join index covers.

For example, suppose you create the following base table and single-table join index:

     CREATE TABLE tp1 (
      pid     INTEGER,
      name    VARCHAR(32),
      address VARCHAR(32),
      zipcode INTEGER);

     CREATE JOIN INDEX tp1_ji AS
      SELECT pid, name, zipcode
      FROM tp1
      WHERE zipcode > 50000
      AND   zipcode < 55000;

Parameterized Queries and Single-Table Join Indexes

The following parameterized query can use this join index. The Optimizer knows that all matching rows are contained in the index because the WHERE clause is a conjunction of a non-parameterized condition on zipcode, which the join index covers, and a parameterized condition on name (:N).

     USING (N VARCHAR(32))
     SELECT pid, name
     FROM tp1
     WHERE zipcode IN (54455, 53066)
     AND   name = :N;

The explanation for this query looks like the following report:

   Explanation
   ---------------------------------------------------------------------------
     1) First, we lock CURT.TP1_JI for read on a
        reserved RowHash to prevent global deadlock.
     2) Next, we do an all-AMPs RETRIEVE step from CURT.TP1_JI by way of an
        all-rows scan with a condition of ("((CURT.TP1_JI.zipcode = 53066 )
        OR (CURT.TP1_JI.zipcode = 54455 )) AND (CURT.TP1_JI.name = :N)")
        into Spool 1 (group_amps), which is built locally on the AMPs.
        The size of Spool 1 is estimated with no confidence to be 1 row (
        64 bytes).  The estimated time for this step is 0.03 seconds.
     3) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
        statement 1. The total estimated time is 0.03 seconds.
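
For contrast, the following variation is a sketch of a request that, by the rule stated above, would not be expected to qualify for tp1_ji. The parameter name Z is hypothetical. The only condition on zipcode is parameterized, so the request contains no non-parameterized condition that the join index covers, and the value of :Z is not known to fall between 50000 and 55000.

     -- No non-parameterized condition: the Optimizer cannot tell
     -- whether the requested zipcode lies within the index range.
     USING (Z INTEGER)
     SELECT pid, name
     FROM tp1
     WHERE zipcode = :Z;

Running EXPLAIN on both requests is a simple way to confirm which of them the Optimizer resolves from the join index on a given system.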

Only prototyping can determine which is the better design for a given set of tables, applications, and hardware configuration.

Related Strategies

Other functionally similar strategies for solving this problem can also be used. In general, only prototyping can determine which among the possible choices is best for a particular application environment and hardware configuration.

The following list describes some of the alternative strategies to creating single-table join indexes:

You can create a hash index.
For some applications, a hash index is a better choice than a single-table join index, if only because of its simpler syntax; however, it might be unclear what defaults Teradata Database used to create the index. In nearly all cases you can, and should, create single-table join indexes that have the same effect on query workloads as the equivalent hash index. Also, multivalue compression can be carried over to a join index but not to a hash index.

See Hash Indexes for more information.

The design technique of assigning a NUPI to a subentity table that hashes related rows to the same AMPs as a related major entity is superficially similar to a single-table join index. The differences are as follows.
- Cardinalities
  The cardinalities of tables for which a single-table join index is defined are typically very similar to those of the base tables they are designed to be joined with, while those for major entity-subentity joins are typically very different, with the major entity typically having many more rows than the subentity.
  
  The entity PI-subentity NUPI strategy is typically used when the subentity is a relatively small table in terms of its degree as well as its cardinality.
  
  The single-table join index strategy is typically used when only a small subset of the columns from the base table from which the single-table join index is derived are frequently joined with the base table in question.
- Specialization
  When you create a single-table join index, the parent base table from which it is derived might have a different primary index, in which case its rows hash to different AMPs. The single-table join index is a denormalized, specialized database object defined for a specific purpose, while the parent base table is a normalized, more general database object. Both tables in an entity-subentity relationship remain normalized and generalized database objects.

You can create a multitable join index that prejoins the entity attributes most likely to be joined in a query.
Updating a multitable join index can have a varying cost depending on which table in the multitable join is updated, the indexes defined on the join index and its base tables, and so on. In some cases, the update costs about the same as for a single-table join index; in others, it can be very expensive if an expensive join is required to perform the maintenance.

The upside of a standard multitable join index strategy is that, at least for the queries for which it is designed, Teradata Database does not have to perform any join processing because the required rows are already prejoined. The single-table join index can avoid a costly redistribution of table rows, but join processing is still required to respond to the query.

You can create a denormalized prejoin base table.
Denormalization reduces the generality of the database for ad hoc queries and data mining operations as well as introducing various problematic update anomalies. Nevertheless, a relatively mild degree of denormalization is standard in physically implemented databases, and for some applications might be the only high-performing solution.
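
To make the first and third alternatives in this list concrete, the following sketch shows a hash index, a roughly comparable single-table join index, and a multitable join index. The tables customer and orders, their columns, and all index names are hypothetical, and the three definitions are shown side by side only for comparison.

     -- Hash index alternative: orders columns hashed on o_custkey.
     CREATE HASH INDEX orders_hi (o_custkey, o_totalprice)
      ON orders
      BY (o_custkey)
      ORDER BY HASH (o_custkey);

     -- Roughly comparable single-table join index. ROWID is listed
     -- explicitly so that the base table row can be located.
     CREATE JOIN INDEX orders_ji AS
      SELECT o_custkey, o_totalprice, ROWID
      FROM orders
     PRIMARY INDEX (o_custkey);

     -- Multitable join index alternative: the join itself is stored,
     -- so a change to either base table requires index maintenance.
     CREATE JOIN INDEX cust_ord_ji AS
      SELECT c_custkey, c_name, o_orderkey, o_totalprice
      FROM customer, orders
      WHERE c_custkey = o_custkey
     PRIMARY INDEX (c_custkey);

Writing the join index form spells out choices, such as the primary index, that the hash index syntax leaves to defaults.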

See Single-Table Join Index for an example of using a single-table join index.