Sparse Join Indexes and Tactical Queries

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:id
B035-1094
Product Category
Teradata Vantage™

Join indexes are particularly useful for tactical query applications. Join and Hash Indexes discusses join indexes more fully, but one approach is worth highlighting here for its relevance to tactical queries: the sparse join index.

Sparse join indexes include only a subset of the rows of a base table in their definition, using a WHERE clause to determine which base table rows are retained and which are not (see Sparse Join Indexes). Depending on the degree of sparseness, sparse join indexes are quicker to build, faster to scan, and take up less disk space than join indexes that carry every base table row. Like all join indexes, sparse join indexes support single-AMP access when a query specifies the primary index value of the join index.
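As a minimal sketch of the WHERE clause mechanism described above, the following statements define a hypothetical base table and a sparse join index over it. The table name order_history, its columns, the index name, and the date range are assumptions made for illustration and do not come from this document.

```sql
-- Hypothetical base table; names and data types are illustrative only.
CREATE TABLE order_history (
  order_id    INTEGER NOT NULL,
  customer_id INTEGER NOT NULL,
  order_date  DATE NOT NULL,
  order_total DECIMAL(12,2)
)
PRIMARY INDEX (order_id);

-- Sparse join index: the WHERE clause retains only orders dated in 2020,
-- so the index carries fewer rows than one built over the whole table.
CREATE JOIN INDEX order_history_2020_ji AS
  SELECT customer_id, order_date, order_total
  FROM order_history
  WHERE order_date BETWEEN DATE '2020-01-01' AND DATE '2020-12-31'
PRIMARY INDEX (customer_id);
```

A query that specifies a customer_id value together with a date condition inside the 2020 range, and that references only the columns carried in the index, is a candidate for single-AMP access through the join index. Whether the Optimizer actually selects the index can be checked with EXPLAIN.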

Sparse Join Index Defined on One Row Partition

A partitioned primary index can be defined on a join index as long as the index is not row compressed. You can also define a sparse join index on only one row partition of a row-partitioned base table by specifying sparseness-defining criteria that match the boundaries of that row partition.

Building sparse join indexes that support single-AMP access on individual row partitions of a row-partitioned table is frequently useful when tactical queries always specify both the sparse-defining column (for example, a date range that matches one row partition) and the primary index value of the sparse join index (for example, a store identifier). The tactical queries must also access a similar subset of columns from the base table: the ones carried in the sparse join index.
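The date range and store identifier case can be sketched as follows. This is an illustration under stated assumptions, not a definition taken from this document: the table sales, its columns, the monthly partitioning scheme, the index name, and the literal values are all hypothetical.

```sql
-- Hypothetical row-partitioned base table: one row partition per month of 2020.
CREATE TABLE sales (
  store_id  INTEGER NOT NULL,
  item_id   INTEGER NOT NULL,
  sale_date DATE NOT NULL,
  sale_amt  DECIMAL(10,2)
)
PRIMARY INDEX (item_id)
PARTITION BY RANGE_N(sale_date BETWEEN DATE '2020-01-01'
                                   AND DATE '2020-12-31'
                               EACH INTERVAL '1' MONTH);

-- Sparse join index whose WHERE clause matches the boundaries of the
-- June 2020 row partition. Its primary index is the store identifier.
CREATE JOIN INDEX sales_2020_06_ji AS
  SELECT store_id, sale_date, item_id, sale_amt
  FROM sales
  WHERE sale_date BETWEEN DATE '2020-06-01' AND DATE '2020-06-30'
PRIMARY INDEX (store_id);

-- Tactical query that supplies both the sparse-defining date range and the
-- join index primary index value, and touches only columns in the index.
EXPLAIN
SELECT item_id, sale_amt
FROM sales
WHERE store_id = 1234
  AND sale_date BETWEEN DATE '2020-06-01' AND DATE '2020-06-30';
```

The EXPLAIN modifier is included so that the plan can be inspected to confirm whether the Optimizer chooses the join index and a single-AMP retrieve for this query.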

A different sparse join index can be built independently on each of several row partitions of the same row-partitioned table. As long as each query specifies a constraint that matches the sparse-defining criteria of one of those sparse join indexes, the Optimizer can choose the appropriate one to use for the query.
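A sketch of the several-partitions case, under the assumption of a hypothetical table sales(store_id, item_id, sale_date, sale_amt) that is row partitioned by month on sale_date. The index names and literal values are likewise illustrative.

```sql
-- Assumes a hypothetical base table sales that is row partitioned
-- by month on sale_date; all names here are illustrative.

-- One sparse join index per row partition of interest.
CREATE JOIN INDEX sales_2020_07_ji AS
  SELECT store_id, sale_date, item_id, sale_amt
  FROM sales
  WHERE sale_date BETWEEN DATE '2020-07-01' AND DATE '2020-07-31'
PRIMARY INDEX (store_id);

CREATE JOIN INDEX sales_2020_08_ji AS
  SELECT store_id, sale_date, item_id, sale_amt
  FROM sales
  WHERE sale_date BETWEEN DATE '2020-08-01' AND DATE '2020-08-31'
PRIMARY INDEX (store_id);

-- The date constraint falls inside the July index only, so that index
-- is the one the Optimizer can consider for this query.
EXPLAIN
SELECT item_id, sale_amt
FROM sales
WHERE store_id = 1234
  AND sale_date = DATE '2020-07-15';
```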

The appropriate primary index for a sparse join index depends on what values the tactical queries specify when they are submitted.

Considerations For Using Sparse Join Indexes With Dense NUPIs

Selecting a primary index that has thousands of rows per value for a sparse join index, with each AMP owning some percentage of those values, provides several benefits and carries few of the negatives associated with a high number of duplicate primary index values.

For example:
  • During join index creation there is no duplicate row checking as there is with a base table, so one of the principal reasons to avoid such high numbers of duplicates on a primary index does not apply to the case of creating a join index.
  • The join itself can be more efficient with a higher number of NUPI duplicates because when so many rows carry the same NUPI row-hash value, the physical I/O involved in storing them can be less.
  • While balanced processing is always important when selecting a primary index for a base table, the dense NUPI approach is appropriate for join indexes when it enables fast query execution and replaces an all-AMP alternative that would process only a few rows from each AMP.

    On the other hand, it is not desirable to overload one AMP unduly, whether the access is single- or all-AMP. If an inordinate number of data blocks would have to be processed by one AMP using the dense NUPI approach, then parallelizing the work across all AMPs by selecting an alternative primary index is probably a better choice for enhancing performance.

  • Designing a sparse join index so that the number of distinct values in its primary index is greater than the number of AMPs in the system is a good strategy to protect against too many queries being concentrated on too few AMPs. However, if the queries are very short and infrequent, that concern is less important.
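The considerations above can be checked before creating the index. The following queries are a sketch that assumes a hypothetical base table sales with a candidate join index primary index column store_id and a sparse-defining range on sale_date; none of these names come from this document.

```sql
-- Number of AMPs in the system. HASHAMP() with no argument returns the
-- highest AMP number, which is one less than the AMP count.
SELECT HASHAMP() + 1 AS amp_count;

-- Distinct values of the candidate primary index within the sparse subset,
-- and the average number of rows per value.
SELECT COUNT(DISTINCT store_id) AS distinct_pi_values,
       COUNT(*) / NULLIF(COUNT(DISTINCT store_id), 0) AS avg_rows_per_value
FROM sales
WHERE sale_date BETWEEN DATE '2020-06-01' AND DATE '2020-06-30';

-- Rows each AMP would own if the sparse subset were distributed on store_id.
-- A few AMPs with very large counts suggests the overload risk noted above.
SELECT HASHAMP(HASHBUCKET(HASHROW(store_id))) AS amp_no,
       COUNT(*) AS row_count
FROM sales
WHERE sale_date BETWEEN DATE '2020-06-01' AND DATE '2020-06-30'
GROUP BY 1
ORDER BY 2 DESC;
```

Comparing distinct_pi_values with amp_count addresses the last point in the list, and the per-AMP row counts give a rough view of how unevenly a dense NUPI would load individual AMPs.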