16.10 - Sparse Join Indexes - Teradata Database

Teradata Database Design

Product
Teradata Database
Release Number
16.10
Release Date
June 2017
Content Type
User Guide
Publication ID
B035-1094-161K
Language
English (United States)

Any join index, whether simple or aggregate, multitable or single-table, can be sparse. A sparse join index specifies a constant expression in the WHERE clause of its definition to narrowly filter its row population. For example, the following DDL creates an aggregate join index containing only the sales records from 2000:

     CREATE JOIN INDEX j1 AS
     SELECT store_id, dept_id, SUM(sales_dollars) AS sum_sd
     FROM sales
     WHERE EXTRACT(year FROM sales_date) = 2000
     GROUP BY store_id, dept_id;

This method limits the rows included in the join index to a subset of the rows in the table based on an SQL request result.

When base tables are large, you can use this feature to reduce the content of the join index to only the portion of the table that is frequently used if the typical query only references a portion of the rows.

It is important to collect statistics on the sparse-defining column of the join index, or the Optimizer may not use the join index.

Sparse Join Indexes and Query Optimization

When a user query is entered, the Optimizer determines if accessing j1 gives the correct answer and is more efficient than accessing the base tables. This sparse join index would be selected by the Optimizer only for queries that restricted themselves to data from the year 2000. For example, a query might require data from June of 2000. Because the join index j1 contains all of the data for year 2000, it might be used to satisfy the following query:

     SELECT store_id, dept_id, SUM(sales_dollars) AS sum_sd
     FROM sales
     WHERE sales_date >= 6/1/2000
     AND   sales_date < 7/1/2000
     GROUP BY storeid, dept_id;

As another example, the following DDL creates a join index containing only those customers living in four western states of the US:

     CREATE JOIN INDEX westcust AS
     SELECT cust.id, cust.address, donations.amount
     FROM cust, donations
     WHERE cust.cust_id = donations.cust_id
     AND   donations.d_date > '2003/01/01'
     AND   cust.state IN ('CA', 'OR', 'WA','NV');

The Optimizer can use this join index for the following query written to find all donors from California with donations of 1,000 USD or more made since January 1, 2003.

     SELECT custid, cust.address
     FROM cust, donations
     WHERE donations.amount > 1000
     AND   cust.state = 'CA'
     AND   donations.d_date > '2003/01/01';

Because the customers and donations considered by the query are part of the subset included in the join index, the Optimizer uses it to answer the query. This can save a great deal of time especially in situations where the base tables are very large, but queries typically look only at subsets of the tables.

Maintenance of sparse join indexes can be much faster than maintenance for other join indexes because the sparse types have fewer values, and so are generally updated much less frequently.

See Sparse Join Indexes and Tactical Queries for information about design issues specific to sparse join index support for tactical queries.

Performance Impact of Sparse Join Indexes

A sparse index can focus on the portions of the tables that are most frequently used to do the following things.

  • Make the costs for maintaining an index proportional to the percent of rows actually referenced in the index.
  • Make request access faster where the resulting sparse index is smaller than a standard join index.
  • Reduce the storage requirements for a join index where possible.