Sparse Join Indexes | Database Design | Teradata Vantage - Sparse Join Indexes - Advanced SQL Engine

Sparse Join Indexes | Database Design | Teradata Vantage - Sparse Join Indexes - Advanced SQL Engine - Teradata Database

Database Design

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

kko1591750222108.ditamap

dita:ditavalPath

kko1591750222108.ditaval

dita:id

B035-1094

lifecycle

Product Category

Teradata Vantage™

Any join index, whether simple or aggregate, multitable or single-table, can be sparse. A sparse join index specifies a constant expression in the WHERE clause of its definition to narrowly filter its row population. For example, the following DDL creates an aggregate join index containing only the sales records from 2000:

CREATE JOIN INDEX j1 AS
SELECT store_id, dept_id, SUM(sales_dollars) AS sum_sd
FROM sales
WHERE EXTRACT(year FROM sales_date) = 2000
GROUP BY store_id, dept_id;

This method limits the rows included in the join index to a subset of the rows in the table based on an SQL request result.

When base tables are large, you can use this feature to reduce the content of the join index to only the portion of the table that is frequently used if the typical query only references a portion of the rows.

It is important to collect statistics on the sparse-defining column of the join index, or the Optimizer may not use the join index.

Sparse Join Indexes and Query Optimization

When a user query is entered, the Optimizer determines if accessing j1 gives the correct answer and is more efficient than accessing the base tables. This sparse join index would be selected by the Optimizer only for queries that restricted themselves to data from the year 2000. For example, a query might require data from June of 2000. Because the join index j1 contains all of the data for year 2000, it might be used to satisfy the following query:

SELECT store_id, dept_id, SUM(sales_dollars) AS sum_sd
FROM sales
WHERE sales_date >= 6/1/2000
AND   sales_date < 7/1/2000
GROUP BY storeid, dept_id;

As another example, the following DDL creates a join index containing only those customers living in four western states of the US:

CREATE JOIN INDEX westcust AS
SELECT cust.id, cust.address, donations.amount
FROM cust, donations
WHERE cust.cust_id = donations.cust_id
AND   donations.d_date > '2003/01/01'
 AND   cust.state IN ('CA', 'OR', 'WA','NV');

The Optimizer can use this join index for the following query written to find all donors from California with donations of 1,000 USD or more made since January 1, 2003.

SELECT custid, cust.address
FROM cust, donations
WHERE donations.amount > 1000
AND   cust.state = 'CA'
AND   donations.d_date > '2003/01/01';

Because the customers and donations considered by the query are part of the subset included in the join index, the Optimizer uses it to answer the query. This can save a great deal of time especially in situations where the base tables are very large, but queries typically look only at subsets of the tables.

Maintenance of sparse join indexes can be much faster than maintenance for other join indexes because the sparse types have fewer values, and so are generally updated much less frequently.

See Sparse Join Indexes and Tactical Queries for information about design issues specific to sparse join index support for tactical queries.

Performance Impact of Sparse Join Indexes

A sparse index can focus on the portions of the tables that are most frequently used to do the following things.

Make the costs for maintaining an index proportional to the percent of rows actually referenced in the index.
Make request access faster where the resulting sparse index is smaller than a standard join index.
Reduce the storage requirements for a join index where possible.