Any join index, whether simple or aggregate, multitable or single-table, can be sparse. A sparse join index specifies a constant expression in the WHERE clause of its definition to narrowly filter its row population. For example, the following DDL creates an aggregate join index containing only the sales records from 2000:
CREATE JOIN INDEX j1 AS SELECT store_id, dept_id, SUM(sales_dollars) AS sum_sd FROM sales WHERE EXTRACT(year FROM sales_date) = 2000 GROUP BY store_id, dept_id;
This method limits the rows included in the join index to a subset of the rows in the table based on an SQL request result.
When base tables are large, you can use this feature to reduce the content of the join index to only the portion of the table that is frequently used if the typical query only references a portion of the rows.
It is important to collect statistics on the sparse-defining column of the join index, or the Optimizer may not use the join index.
Sparse Join Indexes and Query Optimization
When a user query is entered, the Optimizer determines if accessing j1 gives the correct answer and is more efficient than accessing the base tables. This sparse join index would be selected by the Optimizer only for queries that restricted themselves to data from the year 2000. For example, a query might require data from June of 2000. Because the join index j1 contains all of the data for year 2000, it might be used to satisfy the following query:
SELECT store_id, dept_id, SUM(sales_dollars) AS sum_sd FROM sales WHERE sales_date >= 6/1/2000 AND sales_date < 7/1/2000 GROUP BY storeid, dept_id;
As another example, the following DDL creates a join index containing only those customers living in four western states of the US:
CREATE JOIN INDEX westcust AS SELECT cust.id, cust.address, donations.amount FROM cust, donations WHERE cust.cust_id = donations.cust_id AND donations.d_date > '2003/01/01' AND cust.state IN ('CA', 'OR', 'WA','NV');
The Optimizer can use this join index for the following query written to find all donors from California with donations of 1,000 USD or more made since January 1, 2003.
SELECT custid, cust.address FROM cust, donations WHERE donations.amount > 1000 AND cust.state = 'CA' AND donations.d_date > '2003/01/01';
Because the customers and donations considered by the query are part of the subset included in the join index, the Optimizer uses it to answer the query. This can save a great deal of time especially in situations where the base tables are very large, but queries typically look only at subsets of the tables.
Maintenance of sparse join indexes can be much faster than maintenance for other join indexes because the sparse types have fewer values, and so are generally updated much less frequently.
See Sparse Join Indexes and Tactical Queries for information about design issues specific to sparse join index support for tactical queries.
Performance Impact of Sparse Join Indexes
- Make the costs for maintaining an index proportional to the percent of rows actually referenced in the index.
- Make request access faster where the resulting sparse index is smaller than a standard join index.
- Reduce the storage requirements for a join index where possible.