This example shows how a single-table join index can be used as a substitute for a standard join index to minimize update maintenance while at the same time making join processing more high-performing than it would otherwise be.
Table Definitions
Suppose you have the following tables that you query frequently using join expressions, and both are very large.
Table Name | Primary Index | Primary Index Type |
---|---|---|
LineItem | OrderKey | NUPI |
Part | PartKey | UPI |
The table definitions are as follows.
CREATE TABLE LineItem ( l_OrderKey INTEGER NOT NULL, l_PartKey INTEGER NOT NULL, 1_SupplierKey INTEGER, l_LineNumber INTEGER, l_Quantity INTEGER NOT NULL, l_ExtendedPrice DECIMAL(13,2) NOT NULL, l_Discount DECIMAL(13,2), l_Tax DECIMAL(13,2), l_ReturnFlag CHARACTER(1), l_LineStatus CHARACTER(1), l_ShipDate DATE FORMAT ‘yyyy-mm-dd’, l_CommitDate DATE FORMAT ‘yyyy-mm-dd’, l_ReceiptDate DATE FORMAT ‘yyyy-mm-dd’, l_ShipInstruct VARCHAR(25), l_ShipMode VARCHAR(10), l_Comment VARCHAR(44)) PRIMARY INDEX (l_OrderKey); CREATE TABLE part ( p_PartKey INTEGER NOT NULL, p_PartDescription CHARACTER(26), p_SupplierNumber INTEGER) UNIQUE PRIMARY INDEX (p_PartKey);
Example Query Request
A frequently performed query on these tables might be the following:
SELECT l_PartKey, p_PartDescription, l_Quantity, l_SupplierKey FROM LineItem, Part WHERE l_PartKey=p_PartKey;
Decision: Ordinary Join Index Versus Single-Table Join Index
You could create an ordinary join index on the LineItem and Part tables, but there is a high cost to keeping this join index updated because each update requires a costly minijoin operation, so the ordinary join index would have to be updated each time either a line item or a new part was inserted, deleted, or updated in the respective primary base tables.
A better solution might be to create a single-table join index on the columns of LineItem that need to be joined frequently with the Part table and then make the primary index for the join index l_PartKey. Single-table join indexes are not cost-free, but the cost of performing the single row updates for a single-table index is far less expensive than the minijoins required by a multitable join index.
Single-Table Join Index Definition
The definition for the join index might look something like this.
CREATE JOIN INDEX PartKeyLineItem AS SELECT l_PartKey, l_Quantity, l_SupplierKey FROM LineItem PRIMARY INDEX (l_PartKey);
The intent of defining this join index is to permit the Optimizer to select it in place of the base table LineItem in cases like the equality condition l_PartKey = p_PartKey, eliminating the need to redistribute the LineItem table (because its proxy, the join index table PartKeyLineItem, has the same primary index as that of the Part table, so the rows are stored on the same AMP). This avoids the large redistribution of LineItem, but not the join processing.
Not only can the Optimizer use single-table join indexes for rewriting queries, it can also use statistics collected on complex expressions in the index definition to better estimate single-table cardinalities. See Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142 for more information about using hash and single-table join indexes to estimate single-table cardinalities.
General Procedure for Defining a Single-Table Join Index
- Define a column_1_name for each column_name in the primary base table to be included in the single-table join index.
- To enhance join selectivity, define the primary index on a different column set than the primary base table, or define column partitioning.
- If the physical database design warrants, use:
- CREATE INDEX to create one or more NUSIs on the join index
- A WHERE clause to define a sparse join index
- A unique primary index for the join index