You can create a row-partitioned primary index for a noncompressed join index, with some restrictions, such as the join index cannot include the ORDER BY option. A row-partitioned PI join index can improve query performance by leveraging join index and row-partitioned PI benefits. The following two scenarios indicate some possibilities in which an noncompressed join index with a row-partitioned PI might be beneficial.
Assume a star schema with the following tables in the physical data model:
|Dimensional Model Table Type||Corresponding Physical Table and Column Sets|
|PRIMARY INDEX(yr, mth)|
You might want to create a single-level row-partitioned aggregate join index with daily summary data to answer some ad hoc queries for this database schema such as the following:
CREATE JOIN INDEX sales_summary AS SELECT sale_date, prod_category, division, SUM(amount) AS daily_sales FROM calendar AS c, product AS p, org AS o, sales AS s WHERE c.dayofmth = s.sale_date AND p.prod_id = s.prod_id AND o.store_id = s.store_id GROUP BY sale_date, prod_category, division PRIMARY INDEX(sale_date, prod_category, division) PARTITION BY RANGE_N(sale_date BETWEEN DATE '1990-01-01' AND DATE '2005-12-31' EACH INTERVAL '1' MONTH);
A wide range of queries can make use of this join index, especially when there exists a foreign key-to-primary key relationship between the fact table and the dimension tables that enables the join index to be used broadly to cover queries on a subset of dimension tables. See Query Coverage by Join Indexes, Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query, and Rules for Whether Join Indexes With Extra Tables Cover Queries. For information about broad join indexes, see Teradata Vantage™ - Database Design, B035-1094 and Teradata Vantage™ SQL Request and Transaction Processing, B035-1142.
A significant performance gain is achieved with this join index when it is used to answer the queries that have equality or range constraints on the sale_date column.
Optimizations for row-partitioned base tables, such as row partition elimination, are applied to a join index in the same way as they are applied to a base table.
For example, the execution of the following query only needs to access 12 out of 192 partitions, saving up to 93.75 percent on disk reads, with proportional elapsed time reductions compared with a corresponding nonpartitioned join index, which must perform a full-table scan for the same query:
SELECT prod_category, SUM(amount) AS daily_sales FROM calendar AS c, product AS p, sales AS s WHERE c.dayofmth = s.sale_date AND p.prod_id = s.prod_id AND sale_date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31' GROUP BY prod_category;
An EXPLAIN of the SELECT statement includes SUM step to aggregate from 12 partitions of SALES_SUMMARY.
With the same schema used in Scenario A, you create a single-level row-partitioned PI single-table join index on the fact table that includes the foreign key of one or more dimension tables so it can be used to join to the corresponding dimension table set and roll up to higher aggregate levels. The following is an example of a single-table join index:
CREATE JOIN INDEX sji AS SELECT sale_date, prod_id, SUM(amount) AS sales_amount FROM Sales GROUP BY sale_date, prod_id PRIMARY INDEX(sale_date, prod_id) PARTITION BY RANGE_N(sale_date BETWEEN DATE '1990-01-01' AND DATE '2005-12-31' EACH INTERVAL '1' MONTH);
For the total sales amount at the month level, the Optimizer can join this join index to the calendar table and roll up:
SELECT yr, mth, SUM(amount) FROM sales, Calendar WHERE sale_date=dayofmth AND sale_date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31' GROUP BY yr, mth;
An EXPLAIN of the SELECT statement includes a RETRIEVE step from 12 partitions of the join index, sji. Row partition elimination is applied to sji, reducing the number of row partitions accessed to 12 out of 192 row partitions, making the join much faster than it would be with a nonpartitioned single-table join index, which requires a full-table scan. Teradata Database can apply other row-partition-related join optimizations, such as dynamic row partition elimination and rowkey-based merge join, to row-partitioned join indexes just as they are applied to row-partitioned base tables.
While a join index can greatly improve the query response time, you should consider the overhead of maintaining it when inserts, deletions, and updates are made to the base table on which the index is defined. A row-partitioned join index might experience significant improvement in its maintenance performance due to row-partition-related optimizations. For example, for the join indexes defined in scenarios A and B, Teradata Database can apply row partition elimination when there is an equality or range constraint on the row partitioning column in the data maintenance statement.
DELETE sales WHERE sale_date BETWEEN DATE '1999-01-01' AND DATE '1999-12-31';
- DELETE from 12 partitions of SALES_SUMMARY with a condition of ("(SALES_SUMMARY.sale_date ≥ DATE '1999-01-01') AND (SALES_SUMMARY.sale_date ≤ DATE '1999-12-31')")
- DELETE from 12 partitions of SJI with a condition of ("(SJI.sale_date ≥ DATE '1999-01-01') AND (SJI.sale_date ≤ DATE '1999-12-31')")
Partitioning on the DATE column in the join index helps insert performance if the transaction data is inserted into sales according to a time sequence. Note that MultiLoad and FastLoad are not supported for base tables with join indexes, so other batch loading methods, such as Teradata Parallel Data Pump or FastLoad into a staging table followed by either an INSERT … SELECT or MERGE batch request with error logging must be used to load data into sales.
Because the inserted rows are clustered in the data blocks corresponding to the appropriate partitions, the number of data blocks the system must read and write is reduced compared with the nonpartitioned join index case where the inserted rows scatter among all the data blocks.
You cannot drop a join index to enable MultiLoad or FastLoad batch loads until any requests that access that index complete processing. Requests place READ locks on any join indexes they access, and Teradata Database defers processing of any DROP JOIN INDEX requests against locked indexes until their READ locks have all been released.
The maintenance of a row-partitioned join index may be less efficient than an otherwise identical nonpartitioned join index. For example, if a row-partitioned join index has a primary index that does not include the partitioning column set, and a DELETE or UPDATE statement specifies a constraint on the primary index, the maintenance of the row-partitioned PI join index must probe all partitions within the AMP as opposed to the fast primary index access for a nonpartitioned join index.