Row- and Column-Partitioned Join Indexes | CREATE JOIN INDEX | Teradata Vantage - Guidelines for Row-Partitioned PI Join Indexes - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
jpx1556733107962.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1184
lifecycle
previous
Product Category
Teradata Vantage™

You can create a row-partitioned primary index for a noncompressed join index, with some restrictions, such as the join index cannot include the ORDER BY option. A row-partitioned PI join index can improve query performance by leveraging join index and row-partitioned PI benefits. The following two scenarios indicate some possibilities in which an noncompressed join index with a row-partitioned PI might be beneficial.

Scenario A

Assume a star schema with the following tables in the physical data model:

Dimensional Model Table Type Corresponding Physical Table and Column Sets
Fact sales
  • sale_date
  • store_id
  • prod_id
  • amount
Dimension
  • calendar

    dayofmth

    mth

  • yr
PRIMARY INDEX(yr, mth)
  • product

    prod_id

  • prod_category
  • org

    store_id

    area

  • division
  • business_unit

You might want to create a single-level row-partitioned aggregate join index with daily summary data to answer some ad hoc queries for this database schema such as the following:

CREATE JOIN INDEX sales_summary AS
  SELECT sale_date, prod_category, division,
         SUM(amount) AS daily_sales
  FROM calendar AS c, product AS p, org AS o, sales AS s
  WHERE c.dayofmth = s.sale_date
  AND   p.prod_id = s.prod_id
  AND   o.store_id = s.store_id
  GROUP BY sale_date, prod_category, division
PRIMARY INDEX(sale_date, prod_category, division)
PARTITION BY RANGE_N(sale_date BETWEEN DATE '1990-01-01'
                               AND     DATE '2005-12-31'
                               EACH INTERVAL '1' MONTH);

A wide range of queries can make use of this join index, especially when there exists a foreign key-to-primary key relationship between the fact table and the dimension tables that enables the join index to be used broadly to cover queries on a subset of dimension tables. See Query Coverage by Join Indexes, Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query, and Rules for Whether Join Indexes With Extra Tables Cover Queries. For information about broad join indexes, see Teradata Vantage™ - Database Design, B035-1094 and Teradata Vantage™ - SQL Request and Transaction Processing, B035-1142.

A significant performance gain is achieved with this join index when it is used to answer the queries that have equality or range constraints on the sale_date column.

Optimizations for row-partitioned base tables, such as row partition elimination, are applied to a join index in the same way as they are applied to a base table.

For example, the execution of the following query only needs to access 12 out of 192 partitions, saving up to 93.75 percent on disk reads, with proportional elapsed time reductions compared with a corresponding nonpartitioned join index, which must perform a full-table scan for the same query:

SELECT prod_category, SUM(amount) AS daily_sales
FROM calendar AS c, product AS p, sales AS s
WHERE c.dayofmth = s.sale_date
AND   p.prod_id = s.prod_id
AND   sale_date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31'
GROUP BY prod_category;

An EXPLAIN of the SELECT statement includes SUM step to aggregate from 12 partitions of SALES_SUMMARY.

Scenario B

With the same schema used in Scenario A, you create a single-level row-partitioned PI single-table join index on the fact table that includes the foreign key of one or more dimension tables so it can be used to join to the corresponding dimension table set and roll up to higher aggregate levels. The following is an example of a single-table join index:

CREATE JOIN INDEX sji AS
  SELECT sale_date, prod_id, SUM(amount) AS sales_amount
  FROM Sales
  GROUP BY sale_date, prod_id
PRIMARY INDEX(sale_date, prod_id)
PARTITION BY RANGE_N(sale_date BETWEEN DATE '1990-01-01'
                               AND     DATE '2005-12-31'
                               EACH INTERVAL '1' MONTH);

For the total sales amount at the month level, the Optimizer can join this join index to the calendar table and roll up:

SELECT yr, mth, SUM(amount)
FROM sales, Calendar
WHERE sale_date=dayofmth
AND sale_date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31'
GROUP BY yr, mth;

An EXPLAIN of the SELECT statement includes a RETRIEVE step from 12 partitions of the join index, sji. Row partition elimination is applied to sji, reducing the number of row partitions accessed to 12 out of 192 row partitions, making the join much faster than it would be with a nonpartitioned single-table join index, which requires a full-table scan. The database can apply other row-partition-related join optimizations, such as dynamic row partition elimination and rowkey-based merge join, to row-partitioned join indexes just as they are applied to row-partitioned base tables.

While a join index can greatly improve the query response time, you should consider the overhead of maintaining it when inserts, deletions, and updates are made to the base table on which the index is defined. A row-partitioned join index might experience significant improvement in its maintenance performance due to row-partition-related optimizations. For example, for the join indexes defined in scenarios A and B, the database can apply row partition elimination when there is an equality or range constraint on the row partitioning column in the data maintenance statement.

For example:

DELETE sales
        WHERE sale_date BETWEEN DATE '1999-01-01'
                        AND     DATE '1999-12-31';
An EXPLAIN of the DELETE statement includes steps for:
  • DELETE from 12 partitions of SALES_SUMMARY with a condition of ("(SALES_SUMMARY.sale_date ≥ DATE '1999-01-01') AND (SALES_SUMMARY.sale_date ≤ DATE '1999-12-31')")
  • DELETE from 12 partitions of SJI with a condition of ("(SJI.sale_date ≥ DATE '1999-01-01') AND (SJI.sale_date ≤ DATE '1999-12-31')")

Partitioning on the DATE column in the join index helps insert performance if the transaction data is inserted into sales according to a time sequence. Note that Teradata MultiLoad and Teradata FastLoad are not supported for base tables with join indexes, so other batch loading methods, such as Teradata Parallel Data Pump or Teradata FastLoad into a staging table followed by either an INSERT … SELECT or MERGE batch request with error logging must be used to load data into sales.

Because the inserted rows are clustered in the data blocks corresponding to the appropriate partitions, the number of data blocks the system must read and write is reduced compared with the nonpartitioned join index case where the inserted rows scatter among all the data blocks.

You cannot drop a join index to enable Teradata MultiLoad or Teradata FastLoad batch loads until any requests that access that index complete processing. Requests place READ locks on any join indexes they access, and the database defers processing of any DROP JOIN INDEX requests against locked indexes until their READ locks have all been released.

The maintenance of a row-partitioned join index may be less efficient than an otherwise identical nonpartitioned join index. For example, if a row-partitioned join index has a primary index that does not include the partitioning column set, and a DELETE or UPDATE statement specifies a constraint on the primary index, the maintenance of the row-partitioned PI join index must probe all partitions within the AMP as opposed to the fast primary index access for a nonpartitioned join index.