16.20 - Guidelines for Row-Partitioned PI Join Indexes - Teradata Database - Teradata Vantage NewSQL Engine

Teradata Vantage™ SQL Data Definition Language Detailed Topics

Product
Teradata Database
Teradata Vantage NewSQL Engine
Release Number
16.20
Release Date
March 2019
Content Type
Programming Reference
Publication ID
B035-1184-162K
Language
English (United States)

You can create a row-partitioned primary index for a noncompressed join index, with some restrictions, such as the join index cannot include the ORDER BY option. A row-partitioned PI join index can improve query performance by leveraging join index and row-partitioned PI benefits. The following two scenarios indicate some possibilities in which an noncompressed join index with a row-partitioned PI might be beneficial.

Scenario A

Assume a star schema with the following tables in the physical data model:

Dimensional Model Table Type Corresponding Physical Table and Column Sets
Fact sales
  • sale_date
  • store_id
  • prod_id
  • amount
Dimension
  • calendar

    dayofmth

    mth

  • yr
PRIMARY INDEX(yr, mth)
  • product

    prod_id

  • prod_category
  • org

    store_id

    area

  • division
  • business_unit

You might want to create a single-level row-partitioned aggregate join index with daily summary data to answer some ad hoc queries for this database schema such as the following:

     CREATE JOIN INDEX sales_summary AS
       SELECT sale_date, prod_category, division,
              SUM(amount) AS daily_sales
       FROM calendar AS c, product AS p, org AS o, sales AS s
       WHERE c.dayofmth = s.sale_date
       AND   p.prod_id = s.prod_id
       AND   o.store_id = s.store_id
       GROUP BY sale_date, prod_category, division
     PRIMARY INDEX(sale_date, prod_category, division)
     PARTITION BY RANGE_N(sale_date BETWEEN DATE '1990-01-01'
                                    AND     DATE '2005-12-31'
                                    EACH INTERVAL '1' MONTH);

A wide range of queries can make use of this join index, especially when there exists a foreign key-to-primary key relationship between the fact table and the dimension tables that enables the join index to be used broadly to cover queries on a subset of dimension tables. See Query Coverage by Join Indexes, Restriction on Coverage by Join Indexes When a Join Index Definition References More Tables Than a Query, and Rules for Whether Join Indexes With Extra Tables Cover Queries. For information about broad join indexes, see Teradata Vantage™ - Database Design, B035-1094 and Teradata Vantage™ SQL Request and Transaction Processing, B035-1142.

A significant performance gain is achieved with this join index when it is used to answer the queries that have equality or range constraints on the sale_date column.

Optimizations for row-partitioned base tables, such as row partition elimination, are applied to a join index in the same way as they are applied to a base table.

For example, the execution of the following query only needs to access 12 out of 192 partitions, saving up to 93.75 percent on disk reads, with proportional elapsed time reductions compared with a corresponding nonpartitioned join index, which must perform a full-table scan for the same query:

     SELECT prod_category, SUM(amount) AS daily_sales
     FROM calendar AS c, product AS p, sales AS s
     WHERE c.dayofmth = s.sale_date
     AND   p.prod_id = s.prod_id
     AND   sale_date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31'
     GROUP BY prod_category;

An EXPLAIN of the SELECT statement includes SUM step to aggregate from 12 partitions of SALES_SUMMARY.

Scenario B

With the same schema used in Scenario A, you create a single-level row-partitioned PI single-table join index on the fact table that includes the foreign key of one or more dimension tables so it can be used to join to the corresponding dimension table set and roll up to higher aggregate levels. The following is an example of a single-table join index:

     CREATE JOIN INDEX sji AS
       SELECT sale_date, prod_id, SUM(amount) AS sales_amount
       FROM Sales
       GROUP BY sale_date, prod_id
     PRIMARY INDEX(sale_date, prod_id)
     PARTITION BY RANGE_N(sale_date BETWEEN DATE '1990-01-01'
                                    AND     DATE '2005-12-31'
                                    EACH INTERVAL '1' MONTH);

For the total sales amount at the month level, the Optimizer can join this join index to the calendar table and roll up:

     SELECT yr, mth, SUM(amount)
     FROM sales, Calendar
     WHERE sale_date=dayofmth
     AND sale_date BETWEEN DATE '2005-01-01' AND DATE '2005-12-31'
     GROUP BY yr, mth;

An EXPLAIN of the SELECT statement includes a RETRIEVE step from 12 partitions of the join index, sji. Row partition elimination is applied to sji, reducing the number of row partitions accessed to 12 out of 192 row partitions, making the join much faster than it would be with a nonpartitioned single-table join index, which requires a full-table scan. Teradata Database can apply other row-partition-related join optimizations, such as dynamic row partition elimination and rowkey-based merge join, to row-partitioned join indexes just as they are applied to row-partitioned base tables.

While a join index can greatly improve the query response time, you should consider the overhead of maintaining it when inserts, deletions, and updates are made to the base table on which the index is defined. A row-partitioned join index might experience significant improvement in its maintenance performance due to row-partition-related optimizations. For example, for the join indexes defined in scenarios A and B, Teradata Database can apply row partition elimination when there is an equality or range constraint on the row partitioning column in the data maintenance statement.

For example:

     DELETE sales
             WHERE sale_date BETWEEN DATE '1999-01-01'
                             AND     DATE '1999-12-31';
An EXPLAIN of the DELETE statement includes steps for:
  • DELETE from 12 partitions of SALES_SUMMARY with a condition of ("(SALES_SUMMARY.sale_date ≥ DATE '1999-01-01') AND (SALES_SUMMARY.sale_date ≤ DATE '1999-12-31')")
  • DELETE from 12 partitions of SJI with a condition of ("(SJI.sale_date ≥ DATE '1999-01-01') AND (SJI.sale_date ≤ DATE '1999-12-31')")

Partitioning on the DATE column in the join index helps insert performance if the transaction data is inserted into sales according to a time sequence. Note that MultiLoad and FastLoad are not supported for base tables with join indexes, so other batch loading methods, such as Teradata Parallel Data Pump or FastLoad into a staging table followed by either an INSERT … SELECT or MERGE batch request with error logging must be used to load data into sales.

Because the inserted rows are clustered in the data blocks corresponding to the appropriate partitions, the number of data blocks the system must read and write is reduced compared with the nonpartitioned join index case where the inserted rows scatter among all the data blocks.

You cannot drop a join index to enable MultiLoad or FastLoad batch loads until any requests that access that index complete processing. Requests place READ locks on any join indexes they access, and Teradata Database defers processing of any DROP JOIN INDEX requests against locked indexes until their READ locks have all been released.

The maintenance of a row-partitioned join index may be less efficient than an otherwise identical nonpartitioned join index. For example, if a row-partitioned join index has a primary index that does not include the partitioning column set, and a DELETE or UPDATE statement specifies a constraint on the primary index, the maintenance of the row-partitioned PI join index must probe all partitions within the AMP as opposed to the fast primary index access for a nonpartitioned join index.