15.10 - Example: An SLPPI Join Index and Performance - Teradata Database

Teradata Database SQL Data Definition Language Syntax and Examples

prodname
Teradata Database
vrm_release
15.10
created_date
December 2015
category
Programming Reference
featnum
B035-1144-151K

The following example shows how an uncompressed PPI join index can help query performance.

Assume a star schema with the following fact and dimension tables defined in the physical data model:

  • A fact table named sales with columns sale_date , store_id , prod_id , and amount .
  • The following three dimension tables.
    • calendar , with columns dayofmth , mth , and yr and a primary index defined on yr and mth .
    • product , with columns prod_id and prod_category .
    • org , with columns store_id , area , division , and business_unit .

You might want to create an aggregate join index with daily summary data like the following join index named sales_summary to answer a set of ad hoc queries:

     CREATE JOIN INDEX sales_summary AS
       SELECT sale_date, prod_category, division, 
              SUM(amount) AS daily_sales
       FROM calendar AS c, product AS p, org AS o, sales AS s
       WHERE c.dayofmth = s.sale_date
       AND   p.prod_id = s.prod_id
       AND   o.store_id = s.store_id
       AND   s.sale_date BETWEEN DATE '1991-01-01' AND DATE '2006-12-31'
       GROUP BY sale_date, prod_category, division
     PRIMARY INDEX(sale_date, prod_category, division)
     PARTITION BY RANGE_N(sale_date BETWEEN DATE '1991-01-01' 
                                    AND     DATE '2006-12-31' 
                                    EACH INTERVAL '1' MONTH);

A wide range of queries can make use of this join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the join index to be used as a broad join index to cover queries over a subset of dimension tables.

Significant performance gains can be achieved with this join index when the Optimizer uses it to answer the queries that have equality or range constraints on the sale_date column. The system applies optimizations such as static row partition elimination and rowkey-based merge join to a PPI join index in the same way it does to a PPI base table.

For example, the following query needs to access only 12 of the 192 total partitions, which saves as much as 93.75 percent on disk reads, and proportional savings on elapsed time compared with an otherwise identical nonpartitioned primary index join index, which requires a full-table scan.

     SELECT prod_category, SUM(amount) AS daily_sales
     FROM calendar c, product p, sales s
     WHERE c.dayofmth = s.sale_date
     AND   p.prod_id = s.prod_id 
     AND   sale_date BETWEEN DATE '2006-01-01' AND DATE '2006-12-31'
     GROUP BY prod_category;

While a join index often significantly improves query response times, you must also consider the overhead of maintaining the index when inserts, deletes, updates, and merges occur to the base table on which it is defined.

An uncompressed PPI join index might see significant improvement in maintenance performance because of PPI-related optimizations. For example, partitioning on the DATE column in the join index also helps insert performance if the transaction data are inserted into the sales table according to a time sequence.

The Teradata Parallel Transporter UPDATE operator, MultiLoad, and FastLoad cannot be used to load rows into base tables with join indexes. Because of this limitation, you must use either of the following workarounds:

  • Use the Teradata Parallel Data Pump utility to load the rows into sales
  • FastLoad the rows into a staging table and then use either an INSERT … SELECT or MERGE request with error logging to load the rows into sales .

Because the inserted rows are clustered in the data blocks corresponding to the appropriate partitions, the number of data blocks the system must read and write is reduced compared to an nonpartitioned primary index join index, where the inserted rows scatter among all the data blocks.

On the other hand, it is possible that the maintenance of a PPI join index makes the index less efficient than it would be with an nonpartitioned primary index. For example, if the primary index of the join index does not include the partitioning columns, and a DELETE or UPDATE statement specifies a constraint on the primary index, the system must probe all partitions on the AMP to maintain the PPI join index as opposed to the fast primary index access for the join index when defined with an nonpartitioned primary index.