Example: SLPPI Join Index and Performance - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

The following example shows how an uncompressed PPI join index can help query performance.

Assume a star schema with the following fact and dimension tables defined in the physical data model:

  • A fact table named sales with columns sale_date , store_id , prod_id , and amount .
  • The following three dimension tables.
    • Calendar , with columns dayofmth , mth , and yr and a primary index defined on yr and mth .
    • Product , with columns prod_id and prod_category .
    • Org , with columns store_id , area , division , and business_unit .

You may want to create an aggregate join index with daily summary data like the following join index named sales_summary to answer a set of ad hoc queries:

     CREATE JOIN INDEX sales_summary AS
       SELECT sale_date, prod_category, division, 
              SUM(amount) AS daily_sales
       FROM calendar AS c, product AS p, org AS o, sales AS s
       WHERE c.dayofmth = s.sale_date
       AND   p.prod_id = s.prod_id
       AND   o.store_id = s.store_id
       AND   s.sale_date BETWEEN DATE '1991-01-01' AND DATE '2006-12-31'
       GROUP BY sale_date, prod_category, division
     PRIMARY INDEX(sale_date, prod_category, division)
     PARTITION BY RANGE_N(sale_date BETWEEN DATE '1991-01-01' 
                                    AND     DATE '2006-12-31' 
                                    EACH INTERVAL '1' MONTH);

A wide range of queries can use this join index, especially when there are foreign key-primary key relationships defined between the fact table and the dimension tables that enable the join index to be used as a broad join index to cover queries over a subset of dimension tables.

When the Optimizer uses this join index to answer the queries that have equality or range constraints on the sale_date column, performance can improve significantly. The system applies optimizations such as static row partition elimination and rowkey-based merge join to a PPI join index as for a PPI base table.

For example, the following query must access only 12 of the 192 total partitions, which saves as much as 93.75 percent on disk reads, and proportional savings on elapsed time compared to an otherwise identical nonpartitioned primary index join index, which requires a full-table scan.

     SELECT prod_category, SUM(amount) AS daily_sales
     FROM calendar c, product p, sales s
     WHERE c.dayofmth = s.sale_date
     AND   p.prod_id = s.prod_id 
     AND   sale_date BETWEEN DATE '2006-01-01' AND DATE '2006-12-31'
     GROUP BY prod_category;

A join index often significantly improves query response times, but consider the overhead of maintaining the index when inserts, deletes, updates, and merges occur to its base table.

An uncompressed PPI join index may see significant improvement in maintenance performance because of PPI-related optimizations. For example, partitioning on the DATE column in the join index also helps insert performance if the transaction data are inserted into the sales table according to a time sequence.

The Teradata Parallel Transporter UPDATE operator, MultiLoad, and FastLoad cannot be used to load rows into base tables with join indexes. Therefore, you must use either of the following workarounds:

  • Use the Teradata Parallel Data Pump utility to load the rows into sales.
  • FastLoad the rows into a staging table and use an INSERT ... SELECT or MERGE request with error logging to load the rows into sales .

Because the inserted rows are clustered in the data blocks corresponding to the appropriate partitions, the number of data blocks the system must read and write is reduced compared to an nonpartitioned primary index join index, where the inserted rows scatter among all the data blocks.

However, maintenance of a PPI join index makes the index less efficient than a nonpartitioned primary index. For example, if the primary index of the join index does not include the partitioning columns, and a DELETE or UPDATE statement specifies a constraint on the primary index, the system must probe all partitions on the AMP to maintain the PPI join index as opposed to the fast primary index access for the join index when defined with an nonpartitioned primary index.