15.00 - Effects of Dynamic Row Partition Elimination on Joins - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

Teradata Database
Programming Reference

Effects of Dynamic Row Partition Elimination on Joins

The following guidelines apply to maximizing dynamic row partition elimination for your join queries (dynamic row partition elimination is not supported with merge join methods for multilevel partitioned tables. See Database Design and SQL Request and Transaction Processing) against a partitioned table.

  • Dynamic row partition elimination improves performance when a partitioned table and another table are joined by an equality condition on the partitioning column set of the partitioned table.
  • Note that this does not work with an inclusion product join, which the Optimizer specifies to work with IN (subquery) join conditions.

  • Only those partitions that are required to answer a request are involved in the join.
  • When performing join steps, the system determines the qualifying partitions of the partitioned table dynamically based on the values of rows from the other table.

    Instead of a product join against all the rows in the partitioned table, the Optimizer assigns a product join only for a row of the other table against a single partition instead of the entire table.

    If there are 100 partitions in the partitioned table and only 5 are needed to answer a join request, the other 95 are not joined, and the system realizes a 95% resource saving for that join operation.

  • Always collect statistics on the join columns.
  • The following table definitions, EXPLAIN request, and EXPLAIN report are used to demonstrate how dynamic row partition elimination can facilitate join operations where one of the tables in the join is a PPI table.

         CREATE SET TABLE DB.sales (    -- Defines total day sales
           prodid   INTEGER,            -- for each product
           saledate DATE FORMAT 'YYYY-MM-DD',
           amount   DECIMAL(10,2))
         PRIMARY INDEX (prodid, saledate)
         PARTITION BY RANGE_N(saledate BETWEEN DATE '2004-01-01'
                                       AND     DATE '2004-12-31'
                                       EACH INTERVAL '1' MONTH );
         CREATE SET TABLE DB.fiscal_month (    -- Defines the days in
           yr       SMALLINT NOT NULL,         -- in each fiscal month.
           mth      SMALLINT NOT NULL,         -- A fiscal month may
           dayofmth DATE NOT NULL)             -- partially span up
         PRIMARY INDEX (yr, mth);              -- to 2 calendar months.

    For example, fiscal month 9 in 2004 is from August 30 to September 26, or 28 days.

    Assume a 2 AMP system and 10 product sales per day.

         EXPLAIN SELECT yr, mth, SUM(amount)
                 FROM DB.sales, DB.fiscal_month
                 WHERE saledate = dayofmth 
                 AND   yr = 2004 
                 AND   mth = 9
                 GROUP BY yr, mth;
         1) First, we lock a distinct DB."pseudo table" for read on a 
            RowHash to prevent global deadlock for DB.sales.
         2) Next, we lock DB.sales for read.
         3) We do a single-AMP RETRIEVE step from DB.fiscal_month by 
            way of the primary index “DB.fiscal_month.yr = 2004,
            DB.fiscal_month.mth = 9" with no residual conditions into
            Spool 4 (all_amps), which is duplicated on all AMPs. Then
            we do a SORT to partition by rowkey. The size of Spool 4
            is estimated with high confidence to be 56 rows. The
            estimated time for this step is 0.03 seconds.
         4) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
            all-rows scan, which is joined to all partitions of DB.sales.
            Spool 4 and DB.sales are joined using a product join, with a
            join condition of (“DB.sales.saledate = dayofmth") enhanced by
            dynamic partition elimination. The result goes into Spool 3
            (all_amps), which is built locally on the AMPs. The size of
            Spool 3 is estimated with no confidence to be 280 rows. The
            estimated time for this step is 0.04 seconds.
         5) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use)
            by way of an all-rows scan, and the grouping identifier in field
            1. Aggregate Intermediate Results are computed globally, then
            placed in Spool 5. The size of Spool 5 is estimated with low
            confidence to be 1 row. The estimated time for this step is
            0.05 seconds.
         6) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way
            of an all-rows scan into Spool 1 (group_amps), which is built
            locally on the AMPs. The size of Spool 1 is estimated with low
            confidence to be 1 row. The estimated time for this step is
            0.04 seconds.
         7) Finally, we send out an END TRANSACTION step to all AMPs
            involved in processing the request.
            -> The contents of Spool 1 are sent back to the user as the result
            of statement 1.

    The relevant EXPLAIN report text is highlighted in bold typeface.