16.10 - Row Partition Elimination - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

About Row Partition Elimination

Row partition elimination is a family of methods that Teradata Database uses to limit the number of row partitions that must be scanned to return a query result set for a row-partitioned table or join index. It does this by skipping row partitions that do not contain rows that meet the search conditions of a query. Row partition elimination is an automatic optimization in which the Optimizer determines, based on query conditions and a partitioning expression, that some row partitions for that partitioning expression cannot contain qualifying rows; therefore, those row partitions can be skipped during a file scan. Row partitions that are skipped for a particular query are called eliminated row partitions.

Teradata Database supports the following types of row partition elimination:

  • Static
  • Delayed
  • Dynamic

Teradata Database also supports various forms of column partition elimination for column-partitioned tables and join indexes. See Column Partition Elimination for more information about column partition elimination.

Mixing Row Partition Elimination Methods Within a Request

Row partition elimination methods can be mixed within the same query and can also be mixed with column partition elimination methods when a table or join index has both column and row partitioning. For example, static row partition elimination can be used for some partitioning levels, while dynamic row partition elimination can be used for other levels, and some levels might not require any row partition elimination. Some individual partition levels might even benefit from a mix of multiple forms of partition elimination.

When there are multiple row partitioning expressions, Teradata Database combines row partition elimination at each of the levels to further reduce the number of data subsets that need to be scanned. For most applications, the greatest benefit of row partitioning is obtained from partition elimination.

Teradata Database can eliminate partitions from search consideration at any number of levels or at a combination of levels.

The full cylinder read optimization is not supported when there is row partition elimination for either single-level or multilevel partitioning. Therefore, there are trade-offs to consider in using partitioning. The Optimizer does use row partition elimination when possible, and does not cost a full-table scan with full cylinder reads against the cost of reading a subset of row partitions using block reads. In most cases, it is reasonable to assume that row partition elimination provides a greater benefit than a full-table scan with full cylinder reads.

Static Row Partition Elimination

When query conditions are such that they allow row partition elimination to be specified by the Optimizer during the early stages of query optimization, the form of row partition elimination used is referred to as static row partition elimination.

Any single-table constraints on partitioning columns can be used for static row partition elimination, including those on the system-derived column PARTITION or any of the members of the system-derived PARTITION#L n column set, where the value of n ranges from 1 through 62, inclusive.

See Static Row Partition Elimination for more information about this method of partition elimination.

Delayed Row Partition Elimination

When query conditions are based on a comparison derived in part from USING request modifier variables or from the result of a built-in function, it is not possible for the Optimizer to reuse a cached query plan as it would otherwise do because a cached plan needs to be general enough to handle changes in search condition values in subsequent executions.

In this case, the Optimizer applies row partition elimination at a later point in the optimization process, at the time it builds the finalized query plan from a cached plan using the values for this specific execution of the plan. This form of partition elimination is referred to as delayed row partition elimination.

See Delayed Row Partition Elimination for more information about this method of row partition elimination.

Dynamic Row Partition Elimination

When query conditions reference values in other tables that allow for row partition elimination to be done as a request is executing, the row partition elimination is referred to as dynamic.

When query conditions reference values in other tables that would allow row partition elimination as the query is executing, row partition elimination is performed dynamically by the AMP database software after a query has already been optimized and while it is executing. This form of partition elimination is referred to as dynamic row partition elimination.

Teradata Database can use a special form of dynamic row partition elimination known as dynamic row partition elimination with range conditions when it joins a given left table partition with a sequential range of row partitions from the right table when that range of row partitions is derived from a range join condition. This form of dynamic row partition elimination is only supported for single-level partitioned tables that use RANGE_N partitioning.

Dynamic row partition elimination can also be used to simplify and enhance join performance by selecting the least costly method from a set of join methods especially designed to be used with row partition elimination.

See Product Joins With Dynamic Row Partition Elimination for more information about this method of partition elimination.

Character Partition Elimination

Eliminating character partitions is not a separate form of row partition elimination, but it has some unique characteristics.

Character partitioning typically defines a small number of row partitions compared to non-character row partitioning. The effective limit for character partitioning is roughly 2,000 partitions because of a 16,000 character limit on partitioning CHECK constraints (see Database Design or “CREATE TABLE” in SQL Data Definition Language Detailed Topics for more information about partitioning CHECK constraints and their various forms). As a result, the most effective way to specify character partitioning for very large tables is often as part of a multilevel partitioning expression.

Combining partition elimination on multiple partitioning levels can then reduce the ratio of combined partitions that must be read from the rough limit of 1/2000 for a character-partitioned table up to the maximum number of combined partitions that can be defined.

For example, assuming a 2-byte internal partition number, the following table defines 65,475 partitions, which is close to the maximum number of 65,535 combined partitions that can be defined for 2-byte partitioning. In this case, 27 row partitions are defined at level 1, 485 row partitions at level 2, and 5 row partitions at level 3.

     CREATE TABLE markets (
       productname     VARCHAR(50) NOT CASESPECIFIC,
       region          BYTEINT NOT NULL,
       activity_date   DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       revenue_code    BYTEINT NOT NULL,
       business_sector BYTEINT NOT NULL,
       note            VARCHAR(256))
     PRIMARY INDEX (productname, region)
     PARTITION BY (RANGE_N(productname BETWEEN  'A','B','C','D','E','F',
                                                'G','H','I','J','K','L',
                                                'M','N','O','P','Q','R',
                                                'S','T','U','V','W','X',
                                                'Y','Z' AND
                   'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ',
                   NO RANGE OR UNKNOWN),
                   RANGE_N(activity_date BETWEEN DATE '1998-10-01'
                                         AND     DATE '2007-12-31'
                                         EACH INTERVAL '7' DAY,
                   NO RANGE, UNKNOWN),
                   RANGE_N(revenue_code  BETWEEN 1
                                         AND     4
                                         EACH 1,
                   NO RANGE OR UNKNOWN ));

Assuming an even distribution of rows to combined partitions, the following SELECT request accesses only one combined partition, or 1/65,475 of the rows in markets.

     SELECT *
     FROM markets
     WHERE productname BETWEEN 'a' AND 'az'
                       AND     activity_date=DATE '2007-08-15'
                       AND     revenue_code=1;

Teradata Database uses row partition elimination on a character partitioning level together with partition elimination on other partitioning levels to access a smaller subset of the data than can be done using a character-partitioned table alone. This is because the maximum number of row partitions a character-partitioned table can define is limited to roughly 2,000.

Static and Delayed Row Partition Elimination for Row Partitioning That Specify BEGIN and END Period Bound Functions

Only static and delayed row partition elimination are supported for Period expressions that specify BEGIN and END Period bound functions.

The rules for row partition elimination of DateTime expressions apply in the same way for partitioning expressions based on the following functions that incorporate BEGIN and END Period bound functions:

  • Direct partitioning by BEGIN and END Period bound functions.
  • Partitioning by CASE_N functions using BEGIN and END Period bound functions.
  • Partitioning by CASE functions using BEGIN and END Period bound functions.
  • Partitioning by RANGE_N functions using BEGIN and END Period bound functions.

Teradata Database performs single-partition scanning only when the table accessed defines its partitioning based on BEGIN or END Period bound functions and when you specify equality constraints on BEGIN or END Period bound functions in a WHERE clause.

The rules for performing single-partition scans are as follows:

IF a table is partitioned on this Period bound function… AND the WHERE clause specifies an equality constraint on … THEN Teradata Database does …
BEGIN a BEGIN Period bound function single-partition access.
an END Period bound function not do single-partition access.
both a BEGIN Period bound and an END Period bound function single-partition access.
END a BEGIN Period bound function not do single-partition access.
an END Period bound function single-partition access.
both a BEGIN and an END Period bound functions single-partition access.
BEGIN and END a BEGIN Period bound function not do single-partition access.
an END Period bound function not do single-partition access.
both a BEGIN and an END Period bound functions single-partition access.

Assume the following table definitions for the example set that illustrates these rules:

     CREATE SET TABLE t11 (
       a INTEGER,
       b PERIOD(DATE))
     PRIMARY INDEX(a)
     PARTITION BY CAST((BEGIN(b)) AS INTEGER;

     CREATE SET TABLE t12 (
       a INTEGER,
       b PERIOD(DATE))
     PRIMARY INDEX(a)
     PARTITION BY CAST((END(b)) AS INTEGER);

In the examples that follow, the relevant EXPLAIN text phrases are highlighted in boldface type.

The following SELECT request scans all the row partitions because the WHERE clause predicate and the partitioning expression are defined on different bounds:

     EXPLAIN SELECT *
             FROM t11
             WHERE END(b)=DATE ‘2010-02-03’;
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for read on a
   RowHash to prevent global deadlock for df2.t11.
2) Next, we lock df2.t11 for read.
3) We do an all-AMPs RETRIEVE step from df2.t11 by way of  an 
    all-rows scan  with a condition of  ("(END(df2.t11.b ))= DATE 
    '2010-02-03'")  into Spool 1 (group_amps), which is built locally
   on the AMPs. The size of Spool 1 is estimated with no confidence
   to be 1 row (56 bytes). The estimated time for this step is 0.03
   seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

The following SELECT request performs a single-partition scan because the WHERE clause predicate is defined on an END Period bound function and the partitioning expression is defined using an equality condition on an END Period bound function.

     EXPLAIN SELECT *
             FROM t12
             WHERE END(b)=DATE ‘2011-02-03’;
Explanation
----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for read on a
   RowHash to prevent global deadlock for df2.t12.
2) Next, we lock df2.t12 for read.
3) We do an all-AMPs RETRIEVE step from  a single partition  of
   df2.t12 with a condition of ("df2.t12.b = PERIOD(DATE '2011-02-03'
    - INTERVAL '1' DAY, DATE   '2011-02-03')")  with a residual condition 
   of ( "(END(df2.t12.b ))= DATE '2011-02-03'")  into Spool 1
   (group_amps), which is built locally on the AMPs. The size of
   Spool 1 is estimated with no confidence to be 1 row (56 bytes).
   The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

The following SELECT request performs a single-partition scan because the WHERE clause predicate is defined on an END Period bound function and the partitioning expression is defined using an equality condition on an END Period bound function:

     EXPLAIN SELECT *
             FROM t12
             WHERE a=1
             AND   END(b)=DATE ‘2011-02-03’;
Explanation
-----------------------------------------------------------------------
1) First, we do a single-AMP RETRIEVE step from  a single partition  of
   df2.t12 by way of the primary index "df2.t12.a = 1,
   df2.t12.b = PERIOD (DATE '2011-02-03'- INTERVAL '1' DAY, DATE 
   '2011-02-03')" with a residual condition of ("((END(df2.t12.b ))= 
   DATE '2011-02-03') AND (df2.t12.a = 1)") into Spool 1 (one-amp),
   which is built locally on that AMP. The size of Spool 1 is estimated
   with low confidence to be 1 row (56 bytes). The estimated time for
   this step is 0.02 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.02 seconds.

When a table is row partitioned directly using BEGIN and END Period bound functions resulting into a numeric value, and whenever the conditions in a request specify partitioning columns of that table that are specified in the partitioning expression, then Teradata Database does row partition elimination using the same rules for BEGIN and END Period bound functions as apply for other DateTime expressions.

Assume the following table definition:

     CREATE SET TABLE t1 (
       a INTEGER
       b PERIOD(DATE)
     PRIMARY INDEX(a)
     PARTITION BY CAST((END(b) AS INTEGER);

The following SELECT request scans 55,333 row partitions of t1:

     EXPLAIN SELECT *
             FROM t1
             WHERE b>PERIOD(DATE ‘1901-02-02);
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for read on a
   RowHash to prevent global deadlock for df2.T1.
2) Next, we lock df2.T1 for read.
3) We do an all-AMPs RETRIEVE step from  55333 partitions of 
   df2.T1 with a condition of ("df2.T1.b > (PERIOD (DATE '1901-02-02',
   DATE '1901-02-02'))") into Spool 1 (group_amps), which is built 
   locally on the AMPs. The size of Spool 1 is estimated with no
   confidence to be 1 row (56 bytes). The estimated time for this
   step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

When a table is row partitioned using a CASE_N or CASE function that specifies BEGIN or END Period bound functions, the existing rules for row partition elimination with DateTime expressions also apply.

Assume the following table definition:

     CREATE TABLE orders (
       o_orderkey      INTEGER NOT NULL,
       o_custkey       INTEGER,
       o_orderperiod   PERIOD (DATE) NOT NULL,
       o_orderpriority CHARACTER (21),
       o_comment       VARCHAR (79))
     PRIMARY INDEX(o_orderkey)
     PARTITION BY CASE_N(END(o_orderperiod) <= DATE '2010-03-31', /*Q1*/
                         END(o_orderperiod) <= DATE '2010-06-30', /*Q2*/
                         END(o_orderperiod) <= DATE '2010-09-30', /*Q3*/
                         END(o_orderperiod) <= DATE '2010-12-31'  /*Q4*/
                        );

The following SELECT request scans 2 row partitions of orders and displays the details of the orders placed for the first 2 quarters:

     EXPLAIN SELECT *
             FROM orders
             WHERE END(o_orderperiod) > DATE '2010-06-30';
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for read on
   a RowHash to prevent global deadlock for df2.orders.
2) Next, we lock df2.orders for read.
3) We do an all-AMPs RETRIEVE step from  2 partitions of 
   df2.orders with a condition of (
   "(END (df2.orders.O_orderperiod))> DATE '2010-06-30'")
   into Spool 1 (group_amps), which is built locally on the AMPs.
   The size of Spool 1 is estimated with no confidence to be 1 row (
   167 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

When a table is row-partitioned using a RANGE_N function that specifies BEGIN or END Period bound functions, and whenever the conditions in a request specify a column that is specified in the partitioning expression, Teradata Database does row partition elimination using the same rules that apply for DateTime expressions.

Assume you define a sales_history table with the following definition:

     CREATE TABLE sales_history (
       product_code       CHARACTER(8),
       quantity_sold      INTEGER,
       transaction_period PERIOD(DATE))
     PRIMARY INDEX (product_code)
     PARTITION BY RANGE_N(END (transaction_period)
                          BETWEEN DATE '2006-01-01'
                          AND     DATE '2015-12-31'
                          EACH INTERVAL'1' YEAR);

The following SELECT request scans 5 row partitions of sales_history before 2010:

     EXPLAIN SELECT *
     FROM sales_history
     WHERE transaction_period < PERIOD(DATE '2010-01-01');
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for read on a
   RowHash to prevent global deadlock for df2.sales_history.
2) Next, we lock df2.sales_history for read.
3) We do an all-AMPs RETRIEVE step from  5 partitions of 
   df2.sales_history with a condition of (
   "df2.sales_history.transaction_period < (PERIOD (DATE '2010-01-01',
   DATE '2010-01-01'))") into Spool 1 (group_amps), which is built 
   locally on the AMPs. The size of Spool 1 is estimated with no
   confidence to be 1 row(64 bytes). The estimated time for this step
   is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.

When the RANGE_N partitioning expression specifies BEGIN or END bound functions and the query conditions contain the same partitioning expression, then the Optimizer uses row partition elimination.

The following SELECT request scans 4 row partitions of the sales_history table to display all the sales history before 2010:

     EXPLAIN SELECT *
             FROM sales_history
             WHERE END (transaction_period) < DATE '2010-01-01';
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct df2."pseudo table" for read on a
   RowHash to prevent global deadlock for df2.sales_history.
2) Next, we lock df2.sales_history for read.
3) We do an all-AMPs RETRIEVE step from  4 partitions of 
   df2.sales_history with a condition of (
   "(END(df2.sales_history.transaction_period ))< DATE
   '2010-01-01'") into Spool 1 (group_amps), which is built locally
   on the AMPs. The size of Spool 1 is estimated with no confidence
   to be 1 row (64 bytes). The estimated time for this step is 0.03
   seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.03 seconds.