Row partition elimination is a family of methods that limit the number of row partitions that must be scanned to return a query result set for a row-partitioned table or join index. It does this by skipping row partitions that do not contain rows that meet the search conditions of a query. Row partition elimination is an automatic optimization in which the Optimizer determines, based on query conditions and a partitioning expression, that some row partitions for that partitioning expression cannot contain qualifying rows; therefore, those row partitions can be skipped during a file scan. Row partitions that are skipped for a particular query are called eliminated row partitions.
- Static
- Delayed
- Dynamic
Vantage also supports various forms of column partition elimination for column-partitioned tables and join indexes. See Column Partition Elimination for more information about column partition elimination.
Mixing Row Partition Elimination Methods within a Request
Row partition elimination methods can be mixed within the same query and can also be mixed with column partition elimination methods when a table or join index has both column and row partitioning. For example, static row partition elimination can be used for some partitioning levels, while dynamic row partition elimination can be used for other levels, and some levels might not have any row partition elimination. Some individual partition levels might even benefit from a mix of multiple forms of partition elimination.
When there are multiple row partitioning expressions, Vantage combines row partition elimination at each of the levels to further reduce the number of data subsets that need to be scanned. For most applications, the greatest benefit of row partitioning is obtained from partition elimination.
Vantage can eliminate partitions from search consideration at any number of levels or at a combination of levels.
The full cylinder read optimization is not supported when there is row partition elimination. Therefore, there are trade-offs to consider in using partitioning. The Optimizer does use row partition elimination when possible, and does not cost a full-table scan with full cylinder reads against the cost of reading a subset of row partitions using block reads. In most cases, it is reasonable to assume that row partition elimination provides a greater benefit than a full-table scan with full cylinder reads.
Static Row Partition Elimination
When query conditions are such that they allow row partition elimination to be determined by the Optimizer during the early stages of query optimization, the form of row partition elimination used is referred to as static row partition elimination.
Any single-table constraints on partitioning columns can be used for static row partition elimination, including those on the system-derived column PARTITION or any of the members of the system-derived PARTITION#L n column set, where the value of n ranges from 1 through 62, inclusive.
See Static Row Partition Elimination for more information about this method of partition elimination.
Delayed Row Partition Elimination
When query conditions are based on a comparison derived in part from USING request modifier variables or from the result of a built-in function, it is not possible for the Optimizer to reuse a cached query plan as it would otherwise do because a cached plan needs to be general enough to handle changes in search condition values in subsequent executions.
In this case, the Optimizer applies row partition elimination at a later point in the optimization process, at the time it builds the finalized query plan from a cached plan using the values for this specific execution of the plan. This form of partition elimination is referred to as delayed row partition elimination.
See Delayed Row Partition Elimination for more information about this method of row partition elimination.
Dynamic Row Partition Elimination
When query conditions reference values in other tables that allow for row partition elimination to be done as a request is executing, the row partition elimination is referred to as dynamic.
When query conditions reference values in other tables that would allow row partition elimination as the query is executing, row partition elimination is performed dynamically by the AMP database software after a query has already been optimized and while it is executing. This form of partition elimination is referred to as dynamic row partition elimination.
Vantage can use a special form of dynamic row partition elimination known as dynamic row partition elimination with range conditions when it joins a given left table partition with a sequential range of row partitions from the right table when that range of row partitions is derived from a range join condition. This form of dynamic row partition elimination is only supported for single-level partitioned tables that use RANGE_N partitioning.
Dynamic row partition elimination can also be used to simplify and enhance join performance by selecting the least costly method from a set of join methods especially designed to be used with row partition elimination.
See Product Joins With Dynamic Row Partition Elimination for more information about this method of partition elimination.
Character Partition Elimination
Eliminating character partitions is not a separate form of row partition elimination, but it has some unique characteristics.
Character partitioning typically defines a small number of row partitions compared to non-character row partitioning. The effective limit for character partitioning is roughly 2,000 partitions because of a 16,000 character limit on partitioning CHECK constraints (see Teradata Vantage™ - Database Design, B035-1094 or the information about CREATE TABLE in Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184 for more information about partitioning CHECK constraints and their various forms). As a result, the most effective way to specify character partitioning for very large tables is often as part of a multilevel partitioning expression.
Combining partition elimination on multiple partitioning levels can then reduce the ratio of combined partitions that must be read from the rough limit of 1/2000 for a character-partitioned table up to the maximum number of combined partitions that can be defined.
For example, assuming a 2-byte internal partition number, the following table defines 65,475 partitions, which is close to the maximum number of 65,535 combined partitions that can be defined for 2-byte partitioning. In this case, 27 row partitions are defined at level 1, 485 row partitions at level 2, and 5 row partitions at level 3.
CREATE TABLE markets ( productname VARCHAR(50) NOT CASESPECIFIC, region BYTEINT NOT NULL, activity_date DATE FORMAT 'yyyy-mm-dd' NOT NULL, revenue_code BYTEINT NOT NULL, business_sector BYTEINT NOT NULL, note VARCHAR(256)) PRIMARY INDEX (productname, region) PARTITION BY (RANGE_N(productname BETWEEN 'A','B','C','D','E','F', 'G','H','I','J','K','L', 'M','N','O','P','Q','R', 'S','T','U','V','W','X', 'Y','Z' AND 'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ', NO RANGE OR UNKNOWN), RANGE_N(activity_date BETWEEN DATE '1998-10-01' AND DATE '2007-12-31' EACH INTERVAL '7' DAY, NO RANGE, UNKNOWN), RANGE_N(revenue_code BETWEEN 1 AND 4 EACH 1, NO RANGE OR UNKNOWN ));
Assuming an even distribution of rows to combined partitions, the following SELECT request accesses only one combined partition, or 1/65,475 of the rows in markets.
SELECT * FROM markets WHERE productname BETWEEN 'a' AND 'az' AND activity_date=DATE '2007-08-15' AND revenue_code=1;
Vantage uses row partition elimination on a character partitioning level together with partition elimination on other partitioning levels to access a smaller subset of the data than can be done using a character-partitioned table alone. This is because the maximum number of row partitions a character-partitioned table can define is limited to roughly 2,000.
Static and Delayed Row Partition Elimination for Row Partitioning Specifying BEGIN and END Period Bound Functions
Only static and delayed row partition elimination are supported for Period expressions that specify BEGIN and END Period bound functions.
- Direct partitioning by BEGIN and END Period bound functions.
- Partitioning by CASE_N functions using BEGIN and END Period bound functions.
- Partitioning by CASE functions using BEGIN and END Period bound functions.
- Partitioning by RANGE_N functions using BEGIN and END Period bound functions.
The database performs single-partition scanning only when the table accessed defines its partitioning based on BEGIN or END Period bound functions and when you specify equality constraints on BEGIN or END Period bound functions in a WHERE clause.
The rules for performing single-partition scans are as follows:
IF a table is partitioned on this Period bound function… | AND the WHERE clause specifies an equality constraint on … | THEN the database does … |
---|---|---|
BEGIN | a BEGIN Period bound function | single-partition access. |
an END Period bound function | not do single-partition access. | |
both a BEGIN Period bound and an END Period bound function | single-partition access. | |
END | a BEGIN Period bound function | not do single-partition access. |
an END Period bound function | single-partition access. | |
both a BEGIN and an END Period bound functions | single-partition access. | |
BEGIN and END | a BEGIN Period bound function | not do single-partition access. |
an END Period bound function | not do single-partition access. | |
both a BEGIN and an END Period bound functions | single-partition access. |
Assume the following table definitions for the example set that illustrates these rules:
CREATE SET TABLE t11 ( a INTEGER, b PERIOD(DATE)) PRIMARY INDEX(a) PARTITION BY CAST((BEGIN(b)) AS INTEGER; CREATE SET TABLE t12 ( a INTEGER, b PERIOD(DATE)) PRIMARY INDEX(a) PARTITION BY CAST((END(b)) AS INTEGER);
In the examples that follow, the relevant EXPLAIN text phrases are highlighted in boldface type.
EXPLAIN SELECT * FROM t11 WHERE END(b)=DATE ‘2010-02-03’;The following shows a portion of the EXPLAIN output:
... 3) We do an all-AMPs RETRIEVE step from df2.t11 by way of an all-rows scan with a condition of ("(END(df2.t11.b ))= DATE '2010-02-03'") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (56 bytes). The estimated time for this step is 0.03 seconds.
The following SELECT request performs a single-partition scan because the WHERE clause predicate is defined on an END Period bound function and the partitioning expression is defined using an equality condition on an END Period bound function.
EXPLAIN SELECT * FROM t12 WHERE END(b)=DATE ‘2011-02-03’;
Result:
... 3) We do an all-AMPs RETRIEVE step from a single partition of df2.t12 with a condition of ("df2.t12.b = PERIOD(DATE '2011-02-03' - INTERVAL '1' DAY, DATE '2011-02-03')") with a residual condition of ( "(END(df2.t12.b ))= DATE '2011-02-03'") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (56 bytes). The estimated time for this step is 0.03 seconds. ...
The following SELECT request performs a single-partition scan because the WHERE clause predicate is defined on an END Period bound function and the partitioning expression is defined using an equality condition on an END Period bound function:
EXPLAIN SELECT * FROM t12 WHERE a=1 AND END(b)=DATE ‘2011-02-03’;
...
1) First, we do a single-AMP RETRIEVE step from a single partition of
df2.t12 by way of the primary index "df2.t12.a = 1,
df2.t12.b = PERIOD (DATE '2011-02-03'- INTERVAL '1' DAY, DATE
'2011-02-03')" with a residual condition of ("((END(df2.t12.b ))=
DATE '2011-02-03') AND (df2.t12.a = 1)") into Spool 1 (one-amp),
which is built locally on that AMP. The size of Spool 1 is estimated
with low confidence to be 1 row (56 bytes). The estimated time for
this step is 0.02 seconds.
When a table is row partitioned directly using BEGIN and END Period bound functions resulting into a numeric value, and whenever the conditions in a request specify partitioning columns of that table that are specified in the partitioning expression, then the database performs row partition elimination using the same rules for BEGIN and END Period bound functions as apply for other DateTime expressions.
Assume the following table definition:
CREATE SET TABLE t1 ( a INTEGER b PERIOD(DATE) PRIMARY INDEX(a) PARTITION BY CAST((END(b) AS INTEGER);
The following SELECT request scans 55,333 row partitions of t1:
EXPLAIN SELECT * FROM t1 WHERE b>PERIOD(DATE ‘1901-02-02);
...
3) We do an all-AMPs RETRIEVE step from 55333 partitions of
df2.T1 with a condition of ("df2.T1.b > (PERIOD (DATE '1901-02-02',
DATE '1901-02-02'))") into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with no
confidence to be 1 row (56 bytes). The estimated time for this
step is 0.03 seconds.
...
When a table is row partitioned using a CASE_N or CASE function that specifies BEGIN or END Period bound functions, the existing rules for row partition elimination with DateTime expressions also apply.
Assume the following table definition:
CREATE TABLE orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderperiod PERIOD (DATE) NOT NULL, o_orderpriority CHARACTER (21), o_comment VARCHAR (79)) PRIMARY INDEX(o_orderkey) PARTITION BY CASE_N(END(o_orderperiod) <= DATE '2010-03-31', /*Q1*/ END(o_orderperiod) <= DATE '2010-06-30', /*Q2*/ END(o_orderperiod) <= DATE '2010-09-30', /*Q3*/ END(o_orderperiod) <= DATE '2010-12-31' /*Q4*/ );
The following SELECT request scans 2 row partitions of orders and displays the details of the orders placed for the first 2 quarters:
EXPLAIN SELECT * FROM orders WHERE END(o_orderperiod) > DATE '2010-06-30';
...
3) We do an all-AMPs RETRIEVE step from 2 partitions of
df2.orders with a condition of (
"(END (df2.orders.O_orderperiod))> DATE '2010-06-30'")
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 1 row (
167 bytes). The estimated time for this step is 0.03 seconds.
...
When a table is row-partitioned using a RANGE_N function that specifies BEGIN or END Period bound functions, and whenever the conditions in a request specify a column that is specified in the partitioning expression, the database performs row partition elimination using the same rules that apply for DateTime expressions.
Assume you define a sales history table with the following definition:
CREATE TABLE sales_history ( product_code CHARACTER(8), quantity_sold INTEGER, transaction_period PERIOD(DATE)) PRIMARY INDEX (product_code) PARTITION BY RANGE_N(END (transaction_period) BETWEEN DATE '2006-01-01' AND DATE '2015-12-31' EACH INTERVAL'1' YEAR);
The following SELECT request scans 5 row partitions of sales_history before 2010:
EXPLAIN SELECT * FROM sales_history WHERE transaction_period < PERIOD(DATE '2010-01-01');
...
3) We do an all-AMPs RETRIEVE step from 5 partitions of
df2.sales_history with a condition of (
"df2.sales_history.transaction_period < (PERIOD (DATE '2010-01-01',
DATE '2010-01-01'))") into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with no
confidence to be 1 row(64 bytes). The estimated time for this step
is 0.03 seconds.
...
When the RANGE_N partitioning expression specifies BEGIN or END bound functions and the query conditions contain the same partitioning expression, then the Optimizer uses row partition elimination.
The following SELECT request scans 4 row partitions of the sales_history table to display all the sales history before 2010:
EXPLAIN SELECT * FROM sales_history WHERE END (transaction_period) < DATE '2010-01-01';
...
3) We do an all-AMPs RETRIEVE step from 4 partitions of
df2.sales_history with a condition of (
"(END(df2.sales_history.transaction_period ))< DATE
'2010-01-01'") into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 1 row (64 bytes). The estimated time for this step is 0.03
seconds.
...