Predicate simplification is a part of the query rewrite system that transforms a query into an equivalent form, but one that is simpler and more amenable to query optimization. Several of the key categories of predicate simplification are outlined in this section.
Satisfiability and Transitive Closure
- A contradiction.
A simple example is a condition like a=1 AND a=0.
- Inferring new predicates from the existing predicate set by using transitive closure.
For example, consider a request that specifies the predicate a=1 AND a=b. This implies that b=1, which might be useful information for rewriting or otherwise optimizing the request.
Other examples of deriving new conditions by using transitive closure include the following samples:
A=B AND A=C --> B=C A=5 AND A=B --> B=5 A=5 AND A IS NULL --> FALSE A=5 AND A IS NOT NULL --> A=5 X > 1 AND Y > X --> Y >= 3 X IN (1,2,3) AND Y=X --> Y IN (1,2,3)
Transitive closure is also applied in the context of extract predicates. For example, for the set of conditions {o_orderdate=‘1999-05-01’ AND EXTRACT(MONTH FROM o_orderdate)>2}, the system adds EXTRACT(MONTH FROM o_orderdate)=5 based on o_orderdate=‘1999-05-01’, which can then be used to simplify the existing EXTRACT predicate in the query.
For some categories of DML requests specified with BEGIN END constraints on a column with a Period data type, the query rewrite system adds the implied constraint BEGIN(column_1) <END(column_1) to help to identify unsatisfiable conditions. When these unsatisfiable conditions occur, EXPLAIN text for the query identifies them using the EXPLAIN text phrase unsatisfiable (see EXPLAIN Request Modifier Phrase Terminology).
As an example, consider the following table definition:
CREATE TABLE t1 (a INTEGER b PERIOD (DATE));
Suppose that you use the following query:
SELECT * FROM t1 WHERE BEGIN(b) > DATE ‘2010-02-03’ AND END(b) < DATE ‘2010-02-03’;
The query rewrite system adds an implied constraint BEGIN(b) < END(b), and is able to derive unsatisfiability in conjunction with the query predicates.
The query rewrite system can also apply transitive closure across query blocks, meaning transitive closure between outer and inner query blocks. This allows conditions to be pushed into and out of subqueries. The basic approach is to combine the query block conditions before computing the transitive closure. The IN and NOT IN operators are treated as = and ≠ operators, respectively. Derived conditions are added, as appropriate, to each query block.
Consider a simple SQL example. The following SELECT request implies that x<1.
SELECT * FROM t1 WHERE x IN (SELECT y FROM t2 WHERE y<1);
Similarly, the following SELECT request implies that x < 3 and y is in (1,4).
SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE y<3 AND x=y) AND x IN (1,4);
In the current context, the conditions under analysis are referred to as connecting conditions. A connecting condition is one that connects an outer query with a subquery. See Connecting Predicates for further information about connecting conditions.
Applications of Transitive Closure
Transitive Closure (TC) can optimize date ranges and IN clauses. The following request illustrates one of these cases:
SELECT l_shipmode, SUM (CASE WHEN o_orderpriority = '1URGENT' OR o_orderpriority = '2-HIGH' THEN 1 ELSE 0 END) FROM lineitem WHERE l_commitdate < l_receiptdate AND l_shipdate < l_commitdate AND l_receiptdate >= '1994-01-01' AND l_receiptdate < ('1994-06-06') GROUP BY l_shipmode;
The new set of constraints that can be derived is as follows:
(l_shipdate < l_receiptdate AND l_commitdate <= '1994-06-04' AND l_shipdate <= '1994-06-03')
If lineitem or one of its covering indexes is either value-ordered or row-partitioned on l_shipdate, the new constraint l_shipdate<='1994-06-03' enables Vantage to access only a portion of the table instead of doing a full-table scan.
You might notice performance improvements for some queries because of the extra predicates TC adds. The extra predicates can also be seen in the EXPLAIN reports for requests that use them.
SAT-TC and Query Rewrite
One important aspect of using transitive closure for query rewrite is its application across ON and WHERE clauses. For example, suppose you have the following request:
SELECT product_name, sum(amount*quantity) AS qty FROM product LEFT OUTER JOIN sales1 ON product_key=sales_product_key WHERE product_key=10 GROUP BY product_key, product_name ;
For this request, transitive closure adds the inferred predicate sales_product_key=10 to the ON clause. This is particularly effective when the predicate added to the ON clause is a constraint on the primary index of the inner table in a join.
Another important property of transitive closure is its ability to infer new predicates across the ON clauses of consecutive inner joins. For example, consider the following request.
SELECT product_key, product_name, SUM(s1.amount * s1.quantity+s2.amount * s2.quantity) AS total FROM product LEFT OUTER JOIN ((sales1 AS s1 INNER JOIN store ON s1.sales_store_key=store_key) INNER JOIN sales2 AS s2 ON s2.sales_store_key=store_key AND s2.sales_store_key=10) ON product_key=s1.sales_product_key AND product_key=s2.sales_product_key GROUP BY product_key, product_name;
To see this application of transitive closure, consider the consecutive inner joins between s1, s2, and store. The predicates in consecutive inner joins can be treated collectively by transitive closure as if they were specified in a single WHERE clause. In this example transitive closure processes these predicates as if they appeared as the following compound predicate.
WHERE s1.sales_store_key = store_key AND s2.sales_store_key = store_key AND s2.sales_store_key = 10
By grouping the predicates logically like this, transitive closure can derive the new predicates store_key=10 and s1.sales_store_key=10 and then place them in the ON clause of the uppermost inner join in the set of consecutive inner joins. In this example, that is the ON clause joining to s2.
If a condition is false mathematically, it is said to be contradictory or unsatisfiable. In this context, the opposite of contradictory is satisfiable, regardless of the data. An example might be specifying the conditions a=1 and a=2 in the same request. If Query Rewrite discovers such an unsatisfiable condition, it simplifies and optimizes the condition in such a way that all joins and retrievals can be done on a single AMP basis.
One way to take advantage of a contradictory condition is to add CHECK constraints to tables, enabling Query Rewrite to eliminate unnecessary conditions, thus permitting the Optimizer to later construct better execution plans (see Applications of Transitive Closure).
For example, assume that you want to list all orders made in the first three months of the fiscal year. Assuming that you have access only to the ordertbl view that is a UNION ALL of orders1, orders2,…,orders12, and the query looks like this:
SELECT * FROM order_tbl WHERE EXTRACT(MONTH FROM o_orderdate)<= 3;
Without CHECK constraint, the system must access all of the tables orders1, orders2, … orders12 using the constraint EXTRACT(MONTH FROM o_orderdate)<=3, even though it only needs to access orders1, orders2, and orders3 to satisfy the request. The only way Query Rewrite knows to filter out the other nine tables is to add CHECK constraints for every table, and then to determine the contradiction between the CHECK constraints and the query constraint.
For example, if the CHECK constraint on order4 is added, Query Rewrite sees the following compound predicate, which is a contradiction.
EXTRACT(MONTH FROM o_orderdate)<= 3 -- from query AND EXTRACT(MONTH FROM o_orderdate)=4 -- from CHECK constraint
For this particular case, Query Rewrite can simply eliminate this step. In general, Query Rewrite needs to know if a set of conditions is satisfiable.
Of course, you would rarely submit a contradictory query that does not return results regardless of the data like a=1 AND a=2. The example in the previous paragraph indicates the need for such checks. As previously stated, this issue is referred to as the satisfiability problem. Any solution to satisfiability generates a set of conditions and either declares them to be FALSE to denote that they are contradictory, or TRUE, which means that for some specific data, the set of conditions is satisfiable.
You should be aware of the cost of enforcing CHECK constraints if you decide to use them for horizontal table partitioning. This type of table partitioning is not identical to the row partitioning that you can specify with a partitioning expression in a PARTITION BY clause for a table or join index, though it is similar conceptually.
SAT-TC and Query Optimization
- Determining whether a join index must be updated to keep it synchronized with an update operation on one or more of its base tables. The term update operation here signifies an insert, update, or delete operation against the base table in question.
- Determining whether a join index partly or fully covers a query.
The join index update problem can be solved by using the satisfiability check for the conjunction of the join index conditions and the condition applied in the base table maintenance.
Assume the following sparse join index definitions:
CREATE JOIN INDEX j1 AS SELECT * FROM lineitem WHERE EXTRACT (MONTH FROM l_shipdate)<=6; CREATE JOIN INDEX j2 AS SELECT * FROM lineitem WHERE EXTRACT(MONTH FROM l_shipdate)>=7;
Now consider the following delete operation on the lineitem table for example:
DELETE lineitem WHERE EXTRACT(MONTH FROM l_shipdate)=12;
This implies that there is a need to update j2, but not j1. The system can make this decision because the Satisfiability check returns TRUE for the following predicate:
EXTRACT(MONTH FROM l_shipdate)=12 AND EXTRACT(MONTH FROM l_shipdate)>=7
but FALSE for this predicate.
EXTRACT(MONTH FROM l_shipdate)=12 AND EXTRACT(MONTH FROM l_shipdate)<=6
The problem of determining whether a join index completely or partially covers a query is solved as a set of satisfiability problems. Note that the use of satisfiability in this problem becomes more important when more complex conditions, like constants in WHERE clause predicates, are specified in the join index definition. This happens, for example, when you create a sparse join index. See the information about hash and join indexes in Teradata Vantage™ - Database Design, B035-1094 and the documentation for CREATE JOIN INDEX in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 for more information about sparse join indexes.
Constant Predicate Evaluation
Constant predicate evaluation evaluates predicates that contain only constants (those having no column references) at the time a request is parsed. For example, consider the following complex predicate:
t1.pi=t2.pi OR ‘a’ IN (‘b’,‘c’)
The system identifies and evaluates constant predicates and then replaces the predicate with the result if it is either TRUE or FALSE.
The IN predicate in this example can be evaluated to FALSE when the database parses the request, so Query Rewrite transforms it as follows:
t1.pi=t2.pi
Domain-based Simplification
Domain-based simplification uses the domain range of the underlying column to simplify a predicate. For example, consider the following table definition:
create table t1 (a1 smallint)
With this table definition, the predicate a1 = 64000 leads to unsatisfiability, as the constant 64000 is not within the smallint range of values:
a1 = 64000 => FALSE
Similarly, consider the following predicate:
a1 in (32800, 80000, 1, 2, 3)
This predicate can be simplified as follows, as the constants 32800 and 80000 are not within the smallint range of values:
a1 in (1, 2, 3)
Check Constraint-based Simplification
Check constraint-based simplification is similar to domain-based simplification, but the range of the underlying column is based on check constraints specified on the column. For example, consider the following table definition:
create table t1 (a1 int NOT NULL check (a1 < 10))
With this table definition, the following predicate leads to unsatisfiability, as the predicate is outside the allowed range of values based on the check constraint:
a1 > 30 => FALSE
Consolidating Single-Table Predicates
Query rewrite evaluates single-table predicates and rewrites them to eliminate overlapping or redundant conditions. For example, consider the following set of predicates:
a1 NOT IN (1,3) AND a1 <4 AND a1 >= 2
If a1 is an INTEGER column these predicates can be simplified as follows:
a1=2
Similarly, consider the following predicate:
a1 IN (1, 3, 5, 7, 9) AND a1 > 4
where a1 is an INTEGER column.
The predicate can be simplified as follows:
a1 IN (5,7,9)
Consolidation of single-table predicates enables the Optimizer to calculate more accurate selectivity estimates while at the same time eliminating the execution of redundant predicates.
Query Rewrite also supports simplification of single-table predicates for the BEGIN and END bound functions on Period columns.
For example, consider the following predicate based on the BEGIN bound function:
BEGIN(b)>DATE ‘2005-02-03’ AND BEGIN(b)>DATE ‘2010-02-03’
Query Rewrite simplifies this to the following predicate:
BEGIN(b)>=DATE ‘2010-02-04’
Similarly, consider the following predicate based on the END bound function:
END(b)< DATE‘2005-02-03’ AND END(b)< DATE‘2010-02-03’
Query Rewrite simplifies this to the following predicate:
END(b)<= DATE‘2005-02-02’
Consolidation of single table predicate can lead to unsatisfiability as follows:
a1 NOT IN (1, 2, 3) AND a1>=1 AND a1<=3
If a1 is an INTEGER column, the conditions are non-overlapping and the predicate is rewritten as follows:
0=1
0=1 evaluates to FALSE.
In other cases, the predicate can be simplified to a predicate that does not constrain the set of possible values, as shown:
a1>1 OR a1<2
The predicate can be rewritten as follows:
a1 IS NOT NULL
For a more complex example, consider the following predicate:
(a1 >= 1 AND a1 <= 3) OR (a1 >= 4 AND a1 <= 10)
If a1 is an INTEGER column, the predicate can be rewritten as follows:
a1 >= 1 AND a1 <= 10
Constant Movearound
The constant movearound rewrite can enable better selectivity estimates and wider use of indexes by the Optimizer. Constant movearound attempts to move constants from one side of a boolean comparison operator to the other side in order to rewrite predicates of the following form:
<column> <±> <constant_1> <comparison_operator><constant_2>
Constant movearound rewrites predicates of the preceding form to predicates of the following form:
<column> <comparison_operator> <constant_3>
The method does this by removing the plus or minus operation from the column side of the predicate and then adding its negation to the constant-only side of the predicate. The constant expression is then folded. Any errors that occur during folding, such as overflow or underflow, cause the rewrite to be rejected in favor of using the original expression.
This transformation is only done for the plus and minus arithmetic operators.
For a simple example, consider a1 + 1 > 4, which can be rewritten as a1 > 3.
The constant expression that is moved can also be an INTERVAL expression, as in DATE operations.
For example, consider the following predicate:
date_col + INTERVAL '3' MONTH <= '2007-03-31'
Using constant movearound, this predicate can be rewritten as follows:
date_col <= '2006-12-31'
Constant Substitution
The constant substitution rewrite substitutes the values of columns, if they are available, and attempts to derive unsatisfiability or to simplify the predicate, if possible. Predicates of the following form are simplified by substituting the values for column_1 and column_2 whenever possible:
<column_1> <±> <constant_1> <comparison_operator> <column_2> <±> <constant_2>
As an example, consider the following predicate:
a = 10 AND b = 20 AND a + 2 = b + 1
Substituting the value 10 for a and 20 for b, the predicate can be simplified to 12 = 21, which is FALSE.
For additional examples, consider a > a + 1, which simplifies to FALSE and a >= a - 1, which is rewritten as a IS NOT NULL.
Simplification by Distribution
This simplification by distribution rewrite attempts to simplify complex AND/OR predicates by using distributive properties.
Consider the following predicate:
a > 5 AND b < 6 AND (a < 2 OR b > 9)
By distributing the AND into the OR branch, the predicate can be simplified as follows:
a > 5 AND b < 6 AND ((a > 5 AND a < 2) OR (b < 6 AND b > 9)) => a > 5 AND b < 6 AND ( FALSE OR FALSE) => a > 5 AND b < 6 AND FALSE => FALSE
Factoring Common Predicates
The factoring of common predicates is another rewrite based on distributive properties. If a complex AND/OR predicate has some common predicates across the AND/OR branches, these common predicates can be factored out to simplify the predicate.
Consider the following predicate:
(a1 = 1 AND b1 = 1 AND c1 = 1) OR (a1 = 1 AND b1 = 1 AND d1 = 1) OR (a1 = 1 AND e1 = 1)
Note that (a1 = b1 AND b1 = 1) is common to the first two branches of the overall OR predicate. In the first step of factoring, (a1 = b1 AND b1 = 1) can be factored out, resulting in the following equivalent predicate:
(a1 =1 AND b1 = 1 AND (c1 = 1 OR d1 = 1)) OR (a1 = 1 AND e1 = 1)
After the first step, note that (a1 = 1) is common to the remaining two branches of the OR predicate. In the second step of factoring, (a1 = 1) can be factored out, resulting in the following equivalent predicate:
a1 = 1 AND ((b1 = 1 AND (c1 = 1 OR d1 = 1)) OR e1 = 1)
Simplification Based on Containment
Simplification based on containment attempts to simplify complex AND/OR predicates by using the set containment properties of set algebra. There are two kinds of containment rewrite rules: the OR containment rule and the AND containment rule.
The OR containment rule is as follows:
If A contains A', A OR (A' AND ..) => A
For example, a > 5 OR (a > 7 AND b < 6) can be simplified to a > 5.
The AND containment rule is as follows:
If A' contains A, A AND (A' OR ..) => A
For example, a < 10 AND (a < 12 OR b < 6) can be simplified to a < 10.
Duplicate Predicate Removal
Duplicate Predicate Removal eliminates identical conjuncts, disjuncts, or both from AND and OR predicates. The maximum number of duplicate conjuncts and disjuncts compared per predicate is 100.