Predicate Simplification - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Predicate simplification transforms a query into an equivalent, simpler form that is more optimizable.

Satisfiability and Transitive Closure

The SAT-TC (SATisfiability-Transitive Closure) rewrite analyzes a set of predicates to determine if either of the following can be exploited to rewrite a request:
  • A contradiction.

    A simple example is a condition like a=1 AND a=0.

  • Inferring new predicates from the existing predicate set by using transitive closure.

    For example, consider a request that specifies the predicate a=1 AND a=b. This implies that b=1, which may be useful information for rewriting or otherwise optimizing the request.

Other examples of deriving new conditions by using transitive closure include the following samples:

A=B AND A=C --> B=C
A=5 AND A=B --> B=5
A=5 AND A IS NULL --> FALSE
A=5 AND A IS NOT NULL --> A=5
X > 1 AND Y > X --> Y >= 3
X IN (1,2,3) AND Y=X --> Y IN (1,2,3)

Transitive closure is also applied in the context of extract predicates. For example, for the set of conditions {o_orderdate='1999-05-01' AND EXTRACT(MONTH FROM o_orderdate)>2}, the system adds EXTRACT(MONTH FROM o_orderdate)=5 based on o_orderdate='1999-05-01', which can then be used to simplify the existing EXTRACT predicate in the query.

For categories of DML requests specified with BEGIN END constraints on a column with a Period data type, the query rewrite system may add the implied constraint BEGIN(column_1) <END(column_1) to help to identify unsatisfiable conditions. EXPLAIN text for the query identifies these unsatisfiable conditions using the EXPLAIN text phrase unsatisfiable (see EXPLAIN Request Modifier Phrase Terminology).

As an example, consider the following table definition:

CREATE TABLE t1 (a INTEGER b PERIOD (DATE));

Suppose that you use the following query:

SELECT *
FROM t1
WHERE BEGIN(b) > DATE ‘2010-02-03’ AND
      END(b) < DATE ‘2010-02-03’;

The query rewrite system adds an implied constraint BEGIN(b) < END(b), and is able to derive unsatisfiability with the query predicates.

The query rewrite system can also apply transitive closure across query blocks, meaning transitive closure between outer and inner query blocks. This allows conditions to be pushed into and out of subqueries. The basic approach is to combine the query block conditions before computing the transitive closure. The IN and NOT IN operators are treated as = and ≠ operators, respectively. Derived conditions are added, as appropriate, to each query block.

Consider a simple SQL example. The following SELECT request implies that x<1.

SELECT * FROM t1 WHERE x IN (SELECT y FROM t2 WHERE y<1);

Similarly, the following SELECT request implies that x < 3 and y is in (1,4).

SELECT *
 FROM t1
 WHERE EXISTS (SELECT *
               FROM t2
               WHERE y<3
               AND   x=y)
               AND   x IN (1,4);

In the current context, the conditions under analysis are called connecting conditions. A connecting condition is one that connects an outer query with a subquery. See Connecting Predicates for further information about connecting conditions.

Applications of Transitive Closure

Transitive Closure (TC) can optimize date ranges and IN clauses. The following request shows one of these cases:

SELECT l_shipmode, SUM (CASE
                            WHEN o_orderpriority = '1URGENT'
                            OR   o_orderpriority = '2-HIGH'
                            THEN 1
                            ELSE 0
                        END)
FROM lineitem
WHERE l_commitdate  <  l_receiptdate
AND   l_shipdate    <  l_commitdate
AND   l_receiptdate >= '1994-01-01'
AND   l_receiptdate < ('1994-06-06')
GROUP BY l_shipmode;

The new set of constraints that can be derived is as follows:

(l_shipdate < l_receiptdate AND
l_commitdate <= '1994-06-04' AND
l_shipdate   <= '1994-06-03')

If lineitem or one of its covering indexes is either value-ordered or row-partitioned on l_shipdate, the new constraint l_shipdate<='1994-06-03' enables Vantage to access only a portion of the table instead of doing a full-table scan.

You may notice performance improvements for queries because of the extra predicates TC adds. The extra predicates can also be seen in the EXPLAIN reports for requests that use them.

SAT-TC and Query Rewrite

One important aspect of using transitive closure for query rewrite is its application across ON and WHERE clauses. For example, suppose you have the following request:

SELECT product_name, sum(amount*quantity) AS qty
FROM product LEFT OUTER JOIN sales1
ON product_key=sales_product_key
WHERE product_key=10
GROUP BY product_key, product_name ;

For this request, transitive closure adds the inferred predicate sales_product_key=10 to the ON clause. This is particularly effective when the predicate added to the ON clause is a constraint on the primary index of the inner table in a join.

Another important property of transitive closure is its ability to infer new predicates across the ON clauses of consecutive inner joins. For example, consider the following request.

SELECT product_key, product_name, SUM(s1.amount * s1.quantity+s2.amount * s2.quantity) 
AS total
FROM product LEFT OUTER JOIN ((sales1 AS s1 INNER JOIN store
             ON  s1.sales_store_key=store_key)
             INNER JOIN sales2 AS s2
             ON  s2.sales_store_key=store_key
             AND s2.sales_store_key=10)
             ON  product_key=s1.sales_product_key
             AND product_key=s2.sales_product_key
GROUP BY product_key, product_name;

To see this application of transitive closure, consider the consecutive inner joins between s1, s2, and store. The predicates in consecutive inner joins can be treated collectively by transitive closure, as if specified in this single WHERE clause:

WHERE s1.sales_store_key = store_key
AND   s2.sales_store_key = store_key
AND   s2.sales_store_key = 10

By grouping the predicates logically like this, transitive closure can derive the new predicates store_key=10 and s1.sales_store_key=10 and place the new predicates in the ON clause of the uppermost inner join in the set of consecutive inner joins. In this example, that is the ON clause joining to s2.

A condition that is false mathematically is called contradictory or unsatisfiable. In this context, the opposite of contradictory is satisfiable, regardless of the data (for example, specifying a=1 and a=2 in the same request). Query Rewrite simplifies and optimizes such a condition so that all joins and retrievals can be done on a single-AMP basis.

One way to take advantage of a contradictory condition is to add CHECK constraints to tables, enabling Query Rewrite to eliminate unnecessary conditions, thus permitting the Optimizer to later construct better execution plans (see Applications of Transitive Closure).

For example, assume that you want to list all orders made in the first three months of the fiscal year. Assuming that you have access only to the ordertbl view that is a UNION ALL of orders1, orders2, …, orders12, and the query looks like this:

SELECT *
FROM order_tbl
WHERE EXTRACT(MONTH FROM o_orderdate)<= 3;

Without CHECK constraint, the system must access tables orders1, orders2, …, orders12 using the constraint EXTRACT(MONTH FROM o_orderdate)<=3, even though accessing orders1, orders2, and orders3 is enough to satisfy the request. The only way Query Rewrite knows to filter out the other nine tables is to add CHECK constraints for every table, and then to determine the contradiction between the CHECK constraints and the query constraint.

For example, if the CHECK constraint on order4 is added, Query Rewrite sees the following compound predicate, which is a contradiction.

EXTRACT(MONTH FROM o_orderdate)<= 3     -- from query
AND
EXTRACT(MONTH FROM o_orderdate)=4       -- from CHECK constraint

For this case, Query Rewrite can simply eliminate this step. Query Rewrite must know if a set of conditions is satisfiable.

You rarely submit a contradictory query that does not return results regardless of the data like a=1 AND a=2. The example in the previous paragraph indicates the need for such checks. Any solution to satisfiability generates a set of conditions declared FALSE (that is, contradictory) or TRUE (satisfiable for specific data).

If you decide to use CHECK constraints for horizontal table partitioning, be aware of the cost of enforcement. Though conceptually similar, CHECK constraints are not identical to the row partitioning that you can specify with a partitioning expression in a PARTITION BY clause for a table or join index.

SAT-TC and Query Optimization

Query optimization introduces the following additional applications of satisfiability in the usage and maintenance of join indexes:
  • Determining whether a join index must be updated when one or more of its base tables is updated. (An update on a base tables is an insert, update, or delete operation.)
  • Determining whether a join index partly or fully covers a query.

The join index update problem can be solved by using the satisfiability check for the conjunction of the join index conditions and the condition applied in the base table maintenance.

Assume the following sparse join index definitions:

CREATE JOIN INDEX j1 AS
  SELECT *
  FROM lineitem
  WHERE EXTRACT (MONTH FROM l_shipdate)<=6;

CREATE JOIN INDEX j2 AS
  SELECT *
  FROM lineitem
  WHERE EXTRACT(MONTH FROM l_shipdate)>=7;

Now consider the following delete operation on the lineitem table for example:

DELETE lineitem
WHERE EXTRACT(MONTH FROM l_shipdate)=12;

This implies that there is a need to update j2, but not j1. The system can make this decision because the Satisfiability check returns TRUE for the following predicate:

EXTRACT(MONTH FROM l_shipdate)=12
AND
EXTRACT(MONTH FROM l_shipdate)>=7

but FALSE for this predicate.

EXTRACT(MONTH FROM l_shipdate)=12
AND
EXTRACT(MONTH FROM l_shipdate)<=6

The problem of determining whether a join index completely or partially covers a query is solved as a set of satisfiability problems. The use of satisfiability in this problem becomes more important when more complex conditions, like constants in WHERE clause predicates, are specified in the join index definition. This happens, for example, when you create a sparse join index. See When Join Indexes Are Useful and CREATE JOIN INDEX for more information about sparse join indexes.

Constant Predicate Evaluation

Constant predicate evaluation evaluates predicates that contain only constants (those having no column references) at the time a request is parsed. For example, consider the following complex predicate:

t1.pi=t2.pi
OR 'a' IN ('b','c')

The system evaluates constant predicates and replaces the predicate with the result if the result is TRUE or FALSE.

If the IN predicate evaluates to FALSE, Query Rewrite transforms the predicate as follows:

t1.pi=t2.pi

Domain-Based Simplification

Domain-based simplification uses the domain range of the underlying column to simplify a predicate. For example, consider the following table definition:

create table t1 (a1 smallint)

With this table definition, the predicate a1 = 64000 leads to unsatisfiability, as the constant 64000 is not within the smallint range of values:

a1 = 64000 => FALSE

Similarly, consider the following predicate:

a1 in (32800, 80000, 1, 2, 3)

This predicate can be simplified as follows, as the constants 32800 and 80000 are not within the smallint range of values:

a1 in (1, 2, 3)

Check Constraint-Based Simplification

Check constraint-based simplification is similar to domain-based simplification, but the range of the underlying column is based on check constraints specified on the column. For example, consider the following table definition:

create table t1 (a1 int NOT NULL check (a1 < 10))

With this table definition, the following predicate leads to unsatisfiability, as the predicate is outside the allowed range of values based on the check constraint:

a1 > 30 => FALSE

Consolidating Single-Table Predicates

Query rewrite rewrites single-table predicates to eliminate overlapping or redundant conditions. For example, consider the following set of predicates:

a1 NOT IN (1,3) AND
a1 <4 AND
a1 >= 2

If a1 is an INTEGER column these predicates can be simplified as follows:

a1=2

Similarly, consider the following predicate:

a1 IN (1, 3, 5, 7, 9) AND
a1 > 4

where a1 is an INTEGER column.

The predicate can be simplified as follows:

a1 IN (5,7,9)

Consolidation of single-table predicates enables the Optimizer to calculate more accurate selectivity estimates while at the same time eliminating the execution of redundant predicates.

Query Rewrite also supports simplification of single-table predicates for the BEGIN and END bound functions on Period columns.

For example, consider the following predicate based on the BEGIN bound function:

BEGIN(b)>DATE '2005-02-03' AND
BEGIN(b)>DATE '2010-02-03'

Query Rewrite simplifies this to the following predicate:

BEGIN(b)>=DATE '2010-02-04'

Similarly, consider the following predicate based on the END bound function:

END(b)< DATE '2005-02-03' AND
END(b)< DATE '2010-02-03'

Query Rewrite simplifies this to the following predicate:

END(b)<= DATE '2005-02-02'

Consolidation of single table predicate can lead to unsatisfiability as follows:

a1 NOT IN (1, 2, 3) AND
a1>=1 AND
a1<=3

If a1 is an INTEGER column, the preceding conditions are non-overlapping and the predicate is rewritten as follows:

0=1

0=1 evaluates to FALSE.

In other cases, the predicate can be simplified to a predicate that does not constrain the set of possible values, as shown:

a1>1 OR a1<2 

The predicate can be rewritten as follows:

a1 IS NOT NULL

For a more complex example, consider the following predicate:

(a1 >= 1 AND a1 <= 3) OR
(a1 >= 4 AND a1 <= 10)

If a1 is an INTEGER column, the predicate can be rewritten as follows:

a1 >= 1 AND a1 <= 10

Constant Move-Around

The constant movearound rewrite can enable better selectivity estimates and wider use of indexes by the Optimizer. Constant movearound attempts to move constants from one side of a boolean comparison operator to the other side to rewrite predicates of the following form:

column ± constant_1 comparison_operator constant_2

Constant movearound rewrites predicates of the preceding form to predicates of the following form:

column comparison_operator constant_3

The method does this by removing the plus or minus operation from the column side of the predicate and then adding its negation to the constant-only side of the predicate. The constant expression is then folded. Any errors that occur during folding, such as overflow or underflow, cause the rewrite to be rejected in favor of using the original expression.

This transformation is only done for the plus and minus arithmetic operators.

For a simple example, consider a1 + 1 > 4, which can be rewritten as a1 > 3.

The constant expression that is moved can also be an INTERVAL expression, as in DATE operations.

For example, consider the following predicate:

date_col + INTERVAL '3' MONTH <= '2007-03-31'

Using constant movearound, this predicate can be rewritten as follows:

date_col <= '2006-12-31'

Constant Substitution

The constant substitution rewrite substitutes the values of columns, if available, and attempts to derive unsatisfiability or to simplify the predicate, if possible. Predicates of the following form are simplified by substituting the values for column_1 and column_2 whenever possible:

column_1 ± constant_1 comparison_operatorcolumn_2 ± constant_2

As an example, consider the following predicate:

a = 10 AND
b = 20 AND
a + 2 = b + 1

Substituting the value 10 for a and 20 for b, the predicate can be simplified to 12 = 21, which is FALSE.

For additional examples, consider a > a + 1, which simplifies to FALSE and a >= a - 1, which is rewritten as a IS NOT NULL.

Simplification by Distribution

This simplification by distribution rewrite attempts to simplify a complex predicate by using distributive properties.

Consider the following predicate:

a > 5 AND b < 6 AND (a < 2 OR b > 9)

By distributing the AND into the OR branch, the predicate can be simplified as follows:

a > 5 AND b < 6 AND ((a > 5 AND a < 2) OR (b < 6 AND b > 9)) =>
a > 5 AND b < 6 AND ( FALSE OR FALSE) =>
a > 5 AND b < 6 AND FALSE =>
FALSE

Factoring Common Predicates

The factoring of common predicates is another rewrite based on distributive properties. If a complex predicate has common predicates across the AND and OR branches, these common predicates can be factored out to simplify the predicate.

Consider the following predicate:

(a1 = 1 AND b1 = 1 AND c1 = 1) OR
(a1 = 1 AND b1 = 1 AND d1 = 1) OR
(a1 = 1 AND e1 = 1)

(a1 = b1 AND b1 = 1) is common to the first two branches of the overall OR predicate. In the first step of factoring, (a1 = b1 AND b1 = 1) can be factored out, resulting in the following equivalent predicate:

(a1 =1 AND b1 = 1 AND (c1 = 1 OR d1 = 1)) OR
(a1 = 1 AND e1 = 1)

After the first step, (a1 = 1) is common to the remaining two branches of the OR predicate. In the second step of factoring, (a1 = 1) can be factored out, resulting in the following equivalent predicate:

a1 = 1 AND ((b1 = 1 AND (c1 = 1 OR d1 = 1)) OR e1 = 1)

Simplification Based on Containment

Simplification based on containment attempts to simplify complex predicates by using the set containment properties of set algebra. There are two kinds of containment rewrite rules: the OR containment rule and the AND containment rule.

The OR containment rule is as follows:

If A contains A',
   A OR (A' AND ..) => A

For example, a > 5 OR (a > 7 AND b < 6) can be simplified to a > 5.

The AND containment rule is as follows:

If A' contains A,
   A AND (A' OR ..) => A

For example, a < 10 AND (a < 12 OR b < 6) can be simplified to a < 10.

Duplicate Predicate Removal

Duplicate Predicate Removal eliminates identical conjuncts, disjuncts, or both from AND and OR predicates. The maximum number of duplicate conjuncts and disjuncts compared per predicate is 100.