Predicate Simplification - Advanced SQL Engine - Teradata Database

SQL Request and Transaction Processing

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-28
dita:mapPath
uqf1592445067244.ditamap
dita:ditavalPath
uqf1592445067244.ditaval
dita:id
B035-1142
lifecycle
previous
Product Category
Teradata Vantage™

Predicate simplification is a part of the query rewrite system that transforms a query into an equivalent form, but one that is simpler and more amenable to query optimization. Several of the key categories of predicate simplification are outlined in this section.

Satisfiability and Transitive Closure

The SAT-TC (SATisfiability-Transitive Closure) rewrite analyzes a set of predicates to determine if either of the following can be exploited to rewrite a request:
  • A contradiction.

    A simple example is a condition like a=1 AND a=0.

  • Inferring new predicates from the existing predicate set by using transitive closure.

    For example, consider a request that specifies the predicate a=1 AND a=b. This implies that b=1, which might be useful information for rewriting or otherwise optimizing the request.

Other examples of deriving new conditions by using transitive closure include the following samples:

A=B AND A=C --> B=C
A=5 AND A=B --> B=5
A=5 AND A IS NULL --> FALSE
A=5 AND A IS NOT NULL --> A=5
X > 1 AND Y > X --> Y >= 3
X IN (1,2,3) AND Y=X --> Y IN (1,2,3)

Transitive closure is also applied in the context of extract predicates. For example, for the set of conditions {o_orderdate=‘1999-05-01’ AND EXTRACT(MONTH FROM o_orderdate)>2}, the system adds EXTRACT(MONTH FROM o_orderdate)=5 based on o_orderdate=‘1999-05-01’, which can then be used to simplify the existing EXTRACT predicate in the query.

For some categories of DML requests specified with BEGIN END constraints on a column with a Period data type, the query rewrite system adds the implied constraint BEGIN(column_1) <END(column_1) to help to identify unsatisfiable conditions. When these unsatisfiable conditions occur, EXPLAIN text for the query identifies them using the EXPLAIN text phrase unsatisfiable (see EXPLAIN Request Modifier Phrase Terminology).

As an example, consider the following table definition:

CREATE TABLE t1 (a INTEGER b PERIOD (DATE));

Suppose that you use the following query:

SELECT *
FROM t1
WHERE BEGIN(b) > DATE ‘2010-02-03’ AND
      END(b) < DATE ‘2010-02-03’;

The query rewrite system adds an implied constraint BEGIN(b) < END(b), and is able to derive unsatisfiability in conjunction with the query predicates.

The query rewrite system can also apply transitive closure across query blocks, meaning transitive closure between outer and inner query blocks. This allows conditions to be pushed into and out of subqueries. The basic approach is to combine the query block conditions before computing the transitive closure. The IN and NOT IN operators are treated as = and operators, respectively. Derived conditions are added, as appropriate, to each query block.

Consider a simple SQL example. The following SELECT request implies that x<1.

SELECT * FROM t1 WHERE x IN (SELECT y FROM t2 WHERE y<1);

Similarly, the following SELECT request implies that x < 3 and y is in (1,4).

SELECT *
 FROM t1
 WHERE EXISTS (SELECT *
               FROM t2
               WHERE y<3
               AND   x=y)
               AND   x IN (1,4);

In the current context, the conditions under analysis are referred to as connecting conditions. A connecting condition is one that connects an outer query with a subquery. See Connecting Predicates for further information about connecting conditions.

Applications of Transitive Closure

Transitive Closure (TC) can optimize date ranges and IN clauses. The following request illustrates one of these cases:

SELECT l_shipmode, SUM (CASE
                            WHEN o_orderpriority = '1URGENT'
                            OR   o_orderpriority = '2-HIGH'
                            THEN 1
                            ELSE 0
                        END)
FROM lineitem
WHERE l_commitdate  <  l_receiptdate
AND   l_shipdate    <  l_commitdate
AND   l_receiptdate >= '1994-01-01'
AND   l_receiptdate < ('1994-06-06')
GROUP BY l_shipmode;

The new set of constraints that can be derived is as follows:

(l_shipdate < l_receiptdate AND
l_commitdate <= '1994-06-04' AND
l_shipdate   <= '1994-06-03')

If lineitem or one of its covering indexes is either value-ordered or row-partitioned on l_shipdate, the new constraint l_shipdate<='1994-06-03' enables Vantage to access only a portion of the table instead of doing a full-table scan.

You might notice performance improvements for some queries because of the extra predicates TC adds. The extra predicates can also be seen in the EXPLAIN reports for requests that use them.

SAT-TC and Query Rewrite

One important aspect of using transitive closure for query rewrite is its application across ON and WHERE clauses. For example, suppose you have the following request:

SELECT product_name, sum(amount*quantity) AS qty
FROM product LEFT OUTER JOIN sales1
ON product_key=sales_product_key
WHERE product_key=10
GROUP BY product_key, product_name ;

For this request, transitive closure adds the inferred predicate sales_product_key=10 to the ON clause. This is particularly effective when the predicate added to the ON clause is a constraint on the primary index of the inner table in a join.

Another important property of transitive closure is its ability to infer new predicates across the ON clauses of consecutive inner joins. For example, consider the following request.

SELECT product_key, product_name, SUM(s1.amount * s1.quantity+s2.amount * s2.quantity) 
AS total
FROM product LEFT OUTER JOIN ((sales1 AS s1 INNER JOIN store
             ON  s1.sales_store_key=store_key)
             INNER JOIN sales2 AS s2
             ON  s2.sales_store_key=store_key
             AND s2.sales_store_key=10)
             ON  product_key=s1.sales_product_key
             AND product_key=s2.sales_product_key
GROUP BY product_key, product_name;

To see this application of transitive closure, consider the consecutive inner joins between s1, s2, and store. The predicates in consecutive inner joins can be treated collectively by transitive closure as if they were specified in a single WHERE clause. In this example transitive closure processes these predicates as if they appeared as the following compound predicate.

WHERE s1.sales_store_key = store_key
AND   s2.sales_store_key = store_key
AND   s2.sales_store_key = 10

By grouping the predicates logically like this, transitive closure can derive the new predicates store_key=10 and s1.sales_store_key=10 and then place them in the ON clause of the uppermost inner join in the set of consecutive inner joins. In this example, that is the ON clause joining to s2.

If a condition is false mathematically, it is said to be contradictory or unsatisfiable. In this context, the opposite of contradictory is satisfiable, regardless of the data. An example might be specifying the conditions a=1 and a=2 in the same request. If Query Rewrite discovers such an unsatisfiable condition, it simplifies and optimizes the condition in such a way that all joins and retrievals can be done on a single AMP basis.

One way to take advantage of a contradictory condition is to add CHECK constraints to tables, enabling Query Rewrite to eliminate unnecessary conditions, thus permitting the Optimizer to later construct better execution plans (see Applications of Transitive Closure).

For example, assume that you want to list all orders made in the first three months of the fiscal year. Assuming that you have access only to the ordertbl view that is a UNION ALL of orders1, orders2,…,orders12, and the query looks like this:

SELECT *
FROM order_tbl
WHERE EXTRACT(MONTH FROM o_orderdate)<= 3;

Without CHECK constraint, the system must access all of the tables orders1, orders2, … orders12 using the constraint EXTRACT(MONTH FROM o_orderdate)<=3, even though it only needs to access orders1, orders2, and orders3 to satisfy the request. The only way Query Rewrite knows to filter out the other nine tables is to add CHECK constraints for every table, and then to determine the contradiction between the CHECK constraints and the query constraint.

For example, if the CHECK constraint on order4 is added, Query Rewrite sees the following compound predicate, which is a contradiction.

EXTRACT(MONTH FROM o_orderdate)<= 3     -- from query
AND
EXTRACT(MONTH FROM o_orderdate)=4       -- from CHECK constraint

For this particular case, Query Rewrite can simply eliminate this step. In general, Query Rewrite needs to know if a set of conditions is satisfiable.

Of course, you would rarely submit a contradictory query that does not return results regardless of the data like a=1 AND a=2. The example in the previous paragraph indicates the need for such checks. As previously stated, this issue is referred to as the satisfiability problem. Any solution to satisfiability generates a set of conditions and either declares them to be FALSE to denote that they are contradictory, or TRUE, which means that for some specific data, the set of conditions is satisfiable.

You should be aware of the cost of enforcing CHECK constraints if you decide to use them for horizontal table partitioning. This type of table partitioning is not identical to the row partitioning that you can specify with a partitioning expression in a PARTITION BY clause for a table or join index, though it is similar conceptually.

SAT-TC and Query Optimization

Query optimization introduces the following additional applications of satisfiability in the usage and maintenance of join indexes:
  • Determining whether a join index must be updated to keep it synchronized with an update operation on one or more of its base tables. The term update operation here signifies an insert, update, or delete operation against the base table in question.
  • Determining whether a join index partly or fully covers a query.

The join index update problem can be solved by using the satisfiability check for the conjunction of the join index conditions and the condition applied in the base table maintenance.

Assume the following sparse join index definitions:

CREATE JOIN INDEX j1 AS
  SELECT *
  FROM lineitem
  WHERE EXTRACT (MONTH FROM l_shipdate)<=6;

CREATE JOIN INDEX j2 AS
  SELECT *
  FROM lineitem
  WHERE EXTRACT(MONTH FROM l_shipdate)>=7;

Now consider the following delete operation on the lineitem table for example:

DELETE lineitem
WHERE EXTRACT(MONTH FROM l_shipdate)=12;

This implies that there is a need to update j2, but not j1. The system can make this decision because the Satisfiability check returns TRUE for the following predicate:

EXTRACT(MONTH FROM l_shipdate)=12
AND
EXTRACT(MONTH FROM l_shipdate)>=7

but FALSE for this predicate.

EXTRACT(MONTH FROM l_shipdate)=12
AND
EXTRACT(MONTH FROM l_shipdate)<=6

The problem of determining whether a join index completely or partially covers a query is solved as a set of satisfiability problems. Note that the use of satisfiability in this problem becomes more important when more complex conditions, like constants in WHERE clause predicates, are specified in the join index definition. This happens, for example, when you create a sparse join index. See the information about hash and join indexes in Teradata Vantage™ - Database Design, B035-1094 and the documentation for CREATE JOIN INDEX in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144 for more information about sparse join indexes.

Constant Predicate Evaluation

Constant predicate evaluation evaluates predicates that contain only constants (those having no column references) at the time a request is parsed. For example, consider the following complex predicate:

t1.pi=t2.pi
OR ‘a’ IN (‘b’,‘c’)

The system identifies and evaluates constant predicates and then replaces the predicate with the result if it is either TRUE or FALSE.

The IN predicate in this example can be evaluated to FALSE when the database parses the request, so Query Rewrite transforms it as shown below:

t1.pi=t2.pi

Domain-based Simplification

Domain-based simplification uses the domain range of the underlying column to simplify a predicate. For example, consider the following table definition:

create table t1 (a1 smallint)

With this table definition, the predicate a1 = 64000 leads to unsatisfiability, as the constant 64000 is not within the smallint range of values:

a1 = 64000 => FALSE

Similarly, consider the following predicate:

a1 in (32800, 80000, 1, 2, 3)

This predicate can be simplified as shown below, as the constants 32800 and 80000 are not within the smallint range of values:

a1 in (1, 2, 3)

Check Constraint-based Simplification

Check constraint-based simplification is similar to domain-based simplification, but the range of the underlying column is based on check constraints specified on the column. For example, consider the following table definition:

create table t1 (a1 int NOT NULL check (a1 < 10))

With this table definition, the following predicate leads to unsatisfiability, as the predicate is outside the allowed range of values based on the check constraint:

a1 > 30 => FALSE

Consolidating Single-Table Predicates

Query rewrite evaluates single-table predicates and rewrites them to eliminate overlapping or redundant conditions. For example, consider the following set of predicates:

a1 NOT IN (1,3) AND
a1 <4 AND
a1 >= 2

If a1 is an INTEGER column these predicates can be simplified as follows:

a1=2

Similarly, consider the following predicate:

a1 IN (1, 3, 5, 7, 9) AND
a1 > 4

where a1 is an INTEGER column.

The predicate can be simplified as follows:

a1 IN (5,7,9)

Consolidation of single-table predicates enables the Optimizer to calculate more accurate selectivity estimates while at the same time eliminating the execution of redundant predicates.

Query Rewrite also supports simplification of single-table predicates for the BEGIN and END bound functions on Period columns.

For example, consider the following predicate based on the BEGIN bound function:

BEGIN(b)>DATE ‘2005-02-03’ AND
BEGIN(b)>DATE ‘2010-02-03’

Query Rewrite simplifies this to the following predicate:

BEGIN(b)>=DATE ‘2010-02-04’

Similarly, consider the following predicate based on the END bound function:

END(b)< DATE‘2005-02-03’ AND
END(b)< DATE‘2010-02-03’

Query Rewrite simplifies this to the following predicate:

END(b)<= DATE‘2005-02-02’

Consolidation of single table predicate can lead to unsatisfiability as follows:

a1 NOT IN (1, 2, 3) AND
a1>=1 AND
a1<=3

If a1 is an INTEGER column, the above conditions are non-overlapping and the predicate is rewritten as follows:

0=1

0=1 evaluates to FALSE.

In other cases, the predicate can be simplified to a predicate that does not constrain the set of possible values, as shown:

a1>1 OR a1<2 

The predicate can be rewritten as follows:

a1 IS NOT NULL

For a more complex example, consider the following predicate:

(a1 >= 1 AND a1 <= 3) OR
(a1 >= 4 AND a1 <= 10)

If a1 is an INTEGER column, the predicate can be rewritten as follows:

a1 >= 1 AND a1 <= 10

Constant Movearound

The constant movearound rewrite can enable better selectivity estimates and wider use of indexes by the Optimizer. Constant movearound attempts to move constants from one side of a boolean comparison operator to the other side in order to rewrite predicates of the following form:

<column> <±> <constant_1> <comparison_operator><constant_2>

Constant movearound rewrites predicates of the form above to predicates of the following form:

<column> <comparison_operator> <constant_3>

The method does this by removing the plus or minus operation from the column side of the predicate and then adding its negation to the constant-only side of the predicate. The constant expression is then folded. Any errors that occur during folding, such as overflow or underflow, cause the rewrite to be rejected in favor of using the original expression.

This transformation is only done for the plus and minus arithmetic operators.

For a simple example, consider a1 + 1 > 4, which can be rewritten as a1 > 3.

The constant expression that is moved can also be an INTERVAL expression, as in DATE operations.

For example, consider the following predicate:

date_col + INTERVAL '3' MONTH <= '2007-03-31'

Using constant movearound, this predicate can be rewritten as follows:

date_col <= '2006-12-31'

Constant Substitution

The constant substitution rewrite substitutes the values of columns, if they are available, and attempts to derive unsatisfiability or to simplify the predicate, if possible. Predicates of the following form are simplified by substituting the values for column_1 and column_2 whenever possible:

<column_1> <±> <constant_1> <comparison_operator>
<column_2> <±> <constant_2> 

As an example, consider the following predicate:

a = 10 AND
b = 20 AND
a + 2 = b + 1

Substituting the value 10 for a and 20 for b, the predicate can be simplified to 12 = 21, which is FALSE.

For additional examples, consider a > a + 1, which simplifies to FALSE and a >= a - 1, which is rewritten as a IS NOT NULL.

Simplification by Distribution

This simplification by distribution rewrite attempts to simplify complex AND/OR predicates by using distributive properties.

Consider the following predicate:

a > 5 AND b < 6 AND (a < 2 OR b > 9)

By distributing the AND into the OR branch, the predicate can be simplified as follows:

a > 5 AND b < 6 AND ((a > 5 AND a < 2) OR (b < 6 AND b > 9)) =>
a > 5 AND b < 6 AND ( FALSE OR FALSE) =>
a > 5 AND b < 6 AND FALSE =>
FALSE

Factoring Common Predicates

The factoring of common predicates is another rewrite based on distributive properties. If a complex AND/OR predicate has some common predicates across the AND/OR branches, these common predicates can be factored out to simplify the predicate.

Consider the following predicate:

(a1 = 1 AND b1 = 1 AND c1 = 1) OR
(a1 = 1 AND b1 = 1 AND d1 = 1) OR
(a1 = 1 AND e1 = 1)

Note that (a1 = b1 AND b1 = 1) is common to the first two branches of the overall OR predicate. In the first step of factoring, (a1 = b1 AND b1 = 1) can be factored out, resulting in the following equivalent predicate:

(a1 =1 AND b1 = 1 AND (c1 = 1 OR d1 = 1)) OR
(a1 = 1 AND e1 = 1)

After the first step, note that (a1 = 1) is common to the remaining two branches of the OR predicate. In the second step of factoring, (a1 = 1) can be factored out, resulting in the following equivalent predicate:

a1 = 1 AND ((b1 = 1 AND (c1 = 1 OR d1 = 1)) OR e1 = 1)

Simplification Based on Containment

Simplification based on containment attempts to simplify complex AND/OR predicates by using the set containment properties of set algebra. There are two kinds of containment rewrite rules: the OR containment rule and the AND containment rule.

The OR containment rule is as follows:

If A contains A',
   A OR (A' AND ..) => A

For example, a > 5 OR (a > 7 AND b < 6) can be simplified to a > 5.

The AND containment rule is as follows:

If A' contains A,
   A AND (A' OR ..) => A

For example, a < 10 AND (a < 12 OR b < 6) can be simplified to a < 10.

Duplicate Predicate Removal

Duplicate Predicate Removal eliminates identical conjuncts, disjuncts, or both from AND and OR predicates. The maximum number of duplicate conjuncts and disjuncts compared per predicate is 100.