15.10 - Predicate Simplification - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
15.10
category
Programming Reference
User Guide
featnum
B035-1142-151K

Predicate simplification is a part of the query rewrite system that transforms a query into an equivalent form, but one that is simpler and more amenable to query optimization. Several of the key categories of predicate simplification are outlined in this section.

The SAT-TC (SATisfiability-Transitive Closure) rewrite analyzes a set of predicates to determine if either of the following can be exploited to rewrite a request.

  • A contradiction.
  • A simple example is a condition like a=1 AND a=0.

  • Inferring new predicates from the existing predicate set by using transitive closure.
  • For example, consider a request that specifies the predicate a=1 AND a=b. This implies that b=1, which might be useful information for rewriting or otherwise optimizing the request.

    Other examples of deriving new conditions by using transitive closure include the following samples:

         A=B AND A=C  B=C
         A=5 AND A=B  B=5
         A=5 AND A IS NULL  FALSE
         A=5 AND A IS NOT NULL  A=5
         X > 1 AND Y > X  Y >= 3
         X IN (1,2,3) AND Y=X  Y IN (1,2,3)

    Transitive closure is also applied in the context of extract predicates. For example, for the set of conditions {o_orderdate=‘1999-05-01’ AND EXTRACT(MONTH FROM o_orderdate)>2}, the system adds EXTRACT(MONTH FROM o_orderdate)=5 based on o_orderdate=‘1999-05-01’, which can then be used to simplify the existing EXTRACT predicate in the query.

    For some categories of DML requests specified with BEGIN … END constraints on a column with a Period data type, the query rewrite system adds the implied constraint BEGIN(column_1) <END(column_1) to help to identify unsatisfiable conditions. When these unsatisfiable conditions occur, EXPLAIN text for the query identifies them using the EXPLAIN text phrase unsatisfiable (see “Chapter 5 Interpreting the Output of the EXPLAIN Request Modifier” on page 533).

    As an example, consider the following table definition:

    CREATE TABLE t1 (a INTEGER b PERIOD (DATE));

    Suppose that you use the following query:

    SELECT *
    FROM t1
    WHERE BEGIN(b) > DATE ‘2010-02-03’ AND
          END(b) < DATE ‘2010-02-03’;

    The query rewrite system adds an implied constraint BEGIN(b) < END(b), and is able to derive unsatisfiability in conjunction with the query predicates.

    The query rewrite system can also apply transitive closure across query blocks, meaning transitive closure between outer and inner query blocks. This allows conditions to be pushed into and out of subqueries. The basic approach is to combine the query block conditions before computing the transitive closure. The IN and NOT IN operators are treated
    as = and operators, respectively. Derived conditions are added, as appropriate, to each query block.

    Consider a simple SQL example. The following SELECT request implies that x<1.

         SELECT * FROM t1 WHERE x IN (SELECT y FROM t2 WHERE y<1);

    Similarly, the following SELECT request implies that x < 3 and y is in (1,4).

        SELECT *
         FROM t1
         WHERE EXISTS (SELECT *
                       FROM t2
                       WHERE y<3
                       AND   x=y)
                       AND   x IN (1,4);

    In the current context, the conditions under analysis are referred to as connecting conditions. A connecting condition is one that connects an outer query with a subquery. See “Connecting Predicates” on page 114 for further information about connecting conditions.

    Satisfiability and Transitive Closure are done automatically by the Query Rewrite system, and do not require user intervention.

    In many decision support and CRM applications, Transitive Closure (TC) is needed to optimize date ranges and IN clauses. The following request illustrates one of these cases.

         SELECT l_shipmode, SUM (CASE 
                                     WHEN o_orderpriority = '1URGENT' 
                                     OR   o_orderpriority = '2-HIGH' 
                                     THEN 1 
                                     ELSE 0
                                 END) 
         FROM lineitem 
         WHERE l_commitdate  <  l_receiptdate 
         AND   l_shipdate    <  l_commitdate 
         AND   l_receiptdate >= '1994-01-01' 
         AND   l_receiptdate < ('1994-06-06') 
         GROUP BY l_shipmode;

    The new set of constraints that can be derived is as follows:

            (l_shipdate < l_receiptdate AND 
            l_commitdate <= '1994-06-04' AND
            l_shipdate   <= '1994-06-03')

    If lineitem or one of its covering indexes is either value‑ordered or row‑partitioned on l_shipdate, the new constraint l_shipdate<='1994-06-03' enables Teradata Database to access only a portion of the table instead of doing a full‑table scan.

    You might notice performance improvements for some queries because of the extra predicates TC adds. The extra predicates can also be seen in the EXPLAIN reports for requests that use them.

    One important aspect of using transitive closure for query rewrite is its application across ON and WHERE clauses. For example, suppose you have the following request:

         SELECT product_name, sum(amount*quantity) AS qty
         FROM product LEFT OUTER JOIN sales1 
         ON product_key=sales_product_key
         WHERE product_key=10 
         GROUP BY product_key, product_name ;

    For this request, transitive closure adds the inferred predicate sales_product_key=10 to the ON clause. This is particularly effective when the predicate added to the ON clause is a constraint on the primary index of the inner table in a join.

    Another important property of transitive closure is its ability to infer new predicates across the ON clauses of consecutive inner joins. For example, consider the following request.

         SELECT product_key, product_name, SUM(s1.amount *             s1.quantity+s2.amount * s2.quantity) AS total
         FROM product LEFT OUTER JOIN ((sales1 AS s1 INNER JOIN store 
                      ON  s1.sales_store_key=store_key) 
                      INNER JOIN sales2 AS s2 
                      ON  s2.sales_store_key=store_key 
                      AND s2.sales_store_key=10) 
                      ON  product_key=s1.sales_product_key 
                      AND product_key=s2.sales_product_key 
         GROUP BY product_key, product_name;

    To see this application of transitive closure, consider the consecutive inner joins between s1, s2, and store. The predicates in consecutive inner joins can be treated collectively by transitive closure as if they were specified in a single WHERE clause. In this example transitive closure processes these predicates as if they appeared as the following compound predicate.

         WHERE s1.sales_store_key = store_key 
         AND   s2.sales_store_key = store_key 
         AND   s2.sales_store_key = 10 

    By grouping the predicates logically like this, transitive closure can derive the new predicates store_key=10 and s1.sales_store_key=10 and then place them in the ON clause of the uppermost inner join in the set of consecutive inner joins. In this example, that is the ON clause joining to s2.

    If a condition is false mathematically, it is said to be contradictory or unsatisfiable. In this context, the opposite of contradictory is satisfiable, regardless of the data. An example might be specifying the conditions a=1 and a=2 in the same request. If Query Rewrite discovers such an unsatisfiable condition, it simplifies and optimizes the condition in such a way that all joins and retrievals can be done on a single AMP basis.

    One way to fix a contradictory condition is to add CHECK constraints to the request, enabling Query Rewrite to eliminate unnecessary conditions, thus permitting the Optimizer to later construct better execution plans (see “Applications of Transitive Closure” on page 95).

    For example, assume that you want to list all orders made in the first three months of the fiscal year. Assuming that you have access only to the ordertbl view, and the query looks like this.

        SELECT * 
        FROM order_tbl 
        WHERE EXTRACT(MONTH FROM o_orderdate)<= 3;

    Without being able to add such constraints to a query during query rewrite, the system must access all of the tables orders1, orders2, … orders12 using the constraint EXTRACT(MONTH FROM o_orderdate)<=3, even though it only needs to access orders1, orders2, and orders3 to satisfy the request. The only way Query Rewrite knows to filter out the other nine tables is to add CHECK constraints for every table to the query, and then to determine the contradiction between the CHECK constraint and the query constraint.

    For example, if the CHECK constraint on order4 is added to the corresponding step, Query Rewrite sees the following compound predicate, which is a contradiction.

         EXTRACT(MONTH FROM o_orderdate)<= 3
         AND 
         EXTRACT(MONTH FROM o_orderdate)=4

    For this particular case, Query Rewrite can simply eliminate this step. In general, Query Rewrite needs to know if a set of constraints is satisfiable.

    Of course, you would rarely submit a contradictory query that does not return results regardless of the data like a=1 AND a=2. The example in the previous paragraph indicates the need for such checks. As previously stated, this issue is referred to as the satisfiability problem. Any solution to satisfiability generates a set of constraints and either declares them to be FALSE to denote that they are contradictory, or TRUE, which means that for some specific data, the set of constraints is satisfiable.

    You should be aware of the cost of enforcing CHECK constraints if you decide to use them for horizontal table partitioning. This type of table partitioning is not identical to the row partitioning that you can specify with a partitioning expression in a PARTITION BY clause for a table or join index, though it is similar conceptually.

    In the sense that the term is used here, horizontal table partitioning refers to the sort of partitioning that Teradata Database has always done when it assigns the individual rows of a table to different AMPs by hashing on their primary index.

    Note, however, that the use of CHECK constraints in query optimization is done automatically by the Query Rewrite Subsystem and does not require user intervention.

    Another way to partition tables horizontally is to define them as row‑partitioned tables. This should be your first choice for horizontal partitioning, and you should consider using CHECK constraints only for those cases where a row‑partitioned table is not possible.

    Query optimization introduces the following additional applications of satisfiability in the usage and maintenance of join indexes:

  • Determining whether a join index must be updated to keep it synchronized with an update operation on one or more of its base tables. The term update operation here signifies an insert, update, or delete operation against the base table in question.
  • Determining whether a join index partly or fully covers a query.
  • The join index update problem can be solved by using the satisfiability check for the conjunction of the join index conditions and the condition applied in the base table maintenance.

    Assume the following sparse join index definitions.

         CREATE JOIN INDEX j1 AS 
           SELECT * 
           FROM lineitem 
           WHERE EXTRACT (MONTH FROM l_shipdate)<=6; 
     
         CREATE JOIN INDEX j2 AS 
           SELECT * 
           FROM lineitem 
           WHERE EXTRACT(MONTH FROM l_shipdate)>=7; 

    Now consider the following delete operation on the lineitem table for example.

         DELETE lineitem 
         WHERE EXTRACT(MONTH FROM l_shipdate)=12; 

    This implies that there is a need to update j2, but not j1. The system can make this decision because the Satisfiability check returns TRUE for the following predicate.

         EXTRACT(MONTH FROM l_shipdate)=12 
         AND 
         EXTRACT(MONTH FROM l_shipdate)>=7

    but FALSE for this predicate.

         EXTRACT(MONTH FROM l_shipdate)=12 
         AND 
         EXTRACT(MONTH FROM l_shipdate)<=6 

    The problem of determining whether a join index completely or partially covers a query is solved as a set of satisfiability problems. Note that the use of satisfiability in this problem becomes more important when more complex conditions, like constants in WHERE clause predicates, are specified in the join index definition. This happens, for example, when you create a sparse join index. See the chapter on hash and join indexes in Database Design and the documentation for CREATE JOIN INDEX in SQL Data Definition Language for more information about sparse join indexes.

    Constant predicate evaluation evaluates predicates that contain only constants (those having no column references) at the time a request is parsed. For example, consider the following complex predicate:

            t1.pi=t2.pi
         OR ‘a’ IN (‘b’,‘c’)

    The system identifies and evaluates constant predicates and then replaces the predicate with the result if it is either TRUE or FALSE.

    The IN predicate in this example can be evaluated to FALSE when the Teradata Database parses the request, so Query Rewrite transforms it as shown below:

         t1.pi=t2.pi

    Because constant predicate evaluation might otherwise interact with other rewrites, the Teradata Database executes it in the preprocessing phase of Query Rewrite rather than in the Query Rewrite system itself.

    Domain-based simplification uses the domain range of the underlying column to simplify a predicate. For example, consider the following table definition:

         create table t1 (a1 smallint)

    With this table definition, the predicate a1 = 64000 leads to unsatisfiability, as the constant 64000 is not within the smallint range of values:

         a1 = 64000 => FALSE

    Similarly, consider the following predicate:

         a1 in (32800, 80000, 1, 2, 3)

    This predicate can be simplified as shown below, as the constants 32800 and 80000 are not within the smallint range of values:

         a1 in (1, 2, 3)

    Check constraint-based simplification is similar to domain-based simplification, but the range of the underlying column is based on check constraints specified on the column. For example, consider the following table definition:

         create table t1 (a1 int NOT NULL check (a1 < 10))

    With this table definition, the following predicate leads to unsatisfiability, as the predicate is outside the allowed range of values based on the check constraint:

         a1 > 30 => FALSE 

    Query rewrite evaluates single table predicates and rewrites them to eliminate overlapping or redundant conditions. For example, consider the following set of predicates:

         a1 NOT IN (1,3) AND
         a1 <4 AND 
         a1 >= 2

    If a1 is an INTEGER column these predicates can be simplified as follows:

         a1=2

    Similarly, consider the following predicate:

         a1 IN (1, 3, 5, 7, 9) AND
         a1 > 4

    where a1 is an INTEGER column.

    The predicate can be simplified as follows:

         a1 IN (5,7,9) 

    Consolidation of single table predicates enables the Optimizer to calculate more accurate selectivity estimates while at the same time eliminating the execution of redundant predicates.

    Query Rewrite also supports simplification of single table predicates for the BEGIN and END bound functions on Period columns.

    For example, consider the following predicate based on the BEGIN bound function:

             BEGIN(b)>DATE ‘2005-02-03’ AND
             BEGIN(b)>DATE ‘2010-02-03’

    Query Rewrite simplifies this to the following predicate:

         BEGIN(b)>=DATE ‘2010-02-04’

    Similarly, consider the following predicate based on the END bound function:

             END(b)< DATE‘2005-02-03’ AND
             END(b)< DATE‘2010-02-03’

    Query Rewrite simplifies this to the following predicate:

         END(b)<= DATE‘2005-02-02’

    Consolidation of single table predicate can lead to unsatisfiability as follows:

             a1 NOT IN (1, 2, 3) AND
             a1>=1 AND
             a1<=3

    If a1 is an INTEGER column, the above conditions are non-overlapping and the predicate is rewritten as follows:

         0=1

    0=1 evaluates to FALSE.

    In other cases, the predicate can be simplified to a predicate that does not constrain the set of possible values, as shown:

            a1>1 OR a1<2 

    The predicate can be rewritten as follows:

         a1 IS NOT NULL

    For a more complex example, consider the following predicate:

         (a1 >= 1 AND a1 <= 3) OR
         (a1 >= 4 AND a1 <= 10) 

    If a1 is an INTEGER column, the predicate can be rewritten as follows:

         a1 >= 1 AND a1 <= 10

    The constant movearound rewrite can enable better selectivity estimates and wider use of indexes by the Optimizer. Constant movearound attempts to move constants from one side of a boolean comparison operator to the other side in order to rewrite predicates of the following form:

         <column> <±> <constant_1> <comparison_operator><constant_2> 

    Constant movearound rewrites predicates of the form above to predicates of the following form:

         <column> <comparison_operator> <constant_3>

    The method does this by removing the plus or minus operation from the column side of the predicate and then adding its negation to the constant‑only side of the predicate. The constant expression is then folded. Any errors that occur during folding, such as overflow or underflow, cause the rewrite to be rejected in favor of using the original expression.

    This transformation is only done for the plus and minus arithmetic operators.

    For a simple example, consider a1 + 1 > 4, which can be rewritten as a1 > 3.

    The constant expression that is moved can also be an INTERVAL expression, as in DATE operations.

    For example, consider the following predicate:

         date_col + INTERVAL '3' MONTH <= '2007-03-31'

    Using constant movearound, this predicate can be rewritten as follows:

         date_col <= '2006-12-31'

    The constant substitution rewrite substitutes the values of columns, if they are available, and attempts to derive unsatisfiability or to simplify the predicate, if possible. Predicates of the following form are simplified by substituting the values for column_1 and column_2 whenever possible:

         <column_1> <±> <constant_1> <comparison_operator>
         <column_2> <±> <constant_2> 

     

    As an example, consider the following predicate:

         a = 10 AND 
         b = 20 AND 
         a + 2 = b + 1

    Substituting the value 10 for a and 20 for b, the predicate can be simplified to 12 = 21, which is FALSE.

    For additional examples, consider a > a + 1, which simplifies to FALSE and a >= a - 1, which is rewritten as a IS NOT NULL.

    This simplification by distribution rewrite attempts to simplify complex AND/OR predicates by using the distributive properties of set algebra.

    Consider the following predicate:

         a > 5 AND b < 6 AND (a < 2 OR b > 9)

    By distributing the AND into the OR branch, the predicate can be simplified as follows:

         a > 5 AND b < 6 AND ((a > 5 AND a < 2) OR (b < 6 AND b > 9)) =>
         a > 5 AND b < 6 AND ( FALSE OR FALSE) =>
         a > 5 AND b < 6 AND FALSE =>
         FALSE

    The factoring of common predicates is another rewrite based on the distributive properties of set algebra. If a complex AND/OR predicate has some common predicates across the AND/OR branches, these common predicates can be factored out to simplify the predicate.

    Consider the following predicate:

         (a1 = 1 AND b1 = 1 AND c1 = 1) OR
         (a1 = 1 AND b1 = 1 AND d1 = 1) OR
         (a1 = 1 AND e1 = 1)

    Note that (a1 = b1 AND b1 = 1) is common to the first two branches of the overall OR predicate. In the first step of factoring, (a1 = b1 AND b1 = 1) can be factored out, resulting in the following equivalent predicate:

         (a1 =1 AND b1 = 1 AND (c1 = 1 OR d1 = 1)) OR
         (a1 = 1 AND e1 = 1)

    After the first step, note that (a1 = 1) is common to the remaining two branches of the OR predicate. In the second step of factoring, (a1 = 1) can be factored out, resulting in the following equivalent predicate:

         a1 = 1 AND ((b1 = 1 AND (c1 = 1 OR d1 = 1)) OR e1 = 1)

    Simplification based on containment attempts to simplify complex AND/OR predicates by using the set containment properties of set algebra. There are two kinds of containment rewrite rules: the OR containment rule and the AND containment rule.

    The OR containment rule is as follows:

         If A contains Aʹ,
            A OR (Aʹ AND ..) => A

    For example, a > 5 OR (a > 7 AND b < 6) can be simplified to a > 5.

    The AND containment rule is as follows:

         If Aʹ contains A,
            A AND (Aʹ OR ..) => A

    For example, a < 10 AND (a < 12 OR b < 6) can be simplified to a < 10.

    Duplicate Predicate Removal eliminates identical conjuncts, disjuncts, or both from AND and OR predicates. The maximum number of duplicate conjuncts and disjuncts compared per predicate is 100