16.10 - Query Rewrite - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

About Query Rewrite

The Query Rewrite Subsystem is invoked by the Parser just prior to query optimization, and is the first phase of optimization performed in the query optimization process. For the Teradata system, query rewrites are mostly performed in the Query Rewrite Subsystem of the Parser; however, some query rewrites are performed in the later phases.

Function of the Query Rewrite Subsystem

The fundamental motivation behind the Query Rewrite Subsystem is to flatten the query in order to make optimization of query components such as views, correlated subqueries, and aggregates more tractable for optimization.

No query optimizer can consistently rewrite a poorly written query to make it as efficient as a well-written, semantically equivalent query, but a good query optimizer can certainly take significant steps to close the performance gap between the two.

Stated semi-formally, Query Rewrite is the process of rewriting query Q as query Q’ such that the following criteria are both true:

  • Query Q and Query Q’ produce the identical answer set.
  • Query Q’ runs faster (which means that it is less costly) than Query Q.

The various Query Rewrite techniques can either be rule-based (such as predicate move around) or cost-based (such as join index substitution)

With many SQL queries now being created by query generator tools that do not write optimally efficient code for a given target SQL processor, Query Rewrite has become more crucial than ever. Even trivial SQL requests can be written in an enormously large number of different ways. Consider the following verbally expressed query: “Get the names of suppliers who supply part P2.” Using the features available in the ANSI SQL-92 version of the language, it is possible to express this query in at least 52 different ways, assuming the system must access 2 tables in the process of returning the answer set. And this example only indicates the number of possible ways of writing the verbally expressed query as an SQL request. It does not even touch on the number of internal rewrites of those 52 individual SQL queries that various query optimization software packages might possibly undertake.

This topic provides a survey of the stages of query rewrite undertaken by the Teradata Database Query Rewrite subsystem. The information is provided only to help you to understand what sorts of things Query Rewrite does and the relative order in which it does them.

Teradata Database Query Rewrite Processes

The following processes list the logical sequence of the processes undertaken by Query Rewrite and the Optimizer as they optimize a DML request. DDL and DCL requests are also optimized, but the primary goal of optimization is to ensure that DML requests run with as low a cost as is possible.

Query Rewrite engages in the following process stages:

  1. Query Rewrite receives the ResTree as its input from the Resolver (see Resolver) or from Parameterized Value Peek (see Parameterized Requests).
  2. Teradata Database performs constant predicate evaluation (see Constant Predicate Evaluation) during the Query Rewrite preprocessing phase. This is done by searching for predicates that do not contain column references and evaluating the condition at the time the code is parsed. The system replaces the predicate with the result when the result is TRUE or FALSE, but not when it is UNKNOWN.
  3. Wherever possible, Query Rewrite pushes projections into spooled views (see Predicate Pushdown and Pullup).

    This means that only those columns that are referenced by the main query are retained and pushed into spooled views, while other projections are dropped from the ResTree.

    Pushing projections into views can also enable views to be folded that would otherwise continue to be spools. For example, views containing CASE expressions are not always merged; however, if the CASE expressions are removed from the view, view folding sometimes becomes possible.

    Pushing projections can also enable view elimination that would not otherwise occur.

    Some views are guaranteed to return only a single row (for example, when their select list specifies a single aggregate expression). If a query does not reference any columns of a single-row view, query rewrite can reduce the view definition to SELECT 0; meaning that no table is referenced in the view.

  4. Whenever applicable, large IN and NOT-IN predicates (called IN-lists), are rewritten to be separate spools, which participate in the join planning with the other relations in the query. These spools hold the values from the large IN-lists. This process prevents the large IN-lists from participating in predicate-related query rewrite and optimization processes, where the very large lists can slow parsing.
    There are two types of IN-list spool rewrites:
    • Subquery-based rewrite is used for large IN-lists that are not part of CASE expressions
    • Outer-join-based rewrite is used for IN-lists that are part of CASE expressions.
  5. Wherever possible, Query Rewrite converts outer joins to inner joins (see Outer Join-to-Inner Join Conversion).

    Outer joins are converted to inner joins if the conversion does not produce non-matching rows independently of the data.

    The rules for this are that Query Rewrite converts an outer join to an inner join if the following statements are true:

    • The WHERE clause contains one or more null filtering conditions on the inner table.

      A null filtering condition, or NFC, is any predicate that evaluates to FALSE for nulls.

    • The outer join is involved with another join, and the other join has at least one null filtering condition on the inner relation of the outer join.

      In this case, the other join filters out the non-matching rows of the outer join, making it possible to be converted to an inner join.

      The following additional rule applies only for full outer joins:

      If both of its relations have null filtering conditions, convert the outer join to an inner join.

      Otherwise, do the following conversions:

      IF its … Has an NFC, then convert the full outer join to a …
      left relation left outer join
      right relation right outer join.
  6. Whenever possible, Query Rewrite folds views (see View Folding). You should always read views to mean views and derived tables. Views and derived tables are semantically identical, and the Query Rewrite subsystem treats them so.
  7. When doing folding, there are 2 categories of views: Type 1 view folding, which is any view that is not a Type 2 view, and Type 2 view folding.

    The following are the Type 1 spooling conditions (if any of the conditions is true):

    • Views with aggregates (views with GROUP BY, HAVING, WITH BY or aggregates in the select list) are spooled if any of the following conditions are satisfied:
      • The view is not the only object in the query. Views can be folded in this case if the aggregation can be pulled up after joins.

        This optimization is referred to as lazy aggregation, and it is not implemented in Teradata Database.

      • The query is an ABORT or ROLLBACK request. The ABORT and ROLLBACK statements have limited syntax and do not allow clauses like GROUP BY and HAVING.

        Folding an aggregate view in this case could trigger an error for the abort request, and that is why aggregate views are spooled.

      • The main query has aggregations.

        Folding views in this case may cause nested aggregations, which Teradata does not support. It also may cause 2 levels of GROUP BY (both view and main query have a GROUP BY) which is also not supported.

      • The main query has window statistical functions.

        Folding views in this case might cause nested aggregations, which Teradata does not support. It also might cause 2 levels of GROUP BY (both view and main query have a group by) which also is not supported.

      • The view has extended grouping.

        As a side effect of folding a view is that conditions can be pushed into the tables referenced in the view. This could be a problem for views with extended grouping.

      • Both the main query and the view have outer joins and the view has a WHERE clause. The restriction is based on the difficulty of merging the WHERE clause of the view into the main query.

        This case can be folded by moving the WHERE clause condition to the right place.

        If the view is an outer table, the WHERE clause can be combined with the WHERE clause of the main query.

        If the view is an inner table the WHERE clause can be combined with the ON clause that connects the view to the main query.

    • The view is the only object in the main query, but it references a constant in the definition and has no GROUP BY or HAVING clause.

      This case can be folded as well. A class of views called single-row views are identified as those aggregate views with no GROUP BY and HAVING clauses. Such views are guaranteed to produce a single row regardless of the data.

      If the main query only references constants from the view then Teradata Database can do either of the following things:

      • Convert the view to a simple SELECT of the constants referenced in the query with no other clauses. If the main query references no columns of the view, the view can be simply rewritten as SELECT 1;
      • Remove the view altogether from the query and replace the view reference in the query with the constants.
    • Views with a DISTINCT operator. The presence of the DISTINCT operator in the view cannot be pulled up to the main query in general, which is why such views are spooled.
    • The view is defined on a single table with a WHERE clause and is involved in a full outer join. The problem here is with the difficulty of where to place the view in the WHERE clause.
    • The view is an inner table of an outer join in the main query and has a constant, CASE expression, or ZEROIFNULL expression as well as one of the following conditions:
      • The “view” is a derived table.
      • The view is one of the relations specified in a full outer join.
      • The constant has casting.
      • The constant, CASE, or ZEROIFNULL expression is referenced in an aggregate of the main query.

        The system applies Type 2 view folding to cases that are not described by the preceding list.

    • Views with set operations (UNION, MINUS or INTERSECT).
    • Views with windowed statistical functions, a QUALIFY clause, or a SAMPLE clause.
    • The main query is an UPDATE or DELETE request and the view is defined with outer joins. This is an issue because UPDATE and DELETE statements do not allow outer joins.
      • Type 2 view folding is designed to handle problems that can otherwise occur under the following conditions:
      • Views and derived tables have constants or null manipulating CASE expressions in their definitions.
      • And, such views and derived tables are used as inner tables in outer join operations.

      The constants and null manipulating case expressions COALESCE and ZEROIFNULL are referred to as special expressions across the rest of this section to denote Type 2 Views.

      The following example provides an idea of how Type 2 view folding works:

      Suppose you submit the following query:

           SELECT ...
           FROM T1 --> T2 --> v ON … ON …;

      where -->  indicates the join order and v(c,d,e) is defined as follows:

           CREATE VIEW v(c,d,e) AS
             SELECT T3.C, T3.D, 10
             FROM T3 --> T4 ON <..>;

      The expected result of the query is the following:

       t1	  t2	  t3	 t4
      ---     ---     ---	---
        1 	  1 	  1 	 1
        2 	  2 	  2 	 ?
        3 	  3 	  ? 	 ?
        4 	  ? 	  ? 	 ?

    After the view folding the query looks like the following SQL text:

         SELECT t1.a, t2.b, t3.c, t3.d, 10 AS v.e
         FROM t1 --> t2 --> t3 --> t4 ON <..> ON <..>;

    The expected result of the query is the following:

    t1.a	 t2.b	v.c	 v.d	  v.e
    ----     ----    ---	 ---	  ---
       1 	   1      1 	  1	   10
       2 	   2      2  	 ?	   10
       3 	   3      ? 	  ?	    ?
       4 	   ?      ? 	  ?	    ?

    Suppose the system joins t2 -->  t3 first. Call the result res1. Because the join is done between a base relation t2 and a view relation t3, the system adds a derived column. As you can see, the derived column carries nulls for the non-matching rows and non-null values for the matching rows.

    The first join undergoes the derived column setup phase.

    res1
    
    t2.b	 t3.cd	 dc (derived column)
    ----     -----	 -------------------
       1 	    1 	 	  	       1
       2 	    2 	   	 	      1
       3 	    ? 	   	   	    ?

    Suppose the system joins t1 -->  res1 which, when expanded, is t1 -->  (t2 -->  t3)). In this case, the derived column comes from the inner table res1. Call this new result res2. In this join, the derived column undergoes the propagation phase.

    res2
    
    t1.a	 t2.b	 t3.cd	 c (derived column)
    ----     ----     -----	 ------------------
       1 	   1 	    1 	 	     	   1
       2 	   2 	    2 	   	    	  1
       3 	   3 	    ? 	   	    	  ?
       4 	   ? 	    ? 	   	    	  ?

    Suppose the system now makes the final join between res2 and t4 which, when expanded, is (t1 -->  (t2 --> t3)) --> t4). In this join, the materialization and execution of the special expression occurs. As can be seen from the result, the expression is reported as 10 for all the rows where the derived column is 1, and reports null for rows where the derived column is null.

    res3
    
    t1.a	 t2.b	 t3.c(v.c) 	   t4.d(v.d)   	v.e.(special expression '10')d	 
    ----     ----    ----------        ---------       ------------------------------
       1 	   1 	        1    	    	1				                   10
       2 	   2        	 2 	       	?			                       10
       3 	   3        	 ?            	?				                    ?
       4 	   ? 	        ?                ?				                    ?

    Query Rewrite can apply view folding at any time during the rewrite process, which allows the subsystem to benefit from other rewrites. For example, conversion of a full outer join to an inner join allows a view to be folded, but it would be spooled if it had not been converted from a full outer join.

  8. Eliminate conditions through satisfiability and transitive closure (see Predicate Simplification).
  9. Push predicate conditions into spooled views (see Predicate Marshaling).

    Pushing predicate conditions into views helps performance by reducing the size of spooled view files.

  10. Eliminate redundant joins (see Eliminating Redundant Joins).
  11. Eliminate set operation branches (see Eliminating Set Operation Branches).
  12. Push joins into UNION ALL branches (see Pushing Joins Into UNION ALL Branches).
  13. Repeat stages 3 through 9 one additional time.
  14. Query Rewrite produces the ResTree´ as its output, which is semantically equivalent to the ResTree input, and passes it to the Optimizer.

    The system replaces the existing predicate with the predicate <column1>*=<column2>, where the syntax element *= specifies an equality marked with the NULLEQ flag.

Query Rewrite performs NULLEQ rewrites during its postprocessing phase. This rewrite looks for predicates of the form <column1>=<column2> OR (<column1> IS NULL AND <column2> IS NULL).

The maximum number of ORed conditions or IN list values that you can specify for such a predicate is 1,048,547.

Query Rewrite Driver

The Query Rewrite Driver is the component of the Query Rewrite Subsystem that applies rewrite rules to each block of a request. The Query Rewrite Subsystem passes the rule set to the query rewrite driver with each request to be rewritten. The query rewrite driver maintains information about which rules are enabled for each block of the request. Initially, all rules are enabled for each block.

The function of the query rewrite driver is to apply the set of rewrite rules to each block of a request and, by exploiting interactions among the various rewrites, to re-enable and re-apply rewrites when possible.

The query rewrite driver takes the next sequential rule to be applied from the rule set and applies it, either successfully or unsuccessfully, to each block in the order that is appropriate for the rule. When a rule is applied to a block, the query rewrite driver takes the following actions:

  • The rule is disabled for that block regardless of whether it was successfully applied or not.
  • Any triggers for the rule are applied.

A trigger simply re-enables a rewrite rule for some block based on the interactions presented above.fpu A trigger might re-enable a rule for the block to which the rule was applied, or for a containing (parent) or contained (for example, a subquery or view) block.

The query rewrite driver makes 2 passes through the rule set, applying each rule to each block (if the rule is enabled for that block) on both passes. Two passes are made because this provides each rule with the opportunity to trigger every other rule.

To illustrate the functioning of the query rewrite driver, consider the following simple example:

     CREATE VIEW jan_feb_sales AS
     SELECT sales_key, sales_date, sales_store_key,
            sales_product_key, quantity, amount
     FROM sales1
     UNION ALL
     SELECT sales_key, sales_date, sales_store_key,
            sales_product_key, quantity, amount
     FROM sales2;
     SELECT SUM(amount*quantity)
     FROM jan_feb_sales
     WHERE EXTRACT(MONTH FROM sales_date)=1;

The Query Rewrite Driver processes this request as indicated by the following process stages. After each rule is applied to a block, it is disabled for that block. Also note that the stages of this process as outlined here indicate only some of the more interesting interactions that can be enabled by a rewrite rule.

  1. All rules are enabled for all blocks in the request.
  2. Projection pushdown is applied to all blocks.

    The query rewrite driver removes sales_key, sales_store_key, and sales_product_key from the select lists of each branch of the UNION ALL because they are not referenced by the containing block.

  3. Outer-to-inner join conversion is attempted, without success, on each block.
  4. View folding is attempted, without success, on each block.

    The only block that is a candidate for folding is the jan_feb_sales view. This block cannot be folded because its containing block still has a WHERE clause with the predicate (EXTRACT(MONTH FROM sales_date)=1), and the select list of the containing block does not match the select lists of each branch of the UNION ALL.

  5. Predicate pushdown is applied and the predicate  EXTRACT(MONTH FROM sales_date) = 1 is pushed into each branch of the UNION ALL.
  6. Predicate simplification is done.

    This leads to an unsatisfiable condition in the UNION ALL branch containing sales2 because sales2 has the CHECK constraint EXTRACT(MONTH FROM sales_date)=2, and this contradicts the condition pushed into the block in Stage 5.

    This also includes an attempt at predicate simplification, without success, on each block.

  7. Join elimination is attempted, without success, on each block.
  8. Set operation branch elimination can be applied to the UNION ALL operation because one branch contains an unsatisfiable condition.

    The UNION ALL is replaced by the remaining branch. Applying the triggers for this rule re-enables view folding in the parent SELECT block.

  9. Pushing joins into UNION ALLs is attempted, without success, on each block.

At this point the query rewrite driver has completed the first pass. The query rewrite driver then makes a second pass over the rule set. Some rule interactions re-enable certain rules for some blocks. For this example, the interesting interaction occurs in Stage 8. The UNION ALL was replaced by the remaining SELECT request, and view folding, which had been disabled for each block in Stage 4, was re-enabled for the jan_feb_sales view.

On this second pass, the view is successfully folded and the query is rewritten as shown in the following lines:

     SELECT SUM(amount*quantity)
     FROM sales1
     WHERE EXTRACT(MONTH FROM sales_date)=1;

In this simple example, there was only one interesting rule interaction that enabled a rule that had been unsuccessfully applied earlier, which occurred in Stage 8 during the first pass of the query rewrite driver over the request. For complex queries, however, many interesting rule interactions might occur.

The individual rewrites interact with each other because applying a rewrite to a query block can enable other rewrites to be applied either to the same query block or to a parent block or sub-block of the block. To see this, consider the following example:

     SELECT MAX(total)
     FROM SELECT (product_key, product_name,
                  SUM(quantity*amount) AS total
          FROM sales, product
          WHERE sales_product_key = product_key
          GROUP BY product_key, product_name) AS v
     WHERE product_key IN (10, 20, 30);

Pushing the predicate from the outer query block transforms the derived table (recall that the semantics of derived tables are identical to those of views) v into the following:

     SELECT product_key, product_name, SUM(quantity*amount) AS total
     FROM sales, product
     WHERE sales_product_key=product_key
     AND   product_key IN (10, 20, 30)
     GROUP BY product_key, product_name ;

Transitive closure can now be applied to this block to derive the new predicate sales_product_key IN (10, 20, 30). This newly added predicate can then be pushed into each branch of the sales view. This example shows that predicate pushing can enable transitive closure, which can, in turn, enable predicate pushing.

The example also illustrates an important property of the set of rewrite rules: the application of one rewrite can enable another rewrite that might have previously been applied to a block, either successfully or unsuccessfully.

For this example, consider what would happen if the query rewrite driver applied transitive closure to each block first and then attempted to apply predicate pushing to each block, or vice versa, but then did not retry the first rewrite after applying the second. Neither of these simple sequences leads to the predicates getting pushed into each branch of sales.

Another property of each rewrite that is important to understand is the order in which it should be applied to each block of a query. Each rewrite can be applied either from the top down to each block before the blocks it contains or from the bottom up to each block contained in the block before the block itself.

For example, consider the following view definition and query against it:

     CREATE VIEW sales_by_product AS
     SELECT product_key, product_name, SUM(quantity*amount) AS total
     FROM sales, product
     WHERE sales_product_key = product_key
     GROUP BY product_key, product_name;

     SELECT SUM(total) AS total_sales
     FROM sales_by_product;

For this query, projection pushdown can have a cascading effect by applying it from the top down. The only column of the view sales_by_product that is referenced is total, so the projection pushdown rewrite can remove both product_key and product_name from the select list of this view.

Furthermore, because sales_by_product only references sales_product_key, quantity and amount in the sales view select list, all of the other columns in the select list of sales (sales_key, sales_date, and sales_store_key) can be removed from each branch of the sales view. You can see that projection pushdown can have a cascading effect, because removing columns from the select list of a containing query block can cause columns in the select lists of nested views to become unreferenced, and thereby eligible for removal.

Likewise, predicate pushdown is most effectively applied from the top down because pushing a predicate into a data block might enable pushing the predicate again into a sub-block of its containing block.

On the other hand, view folding is done from the bottom up because that ordering of the process guarantees that once folding is attempted on each view in a block, it is not necessary to try folding again for that block unless the rule is re-enabled by another rewrite.

Obviously, the number of interactions among the individual rewrites is large. It is also true that no single order of applying each rewrite once is sufficient to guarantee that each block is rewritten completely because applying a rule may re-enable another rule that was previously applied, whether successfully or not.

Query Rewrite cannot eliminate SEQUENCED inner joins from a view unless the parent table of the view is normalized and the normalize column has VALIDTIME. However, if the view or derived table defined on temporal tables is qualified by a CURRENT or AS OF modifier, Query Rewrite can enable join elimination to exclude the inner join semantics of a SEQUENCED view or derived table by pushing the CURRENT or AS OF qualifier into the view or derived table. To optimize this, the tables should have a CURRENT or SEQUENCED referential integrity relationship defined between them. Note that join elimination does not occur if a request specifies conditions on the system-projected VALIDTIME column of the view.