Peeking at Parameterized Values in the Data Parcel

Teradata Database SQL Request and Transaction Processing, Release 15.10, B035-1142-151K


Teradata Database has the ability to peek at the parameterized values of a request (such as USING request modifier values, which are user-supplied constant data) in CLIv2 Data parcels, and then to resolve those variables in the Resolver. This enables the Optimizer to generate a specific, uncached plan for such requests when the system determines that to be the best way to handle the request rather than always generating a cached generic plan. As a general rule, specific plans are expected to have runtime performance that is significantly better than that of an equivalent generic plan.

Optionally, the parameterized request cache infrastructure can compare the estimates of the specific and generic plans and choose which of them to execute based on the estimated cost without executing the generic plan.

Parameterized value peeking also resolves DATE, CURRENT_DATE, TEMPORAL_DATE, CURRENT_TIMESTAMP, and USER values for all cacheable requests, including those for which a specific query plan is generated. Some cached plans that use the TEMPORAL_TIMESTAMP function are also peeked.

Plans that are TEMPORAL_TIMESTAMP-peeked are reusable as long as the session that matches the cache entry has a TEMPORAL_TIMESTAMP that is equal to or greater than the value marked in the plan. The semantics of the query are such that the plan can be reused when this criterion is met in addition to the existing cache-matching criteria.

The first time a request is submitted, the Optimizer generates a specific plan and determines whether to classify this request as a candidate for incremental planning and execution (IPE). See “Incremental Planning and Execution” on page 126 for more information about IPE. The Optimizer examines factors such as query complexity, parsing time, query band settings, and cost profile settings, and if IPE is applicable, it generates a dynamic plan; otherwise, it generates a static plan. To ensure that requests that are submitted only once are provided an optimal plan, the Optimizer always generates a specific plan before it generates a generic plan. See “Query Optimization Processes” on page 123 for more information about the sequence of processes the Optimizer follows when optimizing a request.

If a request is encountered a second time, its entry in the request cache indicates to the Parser to generate a generic plan. In this case, the Parser does not consider IPE and parameterized request peeking logic. The process produces a generic plan, and Teradata Database updates the request cache entry to record the new state of the request. The system then instructs the Dispatcher to collect the runtime attributes of the generic plan. Based on a comparison of parsing and run time attributes for the specific and generic plans, the Dispatcher chooses whether to use a specific or generic plan and records the choice in the request cache. Future occurrences of this request, if found in the request cache, honor this choice to use either a specific or a generic plan. If Teradata Database flushes the request cache entry, the Optimizer treats a subsequent submittal of the request as a new occurrence and the process restarts from the beginning.

This topic introduces some of the terminology that is specific to the parameter value peeking subsystem.

 

Specific plan

  An Optimizer plan that applies only to a single instance of a query and that is generated by using one or both of the following:

  • peeking at parameterized values and at the CURRENT_TIMESTAMP and USER built-in functions
  • using intermediate incremental planning and execution spool results

  Single-instance plans are called specific query plans. Teradata Database never caches specific query plans.

  The values obtained by peeking at a parameterized request can then be used to tailor a very specific query plan for that request.

  Specific plans can be either static or dynamic plans.

Generic plan

  An Optimizer plan that is generated without peeking at parameterized values or at the CURRENT_TIMESTAMP and USER built-in functions, and without using intermediate incremental planning and execution spool results, and that applies to most, if not all, instances of that query.

  Generic plans are static plans.

The presence of parameterized data in a request does not automatically mean that the Optimizer generates a specific request plan for it. There are many parameterized queries for which the query plan generated by the Optimizer does not depend on parameterized values. Such queries do not gain any added performance benefit from generating specific query plans as the result of parameterized value peeking. Furthermore, the negative performance impact of not caching a plan can be high if a request has a short execution time.

    For example, consider the following query that specifies a USING request modifier whose single variable x is also used to specify a parameterized predicate condition :x in its WHERE clause.

         USING (x INTEGER) 
         SELECT * 
         FROM table_1 
         WHERE column_1 = :x;

If there is a UPI defined on table_1.column_1, then the access path selected by the Optimizer is independent of the value of x because, irrespective of its value (assuming the value is a valid primary index value), a primary index access path is the best access path.
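For illustration only, a minimal sketch of a table_1 definition that would produce this behavior (the column list is an assumption, not taken from this manual):

     -- Hypothetical definition of table_1: the UPI on column_1 makes the access
     -- path for the predicate column_1 = :x independent of the peeked value.
     CREATE TABLE table_1 (
       column_1 INTEGER NOT NULL,
       column_2 VARCHAR(30))
     UNIQUE PRIMARY INDEX (column_1);

Because any valid value of :x hashes to exactly one row through the UPI, the specific and generic plans for this request are identical, and peeking provides no added benefit.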

    Date‑based parameter value peeking, which resolves the current date from DATE and CURRENT_DATE built‑in functions when one of those functions is specified in the request, can be used to generate either a specific or generic plan, depending on several other factors that might be present in the text. The terminology used for these date‑specific query plans is as follows:

  • A generic parameterized request with a resolved DATE or CURRENT_DATE value for which a query plan is generated is called a DateSpecific generic plan.
  • A specific parameterized request with a resolved DATE or CURRENT_DATE value for which a query plan is generated is called a DateSpecific specific plan.
If your system is enabled with Teradata Active System Management Category 3, and a request belongs to a workload defined with an enforcement priority level of tactical, then it is considered to be a HiPriority request (see the Teradata Viewpoint online help for details). Tactical queries are defined as requests whose per-AMP CPU time is less than or equal to one second.

    In the context of the parameter value peek subsystem, the following definitions apply:

     

Parameterized PK statement

  A single SQL statement with a set of parameterized variables and/or CURRENT_TIMESTAMP/USER built-in functions that are used in equality predicate conditions on the primary index of a nonpartitioned table or of a PPI row-partitioned table, or on a USI, of a single table with no ORed conditions.

  The term PK, which commonly represents Primary Key, in this case represents a primary or unique secondary index column set.

Parameterized PK request

  A request that contains only the following:

  • Parameterized PK statements
  • Null statements

  See SQL Data Manipulation Language for a definition of null statements.

Parameterized value peeking does not impact the treatment of parameterized PK requests, so the Optimizer does not generate a specific plan for those cases.
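As a hedged illustration (the table and column names are hypothetical, and customer_id is assumed to be the UPI of the table), a parameterized PK request could look like the following:

     -- Hypothetical parameterized PK request: the only parameterized condition is
     -- an equality on the UPI of a single table, with no ORed conditions, so the
     -- plan does not depend on the peeked value and a generic plan can be cached.
     USING (id INTEGER)
     SELECT *
     FROM   customer
     WHERE  customer_id = :id;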

    The following table indicates whether various types of parameterized request are parameterized‑dependent or parameterized‑independent.

     

simple parameterized request

  The Parser considers this type of request to be parameterized-independent.

  Reason: The request consists of PK statements and other statements that do not specify parameterized variables in any of their predicates.

  For example, consider the following table definition:

        CREATE TABLE emp (
          emp_id INTEGER,
          dept_id INTEGER)
        UNIQUE PRIMARY INDEX (emp_id);

  The following multistatement parameterized request is parameterized-independent, even though the second statement in the request is a parameterized PK statement, because the :x variable is specified on a UPI, so the access path is independent of the value of x, which means that it is also an exempt statement (see “Exempt Requests” on page 43):

        USING (x INTEGER)
        SELECT *
        FROM emp
        WHERE dept_id = 10
       ;SELECT *
        FROM emp
        WHERE emp_id = :x
        AND   dept_id = 12;

parameterized PK request

  The Parser considers this type of request to be parameterized-independent, by definition.

iterated request

  The Parser considers this type of request to be parameterized-independent, by definition.

all other parameterized requests

  The Parser considers these requests to be parameterized-dependent, by definition.

The word peeking means looking at the parameterized values from a Data parcel and evaluating date-based, CURRENT_TIMESTAMP, and USER built-in function constant values during query parsing, then using those values to investigate all potential optimization opportunities such as:

  • Satisfiability and transitive closure (see “Predicate Simplification” on page 94; a sketch follows below)
  • Optimum single-table access planning
  • Row partition elimination (see “Row Partition Elimination” on page 317)
  • Using covering secondary, hash, and join indexes in place of base tables (see “Query Rewrite” on page 72)

Peeking facilitates the optimization of certain categories of queries by inserting data values that are specific to those queries into the parse tree at an earlier stage of request processing than would otherwise be done. The system always caches generic plans because reusing a cached query plan saves parsing and optimization time. However, reusing a cached generic query plan is not the best approach to executing a query that provides constant literals that can be used to produce a specific plan tailored closely to that specific request.
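As a hedged sketch of the transitive closure opportunity (all table and column names here are hypothetical), peeking the value of :x lets the Optimizer derive an equality condition on the joined table as well:

     -- Hypothetical example: with :x peeked as a literal, transitive closure can
     -- derive the additional equality condition o.cust_id = <peeked value>,
     -- which can open up a better single-table access path on the orders table.
     USING (x INTEGER)
     SELECT *
     FROM   customer c, orders o
     WHERE  c.cust_id = :x
     AND    c.cust_id = o.cust_id;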

    The system does not cache specific plans for requests that are parameterized‑dependent (see “Data Parcel Peeking Terminology” on page 29 for the definition of parameterized‑dependent and parameterized‑independent requests), because those plans cannot be reused for otherwise identical queries that have different sets of parameterized or built‑in function‑supplied constant values, but it does cache the other information generated for specific plans. This information includes its SQL text hash, its host character set, its estimated execution costs, its parsing time, and its run time. Parameterized value peeking directly impacts only those requests that are not cacheable (see “Request Cache” on page 23).

    When the system determines that a plan for a request should not be cached, the parameterized request and DATE, CURRENT_DATE, TEMPORAL_DATE, CURRENT_TIMESTAMP, USER, and in some cases TEMPORAL_TIMESTAMP built‑in function values can be used to optimize the request instead of treating that data as parameters with unknown values. In other words, parameterized value peeking uses the literal data values from a parameterized request and date‑related built‑in functions, or both, to ensure that the Optimizer generates an optimal, uncached plan for that particular request rather than generating a generic plan and then caching it for future reuse.

    A specific plan generated by the Optimizer should be an optimal plan, and its runtime performance should be significantly better than the execution time for an equivalent generic plan. However, there can be cases when the runtime costs of a specific plan and its equivalent generic plan do not differ significantly. The impact of this failure to achieve significantly better performance from a specific plan is especially high for those cases where the parsing cost is high. In such cases, the system must examine the performance implications of regenerating the specific plan every time.

    The system monitors all such requests for their parsing and run times and keeps the information in the request cache to use when it must decide between generating a specific or generic plan for a request.

    The first time a parameterized request is submitted to a PE, determined by not finding the request in the request cache, the Parameterized Value Peek system peeks at the parameterized values for the request and instructs the Optimizer to generate a specific plan only if the query is not a parameterized PK request. Specific query plans are not cached because they cannot be reused for different parameterized or built‑in date function values.

    The Optimizer always generates specific plans before it generates generic plans to ensure that queries submitted to the system only once are provided an optimal plan.

If the parsing cost of a specific plan is only a small fraction of its execution time, then all subsequent requests execute specific plans (see “Parameterized Value Peeking Process” on page 34 for details). This is because if the overhead of parsing is small, the impact of not caching the request is insignificant.

Otherwise, when the system encounters a request for the second time (determined by interrogating the request cache), the Optimizer generates a generic plan for it. If the estimated cost of a previously executed specific plan is significantly less than the estimated cost of the equivalent generic plan, then Teradata Database parses the request again and instructs the Optimizer to generate and execute a specific plan for it (see “Parameterized Value Peeking Process” on page 34 for details).

    This avoids executing a bad generic plan for the cases where it is possible to generate a good specific plan. The system compares the plan cost estimates only if the CompareEstimates constant in the cost profile is enabled (CompareEstimates is disabled by default). Otherwise, the system executes the generic plan, measures its parsing and run times, and caches them in the request cache.

    The decision to cache a generic plan is based on the respective CPU run times of the specific and generic plans. The performance of some parameterized requests that are submitted only twice during the unspoiled lifetime of the request cache might be negatively affected by this because the request must be parsed on both its first and second submissions.

    For tactical queries, the system caches the generic plan and always uses it (see “Parameterized Value Peeking Process” on page 34 for details).

    For HiPriority queries, the system compares elapsed times instead of CPU times to determine whether the generic plan should be cached or not (see “Parameterized Value Peeking Process” on page 34 for details).


    You can control the availability of parameterized value peeking on your system by setting the DisablePeekUsing performance field in the DBS Control record (see Utilities for details).

     

IF the value of DisablePeekUsing is set to …    THEN parameterized value, CURRENT_TIMESTAMP, and USER peeking are …

FALSE                                           enabled for your system. This is the default.

TRUE                                            disabled for your system.

    The setting for DisablePeekUsing does not control whether the system caches requests that contain constant literal data obtained from date‑based built‑in functions only, without specifying a parameterized request. In this case, the Optimizer always generates and caches generic plans for queries that specify the built-in functions DATE, CURRENT_DATE, TEMPORAL_DATE, and in some cases, TEMPORAL_TIMESTAMP, but when the date value changes, the system flushes any cached plans with dependencies on that date value, and the Optimizer generates a new plan using the new date value.

    The following table defines the terms used in the stages of the Parameterized Value Peek component processes.

     

Key to Terms Used in the Parameterized Value Peek Process Stages: Time Factors

The following algorithmic factors are measured by Teradata Database.

Pset          Elapsed time for parsing a request for a specific plan.
              Elapsed time is defined as the clock time a request takes to complete its execution.

Rset          AMP elapsed time for the specific plan.

Rget          AMP elapsed time for the generic plan.

Pscpu         CPU parsing time for a request using the specific plan.
              CPU parsing time is defined as the CPU time taken to parse a request, generate a query plan for it, and pass that plan to the Dispatcher.

Rscpu         Total all-AMPs CPU time for the specific plan.

Rgcpu         Total all-AMPs CPU time for the generic plan.

PerAMPRgcpu   CPU time per single AMP for the generic plan.

    See “Optimizer Cost Functions” on page 312 for information about the cost profile algorithmic factors for parameterized value peeking.

    The following process outlines the stages the Parameterized Value Peek subsystem follows to determine whether a generic or specific plan should be used and cached for a given request. The process documented by this table is intended to be general and does not include stages that are specific to static and dynamic plans.

     

Stage 1

  IF the request is submitted for the first time
  AND the specific plan projects that the parsing time is less than one percent of the runtime,
  THEN the request is processed as Specific Always.

  Specific Always means that parameterized values are resolved and evaluated in the plan for all subsequent requests.

Stage 2

  IF the request that was submitted in Stage 1 is submitted a second time
  AND the CompareEstimates cost profile flag is set
  AND the generic plan estimate > (EstimateCostFilter cost profile flag value * specific plan estimate),
  THEN the request is processed as Specific Always.

Stage 3

  IF the request is HiPriority
  AND the UseHiPri cost profile flag is set
  AND the total elapsed time for the specific plan > (ElapsedTimeThreshold cost profile flag value * elapsed runtime of the generic plan),
  THEN the request is processed as Generic Always.

  Generic Always means that parameterized values are not resolved until the concrete steps for the request are generated, and the cached generic plan is used for all subsequent requests.

Stage 4

  IF PerAMPRgcpu ≤ (TacticalResp1 value * 1,000),
  THEN the request is processed as Generic Always.

Stage 5

  IF the request is HiPriority
  AND PerAMPRgcpu ≤ (TacticalResp2 value * 1,000),
  THEN the request is processed as Generic Always.

Stage 6

  IF the CPU path time for parsing the specific plan > (HighParsingPTThreshold cost profile flag value * (specific plan parsing time + runtime))
  AND |generic plan runtime - specific plan runtime| > (HighParsingRTThreshold cost profile value * specific plan parsing time),
  THEN the request is processed as Specific Always.

Stage 7

  IF |generic plan runtime - specific plan runtime| > (LowParsingRTThreshold cost profile flag value * specific plan parsing time),
  THEN the request is processed as Specific Always.

Stage 8

  IF none of the preceding conditions applies,
  THEN the request is processed as Generic Always.

Teradata Database does not replace parameterized variables for parameterized PK requests because they have a very short run time. The intent of eliminating PK requests from consideration is that such queries are known a priori to have a short run time, so caching their plans to avoid parsing the request each time it is submitted is beneficial. Additionally, PK requests are parameterized-independent by definition.

    Other request types, such as few-AMP requests, typically have a short run time and should benefit from caching as a general rule. However, it is not possible to determine whether such requests qualify for caching without first optimizing them, so they are not excluded from the Parameterized Value Peek logic.

    If the cost of a specific plan is better than the runtime cost of a generic plan as determined by their respective parsing and runtime costs, or if the estimate of a specific plan indicates significant cost savings over the estimate for the equivalent generic plan, then the system generates a specific plan for subsequent requests.

    If a request is not a parameterized PK request, then the system replaces USING variables, CURRENT_TIMESTAMP, and USER in all the conditions in the request, including the select list of a subquery or derived table.

    The system uses the following rule to decide which parameterized request variables are replaced in the request:

     

SELECT request

  Replace its parameterized variables and built-in functions in all places except the outermost select list.

  This also applies to SELECT requests that specify GROUP BY or set operators such as UNION, INTERSECT, and MINUS.

any other request

  Replace its parameterized variables and built-in functions in all places except where they occur in a parameter assignment list.
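As a hedged illustration of the SELECT rule (the table and column names are hypothetical), the occurrence of :x in the WHERE clause is replaced with the peeked literal, while the occurrence in the outermost select list is left as a parameter:

     -- Hypothetical illustration of the replacement rule for SELECT requests:
     -- the :x in the WHERE predicate is replaced with the peeked literal value,
     -- while the :x in the outermost select list is not replaced.
     USING (x INTEGER)
     SELECT :x AS requested_value, order_id
     FROM   orders
     WHERE  cust_id = :x;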

    The system resolves the DATE, CURRENT_DATE, and TEMPORAL_DATE (in some cases, the system also resolves the value of the built‑in function TEMPORAL_TIMESTAMP) built‑in function value for all requests, whether parameterized or not, and replaces the function with the actual date before the request reaches the Optimizer. This helps to generate a more optimal plan in cases of row partition elimination, sparse join indexes, and NUSIs that are based on DATE or CURRENT_DATE.
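As a hedged sketch of the sparse join index case (the table, column, and index names are hypothetical, not taken from this manual), resolving CURRENT_DATE before optimization is what allows the Optimizer to consider an index of this form for a query that restricts on the current date:

     -- Hypothetical sparse join index restricted to rows for the current date.
     -- Because the Resolver replaces CURRENT_DATE with the actual date before
     -- optimization, the Optimizer can match a predicate such as
     -- sale_dt = CURRENT_DATE against this index.
     CREATE JOIN INDEX sales_today AS
       SELECT store_id, sale_dt, amount
       FROM   sales
       WHERE  sale_dt = CURRENT_DATE;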

    The rule to replace these built‑in functions in the resolved parse tree is the same as is followed for replacing parameterized request variables (see “Replacing Parameterized Variables” on page 35).

    The DATE or CURRENT_DATE value is resolved in the Resolver phase and replaced with the value of the actual date. The SQL text otherwise remains the same. The system also notes the timestamp and uses it in OptApply instead of getting a new timestamp to build the USING row, for example. This synchronizes the timestamps for DATE, CURRENT_DATE, TEMPORAL_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, and TEMPORAL_TIMESTAMP.

    If the system finds a matching entry in the request cache pertaining to a previous date, the entry is marked for purging and the plan is regenerated as if the request is seen for the first time.

The EXPLAIN text for requests having a resolved DATE, CURRENT_DATE, or TEMPORAL_DATE value shows the resolved date. For example, assuming the following EXPLAIN SELECT request was submitted on February 20, 2007, the EXPLAIN text looks like the following example, with the relevant phrases highlighted in boldface type.

         EXPLAIN SELECT * 
                 FROM es 
                 WHERE process_dt = CURRENT_DATE;
     
    Explanation
    ---------------------------------------------------------------------------
      1) First, we lock a distinct GL."pseudo table" for read on a
         RowHash to prevent global deadlock for GL.es.
      2) Next, we lock GL.es for read.
      3) We do an all-AMPs RETRIEVE step from a single partition of
         GL.es with a condition of ("GL.es.process_dt = DATE
         '2007-02-20'") with a residual condition of ("GL.es.process_dt
         = DATE '2007-02-20'") into Spool 1 (group_amps), which is built
         locally on the AMPs.  The size of Spool 1 is estimated with no
         confidence to be 1 row.  The estimated time for this step is 0.03
         seconds.
      4) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.03 seconds.

    The items in the following list also affect parameterized value peeking.

  • If a USING request modifier is being explained, or is submitted with an INSERT EXPLAIN or DUMP EXPLAIN request, the system peeks at the parameterized variables if a Data parcel is also provided in the request. If no Data parcel is submitted, the system does not peek at the parameterized variables, and it generates a generic plan for the request.
  • If a parameterized request is submitted under a MultiLoad or FastLoad session, the system does not peek at the parameterized variables and current date, so there is no impact on request caching.
  • If a parameterized request is submitted under a FastExport session, the system peeks at the parameterized variables and current date for a SELECT request in order to generate a more optimal plan for exporting the data. Requests submitted in a FastExport session are not cached, so peeking at parameterized request variable values does not affect caching behavior.
  • If the Parser does not peek at the value of any of the parameterized request variables of a parameterized request (for example, when a USING variable is specified in the outermost select list and not in the WHERE condition), there is no impact on caching.

The request cache stores the plans of successfully parsed SQL DML requests so they can be reused when the same request is resubmitted. The request cache contains the text of the SQL request and its plastic steps. The plastic steps are referred to as the request plan or simply as the plan.

If parameterized value peeking is not enabled, when a parameterized request is seen and successfully parsed, it is cached immediately. Parameterized value peeking changes the behavior of the request cache for requests submitted in CLIv2 Execute or Both Prepare and Execute modes, as follows:

  • If a request specifies a DATE, CURRENT_DATE, or TEMPORAL_DATE built-in function, then the plan is generated for the specific date when the request is parsed. The specific date, when exposed during the optimization process, causes the Optimizer to generate a more optimal plan where applicable. For example, if the partitioning expression is date-based, and the condition on the partitioning column in the predicate specifies the CURRENT_DATE built-in function, then the Optimizer performs row partition elimination, generating a more optimal plan.

    Assume that table t4 is row partitioned on the date column d.

         EXPLAIN SELECT * 
                 FROM t4 
                 WHERE d BETWEEN DATE '2005-02-01' AND CURRENT_DATE;
     
          *** Help information returned. 12 rows.
          *** Total elapsed time was 1 second.
     

    The following is the EXPLAIN text output when the date is not replaced.

     
    Explanation
    ---------------------------------------------------------------------------
      1) First, we lock MYDB.t4 for read on a reserved RowHash in all 
         partitions to prevent global deadlock.
      2) Next, we lock MYDB.t4 for read.
      3) We do an all-AMPs RETRIEVE step from 23 partitions of MYDB.t4 with
         a condition of ("(MYDB.t4.d <= DATE) AND (MYDB.t4.d >= DATE
         '2005-02-01')") into Spool 1 (group_amps), which is built locally
         on the AMPs.  The size of Spool 1 is estimated with no confidence
         to be 2 rows.  The estimated time for this step is 0.09 seconds.
      4) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.09 seconds.
     

    Note that Teradata Database must scan 23 row partitions to process this query without peeking at the value of CURRENT_DATE (the relevant text is highlighted in boldface type).

    If parameterized value peeking is enabled, the following is the EXPLAIN text output for the same request. Notice the EXPLAIN output indicates 9 fewer row partitions to scan due to parameterized value peeking (the relevant text is highlighted in boldface type).

         EXPLAIN SELECT * 
                 FROM t4 
                 WHERE d BETWEEN DATE '2005-02-01' 
                         AND     CURRENT_DATE;
     
          *** Help information returned. 13 rows.
          *** Total elapsed time was 1 second.
    Explanation
    ---------------------------------------------------------------------------
      1) First, we lock MYDB.t4 for read on a RowHash in all partitions 
         to prevent global deadlock.
      2) Next, we lock MYDB.t4 for read.
      3) We do an all-AMPs RETRIEVE step from 14 partitions of MYDB.t4 with
         a condition of ("(MYDB.t4.d <= DATE '2006-03-02') AND (MYDB.t4.d
         >= DATE '2005-02-01')") into Spool 1 (group_amps), which is built
         locally on the AMPs.  The size of Spool 1 is estimated with no
         confidence to be 2 rows.  The estimated time for this step is 0.08
         seconds.
      4) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.08 seconds.
     

    Date‑based plans are stored in the request cache along with the date. Teradata Database uses this information to process a resubmitted request as described in the following table.

    The system first recomputes the date.

     

IF the recomputed date matches the cached date, THEN the cached plan is reused.

IF the recomputed date does not match the cached date, THEN the cached plan is purged and a new plan is generated for the changed date. The new plan is then cached.

When parameterized value peeking is enabled, requests are cached or not according to the actions described below.

parameterized-independent request

  Cached immediately.

parameterized-dependent request

  Parsed with the peeked parameterized values exposed, and the Optimizer then generates a plan that is specific to those parameterized values. In this case, the plan is not cached, but its request text is.

  When the same parameterized-dependent request is resubmitted, the Optimizer generates a generic plan for it. The generic plan is executed if either of the following statements is true:

  • The estimates are not compared.
  • The estimate for the specific plan does not indicate any benefit when compared to the estimate for the generic plan.

  Otherwise, Teradata Database parses the request again and generates a specific plan for it.

  A generic plan, if executed, is cached for subsequent reuse if either of the following statements is true:

  • Its runtime CPU cost is very small.
  • Execution of the specific plan does not provide enough runtime CPU cost benefit to compensate for the parsing CPU cost.

  In all other cases, the Optimizer generates the specific plan for all requests until the cache is purged. This includes the case when execution of the generic plan fails, such as when it aborts or returns a fatal error during execution.

A performance benefit is expected whether the system decides to generate specific plans for all subsequent submittals of the request or decides to use the cached generic plan for all subsequent submittals of the request.

    You can submit a MONITOR SESSION request to determine the number of request cache hits in a session. You can then use that information to determine whether a generic plan has been cached and used.

    A specific plan is generally more optimal than its equivalent generic plan because its parameterized values have been substituted as literals. There are, however, some requests for which the specific plan cost is the same as that for the equivalent generic plan. The logic that replaces the parameterized values to generate a specific plan cannot distinguish between queries for which the estimated specific and generic plan times are identical.

The request plans for the following two requests are identical, and peeking at parameterized values cannot help to generate more optimal plans for them. However, such requests can benefit from parameterized value peeking if a join index like the one defined later in this example is created on the target base table for the query.

    For example, consider the following table definition.

         CREATE SET TABLE MYDB.t2, NO FALLBACK,NO BEFORE JOURNAL,
           NO AFTER JOURNAL,CHECKSUM = DEFAULT (
           i INTEGER,
           j INTEGER,
           k INTEGER,
           l INTEGER,
           c CHARACTER(400) CHARACTER SET LATIN NOT CASESPECIFIC 
                            DEFAULT 'a')
         PRIMARY INDEX (i)
         INDEX (j);
     

The statistics collected on the table show the following number of unique values for the NUSI column j and for column k.

         Date     Time     Unique Values      Column Names
         -------- -------- ------------------ --------------------------
         06/02/17 12:51:22              1,537 j
         06/02/17 12:56:10                 60 k
     

    Now consider the following query against table t2.

         EXPLAIN USING (a INTEGER) SELECT * 
                                   FROM t2 
                                   WHERE j = 58 
                                   AND   k = 100000+i;
     
          *** Help information returned. 14 rows.
          *** Total elapsed time was 1 second.
     
         Explanation
         ---------------------------------------------------------------------------
           1) First, we lock MYDB.t2 for read on a reserved RowHash to prevent
              global deadlock for MYDB.t2.
           2) Next, we lock MYDB.t2 for read.
           3) We do an all-AMPs RETRIEVE step from MYDB.t2 by way of an all-rows
              scan with a condition of ("(MYDB.t2.k = (100000 + MYDB.t2.i )) AND
              (MYDB.t2.j = 58)") into Spool 1 (group_amps), which is built
              locally on the AMPs.  The size of Spool 1 is estimated with high
              confidence to be 10 rows.  The estimated time for this step is
              0.17 seconds.
           4) Finally, we send out an END TRANSACTION step to all AMPs involved
              in processing the request.
           -> The contents of Spool 1 are sent back to the user as the result of
              statement 1.  The total estimated time is 0.17 seconds.
     

    And consider the following query, which is also made against table t2.

         EXPLAIN USING (a INTEGER) SELECT * 
                                   FROM t2 
                                   WHERE j = 58 
                                   AND   k =:a + i;
     
     *** Help information returned. 14 rows.
     *** Total elapsed time was 1 second.
     
         Explanation
         ---------------------------------------------------------------------------
           1) First, we lock MYDB.t2 for read on a reserved RowHash to prevent
              global deadlock.
           2) Next, we lock MYDB.t2 for read.
           3) We do an all-AMPs RETRIEVE step from MYDB.t2 by way of an all-rows
              scan with a condition of ("(MYDB.t2.k = (:a + MYDB.t2.i )) AND
              (MYDB.t2.j = 58)") into Spool 1 (group_amps), which is built
              locally on the AMPs.  The size of Spool 1 is estimated with high
              confidence to be 10 rows.  The estimated time for this step is
              0.17 seconds.
           4) Finally, we send out an END TRANSACTION step to all AMPs involved
              in processing the request.
           -> The contents of Spool 1 are sent back to the user as the result of
              statement 1.  The total estimated time is 0.17 seconds.
     

    Now create table t3 as a copy of table t2 with data and consider the following join index definition:

         CREATE JOIN INDEX j1 AS 
           SELECT * 
           FROM t3 
           WHERE j > 58 
           AND k > i+3;
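For reference, a minimal sketch of how t3 could be created as a copy of t2 with its data (one possible approach, using CREATE TABLE ... AS ... WITH DATA; the join index above assumes t3 already exists):

     -- One possible way to create t3 as a copy of t2, including its rows.
     CREATE TABLE t3 AS t2 WITH DATA;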

    Consider the following query and assume the USING value for a1 is 4.

         EXPLAIN USING (a1 INTEGER) 
                 SELECT * 
                 FROM t3 
                 WHERE j = 80 
                 AND   k = i+:a1;
     
     *** Help information returned. 14 rows.
     *** Total elapsed time was 1 second.
     

    You can see that the value 4 has been explicitly inserted in the text for step 3, where it is highlighted in boldface type.

     
    Explanation
    ---------------------------------------------------------------------------
      1) First, we lock MYDB.J1 for read on a reserved RowHash to prevent
         global deadlock.
      2) Next, we lock MYDB.J1 for read.
      3) We do an all-AMPs RETRIEVE step from MYDB.J1 by way of an all-rows
         scan with a condition of ("(MYDB.J1.k = (MYDB.J1.i + 4 )) AND
         (MYDB.J1.j = 80)") into Spool 1 (group_amps), which is built
         locally on the AMPs.  The size of Spool 1 is estimated with no
         confidence to be 126 rows.  The estimated time for this step is
         0.14 seconds.
      4) Finally, we send out an END TRANSACTION step to all AMPs involved
         in processing the request.
      -> The contents of Spool 1 are sent back to the user as the result of
         statement 1.  The total estimated time is 0.14 seconds.

The following performance benefits are realized from parameterized value peeking:

  • Teradata Database generates specific plans for all requests if either of the following conditions is true:
    • The system compares the plan estimates, and the estimated cost of the specific plan is less than the estimated cost of the generic plan.
    • The combined parsing and runtime CPU cost of the specific plan is smaller than the runtime CPU cost of the equivalent generic plan.
  • The values for the DATE, CURRENT_DATE, USER, or TEMPORAL_DATE built-in functions (and in some cases the TEMPORAL_TIMESTAMP function) are expanded and passed to the Optimizer for optimal plan generation. This is done even if the request does not specify parameterized data.
  • If a specific plan does not provide a performance benefit, the system instead caches the equivalent generic plan and reuses it for subsequent requests.
The following potential performance liabilities can also result from parameterized value peeking:

  • In some cases, the estimates of the generic and specific plans are compared. If the generic plan is worse than the equivalent specific plan, Teradata Database parses the request again and generates a specific plan for it.
  • A request that is classified as parameterized-dependent might not always generate a different specific plan. Such requests are also subject to the logic for plan caching. Ideally, they would be cached immediately; however, they are instead cached only after their second submission to the system. This can have a performance impact for requests that are submitted only twice.
  • If the specific plan and the generic plan are identical, and no potential benefits are seen for the specific plan, then the generic plan is used for all subsequent requests. As a result, no performance benefit can be realized even if the parameterized data values in subsequent requests are such that they would produce more optimal specific plans if they were exposed and inserted into the plan as literal data values. However, once the request cache is purged, the system reevaluates all affected specific and generic plans.
  • In some cases, the decision to cache a generic plan is based on the elapsed execution times of the specific and generic plans. The decision can be influenced wrongly by other workloads running on the system that might block either of the requests being compared and, as a result, skew their elapsed time values.
  • In some cases, the algorithm that compares specific and generic costs uses estimates. A bad estimate for either plan can influence the caching decision.