Parameterized Requests | VantageCloud Lake - Parameterized Requests - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
This feature is only supported on the Block File System on the primary cluster. It is not available for the Object File System.

Vantage has the ability to peek at the parameterized values of a request (such as USING request modifier values, which are user-supplied constant data) in CLIv2 data parcels, and then to resolve those variables in the Resolver. This enables the Optimizer to generate a specific, uncached plan for such requests when the system determines this is the best way to handle the request rather than generating a cached generic plan.

Data Parcel Peeking Terminology

The following terminology is specific to the parameter value peeking subsystem:

Term Definition
Specific Plan An Optimizer plan that applies only to a single instance of that query and is generated by using one or both of the following:
  • Peeking at parameterized values and CURRENT_TIMESTAMP and USER built-in functions
  • Using intermediate incremental planning and execution spool results

Single instance plans are called specific query plans. Vantage caches specific query plans.

The values from peeking at a parameterized request can then be used to tailor a specific query plan for that request.

Specific plans can be either static or dynamic plans.

Generic Plan An Optimizer plan generated without peeking at parameterized values and CURRENT_TIMESTAMP and USER built-in functions, intermediate incremental planning and execution spool results, or both, that apply to most, if not all, instances of that query.

Generic plans are static plans.

Specific Always Parameterized values are resolved and evaluated in the plan for all subsequent requests.
Generic Always Parameterized values are not resolved until the concrete steps for the request are generated, and the cached generic plan is used for all subsequent requests.

The presence of parameterized data in a request may not mean the Optimizer generates a request plan for the request. There are parameterized queries for which the query plan generated by the Optimizer does not depend on parameterized values. Such queries do not gain a performance benefit from generating specific query plans as the result of parameterized value peeking. Furthermore, the negative performance impact of not caching a plan can be high if a request has a short run time.

For example, consider the following query that specifies a USING request modifier whose single variable x is also used to specify a parameterized predicate condition :x in its WHERE clause.

USING (x INTEGER)
SELECT *
FROM table_1
WHERE column_1 = :x;

If there is a UPI defined on table_1.column_1, then the access path selected by the Optimizer is independent of the value of x because irrespective of its value, assuming that value specifies a valid primary index, a primary index access path is the best access path.

Date-based parameter value peeking, which resolves the current date from DATE and CURRENT_DATE built-in functions when a function is specified in the request, can be used to generate a specific or generic plan, depending on other factors present in the text. The terminology used for these date-specific query plans is as follows:
  • A generic parameterized request with a resolved DATE or CURRENT_DATE value for which a query plan is generated is called a DateSpecific generic plan.
  • A specific parameterized request with a resolved DATE or CURRENT_DATE value for which a query plan is generated is called a DateSpecific specific plan.

In the context of the parameter value peek subsystem, the following definitions apply:

Term Definition
Parameterized PK statement A single SQL statement with a set of parameterized variables or CURRENT_TIMESTAMP/USER built-in functions that are used in equality predicate conditions on an nonpartitioned table, a row-partitioned PI table, or USI of a single table with no ORed conditions.

The term PK, which commonly represents Primary Key, represents a primary or unique secondary index column set.

Parameterized PK request A request that contains only the following:
  • Parameterized PK statements
  • Null statements

    See Null for a definition of null statements.

Parameterized value peeking does not impact the treatment of parameterized PK requests, so the Optimizer does not generate a specific plan for those cases.

The following table shows whether parameterized requests are parameterized-dependent or parameterized-independent.

Parameterized Request Type Parameterized-independent? Reason
Simple Yes The request consists of PK statements and other statements without specifying parameterized variables in any of its predicates.

For example, consider the following table definition:

CREATE TABLE emp (
  emp_id INTEGER,
  dept_id INTEGER)
UNIQUE PRIMARY INDEX (emp_id);

The following multiple-statement parameterized request is parameterized-independent, because the :x variable is specified on a UPI, so the access path is independent of the value of x, which means the statement is also an exempt (see Purging Exempt and Non-Exempt Requests):

USING (x INTEGER)
SELECT *
FROM emp
WHERE dept_id = 10
;SELECT *
FROM emp
WHERE emp_id = :x
AND   dept_id = 12;
Parameterized PK Yes By definition.
Iterated Yes By definition.
Other No By definition.

Parameterized Value Peeking

The word peeking means looking at the parameterized values from a data parcel and evaluating date-based/CURRENT_TIMESTAMP/USER built-in function constant values during query parsing, then using those values to investigate potential optimization opportunities such as:

Peeking facilitates the optimization of certain categories of queries by inserting data values that are specific to those queries into the parse tree at an earlier stage of request processing. The system caches generic plans because reusing a cached query plan saves parsing and optimization time. However, reusing a cached generic query plan is not always the best approach to executing a query that provides constant literals that can be used to produce a specific plan tailored closely to that specific request.

The system does not cache specific plans for requests that are parameterized-dependent, because those plans cannot be reused for otherwise identical queries that have different sets of parameterized or built-in function-supplied constant values. The system does cache the other information generated for specific plans. This information includes its SQL text hash, its host character set, its estimated execution costs, its parsing time, and its run time.

When the system determines not to cache a plan for a request, the system can optimize the request using the parameterized request and DATE, CURRENT_DATE, TEMPORAL_DATE, CURRENT_TIMESTAMP, USER, andTEMPORAL_TIMESTAMP built-in function values instead of treating that data as parameters with unknown values. That is, parameterized value peeking uses the literal data values from a parameterized request and date-related built-in functions, or both, to make sure the Optimizer generates an optimal, uncached plan for that request rather than generating a generic plan and then caching that plan for future reuse.

A specific plan generated by the Optimizer must be an optimal plan, and its runtime performance must be significantly better than the run time for an equivalent generic plan. However, the runtime costs of a specific plan and its equivalent generic plan may not differ significantly. The impact is high if the parsing cost is high. Therefore, the system monitors the parsing and run times of such requests and keeps the information in the request cache, to use in deciding between generating a specific or generic plan for a request. See Request Caching Logic.

Enabling and Disabling Parameterized Value Peeking

By default, parameterized value peeking is enabled.

Miscellaneous Considerations for Parameterized Value Peeking

The items in the following list also affect parameterized value peeking:
  • If a USING request modifier is being explained, or submitted with an INSERT EXPLAIN or DUMP EXPLAIN request, the system peeks at the parameterized variables if a data parcel is also provided in the request.
    INSERT EXPLAIN and DUMP EXPLAIN are supported only on the Block File System on the primary cluster, not on the Object File System.

    If no data parcel is submitted, then the system does not peek at the parameterized variables, and the system generates a generic plan for the request.

  • If a parameterized request is submitted under a MultiLoad or FastLoad sessions, the system does not peek at the parameterized variables and current date, so there is no impact on request caching.
  • If a parameterized request is submitted under a FastExport session, the system peeks at the parameterized variables and current date for a SELECT request to generate a more optimal plan for exporting the data.

    Requests submitted in a FastExport session are not cached, so peeking at parameterized request variable values does not impact caching behavior.

  • If the Parser does not peek at the value of any of the parameterized request variables of a parameterized request as, for example, when a USING variable is in the outermost select list and is not specified in the WHERE condition, then there is no impact on caching.
The following performance benefits are realized from parameterized value peeking:
  • Vantage generates specific plans for all requests if either of the following conditions are true:
    • The system compares the plan estimates and the estimated cost of the specific plan is less than the estimated cost of the generic plan.
    • Both the parsing and runtime CPU costs of the specific plan is smaller than the runtime CPU cost of the equivalent generic plan.
  • The values for the DATE, CURRENT_DATE, USER, or TEMPORAL_DATE built-in functions are expanded and passed to the Optimizer for optimal plan generation. The TEMPORAL_TIMESTAMP function may also be expanded and passed to the Optimizer.

    This is done even if the request does not specify parameterized data.

  • If a specific plan does not provide a performance benefit, the system instead caches the equivalent generic plan to reuse for subsequent requests.
The following potential performance liabilities may occur with parameterized value peeking:
  • Estimates of generic and specific plans are compared. If the generic plan is worse than the equivalent specific plan, the database parses the request again and generates a specific plan for it.
  • A request that is classified as parameterized-dependent may not generate a different specific plan. Ideally, such requests are cached immediately. However, these requests are instead cached only after their second submission to the system. This can have a performance impact for those requests that are submitted only twice.
  • If the specific plan and the generic plan are identical, and there are no potential benefits seen for the specific plan, the generic plan is used for subsequent requests. Therefore, no performance benefit can be realized even if parameterized data values in subsequent requests cause more optimal specific plans if exposed and inserted into the plan as literal data values.

    However, when the request cache is cleared, the system reevaluates affected specific and generic plans.

  • The decision to cache a generic plan may be based on the elapsed execution times of the specific and generic plans. The decision can be influenced wrongly because of other workloads running on the system that may block the requests being compared, and as a result, skew their elapsed time values.
  • The algorithm that compares specific and generic costs may use estimates. A bad estimate for either plan can influence the caching decision.

Request Caching Logic

The request cache stores the plans of successfully parsed SQL DML requests to reuse when the same request is resubmitted. The request cache contains the text of the SQL request and its plastic steps. The plastic steps are called the request plan or simply as the plan.

If parameterized value peeking is not enabled, a successfully parsed parameterized request is cached immediately. Parameterized value peeking changes the behavior of the request cache for requests submitted in CLIv2 Execute or Both Prepare and Execute modes, as follows:
  • If a request specifies a DATE, CURRENT_DATE, or TEMPORAL_DATE built-in function, the plan is generated for the specific date when the request is parsed. The specific date, when exposed during the optimization process, causes the Optimizer to generate a more optimal plan where applicable.

    For example, if the partitioning expression is date-based, and the condition on the partitioning column in the predicate specifies the CURRENT_DATE built-in function, the Optimizer performs row-partition elimination, generating a more optimal plan.

    Assume that table t4 is row partitioned on the date column d.

    EXPLAIN SELECT *
            FROM t4
            WHERE d BETWEEN DATE '2005-02-01' AND CURRENT_DATE;

    The following is the relevant portion of the EXPLAIN text output when the date is not replaced:

    ...
    3) We do an all-AMPs RETRIEVE step from 23 partitions of MYDB.t4
       with a condition of ("(MYDB.t4.d <= DATE) AND (MYDB.t4.d >= DATE
       '2005-02-01')") into Spool 1 (group_amps), which is built locally
       on the AMPs. The size of Spool 1 is estimated with no confidence
       to be 2 rows. The estimated time for this step is 0.09 seconds.
    ...

    Vantage must scan 23 row partitions to process this query without peeking at the value of CURRENT_DATE (the relevant text is bold).

    If parameterized value peeking is enabled, the following is the relevant portion of the EXPLAIN text output for the same request. The EXPLAIN output indicates 9 fewer row partitions to scan due to parameterized value peeking.

    ...
    3) We do an all-AMPs RETRIEVE step from 14 partitions of MYDB.t4 
       with a condition of ("(MYDB.t4.d <= DATE '2006-03-02') AND
       (MYDB.t4.d >= DATE '2005-02-01')") into Spool 1 (group_amps), 
       which is built locally on the AMPs. The size of Spool 1 is 
       estimated with no confidence to be 2 rows. The estimated time 
       for this step is 0.08 seconds.
    ...

    Date-based plans are stored in the request cache with the date. Vantage uses this information to process a resubmitted request as described in the following table.

    The system first recomputes the date.

Recomputed Date Result
Matches the cached date Cached plan is reused.
Does not match the cached date Cached plan is cleared and a new plan is generated for the changed date.

The new plan is then cached.

  • When parameterized value peeking is enabled, requests are cached or not according to the actions described by the following table.
Parameterized DML Request Result
Parameterized-independent Request is cached immediately.
Parameterized-dependent Request is parsed with the peeked parameterized values exposed, and the Optimizer then generates a plan that is specific to those parameterized values.

The plan is not cached, but its request text is.

When the same parameterized-dependent request is resubmitted, the Optimizer generates a generic plan for it.

The generic plan runs if either of the following statements is true:
  • The estimates are not compared.
  • The estimate for the specific plan does not indicate any benefit when compared to the estimate for the generic plan.

Otherwise, the database parses the request again and generates a specific plan for it.

Therefore, a generic plan, if run, is cached for subsequent reuse if either of the following statements is true:
  • Its runtime CPU cost is small.
  • Execution of the specific plan does not provide enough runtime CPU cost benefit to compensate for the parsing CPU cost.

Otherwise, the Optimizer generates the specific plan for all requests until the cache is cleared.

A performance benefit is expected to be seen when the system decides to generate specific plans for all subsequent submittals of the request. A performance benefit is expected to be seen when the system decides to use the cached plan for all subsequent submittals of the request.

You can submit a MONITOR SESSION request to determine the number of request cache hits in a session. You can then use that information to determine whether a generic plan has been cached and used.

A specific plan is typically better than its equivalent generic plan, because its parameterized values are substituted as literals. The specific plan and equivalent generic plan may have the same cost. The logic that replaces the parameterized values to generate a specific plan cannot distinguish between queries for which the estimated specific and generic plan times are identical.

The request plans for the following two requests are identical, and peeking at parameterized values cannot help to generate more optimal plans for them. But requests such as the previous SELECT example can benefit from parameterized value peeking if a join index like the following is defined on the target base table for the query.

For example, consider the following table definition (on the Block File System).

CREATE SET TABLE MYDB.t2, NO FALLBACK,NO BEFORE JOURNAL,
  NO AFTER JOURNAL,CHECKSUM = DEFAULT (
  i INTEGER,
  j INTEGER,
  k INTEGER,
  l INTEGER,
  c CHARACTER(400) CHARACTER SET LATIN NOT CASESPECIFIC
                   DEFAULT 'a')
PRIMARY INDEX (i)
INDEX (j);

The statistics collected on the table show the following number of unique values for the NUSI column j and the NUPI column i.

Date     Time     Unique Values      Column Names
-------- -------- ------------------ --------------------------
06/02/17 12:51:22              1,537 j
06/02/17 12:56:10                 60 k

Now consider the following query against table t2, showing the relevant step:

EXPLAIN USING (a INTEGER) SELECT *
                          FROM t2
                          WHERE j = 58
                          AND   k = 100000+i;
...
3) We do an all-AMPs RETRIEVE step from MYDB.t2 by way of an all-rows
   scan with a condition of ("(MYDB.t2.k = (100000 + MYDB.t2.i )) AND
   (MYDB.t2.j = 58)") into Spool 1 (group_amps), which is built
   locally on the AMPs. The size of Spool 1 is estimated with high
   confidence to be 10 rows. The estimated time for this step is
   0.17 seconds.
...

Also consider the following query, which is also made against table t2, showing the relevant step:

EXPLAIN USING (a INTEGER) SELECT *
                          FROM t2
                          WHERE j = 58
                          AND   k =:a + i;
...
3) We do an all-AMPs RETRIEVE step from MYDB.t2 by way of an all-rows
   scan with a condition of ("(MYDB.t2.k = (:a + MYDB.t2.i )) AND
   (MYDB.t2.j = 58)") into Spool 1 (group_amps), which is built
   locally on the AMPs. The size of Spool 1 is estimated with high
   confidence to be 10 rows. The estimated time for this step is
   0.17 seconds.
...

Now create table t3 as a copy of table t2 with data and consider the following join index definition:

CREATE JOIN INDEX j1 AS
  SELECT *
  FROM t3
  WHERE j > 58
  AND k > i+3;

Consider the following query and assume the USING value for a1 is 4:

EXPLAIN USING (a1 INTEGER)
        SELECT *
        FROM t3
        WHERE j = 80
        AND   k = i+:a1;

The value 4 has been explicitly inserted in the text for step 3.

...
3) We do an all-AMPs RETRIEVE step from MYDB.J1 by way of an 
   all-rows scan with a condition of 
   ("(MYDB.J1.k = (MYDB.J1.i +  4  )) AND
   (MYDB.J1.j = 80)") into Spool 1 (group_amps), which is built
   locally on the AMPs. The size of Spool 1 is estimated with no
   confidence to be 126 rows. The estimated time for this step is
   0.14 seconds.
...