Vantage has the ability to peek at the parameterized values of a request (such as USING request modifier values, which are user-supplied constant data) in CLIv2 data parcels, and then to resolve those variables in the Resolver. This enables the Optimizer to generate a specific, uncached plan for such requests when the system determines this is the best way to handle the request rather than always generating a cached generic plan.
Data Parcel Peeking Terminology
The following terminology is specific to the parameter value peeking subsystem:
Term | Definition |
---|---|
Specific Plan | An Optimizer plan that applies only to a single instance of that query and is generated by using one or both of the following:
Single instance plans are called specific query plans. Vantage caches specific query plans. The values obtained from peeking at a parameterized request can then be used to tailor a very specific query plan for that request. Specific plans can be either static or dynamic plans. |
Generic Plan | An Optimizer plan generated without peeking at parameterized values and CURRENT_TIMESTAMP and USER built-in functions, intermediate incremental planning and execution spool results, or both, that apply to most, if not all, instances of that query. Generic plans are static plans. |
Specific Always | Parameterized values are resolved and evaluated in the plan for all subsequent requests. |
Generic Always | Parameterized values are not resolved until the concrete steps for the request are generated, and the cached generic plan is used for all subsequent requests. |
The presence of parameterized data in a request does not automatically determine that the Optimizer will generate a specific request plan for it. There are many parameterized queries for which the query plan generated by the Optimizer does not depend on parameterized values. Such queries do not gain any added performance benefit from generating specific query plans as the result of parameterized value peeking. Furthermore, the negative performance impact of not caching a plan can be high if a request has a short execution time.
For example, consider the following query that specifies a USING request modifier whose single variable x is also used to specify a parameterized predicate condition :x in its WHERE clause.
USING (x INTEGER) SELECT * FROM table_1 WHERE column_1 = :x;
If there is a UPI defined on table_1.column_1, then the access path selected by the Optimizer is independent of the value of x because irrespective of its value, assuming that value specifies a valid primary index, a primary index access path is the best access path.
- A generic parameterized request with a resolved DATE or CURRENT_DATE value for which a query plan is generated is called a DateSpecific generic plan.
- A specific parameterized request with a resolved DATE or CURRENT_DATE value for which a query plan is generated is called a DateSpecific specific plan.
In the context of the parameter value peek subsystem, the following definitions apply:
Term | Definition |
---|---|
Parameterized PK statement | A single SQL statement with a set of parameterized variables and/or CURRENT_TIMESTAMP/USER built-in functions that are used in equality predicate conditions on an nonpartitioned table, a row-partitioned PI table, or USI of a single table with no ORed conditions. The term PK, which commonly represents Primary Key, in this case represents a primary or unique secondary index column set. |
Parameterized PK request | A request that contains only the following:
|
Parameterized value peeking does not impact the treatment of parameterized PK requests, so the Optimizer does not generate a specific plan for those cases.
The following table indicates whether various types of parameterized request are parameterized-dependent or parameterized-independent.
The Parser considers this type of parameterized request … | To be … | Reason |
---|---|---|
simple parameterized | parameterized-independent. | The request consists of PK statements and other statements without specifying parameterized variables in any of its predicates. For example, consider the following table definition: CREATE TABLE emp ( emp_id INTEGER, dept_id INTEGER) UNIQUE PRIMARY INDEX (emp_id); The following multistatement parameterized request is parameterized-independent, because the :x variable is specified on a UPI, so the access path is independent of the value of x, which means that it is also an exempt statement (see Purging Exempt and Non-Exempt Requests): USING (x INTEGER) SELECT * FROM emp WHERE dept_id = 10 ;SELECT * FROM emp WHERE emp_id = :x AND dept_id = 12; |
parameterized PK | parameterized-independent. | By definition. |
iterated | parameterized-independent. | By definition. |
all other parameterized requests | parameterized-dependent. | By definition. |
Parameterized Value Peeking
- Satisfiability and transitive closure (see Predicate Simplification)
- Optimum single table access planning
- Row partition elimination (see Row Partition Elimination)
- Using covering secondary, hash, and join indexes in place of base tables (see Query Rewrite, Statistics, and Optimization)
Peeking facilitates the optimization of certain categories of queries by inserting data values that are specific to those queries into the parse tree at an earlier stage of request processing than would otherwise be done. The system always caches generic plans because reusing a cached query plan saves parsing and optimization time. However, reusing a cached generic query plan is not always the best approach to executing a query that provides constant literals that can be used to produce a specific plan tailored closely to that specific request.
The system does not cache specific plans for requests that are parameterized-dependent, because those plans cannot be reused for otherwise identical queries that have different sets of parameterized or built-in function-supplied constant values, but it does cache the other information generated for specific plans. This information includes its SQL text hash, its host character set, its estimated execution costs, its parsing time, and its run time.
When the system determines that a plan for a request should not be cached, the parameterized request and DATE, CURRENT_DATE, TEMPORAL_DATE, CURRENT_TIMESTAMP, USER, and in some cases TEMPORAL_TIMESTAMP built-in function values can be used to optimize the request instead of treating that data as parameters with unknown values. In other words, parameterized value peeking uses the literal data values from a parameterized request and date-related built-in functions, or both, to ensure that the Optimizer generates an optimal, uncached plan for that particular request rather than generating a generic plan and then caching it for future reuse.
A specific plan generated by the Optimizer should be an optimal plan, and its runtime performance should be significantly better than the execution time for an equivalent generic plan. However, there can be cases when the runtime costs of a specific plan and its equivalent generic plan do not differ significantly. In this case, the impact is especially high if the parsing cost is high. To avoid these cases, the system monitors all such requests for their parsing and run times, and keeps the information in the request cache. The system then uses this information to decide between generating a specific or generic plan for a request. For more details about this process, see Request Caching Logic.
Enabling and Disabling Parameterized Value Peeking
By default, parameterized value peeking is enabled. To disable it, you can change the value of the DisablePeekUsing field using the DBS Control utility. For more information about DBS Control, see Teradata Vantage™ - Database Utilities, B035-1102.
Miscellaneous Considerations for Parameterized Value Peeking
- If a USING request modifier is being explained, or submitted with an INSERT EXPLAIN or DUMP EXPLAIN request, the system peeks at the parameterized variables if a data parcel is also provided in the request.
If no data parcel is submitted, then the system does not peek at the parameterized variables, and the system generates a generic plan for the request.
- If a parameterized request is submitted under a MultiLoad or FastLoad sessions, the system does not peek at the parameterized variables and current date, so there is no impact on request caching.
- If a parameterized request is submitted under a FastExport session, the system peeks at the parameterized variables and current date for a SELECT request in order to generate a more optimal plan for exporting the data.
Note that requests submitted in a FastExport session are not cached, so peeking at parameterized request variable values does not impact caching behavior.
- If the Parser does not peek at the value of any of the parameterized request variables of a parameterized request as, for example, when a USING variable is in the outermost select list and is not specified in the WHERE condition, then there is no impact on caching.
- Vantage generates specific plans for all requests if either of the following conditions are true:
- The system compares the plan estimates and the estimated cost of the specific plan is less than the estimated cost of the generic plan.
- Both the parsing and runtime CPU costs of the specific plan is smaller than the runtime CPU cost of the equivalent generic plan.
- The values for the DATE, CURRENT_DATE, USER, or TEMPORAL_DATE (in some cases, the TEMPORAL_TIMESTAMP function is also expanded and passed to the Optimizer) built-in functions are expanded and passed to the Optimizer for optimal plan generation.
This is done even if the request does not specify parameterized data.
- If a specific plan does not provide a performance benefit, the system instead caches the equivalent generic plan and reuses it for subsequent requests.
- In some cases, estimates of generic and specific plans are compared. If the generic plan is worse than the equivalent specific plan, the database parses the request again and generates a specific plan for it.
- A request that is classified as parameterized-dependent might not always generate a different specific plan. Ideally, they would be cached immediately; however, they are instead cached only after their second submission to the system. This can have a performance impact for those requests that are submitted only twice.
- If the specific plan and the generic plan are identical, and there are no potential benefits seen for the specific plan, then the generic plan is used for all subsequent requests. As a result, no performance benefit can be realized even if parameterized data values in subsequent requests are such that they would result in more optimal specific plans if they were exposed and inserted into the plan as literal data values.
However, once the request cache is purged, the system reevaluates all affected specific and generic plans.
- In some cases, the decision to cache a generic plan is based on the elapsed execution times of the specific and generic plans. The decision can be influenced wrongly because of other workloads running on the system that might block either of the requests being compared, and as a result skew their elapsed time values.
- In some cases, the algorithm that compares specific and generic costs uses estimates. A bad estimate for either plan can influence the caching decision.
Request Caching Logic
The request cache stores the plans of successfully parsed SQL DML requests so they can be reused when the same request is resubmitted. The request cache contains the text of the SQL request and its plastic steps. The plastic steps are referred to as the request plan or simply as the plan.
- If a request specifies a DATE, CURRENT_DATE, or TEMPORAL_DATE built-in function, then the plan is generated for the specific date when the request is parsed. The specific date, when exposed during the optimization process, causes the Optimizer to generate a more optimal plan where applicable.
For example, if the partitioning expression is date-based, and the condition on the partitioning column in the predicate specifies the CURRENT_DATE built-in function, then the Optimizer performs row-partition elimination, generating a more optimal plan.
Assume that table t4 is row partitioned on the date column d.
EXPLAIN SELECT * FROM t4 WHERE d BETWEEN DATE '2005-02-01' AND CURRENT_DATE;
The following is the relevant portion of the EXPLAIN text output when the date is not replaced:
... 3) We do an all-AMPs RETRIEVE step from 23 partitions of MYDB.t4 with a condition of ("(MYDB.t4.d <= DATE) AND (MYDB.t4.d >= DATE '2005-02-01')") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 2 rows. The estimated time for this step is 0.09 seconds. ...
Note that Vantage must scan 23 row partitions to process this query without peeking at the value of CURRENT_DATE (the relevant text is highlighted in boldface type).
If parameterized value peeking is enabled, the following is the relevant portion of the EXPLAIN text output for the same request. Notice the EXPLAIN output indicates 9 fewer row partitions to scan due to parameterized value peeking (the relevant text is highlighted in boldface type).
... 3) We do an all-AMPs RETRIEVE step from 14 partitions of MYDB.t4 with a condition of ("(MYDB.t4.d <= DATE '2006-03-02') AND (MYDB.t4.d >= DATE '2005-02-01')") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 2 rows. The estimated time for this step is 0.08 seconds. ...
Date-based plans are stored in the request cache along with the date. Vantage uses this information to process a resubmitted request as described in the following table.
The system first recomputes the date.
IF the recomputed date … | THEN the … |
---|---|
matches the cached date | cached plan is reused. |
does not match the cached date | cached plan is purged and a new plan is generated for the changed date. The new plan is then cached. |
- When parameterized value peeking is enabled, requests are cached or not according to the actions described by the following table.
IF a parameterized DML request is … | THEN it is … |
---|---|
parameterized-independent | cached immediately. |
parameterized-dependent | parsed with the peeked parameterized values exposed, and the Optimizer then generates a plan that is specific to those parameterized values. In this case, the plan is not cached, but its request text is. When the same parameterized-dependent request is resubmitted, the Optimizer generates a generic plan for it. |
- The estimates are not compared.
- The estimate for the specific plan does not indicate any benefit when compared to the estimate for the generic plan.
Otherwise, the database parses the request again and generates a specific plan for it.
- Its runtime CPU cost is very small.
- Execution of the specific plan does not provide enough runtime CPU cost benefit to compensate for the parsing CPU cost.
In all other cases, the Optimizer generates the specific plan for all requests until the cache is purged. This includes the case when executing the generic plan fails, such as when it aborts or returns a fatal error during execution.
A performance benefit is expected to be seen when the system decides to generate specific plans for all subsequent submittals of the request. A performance benefit is expected to be seen when the system decides to use the cached plan for all subsequent submittals of the request.
You can submit a MONITOR SESSION request to determine the number of request cache hits in a session. You can then use that information to determine whether a generic plan has been cached and used.
A specific plan is generally more optimal than its equivalent generic plan because its parameterized values have been substituted as literals. There are, however, some requests for which the specific plan cost is the same as that for the equivalent generic plan. The logic that replaces the parameterized values to generate a specific plan cannot distinguish between queries for which the estimated specific and generic plan times are identical.
The request plans for the following two requests are identical, and peeking at parameterized values cannot help to generate more optimal plans for them. But requests such as the previous SELECT example can benefit from parameterized value peeking if a join index like the following is defined on the target base table for the query.
For example, consider the following table definition.
CREATE SET TABLE MYDB.t2, NO FALLBACK,NO BEFORE JOURNAL, NO AFTER JOURNAL,CHECKSUM = DEFAULT ( i INTEGER, j INTEGER, k INTEGER, l INTEGER, c CHARACTER(400) CHARACTER SET LATIN NOT CASESPECIFIC DEFAULT 'a') PRIMARY INDEX (i) INDEX (j);
The statistics collected on the table show the following number of unique values for the NUSI column j and the NUPI column i.
Date Time Unique Values Column Names -------- -------- ------------------ -------------------------- 06/02/17 12:51:22 1,537 j 06/02/17 12:56:10 60 k
Now consider the following query against table t2, showing the relevant step:
EXPLAIN USING (a INTEGER) SELECT * FROM t2 WHERE j = 58 AND k = 100000+i; ... 3) We do an all-AMPs RETRIEVE step from MYDB.t2 by way of an all-rows scan with a condition of ("(MYDB.t2.k = (100000 + MYDB.t2.i )) AND (MYDB.t2.j = 58)") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 10 rows. The estimated time for this step is 0.17 seconds. ...
Also consider the following query, which is also made against table t2, showing the relevant step:
EXPLAIN USING (a INTEGER) SELECT * FROM t2 WHERE j = 58 AND k =:a + i;
Result:
... 3) We do an all-AMPs RETRIEVE step from MYDB.t2 by way of an all-rows scan with a condition of ("(MYDB.t2.k = (:a + MYDB.t2.i )) AND (MYDB.t2.j = 58)") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 10 rows. The estimated time for this step is 0.17 seconds. ...
Now create table t3 as a copy of table t2 with data and consider the following join index definition:
CREATE JOIN INDEX j1 AS SELECT * FROM t3 WHERE j > 58 AND k > i+3;
Consider the following query and assume the USING value for a1 is 4:
EXPLAIN USING (a1 INTEGER) SELECT * FROM t3 WHERE j = 80 AND k = i+:a1;
You can see that the value 4 has been explicitly inserted in the text for step 3, where it is highlighted in boldface type.
...
3) We do an all-AMPs RETRIEVE step from MYDB.J1 by way of an
all-rows scan with a condition of
("(MYDB.J1.k = (MYDB.J1.i + 4 )) AND
(MYDB.J1.j = 80)") into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with no
confidence to be 126 rows. The estimated time for this step is
0.14 seconds.
...