The SQL Query Optimizer determines an efficient way to access, join, and aggregate the tables required to answer an SQL request.
Questions a Query Optimizer Asks and Their Answers
The Optimizer performs its task of determining a best plan using various demographic information about the tables and columns involved in the request and the configuration of the system, as well as numerous heuristic strategies, or rules of thumb.
- Should this request be optimized using incremental planning and execution?
- What is the cardinality of the table?
In this context, cardinality generally refers to the number of rows in a result or spool table, not the number of rows in a base table.
- What is the degree of the table?
In this context, degree generally refers to the number of columns in a result or spool table, not the number of columns in a base table.
- Are there interval histogram statistics for the column and index sets required to process the query?
- If there are existing interval histogram statistics, are they fresh enough to provide reasonable cardinality estimates, or are they stale?
- If there are existing interval histogram statistics, can they cover a range query over DATE values, or do they require extrapolation?
- Does the table have a primary index, primary AMP index, or no primary index?
- Is the table partitioned?
- If the table is partitioned, are there PARTITION statistics?
- Is the requested column set indexed?
- If the column set is indexed, is the index unique or nonunique?
- Can row-partition elimination be applied?
- Can column-partition elimination be applied?
- How many distinct values are in the column set?
- How many rows in the column set or index have one or more nulls for the columns on which statistics have been collected?
- How many rows in the column set or index are null for all the columns on which statistics have been collected?
- How many rows per column set value are expected to be returned?
- Can a base table be replaced by a covering secondary, hash, or single-table join index?
- Is a join partly or completely covered by a join index or NUSI?
- Can a join or aggregation be pushed into UNION ALL branches?
- Can an aggregation be partially pushed to the relations of a join?
- Is an aggregate already calculated for a column by an existing join index?
- What strategies have tended to work best with queries of this type in the past?
- How many AMPs are there in the system?
- How many nodes are there in the system?
- How much and what kind of disk does each AMP have and what is the processor speed of the node it is running on?
- Is the table load isolated?
Ideally, many of these questions are answered largely based on statistical data that you have generated using the SQL COLLECT STATISTICS statement (see the information about COLLECT STATISTICS (Optimizer Form) in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144). When database statistics are collected regularly, you can expect the Optimizer to make the best decisions possible.
When the Optimizer needs index or column statistics, it first checks the statistics cache (see Statistics Cache). If the necessary statistics are not cached, the Optimizer retrieves them from DBC.StatsTbl (see Teradata Vantage™ - Data Dictionary, B035-1092 for information about DBC.StatsTbl).
If statistics have been collected, but long enough ago that they no longer reflect the true demographics of the data, then the Optimizer might not be able to make the best-informed decisions about how to proceed (see Time and Resource Consumption Factors in Deciding How to Collect Statistics, An Example of How Stale Statistics Can Produce a Poor Query Plan, and Stale Statistics). You can set various thresholds for recollecting statistics to ensure that the system does not recollect statistics when it is not necessary to do so. See the information about COLLECT STATISTICS (Optimizer Form) in Teradata Vantage™ - SQL Data Definition Language Detailed Topics, B035-1184 for more information about specifying threshold values for recollecting statistics.
If no statistics have been collected on indexed columns in a request, the Optimizer makes a snapshot sampling of data and uses that estimate to make a best guess about the optimum data retrieval path (see Dynamic AMP Sampling). Any derived statistics that the Optimizer develops begin with the dynamic AMP sample as a basis for deriving additional statistics. Note that the Optimizer does not use dynamic AMP samples of non-indexed columns to make cardinality estimates.
The degree that this dynamic AMP sample approximates the population demographics for a column or table is directly proportional to the size of the table: the larger the table, the more likely a sample approximates its true global demographics.
The following is a simplified list of the Optimizer processing stages.
- The Optimizer examines an incoming request parcel to determine if the SQL text it is about to optimize is a DML request or a DDL/DCL request.
Type of SQL Request in the Request Parcel Optimizer Action DDL or DCL The Optimizer deletes the original request from the parse tree after it has been replaced with specific Data Dictionary operations.
No access planning is required for DDL and DCL requests, so the Optimizer only converts the request parcel into work steps involving dictionary writes, locking information, and so on.
DML The Optimizer produces access plans, join plans, and execution plans.
The Optimizer then uses whatever statistical information it has, whether complete or sampled, to determine which access paths or plans are to be used.
If there are no column or index statistics in the statistics cache, then the Optimizer uses dynamic AMP sampling to estimate the population statistics of the data.
- The Optimizer determines if the steps are to be executed in series or in parallel, and if they are to be individual or common processing steps.
- The parse tree is further fleshed out with the optimized access paths, join plans, and aggregation, and the Optimizer selects the best plan based on the available derived statistics and costing data it has.
- The Optimizer places, combines, and reorders table-level locks to reduce the likelihood of deadlocks, then removes any duplicate locks it finds.
- Finally, the Optimizer either passes its fully-optimized parse tree, known as the White Tree, on to the Generator for further processing or, if has optimized an EXPLAIN request, produces a verbal explanation of the White Tree to which additional spool size information and costing data that would not otherwise be costed, is added both for human analysis and for Teradata Viewpoint.
Tasks That Query Optimizers Do Not Perform
- Guarantee that the access, join, and aggregation plans they generate are infallibly the best plans possible.
A query optimizer always generates several optimal plans based on the population and environmental demographics it has to work with and the quality of code for the query it receives, then selects the best of the generated plan set to respond to an SQL request.
You should not assume that any query optimizer ever produces the single best query plan possible to support a given SQL request.
You should assume that the query plan selected is more optimal than the otherwise unoptimized Resolver ResTree´ (also known as the Red Tree) formulation would have been.
- Rationalize poorly formed queries in such a way as to make their performance as effective as a semantically equivalent well-formed query that returns the same result.
A query optimizer always creates the most effective plan it can for the request it is presented; nevertheless, semantically identical queries can differ in their execution times by an order of magnitude or more depending on how carefully their original SQL code is written. There are limits to the capability of query rewrite (see Query Rewrite, Statistics, and Optimization) and the Optimizer to increase the efficiency of a given user-written query.
About the Types of Steps
An AMP step is a data structure that describes an operation to be processed by one or more AMPs in order to perform a task in response to a request parcel. The combined AMP steps for a request constitute the plan for that request.
There are several types of steps, the most important of which are the plastic and concrete steps. For more information about plastic and concrete steps, see Generator.
Parallel steps are steps from the same request parcel that can be processed concurrently by the AMPs or a single step in a request parcel that can be processed simultaneously by multiple AMPs, taking advantage of the parallelism inherent in the Vantage architecture. Each parallel step has an independent execution path, running simultaneously with other steps.
The Optimizer determines which steps of a task can be run in parallel and groups them together. These parallel steps, which make the best use of the BYNET architecture, are generated by the Optimizer whenever possible.
The EXPLAIN facility explicitly reports any parallel steps specified by the Optimizer. Note that the Dispatcher has limits on how many steps may actually be actively running in parallel. Also, checks at the AMP level may block a parallel step until a previous parallel step completes.
Additionally, a step may spawn other steps to which rows are sent. For example, during redistribution. The sending steps and the spawned receiving steps run in parallel, with the message system acting as a pipeline between the sender and receiver.
Common steps are processing steps common to 2 or more SQL statements from the same request parcel or macro. They are recognized as such and combined by the Optimizer.
For example, consider the following multistatement request parcel.
SELECT employee_number, last_name, ‘Handling Calls’ FROM employee WHERE employee_number IN (SELECT employee_number FROM call_employee) ;SELECT employee_number, last_name, ‘Not Handling Calls’ FROM employee WHERE employee_number NOT IN (SELECT employee_number FROM call_employee);
The Optimizer processes these requests in parallel using a common steps approach as illustrated by the following table:
|Stage||Process||Processing Mode||Step Type|
|1||The Lock Manager locks both tables (employee_number and call_emp).||Serial||Common|
|2||Vantage copies the rows from the employee_number table and redistributes them.
The system copies the rows from the call_emp table and redistributes them.
|3||Vantage Merge Joins the results.||Serial||Individual|
|4||Vantage Exclusion Merge Joins the results.||Serial||Individual|
|5||The Lock Manager releases the table-level locks on employee_number and call_emp.||Serial||Individual|
The Optimizer generates the parallel and common steps for the parcel as shown in the following illustration:
For more detailed information about query optimization, see Query Rewrite, Statistics, and Optimization.
For information about the Viewpoint Stats Manager, which is also a component of the Optimizer, see Teradata® Viewpoint User Guide, B035-2206.