The SQL Query Optimizer determines an efficient way to access, join, and aggregate the tables required to answer an SQL request.
Questions a Query Optimizer Asks and Their Answers
The Optimizer performs its task of determining a best plan using demographic information about the tables and columns involved in the request and the configuration of the system and numerous heuristic strategies, or rules of thumb.
- Should this request be optimized using incremental planning and execution?
- What is the cardinality of the table?
In this context, cardinality refers to the number of rows in a result or spool table, not the number of rows in a base table.
- What is the degree of the table?
In this context, degree refers to the number of columns in a result or spool table, not the number of columns in a base table.
- Are there interval histogram statistics for the column and index sets required to process the query?
- Are there existing interval histogram statistics fresh enough to provide reasonable cardinality estimates?
- Do existing interval histogram statistics require extrapolation to cover a range query over DATE values?
- Does the table have a primary index, primary AMP index, or no primary index?
- Is the table partitioned?
- If the table is partitioned, are there PARTITION statistics?
- Is the requested column set indexed?
- If the column set is indexed, is the index unique or nonunique?
- Can row-partition elimination be applied?
- Can column-partition elimination be applied?
- How many distinct values are in the column set?
- How many rows in the column set or index have one or more nulls for the columns on which statistics have been collected?
- How many rows in the column set or index are null for all the columns on which statistics have been collected?
- How many rows per column set value are expected to be returned?
- Can a base table be replaced by a covering secondary or single-table join index?
- Is a join partly or completely covered by a join index or NUSI?
- Can a join or aggregation be pushed into UNION ALL branches?
- Can an aggregation be partially pushed to the relations of a join?
- Is an aggregate already calculated for a column by an existing join index?
- What strategies have tended to work best with queries of this type in the past?
- How many AMPs are there in the system?
- How many nodes are there in the system?
- How much and what kind of disk does each AMP have and what is the processor speed of the node the AMP is running on?
Ideally, these questions are answered based on statistical data that you have generated using the SQL COLLECT STATISTICS statement (see COLLECT STATISTICS (Optimizer Form)). When database statistics are collected regularly, you can expect the Optimizer to make the best decisions possible.
When the Optimizer needs index or column statistics, it first checks the statistics cache (see Statistics Cache). If the necessary statistics are not cached, the Optimizer retrieves them from DBC.StatsTbl.
If collected statistics no longer reflect the demographics of the data, the Optimizer may not be able to make the best decisions (see Time and Resource Consumption Factors in Deciding How to Collect Statistics, An Example of How Stale Statistics Can Produce a Poor Query Plan, and Stale Statistics). You can set thresholds for recollecting statistics to make sure the system does not unnecessarily recollect statistics. See COLLECT STATISTICS (Optimizer Form).
If no statistics have been collected on indexed columns in a request, the Optimizer makes a snapshot sampling of data and uses that estimate to make a best guess about the optimum data retrieval path (see Dynamic AMP Sampling). Any derived statistics that the Optimizer develops begin with the dynamic AMP sample as a basis for deriving additional statistics. The Optimizer does not use dynamic AMP samples of non-indexed columns to make cardinality estimates.
The degree that this dynamic AMP sample approximates the population demographics for a column or table is directly proportional to the size of the table: the larger the table, the more likely a sample approximates its true global demographics.
Optimizer Processes
- The Optimizer examines an incoming request parcel to determine if the SQL text to optimize is a DML request or a DDL/DCL request.
Type of SQL Request in the Request Parcel Optimizer Action DDL or DCL The Optimizer deletes the original request from the parse tree after the request has been replaced with specific Data Dictionary operations. No access planning is required for DDL and DCL requests, so the Optimizer only converts the request parcel into work steps involving dictionary writes, locking information, and so on.
DML The Optimizer produces access plans, join plans, and execution plans. The Optimizer then uses its available statistical information, whether complete or sampled, to determine which access paths or plans are to be used.
If there are no column or index statistics in the statistics cache, then the Optimizer uses dynamic AMP sampling to estimate the population statistics of the data.
- The Optimizer determines if the steps are to be run in series or in parallel, and are to be individual or common processing steps.
- The parse tree is further fleshed out with the optimized access paths, join plans, and aggregation, and the Optimizer selects the best plan based on the available derived statistics and costing data.
- The Optimizer places, combines, and reorders table-level locks to reduce the likelihood of deadlocks, then removes any duplicate locks.
- Finally, the Optimizer either passes its fully-optimized parse tree, known as the White Tree, to the Generator for further processing or, if has optimized an EXPLAIN request, produces a verbal explanation of the White Tree to which additional spool size information and costing data, otherwise not costed, is added for analysis.
Tasks Query Optimizers Do Not Perform
- Guarantee that the generated access, join, and aggregation plans are infallibly the best plans possible.
A query optimizer generates multiple optimal plans, based on available population and environmental demographics and the quality of code for the query, and selects the best generated plan set for an SQL request.
Do not assume that any query optimizer ever produces the single best query plan possible to support a given SQL request.
Assume the selected query plan is more optimal than the otherwise unoptimized Resolver ResTree´ (also known as the Red Tree) formulation.
- Rationalize poorly formed queries in such a way as to make their performance as effective as a semantically equivalent well-formed query that returns the same result.
A query optimizer creates the most effective plan for the request. Nevertheless, semantically identical queries can differ in their execution times by an order of magnitude or more depending on how carefully their original SQL code is written. There are limits to the capability of query rewrite (see Query Rewrite, Statistics, and Optimization) and the Optimizer to increase the efficiency of a given user-written query.
AMP Steps
An AMP step is a data structure that describes an operation to be processed by one or more AMPs to perform a task in response to a request parcel. The combined AMP steps for a request constitute the plan for that request.
The most important steps are the plastic and concrete steps. See Generator.
Parallel Steps
Parallel steps are steps from the same request parcel that can be processed concurrently by the AMPs or a single step in a request parcel that can be processed simultaneously by multiple AMPs, taking advantage of the parallelism inherent in the Vantage architecture. Each parallel step has an independent execution path, running simultaneously with other steps.
The Optimizer groups the task steps that can run in parallel. These parallel steps, which make the best use of the BYNET architecture, are generated by the Optimizer whenever possible.
The EXPLAIN facility explicitly reports parallel steps specified by the Optimizer. The Dispatcher has limits on the number of steps actively running in parallel. Also, checks at the AMP level may block a parallel step until a previous parallel step completes.
Additionally, a step may spawn other steps to which rows are sent. For example, during redistribution. The sending steps and the spawned receiving steps run in parallel, with the message system acting as a pipeline between the sender and receiver.
Common Steps
Processing steps common to two or more SQL statements from the same request parcel or macro are combined by the Optimizer.
For example, consider the following multiple-statement request parcel.
SELECT employee_number, last_name, ‘Handling Calls’ FROM employee WHERE employee_number IN (SELECT employee_number FROM call_employee) ;SELECT employee_number, last_name, ‘Not Handling Calls’ FROM employee WHERE employee_number NOT IN (SELECT employee_number FROM call_employee);
The Optimizer processes these requests in parallel using a common steps approach as illustrated by the following table:
Stage | Process | Processing Mode | Step Type |
---|---|---|---|
1 | The Lock Manager locks both tables (employee_number and call_emp). | Serial | Common |
2 | Vantage copies the rows from the employee_number table and redistributes them. The system copies the rows from the call_emp table and redistributes them. |
Parallel | Common |
3 | Vantage Merge Joins the results. | Serial | Individual |
4 | Vantage Exclusion Merge Joins the results. | Serial | Individual |
5 | The Lock Manager releases the table-level locks on employee_number and call_emp. | Serial | Individual |
The Optimizer generates the parallel and common steps for the parcel as shown in the following illustration:
Related Information
For more detailed information about query optimization, see Query Rewrite, Statistics, and Optimization.
For information about how Vantage optimizes join requests, see Join Planning and Optimization and Join Optimizations.