Optimizer | SQL Request & Transaction Processing | VantageCloud Lake - Optimizer - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

The SQL Query Optimizer determines an efficient way to access, join, and aggregate the tables required to answer an SQL request.

Questions a Query Optimizer Asks and Their Answers

The Optimizer performs its task of determining a best plan using demographic information about the tables and columns involved in the request and the configuration of the system and numerous heuristic strategies, or rules of thumb.

For more detailed information about query optimization and how Vantage optimizes join requests, see the following sources:
Among the myriad possible optimizations examined by the Optimizer are those addressed by the following questions.
  • Should this request be optimized using incremental planning and execution?
  • What is the cardinality of the table?

    In this context, cardinality refers to the number of rows in a result or spool table, not the number of rows in a base table.

  • What is the degree of the table?

    In this context, degree refers to the number of columns in a result or spool table, not the number of columns in a base table.

  • Are there interval histogram statistics for the column and index sets required to process the query?
  • Are there existing interval histogram statistics fresh enough to provide reasonable cardinality estimates?
  • Do existing interval histogram statistics require extrapolation to cover a range query over DATE values?
  • Does the table have a primary index, primary AMP index, or no primary index?
  • Is the table partitioned?
  • If the table is partitioned, are there PARTITION statistics?
  • Is the requested column set indexed?
  • If the column set is indexed, is the index unique or nonunique?
  • Can row-partition elimination be applied?
  • Can column-partition elimination be applied?
  • How many distinct values are in the column set?
  • How many rows in the column set or index have one or more nulls for the columns on which statistics have been collected?
  • How many rows in the column set or index are null for all the columns on which statistics have been collected?
  • How many rows per column set value are expected to be returned?
  • Can a base table be replaced by a covering secondary or single-table join index?
  • Is a join partly or completely covered by a join index or NUSI?
  • Can a join or aggregation be pushed into UNION ALL branches?
  • Can an aggregation be partially pushed to the relations of a join?
  • Is an aggregate already calculated for a column by an existing join index?
  • What strategies have tended to work best with queries of this type in the past?
  • How many AMPs are there in the system?
  • How many nodes are there in the system?
  • How much and what kind of disk does each AMP have and what is the processor speed of the node the AMP is running on?

Ideally, these questions are answered based on statistical data that you have generated using the SQL COLLECT STATISTICS statement (see COLLECT STATISTICS (Optimizer Form)). When database statistics are collected regularly, you can expect the Optimizer to make the best decisions possible.

When the Optimizer needs index or column statistics, it first checks the statistics cache (see Statistics Cache). If the necessary statistics are not cached, the Optimizer retrieves them from DBC.StatsTbl.

If collected statistics no longer reflect the demographics of the data, the Optimizer may not be able to make the best decisions (see Time and Resource Consumption Factors in Deciding How to Collect Statistics, An Example of How Stale Statistics Can Produce a Poor Query Plan, and Stale Statistics). You can set thresholds for recollecting statistics to make sure the system does not unnecessarily recollect statistics. See COLLECT STATISTICS (Optimizer Form).

If no statistics have been collected on indexed columns in a request, the Optimizer makes a snapshot sampling of data and uses that estimate to make a best guess about the optimum data retrieval path (see Dynamic AMP Sampling). Any derived statistics that the Optimizer develops begin with the dynamic AMP sample as a basis for deriving additional statistics. The Optimizer does not use dynamic AMP samples of non-indexed columns to make cardinality estimates.

The degree that this dynamic AMP sample approximates the population demographics for a column or table is directly proportional to the size of the table: the larger the table, the more likely a sample approximates its true global demographics.

Optimizer Processes

The following is a simplified list of the Optimizer processing stages.
  1. The Optimizer examines an incoming request parcel to determine if the SQL text to optimize is a DML request or a DDL/DCL request.
    Type of SQL Request in the Request Parcel Optimizer Action
    DDL or DCL The Optimizer deletes the original request from the parse tree after the request has been replaced with specific Data Dictionary operations.

    No access planning is required for DDL and DCL requests, so the Optimizer only converts the request parcel into work steps involving dictionary writes, locking information, and so on.

    DML The Optimizer produces access plans, join plans, and execution plans.

    The Optimizer then uses its available statistical information, whether complete or sampled, to determine which access paths or plans are to be used.

    If there are no column or index statistics in the statistics cache, then the Optimizer uses dynamic AMP sampling to estimate the population statistics of the data.

  2. The Optimizer determines if the steps are to be run in series or in parallel, and are to be individual or common processing steps.
  3. The parse tree is further fleshed out with the optimized access paths, join plans, and aggregation, and the Optimizer selects the best plan based on the available derived statistics and costing data.
  4. The Optimizer places, combines, and reorders table-level locks to reduce the likelihood of deadlocks, then removes any duplicate locks.
  5. Finally, the Optimizer either passes its fully-optimized parse tree, known as the White Tree, to the Generator for further processing or, if has optimized an EXPLAIN request, produces a verbal explanation of the White Tree to which additional spool size information and costing data, otherwise not costed, is added for analysis.

Tasks Query Optimizers Do Not Perform

Query optimizers do not do either of the following things:
  • Guarantee that the generated access, join, and aggregation plans are infallibly the best plans possible.

    A query optimizer generates multiple optimal plans, based on available population and environmental demographics and the quality of code for the query, and selects the best generated plan set for an SQL request.

    Do not assume that any query optimizer ever produces the single best query plan possible to support a given SQL request.

    Assume the selected query plan is more optimal than the otherwise unoptimized Resolver ResTree´ (also known as the Red Tree) formulation.

  • Rationalize poorly formed queries in such a way as to make their performance as effective as a semantically equivalent well-formed query that returns the same result.

    A query optimizer creates the most effective plan for the request. Nevertheless, semantically identical queries can differ in their execution times by an order of magnitude or more depending on how carefully their original SQL code is written. There are limits to the capability of query rewrite (see Query Rewrite, Statistics, and Optimization) and the Optimizer to increase the efficiency of a given user-written query.

AMP Steps

An AMP step is a data structure that describes an operation to be processed by one or more AMPs to perform a task in response to a request parcel. The combined AMP steps for a request constitute the plan for that request.

The most important steps are the plastic and concrete steps. See Generator.

Parallel Steps

Parallel steps are steps from the same request parcel that can be processed concurrently by the AMPs or a single step in a request parcel that can be processed simultaneously by multiple AMPs, taking advantage of the parallelism inherent in the Vantage architecture. Each parallel step has an independent execution path, running simultaneously with other steps.

The Optimizer groups the task steps that can run in parallel. These parallel steps, which make the best use of the BYNET architecture, are generated by the Optimizer whenever possible.

The EXPLAIN facility explicitly reports parallel steps specified by the Optimizer. The Dispatcher has limits on the number of steps actively running in parallel. Also, checks at the AMP level may block a parallel step until a previous parallel step completes.

Additionally, a step may spawn other steps to which rows are sent. For example, during redistribution. The sending steps and the spawned receiving steps run in parallel, with the message system acting as a pipeline between the sender and receiver.

Common Steps

Processing steps common to two or more SQL statements from the same request parcel or macro are combined by the Optimizer.

For example, consider the following multiple-statement request parcel.

     SELECT employee_number, last_name, ‘Handling Calls’
     FROM employee
     WHERE employee_number IN (SELECT employee_number
                               FROM call_employee)
     ;SELECT employee_number, last_name, ‘Not Handling Calls’
      FROM employee
      WHERE employee_number NOT IN (SELECT employee_number
                                    FROM call_employee);

The Optimizer processes these requests in parallel using a common steps approach as illustrated by the following table:

Stage Process Processing Mode Step Type
1 The Lock Manager locks both tables (employee_number and call_emp). Serial Common
2 Vantage copies the rows from the employee_number table and redistributes them.

The system copies the rows from the call_emp table and redistributes them.

Parallel Common
3 Vantage Merge Joins the results. Serial Individual
4 Vantage Exclusion Merge Joins the results. Serial Individual
5 The Lock Manager releases the table-level locks on employee_number and call_emp. Serial Individual

The Optimizer generates the parallel and common steps for the parcel as shown in the following illustration:


Optimizer generates parcel parallel & common steps

Related Information

For more detailed information about query optimization, see Query Rewrite, Statistics, and Optimization.

For information about how Vantage optimizes join requests, see Join Planning and Optimization and Join Optimizations.