15.00 - Queries and AMPs - Teradata Database

Teradata Database SQL Fundamentals

prodname
Teradata Database
vrm_release
15.00
category
Programming Reference
featnum
B035-1141-015K

Queries and AMPs

An SQL query, which includes DELETE, INSERT, MERGE, and UPDATE as well as SELECT, can affect one AMP, several AMPs, or all AMPs in the configuration.

 

IF a query …

THEN …

(involving a single table) uses a unique primary index (UPI)

the row hash can be used to identify a single AMP.

At most one row can be returned.

(involving a single table) uses a nonunique primary index (NUPI)

the row hash can be used to identify a single AMP.

Any number of rows can be returned.

uses a unique secondary index (USI)

one or two AMPs are affected (one AMP if the subtable and base table are on the same AMP).

At most one row can be returned.

uses a nonunique secondary index (NUSI)

if the table has a partitioned primary index (PPI) and the NUSI is the same column set as a NUPI, the query affects one AMP.

Otherwise, all AMPs take part in the operation and any number of rows can be returned.

The SELECT statements in subsequent examples reference the following table data.

 

The meanings of the abbreviations are as follows.

 

Abbreviation

Meaning

PK

Primary Key

FK

Foreign Key

UPI

Unique Primary Index

Single AMP Request

Assume that a PE receives the following SELECT statement:

   SELECT last_name
   FROM Employee
   WHERE employee_number = 1008;

Because a unique primary index value is used as the search condition (the column employee_number is the primary index for the Employee table), PE1 generates a single AMP step requesting the row for employee 1008. The AMP step, along with the PE identification, is put into a message, and sent via the BYNET to the relevant AMP (processor).

See “Flow Diagram of a Single AMP Request” on page 161 for a flow diagram that illustrates this process.

Flow Diagram of a Single AMP Request

Only one BYNET is shown to simplify the illustration.

Assuming that AMP2 has the row, it accepts the message. As shown in the next diagram, AMP2 retrieves the row from disk, includes the row and the PE identification in a return message, and sends the message back to PE1 via the BYNET. PE1 accepts the message and returns the response row to the requesting application.

Flow Diagram of a Single AMP Response to Requesting PE

The following diagram illustrates a single AMP request with partition elimination.

All AMP Request

Assume PE1 receives a SELECT statement that specifies a range of primary index values as a search condition as shown in the following example:

   SELECT last_name, employee_number
   FROM employee
   WHERE employee_number BETWEEEN 1001 AND 1010
   ORDER BY last_name;

In this case, each value hashes differently, and all AMPs must search for the qualifying rows.

PE1 first parses the request and creates the following AMP steps:

  • Retrieve rows between 1001 and 1010
  • Sort ascending on last_name
  • Merge the sorted rows to form the answer set
  • PE1 then builds a message for each AMP step and puts that message onto the BYNET. Typically, each AMP step is completed before the next one begins; note, however, that some queries can generate parallel steps.

    When PE1 puts the message for the first AMP step on the BYNET, that message is broadcast to all processors as illustrated in the following diagram.

    The process is:

    1 All AMPs accept the message, but the PEs do not.

    2 Each AMP checks for qualifying rows on its disk storage units.

    3 If any qualifying rows are found, the data in the requested columns is converted to the client format and copied to a spool file.

    4 Each AMP completes the step, whether rows were found or not, and puts a completion message on the BYNET.

    The completion messages flow across the BYNET to PE1.

    5 When all AMPs have returned a completion message, PE1 transmits a message containing AMP Step 2 to the BYNET.

    Upon receipt of Step 2, the AMPs sort their individual answer sets into ascending sequence by last_name , as illustrated in the following diagram.

    Note: If partitioned on employee_number, the scan may be limited to a few partitions based on partition elimination.

    6 Each AMP sorts its answer set, then puts a completion message on the BYNET.

    7 When PE1 has received all completion messages for Step 2, it sends a message containing AMP Step 3.

    8 Upon receipt of Step 3, each AMP copies the first block from its sorted spool to the BYNET.

    Because there can be multiple AMPs on a single node, each node might be required to handle sort spools from multiple AMPs (see the diagram at the top of the next page).

    9 Nodes that contain multiple AMPs must first perform an intermediate sort of the spools generated by each of the local AMPs.

    When the local sort is complete on each node, the lowest sorting row from each node is sent over the BYNET to PE1. From this point on, PE1 acts as the Merge coordinator among all the participating nodes.

    10 The Merge continues with PE1 building a globally sorted buffer.

    When this buffer fills, PE1 forwards it to the application and begins building subsequent buffers.

    11 When a participant node has exhausted its sort spool, it sends a Done message to PE1.

    This causes PE1 to prune this node from the set of Merge participants.

    When there are no remaining Merge participants, PE1 sends the final buffer to the application along with an End Of File message.

    Partition Elimination

    A PPI can increase query efficiency through partition elimination, where partitions can automatically be skipped because they cannot contain qualifying rows.

    Teradata Database supports several types of partition elimination.

     

    Type

    Description

    Static

    Based on constant conditions such as equality or inequality on the partitioning columns.

    Dynamic

    The partitions to eliminate cannot be determined until the query is executed and the data is scanned.

    Delayed

    Occurs with conditions comparing a partitioning column to a USING variable or built-in function such as CURRENT_DATE, where the Optimizer builds a somewhat generalized plan for the query but delays partition elimination until specific values of USING variables and built-in functions are known.

    The degree of partition elimination depends on the:

  • Partitioning expressions for the primary index of the table
  • Conditions in the query
  • Ability of the Optimizer to detect partition elimination
  • It is not always required that all values of the partitioning columns be specified in a query to have partition elimination occur.

     

    IF a query …

    THEN …

    specifies values for all the primary index columns

    the AMP where the rows reside can be determined and only a single AMP is accessed.

  • If conditions are not specified on the partitioning columns, each partition can be probed to find the rows based on the hash value.
  • If conditions are also specified on the partitioning columns, partition elimination may reduce the number of partitions to be probed on that AMP.
  • For an illustration, see “Single AMP Request With Partition Elimination” on page 165.

    does not specify the values for all the primary index columns

    an all-AMP full file scan is required for a table with an NPPI.

    However, with a PPI, if conditions are specified on the partitioning columns, partition elimination may reduce an all-AMP full file scan to an all-AMP scan of only the non-eliminated partitions.

    Single AMP Request With Partition Elimination

    If a SELECT specifies values for all the primary index columns, the AMP where the rows reside can be determined and only a single AMP is accessed.

    If conditions are also specified on the partitioning columns, partition elimination may reduce the number of partitions to be probed on that AMP.

    Suppose the Employee table is defined with a single-level PPI where the partitioning column is dept_number.

    Assume that a PE receives the following SELECT statement:

       SELECT last_name
       FROM Employee
       WHERE employee_number = 1023
       AND dept_number = 501;

    The following flow diagram illustrates this process.

    The AMP Step includes the list of partitions (in this case, P3) to access. Partition elimination (in this case, static partition elimination) reduces access to the partitions that satisfy the query requirements. In each partition in the list (in this case, only P3), look for rows with a given row hash value of the PI.

    Partition elimination is similar for the Employee table with a multilevel PPI where one partitioning expression uses the dept_number column and another partitioning expression uses the hire_date column.

    Assume that a PE receives the following SELECT statement:

       SELECT last_name
       FROM Employee
       WHERE employee_number = 1023
       AND dept_number = 501
       AND hire_date BETWEEN DATE '2006-01-01' AND DATE '2006-12-31';

    The following flow diagram illustrates this process.

    No one was hired in department number 401 in 2006, so partition P4 is empty.

    For more information on partition elimination, see Database Design.