An SQL query, which includes DELETE, INSERT, MERGE, and UPDATE as well as SELECT, can affect one AMP, several AMPs, or all AMPs in the configuration.
IF a query … | THEN … |
---|---|
(involving a single table) uses a unique primary index (UPI) | the row hash can be used to identify a single AMP. At most one row can be returned. |
(involving a single table) uses a nonunique primary index (NUPI) | the row hash can be used to identify a single AMP. Any number of rows can be returned. |
uses a unique secondary index (USI) | one or two AMPs are affected (one AMP if the subtable and base table are on the same AMP). At most one row can be returned. |
uses a nonunique secondary index (NUSI) | if the table has a partitioned primary index (PPI) and the NUSI is the same column set as a NUPI, the query affects one AMP. Otherwise, all AMPs take part in the operation and any number of rows can be returned. |
The SELECT statements in subsequent examples reference the following table data.
Employee | ||||||||
Employee Number | Manager Employee Number | Dept. Number | Job Code | Last Name | First Name | Hire Date | Birth Date | Salary Amount |
PK/UPI | FK | FK | FK | |||||
1006 | 1019 | 301 | 312101 | Stein | John | 961005 | 631015 | 2945000 |
1008 | 1019 | 301 | 312102 | Kanieski | Carol | 970201 | 680517 | 2925000 |
1005 | 0801 | 403 | 431100 | Ryan | Loretta | 1061015 | 650910 | 3120000 |
1004 | 1003 | 401 | 412101 | Johnson | Darlene | 1061015 | 760423 | 3630000 |
1007 | 1005 | 403 | 432101 | Villegas | Arnando | 1050102 | 770131 | 4970000 |
1003 | 0801 | 401 | 411100 | Trader | James | 960731 | 670619 | 3755000 |
1016 | 0801 | 302 | 321100 | Rogers | Nora | 980310 | 690904 | 5650000 |
1012 | 1005 | 403 | 432101 | Hopkins | Paulene | 970315 | 720218 | 3790000 |
1019 | 0801 | 301 | 311100 | Kubic | Ron | 980801 | 721211 | 5770000 |
1023 | 1017 | 501 | 512101 | Rabbit | Peter | 1040301 | 621029 | 2650000 |
1083 | 0801 | 619 | 414221 | Kimble | George | 1010312 | 810330 | 3620000 |
1017 | 0801 | 501 | 511100 | Runyon | Irene | 980501 | 611110 | 6600000 |
1001 | 1003 | 401 | 412101 | Hoover | William | 1010818 | 700114 | 2552500 |
The meanings of the abbreviations are as follows.
Abbreviation | Meaning |
---|---|
PK | Primary Key |
FK | Foreign Key |
UPI | Unique Primary Index |
Single AMP Request
Assume that a PE receives the following SELECT statement:
SELECT last_name FROM Employee WHERE employee_number = 1008;
Because a unique primary index value is used as the search condition (the column employee_number is the primary index for the Employee table), PE1 generates a single AMP step requesting the row for employee 1008. The AMP step, along with the PE identification, is put into a message, and sent via the BYNET to the relevant AMP (processor).
Flow Diagram of a Single AMP Request
Only one BYNET is shown to simplify the illustration.
Assuming that AMP2 has the row, it accepts the message. As shown in the next diagram, AMP2 retrieves the row from disk, includes the row and the PE identification in a return message, and sends the message back to PE1 via the BYNET. PE1 accepts the message and returns the response row to the requesting application.
Flow Diagram of a Single AMP Response to Requesting PE
The following diagram illustrates a single AMP request with partition elimination.
All AMP Request
Assume PE1 receives a SELECT statement that specifies a range of primary index values as a search condition as shown in the following example:
SELECT last_name, employee_number FROM employee WHERE employee_number BETWEEEN 1001 AND 1010 ORDER BY last_name;
In this case, each value hashes differently, and all AMPs must search for the qualifying rows.
- Retrieve rows between 1001 and 1010
- Sort ascending on last_name
- Merge the sorted rows to form the answer set
PE1 then builds a message for each AMP step and puts that message onto the BYNET. Typically, each AMP step is completed before the next one begins; note, however, that some queries can generate parallel steps.
When PE1 puts the message for the first AMP step on the BYNET, that message is broadcast to all processors as illustrated in the following diagram.
The process is:
- All AMPs accept the message, but the PEs do not.
- Each AMP checks for qualifying rows on its disk storage units.
- If any qualifying rows are found, the data in the requested columns is converted to the client format and copied to a spool file.
- Each AMP completes the step, whether rows were found or not, and puts a completion message on the BYNET.
The completion messages flow across the BYNET to PE1.
- When all AMPs have returned a completion message, PE1 transmits a message containing AMP Step 2 to the BYNET.
Upon receipt of Step 2, the AMPs sort their individual answer sets into ascending sequence by last_name, as illustrated in the following diagram.
If partitioned on employee_number, the scan may be limited to a few partitions based on partition elimination.
- Each AMP sorts its answer set, then puts a completion message on the BYNET.
- When PE1 has received all completion messages for Step 2, it sends a message containing AMP Step 3.
- Upon receipt of Step 3, each AMP copies the first block from its sorted spool to the BYNET.
Because there can be multiple AMPs on a single node, each node might be required to handle sort spools from multiple AMPs (see the following diagram).
- Nodes that contain multiple AMPs must first perform an intermediate sort of the spools generated by each of the local AMPs.
When the local sort is complete on each node, the lowest sorting row from each node is sent over the BYNET to PE1. From this point on, PE1 acts as the Merge coordinator among all the participating nodes.
- The Merge continues with PE1 building a globally sorted buffer.
When this buffer fills, PE1 forwards it to the application and begins building subsequent buffers.
- When a participant node has exhausted its sort spool, it sends a Done message to PE1.
This causes PE1 to prune this node from the set of Merge participants.
When there are no remaining Merge participants, PE1 sends the final buffer to the application along with an End Of File message.
Partition Elimination
A PPI can increase query efficiency through partition elimination, where partitions can automatically be skipped because they cannot contain qualifying rows.
The database supports several types of partition elimination.
Type | Description |
---|---|
Static | Based on constant conditions such as equality or inequality on the partitioning columns. |
Dynamic | The partitions to eliminate cannot be determined until the query is executed and the data is scanned. |
Delayed | Occurs with conditions comparing a partitioning column to a USING variable or built-in function such as CURRENT_DATE, where the Optimizer builds a somewhat generalized plan for the query but delays partition elimination until specific values of USING variables and built-in functions are known. |
- Partitioning expressions for the primary index of the table
- Conditions in the query
- Ability of the Optimizer to detect partition elimination
It is not always required that all values of the partitioning columns be specified in a query to have partition elimination occur.
IF a query … | THEN … |
---|---|
specifies values for all the primary index columns | the AMP where the rows reside can be determined and only a single AMP is accessed.
|
does not specify the values for all the primary index columns | an all-AMP full file scan is required for a table with an NPPI. However, with a PPI, if conditions are specified on the partitioning columns, partition elimination may reduce an all-AMP full file scan to an all-AMP scan of only the non-eliminated partitions. |
Single AMP Request With Partition Elimination
If a SELECT specifies values for all the primary index columns, the AMP where the rows reside can be determined and only a single AMP is accessed.
If conditions are also specified on the partitioning columns, partition elimination may reduce the number of partitions to be probed on that AMP.
Suppose the Employee table is defined with a single-level PPI where the partitioning column is dept_number.
Assume that a PE receives the following SELECT statement:
SELECT last_name FROM Employee WHERE employee_number = 1023 AND dept_number = 501;
The following flow diagram illustrates this process.
The AMP Step includes the list of partitions (in this case, P3) to access. Partition elimination (in this case, static partition elimination) reduces access to the partitions that satisfy the query requirements. In each partition in the list (in this case, only P3), look for rows with a given row hash value of the PI.
Partition elimination is similar for the Employee table with a multilevel PPI where one partitioning expression uses the dept_number column and another partitioning expression uses the hire_date column.
Assume that a PE receives the following SELECT statement:
SELECT last_name FROM Employee WHERE employee_number = 1023 AND dept_number = 501 AND hire_date BETWEEN DATE '2006-01-01' AND DATE '2006-12-31'
The following flow diagram illustrates this process.
No one was hired in department number 401 in 2006, so partition P4 is empty.
Related Topics
For more information on partition elimination, see Teradata Vantage™ - Database Design, B035-1094.