EXPLAIN Request Modifier and Join Processing - Analytics Database - Teradata Vantage

SQL Request and Transaction Processing

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
dita:mapPath
zfm1628111633230.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
evd1472255317510
lifecycle
latest
Product Category
Teradata Vantage™

The following descriptions are generated by EXPLAIN when applied to sample join requests.

Each explanation is preceded by the request syntax and is followed by a listing of any new terminology found in the display.

Product Join Example

This request has a WHERE condition based on the value of a unique primary index, which produces efficient processing even though a product join is used.

EXPLAIN 
SELECT Hours, EmpNo, Description
FROM Charges, Project
WHERE Charges.Proj_Id = 'ENG-0003'
AND Project.Proj_Id = 'ENG-0003'    
AND Charges.WkEnd > Project.DueDate;

This request returns the following EXPLAIN report:

Explanation 
--------------------------------------------------------
...
2) Next, we do a single-AMP RETRIEVE step from PERSONNEL.Project
   by way of the unique primary index 
   "PERSONNEL.Project.Proj_Id = ’ENG-003’" with no residual
   conditions into Spool 2, which is duplicated on all AMPs.
   The size of Spool 2 is estimated to be 4 rows. The estimated
   time for this step is 0.07 seconds.
3) We do an all AMPs JOIN step from Spool 2 (Last Use) by way of
   an all-rows scan, which is joined to PERSONNEL.Charges with a
   condition of ("PERSONNEL.Charges.Proj_Id = ’ENG-0003’"). 
   Spool 2 and PERSONNEL.Charges are joined using a product join,
   with a join condition of
   ("PERSONNEL.Charges.WkEnd > Spool_2.DueDate"). The result
   goes into Spool 1, which is built locally on the AMPs. The size
   of Spool 1 is estimated to be 6 rows. The estimated time for this
   step is 0.13 seconds.
...

New terminology in this explanation is defined as follows:

Phrase Definition
Single-AMP RETRIEVE step from … by way of the unique primary index A single row of a table is selected using a unique primary index that hashes to the single AMP on which the row is stored.
Duplicated on all AMPs The contents of a spool file, selected from the first table involved in the join, is replicated on all the AMPs that contain another table involved in the join.
All-AMPs JOIN step … by way of an all-rows scan Table rows are searched row by row on each AMP on which they reside. Rows that satisfy the join condition are joined to the spooled row or rows.
Condition of … An intermediate condition used to qualify the joining of selected rows with spooled rows (as compared with an overall join condition).
Product join One of the types of join processing performed by Vantage.

Merge Join Example

This request returns the following EXPLAIN report:

EXPLAIN 
SELECT Name, DeptName, Loc
FROM Employee, Department
WHERE Employee.DeptNo = Department.DeptNo;
Explanation 
--------------------------------------------------------
...
2) Next, we do an all-AMPs RETRIEVE step from
   PERSONNEL.Employee by way of an all-rows scan with no
   residual conditions into Spool 2, which is redistributed
   by hash code to all AMPs. Then we do a SORT to order Spool
   2 by row hash. The size of Spool 2 is estimated to be 8 rows.
   The estimated time for this step is 0.10 seconds.
3) We do an all-AMPs JOIN step from PERSONNEL.Department by way
   of a RowHash match scan with no residual conditions, which is
   joined to Spool 2 (Last Use). PERSONNEL.Department and Spool 2
   are joined using a merge join, with a join condition of
   ("Spool_2.DeptNo = PERSONNEL.Department.DeptNo"). The result
   goes into Spool 1, which is built locally on the AMPs. The 
   size of Spool 1 is estimated to be 8 rows. The estimated time
   for this step is 0.11 seconds.
...

New terminology in this explanation is defined as follows:

Phrase Definition
merge join One of the join methods used by Vantage.

Hash Join Example

This request returns the following EXPLAIN report:

EXPLAIN 
SELECT Employee.EmpNum,Department.DeptName, Employee.Salary
FROM Employee, Department
WHERE Employee.Location = Department.Location;
Explanation 
-------------------------------------------
...
4) We do an all-AMPs RETRIEVE step from PERSONNEL.Employee by way of
   an all-rows scan with no residual conditions into Spool 2 fanned out
   into 22 hash join partitions, which is redistributed by hash code to
   all AMPs. The size of Spool 2 is estimated to be 3,995,664 rows.
   The estimated time for this step is 3 minutes and 54 seconds.
5) We do an all-AMPs RETRIEVE step from PERSONNEL.Department by way of
   an all-rows scan with no residual conditions into Spool 3 fanned out
   into 22 hash join partitions, which is redistributed by hash code to
   all AMPs. The size of Spool 3 is estimated to be 4,000,256 rows. The
   estimated time for this step is 3 minutes and 54 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
   all-rows scan, which is joined to Spool 3 (Last Use). Spool 2 and
   Spool 3 are joined using a hash join of 22 partitions, with a join
   condition of ("Spool_2.Location = Spool_3.Location"). The result 
   goes into Spool 1, which is built locally on the AMPs. The result
   spool field will not be cached in memory. The size of Spool 1 is
   estimated to be 1,997,895,930 rows. The estimated time for this
   step is 4 hours and 42 minutes.
...

New terminology in this explanation is defined as follows:

           Phrase                                                       Definition
hash join One of the types of join processing performed by Vantage.

Nested Join Example

This request returns the following EXPLAIN report:

EXPLAIN 
SELECT dept_name, name, yrs_exp
FROM employee, department
WHERE employee.emp_no  = department.mgr_no
AND   department.dept_no = 100;
Explanation 
-------------------------------------------------------
1) First, we do a single AMP JOIN step from PERSONNEL.Department by way of the unique primary index "PERSONNEL.Department.Dept_No = 100" with no residual condition which is joined to PERSONNEL.Employee by way of the unique primary index "PERSONNEL.Employee.Emp_No = PERSONNEL.Department.Mgr_No". PERSONNEL.Department and PERSONNEL.Employee are joined using a nested join. The result goes into Spool 1, which is built locally on that AMP. The size of Spool 1 is estimated to be 1 rows. The estimated time for this step is 0.10 seconds.
...

New terminology in this explanation is defined as follows:

Phrase Definition
Single-AMP JOIN step from … by way of the unique primary index A single row on one AMP is selected using the value of a unique index defined for its table.

Using the value in that row which hashes to a unique index value in a row of a second table on another AMP, the first row is joined with the second row. (When a table is accessed by its unique primary index, the need for a rowhash lock on the table is implied, though not explicitly stated in the Explain text.)

Nested join One of the types of join processing performed by Vantage.

Exclusion Merge Join Example

The request in this example selects columns only from the primary table.

If an additional column was selected from the table being joined through the subquery (for example, if the request is SELECT name, dept_no, loc FROM employee, department), the result is a Cartesian product.

EXPLAIN 
SELECT name, dept_no
FROM employee
WHERE dept_no NOT IN (SELECT dept_no
                      FROM department
                      WHERE loc = 'CHI')
ORDER BY name;

This request returns the following EXPLAIN report:

Explanation 
--------------------------------------------------------
...
2) Next, we execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from 
      PERSONNEL.Department by way of an all-rows
      scan with a condition of 
      ("PERSONNEL.Department.Loc = ’CHI’") into Spool 2, 
      which is redistributed by hash code to all AMPs. Then we
      do a SORT to order Spool 2 by row hash and the sort key
      in spool field1 eliminating duplicate rows. The size of
      Spool 2 is estimated to be 4 rows. The estimated time for
      this step is 0.07 seconds.
   2) We do an all-AMPs RETRIEVE step from PERSONNEL.Employee by
      way of an all-rows scan with no residual conditions into
      Spool 3, which is redistributed by hash code to all AMPs.
      Then we do a SORT to order Spool 3 by row hash. The size
      of Spool 3 is estimated to be 8 rows. The estimated time
      for this step is 0.10 seconds.
3) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of
   Spool 3 an all-rows scan, which is joined to Spool 2 (Last Use). 
   and Spool 2 are joined using an exclusion merge join, with a
   join condition of ("Spool_3.Dept_No = Spool_2.Dept_No"). The
   result goes into Spool 1, which is built locally on the AMPs.
   Then we do a SORT to order Spool 1 by the sort key in spool
   field1. The size of Spool 1 is estimated to be 8 rows. The
   estimated time for this step is 0.13 seconds.
...

New terminology in this explanation is defined as follows:

Phrase Definition
SORT to order Spool 2 by row hash and the sort key in spool field1 eliminating duplicate rows ... Rows in the spool are sorted by hash code using a uniqueness sort to eliminate duplicate rows. Uniqueness is based on the data in spool field1.

The contents of spool field1 depend on the query and may comprise any of the following:

  • A concatenation of all the fields in the spool row (used for queries with SELECT DISTINCT or that involve a UNION, INTERSECT, or MINUS operation).
  • A concatenation of the rowIDs that identify the data rows from which the spool row was formed (used for complex queries involving subqueries).
  • Another value that uniquely defines the spool row (used for complex queries involving aggregates and subqueries).
SORT to order Spool 1 by the sort key in spool field1 This last sort is in response to the “ORDER BY” clause attached to the primary SELECT request.
exclusion merge join One of the types of join processing performed by Vantage.

SELECT Example with a Condition Based on a Subquery

In this example, the constraint that governs the join is defined by a subquery predicate and the ORDER BY clause specifies a sorted result.

The EXPLAIN request modifier generates the following response for this request:

EXPLAIN 
SELECT name, emp_no
FROM employee
WHERE emp_no IN (SELECT emp_no
                 FROM charges)
ORDER BY name;
Explanation
--------------------------------------------------------------------
...
3) We lock PERSONNEL.charges for read, and we lock PERSONNEL.employee
   for read.
4) We do an all-AMPs RETRIEVE step from PERSONNEL.charges by way of an
   all-rows scan with no residual conditions into Spool 2, which is
   redistributed by hash code to all AMPs. Then we do a SORT to order
   Spool 2 by row hash and the sort key in spool field1 eliminating
   duplicate rows. The size of Spool 2 is estimated to be 12 rows.
   The estimated time for this step is 0.15 seconds.
5) We do an all-AMPs JOIN step from PERSONNEL.employee by way of an
   all-rows scan with no residual conditions, which is joined to
   Spool 2 (Last Use). PERSONNEL.employee and Spool 2 are joined using
   an inclusion merge join, with a join condition of
   ("PERSONNEL.employee.EmpNo = Spool_2.EmpNo"). The result goes into
   Spool 1, which is built locally on the AMPs. Then we do a SORT to
   order Spool 1 by the sort key in spool field1. The size of Spool 1
   is estimated to be 12 rows. The estimated time for this step is 
   0.07 seconds.
...