16.10 - EXPLAIN Request Modifier and Join Processing - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
16.10
created_date
June 2017
category
Programming Reference
User Guide
featnum
B035-1142-161K

The following descriptions are generated by EXPLAIN when applied to sample join requests.

Each explanation is preceded by the request syntax and is followed by a listing of any new terminology found in the display.

Product Join Example

This request has a WHERE condition based on the value of a unique primary index, which produces efficient processing even though a product join is used.

     EXPLAIN SELECT Hours, EmpNo, Description
             FROM Charges, Project
             WHERE Charges.Proj_Id = 'ENG-0003'
             AND Project.Proj_Id = 'ENG-0003'    
             AND Charges.WkEnd > Project.DueDate ;

This request returns the following EXPLAIN report:

Explanation 
--------------------------------------------------------
1) First, we lock PERSONNEL.Charges for read.
2) Next, we do a single-AMP RETRIEVE step from PERSONNEL.Project
   by way of the unique primary index 
   "PERSONNEL.Project.Proj_Id = ’ENG-003’" with no residual
   conditions into Spool 2, which is duplicated on all AMPs.
   The size of Spool 2 is estimated to be 4 rows. The estimated
   time for this step is 0.07 seconds.
3) We do an all AMPs JOIN step from Spool 2 (Last Use) by way of
   an all-rows scan, which is joined to PERSONNEL.Charges with a
   condition of ("PERSONNEL.Charges.Proj_Id = ’ENG-0003’"). 
   Spool 2 and PERSONNEL.Charges are joined using a product join,
   with a join condition of
   ("PERSONNEL.Charges.WkEnd > Spool_2.DueDate"). The result
   goes into Spool 1, which is built locally on the AMPs. The size
   of Spool 1 is estimated to be 6 rows. The estimated time for this
   step is 0.13 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are back to the user as the result of
   statement 1. The total estimated time is 0.20 seconds.

New terminology in this explanation is defined as follows:

Phrase Definition
single-AMP RETRIEVE step from … by way of the unique primary index A single row of a table is selected using a unique primary index that hashes to the single AMP on which the row is stored.
duplicated on all AMPs The contents of a spool file, selected from the first table involved in the join, is replicated on all the AMPs that contain another table involved in the join.
all-AMPs JOIN step … by way of an all-rows scan Table rows are searched row by row on each AMP on which they reside. Rows that satisfy the join condition are joined to the spooled row or rows.
condition of … An intermediate condition used to qualify the joining of selected rows with spooled rows (as compared with an overall join condition).
product join One of the types of join processing performed by Teradata Database.

Merge Join Example

This request returns the following EXPLAIN report:

     EXPLAIN SELECT Name, DeptName, Loc
             FROM Employee, Department
             WHERE Employee.DeptNo = Department.DeptNo ;
Explanation 
--------------------------------------------------------
1) First, we lock PERSONNEL.Department for read, and we lock
   PERSONNEL.Employee for read.
2) Next, we do an all-AMPs RETRIEVE step from
   PERSONNEL.Employee by way of an all-rows scan with no
   residual conditions into Spool 2, which is redistributed
   by hash code to all AMPs. Then we do a SORT to order Spool
   2 by row hash. The size of Spool 2 is estimated to be 8 rows.
   The estimated time for this step is 0.10 seconds.
3) We do an all-AMPs JOIN step from PERSONNEL.Department by way
   of a RowHash match scan with no residual conditions, which is
   joined to Spool 2 (Last Use). PERSONNEL.Department and Spool 2
   are joined using a merge join, with a join condition of
   ("Spool_2.DeptNo = PERSONNEL.Department.DeptNo"). The result
   goes into Spool 1, which is built locally on the AMPs. The 
   size of Spool 1 is estimated to be 8 rows. The estimated time
   for this step is 0.11 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs
   involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.22 seconds.

New terminology in this explanation is defined as follows:

         Phrase                                                         Definition
merge join One of the join methods used by Teradata Database.

Hash Join Example

This request returns the following EXPLAIN report:

     EXPLAIN SELECT Employee.EmpNum,Department.DeptName, Employee.Salary
             FROM Employee, Department
             WHERE Employee.Location = Department.Location ;
   
     ***Help information returned. 30 rows.
     ***Total elapsed time was 1 second.
Explanation 
-------------------------------------------
1) First, we lock a distinct PERSONNEL."pseudo table" for read on
   a RowHash to prevent global deadlock for PERSONNEL.Department.
2) Next, we lock a distinct PERSONNEL."pseudo table" for read on
   a RowHash to prevent global deadlock for PERSONNEL.Employee.
3) We lock PERSONNEL.Department for read, and we lock
   PERSONNEL.Employee for read.
4) We do an all-AMPs RETRIEVE step from PERSONNEL.Employee by way of
   an all-rows scan with no residual conditions into Spool 2 fanned out
   into 22 hash join partitions, which is redistributed by hash code to
   all AMPs. The size of Spool 2 is estimated to be 3,995,664 rows.
   The estimated time for this step is 3 minutes and 54 seconds.
5) We do an all-AMPs RETRIEVE step from PERSONNEL.Department by way of
   an all-rows scan with no residual conditions into Spool 3 fanned out
   into 22 hash join partitions, which is redistributed by hash code to
   all AMPs. The size of Spool 3 is estimated to be 4,000,256 rows. The
   estimated time for this step is 3 minutes and 54 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
   all-rows scan, which is joined to Spool 3 (Last Use). Spool 2 and
   Spool 3 are joined using a hash join of 22 partitions, with a join
   condition of ("Spool_2.Location = Spool_3.Location"). The result 
   goes into Spool 1, which is built locally on the AMPs. The result
   spool field will not be cached in memory. The size of Spool 1 is
   estimated to be 1,997,895,930 rows. The estimated time for this
   step is 4 hours and 42 minutes.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 4 hours and 49 minutes.
   DBS Control Record - Performance Fields:
   HTMemAlloc        = 5%
   SkewAllowance     = 75%

New terminology in this explanation is defined as follows:

           Phrase                                                       Definition
hash join One of the types of join processing performed by Teradata Database.

Nested Join Example

This request returns the following EXPLAIN report:

     EXPLAIN SELECT dept_name, name, yrs_exp
             FROM employee, department
             WHERE employee.emp_no  = department.mgr_no
             AND   department.dept_no = 100;
Explanation 
-------------------------------------------------------
1) First, we do a single AMP JOIN step from PERSONNEL.Department by way of the unique primary index "PERSONNEL.Department.Dept_No = 100" with no residual condition which is joined to PERSONNEL.Employee by way of the unique primary index "PERSONNEL.Employee.Emp_No = PERSONNEL.Department.Mgr_No". PERSONNEL.Department and PERSONNEL.Employee are joined using a nested join. The result goes into Spool 1, which is built locally on that AMP. The size of Spool 1 is estimated to be 1 rows. The estimated time for this step is 0.10 seconds.
> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.10 seconds.

New terminology in this explanation is defined as follows:

Phrase Definition
single-AMP JOIN step from … by way of the unique primary index A single row on one AMP is selected using the value of a unique index defined for its table.

Using the value in that row which hashes to a unique index value in a row of a second table on another AMP, the first row is joined with the second row. (Note that when a table is accessed by its unique primary index, the need for a rowhash lock on the table is implied, even though it is not explicitly stated in the Explain text.)

nested join One of the types of join processing performed by the Teradata Database.

Exclusion Merge Join Example

The request in this example selects columns only from the primary table.

If an additional column was selected from the table being joined via the subquery (for example, if the request was SELECT name, dept_no, loc FROM employee, department), the result would be a Cartesian product.

     EXPLAIN SELECT name, dept_no
             FROM employee
             WHERE dept_no NOT IN (SELECT dept_no
                                   FROM department
                                   WHERE loc = 'CHI')
             ORDER BY name;

This request returns the following EXPLAIN report:

Explanation 
--------------------------------------------------------
1) First, we lock PERSONNEL.Department for read, and we lock 
   PERSONNEL.Employee for read.
2) Next, we execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from 
      PERSONNEL.Department by way of an all-rows
      scan with a condition of 
      ("PERSONNEL.Department.Loc = ’CHI’") into Spool 2, 
      which is redistributed by hash code to all AMPs. Then we
      do a SORT to order Spool 2 by row hash and the sort key
      in spool field1 eliminating duplicate rows. The size of
      Spool 2 is estimated to be 4 rows. The estimated time for
      this step is 0.07 seconds.
   2) We do an all-AMPs RETRIEVE step from PERSONNEL.Employee by
      way of an all-rows scan with no residual conditions into
      Spool 3, which is redistributed by hash code to all AMPs.
      Then we do a SORT to order Spool 3 by row hash. The size
      of Spool 3 is estimated to be 8 rows. The estimated time
      for this step is 0.10 seconds.
3) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of
   Spool 3 an all-rows scan, which is joined to Spool 2 (Last Use). 
   and Spool 2 are joined using an exclusion merge join, with a
   join condition of ("Spool_3.Dept_No = Spool_2.Dept_No"). The
   result goes into Spool 1, which is built locally on the AMPs.
   Then we do a SORT to order Spool 1 by the sort key in spool
   field1. The size of Spool 1 is estimated to be 8 rows. The
   estimated time for this step is 0.13 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs
   involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the
   result of statement 1. The total estimated time is 
   0.23 seconds.

New terminology in this explanation is defined as follows:

Phrase Definition
SORT to order Spool 2 by row hash and the sort key in spool field1 eliminating duplicate rows... Rows in the spool are sorted by hash code using a uniqueness sort to eliminate duplicate rows. Uniqueness is based on the data in spool field1.

The contents of spool field1 depend on the query and might comprise any of the following:

  • A concatenation of all the fields in the spool row (used for queries with SELECT DISTINCT or that involve a UNION, INTERSECT, or MINUS operation).
  • A concatenation of the rowIDs that identify the data rows from which the spool row was formed (used for complex queries involving subqueries).
  • Some other value that uniquely defines the spool row (used for complex queries involving aggregates and subqueries).
SORT to order Spool 1 by the sort key in spool field1 This last sort is in response to the “ORDER BY” clause attached to the primary SELECT request.
exclusion merge join One of the types of join processing performed by Teradata Database.