Third Attempt - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.00
Language
English (United States)
Last Update
2018-09-28
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Third Attempt

Example 3: Outer Join That Is Really an Inner Join

Example 3 is another query where an explicit outer join is specified, but that join is logically an inner join, so the Optimizer transforms it into a simple inner join before performing the query.

     SELECT c.custnum, b.monthly_revenue
     FROM sampdb.customer c
     LEFT OUTER JOIN sampdb2.revenue b ON c.custnum = b.custnum
     AND  c.district ='K'
     AND (c.service_type = 'ABC' 
     OR   c.service_type = 'XYZ')
     WHERE b.data_year_month = 199707
     ORDER BY 1;

This query returns 13,010 rows.

Brief Analysis of the Result

Note the similarity to Example 1 (see “EXPLAIN Text for Example 1” on page 285).

Again, the Optimizer treats this query as an inner join even though the request explicitly specified an outer join. The WHERE clause on the inner table logically changes this query from an outer join to an inner join (see step 2 in “EXPLAIN Text for Example 3”).

As in previous examples, the WHERE clause is logically applied after the outer join processing completes, removing all rows that were nulled in the process: nonmatching rows between the left and right tables. Like “First Attempt” on page 285, the Optimizer knows to execute this as an inner join to improve the performance of the query.

EXPLAIN Text for Example 3

The following EXPLAIN text is reported on the query in Example 3.

As you can see, this EXPLAIN text output is identical to that of the EXPLAIN text for Example 1 (see “EXPLAIN Text for Example 1” on page 285) and, as expected, so is the answer set:

Explanation
-----------------------------------------------------------------------
   1. First, we lock SAMPDB.CUSTOMER for read, and we lock SAMPDB2.REVENUE or read. 
   2. Next, we do an all-AMPs JOIN step from SAMPDB.CUSTOMER by way of a RowHash match
      scan with a condition of ("(SAMPDB. T1.DISTRICT = 'K') and ((SAMPDB.
      T1.SERVICE_TYPE= 'ABC') or (SAMPDB. T1.SERVICE_ TYPE='XYZ'))"), which is joined
      to SAMPDB.CUSTOMER with a condition of ("SAMPDB2.REVENUE.DATA_YEAR_MONTH
      =199707"). SAMPDB.CUSTOMER and SAMPDB2.REVENUE are joined using a merge join,
      with a join condition of (" (SAMPDB.T1.CUSTNUM = SAMPDB2. REVENUE.CUSTNUM)").
      The input table SAMPDB.CUSTOMER will not be cached in memory. The result goes
      into Spool 1, which is built locally on the AMPs. Then we do a SORT to order
      Spool 1 by the sort key in spool field1. The size of Spool 1 is estimated to be
      1,328,513 rows. The estimated time for this step is 6 minutes and 2 seconds. 
   3. Finally, we send out an END TRANSACTION step to all AMPs involved in processing
      the request. 
  --> The contents of Spool 1 are sent back to the user as the result of statement
      1. The total estimated time is 0 hours and 6 minutes and 2 seconds. 

Analysis of the EXPLAIN Text

1 The Lock Manager applies the appropriate READ locks for the query.

2 The Optimizer selects the best join algorithm for this inner join (a merge join) and applies the conditions from the WHERE clause.

Logically, the WHERE clause term is applied after the terms in the ON clause of the LEFT OUTER JOIN have been applied. That would result in an intermediate step of 18,034 rows; however, the term b.data_year_month=199707 from the WHERE clause would then be applied, eliminating all rows where the value for b.data_year_month is null, and returning the final, incorrect, result of 13,010 rows.

These steps are headed off by the Optimizer, which recognizes that the WHERE clause references an inner table that does not evaluate to TRUE for nulls and is therefore equivalent to an inner join. As a result, it generates a plan for a simple inner join, ignoring the request for a left outer join.

You can see this in the EXPLAIN text where it says “...SAMPDB.CUSTOMER and SAMPDB2.REVENUE are joined using a merge join...” Had the plan used an outer join, the EXPLAIN text would have said “...SAMPDB.CUSTOMER and SAMPDB2.REVENUE are left outer joined using a merge join...”

3 The result of the query, 13,010 rows, is returned to the requestor.

This is not the correct answer to the business question we are attempting to ask.

The Coding Error

The following error was made in phrasing the SELECT statement used in this example:

The selection criterion of b.data_date = 199707 was placed in the WHERE clause rather than the ON clause, resulting in the query being processed as a simple inner join rather than the intended left outer join.