Second Attempt - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.00
Language
English (United States)
Last Update
2018-09-28
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

Second Attempt

Example 2: Outer Join But Does Not Return the Correct Answer

Example 2 is a correctly formed outer join, but it neither addresses the question posed nor returns the correct answer.

     SELECT c.custnum, b.monthly_revenue
     FROM sampdb.customer AS c
     LEFT OUTER JOIN sampdb2.revenue AS b ON c.custnum = b.custnum
     AND  c.district ='K'
     AND  b.data_year_month = 199707
     AND (c.service_type ='ABC'
     OR   c.service_type ='XYZ')
     ORDER BY 1;

This query returns 17713502 rows.

Brief Analysis of the Result

The query specifies a left outer join without qualifying the results table with a WHERE clause. The Optimizer constructs a plan for a left outer join exactly as specified by the query. The result is a join of every row in the outer table (17,713,502 rows) with the qualified rows of the inner table, those service types of either ABC or XYZ in district K in the month of July, 1997 (13,010 rows).

The cardinality of this result, 17,713,502 rows, is three orders of magnitude larger than the correct answer to the question we hope to answer.

Without a WHERE clause to qualify the result, an outer join result always contains at least 1 row for every row in the outer table.

Qualifying predicates placed in the ON clause do not eliminate rows from the result. Rather, they are treated as rows that are not matched with the inner table irrespective of whether join terms that reference both the inner and outer tables evaluate to TRUE. In other words, the selection criteria of an ON clause only define the rows to which nulls are to be appended for nonmatching rows. They do not restrict the result to rows matching the conditions they specify.

Note that the Optimizer does qualify the result to a degree, returning only those rows for which the predicate b.data_year_month = 199707 evaluates to TRUE in the second step (see “EXPLAIN Text for Example 2” on page 288).

EXPLAIN Text for Example 2

The following EXPLAIN text is reported on the query in Example 2:

Explanation
-----------------------------------------------------------------------
   1) First, we lock SAMPDB.CUSTOMER for read, and we lock SAMPDB2.REVENUE for read. 
   2) Next, we do an all-AMPs JOIN step from SAMPDB.CUSTOMER by way of a RowHash match
      scan with no residual conditions, which is joined to SAMPDB.CUSTOMER with a
      condition of ("SAMPDB2.REVENUE.DATA_YEAR_MONTH = 199707"). SAMPDB.CUSTOMER and
      SAMPDB2.REVENUE are left outer joined using a merge join, with condition(s) used
      for nonmatching on left table ("((SAMPDB.T1.SERVICE_TYPE='ABC') or
      (SAMPDB.T1.SERVICE_TYPE='XYZ')) and (SAMPDB. T1.DISTRICT = 'K')"), with a join
      condition of (" (SAMPDB.T1. CUSTNUM = SAMPDB2.REVENUE.CUSTNUM)"). The input
      table SAMPDB.CUSTOMER will not be cached in memory. The result goes into Spool
      1, which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the
      sort key in spool field1. The size of Spool 1 is estimated to be 17,713,502 rows.
      The estimated time for this step is 7 minutes and 15 seconds. 
   3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing
      the request. 
  --> The contents of Spool 1 are sent back to the user as the result of statement 1.
      The total estimated time is 0 hours and 7 minutes and 15 seconds. 

Analysis of the EXPLAIN Text

1 The Lock Manager applies the appropriate READ locks for the query.

2 The Optimizer selects the best join algorithm for this left outer join (a merge join) and applies the conditions from the ON clause.

3 The result of the query, 17,713,502 rows, is returned to the requestor.

This is not the correct answer to the business question we are attempting to ask.

The Coding Error

The following error was made in phrasing the SELECT statement used in this example:

  • All selection criteria for the query were placed in the ON clause.
  • Only selection criteria for the inner table in the outer join, those that define the nullable nonmatching rows for the outer join, should be placed in the ON clause.

    Selection criteria for the outer table (the customer table in this case) in the outer join must be placed in the WHERE clause.