First Attempt - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.00
Language
English (United States)
Last Update
2018-09-28
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

First Attempt

Example : Outer Join That Is Really an Inner Join

Example 1 makes an explicit outer join request, but its result might surprise you.

     SELECT c.custnum, b.monthly_revenue
     FROM sampdb.customer AS c
     LEFT OUTER JOIN sampdb2.revenue AS b ON c.custnum = b.custnum
     WHERE c.district ='K'
     AND   b.data_date = 199707
     AND  (c.service_type = 'ABC'
     OR    c.service_type = 'XYZ')
     ORDER BY 1;

This query returns 13,010 rows, the same as the simple inner join of “Inner Join Cardinality Estimate” on page 284.

Brief Analysis of the Result

Although the request specifies a LEFT OUTER JOIN, the Optimizer treats it as a simple inner join. Why does it convert the explicitly requested left outer join to a simple inner join?

This result occurs because all the selection criteria are in the WHERE clause, so they are logically applied only after the outer join processing has been completed. This means that Examples 1 and 3 are logically identical, so they produce the same result.

EXPLAIN Text for Example 1

An EXPLAIN for the query in Example 1 generates the following text.

Notice in step 2 of the EXPLAIN text that the Optimizer recognizes this query to be an inner join and executes it as such (“...joined using a merge join...” rather than “...left outer joined using a merge join...”). Therefore, it executes with the speed of an inner join:

Explanation
-----------------------------------------------------------------------
   1. First, we lock SAMPDB.CUSTOMER for read, and we lock    SAMPDB2.REVENUE for
      read. 
   2. Next, we do an all-AMPs JOIN step from SAMPDB.CUSTOMER by way of a RowHash match
      scan with a condition of ("(SAMPDB.    T1.DISTRICT = 'K') and ((SAMPDB.
      T1.SERVICE_TYPE= 'ABC ')  or (SAMPDB. T1.SERVICE_ TYPE = 'XYZ '))"), which is
      joined to SAMPDB.CUSTOMER with a condition of ("SAMPDB2.REVENUE.DATA_YEAR_MONTH
      = 199707"). SAMPDB.CUSTOMER and SAMPDB2.REVENUE are joined using a merge join,
      with a join condition of ("(SAMPDB.CUSTOMER.CUSTNUM = SAMPDB2.
      REVENUE.CUSTNUM)"). The input table SAMPDB.CUSTOMER will not be cached in
      memory. The result goes into Spool 1, which is built locally on the AMPs. Then we
      do a SORT to order Spool 1 by the sort key in spool field1. The size of Spool 1
      is estimated to be 1,328,513 rows. The estimated time for this step is 6 minutes
      and 2 seconds. 
   3. Finally, we send out an END TRANSACTION step to all AMPs involved in processing
      the request. 
   -> The contents of Spool 1 are sent back to the user as the result of statement
   1. The total estimated time is 0 hours and 6 minutes and 2 seconds.

Analysis of the EXPLAIN Text

1 The Lock Manager applies the appropriate READ locks for the query.

2 The Optimizer selects the best join algorithm for this inner join (a Merge Join - see SQL Request and Transaction Processing for the definition of a Merge Join) and applies the conditions from the WHERE clause.

Logically, the WHERE clause terms are applied after the terms in the ON clause of the LEFT OUTER JOIN have been applied. That would result in an intermediate step of 18,034 rows. However, the term B.DATA_YEAR_MONTH=199707 from the WHERE clause would then applied, eliminating all rows where the value for B.DATA_YEAR_MONTH is null and returning the final, incorrect, result of 13,010 rows.

These steps are actually headed off by the Optimizer, which recognizes that the WHERE clause references an inner table that does not evaluate to TRUE for nulls and is therefore equivalent to an inner join. As a result, it generates a plan for a simple inner join, ignoring the request for a left outer join.

You can see this in the EXPLAIN text where it says “...SAMPDB.CUSTOMER and SAMPDB2.REVENUE are joined using a merge join...” Had the plan used an outer join, the EXPLAIN text would have said “...SAMPDB.CUSTOMER and SAMPDB2.REVENUE are left outer joined using a merge join...”

3 The result of the query, 13,010 rows, is returned to the requestor.

This is not the correct answer to the business question we are attempting to ask.

The Coding Error

The following error was made in coding the SELECT statement used in this example:

  • The selection criteria for the outer table in the query (customer) are placed in the WHERE clause rather than the ON clause, resulting in the query being processed as a simple inner join rather than the intended left outer join.
  • You can examine this from two different perspectives, both of which explain why the expected result was not returned.

    1 Logical explanation. The WHERE clause appears later in the query than the ON clause. Because of this, the WHERE clause predicates are evaluated after the ON clause predicates.

    In this case, the result table for the ON clause has 18,034 rows. When the WHERE clause predicate b.data_year_month=199707 is applied, you should expect the ensuing result table to have fewer rows because this condition eliminates any results rows where b.data_year_month is null, and an outer join, by definition, produces results nulls on selection predicates.

    2 Physical explanation. Note that the EXPLAIN text for this query does not follow the logical explanation.

    One of the fundamental tasks of the Optimizer is to reorganize queries into an algebraic form that can be analyzed quickly. In this case, the Optimizer recognizes that a predicate referencing the inner table of an outer join does not evaluate to TRUE when a column referenced in that predicate expression contains nulls.

    Because of this, the Optimizer recognizes the join to be an inner, not an outer, join and so evaluates the query in that light. As a result, it generates a plan to perform an inner join (you can see this in the second step of the EXPLAIN text where it says “...are joined using a merge join...” The wording would have been “...are joined using an outer merge join...” if the query had been formulated correctly as an outer join.

    While this is the correct interpretation of the SELECT statement as written, the query does not formulate the business question correctly.