Correlated Subqueries - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

A subquery is correlated when it references columns of outer tables in an enclosing or containing outer query. The expression correlated subquery comes from the explicit requirement for the use of correlation names in any correlated subquery in which the same table is referenced in both the internal and external query.

The reference from the inner query to its containing query is shown in this illustration.

When the inner table references a foreign key column of a different table, the column must be fully qualified in the interior WHERE clause.

The effect of a correlated subquery is to provide an implicit loop function within any standard SQL DML statement, including ABORT, DELETE, INSERT, and UPDATE.

A reference to variables in an outer expression by a variable in an inner expression is called an outer reference.

 

IF a scalar subquery references tables in its containing query that are …

THEN it is said to be …

not referenced in its own FROM clause

correlated.

referenced in its own FROM clause

noncorrelated.

If a column reference is made from an inner query to an outer query, then it must be fully qualified with the appropriate table name from the FROM clause of the outer query.

Correlated subqueries are ANSI SQL:2011-compliant.

The rules and restrictions are:

  • Correlated subqueries can specify combinations of equality and inequality between the tables of an outer query and the tables of a subquery.
  • For example, this SELECT statement specifies an equality condition in its subquery and specifies an inequality condition in the main query:

          SELECT * 
          FROM table_1 
          WHERE x < ALL (SELECT y
                         FROM table_2 
                         WHERE table_2.n = table_1.n);
  • You can nest correlated subqueries to a depth of 64.
  • For example, this SELECT statement specifies three levels of depth:

          SELECT * 
          FROM table_1 
          WHERE x = (SELECT y
                     FROM table_2
                     WHERE table_2.n = table_1.n
                     AND t2.m = (SELECT y
                                 FROM table_3 
                                 WHERE table_3.n = table_2.m 
                                 AND   table_1.l = table_3.l));
  • A correlated subquery can reference the same table as a table specified in the FROM clause of an outer query.
  • To differentiate between the references to the inner and outer tables, one of them must be renamed with a correlation name.

    In the first example, the outer table is aliased, while in the second example, the inner table is aliased.

    Also note that in the first example, the outer reference is a.n, while in the second example, the outer reference is table_1.n.

          SELECT * 
          FROM table_1 AS a 
          WHERE x < (SELECT AVG(table_1.x)
                     FROM table_1 
                     WHERE table_1.n = a.n);
     
          SELECT * 
          FROM table_1 
          WHERE x < (SELECT AVG(a.x)
                     FROM table_1 AS a 
                     WHERE table_1.n = a.n);
  • Correlated subqueries cannot contain SELECT AND CONSUME statements.
  • Correlated subqueries cannot contain the TOP n operator.
  • You can specify a SELECT in an outer query, a main query, or a subquery.
  •  

    IF a SELECT is specified in …

    THEN …

    the main query

    any table referenced in the main query should be specified in the FROM clause of the main query SELECT.

    a subquery

    any table referenced in that subquery must be specified either in the FROM clause of that query or in some outer query.

    If the select expression list of the subquery specifies a column reference to a table in the FROM clause of an outer query, the column reference must be fully qualified.

    If the correlated condition specifies an unqualified column reference, then Teradata Database searches for the column starting with the tables and views in the current subquery toward the tables and views in the outermost query.

  • Correlated subqueries can be specified in a FROM clause under the following conditions:
  •  

    IF a table referenced in a subquery is identified in the FROM clause of …

    THEN the table reference is …

    a subquery

    local and the subquery is noncorrelated.

    an outer query

    not local and the subquery is correlated.

    The following rules apply to specifying a FROM clause in a correlated subquery.

  • A FROM clause is required in any subquery specification.
  • The FROM clause is not required in the outer query. However, you should always write your applications to specify a FROM clause because future releases might enforce the ANSI SQL:2011 rule that all tables referenced in a SELECT must be specified in a FROM clause. More importantly, specifying tables explicitly in the FROM clause helps to prevent errors in coding your queries because it enhances their clarity.
  • Teradata Database returns a warning message if you do not specify a referenced table in the FROM clause.

    The EXISTS quantifier is supported as the predicate of a search condition in a WHERE clause.

    EXISTS tests the result of the subquery. The subquery is performed for each row of the table specified in the outer query when the subquery is correlated.

    If the subquery returns response rows, then its WHERE condition is satisfied.

    Outer references behave as described in the following process. This process does not mirror the query plan generated by the Optimizer. Instead, this process describes how a correlated subquery works at a conceptual level.

    1 For each row of an outer query, the values of the outer references in that row are used to evaluate the result of the inner subquery.

    2 The inner subquery expression result is joined with its associated row from the outer query based on the specified join constraint for the inner query.

    The semantics of correlated subqueries imply that an inner subquery expression is executed once for each row of its immediate outer query expression. The semantics do not guarantee that each iteration of the subquery produces a unique result.

    The following example of this behavior uses the following simple employee table.

     

    The following SELECT statement shows the behavior of a simple correlated subquery. Because the same table is referenced in both the inner and outer queries, both references are given correlation names for clarity, though only one of them (it makes no difference which) must be aliased.

         SELECT *
         FROM employee AS e1
         WHERE age < (SELECT MAX(age)
                      FROM employee AS e2
                      WHERE e1.sex = e2.sex);

    Here is the process:

    1 Two copies of the table described earlier are generated, one as e1 and the other as e2.

    2 Evaluation of the inner query requires data from the outer, containing, query.

    The evaluation of the inner query becomes a set of virtual individual queries such as the following:

         SELECT 101, 'Friedrich', 'F', 23
         FROM employee AS e1
         WHERE 23 < (SELECT MAX(age)
                     FROM employee AS e2
                     WHERE 'F' = e2.sex;
       …
         SELECT 108, 'Ghazal', 'F', 26
         FROM employee as e1
         WHERE 26 < (SELECT MAX(age)
                     FROM employee AS e2
                     WHERE 'F' = e2.sex;

    3 The expanded individual queries once the subquery has been evaluated for each row in the inner query look like the following:

         SELECT 101, 'Friedrich', F, 23
         FROM employee AS e1 
         WHERE 23 < 51;
     
         SELECT 102, 'Harvey', M, 47 
         FROM employee AS e1 
         WHERE 47 < 65;
     
         SELECT 103, 'Agrawal', M, 65 
         FROM employee AS e1 
         WHERE 65 < 65;
     
         SELECT 104, 'Valduriez', M, 34 
         FROM employee AS e1 
         WHERE 34 < 65;
     
         SELECT 105, 'Cariño', F, 39 
         FROM employee AS e1 
         WHERE 39 < 51;
     
         SELECT 106, 'Au', M, 28 
         FROM employee AS e1 
         WHERE 28 < 65;
     
         SELECT 107, 'Takamoto', F, 51 
         FROM employee AS e1 
         WHERE 51 < 51;
     
         SELECT 108, 'Ghazal', F, 26 
         FROM employee AS e1 
         WHERE 26 < 51;

    4 Teradata Database performs the same evaluations for each row in the outer query.

    5 Employee 103, Agrawal, is eliminated from the result because his age is not less than the maximum male age in the table. Similarly, employee 107, Takamoto, is eliminated because her age is not less than the maximum female age in the table.

    The final result is reported in the following table.

       emp_no emp_name  sex age
       -----  -------   --- ---
        101   Friedrich  F   23
        102   Harvey     M   47
        104   Valduriez  M   34
        105   Cariño     F   39
        106   Au         M   28
        108   Ghazal     F   26

    To perform a subquery as an noncorrelated subquery and not as a correlated subquery, observe the following guidelines.

  • For each main and subquery you code, ensure that its scope is local by specifying a FROM clause that includes all the tables it references.
  • Instead of specifying join operations in the main queries of DELETE and UPDATE statements, restrict them to subqueries.
  • In the following SELECT statement, no entry in predicate_2 of the subquery references any entry in table_list_1 of the main query; therefore, the subquery is completely self-contained and is not correlated. A self-contained subquery is also referred to as a local subquery.

         SELECT column_list_1 
         FROM table_list_1 
         WHERE predicate_1 (SELECT column_list_2
                            FROM table_list_2
                            WHERE predicate_2);

    The relationship between inner and outer query predicates is referred to as their correlation. The subquery in the previous statement is said to be uncorrelated, or noncorrelated, because it does not reference tables in the outer query. Because its subquery is local, the request restricts the number of its iterations to one. The results of the query are then joined with the results of the query made by the outer SELECT statement.

    Correlated subqueries perform the subquery in parentheses once for each result row of the outer query. When a subquery references tables in the outer query that are not referenced within itself, the subquery is said to be correlated because its result is directly correlated with the outer query. A correlated subquery performs once for each value from its containing outer query. It does not necessarily produce a unique result for each of those iterations.

    The following example shows the behavior of an SQL request that contains a correlated subquery.

    Assume that table_1 has columns col_1 and col_2, while table_2 has columns col_3 and col_4. The following four rows exist in the two tables.

     

    The following SELECT statement uses a correlated subquery in its WHERE clause to query table_1 and table_2.

         SELECT * 
         FROM table_1 
         WHERE col_1 IN (SELECT MAX(col_3)
                         FROM table_2 
                         WHERE table_1.col_2=table_2.col_4);

    The statement returns two response rows because the subquery is performed four times: once for each row in table_1.

    The result contains only 2 response rows because of the MAX(col_3) aggregation constraint (two of the subquery executions return a response row where col_1 is not in the result).

    The two rows returned are these:

  • col_1=100, col_2=1
  • col_1=40, col_2=2.
  • The four executions of the subquery return the following response rows:

     

    col_3

    col_4

    100

    1

    100

    1

    40

    2

    40

    2

    Only the first and fourth rows of table_1 have a value for col_1 in this result set. If you had not specified the MAX aggregate function, then all four rows of table_1 would have been returned.