Comparing Correlated and Noncorrelated Subqueries - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

In the following SELECT statement, no entry in predicate_2 of the subquery references any entry in table_list_1 of the main query; therefore, the subquery is completely self-contained and is not correlated. A self-contained subquery is also called a local subquery.

     SELECT column_list_1
     FROM table_list_1
     WHERE predicate_1 (SELECT column_list_2
                        FROM table_list_2
                        WHERE predicate_2);

The relationship between inner and outer query predicates is called the correlation. The subquery in the previous statement is uncorrelated, or noncorrelated, because the subquery does not reference tables in the outer query. Because its subquery is local, the request restricts the number of its iterations to one. The results of the query are then joined with the results of the query made by the outer SELECT statement.

Correlated subqueries perform the subquery in parentheses once for each result row of the outer query. When a subquery references tables in the outer query that are not referenced within itself, the subquery is correlated, because its result is directly correlated with the outer query. A correlated subquery performs once for each value from its containing outer query. A correlated subquery does not necessarily produce a unique result for each of those iterations.

The following example shows the behavior of an SQL request that contains a correlated subquery.

Assume that table_1 has columns col_1 and col_2:

col_1 col_2
100 1
50 1
20 2
40 2

Assume that table_2 has columns col_3 and col_4:

col_3 col_4
100 1
50 1
20 2
40 2

The following SELECT statement uses a correlated subquery in its WHERE clause to query table_1 and table_2.

     SELECT *
     FROM table_1
     WHERE col_1 IN (SELECT MAX(col_3)
                     FROM table_2
                     WHERE table_1.col_2=table_2.col_4);

The statement returns two response rows because the subquery is performed four times: once for each row in table_1.

The result contains only 2 response rows because of the MAX(col_3) aggregation constraint and two of the subquery executions return a response row where col_1 is not in the result.

The two rows returned are these:
col_1 col_2
100 1
40 2

The four executions of the subquery return the following response rows:

col_3 col_4
100 1
100 1
40 2
40 2

Only the first and fourth rows of table_1 have a value for col_1 in this result set. Without the MAX aggregate function, all four rows of table_1 are returned.