In the following SELECT statement, no entry in predicate_2 of the subquery references any entry in table_list_1 of the main query; therefore, the subquery is completely self-contained and is not correlated. A self-contained subquery is also referred to as a local subquery.
SELECT column_list_1 FROM table_list_1 WHERE predicate_1 (SELECT column_list_2 FROM table_list_2 WHERE predicate_2);
The relationship between inner and outer query predicates is referred to as their correlation. The subquery in the previous statement is said to be uncorrelated, or noncorrelated, because it does not reference tables in the outer query. Because its subquery is local, the request restricts the number of its iterations to one. The results of the query are then joined with the results of the query made by the outer SELECT statement.
Correlated subqueries perform the subquery in parentheses once for each result row of the outer query. When a subquery references tables in the outer query that are not referenced within itself, the subquery is said to be correlated because its result is directly correlated with the outer query. A correlated subquery performs once for each value from its containing outer query. It does not necessarily produce a unique result for each of those iterations.
The following example shows the behavior of an SQL request that contains a correlated subquery.
Assume that table_1 has columns col_1 and col_2, while table_2 has columns col_3 and col_4. The following four rows exist in the two tables.
table_1 | table_2 | |||
---|---|---|---|---|
col_1 | col_2 | col_3 | col_4 | |
100 | 1 | 100 | 1 | |
50 | 1 | 50 | 1 | |
20 | 2 | 20 | 2 | |
40 | 2 | 40 | 2 |
The following SELECT statement uses a correlated subquery in its WHERE clause to query table_1 and table_2.
SELECT * FROM table_1 WHERE col_1 IN (SELECT MAX(col_3) FROM table_2 WHERE table_1.col_2=table_2.col_4);
The statement returns two response rows because the subquery is performed four times: once for each row in table_1.
The result contains only 2 response rows because of the MAX(col_3) aggregation constraint and two of the subquery executions return a response row where col_1 is not in the result.
- col_1=100, col_2=1
- col_1=40, col_2=2.
The four executions of the subquery return the following response rows:
col_3 | col_4 |
100 | 1 |
100 | 1 |
40 | 2 |
40 | 2 |
Only the first and fourth rows of table_1 have a value for col_1 in this result set. If you had not specified the MAX aggregate function, then all four rows of table_1 would have been returned.