The outer join is a join that retains rows from one or both of the joined tables, depending on whether it is declared to be a left, right, or full outer join, respectively. This means that rows that do not match the rows produced by the inner join are extended in the outer join result by reporting nulls in the nonmatching columns. From the perspective of set theory, these nonmatching column values are not missing values, but the empty set, which is a real value. Because SQL represents the empty set as a null, the semantics of outer joins must be evaluated carefully whenever there are actual missing values in the inner join portion of the operation.
Ambiguities presented by nulls in the outer join also make query optimization more difficult, because certain key methods of query optimization such as transitive closure often cannot be achieved.
Semantics of Nulls in the Outer Join
The motivation behind the outer join is to preserve information that is otherwise lost in an inner join. There can be no doubt that this is an important issue, but the SQL solution to the problem presents some inconsistencies regarding the semantics of the nulls reported by an outer join. The following contrast of simple inner and outer natural joins on the same two tables shows that the respective joins are between the following supplier and supplier_parts tables:
Here is the result of the inner join of the supplier and supplier_parts tables.
This join outcome has no information about supplier S5, therefore, it is said to have lost that information.
Here is the result of the natural outer join of the supplier and supplier_parts tables:
This join outcome retains information about supplier S5, therefore it is said to preserve that information. Note that the result reports nulls for the part_num and quantity columns for supplier S5.
Think back to the definition of null semantics that was presented in Semantics of SQL Nulls. Recall that the unequivocal definition for an SQL null is that it represents an unknown value. In this case, however, the values for both part_num and quantity for supplier S5 are known, and those values are both the empty set. Because SQL does not support empty sets directly, it substitutes nulls in their place. This is also true for the result of the length determination of an empty character string (see Inconsistencies in How SQL Treats Nulls).
The important thing to remember with respect to the semantics of outer join nulls is that nulls represent both the empty set (when used to extend the inner join to preserve information that would otherwise be lost) and semantically correct SQL nulls (when used to represent unknown information in the inner join portion of the result). The determination of which null is which is left to the user.
Note that in set theory, the empty set is sometimes referred to as the null set. It is extremely important to understand that the word null in this context means something entirely different from its meaning in the SQL language. The null set is simply a set that contains no members. It is, therefore, empty, which is why this set is also referred to as the empty set.
Also note that there is only one empty set in set theory, analogous to how there is only one value of 3 in the set of real numbers, only one value of 3 in the set of cardinal numbers, and so on. Therefore, it is referred to as the empty set, not "an empty set." Note, too, that studies of the properties of the empty set are sometimes referred to as nullology. Again, this term has absolutely nothing to do with nulls as they are defined by the SQL language.