Recall that the defined semantics for SQL nulls is as the simple representation for missing information (see Semantics of SQL Nulls). A null is to be interpreted as missing information and nothing more. There are a number of areas in which the SQL treatment of nulls is inconsistent with this definition, and the purpose of this section is to summarize those inconsistencies.
- The empty set (notated symbolically as ø), which is a well-defined value in set theory, evaluates as null in SQL. The empty set is an actual value in mathematics, not a place holder for missing information. The union of the empty set with any set S yields S. In other words, the empty set is the identity under union. In SQL, however, nulls are used to indicate both missing information (and many other things) as well as the empty set. To demonstrate that nulls and the empty set are not the same thing in SQL, try to submit the following request and see what happens.
SELECT column_name FROM table_name UNION NULL;
Furthermore, in set theory, ø = ø, while in SQL, NULL ≠ NULL.
For example, the evaluation of an empty character string in SQL returns null rather than the mathematically correct empty set. Similarly, aggregation over empty sets report null, which is meant to mean that the value is missing, but there are known results having real values for such operations on empty sets.
The problem with empty sets is also true for the outer join, where the extended columns in the join result are denoted by nulls, but are actually empty sets. As a result, nulls are used to represent both empty sets and missing information in the same report.
- Nulls have no value by definition, but sort as if they were all equal to one another.
- Similarly, when Vantage makes duplicate row checks, it treats nulls as if they were equal to one another.
- A null unique primary index is valid in Vantage. In this situation, the non-value NULL is treated as if it were a unique value, which it is not, because by definition, NULL is neither a value nor unique.
There can be only one null UPI per table, and if a table defined with a NUPI has very many null primary indexes, the distribution of those rows across the AMPs will be very skewed.
- With the exception of COUNT(*), SQL aggregate operations ignore nulls, while SQL arithmetic operations do not.
- With the exception of CASE, COALESCE, and NULLIF, nulls are not valid predicate conditions in SQL expressions.
- With the exception of CASE, COALESCE, and NULLIF, SQL expressions cannot neither return nor operate on nulls.
- When a CASE, COALESCE, or NULLIF expression returns a null literal, it has INTEGER data type. All other nulls are untyped.
- Null is a relatively common European family name, so its use as a literal default for name columns can produce incorrect query results if there are instances of the family name Null in those columns.
- Structured UDTs can have null attributes, but the semantics of null attributes are different from the semantics of a null field in a column. For example, consider a structured UDT that is composed of a single attribute. If that attribute is set null, the field value in that column is not treated as a null with respect to the UDT column value.
You must explicitly place a null marker into the UDT column for the column “value” to be considered null.
The semantics of a null data type, whether partially or wholly null, are difficult to grasp. It can be said that a null attribute represents a missing type definition, but nulls are defined in ANSI/ISO SQL to represent missing values, not missing type definitions. The semantics of a UDT are what its designer defines them to be, of course, but from a logical perspective, it would seem that the best semantic definition of a null UDT would be undefined.
Using technology borrowed from object-oriented programming languages, it might be said that nulls superficially appear to be overloaded in SQL because multiple markers having different semantics are all subsumed under the same name: null. However, unlike the case for overloaded functions in object-oriented languages, it is not possible to discriminate among the various semantic possibilities a given null marker presents to a user or routine that must distinguish its intended semantics from among the myriad possible interpretations.