Redesigning the Database to Eliminate the Need for Nulls - Advanced SQL Engine - Teradata Database

Database Design

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
qby1588121512748.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1094
lifecycle
previous
Product Category
Teradata Vantage™

In the first order predicate calculus, propositions can have one of two values: they can either be true or they can be false. Recall that the rows of a relational table correspond to logical propositions that evaluate to TRUE (see How Relational Databases Are Built From Logical Propositions), because all false propositions are excluded from the database by means of various integrity constraints.

One can build a set of axioms deriving from such a set of true propositions in addition to a set of well-defined inference rules (referred to as a calculus in formal logic). Additional true propositions, formally known as theorems, can be derived from this calculus. These derived propositions correspond to valid relational queries if and only if the following statements about the original propositions are true:

  • The initial set of rows represents only true propositions.
  • All operations on a relation that contains tuples corresponding to these true propositions obey the formal inference rules of bivalent logic.

If either of these statements is false, then the correctness (truth of the derived propositions) of any query made against the data cannot be guaranteed. Because SQL nulls represent data values that cannot be asserted, it is not possible to know whether rows that contain nulls represent true propositions or not, hence, the tables that contain them are not true relations in the mathematical sense.

Noting that one of the most frequently provided justifications null supporters give for their use is that maintaining rows with missing values is often a necessary and useful thing to do in the real world, Pascal asks, why it would be considered useful to record what he calls “partial nothings” in the database if it is obviously nonsensical to maintain “full nothings”? Pascal then uses the following extreme example to support his case against nulls.

Taking the “Partial Nothing” Argument To Its Logical Extreme

Consider the following classic case study: an employee table that records the salary of each employee currently working for the enterprise. It is not uncommon for salaries to be unknown at a given point, so an employee row might contain a null to hold the place of the true salary of that employee. At the time the rows is inserted into the table, the salary value is missing information.

At a given instant, this employee table might look like this:

employee
emp_num emp_name dept_num hire_date salary
PK        
214 Smith 32 09-12-1989 56150
447 Lau 15 05-30-1993 ?
103 Hossein 09 09-13-1984 29775
500 Nakamura 11 06-09-1997 84932
713 Schroeder 24 10-29-2001 ?

New hire Schroeder has not yet been assigned a salary and Lau has just been promoted, but her new salary has not yet been determined.

The following “partial nothing” table lists only the primary key values known for each row.

employee
emp_num emp_name dept_num hire_date salary
PK        
214 ? ? ? ?
447 ? ? ? ?
103 ? ? ? ?
500 ? ? ? ?
713 ? ? ? ?

The problem with the “partial nothing” table is that its semantics are perceived as something other than what they truly are. The intended meaning of the nulls in the salary column derives from the true proposition that all employees receive a salary. The formal semantics of the table, however, assert only that a salary amount exists for each employee.

As a result, the table actually represents a mix of two different categories of assertion:

  • Those rows with nulls that apply to all employees (“a salary amount exists for all employees”).
  • Those rows that describe only those employees whose salaries are known (“all employees receive a salary”).

Another way of saying this is that multiple, inconsistent, informal predicates are being mapped (incorrectly!) into a single formal predicate. In this particular case, some rows (those containing nulls as place holders for the salary value) apply to all employees, while others (those that contain values for employee salaries) apply to a subset of that population. No single predicate can possibly apply to both situations.

Solving the Problem By Taking Simple Projections

The particular problem with nulls in this table is readily resolved with a minor change in the logical design of the database. Because the attributes in this employee table are a combination of those that apply to all tuples and those that apply only to some tuples, the relationship is similar to that of a simple entity supertype and subtype, the difference being that the attribute in question, salary, is not unique: all employees receive a salary.

The following tables resolve the problem of multiple simultaneous semantics for the original employee table by taking projections from the original employee table:

employee
emp_num emp_name dept_num hire_date
PK      
214 Smith 32 09-12-1989
447 Lau 15 05-30-1993
103 Hossein 09 09-13-1984
500 Nakamura 11 06-09-1997
713 Schroeder 24 10-29-2001
employee_salary
emp_num salary
PK  
214 56150
103 29775
500 84932