Redesigning the Database to Eliminate the Need for Nulls - Advanced SQL Engine

Redesigning the Database to Eliminate the Need for Nulls - Advanced SQL Engine - Teradata Database

Database Design

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

kko1591750222108.ditamap

dita:ditavalPath

kko1591750222108.ditaval

dita:id

B035-1094

lifecycle

Product Category

Teradata Vantage™

In the first order predicate calculus, propositions can have one of two values: they can either be true or they can be false. Recall that the rows of a relational table correspond to logical propositions that evaluate to TRUE (see How Relational Databases Are Built From Logical Propositions), because all false propositions are excluded from the database by means of various integrity constraints.

One can build a set of axioms deriving from such a set of true propositions in addition to a set of well-defined inference rules (referred to as a calculus in formal logic). Additional true propositions, formally known as theorems, can be derived from this calculus. These derived propositions correspond to valid relational queries if and only if the following statements about the original propositions are true:

The initial set of rows represents only true propositions.
All operations on a relation that contains tuples corresponding to these true propositions obey the formal inference rules of bivalent logic.

If either of these statements is false, then the correctness (truth of the derived propositions) of any query made against the data cannot be guaranteed. Because SQL nulls represent data values that cannot be asserted, it is not possible to know whether rows that contain nulls represent true propositions or not, so the tables that contain them are not true relations in the mathematical sense.

Noting that one of the most frequently provided justifications null supporters give for their use is that maintaining rows with missing values is often a necessary and useful thing to do in the real world, Pascal asks, why it would be considered useful to record what he calls “partial nothings” in the database if it is obviously nonsensical to maintain “full nothings”? Pascal then uses the following extreme example to support his case against nulls.

Taking the “Partial Nothing” Argument To Its Logical Extreme

Consider the following classic case study: an employee table that records the salary of each employee currently working for the enterprise. It is not uncommon for salaries to be unknown at a given point, so an employee row might contain a null to hold the place of the true salary of that employee. At the time the rows is inserted into the table, the salary value is missing information.

At a given instant, this employee table might look like this:

employee
emp_num	emp_name	dept_num	hire_date	salary
PK
214	Smith	32	09-12-1989	56150
447	Lau	15	05-30-1993	?
103	Hossein	09	09-13-1984	29775
500	Nakamura	11	06-09-1997	84932
713	Schroeder	24	10-29-2001	?

New hire Schroeder has not yet been assigned a salary and Lau has just been promoted, but her new salary has not yet been determined.

The following “partial nothing” table lists only the primary key values known for each row.

employee
emp_num	emp_name	dept_num	hire_date	salary
PK
214	?	?	?	?
447	?	?	?	?
103	?	?	?	?
500	?	?	?	?
713	?	?	?	?

The problem with the “partial nothing” table is that its semantics are perceived as something other than what they truly are. The intended meaning of the nulls in the salary column derives from the true proposition that all employees receive a salary. The formal semantics of the table, however, assert only that a salary amount exists for each employee.

As a result, the table actually represents a mix of two different categories of assertion:

Those rows with nulls that apply to all employees (“a salary amount exists for all employees”).
Those rows that describe only those employees whose salaries are known (“all employees receive a salary”).

Another way of saying this is that multiple, inconsistent, informal predicates are being mapped (incorrectly!) into a single formal predicate. In this particular case, some rows (those containing nulls as place holders for the salary value) apply to all employees, while others (those that contain values for employee salaries) apply to a subset of that population. No single predicate can possibly apply to both situations.

Solving the Problem By Taking Simple Projections

The particular problem with nulls in this table is readily resolved with a minor change in the logical design of the database. Because the attributes in this employee table are a combination of those that apply to all tuples and those that apply only to some tuples, the relationship is similar to that of a simple entity supertype and subtype, the difference being that the attribute in question, salary, is not unique: all employees receive a salary.

The following tables resolve the problem of multiple simultaneous semantics for the original employee table by taking projections from the original employee table:

employee
emp_num	emp_name	dept_num	hire_date
PK
214	Smith	32	09-12-1989
447	Lau	15	05-30-1993
103	Hossein	09	09-13-1984
500	Nakamura	11	06-09-1997
713	Schroeder	24	10-29-2001

employee_salary
emp_num	salary
PK
214	56150
103	29775
500	84932