Relations, Relation Values, and Relation Variables
As the relational model has matured, careful analysis has continued to reveal facets of its definition that are obvious in hindsight, but had remained undiscovered and unstated until recently. An important example of such a revelation of the obvious is the relation variable, or relvar, first described by Date and Darwen (1998), which actually derives from a statement made by Codd (1970), where he writes that a database is “a collection of time-varying relations … of assorted degrees. As time progresses, each n-ary relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples.”
Note that relations, as defined in set theory, do not vary over time. A relation is a value, and values do not vary as a function of time. The number 3, for example, is always the number 3, and never 2 or 13 or 724.
Variables, on the other hand, take on different values as a function of time, so it is formally correct to state that the value of a relation variable changes as a function of time (the distinctions between the definitions of value and variable are based on those found in the text by Cleaveland (1986). What Codd almost certainly meant to say was something like the following: a database is a collection of time-varying relation variables of assorted degrees.
The distinction between a relation variable and its associated relation value is an important one because without it, the application of the axioms of traditional set theory to database relations is suspect at best. As Kent (2000, pp. 104-105) writes, “…one ought to be very cautious about claims of various models being based on “the axioms of traditional set theory.” That set theory deals entirely with extensional sets: a set is determined entirely by its population. There is simply no notion of a set with changing population; each different population constitutes a different set.”
Also see Date, 1992f, for an explicit comparison of the definition of a relation in set theory vis‑a‑vis its definition in the database relational model. Note that Date had not yet developed the notion of a relation variable when he wrote this paper, and where he compares the two flavors of relation definitions, he confounds the notions of relation value and relation variable in the database relational model. For example, he states that relations in the database relational model are time‑varying, while those in set theory are not. In fact, it is relation values that are time‑varying, while relation variables are not (setting aside the issue of adding or dropping columns from a database table for the sake of the theoretical distinction). The paper is nevertheless well worth reading for its explicit clarification of the differences between the two relation types.
In the context of business rules and their relationship with integrity constraints, the concept of the relvar is useful because it provides a formal, yet simple, framework in which to situate the idea of integrity constraints. Note that the relvar concept applies equally well to views (virtual relvars) and base tables (base relvars).
The first principle captured by the concept of the relvar is that the value of a relation is orthogonal to its defining variable. The easiest way to explain this is by analogy. A variable in any programming language represents the possible values that can be taken for that variable, but in itself it is not those values: it is, instead, a place holder for them. The values represented by the program variable can take on any number of values, but the variable itself is always an abstraction.
Think of a relvar as the heading definition for the attributes of a relation: it is the ANDing of all the column headings, including their constraints, defined for the relation. In terms of logic, a relvar is the predicate for a relation. More specifically, a relvar is the internal predicate for a relation. The real world that relvar represents is its external predicate. Any lack of correspondence between the external and internal predicates for a relvar is the defining characteristic of the data quality issue (see the introduction to this chapter for more information) as well as the inspiration for the maxim that a database can only enforce consistency, not truth. Unless otherwise stated, the phrase relvar predicate refers to the internal relvar predicate in this book, while each individual instantiation of that relvar, a tuple (or row), is a proposition for that predicate that evaluates to TRUE.
The relation value, then, is the net value of all the data contained by the relation. Each time the relation is updated, its value changes, but its relation variable does not.In a very real sense, a relation becomes an entirely new relation when an update occurs, but its relation variable does not change.
For example, consider the following equation:
If you set the variable x=4, then the value for variable y must be 3. If you then change the value of the variable x to 3, you have updated x, but you have not changed the value of 4 to 3, you have just changed the value of the variable x from 4 to 3. This holds for any other valid number over the domain of x. Another way to think of this is as follows: values are not time‑varying. A 3 is always a 3 and never a 4. On the other hand, a variable can represent any number of different values over the course of time. An old joke also illustrates this point by confusing variables with values: “Do you realize that 2 + 2 = 5 for large values of 2”?
The integrity of this update relationship is maintained by the relvar predicate, which is defined by the logical ANDing of all the integrity constraints defined on the relation in question. The result of enforcing the relvar predicate is what Date and Darwen (1998, 2000) call The Golden Rule: no statement can leave any relvar with a value that violates its relvar predicate. This means that no update operation can ever cause any database constraint to evaluate to FALSE. In other words, the integrity of the database cannot be violated. The Golden Rule is one of the fundamental principles of relational database theory.
It follows that tuples (“rows” in everyday language: see “Definitions” on page 75) are instantiations of the relvar predicate that represent various propositions about the relation they constitute. When you think of a row as a proposition, it immediately follows that it must also be a true proposition because, by definition, false propositions (wrong, or corrupt data in everyday language) are not permitted in a database that enforces integrity constraints (see “The Closed World Assumption” on page 630). If they were, the database would not represent facts and the Golden Rule would be violated. In other words, if a given tuple is an element of a particular relvar, then that tuple, by default, satisfies its predicate by evaluating to TRUE.
Unfortunately, the representation of the real world by a database is only as good as its input, and any database will permit the insertion of factually wrong data as long as that data does not violate the constraints specified for the base table or view through which the table is updated (see “Sources of Data Quality Problems” on page 618 and “The Role of Human Error in Creating Bad Data” on page 622). In other words, a database can only enforce consistency, not truth. All truths are consistent, but not all consistent things are true.
This is an important consideration: the database itself cannot know whether its information corresponds to real world facts (see “Sources of Data Quality Problems” on page 618). It can, however, constrain the data with which it is updated by ensuring that no business rules are violated, and it is these constraints that provide the mechanism for ensuring the success of the data integrity rules enforced by relational database management systems.