15.00 - Full Normalization and Dependency Preservation - Teradata Database

Teradata Database Design

prodname
Teradata Database
vrm_release
15.00
category
User Guide
featnum
B035-1094-015K

Full Normalization and Dependency Preservation

Sometimes the goals of decomposing to full normalization and decomposing them in a way that preserves functional dependencies can conflict. Consider the following example relvar:

 

The predicate for the student_subject_teacher relvar is as follows: the student named student is taught the subject named subject by the teacher named teacher.

Assume the following restrictions for purposes of illustration:

  • For each subject, each student of that subject is taught by only one teacher.
  • Each teacher teaches only one subject, but each subject is taught by several teachers.
  • What are the FDs for the student_subject_teacher relvar?

  • The first bullet implies the following FD:
  • student, subject ⎯  teacher

  • The second bullet implies the following FD:
  • teacher  subject

  • The second bullet also implies that the following FD is not true:
  • subject  teacher

    For the sake of illustration, this set of dependencies is portrayed symbolically as follows: student  teacher  subject.

    As the representation of relvar student_subject_teacher indicates, it has two candidate keys, the composite column sets student-subject and student-teacher.

    Both CKs have the property that all columns of student_subject_teacher are functionally dependent on them, and that property no longer holds if any column is discarded from either column set.

    Note, too, that student_subject_teacher also satisfies the FD teacher subject, which is not implied by either CK. This absence of implication is indicative that the student_subject_teacher relvar is not in BCNF. As a result, student_subject_teacher suffers from certain update anomalies. Suppose, for example, that you want to delete the information that Jones is studying physics. You cannot do this without also losing the information that Max Planck teaches physics.

    As usual, you can avoid the update anomalies by performing a nonloss decomposition into projections. In this particular case, the projections, both of which are in 5NF, are as follows:

     

    It is now possible to delete the information that Jones is studying physics by deleting the row for Jones and Max Planck from projection student_teacher without losing the row for Max Planck and Physics from projection teacher_subject. So the problem is solved … or is it?

    Yes, that particular problem is solved. A remaining problem, however, is that this decomposition violates the FD preservation objective described in “Preserving Functional Dependencies” on page 103.

    Specifically, the FD student, subject teacher now spans two relvars, student_teacher and teacher_subject, so it cannot be deduced from the FD teacher subject, which is the only nontrivial FD represented in student_teacher and teacher_subject. As a result, student_teacher and teacher_subject cannot be updated independently of one another.

    For example, an attempt to insert a row for Smith and Max Planck into student_teacher must be rejected, because Max Planck teaches physics and Smith is already being taught physics by Robert Millikan, yet this fact cannot be detected without examining teacher_subject.

    The point of this analysis is to highlight the fact that the following equally desirable objectives of database design can sometimes clash:

  • Decomposing relvars to the their ultimate normal form (or even just to BCNF)
  • Decomposing relvars in such a way that preserve FDs
  • In other words, it is not always possible to achieve both objectives simultaneously.

    Other points arise from this example:

  • Neither design is superior to the other on its face. The principles of normalization suggest that one design is better, while the principle of FD preservation suggest that the other is.
  • As a result, the ultimate design choice must be based on other criteria.

  • Assume that you settle on the two-relvar design.
  • In this case, the relvar-spanning FD must be declared, at least for documentation purposes, even if the design cannot enforce it.

    A fully formal version of that declaration might look something like this:

        FORALL s IN S, t1,t2 IN T, j1,j2 IN J
        IF     EXISTS { S:s, T:t1 } IN ST 
           AND EXISTS { S:s, T:t2 } IN ST 
           AND EXISTS { T:t1, J:j1 } IN TJ 
           AND EXISTS { T:t2, J:j2 } IN TJ 
        THEN j1 , j2 

    where the student, teacher, and subject domains are represented by S, T, and J, respectively.

    If you think this example is overly contrived, and that involuted dependency structures such as student teacher subject never occur in practice, consider the following, more familiar, example:

     

    This relvar satisfies the following set of FDs:

  • zip_code ⎯   city, state
  • street, city, state   zip_code
  • In other words, it is identical to the student_subject_teacher example, where student maps to the composite street-city, state maps to subject, and zip_code maps to teacher.