Validating the Integrity of Base Tables In a Referential Constraint Relationship - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-27
dita:mapPath
imq1591724555718.ditamap
dita:ditavalPath
imq1591724555718.ditaval
dita:id
B035-1184
lifecycle
previous
Product Category
Teradata Vantage™

If you decide to specify a non-enforced Referential Constraint for a table, the responsibility for ensuring or validating the referential integrity of that table is yours alone. The system does not enforce referential integrity when you specify Referential Constraint relationships between tables, because no declarative constraints are declared in the appropriate column definitions.

From the aspect of integrity assurance, the best way to guarantee the referential integrity of a table without taking advantage of a declarative standard or batch referential constraint is to use a procedural constraint such as a set of triggers to handle inserts, updates, and deletions to the tables in the relationship.

For example, you want to create DELETE/UPDATE triggers on parent tables, and INSERT/UPDATE triggers on child tables to enforce referential integrity. The reasons for preferring declarative constraints over procedural constraints are described briefly in Teradata Vantage™ - Database Design, B035-1094.

Also, actively firing triggers can have a greater negative effect on system performance than the simple declarative constraint they are intended to replace.

If you decide not to enforce any form of referential integrity constraint, then you are strongly advised to enforce a set of validation procedures that can detect when and where referential integrity violations occur.

The following scenario uses a basic SQL query to interrogate a table set for violations of the referential integrity rule.

Suppose you create the tables pk_tbl1 and softri_tbl1, with column b2 of softri_tbl1 having a Referential Constraint relationship with column a1 of pk_tbl1. In this relationship, table pk_tbl1 is the parent and table softri_tbl1 is the child. The DDL for defining these tables might look like the following CREATE TABLE statements.

     CREATE TABLE pk_tbl1 (
       a1 INTEGER NOT NULL PRIMARY KEY,
       a2 INTEGER);

     CREATE TABLE softri_tbl1 (
       b1 INTEGER,
       b2 INTEGER CONSTRAINT softri_1
          REFERENCES WITH NO CHECK OPTION pk_tbl1(a1));

Column softri_tbl1.b2 is an implicit foreign key referencing the primary key column pk_tbl1.a1.

Now populate the tables with data as follows.

     INSERT INTO pk_tbl1 (a1, a2) VALUES (11, 111);
     INSERT INTO pk_tbl1 (a1, a2) VALUES (22, 222);

     INSERT INTO softri_tbl1 (b1, b2) VALUES (100, 11);
     INSERT INTO softri_tbl1 (b1, b2) VALUES (200, 22);
     INSERT INTO softri_tbl1 (b1, b2) VALUES (300, 33);
     INSERT INTO softri_tbl1 (b1, b2) VALUES (400, 44);

The third and fourth inserts into table softri_tbl1 violate the implicit referential integrity relationship between the tables because the set of distinct values for softri_tbl1.b2 should be identical to the set of values for pk_tbl1 (which, as the primary key value set, constitute a distinct set by default). In this scenario, the softri_tbl1.b2 values 33 and 44 identify the rows that violate the implied referential integrity relationship.

The following SELECT statement is a generic query to test for this type of corruption. The exclusion of foreign key nulls is included in the query because it is not possible to determine what values they represent.

     SELECT DISTINCT childtable.*
     FROM childtable, parenttable
     WHERE childtable.fk NOT IN (SELECT pk
                                 FROM parenttable)
     AND   childtable.fk IS NOT NULL;

The scenario defines the following set of correspondences.

Generic Query Element Specific Query Element
childtable softri_tbl1
parenttable pk_tbl1
childtable.fk softri_tbl1.b2

From these correspondences, the specify query to test for corrupt rows in this scenario would look like the following SELECT statement.

     SELECT DISTINCT softri_tbl1.*
     FROM softri_tbl1, pk_tbl1
     WHERE softri_tbl1.b2 NOT IN (SELECT a1
                                  FROM pk_tbl1)
     AND   softri_tbl1.b2 IS NOT NULL;

      *** Query completed. 2 rows found. 2 columns returned.
      *** Total elapsed time was 1 second.

              b1           b2
     -----------  -----------
             300           33
             400           44

The report generated by this query returns the rows with the implicit foreign key values 33 and 44, which is what was expected: the set of rows from the child table whose foreign key values do not match any element of the primary key value set from the parent table. This set of non-matching rows must be deleted from the child table to maintain the referential integrity of the database.

Perform this validation query regularly, particularly after any substantial updates are made to the tables in the implied relationship.

Your data can be corrupt for the entire interval between the times you run this query, and you still have the burden of repairing any problems that it detects. The query does not solve the problem of failing to enforce referential integrity in your database, it only allows you to detect whether the referential integrity defined by one primary or alternate key in a parent table and one foreign key in a child table has been violated. Each such unenforced referential relationship must be probed regularly to detect violations.