15.00 - The Role of Human Error in Creating Bad Data - Teradata Database

Teradata Database Design

The Role of Human Error in Creating Bad Data

Numerous ergonomic studies have demonstrated that human error is inevitable in any system of arbitrary complexity (Bailey, 1983; Dörner, 1996; Reason, 1990), and that certainly includes the computer‑human interface (Norman, 1988). The majority of data entry errors, typographical errors among them, fall into the category of “valid, but not correct”: the value entered is one the column permits, but it is not the value that should have been recorded.

Consider keypunch errors, for example. They can be minimized by enforcing optimum working conditions, such as adequate lighting and ergonomic furniture, and by optimizing employee vigilance through adequate rest periods, minimal visual and auditory distractions, and a generally non‑hostile working environment, but some keypunch errors are inevitable. Constraints cannot completely eliminate this category of errors. Although it is not possible to prevent all errors from being recorded in the database, it is possible to exclude certain categories of errors by implementing a relatively small number of robust error prevention constraints.

The bottom line, as Bailey (1983) has noted, is that it is far more cost‑efficient to prevent human errors than it is to detect and correct them after they have been made.

For example, invalid data entries and referential integrity violations are both commonly reported sources of data errors, and database constraints can easily guard against both. CHECK constraints can prevent a keypunch operator from successfully typing values into a table column that are outside the range of values permitted for that column by enterprise business rules. Referential integrity constraints can prevent child table rows from becoming orphaned as the result of a mistaken deletion of a parent table row or a mistaken update to a parent table primary or alternate key. Cohen et al. (2009) point out that constraints are important not only for maintaining data integrity, but also because they capture dependencies among data items, which the Optimizer can then use, for example, to generate query plans.
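
The behavior of these two kinds of constraints can be sketched in a few lines. The following is a minimal illustration only: it uses the SQLite engine bundled with Python as a stand-in for Teradata Database, so DDL details differ from Teradata SQL, and the table names, column names, and the 18 to 70 age range are hypothetical business rules invented for the example. It shows a CHECK constraint rejecting an out-of-range value, a referential integrity constraint rejecting both an unknown parent key and the deletion of a referenced parent row, and a “valid, but not correct” value slipping past both.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a parent and a child table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department ("
        " dept_no INTEGER PRIMARY KEY,"
        " dept_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE employee ("
        " emp_no INTEGER PRIMARY KEY,"
        # Referential integrity: every employee must point at a real department.
        " dept_no INTEGER NOT NULL REFERENCES department (dept_no),"
        # Hypothetical business rule: ages outside 18..70 are not permitted.
        " age INTEGER NOT NULL CHECK (age BETWEEN 18 AND 70))"
    )
    conn.execute("INSERT INTO department VALUES (100, 'Sales')")
    conn.commit()
    return conn


def try_statement(conn, sql, params=()):
    """Run one statement; report whether a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


conn = build_demo_db()
insert = "INSERT INTO employee (emp_no, dept_no, age) VALUES (?, ?, ?)"

results = {
    # A correct entry: the employee is 34 and works in department 100.
    "valid_row": try_statement(conn, insert, (1, 100, 34)),
    # Keying slip (340 for 34): outside the permitted range, so CHECK fires.
    "age_out_of_range": try_statement(conn, insert, (2, 100, 340)),
    # Department 999 does not exist, so the foreign key fires.
    "unknown_department": try_statement(conn, insert, (3, 999, 41)),
    # Transposed digits (43 for 34): valid, but not correct. No constraint
    # can detect this, so the row is accepted.
    "valid_but_incorrect": try_statement(conn, insert, (4, 100, 43)),
    # Deleting the parent would orphan the employee rows, so it is refused.
    "delete_parent": try_statement(
        conn, "DELETE FROM department WHERE dept_no = ?", (100,)
    ),
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The accepted `valid_but_incorrect` row is the point of the earlier discussion: constraints exclude whole categories of errors cheaply, but an in-range typographical error still has to be prevented or caught by other means.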