Data Dependence and Data Independence
Applications implemented in pre-relational database systems are data‑dependent, meaning that both the physical representation of the data and the methods of accessing it are built directly into the application code. This makes even the slightest change to the physical design of a database an extraordinarily laborious effort.
The main objective of relational DBMSs is data independence. For years, the relational database management systems used to run businesses, often referred to as OLTP systems, made data independence obligatory. In an OLTP database, data is stored in nonredundant tables that demand that every column of the table be rigorously related to its primary key alone and to no other tables. This ensures that information is available to all applications and analyses that use it, and it provides a mechanism for maintaining consistency and reliability across applications: a single source of each particular data element, a single version of the truth.
Data independence works well for OLTP systems because the applications accessing the data generally access single tables or join only a few, small tables in relatively simple queries. With the introduction of the data warehouse, previously unheard of demands were placed on the relational database management systems underlying them. In the data warehouse environment, large tables must be scanned and large result sets are frequently returned. Many tables are joined together, complicated calculations are made, and detailed data is aggregated directly in the queries. In addition, large data volumes are extracted, transformed and loaded into the tables concurrently with users running queries against the data. It quickly became apparent that databases created and tuned for OLTP could not sustain the performance levels required to support the demands of business intelligence processing. The OLTP databases could not perform the queries within their allotted time window or, in some cases, at all.
This situation highlights the potential for contradiction between designing databases for optimum integrity and designing databases for optimum performance. The key to data independence is data normalization, and normalized data schemas are the most demanding of system performance.
To address the issue of poor performance, data independence has often been abandoned in many environments and denormalized schemas have been used to address a few particular, rather than all general, analytical needs of the enterprise.
Note: The term denormalized is used because of its familiarity in the industry, not because of its technical accuracy. As described elsewhere in this book (see, for example, “Normalization as a Logical Process” on page 77), normalization is a logical concept, not a physical, concept. Therefore, it is incorrect to speak of denormalization in the context of physical database design.
Although this arrangement addresses short‑term decision support needs, it compromises the enterprise view of the data and its adaptability. Data independence, adaptability, and cross‑enterprise functionality go hand in hand, and a normalized data schema is critical to reaching these objectives.
The following topics provide some detail about why this is true.