Arguments Against Denormalizing the Physical Database Schema to Increase Its Usability
Many data warehouse designers argue that denormalized physical database schemas are easier for end users to navigate than fully normalized schemas.
Denormalized physical schemas certainly seem more user-friendly than the complexity of a highly generalized, normalized data model. However, denormalized physical schemas are driven by known queries, so their ease of use is somewhat illusory. Formulating queries to address novel requirements, a task that is nearly definitive of the data warehouse process model, is made more difficult, if not impossible, in a denormalized environment. A fully normalized enterprise data model is flexible enough to support the undertaking of any new analyses of the data.
That said, the reality is that end users typically do not write queries anyway, and when they do, they are likely to use a third party natural language query generator, so the usability argument is often moot. Coding novel queries is often the responsibility of an application developer or a natural language query writing tool.
More importantly, you can create “denormalized” views to implement a semantic layer that makes the normalized data model easier to navigate (see “Denormalizing Through Views” on page 185). Few sites permit users to query base tables directly anyway, so creating views on base tables that look exactly like their denormalized table counterparts should not be an issue.
If there were no issues of performance for those database management systems that lack the parallel processing power of Teradata Database, then denormalization could be handled universally by implementing views (see “Arguments Against Denormalizing for Performance” on page 113). Star schemas, snowflakes, summary tables, derived data, and the like could be built as virtual clusters of tables that look exactly like their physical counterparts. By handling denormalization virtually, the relationships within, between, and among the underlying base tables of the schema remain intact, and referential integrity can be maintained by the system regardless of how many virtual denormalized relationships are created. This flexibility frees DBAs to create any number of denormalized views for users while simultaneously maintaining semantic data integrity and eliminating the data redundancies required by denormalized physical schemas.
DBAs can create virtual, subject-oriented schemas for specific applications as well as creating views for more general database access without affecting the underlying base table data. These same views can also be used to enforce security constraints for the different communities of business users who must access the database.
Consider another argument that favors the ease of use of a fully-normalized database schema over a denormalized schema. A physical star schema has physical dimensions that support a physical fact table. However, for some dimensions there can be mutually exclusive substitutes for the same data. For example, suppose an airline is interested in both the point-to-point travel of customers between segments in addition to their travel between their true origins and destinations. This discussion abbreviates this dimensional family as O&D.
The true O&D dimension is different from the segment O&D, although superficially, it looks the same. Moreover, their respective consolidation of facts is different as well, although the detailed base table data is the same. If the star schemas are physicalized, two very large schemas must be created, maintained, and coordinated to represent them, whereas with virtual star schemas, the data is maintained only in the base tables, producing a single, consistent version of the truth.