15.00 - Dimensional Analysis - Teradata Database

Teradata Database Design

Teradata Database
User Guide

Dimensional Analysis

A star schema is designed specifically to support dimensional analysis, but not all analysis is dimensional. Many types of quantitative analysis, such as data mining, statistical analysis, and case-based reasoning, are actually inhibited by a physical star schema design.

Although not addressing the topic of normalization directly, Xiong et al. (2006) address related issues when they write the following passages concerning the problem of enhancing data mining by removing noise from the database, “…data objects that are irrelevant, or only weakly relevant can … significantly hinder data analysis. Thus, if the goal is to enhance the data analysis as much as possible, these objects should … be considered as noise, at least with respect to the underlying analysis” (p. 304, italics not in original).

Later in the same paper, they write, “…we will refer to the objects that are eliminated as noise since this use of the word falls within the general meaning of noise as meaningless or irrelevant data” (p. 306).

Recall the definition of 3NF given by Kent: the key, the whole key, and nothing but the key (see “Definition of Third Normal Form” on page 86). Given this definition, any attributes that do not modify the key, the whole key, and nothing but the key are “noise” in the context of normalization theory.

If you consider the household/customer/account example (see “Denormalized Views Versus Physical Denormalization of the Database Schema” on page 115) in a different light, you encounter an interesting problem with denormalization. An OLAP modeler, focused on creating dimensional views and drill‑downs, is likely to assume that a household can have many customers but that a customer can belong to only one household. However, the underlying normalized data model might reveal a many-to-many relationship between households and customers. For example, a customer can belong to many different households over time.

A modeler focused on OLAP applications typically is not concerned with this relationship because it is unlikely that OLAP users would want to drill down into previous households. However, a data mining user might find tremendous value in exploring this relationship for patterns of behavior that revolve around the changing composition of households.

For example, consider the following possible household change behaviors:

  • Children leave their parents to start their own households.
  • Couples marry and form one household from two.
  • Married couples divorce and form two households from one.
  • And so on.

    The enterprise needs a logical model to address the precise needs of any analytical methodology, whether it be a complicated ad hoc SQL query, an OLAP analysis, exploratory work using data mining, or something entirely different.