Overview of Factor Analysis - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Teradata Warehouse Miner
February 2018
English (United States)

Consider a dataset containing a number of correlated numeric variables that will be used in an analysis such as Linear Regression or Cluster analysis. Alternatively, the goal may be to understand customer behavior at a fundamental level by discovering hidden structure and meaning in the data. Factor analysis reduces a set of correlated numeric variables to a smaller number of variables called factors. For the second goal to be achieved, these factors should be conceptually meaningful. Meaningful factors not only give insight into the dynamics of a business; they also make models built on them easier to explain, which is generally a requirement for a useful analytic model.

There are two fundamental types of factor analysis: principal components and common factors. Teradata Warehouse Miner offers principal components, maximum likelihood common factors, and principal axis factors, the last being a restricted form of common factor analysis. The product also offers factor rotations, both orthogonal and oblique, as post-processing for any of these three types of models. Finally, as with all other models, factor models can be scored automatically via dynamically generated SQL.
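
As an illustration of the principal components variant only, the following Python sketch extracts factor loadings from a small correlation matrix. It is not the product's implementation: the function names, the example matrix, and the use of power iteration with deflation are all assumptions made for this sketch. Each loading vector is an eigenvector of the matrix scaled by the square root of its eigenvalue.

```python
import math

def top_eigenpair(a, iters=500):
    """Approximate the largest eigenvalue and its unit eigenvector of a
    symmetric matrix by power iteration (illustrative, not optimized)."""
    p = len(a)
    # Asymmetric start vector, so it is unlikely to be orthogonal
    # to the eigenvector being sought.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v' A v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def principal_component_loadings(corr, k):
    """Return k loading vectors (one list per factor) for a symmetric
    covariance or correlation matrix. Hypothetical helper for illustration."""
    p = len(corr)
    a = [row[:] for row in corr]
    loadings = []
    for _ in range(k):
        lam, v = top_eigenpair(a)
        lam = max(lam, 0.0)  # guard against tiny negative rounding error
        loadings.append([math.sqrt(lam) * x for x in v])
        # Deflate: remove the extracted component before finding the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return loadings

# Made-up correlation matrix for three variables.
corr = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

loadings = principal_component_loadings(corr, 2)
# Communality of each variable: variance explained by the retained factors.
communalities = [sum(f[i] ** 2 for f in loadings) for i in range(3)]
print(loadings)
print(communalities)
```

Retaining all three factors reproduces the correlation matrix up to rounding, while retaining fewer gives communalities below 1. The sign of each loading vector is arbitrary.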

Before using the Teradata Warehouse Miner Factor Analysis module, the user must first build a data reduction matrix using the Build Matrix function. The matrix must include all of the input variables to be used in the factor analysis. The user can base the analysis on either a covariance matrix or a correlation matrix, which corresponds to working with centered but unscaled data or with centered and normalized (unit-variance) data, respectively. Teradata Warehouse Miner automatically converts the extended cross-products matrix, which the Build Matrix function stores in metadata results tables, into the desired covariance or correlation matrix. The choice affects the scaling of the resulting factor measures and factor scores.
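
The conversion from cross-products to a covariance or correlation matrix can be sketched in Python. The exact layout of the extended cross-products matrix in the metadata results tables is not given here, so this sketch assumes it supplies the row count n, the column sums, and the raw sums of cross-products. The function names and sample data are hypothetical, and the n - 1 divisor (sample covariance) is an assumption.

```python
import math

def extended_cross_products(rows):
    """Build the assumed inputs: row count, column sums, and raw
    sums of cross-products. Stands in for a stored matrix."""
    n = len(rows)
    p = len(rows[0])
    sums = [sum(r[i] for r in rows) for i in range(p)]
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
             for i in range(p)]
    return n, sums, cross

def cross_products_to_cov_corr(n, sums, cross):
    """Convert cross-products into covariance and correlation matrices."""
    p = len(sums)
    # Centering: subtract the product of sums divided by n.
    cov = [[(cross[i][j] - sums[i] * sums[j] / n) / (n - 1)
            for j in range(p)] for i in range(p)]
    # Normalizing: divide by the standard deviations (unit variance).
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)]
            for i in range(p)]
    return cov, corr

# Made-up data: four rows, two numeric variables.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0), (4.0, 8.0)]
n, sums, cross = extended_cross_products(rows)
cov, corr = cross_products_to_cov_corr(n, sums, cross)
print(cov)
print(corr)
```

The covariance matrix keeps each variable's original scale, so variables with large variances dominate the factors. The correlation matrix has ones on its diagonal, so every variable carries equal weight.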

The primary source of information and formulae in this section is [Harman].