5.4.5 - Factor Scoring - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-04

Factor analysis is designed primarily to discover the underlying structure or meaning in a set of variables and to reduce them to a smaller number of variables called factors or components. The first goal is served by finding the factor loadings, which describe each variable in the data set as a linear combination of the factors. The second goal is served by finding a description of the factors as linear combinations of the original variables they describe; the resulting values are sometimes called factor measurements or factor scores. Once the factor loadings have been computed, computing factor scores might seem like an afterthought, but it is somewhat more involved than that. Teradata Warehouse Miner automates the process, however: using the model information stored in the metadata results tables, it computes factor scores directly in the database by dynamically generating and executing SQL.
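In symbols, the two descriptions can be written as follows. This assumes p standardized variables z_1, ..., z_p and m factors f_1, ..., f_m; the loadings λ_jk and score coefficients b_jk are illustrative notation rather than the product's own, and e_j is the part of z_j not explained by the common factors.

```latex
\begin{aligned}
z_j &= \sum_{k=1}^{m} \lambda_{jk} f_k + e_j, && j = 1, \dots, p \\
\hat{f}_k &= \sum_{j=1}^{p} b_{jk} z_j, && k = 1, \dots, m
\end{aligned}
```

The loadings λ_jk serve the first goal. The coefficients b_jk, which scoring must determine, serve the second.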

Factor Scoring computes factor scores for a data set that has the same columns as those used in the selected Factor analysis. Scoring creates a table containing the index (key) columns, any optional “retain” columns, and the factor scores for each row of the input table being scored. How the scores are computed depends on the type of factor analysis that was performed: principal components (PCA), principal axis factors (PAF), or maximum likelihood factors (MLF). Scoring is also affected by whether the factor analysis included a rotation. In all cases the input data is first centered by subtracting the mean value of each variable. If the factor analysis was performed on a correlation matrix, each centered value is also divided by the standard deviation of its variable, so that every variable is scaled to unit variance.
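The following Python sketch illustrates generating scoring SQL from stored model information. It is not the SQL that Teradata Warehouse Miner actually emits. The function name `build_score_sql`, its parameters, and the output column names `factor1`, `factor2`, ... are hypothetical. The per-variable means, standard deviations, and score coefficients are assumed to have already been read from the model's results tables. The sketch passes the key and retain columns through unchanged, centers each input by its mean, optionally divides it by its standard deviation, and forms each factor score as a weighted sum.

```python
def build_score_sql(table, key_cols, retain_cols, variables, coefficients,
                    means, stddevs, use_correlation):
    """Hypothetical sketch: build a SELECT statement that scores each row.

    variables       -- input column names, in the order used by the model
    coefficients    -- coefficients[j][k] is the weight of variable j in factor k
    means, stddevs  -- per-variable statistics assumed to come from the model
    use_correlation -- True if the analysis used a correlation matrix, in which
                       case each centered value is divided by its std deviation
    """
    # One centered (and optionally scaled) expression per input variable.
    terms = []
    for var, mu, sd in zip(variables, means, stddevs):
        if use_correlation:
            terms.append(f"(({var} - {mu!r}) / {sd!r})")
        else:
            terms.append(f"({var} - {mu!r})")

    # Each factor score is a weighted sum of the standardized terms.
    n_factors = len(coefficients[0])
    score_exprs = []
    for k in range(n_factors):
        parts = [f"{coefficients[j][k]!r} * {terms[j]}" for j in range(len(terms))]
        score_exprs.append(" + ".join(parts) + f" AS factor{k + 1}")

    # Key columns and retain columns are passed through unchanged.
    select_list = list(key_cols) + list(retain_cols) + score_exprs
    return "SELECT " + ", ".join(select_list) + " FROM " + table
```

For example, `build_score_sql("customers", ["cust_id"], [], ["age", "income"], [[0.5], [0.25]], [40.0, 50000.0], [10.0, 20000.0], True)` returns `SELECT cust_id, 0.5 * ((age - 40.0) / 10.0) + 0.25 * ((income - 50000.0) / 20000.0) AS factor1 FROM customers`. Wrapping such a statement in Teradata's `CREATE TABLE ... AS (...) WITH DATA` is one way to materialize the score table. This sketch does not quote or validate identifiers.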

When scoring a table with a PCA factor analysis model, the scores can be calculated directly, without estimation, even if an orthogonal rotation was performed. When scoring with a PAF or MLF model, or with a PCA model that used an oblique rotation, no unique solution exists that can be solved for directly (a condition known as the indeterminacy of factor measurements). There are, however, many techniques for estimating factor measurements, and the one used by Teradata Warehouse Miner is known as estimation by regression. This technique regresses each factor on the original variables in the factor analysis model using linear regression. It gives an accurate solution in the least-squares sense, but it typically introduces some degree of dependence, or correlation, among the computed factor scores.
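Here is a minimal sketch of estimation by regression in plain Python with no external libraries. It uses the common regression (Thurstone) form of the score coefficients, B = R^-1 L, where R is the correlation matrix of the input variables and L is the matrix of factor loadings. The scores for standardized input rows Z are then F = Z B. Whether Teradata Warehouse Miner uses exactly this formulation internally is an assumption, and the function names `solve`, `regression_coefficients`, and `score_rows` are hypothetical.

```python
def solve(a, b):
    """Solve A X = B for X (A is p x p, B is p x m) by Gauss-Jordan elimination."""
    n = len(a)
    # Augmented matrix [A | B], copied so the inputs are not modified.
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_coefficients(corr, loadings):
    """Regression score coefficients B = R^-1 L (hypothetical helper)."""
    return solve(corr, loadings)


def score_rows(z_rows, coeffs):
    """Factor scores F = Z B for rows that are already centered and scaled."""
    n_factors = len(coeffs[0])
    return [[sum(z[j] * coeffs[j][k] for j in range(len(z)))
             for k in range(n_factors)] for z in z_rows]
```

Consider a correlation matrix `[[1.0, 0.6], [0.6, 1.0]]` with first-principal-component loadings of `sqrt(0.8)` on both variables. For this input, `regression_coefficients` returns `1/sqrt(3.2)` (about 0.559) for each variable. That is exactly the component-score coefficient for that component, which illustrates why regression and direct calculation agree for an unrotated PCA model. For PAF and MLF models, the same formula yields only estimates.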

A final word about the independence, or orthogonality, of factor scores is appropriate here. As noted earlier, the factor loadings produced by the techniques offered in Teradata Warehouse Miner are orthogonal unless an oblique rotation is performed. Factor scores, however, are not necessarily orthogonal for principal axis factors, maximum likelihood factors, or oblique rotations, because in those cases the scores are estimated by regression. This subtle distinction is an easy source of confusion. The new variables, or factor scores, that a Factor analysis creates as linear combinations of the original variables are not necessarily independent of each other, even if the factors themselves are. Once the factor score table has been built, the user can measure how independent the scores are by using the Matrix and Export Matrix functions to build a correlation matrix from it.
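The Matrix and Export Matrix functions perform this check in the database. To illustrate what the check computes, the hypothetical Python function `correlation_matrix` below builds the Pearson correlation matrix of the columns of a factor score table. It assumes the table has already been fetched to the client as a list of rows, with the key and retain columns removed. Off-diagonal values near zero indicate nearly uncorrelated scores. A score column with zero variance would cause a division by zero, and this sketch does not handle that case.

```python
def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (equal-length lists)."""
    n = len(rows)
    p = len(rows[0])
    cols = list(zip(*rows))
    # Center each column on its mean.
    means = [sum(c) / n for c in cols]
    centered = [[v - mu for v in c] for c, mu in zip(cols, means)]
    # Sum of squares per column, used to normalize the cross products.
    ss = [sum(v * v for v in c) for c in centered]
    return [[sum(x * y for x, y in zip(centered[i], centered[j]))
             / (ss[i] * ss[j]) ** 0.5
             for j in range(p)] for i in range(p)]
```

For instance, four rows of scores `[[1, 1], [-1, 1], [1, -1], [-1, -1]]` give an identity matrix, meaning the two score columns are uncorrelated. Scores such as `[[1, 2], [2, 4], [3, 6]]` give an off-diagonal value of 1, meaning they are perfectly correlated.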