Maximum Likelihood Factors - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Product

Teradata Warehouse Miner

Release Number

5.4.5

Published

February 2018

Language

English (United States)

Last Update

2018-05-04

dita:id

B035-2302

Product Category

Software

As mentioned earlier, the common factor model attempts to find both common and unique factors that explain the covariances or correlations among a set of variables. That is, it seeks a factor pattern C and a uniqueness matrix R such that a covariance or correlation matrix S can be modeled as $S = CC^T + R$, where $C^T$ denotes the transpose of C. The estimates are obtained by the principle of maximum likelihood, under the assumption that the data come from a multivariate normal distribution. Because the likelihood involves the distribution of the elements of a covariance matrix, the Wishart distribution is used to derive the likelihood equation. The optimization technique used to maximize the likelihood of a solution for C and R is the Expectation Maximization (EM) technique. This technique, often used to replace missing data, is the same basic technique used in Teradata Warehouse Miner’s cluster analysis algorithm. Some key points regarding this technique are described below.
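For reference, the objective behind this paragraph can be written out explicitly. What follows is the standard textbook form of the Wishart-based log-likelihood for the common factor model, not a formula taken from this guide. The symbols $\Sigma$ (the model-implied matrix), $n$ (the degrees of freedom of the sample matrix), and $p$ (the number of variables) are notation assumed here for illustration.

```latex
\Sigma = CC^{T} + R, \qquad
\ln L(C, R) = -\frac{n}{2}\left[\,\ln\lvert\Sigma\rvert
  + \operatorname{tr}\!\left(\Sigma^{-1} S\right)\right] + \text{const}
```

The constant does not depend on C or R. Maximizing this expression is equivalent to minimizing $\ln\lvert\Sigma\rvert + \operatorname{tr}(\Sigma^{-1}S) - \ln\lvert S\rvert - p$, a quantity that is never negative and equals zero exactly when $\Sigma = S$.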

Beginning with a correlation or covariance matrix S, as with the other factor techniques, a principal components solution is first derived as an initial estimate of the factor pattern matrix C, and the initial estimate of the uniqueness matrix R is taken simply as $S - CC^T$. The maximum likelihood solution is then found iteratively, yielding a best estimate of C and R. To assess the effectiveness of the model, the correlation or covariance matrix S is compared to the reproduced matrix $CC^T + R$.
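The sequence just described (principal components start, iterative refinement, comparison with the reproduced matrix) can be illustrated with a minimal sketch. It is not Teradata Warehouse Miner's implementation. It assumes a single common factor (f = 1) so that the matrix inverse reduces to scalar arithmetic, keeps only the diagonal of R, and uses the textbook EM update equations for factor analysis. The function names `leading_component`, `ml_one_factor`, and `max_residual` are hypothetical.

```python
import math

def leading_component(S, iters=500):
    """Power iteration: largest eigenvalue of S and its unit eigenvector."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * S[i][j] * v[j] for i in range(p) for j in range(p))
    return lam, v

def ml_one_factor(S, iters=2000):
    """EM estimate of loadings c and uniquenesses psi for a one-factor model.

    Assumption: R is diagonal, so only its diagonal (psi) is stored.
    """
    p = len(S)
    # Initial estimate: first principal component scaled by sqrt(eigenvalue),
    # then the diagonal of S - c c^T as the starting uniquenesses.
    lam, v = leading_component(S)
    c = [math.sqrt(lam) * x for x in v]
    psi = [max(S[i][i] - c[i] ** 2, 1e-6) for i in range(p)]
    for _ in range(iters):
        # E-step: beta = c^T (c c^T + diag(psi))^{-1}, in closed form for f = 1.
        k = sum(c[j] ** 2 / psi[j] for j in range(p))
        beta = [(c[j] / psi[j]) / (1.0 + k) for j in range(p)]
        # M-step: update loadings, then uniquenesses.
        s_beta = [sum(S[i][j] * beta[j] for j in range(p)) for i in range(p)]
        denom = (sum(beta[i] * s_beta[i] for i in range(p))
                 + 1.0 - sum(beta[j] * c[j] for j in range(p)))
        c = [x / denom for x in s_beta]
        psi = [max(S[i][i] - c[i] * s_beta[i], 1e-6) for i in range(p)]
    return c, psi

def max_residual(S, c, psi):
    """Largest absolute difference between S and the reproduced c c^T + diag(psi)."""
    p = len(S)
    return max(
        abs(S[i][j] - (c[i] * c[j] + (psi[i] if i == j else 0.0)))
        for i in range(p) for j in range(p)
    )

# Illustrative correlation matrix built from made-up loadings.
true_c = [0.9, 0.8, 0.7, 0.6]
S = [[1.0 if i == j else true_c[i] * true_c[j] for j in range(4)]
     for i in range(4)]
c, psi = ml_one_factor(S)
print("loadings:", [round(x, 3) for x in c])
print("max residual:", max_residual(S, c, psi))
```

Because the example matrix is constructed to fit a one-factor model exactly, the residual should shrink toward zero. With real data the residual matrix indicates how much of S the chosen number of factors leaves unexplained.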

Note that when using the maximum likelihood solution, the user must first specify the number of common factors f to produce in the model. The software does not determine this value automatically, nor does it derive it from a threshold value. Also, an internal adjustment is made to the final factor pattern matrix C to make the factors orthogonal, something that is automatically true of the other factor solutions. Finally, the user may optionally request that the signs of a factor in the matrix C be inverted if it has more negative signs than positive ones. This is purely cosmetic and does not affect the solution in a substantive way. However, if signs are reversed, this must be kept in mind when interpreting or assigning meaning to the factors.
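The optional sign inversion can be sketched as follows. This is an illustration under assumptions rather than the product's code: C is taken to be a list of rows (variables) by columns (factors), the function names `invert_signs` and `reproduced` are hypothetical, and the loadings are made up. The example also checks that the product of C with its transpose is unchanged, which is why the adjustment is cosmetic.

```python
def invert_signs(C):
    """Negate each factor (column) of C that has more negative than positive loadings.

    Returns the adjusted copy and the indices of the factors that were inverted.
    """
    p, f = len(C), len(C[0])
    out = [row[:] for row in C]
    inverted = []
    for k in range(f):
        negatives = sum(1 for i in range(p) if C[i][k] < 0)
        positives = sum(1 for i in range(p) if C[i][k] > 0)
        if negatives > positives:
            for i in range(p):
                out[i][k] = -out[i][k]
            inverted.append(k)
    return out, inverted

def reproduced(C):
    """Return C multiplied by its transpose."""
    p, f = len(C), len(C[0])
    return [[sum(C[i][k] * C[j][k] for k in range(f)) for j in range(p)]
            for i in range(p)]

# Made-up loadings: the second factor has two negative signs and one positive.
C = [[0.8, -0.3],
     [0.7, -0.5],
     [0.6,  0.4]]
C_adj, inverted = invert_signs(C)
print(C_adj)
print("inverted factors:", inverted)
```

Keeping the list of inverted factors alongside the adjusted matrix is one way to remember, at interpretation time, that a factor's direction was reversed.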