Teradata Warehouse Miner provides functions that build matrices, which can drastically reduce the amount of data that analytic algorithms must process. Numeric columns in potentially huge relational tables are reduced to a comparatively compact matrix (n-by-n for n columns), which can be passed to a linear regression or factor analysis algorithm, or to an external application for further analysis. One such external application is SAS, which can perform principal component analysis and linear regression analysis using a correlation or covariance matrix as input.
The matrix functions must operate on numeric data. Columns of type DATE will not produce meaningful results. By default, NULL values are handled via listwise deletion in the Matrix analysis, but an option to effectively replace NULL values with zeros is also available. Listwise deletion means that if the value of any column to be included in the matrix is NULL, the entire row is omitted during matrix calculations.
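The two NULL-handling options can be illustrated outside the database with a minimal Python sketch. It is not the product's implementation: the function name `prepare_rows`, the option names `"listwise"` and `"zero"`, and the use of NumPy are assumptions made purely for illustration, with Python `None` standing in for SQL NULL.

```python
import numpy as np

def prepare_rows(rows, null_handling="listwise"):
    """Return a float array with NULLs (None) handled by one of two strategies.

    Hypothetical helper for illustration only; not part of Teradata Warehouse Miner.
    """
    # Represent NULL as NaN so the rows fit in a numeric array.
    data = np.array(
        [[np.nan if value is None else float(value) for value in row] for row in rows],
        dtype=float,
    )
    if null_handling == "listwise":
        # Listwise deletion: omit every row that has a NULL in any column.
        return data[~np.isnan(data).any(axis=1)]
    if null_handling == "zero":
        # Zero replacement: keep every row, substituting 0 for each NULL.
        return np.where(np.isnan(data), 0.0, data)
    raise ValueError(f"unknown null_handling option: {null_handling!r}")

rows = [(1.0, 2.0), (None, 4.0), (3.0, 6.0)]
listwise = prepare_rows(rows)        # the second row is dropped entirely
zeroed = prepare_rows(rows, "zero")  # the second row becomes (0.0, 4.0)
```

Note that the two options generally yield different matrices: listwise deletion changes the row count, while zero replacement keeps the row count but pulls column sums and means toward zero.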
These functions are valid for any of the supported data reduction matrix types, namely correlation, covariance, sums of squares and cross-products, and corrected sums of squares and cross-products. Internally, the Matrix analysis stores the matrix as an extended sums of squares and cross-products matrix, that is, one built with an additional column containing the constant value 1. Conversion to another type, if requested, is performed by the Export Matrix analysis or by another receiving analysis.
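To see why the constant column is useful, the following minimal sketch builds an extended SSCP matrix with NumPy. It assumes the constant column is appended as the last column (the position used internally by the product is not stated here), and the function name `extended_sscp` is hypothetical.

```python
import numpy as np

def extended_sscp(data):
    """Build an extended SSCP matrix from rows of numeric data.

    Assumption: the constant column of 1s is appended as the LAST column.
    """
    data = np.asarray(data, dtype=float)
    ones = np.ones((data.shape[0], 1))
    augmented = np.hstack([data, ones])
    # Cross-products of every pair of columns, including the constant column.
    return augmented.T @ augmented

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
ext = extended_sscp(data)

row_count = ext[-1, -1]     # 1 * 1 summed over all rows
column_sums = ext[-1, :-1]  # each column multiplied by the constant 1
raw_sscp = ext[:-1, :-1]    # ordinary sums of squares and cross-products
```

Because the extra row and column carry the row count and the column sums, the (n+1)-by-(n+1) extended matrix holds everything needed to derive the other matrix types without rereading the underlying table.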
Building a matrix with the Matrix UDF (User Defined Function) delivered with the product can improve performance dramatically, but the Matrix UDF must first be installed on the target Teradata system. Install it by following the directions in "Installing Support Tables and Functions" in the Teradata Warehouse Miner User Guide (Volume 1), B035-2300.
- Matrix: Builds an extended Sums of Squares and Cross-Products (SSCP) data reduction matrix. Optionally, the Matrix process can be restarted after a failure or after a previously executed Matrix was stopped. The restart feature is not available, however, when the Matrix UDF is installed on the target Teradata system.
- Export Matrix: Converts or exports the resultant matrix to a SAS data step or a Teradata table, or simply displays the results. Valid matrices include:
  - Pearson product-moment correlations (COR)
  - Covariances (COV)
  - Sums of Squares and Cross-Products (SSCP)
  - Corrected Sums of Squares and Cross-Products (CSSCP)
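
The four matrix types listed above can all be derived from one extended SSCP matrix using standard statistical identities. The sketch below is an illustration rather than the Export Matrix implementation: the function name `convert_matrix` is hypothetical, the constant column is assumed to be last, and the covariance is assumed to use the sample divisor (row count minus 1).

```python
import numpy as np

def convert_matrix(ext, kind):
    """Derive a COR, COV, SSCP, or CSSCP matrix from an extended SSCP matrix.

    Assumptions: the constant column is last; COV uses the divisor (count - 1).
    """
    ext = np.asarray(ext, dtype=float)
    count = ext[-1, -1]   # number of rows that went into the matrix
    sums = ext[-1, :-1]   # per-column sums
    sscp = ext[:-1, :-1]  # uncorrected sums of squares and cross-products
    if kind == "SSCP":
        return sscp
    # Correct for the means: subtract (sum_i * sum_j) / count from each entry.
    csscp = sscp - np.outer(sums, sums) / count
    if kind == "CSSCP":
        return csscp
    if kind == "COV":
        return csscp / (count - 1)
    if kind == "COR":
        # Scale by the square roots of the diagonal (corrected sums of squares).
        scale = np.sqrt(np.diag(csscp))
        return csscp / np.outer(scale, scale)
    raise ValueError(f"unknown matrix type: {kind!r}")

# Extended SSCP for the three rows (1, 2), (2, 4), (3, 7).
ext = [[14.0, 31.0, 6.0], [31.0, 69.0, 13.0], [6.0, 13.0, 3.0]]
cov = convert_matrix(ext, "COV")
cor = convert_matrix(ext, "COR")
```

A correlation matrix computed this way is undefined for any column with zero variance, since its diagonal entry in the corrected matrix is zero; a real implementation would need to handle that case explicitly.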