5.4.5 - Matrix Building - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Teradata Warehouse Miner
February 2018
User Guide


This function builds a sum-of-squares-and-cross-products (SSCP) matrix or other derived matrix type from a table in a Teradata Database. It does this by generating and executing the SQL to call the Teradata CALCMATRIX table operator provided in Teradata beginning with the 14.10 release. The results are stored either in a table or as a result set returned to the user. The purpose in building a matrix depends on the type of matrix built. For example, when a correlation matrix is built, it can be viewed to determine the correlations or relationships between the various columns in the matrix.

Refer to the Teradata reference manual SQL Functions, Operators, Expressions, and Predicates, B035-1145, for more detailed information about the CALCMATRIX table operator.


call twm.td_analyze('matrix','database=twm_database;tablename=twm_tablename;columns=c1,c2;outputdatabase=twm_out_db;outputtablename=twm_out_tbl;matrixtype={COR|COV|CSSCP|ESSCP|SSCP};where=c1>0;groupby=c3,c4;nullhandling={ZERO|IGNORE};matrixoutput={COLUMNS|VARBYTE}; ');

Required Parameters

The input columns comprising the created matrix or matrices. The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.
For example: columns=column1,column2,column3
The database containing the input table.
The input table to build a matrix from.

Optional Parameters

If specified, group by columns divide the input table into parts, one for each combination of values in the group by columns. For each combination of values a separate matrix is built, though they are all stored in the same output table or result data set. The group by columns must reside in the table named with the tablename parameter. The default case is no group by columns.
For example: groupby=column1,column2,column3
matrixoutput {COLUMNS|VARBYTE}
Matrix output can either be returned as COLUMNS in a table or as VARBYTE values, one per column, in a reduced output table. The default is output returned as COLUMNS.
The following types of matrix can be built with this function. If not specified, a sum-of-squares-and-cross-products (SSCP) matrix is built.
  • SSCP = sum-of-squares-and-cross-products matrix (the default)
  • ESSCP = Extended-sum-of-squares-and-cross-products matrix
  • CSSCP = Corrected-sum-of-squares-and-cross-products matrix
  • COV = Covariance matrix
  • COR = Correlation matrix
nullhandling {ZERO|IGNORE}
If a value in a selected column is NULL, the row that contains the NULL value is by default omitted from processing (nullhandling=IGNORE), or the value may be replaced in calculations with zero through the use of this parameter (nullhandling=ZERO).
The database that contains the resulting matrix output table.
If outputdatabase and outputtablename are not both specified, a volatile output table with randomly generated name is created in the logon userdatabase and the results are returned to the user in a result data set.
The name of the output table representing one or more matrices. If group by columns are specified, there is a matrix for each combination of group by column values.
Note that the output table must first be dropped by the user before executing the function if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified, a volatile output table with randomly generated name is created in the logon userdatabase, and the result set is returned to the user instead.

When overwrite is set to true (default), the output tables are dropped before creating new ones.

An optional conditional expression may be specified with this parameter, limiting the amount of data used to build the matrix.
For example: where=income>0


Examples in this section demonstrate the use of Matrix Building with various available options. To execute the provided examples, the td_analyze function must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.
If these examples are executed, do not introduce extra spaces between parameters when copying.

In this example, input columns age, years_with_bank, and nbr_children are used to build a 3-by-3 SSCP matrix. No permanent output table is created, just a result data set that is returned to the user.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children');

In this example, a null handling parameter is added so that NULL values are replaced with zeros.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;nullhandling=zero');
In this example, an output table is requested to persist a copy of the created matrix.
You must select the data out of the created matrix table in order to view it, since a result set is not returned to the user.
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;outputdatabase=twm_results;outputtablename=matrix1b');

select * from twm_results.matrix1b order by 1;

In this example, an SQL WHERE clause is requested to limit the amount of data passed to the CALCMATRIX table operator.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;where=nbr_children >1');

In this example, an SQL GROUP BY clause is added to the requested matrix so that two matrices are built and returned in the same result set (for gender=’F’ and gender=’M’).

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;groupby=gender');

In the following examples, one matrix of each type is created and returned to the user as a result set.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=COR');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=COV');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=CSSCP');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=ESSCP');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=SSCP');