Matrix Building - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.5
Published
February 2018
Language
English (United States)
Last Update
2018-05-04
dita:mapPath
dfw1503087325991.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

This function builds a sum-of-squares-and-cross-products (SSCP) matrix or other derived matrix type from a table in a Teradata Database. It does this by generating and executing the SQL to call the Teradata CALCMATRIX table operator provided in Teradata beginning with the 14.10 release. The results are stored either in a table or as a result set returned to the user. The purpose in building a matrix depends on the type of matrix built. For example, when a correlation matrix is built, it can be viewed to determine the correlations or relationships between the various columns in the matrix.

Refer to the Teradata reference manual SQL Functions, Operators, Expressions, and Predicates, B035-1145, for more detailed information about the CALCMATRIX table operator.

Syntax

call twm.td_analyze('matrix','database=twm_database;tablename=twm_tablename;columns=c1,c2;outputdatabase=twm_out_db;outputtablename=twm_out_tbl;matrixtype={COR|COV|CSSCP|ESSCP|SSCP};where=c1>0;groupby=c3,c4;nullhandling={ZERO|IGNORE};matrixoutput={COLUMNS|VARBYTE}; ');

Required Parameters

columns
The input columns comprising the created matrix or matrices. The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.
For example: columns=column1,column2,column3
database
The database containing the input table.
tablename
The input table to build a matrix from.

Optional Parameters

groupby
If specified, group by columns divide the input table into parts, one for each combination of values in the group by columns. For each combination of values a separate matrix is built, though they are all stored in the same output table or result data set. The group by columns must reside in the table named with the tablename parameter. The default case is no group by columns.
For example: groupby=column1,column2,column3
matrixoutput {COLUMNS|VARBYTE}
Matrix output can either be returned as COLUMNS in a table or as VARBYTE values, one per column, in a reduced output table. The default is output returned as COLUMNS.
matrixtype
The following types of matrix can be built with this function. If not specified, a sum-of-squares-and-cross-products (SSCP) matrix is built.
  • SSCP = sum-of-squares-and-cross-products matrix (the default)
  • ESSCP = Extended-sum-of-squares-and-cross-products matrix
  • CSSCP = Corrected-sum-of-squares-and-cross-products matrix
  • COV = Covariance matrix
  • COR = Correlation matrix
nullhandling {ZERO|IGNORE}
If a value in a selected column is NULL, the row that contains the NULL value is by default omitted from processing (nullhandling=IGNORE), or the value may be replaced in calculations with zero through the use of this parameter (nullhandling=ZERO).
outputdatabase
The database that contains the resulting matrix output table.
If outputdatabase and outputtablename are not both specified, a volatile output table with randomly generated name is created in the logon userdatabase and the results are returned to the user in a result data set.
outputtablename
The name of the output table representing one or more matrices. If group by columns are specified, there is a matrix for each combination of group by column values.
Note that the output table must first be dropped by the user before executing the function if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified, a volatile output table with randomly generated name is created in the logon userdatabase, and the result set is returned to the user instead.
overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

where
An optional conditional expression may be specified with this parameter, limiting the amount of data used to build the matrix.
For example: where=income>0

Examples

Examples in this section demonstrate the use of Matrix Building with various available options. To execute the provided examples, the td_analyze function must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.
If these examples are executed, do not introduce extra spaces between parameters when copying.

In this example, input columns age, years_with_bank, and nbr_children are used to build a 3-by-3 SSCP matrix. No permanent output table is created, just a result data set that is returned to the user.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children');

In this example, a null handling parameter is added so that NULL values are replaced with zeros.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;nullhandling=zero');
In this example, an output table is requested to persist a copy of the created matrix.
You must select the data out of the created matrix table in order to view it, since a result set is not returned to the user.
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;outputdatabase=twm_results;outputtablename=matrix1b');

select * from twm_results.matrix1b order by 1;

In this example, an SQL WHERE clause is requested to limit the amount of data passed to the CALCMATRIX table operator.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;where=nbr_children >1');

In this example, an SQL GROUP BY clause is added to the requested matrix so that two matrices are built and returned in the same result set (for gender=’F’ and gender=’M’).

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;groupby=gender');

In the following examples, one matrix of each type is created and returned to the user as a result set.

call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=COR');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=COV');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=CSSCP');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=ESSCP');
call twm.td_analyze('matrix','database=twm_source;tablename=twm_customer;columns=age,years_with_bank,nbr_children;matrixtype=SSCP');