- Matrix(data, columns=None, exclude_columns=None, group_columns=None, matrix_output='columns', type='ESSCP', handle_nulls='IGNORE', filter=None)
- DESCRIPTION:
Matrix builds an extended sum-of-squares-and-cross-products (ESSCP) matrix, or another
derived matrix type, from a teradataml DataFrame. It does this with the help of the
CALCMATRIX table operator provided in Teradata Vantage. The purpose of building a
matrix depends on the type of matrix built. For example, a correlation matrix can be
viewed to determine the correlations, or relationships, between the columns it
covers.
PARAMETERS:
data:
Required Argument.
Specifies the input data to build matrix from.
Types: teradataml DataFrame
columns:
Required Argument.
Specifies the name(s) of the column(s) used in building one or more matrices.
A pre-defined string can be passed instead to select all columns or all numeric
columns.
Note:
Do not use the following column names, as these are reserved for use by the
CALCMATRIX table operator:
'rownum', 'rowname', 'c', or 's'.
Permitted Values:
* Name(s) of the columns in "data".
* Pre-defined strings:
* 'all' - all columns
* 'allnumeric' - all numeric columns
Types: str OR list of Strings (str)
exclude_columns:
Optional Argument.
Specifies the name(s) of the column(s) to exclude from the analysis when a column
specifier such as 'all' or 'allnumeric' is used in the "columns" argument.
For convenience, when the "exclude_columns" argument is used, the dependent variable
and group by columns, if any, are automatically excluded as input columns and do
not need to be included in this argument.
Types: str OR list of Strings (str)
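The selection rules described for "columns" and "exclude_columns" can be sketched in
plain Python. This is only an illustration of the documented behavior, not the
library's implementation: the helper resolve_columns, the schema dictionary, and the
'numeric' type label are hypothetical, and treating reserved names case-insensitively
is an assumption.

```python
# Names reserved by the CALCMATRIX table operator, per the note above.
RESERVED = {"rownum", "rowname", "c", "s"}


def _as_list(value):
    """Normalize None, a single string, or a list of strings to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def resolve_columns(schema, columns, exclude_columns=None, group_columns=None):
    """Hypothetical helper: return the input columns a call would select.

    schema maps column name -> type label ('numeric' or anything else).
    """
    excluded = set(_as_list(exclude_columns))
    if exclude_columns is not None:
        # Group by columns are dropped automatically when exclude_columns is used.
        excluded |= set(_as_list(group_columns))

    if columns == "all":
        selected = [c for c in schema if c not in excluded]
    elif columns == "allnumeric":
        selected = [c for c, t in schema.items()
                    if t == "numeric" and c not in excluded]
    else:
        selected = _as_list(columns)

    # Assumption: reserved names are compared case-insensitively.
    clashes = [c for c in selected if c.lower() in RESERVED]
    if clashes:
        raise ValueError("reserved column name(s): {}".format(clashes))
    return selected


schema = {"cust_id": "numeric", "age": "numeric",
          "gender": "str", "years_with_bank": "numeric"}
print(resolve_columns(schema, "allnumeric", exclude_columns="cust_id"))
```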
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) in the input teradataml DataFrame to group
by. If specified, the group by columns divide the input into parts, one for each
combination of values in the group by columns. A separate matrix is built for each
combination of values, though all matrices are stored in the same output.
Note:
Do not use the following column names, as these are reserved for use by the
CALCMATRIX table operator:
'rownum', 'rowname', 'c', or 's'.
Types: str OR list of Strings (str)
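The grouping behavior can be pictured with a small plain-Python sketch: rows are
partitioned by the combination of group by values, and one matrix is built per
partition. The helper names and sample rows are hypothetical, and a plain SSCP matrix
is used only because it is the simplest type to compute; this is not how CALCMATRIX
is implemented.

```python
from collections import defaultdict


def sscp(rows, columns):
    """Sum-of-squares-and-cross-products matrix over the given columns."""
    return [[sum(r[a] * r[b] for r in rows) for b in columns] for a in columns]


def matrices_by_group(rows, columns, group_columns):
    """Build one SSCP matrix per combination of group by values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in group_columns)
        groups[key].append(row)
    return {key: sscp(part, columns) for key, part in groups.items()}


# Hypothetical sample rows, not taken from the "customer" table.
rows = [
    {"gender": "F", "age": 30, "nbr_children": 1},
    {"gender": "F", "age": 40, "nbr_children": 2},
    {"gender": "M", "age": 50, "nbr_children": 0},
]
for key, matrix in matrices_by_group(rows, ["age", "nbr_children"], ["gender"]).items():
    print(key, matrix)
```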
matrix_output:
Optional Argument.
Specifies the type of matrix output. Matrix output can either be returned as
COLUMNS in an output teradataml DataFrame or as VARBYTE values, one per column,
in a reduced output teradataml DataFrame.
Permitted Values: 'columns', 'varbyte'
Default Value: 'columns'
Types: str
type:
Optional Argument.
Specifies the type of matrix to build.
Permitted Values:
* 'SSCP' - sum-of-squares-and-cross-products matrix
* 'ESSCP' - Extended-sum-of-squares-and-cross-products matrix
* 'CSSCP' - Corrected-sum-of-squares-and-cross-products matrix
* 'COV' - Covariance matrix
* 'COR' - Correlation matrix
Default Value: 'ESSCP'
Types: str
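To show how these matrix types relate to one another, here is a minimal plain-Python
sketch using the usual textbook definitions. It is not the CALCMATRIX implementation.
Two points are assumptions: that "extended" means the SSCP of the data with a constant
column of ones prepended (so the first row and column hold the row count and the
column sums), and that the covariance divides by n - 1.

```python
import math


def sscp(data):
    """SSCP: entry (i, j) is the sum over rows of x_i * x_j."""
    p = len(data[0])
    return [[sum(r[i] * r[j] for r in data) for j in range(p)] for i in range(p)]


def esscp(data):
    """ESSCP (assumed form): SSCP with a constant column of ones prepended."""
    return sscp([[1.0] + list(r) for r in data])


def csscp(data):
    """CSSCP: SSCP of the data after subtracting each column mean."""
    n, p = len(data), len(data[0])
    means = [sum(r[i] for r in data) / n for i in range(p)]
    return sscp([[r[i] - means[i] for i in range(p)] for r in data])


def cov(data):
    """Covariance: CSSCP scaled by 1 / (n - 1) (sample covariance, assumed)."""
    n = len(data)
    return [[v / (n - 1) for v in row] for row in csscp(data)]


def cor(data):
    """Correlation: CSSCP entries normalized by the diagonal."""
    c = csscp(data)
    p = len(c)
    return [[c[i][j] / math.sqrt(c[i][i] * c[j][j]) for j in range(p)]
            for i in range(p)]


# Hypothetical data: two perfectly correlated columns.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
for name, fn in [("SSCP", sscp), ("ESSCP", esscp), ("CSSCP", csscp),
                 ("COV", cov), ("COR", cor)]:
    print(name, fn(data))
```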
handle_nulls:
Optional Argument.
Specifies how to treat NULL values in the selected columns. When set to IGNORE,
any row that contains a NULL value in a selected column is omitted from
processing. When set to ZERO, the NULL value is replaced with zero (0) in
calculations.
Permitted Values: 'IGNORE', 'ZERO'
Default Value: 'IGNORE'
Types: str
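The difference between the two settings can be sketched in plain Python, with None
standing in for NULL. The helper apply_handle_nulls and the sample rows are
hypothetical and only mirror the description above; the actual handling takes place
inside the database.

```python
def apply_handle_nulls(rows, columns, handle_nulls="IGNORE"):
    """Hypothetical helper: return the rows that would enter the calculation."""
    if handle_nulls == "IGNORE":
        # Omit any row with a NULL in one of the selected columns.
        return [r for r in rows if all(r[c] is not None for c in columns)]
    if handle_nulls == "ZERO":
        # Keep every row, replacing NULL with 0 in the selected columns.
        return [
            {**r, **{c: 0 if r[c] is None else r[c] for c in columns}}
            for r in rows
        ]
    raise ValueError("handle_nulls must be 'IGNORE' or 'ZERO'")


# Hypothetical sample rows.
rows = [
    {"age": 30, "nbr_children": None},
    {"age": 40, "nbr_children": 2},
]
print(apply_handle_nulls(rows, ["age", "nbr_children"], "IGNORE"))
print(apply_handle_nulls(rows, ["age", "nbr_children"], "ZERO"))
```

With IGNORE, fewer rows contribute to the matrix. With ZERO, the row count is
preserved but sums and means shift toward zero, so the two settings can produce
noticeably different matrices on data with many NULLs.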
filter:
Optional Argument.
Specifies the clause to filter rows selected for building the matrix.
For example,
filter = "cust_id > 0"
Types: str
RETURNS:
An instance of Matrix.
Output teradataml DataFrames can be accessed using attribute references, such as
MatrixObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrame.
from teradataml import DataFrame
df = DataFrame("customer")
print(df)
# Example 1: Build a 3-by-3 ESSCP matrix on input columns 'age', 'years_with_bank',
# and 'nbr_children'.
obj = valib.Matrix(data=df,
columns=["age", "years_with_bank", "nbr_children"])
# Print the results.
print(obj.result)
# Example 2: Build a 3-by-3 CSSCP matrix on input columns 'age', 'years_with_bank',
# and 'nbr_children' with null handling, where NULLs are replaced with zero.
obj = valib.Matrix(data=df,
columns=["age", "years_with_bank", "nbr_children"],
handle_nulls="ZERO",
type="CSSCP")
# Print the results.
print(obj.result)
# Example 3: Build a 3-by-3 COR matrix by limiting the input data by filtering
# rows. Matrix is built on input columns 'age', 'years_with_bank',
# and 'nbr_children'.
obj = valib.Matrix(data=df,
columns=["age", "years_with_bank", "nbr_children"],
filter="nbr_children > 1",
type="COR")
# Print the results.
print(obj.result)
# Example 4: Build two 3-by-3 COV matrices by grouping data on "gender" column.
obj = valib.Matrix(data=df,
columns=["age", "years_with_bank", "nbr_children"],
group_columns="gender",
type="COV")
# Print the results.
print(obj.result)