15.00 - CALCMATRIX - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)
Last Update
2018-09-24

CALCMATRIX

Purpose  

Calculates a sum-of-squares-and-cross-products (SSCP) matrix.

Syntax  

where:

 

Syntax element …

Specifies …

TD_SYSFNLIB.

the name of the database where the function is located.

tableName

the name of a table with the input data for the function.

query_expression

the SELECT statement with input data for the function.

hashByColList

the column name set by which rows are to be hashed across the AMPs before being passed to the operator.

localOrderByList

the column name set by which rows are to be value-ordered locally on the AMPs before being passed to the operator.

ASC

that the results are to be ordered in ascending sort order.

If the sort field is a character string, the system orders it in ascending order according to the definition of the collation sequence for the current session.

The default order is ASC.

DESC

that the results are to be ordered in descending sort order.

If the sort field is a character string, the system orders it in descending order according to the definition of the collation sequence for the current session.

NULLS FIRST

that NULL results are to be listed first.

NULLS LAST

that NULL results are to be listed last.

PHASE

one of the phases of the function:

Valid values are:

  • ‘LOCAL’
  • ‘COMBINE’
  • For a definition of LOCAL and COMBINE, see “Usage Notes” on page 1296.

    CALCTYPE

    the type of matrix CALCMATRIX creates.

    Valid values are:

  • ‘SSCP’ = sum-of-squares-and-cross-products matrix (the default)
  • ‘ESSCP’= Extended-sum-of-squares-and-cross-products matrix
  • ‘CSSCP’ = Corrected-sum-of-squares-and-cross-products matrix
  • ‘COV’ = Covariance matrix
  • ‘COR’ = Correlation matrix
  • OUTPUT

    the format of the output:

    Valid values are:

  • ‘COLUMNS’
  • ‘VARBYTE’
  • NULL_HANDLING

    how nulls that appear as in any data columns are handled.

    Valid values are:

  • ‘ZERO’ = any NULL values are considered to be 0, and the row is processed (default).
  • ‘IGNORE’ = the row is skipped.
  • TD_SYSFNLIB

    Name of the database where the function is located.ANSI Compliance

    This is a Teradata extension to the ANSI SQL:2011 standard.

    Data Type

    The data type of the data columns must be some numeric data type.

    Authorization

    You must have EXECUTE FUNCTION privileges on the function or on the database containing the function.

    Usage Notes  

    Note: CALCMATRIX processes data representing a single input matrix or multiple input matrices. In the latter case, the input must have column(s) that identify which matrix a given input row belongs to. These columns are called as matrix id columns and the value of one tuple of matrix id columns is called a matrix id. All rows from the same matrix must have the same matrix id.

    CALCMATRIX works in LOCAL and COMBINE phases.

  • In the LOCAL phase, the function processes the input data that exists in the source table on each amp separately and outputs rows of summarized data for that amp. No data has to be moved among amps.
  • In the COMBINE phase, the function processes the summarized data from each amp (that is, the output from the LOCAL phase) and combines it to form the rows that make up final output.
  • In order to get one result matrix (or, if there are matrix id columns, one result matrix for each matrix id), the COMBINE phase must use a HASH BY clause and possibLY a LOCAL ORDER BY clause.

    LOCAL Phase Details

  • The input to the LOCAL phase must consist of zero or more matrix id columns, and 1 or more data columns (call this number n). The matrix id columns, if present, must be placed in the LOCAL ORDER BY list.
  • The output from the LOCAL phase consists of the matrix id columns, if any, an INT column named rownum, a VARCHAR(128) column named rowname, a BIGINT column named c (for count), a REAL column named s (for sum), and a REAL column for each data column of input.
  • The CALCTYPE and OUTPUT custom values in the LOCAL phase are always set to 'SSCP' and 'COLUMNS' respectively. Attempts to use different values for these keys are ignored.
  • If any of the data columns in a row have NULL values, the row's handling depends on the setting for NULL_HANDLING.
  • COMBINE Phase Details

  • The input to the COMBINE phase must be exactly the columns that are output from the LOCAL phase: zero or more matrix id columns, an INT rownum column, a VARCHAR(128) rowname column, a BIGINT column named c (for count), a REAL column names s (for sum), and a REAL column for each data column of input.
  • If there are matrix id columns in the data, they should be specified in the HASH BY and LOCAL ORDER BY lists. If not, a column with a constant value and an alias name, like "1 as p", should be added to the ON clause and the HASH BY list.

  • The custom values for CALCTYPE and OUTPUT control the output columns from the COMBINE phase.
  • If the OUTPUT is set to 'COLUMNS', then the columns output from the 'COMBINE' phase are: The matrix id columns, if present, an INT rownum column, a VARCHAR(128) rowname column, a REAL column for each data column of input and, if the CALCTYPE is 'ESSCP', a BIGINT column named c (for count), a REAL column named s (for sum).
  • If the OUTPUT is set to 'VARBYTE', then the columns output from the 'COMBINE' phase are: an INT rowname column, a VARCHAR(128) rowname column, and a VARBYTE column named v.

  • If there are no matrix id columns, the final output of the COMBINE phase consists of n rows, where n is the number of data columns in the input. The rownum field has values from 1 to n. Thus each output row is one row from the resulting matrix. If there are matrix id columns, the final output of the COMBINE phase consists of n rows per matrix id. The rownum field will have values from 1 to n.
  • VARBYTE Output

    When the OUTPUT is ‘VARBYTE’, the output row consists of the matrix id columns, if any, and these fields:

  • INT rownum
  • VARCHAR(128) rowname
  • VARBTYE value
  • Details for the Both the LOCAL and the COMBINE Phases

  • The values for the rownum column ranges from 1 to n, the number of data columns in the input. If the rownum column has value x, the rowname column is the x'th data column name.
  • If the columns in the ON clause are expressions, they must have an alias name. Columns in the LOCAL ORDER BY or HASH BY lists must be column or alias names from the ON clause, or ordinal values.
  • Example : One Input Matrix, Single SQL Statement

    In the following example, table T has 1 billion rows of weather data. It has 1 geometry column representing the point (latitude/longitude) of the observation, and these REAL columns:

  • temperature
  • air_pressure
  • rainfall.
  • Use an SQL statement that invokes CALCMATRIX twice, once for the LOCAL phase and once for the COMBINE phase, as follows:

       select * from CALCMATRIX( ON (
       select 1 as p, X.* from CALCMATRIX( ON (select temperature,       air_pressure, rainfall from T) 
       USING PHASE('LOCAL') )X )
       HASH BY p
       USING PHASE('COMBINE')
       )Y;

    The result set would be the SSCP matrix for the weather data. The following table shows the result set. n means the numeric result of the calculation.

     

    rownum

    rowname

    temperature

    air_pressure

    rainfall

    1

    temperature

    n

    n

    n

    2

    air_pressure

    n

    n

    n

    3

    rainfall

    n

    n

    n

    Example : One Input Matrix, Multiple SQL Statements

    In the following example, table T has the same weather data as table T in “Example 1: One Input Matrix, Single SQL Statement.” However, the aim this time is to produce both the covariance and correlation matrices. In order to avoid 2 passes over billion rows of raw data, insert the results of CALCMATRIX using the LOCAL phase into an intermediate table. Then run CALCMATRIX using the COMBINE phase twice, but against the intermediate table, which only has 3 * (the number of amps in the system) rows.

       CREATE TABLE T_INTERMEDIATE as (
       select 1 as p, X.* from CALCMATRIX( ON (select temperature,       air_pressure, rainfall from T) USING PHASE('LOCAL') )X 
       ) WITH DATA;
       
       select * from CALCMATRIX ( ON T_INTERMEDIATE
       HASH BY p
       USING PHASE('COMBINE') CALCTYPE('COV')   
       )Y;
       
       select * from CALCMATRIX( ON T_INTERMEDIATE
       HASH BY p
       USING PHASE('COMBINE') CALCTYPE('COR')   
       )Y;

    Example : Multiple Input Matrices, One SQL Statement

    In the following example, table T has the same weather data as table T in “Example 1: One Input Matrix, Single SQL Statement” and “Example 2: One Input Matrix, Multiple SQL Statements.” In this example, in addition to the point location of the observation, there is an additional VARCHAR column for continent, which is the matrix id. Our aim is to produce 7 SSCP matrices, one for each continent’s data.

    Use an SQL statement that invokes CALCMATRIX twice, once for the LOCAL phase and once for the COMBINE phase.

     select * from calcmatrix( ON (
       select * from calcmatrix( ON (select continent, temperature,    air_pressure, rainfall from T) LOCAL ORDER BY continent USING    PHASE('LOCAL') )X )
       HASH BY continent
       LOCAL ORDER BY continent
       USING PHASE('COMBINE')
       )Y
       order by continent, rownum;

    The result set in the following tables is the SSCP matrix for weather data by continent.

    Note: Only 2 of the matrices are shown below.

     

    continent

    rownum

    rowname

    temperature

    air_pressure

    rainfall

    Europe

    1

    temperature

    n

    n

    n

    Europe

    2

    air_pressure

    n

    n

    n

    Europe

    3

    rainfall

    n

    n

    n

     

    continent

    rownum

    rowname

    temperature

    air_pressure

    rainfall

    Asia

    1

    temperature

    n

    n

    n

    Asia

    2

    air_pressure

    n

    n

    n

    Asia

    3

    rainfall

    n

    n

    n

    . . .