Syntax | Linear Regression Function | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage
CALL td_analyze (
  'linear',
  'required_parameter_list [ optional_parameter; [...] ]'
);
required_parameter_list
database = input_database_name;
tablename = input_table_name;
columns = { all | column_name [,...] };
dependent = column_name;
optional_parameter
{ backward = { true | false } |
  backwardonly = { true | false } |
  columnstoexclude = column_name [,...] |
  conditionindexthreshold = threshold |
  constant = { true | false } |
  enter = entry_value |
  forward = { true | false } |
  forwardonly = { true | false } |
  groupby = column_name [,...] |
  matrixinput = { true | false } |
  neardependencyreport = { true | false } |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overwrite = { true | false } |
  remove = removal_value |
  statstable = { true | false } |
  stepwise = { true | false } |
  usefstat = { true | false } |
  usepvalue = { true | false } |
  varianceproportionthreshold = threshold
}
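
As a minimal sketch of the syntax above, the following call supplies only the four required parameters. The database, table, and column names (demo_db, customer_metrics, age, tenure_years, num_children, income) are hypothetical placeholders chosen for illustration.

```sql
-- Hypothetical names: demo_db, customer_metrics, and every column name.
-- Only the required parameters are given; optional ones keep their defaults.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_metrics;columns=age,tenure_years,num_children;dependent=income;'
);
```

Because outputdatabase and outputtablename are omitted here, the output goes to volatile tables and a result set, as described under outputtablename.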

Syntax Elements

database
The database containing the input table.
tablename
The input table from which to build a predictive model.
columns
The columns to analyze.
Keyword      Description
all          All columns.
allnumeric   All numeric columns.
dependent
The input table column that represents the dependent variable.
backward
[Optional] Whether to start with all independent variables in the model and do the following until no more independent variables can be removed from the model:
  1. Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
  2. Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
Default: false
backwardonly
[Optional] Like backward without the forward step.
Default: false
columnstoexclude
[Optional] The columns to exclude when columns specifies a keyword.
Any groupby columns are automatically excluded.
conditionindexthreshold
[Optional] One of two thresholds for neardependencyreport.
Default: 30
constant
[Optional] Whether the linear model includes a constant term.
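Linear equation with a constant term:
$y = b_0 + b_1 x_1 + \dots + b_n x_n$
Linear equation without a constant term:
$y = b_1 x_1 + \dots + b_n x_n$
Here $y$ is the dependent variable, $x_1, \dots, x_n$ are the independent variables, and $b_0$ is the constant term.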
Default: true
enter
[Optional] The criterion to enter a variable into the model.
Condition                               Entry Criterion
usefstat=true, or more than one         Partial F-statistic must be greater than entry_value.
variable has P-value zero               Default entry_value: 3.84
usepvalue=true (ignored if more than    P-value of the T-statistic (the ratio of the variable's B
one variable has P-value zero)          coefficient to its standard error) must be less than entry_value.
                                        Default entry_value: 0.05

forward
[Optional] Whether to start with no independent variables in the model and do the following until no more independent variables can be added to the model:
  1. Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
  2. Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
forwardonly
[Optional] Like forward without the backward step.
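
The forward procedure can be requested as sketched below. This assumes, as the stepwise description suggests, that stepwise=true is combined with one direction option. The entry and removal values shown are the documented F-statistic defaults, written out only to make them visible. All database, table, and column names are hypothetical.

```sql
-- Hypothetical names: demo_db, customer_metrics, and every column name.
-- Forward stepwise selection using the partial F-statistic.
-- enter (3.84) >= remove (3.84), as required when using the F-statistic.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_metrics;columns=age,tenure_years,num_children;dependent=income;stepwise=true;forward=true;usefstat=true;enter=3.84;remove=3.84;'
);
```

Replacing forward=true with forwardonly=true would, per the description above, skip the backward step after each addition.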
groupby
[Optional] The input table columns for which to separately analyze each value or combination of values.
Do not use the following names for groupby columns. These names are reserved for use by the CALCMATRIX table operator.
  • c
  • rowname
  • rownum
  • s
The function builds a separate matrix for each combination of values, storing them in the same output table or result dataset.
Default behavior: Input is not grouped.
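
A grouped analysis might look like the following sketch, which assumes a hypothetical region column in the input table and requests one model per region value. All names are placeholders. The call leaves out stepwise, since groupby and stepwise must not be combined.

```sql
-- Hypothetical names: demo_db, customer_metrics, region, and the other columns.
-- One linear model is built for each distinct value of region.
-- The groupby column avoids the reserved names c, rowname, rownum, and s.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_metrics;columns=age,tenure_years,num_children;dependent=income;groupby=region;'
);
```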
matrixinput
[Optional] Whether the input table located by database and tablename represents an ESSCP matrix built by the Matrix Building function and saved to a table.
If the input table represents a saved matrix and you do not specify matrixinput=true, the function may interpret the matrix as an ordinary table, causing unpredictable results.
When the function uses a saved ESSCP matrix as the input table, it does not have to build the matrix each time it is called, providing a significant performance improvement.
If matrixinput=true, these rules apply to the columns that columns specifies:
  • They must appear in the matrix.
  • They can be a subset of the columns in the matrix.
  • columns can specify them in any order.
If the matrix specifies groupby columns, the function must specify the same columns with groupby.
Default: false
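
The sketch below reuses a saved matrix. It assumes a hypothetical table demo_db.customer_esscp that the Matrix Building function previously created as an ESSCP matrix over columns including age, tenure_years, num_children, and income. The table does not exist unless you build it first.

```sql
-- Hypothetical names: demo_db, customer_esscp, and every column name.
-- customer_esscp is assumed to be a saved ESSCP matrix, not an ordinary table.
-- columns lists a subset of the matrix columns; the order is free.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_esscp;columns=tenure_years,age;dependent=income;matrixinput=true;'
);
```

If the saved matrix had been built with groupby columns, the same groupby columns would have to be added to this call.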
neardependencyreport
[Optional] Whether to output an XML report showing columns that may be collinear and store it in the XML output table if all these conditions are true:
  • You specify outputdatabase and outputtablename.
  • The thresholds that conditionindexthreshold and varianceproportionthreshold specify are crossed.
  • The function detects collinearity.
The same report is available for Factor Analysis, Linear Regression and Logistic Regression.
Default: false
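
A request for the near-dependency report could be written as in this sketch. The thresholds are the documented defaults, spelled out for visibility. The output names (demo_db, income_model) and input names are hypothetical.

```sql
-- Hypothetical names: demo_db, customer_metrics, income_model, and every column name.
-- outputdatabase and outputtablename are given because the report is stored
-- only when an output table is specified.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_metrics;columns=age,tenure_years,num_children;dependent=income;neardependencyreport=true;conditionindexthreshold=30;varianceproportionthreshold=0.5;outputdatabase=demo_db;outputtablename=income_model;'
);
```

The report appears only if the thresholds are crossed and the function detects collinearity, so an empty report does not by itself indicate a failed call.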
outputdatabase
[Optional] The database that contains the output table that represents one or more linear models.
outputtablename
[Optional] The name of the output table representing one or more linear models (see groupby).
The function creates a second output table of statistical measures, output_table_name_rpt, and a third XML output table of requested reports, output_table_name_txt.
If you do not specify both outputdatabase and outputtablename, the function creates volatile output tables with randomly generated names in the logon user database and returns a result set.
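
To illustrate the three output tables, the following sketch stores the model under a hypothetical name, income_model, in a hypothetical database, demo_db, and then reads each table. The _rpt and _txt suffixes follow the naming described above.

```sql
-- Hypothetical names: demo_db, customer_metrics, income_model, and every column name.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_metrics;columns=age,tenure_years,num_children;dependent=income;outputdatabase=demo_db;outputtablename=income_model;'
);

-- Model table
SELECT * FROM demo_db.income_model;
-- Statistical measures
SELECT * FROM demo_db.income_model_rpt;
-- Requested XML reports
SELECT * FROM demo_db.income_model_txt;
```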
overwrite
[Optional] Whether to drop the output tables before creating new ones.
Default: true
remove
[Optional] The criterion to remove a variable from the model.
Condition        Removal Criterion
usefstat=true    Partial F-statistic must be less than removal_value.
                 Default removal_value: 3.84
usepvalue=true   P-value of the T-statistic must be greater than removal_value.
                 Default removal_value: 0.05

statstable
[Optional] Whether to include a data quality report in the XML output string. The report includes the mean and standard deviation of each model variable, derived from an ESSCP matrix.
stepwise
[Optional] Whether to perform the stepwise procedure (forward, forwardonly, backward, or backwardonly).
Default: false
usefstat
[Optional] Whether to use the partial F-Statistic to decide whether to add or remove a variable.
Default (if you omit both usefstat and usepvalue): true
usepvalue
[Optional] Whether to use the T-Statistic P-value to decide whether to add or remove a variable.
If stepwise=true and you specify neither usefstat nor usepvalue, usefstat is used. If stepwise is not true, do not specify usefstat or usepvalue. Do not specify both groupby and stepwise. Additional criteria for the enter and remove values are as follows:
Value    Criteria
enter    Must be greater than or equal to remove when using the F-statistic.
remove   Must be greater than or equal to enter when using the P-value.
remove   Must be greater than or equal to 0 and less than or equal to 1 when using the P-value.
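
The P-value criteria above can be illustrated with the following sketch of a backward procedure. The value remove=0.10 is an arbitrary illustrative choice that satisfies both P-value rules (at least as large as enter, and between 0 and 1). All database, table, and column names are hypothetical.

```sql
-- Hypothetical names: demo_db, customer_metrics, and every column name.
-- Backward stepwise selection using the T-statistic P-value.
-- remove (0.10) >= enter (0.05), and 0 <= remove <= 1.
CALL td_analyze (
  'linear',
  'database=demo_db;tablename=customer_metrics;columns=age,tenure_years,num_children;dependent=income;stepwise=true;backward=true;usepvalue=true;enter=0.05;remove=0.10;'
);
```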
varianceproportionthreshold
[Optional] One of two thresholds for neardependencyreport.
Default: 0.5