Syntax | Linear Regression Function | Vantage Analytics Library - Syntax

CALL td_analyze (
  'linear',
  'required_parameter_list [ optional_parameter; [...] ]'
);

required_parameter_list

database = input_database_name;
tablename = input_table_name;
columns = { all | allnumeric | column_name [,...] };
dependent = column_name;

optional_parameter

{ backward = { true | false } |
  backwardonly = { true | false } |
  columnstoexclude = column_name [,...] |
  conditionindexthreshold = threshold |
  constant = { true | false } |
  enter = entry_value |
  forward = { true | false } |
  forwardonly = { true | false } |
  groupby = column_name [,...] |
  matrixinput = { true | false } |
  neardependencyreport = { true | false } |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overwrite = { true | false } |
  remove = removal_value |
  statstable = { true | false } |
  stepwise = { true | false } |
  usefstat = { true | false } |
  usepvalue = { true | false } |
  varianceproportionthreshold = threshold
}

Syntax Elements

database

The database containing the input table.

tablename

The input table from which to build a predictive model.

columns

The columns to analyze.

Keyword	Description
all	All columns.
allnumeric	All numeric columns.

dependent

The input table column that represents the dependent variable.
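
As an illustration, a minimal call that supplies only the four required parameters might look like the following sketch. The database my_db, the table customer_data, and every column name are hypothetical placeholders, not objects that ship with the library.

```sql
-- Minimal sketch: required parameters only.
-- my_db, customer_data, and all column names are hypothetical.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=age,years_with_bank,nbr_children;dependent=income;'
);
```

Because this call names no output table, the function returns a result set and writes to volatile tables, as described under outputtablename.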

backward

[Optional] Whether to start with all independent variables in the model and do the following until no more independent variables can be removed from the model:

Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).

Default: false

backwardonly

[Optional] Like backward without the forward step.

Default: false

columnstoexclude

[Optional] The columns to exclude when columns specifies a keyword.

Any groupby columns are automatically excluded.
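
A sketch of combining a columns keyword with columnstoexclude follows. It assumes a hypothetical table customer_data in which cust_id is a numeric identifier that should not be treated as an independent variable.

```sql
-- Sketch: analyze every numeric column except an identifier column.
-- my_db, customer_data, cust_id, and income are hypothetical names.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=allnumeric;dependent=income;columnstoexclude=cust_id;'
);
```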

conditionindexthreshold

[Optional] One of two thresholds for neardependencyreport.

Default: 30

constant

[Optional] Whether the linear model includes a constant term.

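Linear equation with a constant term: $y = b_0 + b_1 x_1 + \dots + b_n x_n$

Linear equation without a constant term: $y = b_1 x_1 + \dots + b_n x_n$

Here $y$ is the dependent variable, $x_1, \dots, x_n$ are the independent variables, and $b_0, \dots, b_n$ are the model coefficients.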

Default: true

enter

[Optional] The criterion to enter a variable into the model.

Condition	Entry Criterion
usefstat=true, or more than one variable has a P-value of zero.	Partial F-statistic must be greater than entry_value. Default entry_value: 3.84
usepvalue=true (Ignored if more than one variable has a P-value of zero.)	P-value of the T-statistic (the ratio of the variable's B coefficient to its standard error) must be less than entry_value. Default entry_value: 0.05

forward

[Optional] Whether to start with no independent variables in the model and do the following until no more independent variables can be added to the model:

Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).

forwardonly

[Optional] Like forward without the backward step.
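
The forward procedure described above could be requested as in the sketch below. It assumes that a direction flag such as forward is combined with stepwise=true, which is inferred from the stepwise description rather than stated outright. The enter and remove values are the documented F-statistic defaults, and all object names are hypothetical.

```sql
-- Sketch: forward stepwise selection using the partial F-statistic.
-- Assumption: forward=true is paired with stepwise=true.
-- my_db, customer_data, and all column names are hypothetical.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=age,years_with_bank,nbr_children;dependent=income;stepwise=true;forward=true;usefstat=true;enter=3.84;remove=3.84;'
);
```

Replacing forward=true with forwardonly=true would, per the description above, skip the backward step after each addition.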

groupby

[Optional] The input table columns for which to separately analyze each value or combination of values.

Do not use the following names for groupby columns. These names are reserved for use by the CALCMATRIX table operator.

c
rowname
rownum
s

The function builds a separate matrix for each combination of values, storing them in the same output table or result dataset.

Default behavior: Input is not grouped.
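
A grouped analysis might look like the following sketch, which assumes a hypothetical region column and would build one model per distinct region value. The column name avoids the reserved names listed above, and the call omits stepwise because groupby and stepwise must not be combined.

```sql
-- Sketch: one linear model per value of a hypothetical region column.
-- my_db, customer_data, region, and the other column names are hypothetical.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=age,years_with_bank,nbr_children;dependent=income;groupby=region;'
);
```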

matrixinput

[Optional] Whether the input table located by database and tablename represents an ESSCP matrix built by the Matrix Building function and saved to a table.

If the input table represents a saved matrix and you do not specify matrixinput=true, the function may interpret the matrix as an ordinary table, causing unpredictable results.

When the function uses a saved ESSCP matrix as the input table, it does not have to build the matrix each time it is called, providing a significant performance improvement.

If matrixinput=true, these rules apply to the columns that columns specifies:

They must appear in the matrix.
They can be a subset of the columns in the matrix.
columns can specify them in any order.

If the matrix specifies groupby columns, the function must specify the same columns with groupby.

Default: false
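
The sketch below shows how a previously saved matrix might be reused. It assumes that a hypothetical table customer_matrix already holds an ESSCP matrix, saved by the Matrix Building function, that covers at least the columns named here. The call that builds the matrix is not shown.

```sql
-- Sketch: reuse a saved ESSCP matrix instead of rebuilding it.
-- Assumption: my_db.customer_matrix is a hypothetical table that already
-- contains a saved matrix including age, years_with_bank, and income.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_matrix;columns=age,years_with_bank;dependent=income;matrixinput=true;'
);
```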

neardependencyreport

[Optional] Whether to output an XML report showing columns that may be collinear and store it in the XML output table if all these conditions are true:

You specify outputdatabase and outputtablename.
The thresholds that conditionindexthreshold and varianceproportionthreshold specify are crossed.
The function detects collinearity.

The same report is available for Factor Analysis, Linear Regression and Logistic Regression.

Default: false
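
A request for the near-dependency report might be sketched as follows. The thresholds are set explicitly to their documented defaults, and the output database my_out_db and output table income_model are hypothetical. The report, if one is produced, would land in the XML output table whose name ends in _txt.

```sql
-- Sketch: request the near-dependency (collinearity) report.
-- my_db, customer_data, my_out_db, income_model, and the column names
-- are hypothetical.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=age,years_with_bank,nbr_children;dependent=income;neardependencyreport=true;conditionindexthreshold=30;varianceproportionthreshold=0.5;outputdatabase=my_out_db;outputtablename=income_model;'
);

-- Inspect the XML output table for the report.
SELECT * FROM my_out_db.income_model_txt;
```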

outputdatabase

[Optional] The database that contains the output table that represents one or more linear models.

outputtablename

[Optional] The name of the output table representing one or more linear models (see groupby).

The function creates a second output table of statistical measures, output_table_name_rpt, and a third XML output table of requested reports, output_table_name_txt.

If you do not specify both outputdatabase and outputtablename, the function creates volatile output tables with randomly generated names in the logon user database and returns a result set.
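
To make the naming of the three output tables concrete, the sketch below writes a model to a hypothetical table my_out_db.income_model and then reads each table back. The column layouts of the output tables are not described here, so the queries simply select every column.

```sql
-- Sketch: persist the model and read back the three output tables.
-- my_db, customer_data, my_out_db, income_model, and the column names
-- are hypothetical.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=age,years_with_bank,nbr_children;dependent=income;outputdatabase=my_out_db;outputtablename=income_model;'
);

SELECT * FROM my_out_db.income_model;      -- linear model(s)
SELECT * FROM my_out_db.income_model_rpt;  -- statistical measures
SELECT * FROM my_out_db.income_model_txt;  -- requested XML reports
```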

overwrite

[Optional] Whether to drop the output tables before creating new ones.

Default: true

remove

[Optional] The criterion to remove a variable from the model.

Condition	Removal Criterion
usefstat=true	Partial F-statistic must be less than removal_value. Default removal_value: 3.84
usepvalue=true	P-value of the T-statistic must be greater than removal_value. Default removal_value: 0.05

statstable

[Optional] Whether to include a data quality report in the XML output string. The report includes the mean and standard deviation of each model variable, derived from an ESSCP matrix.

stepwise

[Optional] Whether to perform the stepwise procedure (forward, forwardonly, backward, or backwardonly).

Default: false

usefstat

[Optional] Whether to use the partial F-Statistic to decide whether to add or remove a variable.

Default (if you omit both usefstat and usepvalue): true

usepvalue

[Optional] Whether to use the T-Statistic P-value to decide whether to add or remove a variable.

With stepwise=true, usefstat is the default. Without stepwise, do not specify usefstat or usepvalue. Do not specify both groupby and stepwise. The following additional criteria apply to the enter and remove values:

Value	Criteria
enter	Must be greater than or equal to remove when using the F-statistic.
remove	Must be greater than or equal to enter when using the P-value.
remove	Must be greater than or equal to 0 and less than or equal to 1 when using the P-value.
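
A P-value-driven stepwise call that respects these criteria might look like the sketch below. It assumes that backward=true is paired with stepwise=true, and it picks remove=0.10 purely for illustration, so that remove is greater than or equal to enter and lies between 0 and 1. All object names are hypothetical.

```sql
-- Sketch: backward stepwise selection using the T-statistic P-value.
-- Assumption: backward=true is paired with stepwise=true.
-- enter=0.05 is the documented default; remove=0.10 is an illustrative value.
-- my_db, customer_data, and all column names are hypothetical.
CALL td_analyze (
  'linear',
  'database=my_db;tablename=customer_data;columns=age,years_with_bank,nbr_children;dependent=income;stepwise=true;backward=true;usepvalue=true;enter=0.05;remove=0.10;'
);
```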

varianceproportionthreshold

[Optional] One of two thresholds for neardependencyreport.

Default: 0.5

Vantage Analytics Library User Guide