Syntax | Logistic Regression | Vantage Analytics Library - Syntax

CALL td_analyze (
  'logistic',
  'required_parameter_list [ optional_parameter; [...] ]'
);

required_parameter_list

database = input_database_name;
tablename = input_table_name;
columns = { all | allnumeric | column_name [,...] };
dependent = column_name;

optional_parameter

{ backward = { true | false } |
  backwardonly = { true | false } |
  columnstoexclude = column_name [,...] |
  conditionindexthreshold = threshold |
  constant = { true | false } |
  convergence = convergence_value |
  enter = entry_value |
  forward = { true | false } |
  forwardonly = { true | false } |
  groupby = column_name [,...] |
  lifttable = { true | false } |
  matrixdatabase = matrix_database_name |
  matrixtablename = matrix_table_name |
  maxiterations = max_iterations |
  memorysize = memory_size |
  neardependencyreport = { true | false } |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overwrite = { true | false } |
  remove = removal_value |
  response = response_value |
  sample = { true | false } |
  statstable = { true | false } |
  stepwise = { true | false } |
  successtable = { true | false } |
  thresholdbegin = threshold_begin |
  thresholdend = threshold_end |
  thresholdincrement = threshold_increment |
  thresholdtable = { true | false } |
  varianceproportionthreshold = threshold
}
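
As an illustration of the syntax above, a minimal call supplies only the four required parameters. The database, table, and column names here (my_db, customers, age, income, tenure, churned) are hypothetical placeholders, and depending on the installation the procedure name may need to be qualified with the database where the library is installed.

```sql
-- Minimal sketch: required parameters only.
-- my_db, customers, age, income, tenure, and churned are hypothetical names.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;'
);
```

Because outputdatabase and outputtablename are omitted, this form relies on the volatile output tables described under those parameters.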

Syntax Elements

database

The database containing the input table.

tablename

The input table to build a logistic regression model from.

columns

The columns to analyze.

Keyword	Description
all	All columns.
allnumeric	All numeric columns.

dependent

The input table column that represents the dependent variable.

backward

[Optional] Whether to start with all independent variables in the model and do the following until no more independent variables can be removed from the model:

Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).

Default: false

backwardonly

[Optional] Like backward without the forward step.

Default: false

columnstoexclude

[Optional] The columns to exclude when columns specifies a keyword.

Any groupby columns are automatically excluded.
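
A sketch of a keyword-based column selection combined with columnstoexclude follows. All object names are hypothetical; customer_id stands in for an identifier column that should not act as an independent variable. Whether the dependent column must also be listed in columnstoexclude is not stated in this section, so that behavior should be verified before relying on it.

```sql
-- Sketch: analyze all numeric columns except an identifier column.
-- my_db, customers, customer_id, and churned are hypothetical names.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=allnumeric;dependent=churned;columnstoexclude=customer_id;'
);
```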

conditionindexthreshold

[Optional] One of two thresholds for neardependencyreport.

Default: 30

constant

[Optional] Whether the linear model includes a constant term.

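Linear equation with a constant term: y = b0 + b1*x1 + ... + bn*xn

Linear equation without a constant term: y = b1*x1 + ... + bn*xn

Here y is the linear predictor, b0 is the constant term, and b1 through bn are the coefficients of the independent variables x1 through xn.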

Default: true

convergence

[Optional] The convergence criterion. The algorithm stops iterating when the change in the log likelihood function falls below this value.

Default: 0.001

enter

[Optional] The criterion to enter a variable into the model. The W-statistic chi-squared P-value of the variable must be less than entry_value for the variable to be added.

Default entry_value: 0.05

forward

[Optional] Whether to start with no independent variables in the model and do the following until no more independent variables can be added to the model:

Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).

Refer to the rules in Stepwise Logistic Regression.

forwardonly

[Optional] Like forward without the backward step.

Refer to the rules in Stepwise Logistic Regression.

groupby

[Optional] The input table columns for which to separately analyze each value or combination of values.

Default behavior: Input is not grouped.
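
A grouped analysis might look like the following sketch, which would build models per value of a hypothetical region column. All object names are placeholders.

```sql
-- Sketch: separate analysis per value of region.
-- my_db, customers, region, and the other column names are hypothetical.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;groupby=region;'
);
```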

lifttable

[Optional] Whether to build a lift table (a table of information required to build a lift chart) and include it in the XML output string of the function.

This table splits the computed probability values into deciles with counts and percentages to demonstrate what happens when rows of ordered probabilities accumulate.

matrixdatabase

[Optional] The database where the matrix table specified by matrixtablename resides.

Required with matrixtablename.

matrixtablename

[Optional] The name of the ESSCP matrix that the Matrix Building function built.

If you specify matrixtablename, these rules apply to the columns that columns specifies:

They must appear in the matrix.
They can be a subset of the columns in the matrix.
columns can specify them in any order.

If the matrix specifies groupby columns, the function must specify the same columns with groupby.
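
The rules above can be sketched as a call that reuses a previously built matrix. The matrix name customers_matrix is hypothetical and is assumed to have been produced earlier by the Matrix Building function with at least the listed columns; here columns names a subset of them.

```sql
-- Sketch: build the model from an existing ESSCP matrix.
-- my_db.customers_matrix is a hypothetical matrix assumed to contain
-- age, income, tenure, and churned (possibly among other columns).
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=income,age;dependent=churned;matrixdatabase=my_db;matrixtablename=customers_matrix;'
);
```

If the matrix had been built with groupby columns, the same columns would also need to appear in a groupby parameter of this call.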

maxiterations

[Optional] The maximum number of attempts to converge on a solution.

Default: 100
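
convergence and maxiterations together bound the iterative fit: iteration ends when the change in log likelihood drops below convergence, or when maxiterations attempts have been made. The values below are illustrative only, and the object names are hypothetical.

```sql
-- Sketch: tighter convergence criterion with a lower iteration cap.
-- 0.0001 and 50 are illustrative values, not recommendations.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;convergence=0.0001;maxiterations=50;'
);
```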

memorysize

[Optional] The memory size in megabytes to allocate for in-memory Logistic Regression.

Adjust this value according to workstation and network requirements.

If the data does not fit into this amount of memory, the function performs normal SQL processing.

Default: 0 (disables in-memory calculation feature)

neardependencyreport

[Optional] Whether to produce an XML report showing columns that may be collinear. The function stores the report in the XML output table if all of these conditions are true:

You specify outputdatabase and outputtablename.
The thresholds that conditionindexthreshold and varianceproportionthreshold specify are crossed.
The function detects collinearity.

The same report is available for Factor Analysis, Linear Regression and Logistic Regression.

Default: false
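
A sketch of a call that requests the near-dependency report follows. It names an output table, as the first condition requires, and spells out the two thresholds at their documented defaults for clarity. The object names, including churn_model, are hypothetical.

```sql
-- Sketch: request the near-dependency (collinearity) report.
-- Thresholds are written explicitly at their default values (30 and 0.5).
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;neardependencyreport=true;conditionindexthreshold=30;varianceproportionthreshold=0.5;outputdatabase=my_db;outputtablename=churn_model;'
);
```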

overwrite

[Optional] Whether to drop the output tables before creating new ones.

Default: true

outputdatabase

[Optional] The database that contains the output table that represents one or more logistic models.

If you do not specify both outputdatabase and outputtablename, the function creates a volatile output table with a randomly generated name in the logon user database.

outputtablename

[Optional] The name of the output table representing one or more logistic models (see groupby).

The function creates a second output table of statistical measures, output_table_name_rpt, and a third XML output table of requested reports, output_table_name_txt.

If you do not specify both outputdatabase and outputtablename, the function creates volatile output tables with randomly generated names in the logon user database and returns a result set.
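
The following sketch writes the model to permanent tables and then reads the three outputs. The names my_db and churn_model are hypothetical; the _rpt and _txt suffixes follow the naming described above.

```sql
-- Sketch: persist the output, replacing tables from any earlier run.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;outputdatabase=my_db;outputtablename=churn_model;overwrite=true;'
);

-- Model table, statistical measures table, and XML reports table.
SELECT * FROM my_db.churn_model;
SELECT * FROM my_db.churn_model_rpt;
SELECT * FROM my_db.churn_model_txt;
```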

remove

[Optional] The criterion to remove a variable from the model. The T-statistic P-value must be greater than removal_value for a variable to be removed.

Default removal_value: 0.05

response

[Optional] The value of the dependent column to treat as the response value.

Example: The dependent column, gender, has values M and F. To make F the response value, use response=F.
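
In a full call, the gender example above might be written as follows. The database, table, and independent column names are hypothetical.

```sql
-- Sketch: treat gender = F as the response value.
-- my_db, customers, age, income, and tenure are hypothetical names.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=gender;response=F;'
);
```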

sample

[Optional] Whether to read a sample of the data into memory for processing.

Useful when not all input data fits in memory.

Default: false

statstable

[Optional] Whether to include a data quality report in the XML output string. The report includes the mean and standard deviation of each model variable, derived from an ESSCP matrix.

Default: false

stepwise

[Optional] Whether to perform the stepwise procedure (forward, forwardonly, backward, or backwardonly).

Refer to the rules in Stepwise Logistic Regression.

Default: false
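
A forward stepwise run might be requested as in the sketch below. Combining stepwise=true with forward=true is an assumption drawn from the parameter descriptions in this section; the rules in Stepwise Logistic Regression take precedence. The entry and removal values are illustrative, and the object names are hypothetical.

```sql
-- Sketch: forward stepwise selection with explicit entry and removal criteria.
-- Assumes stepwise=true is combined with a direction option (forward here).
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;stepwise=true;forward=true;enter=0.05;remove=0.1;'
);
```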

successtable

[Optional] Whether to include the Success Table in the function XML output string, showing counts of predicted and actual values of the dependent variable of the logistic regression model.

The Success Table is similar to the Decision Tree Confusion Matrix, but the Success Table includes only two values of the dependent variable, response and nonresponse.

Default: false
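
The report options can be combined in one call. This sketch requests the Success Table together with the lift table and the data quality report, all of which are returned in the XML output string. The object names are hypothetical.

```sql
-- Sketch: request several optional reports in one run.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;successtable=true;lifttable=true;statstable=true;'
);
```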

thresholdbegin

[Optional] The beginning threshold value for the Multithreshold Success Table (see thresholdtable).

Default: 0

thresholdend

[Optional] The ending threshold value for the Multithreshold Success Table.

Default: 0

thresholdincrement

[Optional] The difference in threshold values between adjacent rows in the Multithreshold Success Table.

Default: 0

thresholdtable

[Optional] Whether to include the Multithreshold Success Table in the function XML output string.

Each row of the Multithreshold Success Table is a Prediction Success Table with a different threshold value, determined by thresholdbegin, thresholdend, and thresholdincrement. In this context, the threshold is the value above which the predicted probability indicates a response.

Default: false
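
As a sketch, the three threshold parameters below ask for thresholds from 0.1 to 0.9 in steps of 0.1. If both endpoints are included, which is an assumption not confirmed in this section, that yields nine rows (0.1, 0.2, ..., 0.9). The object names are hypothetical.

```sql
-- Sketch: Multithreshold Success Table from 0.1 to 0.9 in steps of 0.1.
-- Whether thresholdbegin and thresholdend are both inclusive is an assumption.
CALL td_analyze (
  'logistic',
  'database=my_db;tablename=customers;columns=age,income,tenure;dependent=churned;thresholdtable=true;thresholdbegin=0.1;thresholdend=0.9;thresholdincrement=0.1;'
);
```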

varianceproportionthreshold

[Optional] One of two thresholds for neardependencyreport.

Default: 0.5
