Syntax | Logistic Regression | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage
CALL td_analyze (
  'logistic',
  'required_parameter_list [ optional_parameter; [...] ]'
);
required_parameter_list
database = input_database_name;
tablename = input_table_name;
columns = { all | column_name [,...] };
dependent = column_name;
optional_parameter
{ backward = { true | false } |
  backwardonly = { true | false } |
  columnstoexclude = column_name [,...] |
  conditionindexthreshold = threshold |
  constant = { true | false } |
  convergence = convergence_value |
  enter = entry_value |
  forward = { true | false } |
  forwardonly = { true | false } |
  groupby = column_name [,...] |
  lifttable = { true | false } |
  matrixdatabase = matrix_database_name |
  matrixtablename = matrix_table_name |
  maxiterations = max_iterations |
  memorysize = memory_size |
  neardependencyreport = { true | false } |
  outputdatabase = output_database_name |
  outputtablename = output_table_name |
  overwrite = { true | false } |
  remove = removal_value |
  response = response_value |
  sample = { true | false } |
  statstable = { true | false } |
  stepwise = { true | false } |
  successtable = { true | false } |
  thresholdbegin = threshold_begin |
  thresholdend = threshold_end |
  thresholdincrement = threshold_increment |
  thresholdtable = { true | false } |
  varianceproportionthreshold = threshold
}
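
The parameter string is a sequence of name=value pairs, each terminated by a semicolon. As a minimal sketch of how such a string might be assembled on the client side, the following Python helper joins the pairs and wraps them in a CALL statement. The helper name build_logistic_call and the database, table, and column names are hypothetical, and the sketch assumes td_analyze can be called without a database qualifier, as in the syntax above. It does not escape quotes inside values.

```python
def build_logistic_call(required, optional=None):
    """Return a CALL td_analyze statement for the 'logistic' function.

    `required` and `optional` map parameter names to values. Lists are
    joined with commas and booleans are rendered as true/false.
    """
    def render(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (list, tuple)):
            return ",".join(str(v) for v in value)
        return str(value)

    # The four required parameters from the syntax above.
    for name in ("database", "tablename", "columns", "dependent"):
        if name not in required:
            raise ValueError(f"missing required parameter: {name}")

    params = {**required, **(optional or {})}
    body = "".join(f"{name}={render(value)};" for name, value in params.items())
    return f"CALL td_analyze('logistic', '{body}');"


# Hypothetical input names, for illustration only.
statement = build_logistic_call(
    {
        "database": "my_db",
        "tablename": "customers",
        "columns": ["age", "income"],
        "dependent": "has_card",
    },
    {"response": 1, "successtable": True},
)
print(statement)
```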

Syntax Elements

database
The database containing the input table.
tablename
The input table to build a logistic regression model from.
columns
The columns to analyze.
Keyword      Description
all          All columns.
allnumeric   All numeric columns.
dependent
The input table column that represents the dependent variable.
backward
[Optional] Whether to start with all independent variables in the model and do the following until no more independent variables can be removed from the model:
  1. Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
  2. Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
Default: false
backwardonly
[Optional] Like backward without the forward step.
Default: false
columnstoexclude
[Optional] The columns to exclude when columns specifies a keyword.
Any groupby columns are automatically excluded.
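
The exclusion rule for the all keyword can be sketched in Python as follows. This only illustrates the rule as stated and is not the function's implementation. The helper name resolve_columns and the column names are hypothetical, and the sketch assumes an explicit column list is used as given.

```python
def resolve_columns(table_columns, columns, columnstoexclude=(), groupby=()):
    """Return the columns to analyze.

    For the keyword "all", drop columnstoexclude and groupby columns while
    keeping the table's column order. An explicit list is returned unchanged
    (an assumption of this sketch).
    """
    if columns == "all":
        excluded = set(columnstoexclude) | set(groupby)
        return [c for c in table_columns if c not in excluded]
    return list(columns)


# Hypothetical table layout.
selected = resolve_columns(
    ["cust_id", "age", "income", "region"],
    "all",
    columnstoexclude=["cust_id"],
    groupby=["region"],
)
print(selected)
```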
conditionindexthreshold
[Optional] One of two thresholds for neardependencyreport.
Default: 30
constant
[Optional] Whether the linear model includes a constant term.
Linear equation with a constant term:
$y = b_0 + b_1 x_1 + \dots + b_n x_n$
Linear equation without a constant term:
$y = b_1 x_1 + \dots + b_n x_n$
Default: true
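
In logistic regression, the linear expression is passed through the logistic function to give a probability. The Python sketch below shows the effect of including or omitting the constant term. The function name predicted_probability and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def predicted_probability(coefficients, values, constant=None):
    """Logistic probability computed from a linear expression.

    `constant` is the constant term; pass None to mimic constant = false.
    """
    linear = sum(b * x for b, x in zip(coefficients, values))
    if constant is not None:
        linear += constant
    return 1.0 / (1.0 + math.exp(-linear))


# Same coefficients and inputs, with and without a constant term.
with_constant = predicted_probability([0.8, -0.5], [1.0, 2.0], constant=0.2)
without_constant = predicted_probability([0.8, -0.5], [1.0, 2.0])
print(with_constant, without_constant)
```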
convergence
[Optional] The convergence criterion. The algorithm stops iterating when the change in the log likelihood function falls below this value.
Default: 0.001
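
The stopping rule can be illustrated with a small Python sketch that fits a one-variable logistic model and stops when the change in the log likelihood falls below convergence, or after maxiterations attempts. The Newton-Raphson update used here is an assumption made for the sketch; this page does not say which optimization method the function uses. The data values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, convergence=0.001, maxiterations=100):
    """Return (b0, b1, iterations_used, converged)."""
    b0 = b1 = 0.0
    previous = log_likelihood(b0, b1, xs, ys)
    for iteration in range(1, maxiterations + 1):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the negative Hessian, with weights p * (1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: solve the 2x2 system directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        current = log_likelihood(b0, b1, xs, ys)
        # Stop when the log likelihood changes by less than `convergence`.
        if abs(current - previous) < convergence:
            return b0, b1, iteration, True
        previous = current
    return b0, b1, maxiterations, False


# Made-up, non-separable data.
b0, b1, iterations, converged = fit_logistic(
    [0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1]
)
print(iterations, converged)
```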
enter
[Optional] The criterion to enter a variable into the model. The W-statistic chi-squared P-value of the variable must be less than entry_value for the variable to be added.
Default entry_value: 0.05
forward
[Optional] Whether to start with no independent variables in the model and do the following until no more independent variables can be added to the model:
  1. Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
  2. Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
Refer to the rules in Stepwise Logistic Regression.
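The two steps can be sketched in Python as selection rules over P-values. In this sketch the P-values are supplied directly as made-up numbers, whereas the function derives them from model statistics. Treating the smallest P-value as the variable that "best explains" the dependent variable, and the largest as the one that "worst explains" it, is an assumption of the sketch. The helper and variable names are hypothetical.

```python
def forward_step(candidate_p_values, enter=0.05):
    """Return the candidate with the smallest P-value below `enter`, or None."""
    eligible = {v: p for v, p in candidate_p_values.items() if p < enter}
    return min(eligible, key=eligible.get) if eligible else None


def backward_step(model_p_values, remove=0.05):
    """Return the model variable with the largest P-value above `remove`, or None."""
    eligible = {v: p for v, p in model_p_values.items() if p > remove}
    return max(eligible, key=eligible.get) if eligible else None


# Made-up P-values for variables outside and inside the model.
to_add = forward_step({"age": 0.01, "income": 0.20, "tenure": 0.03})
to_remove = backward_step({"age": 0.01, "region": 0.40})
print(to_add, to_remove)
```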
forwardonly
[Optional] Like forward without the backward step.
Refer to the rules in Stepwise Logistic Regression.
groupby
[Optional] The input table columns for which to separately analyze each value or combination of values.
Default behavior: Input is not grouped.
lifttable
[Optional] Whether to build a lift table (a table of information required to build a lift chart) and include it in the XML output string of the function.
This table splits the computed probability values into deciles with counts and percentages to demonstrate what happens when rows of ordered probabilities accumulate.
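As a rough illustration of the decile idea, the Python sketch below ranks rows by predicted probability, splits them into ten groups, and accumulates the share of responders captured so far. The row layout returned here is hypothetical and is not the layout of the lift table in the XML output. The input data is made up.

```python
def lift_table(probabilities, responses, buckets=10):
    """Return rows of (bucket, count, responders, cumulative_pct_of_responders)."""
    # Highest predicted probabilities first.
    ranked = sorted(
        zip(probabilities, responses), key=lambda pair: pair[0], reverse=True
    )
    n = len(ranked)
    total_responders = sum(r for _, r in ranked)
    rows, cumulative = [], 0
    for b in range(buckets):
        chunk = ranked[b * n // buckets:(b + 1) * n // buckets]
        responders = sum(r for _, r in chunk)
        cumulative += responders
        pct = 100.0 * cumulative / total_responders if total_responders else 0.0
        rows.append((b + 1, len(chunk), responders, pct))
    return rows


# 20 made-up rows: the 8 highest probabilities are responders.
probs = [i / 20 for i in range(1, 21)]
resp = [1 if i > 12 else 0 for i in range(1, 21)]
rows = lift_table(probs, resp)
for row in rows:
    print(row)
```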
matrixdatabase
[Optional] The database where the matrix table specified by matrixtablename resides.
Required with matrixtablename.
matrixtablename
[Optional] The name of the ESSCP (Extended Sum-of-Squares-and-Cross-Products) matrix table that the Matrix Building function built.
If you specify matrixtablename, these rules apply to the columns that columns specifies:
  • They must appear in the matrix.
  • They can be a subset of the columns in the matrix.
  • columns can specify them in any order.
If the matrix specifies groupby columns, the function must specify the same columns with groupby.
maxiterations
[Optional] The maximum number of attempts to converge on a solution.
Default: 100
memorysize
[Optional] The memory size in megabytes to allocate for in-memory Logistic Regression.
Adjust this value according to workstation and network requirements.
If the data does not fit into this amount of memory, the function performs normal SQL processing.
Default: 0 (disables in-memory calculation feature)
neardependencyreport
[Optional] Whether to output an XML report showing columns that may be collinear. The function stores the report in the XML output table if all these conditions are true:
  • You specify outputdatabase and outputtablename.
  • The thresholds that conditionindexthreshold and varianceproportionthreshold specify are crossed.
  • The function detects collinearity.
The same report is available for Factor Analysis, Linear Regression and Logistic Regression.
Default: false
overwrite
[Optional] Whether to drop the output tables before creating new ones.
Default: true
outputdatabase
[Optional] The database that contains the output table that represents one or more logistic models.
If you do not specify both outputdatabase and outputtablename, the function creates a volatile output table with a randomly generated name in the logon user database.
outputtablename
[Optional] The name of the output table representing one or more logistic models (see groupby).
The function creates a second output table of statistical measures, output_table_name_rpt, and a third XML output table of requested reports, output_table_name_txt.
If you do not specify both outputdatabase and outputtablename, the function creates volatile output tables with randomly generated names in the logon user database and returns a result set.
remove
[Optional] The criterion to remove a variable from the model. The T-statistic P-value must be greater than removal_value for a variable to be removed.
Default removal_value: 0.05
response
[Optional] The value of the dependent column to treat as the response value. All other values are treated as nonresponse.
Example: The dependent column, gender, has values M and F. To make F the response value, use response=F.
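Conceptually, the response value turns the dependent column into a two-valued outcome. A minimal Python sketch of that mapping, using the gender example above, follows. The helper name encode_dependent is hypothetical.

```python
def encode_dependent(values, response):
    """Map each dependent value to 1 (response) or 0 (nonresponse)."""
    return [1 if value == response else 0 for value in values]


encoded = encode_dependent(["M", "F", "F", "M"], response="F")
print(encoded)
```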
sample
[Optional] Whether to read a sample of the data into memory for processing.
Useful when not all input data fits in memory.
Default: false
statstable
[Optional] Whether to include a data quality report in the XML output string. The report includes the mean and standard deviation of each model variable, derived from an ESSCP matrix.
Default: false
stepwise
[Optional] Whether to perform the stepwise procedure (forward, forwardonly, backward, or backwardonly).
Refer to the rules in Stepwise Logistic Regression.
Default: false
successtable
[Optional] Whether to include the Success Table in the function XML output string, showing counts of predicted and actual values of the dependent variable of the logistic regression model.
The Success Table is similar to the Decision Tree Confusion Matrix, but the Success Table includes only two values of the dependent variable, response and nonresponse.
Default: false
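The counts in a success table can be sketched in Python as a tally of actual versus predicted outcomes. The cutoff of 0.5 used to turn a probability into a predicted response is an assumption of this sketch, and the data is made up.

```python
def success_table(probabilities, actuals, threshold=0.5):
    """Count rows by (actual, predicted), where 1 = response, 0 = nonresponse."""
    counts = {(a, p): 0 for a in (1, 0) for p in (1, 0)}
    for probability, actual in zip(probabilities, actuals):
        predicted = 1 if probability > threshold else 0
        counts[(actual, predicted)] += 1
    return counts


# Made-up predicted probabilities and actual outcomes.
table = success_table([0.9, 0.7, 0.4, 0.2, 0.6], [1, 0, 1, 0, 1])
print(table)
```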
thresholdbegin
[Optional] The beginning threshold value for the Multithreshold Success Table (see thresholdtable).
Default: 0
thresholdend
[Optional] The ending threshold value for the Multithreshold Success Table.
Default: 0
thresholdincrement
[Optional] The difference in threshold values between adjacent rows in the Multithreshold Success Table.
Default: 0
thresholdtable
[Optional] Whether to include the Multithreshold Success Table in the function XML output string.
Each row of the Multithreshold Success Table is a Prediction Success Table with a different threshold value, determined by thresholdbegin, thresholdend, and thresholdincrement. In this context, the threshold is the value above which the predicted probability indicates a response.
Default: false
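The way thresholdbegin, thresholdend, and thresholdincrement determine the rows can be sketched in Python as follows. Treating thresholdend as inclusive, returning a single threshold when the increment is 0, and the row layout are all assumptions of this sketch. The data is made up.

```python
def threshold_values(begin, end, increment):
    """Return thresholds from begin to end (inclusive), stepping by increment."""
    if increment <= 0:
        return [begin]  # guard against an endless sequence when increment is 0
    steps = int((end - begin) / increment + 1e-9)
    # Multiply rather than accumulate to limit floating-point drift.
    return [round(begin + i * increment, 10) for i in range(steps + 1)]


def multithreshold_rows(probabilities, actuals, begin, end, increment):
    """Return rows of (threshold, correct_responses, false_responses)."""
    rows = []
    for t in threshold_values(begin, end, increment):
        # A probability above the threshold indicates a predicted response.
        predicted = [p > t for p in probabilities]
        correct = sum(1 for pred, a in zip(predicted, actuals) if pred and a == 1)
        false = sum(1 for pred, a in zip(predicted, actuals) if pred and a == 0)
        rows.append((t, correct, false))
    return rows


# Made-up predicted probabilities and actual outcomes.
threshold_rows = multithreshold_rows(
    [0.9, 0.7, 0.4, 0.2, 0.6], [1, 0, 1, 0, 1], 0.25, 0.75, 0.25
)
for row in threshold_rows:
    print(row)
```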
varianceproportionthreshold
[Optional] One of two thresholds for neardependencyreport.
Default: 0.5