CALL td_analyze (
'logistic',
'required_parameter_list [ optional_parameter; [...] ]'
);
- required_parameter_list
database = input_database_name;
tablename = input_table_name;
columns = { all | allnumeric | column_name [,...] };
dependent = column_name;
- optional_parameter
{ backward = { true | false } |
backwardonly = { true | false } |
columnstoexclude = column_name [,...] |
conditionindexthreshold = threshold |
constant = { true | false } |
convergence = convergence_value |
enter = entry_value |
forward = { true | false } |
forwardonly = { true | false } |
groupby = column_name [,...] |
lifttable = { true | false } |
matrixdatabase = matrix_database_name |
matrixtablename = matrix_table_name |
maxiterations = max_iterations |
memorysize = memory_size |
neardependencyreport = { true | false } |
outputdatabase = output_database_name |
outputtablename = output_table_name |
overwrite = { true | false } |
remove = removal_value |
response = response_value |
sample = { true | false } |
statstable = { true | false } |
stepwise = { true | false } |
successtable = { true | false } |
thresholdbegin = threshold_begin |
thresholdend = threshold_end |
thresholdincrement = threshold_increment |
thresholdtable = { true | false } |
varianceproportionthreshold = threshold
}
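As a minimal sketch of the syntax above, the following call passes only the four required parameters. The database, table, and column names (my_db, customers, age, income, tenure_months, churned) are hypothetical placeholders, and the procedure is called unqualified as in the syntax diagram; your installation may require a database prefix.

```sql
-- Hypothetical names throughout; only the four required parameters are set.
-- Parameters are separated by semicolons inside a single string literal.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=churned;'
);
```

Because neither outputdatabase nor outputtablename is given here, the function would write to volatile tables, as described under outputdatabase.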
Syntax Elements
- database
- The database containing the input table.
- tablename
- The input table to build a logistic regression model from.
- columns
- The columns to analyze.
Keyword | Description
all | All columns.
allnumeric | All numeric columns.
- dependent
- The input table column that represents the dependent variable.
- backward
- [Optional] Whether to start with all independent variables in the model and do the following until no more independent variables can be removed from the model:
- Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
- Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
- Default: false
- backwardonly
- [Optional] Like backward, but without the forward step: variables are only removed, never re-added.
- Default: false
- columnstoexclude
- [Optional] The columns to exclude when columns specifies a keyword.
- Any groupby columns are automatically excluded.
- conditionindexthreshold
- [Optional] One of two thresholds for neardependencyreport.
- Default: 30
- constant
- [Optional] Whether the linear model includes a constant term.
- Linear equation with a constant term: y = b0 + b1x1 + ... + bnxn
- Linear equation without a constant term: y = b1x1 + ... + bnxn
- Default: true
- convergence
- [Optional] The convergence criterion. The algorithm stops iterating when the change in the log likelihood function falls below this value.
- Default: 0.001
- enter
- [Optional] The criterion to enter a variable into the model. The W-statistic chi-squared P-value of the variable must be less than entry_value for the variable to be added.
- Default entry_value: 0.05
- forward
- [Optional] Whether to start with no independent variables in the model and do the following until no more independent variables can be added to the model:
- Take one forward step, adding the independent variable that best explains the variance of the dependent variable (the variable that meets the criterion specified by enter).
- Take one backward step, removing the independent variable that worst explains the variance of the dependent variable (the variable that meets the criterion specified by remove).
- Refer to the rules in Stepwise Logistic Regression.
- forwardonly
- [Optional] Like forward, but without the backward step: variables are only added, never removed.
- Refer to the rules in Stepwise Logistic Regression.
- groupby
- [Optional] The input table columns for which to separately analyze each value or combination of values.
- Default behavior: Input is not grouped.
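The grouping behavior described above could be requested as in this sketch, which would build one model per distinct value of a hypothetical region column. All object names are placeholders.

```sql
-- Hypothetical names; groupby=region asks for a separate analysis
-- for each value of the region column.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=churned;groupby=region;'
);
```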
- lifttable
- [Optional] Whether to build a lift table (a table of information required to build a lift chart) and include it in the XML output string of the function.
- This table splits the computed probability values into deciles with counts and percentages to demonstrate what happens when rows of ordered probabilities accumulate.
- matrixdatabase
- [Optional] The database where the matrix table specified by matrixtablename resides.
- Required with matrixtablename.
- matrixtablename
- [Optional] The name of the ESSCP matrix that the Matrix Building function built.
- If you specify matrixtablename, these rules apply to the columns that columns specifies:
- They must appear in the matrix.
- They can be a subset of the columns in the matrix.
- columns can specify them in any order.
- If the matrix specifies groupby columns, the function must specify the same columns with groupby.
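The matrix rules above might look like this in practice. The sketch assumes a matrix table named customers_matrix already exists in a hypothetical my_results database, was built without groupby columns, and contains at least the columns listed.

```sql
-- Hypothetical names; assumes my_results.customers_matrix was built
-- earlier by the Matrix Building function and includes these columns.
-- The columns list is a subset of the matrix columns, in arbitrary order.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=income,age;dependent=churned;matrixdatabase=my_results;matrixtablename=customers_matrix;'
);
```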
- maxiterations
- [Optional] The maximum number of attempts to converge on a solution.
- Default: 100
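If a model fails to converge with the defaults, one option is to adjust convergence and maxiterations together. The values below are illustrative only, not recommendations, and the object names are hypothetical.

```sql
-- Hypothetical names and values: a tighter convergence criterion (0.0001)
-- paired with a higher iteration cap (200) than the defaults.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=churned;convergence=0.0001;maxiterations=200;'
);
```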
- memorysize
- [Optional] The memory size in megabytes to allocate for in-memory Logistic Regression.
- Adjust this value according to workstation and network requirements.
- If the data does not fit into this amount of memory, the function performs normal SQL processing.
- Default: 0 (disables in-memory calculation feature)
- neardependencyreport
- [Optional] Whether to output an XML report showing columns that may be collinear and store it in the XML output table if all these conditions are true:
- You specify outputdatabase and outputtablename.
- The thresholds that conditionindexthreshold and varianceproportionthreshold specify are crossed.
- The function detects collinearity.
- The same report is available for Factor Analysis, Linear Regression and Logistic Regression.
- Default: false
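A call that satisfies the first condition above (both output parameters specified) could be sketched as follows. The two thresholds are set to their documented defaults for visibility; all object names are hypothetical.

```sql
-- Hypothetical names; the report is stored only if the thresholds
-- are crossed and the function detects collinearity.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=churned;neardependencyreport=true;conditionindexthreshold=30;varianceproportionthreshold=0.5;outputdatabase=my_results;outputtablename=churn_model;'
);
```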
- overwrite
- [Optional] Whether to drop the output tables before creating new ones.
- Default: true
- outputdatabase
- [Optional] The database that contains the output table that represents one or more logistic models.
- If you do not specify both outputdatabase and outputtablename, the function creates a volatile output table with a randomly generated name in the logon user database.
- outputtablename
- [Optional] The name of the output table representing one or more logistic models (see groupby).
- The function creates a second output table of statistical measures, output_table_name_rpt, and a third XML output table of requested reports, output_table_name_txt.
- If you do not specify both outputdatabase and outputtablename, the function creates volatile output tables with randomly generated names in the logon user database and returns a result set.
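Following the naming rule above, a call made with outputdatabase=my_results and outputtablename=churn_model (both hypothetical) would be expected to leave three tables that can be inspected directly. The column layout of these tables is not described in this section, so the sketch selects everything.

```sql
-- Hypothetical names; table suffixes follow the _rpt and _txt rule above.
SELECT * FROM my_results.churn_model;      -- the logistic model(s)
SELECT * FROM my_results.churn_model_rpt;  -- statistical measures
SELECT * FROM my_results.churn_model_txt;  -- requested XML reports
```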
- remove
- [Optional] The criterion to remove a variable from the model. The T-statistic P-value must be greater than removal_value for a variable to be removed.
- Default removal_value: 0.05
- response
- [Optional] The value of the dependent column to treat as the response value.
- Example: The dependent column, gender, has values M and F. To make F the response value, use response=F.
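The gender example above, written out as a full call. Only dependent=gender and response=F come from the text; the database, table, and independent columns are hypothetical.

```sql
-- Hypothetical database, table, and independent columns.
-- response=F makes F the response value of the dependent column gender.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=gender;response=F;'
);
```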
- sample
- [Optional] Whether to read a sample of the data into memory for processing.
- Useful when not all input data fits in memory.
- Default: false
- statstable
- [Optional] Whether to include a data quality report in the XML output string. The report includes the mean and standard deviation of each model variable, derived from an ESSCP matrix.
- Default: false
- stepwise
- [Optional] Whether to perform the stepwise procedure (forward, forwardonly, backward, or backwardonly).
- Refer to the rules in Stepwise Logistic Regression.
- Default: false
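A forward stepwise run might be requested as in the sketch below. It sets both stepwise and forward explicitly, which is an assumption: the exact interaction between these flags is governed by the rules in Stepwise Logistic Regression, which are not reproduced here. The enter and remove values shown are the documented defaults, and all object names are hypothetical.

```sql
-- Hypothetical names. A variable is added when its P-value is below enter
-- and removed when its P-value is above remove.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=churned;stepwise=true;forward=true;enter=0.05;remove=0.05;'
);
```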
- successtable
- [Optional] Whether to include the Success Table in the function XML output string, showing counts of predicted and actual values of the dependent variable of the logistic regression model.
- The Success Table is similar to the Decision Tree Confusion Matrix, but the Success Table includes only two values of the dependent variable, response and nonresponse.
- Default: false
- thresholdbegin
- [Optional] The beginning threshold value for the Multithreshold Success Table (see thresholdtable).
- Default: 0
- thresholdend
- [Optional] The ending threshold value for the Multithreshold Success Table.
- Default: 0
- thresholdincrement
- [Optional] The difference in threshold values between adjacent rows in the Multithreshold Success Table.
- Default: 0
- thresholdtable
- [Optional] Whether to include the Multithreshold Success Table in the function XML output string.
- Each row of the Multithreshold Success Table is a Prediction Success Table with a different threshold value, determined by thresholdbegin, thresholdend, and thresholdincrement. In this context, the threshold is the value above which the predicted probability indicates a response.
- Default: false
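The three threshold parameters work together with thresholdtable. In this sketch the values 0.1, 0.9, and 0.1 are illustrative, and whether the ending threshold itself produces a row is an assumption to verify against actual output. All object names are hypothetical.

```sql
-- Hypothetical names and threshold values: thresholds run from 0.1
-- toward 0.9 in steps of 0.1, one Prediction Success Table row each.
CALL td_analyze (
'logistic',
'database=my_db;tablename=customers;columns=age,income,tenure_months;dependent=churned;thresholdtable=true;thresholdbegin=0.1;thresholdend=0.9;thresholdincrement=0.1;'
);
```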
- varianceproportionthreshold
- [Optional] One of two thresholds for neardependencyreport.
- Default: 0.5