Description
A model generated by the Logistic Regression function can be passed to this function to generate evaluation reports. The function produces a result containing the following reports in XML format:
Success result - This output is delivered in the function's XML output string, displaying counts of predicted versus actual values of the dependent variable of the logistic regression model. This report is similar to the Decision Tree Confusion Matrix, but the Success output only includes two values of the dependent variable, namely response versus non-response.
Multi-Threshold Success result - This output is delivered in the function's XML output string. The report can be thought of as a table where each row is a Prediction Success Output, and each row has a different threshold value as generated by the "start.threshold", "end.threshold", and "increment.threshold" arguments. A threshold here is the value above which the predicted probability indicates a response.
Lift result - This output contains the information required to build a lift chart. It splits the computed probability values into deciles and reports counts and percentages that show what happens as more and more rows, ordered by probability, are accumulated. It is delivered in the function's XML output string.
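As a rough illustration of what the Lift result summarizes, the base R sketch below orders simulated probabilities from highest to lowest, cuts them into ten equal-size deciles, and accumulates response counts. The simulated data, the variable names, and the column names are assumptions made for this illustration; they do not describe the layout of the function's XML output.

```r
# Illustrative only: simulated scores, not output of td_log_reg_evaluator_valib().
set.seed(42)
n <- 1000
prob <- runif(n)                             # hypothetical predicted probabilities
actual <- rbinom(n, size = 1, prob = prob)   # hypothetical 0/1 response

# Order rows from highest to lowest probability, then assign deciles 1..10.
ord <- order(prob, decreasing = TRUE)
actual_sorted <- actual[ord]
decile <- ceiling(seq_len(n) / (n / 10))

# Per-decile row counts, response counts, and response percentage.
rows <- as.vector(table(decile))
responses <- as.vector(tapply(actual_sorted, decile, sum))
lift_tbl <- data.frame(
  decile = 1:10,
  rows = rows,
  responses = responses,
  pct_response = 100 * responses / rows
)

# Cumulative view: what happens as more ordered rows are accumulated.
overall_rate <- sum(actual) / n
lift_tbl$cum_rows <- cumsum(lift_tbl$rows)
lift_tbl$cum_responses <- cumsum(lift_tbl$responses)
lift_tbl$cum_pct_captured <- 100 * lift_tbl$cum_responses / sum(actual)
lift_tbl$cum_lift <- (lift_tbl$cum_responses / lift_tbl$cum_rows) / overall_rate

print(lift_tbl)
```

In this sketch the cumulative lift of the last decile is 1 by construction, because accumulating every row reproduces the overall response rate.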
Usage
td_log_reg_evaluator_valib(data, model, ...)
Arguments
data
Required Argument.
model
Required Argument.
...
Specifies other arguments supported by the function as described in the 'Other Arguments' section.
Value
The function returns an object of class "td_log_reg_evaluator_valib",
which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
Other Arguments
estimate.column
Optional Argument.
Specifies the name of a column in the score output containing the
estimated value of the dependent variable (column).
Notes:
Either "estimate.column" or "prob.column" must be requested.
If the estimate column is not unique in the score output, '_tm_' is automatically placed in front of the name.
Types: character
index.columns
Optional Argument.
Specifies the name(s) of the column(s) representing the primary
index of the score output. By default, the primary index columns
of the score output are the primary index columns of the input.
In addition, the index columns need to form a unique key for the
score output. Otherwise, there is more than one score for a
given observation.
Types: character OR vector of Strings (character)
prob.column
Optional Argument.
Specifies the name of a column in the score output containing the
probability that the dependent value is equal to the response value.
Notes:
Either "estimate.column" or "prob.column" must be requested.
If the probability column is not unique in the score output, '_tm_' is automatically placed in front of the name.
Types: character
accumulate
Optional Argument.
Specifies the name(s) of the column(s) from the input to retain in
the output.
Types: character OR vector of Strings (character)
prob.threshold
Optional Argument.
Specifies the probability threshold value. When the probability
of the dependent variable being 1 is greater than or equal to
this value, the estimated value of the dependent variable is 1.
If less than this value, the estimated value is 0.
Default Value: 0.5
Types: numeric
start.threshold
Optional Argument.
Specifies the beginning threshold value utilized in the
Multi-Threshold Success output.
Types: numeric
end.threshold
Optional Argument.
Specifies the ending threshold value utilized in the
Multi-Threshold Success output.
Types: numeric
increment.threshold
Optional Argument.
Specifies the difference in threshold values between
adjacent rows in the Multi-Threshold Success output.
Types: numeric
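The threshold arguments can be pictured with a small base R sketch. It applies the "greater than or equal to" rule documented for "prob.threshold", and it assumes that the Multi-Threshold grid behaves like seq(start, end, by = increment). The probabilities, the outcomes, and the helper success_counts() are hypothetical and are not part of the function.

```r
# Illustrative only: hypothetical scores, not output of the evaluator.
prob   <- c(0.10, 0.35, 0.50, 0.65, 0.80, 0.95)
actual <- c(0,    0,    1,    0,    1,    1)

# Hypothetical helper: counts of predicted versus actual values at one threshold.
success_counts <- function(threshold) {
  # Estimated value is 1 when the probability is >= the threshold, else 0.
  estimate <- as.integer(prob >= threshold)
  data.frame(
    threshold = threshold,
    true_pos  = sum(estimate == 1 & actual == 1),
    false_pos = sum(estimate == 1 & actual == 0),
    false_neg = sum(estimate == 0 & actual == 1),
    true_neg  = sum(estimate == 0 & actual == 0)
  )
}

# One threshold, using the documented default of 0.5.
single <- success_counts(0.5)

# Assumed grid: start = 0.25, end = 0.75, increment = 0.25.
thresholds <- seq(0.25, 0.75, by = 0.25)
multi <- do.call(rbind, lapply(thresholds, success_counts))

print(single)
print(multi)
```

Each row of multi plays the role of one Success table at a different threshold, which is how the Multi-Threshold Success output is described above.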
Examples
# Notes:
# 1. To execute Vantage Analytic Library functions, set option 'val.install.location' to
# the database name where Vantage analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library installer.
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Get remote data source connection.
con <- td_get_context()$connection
# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)
# Example 1: Shows how to evaluate a logistic regression model.
# Generate a logistic model.
log_reg_obj <- td_log_reg_valib(data=df,
                                columns=c("age", "years_with_bank", "income"),
                                response.column="nbr_children",
                                response.value=0)
# Print the model.
print(log_reg_obj$model)
# Evaluate the model generated above.
obj <- td_log_reg_evaluator_valib(data=df,
                                  model=log_reg_obj$model,
                                  prob.column="Probability")
# Print the results.
print(obj$result)