- LogRegEvaluator(data, model, estimate_column=None, index_columns=None, prob_column=None, accumulate=None, prob_threshold=0.5, start_threshold=None, end_threshold=None, increment_threshold=None, gen_sql_only=False, charset=None)
- DESCRIPTION:
Evaluates a logistic regression model, such as the one generated by the LogReg()
function. The function produces a result containing the following reports in XML format:
* Success result - This output is delivered in the function's XML output string,
displaying counts of predicted versus actual values of the dependent variable
of the logistic regression model. This report is similar to the Decision Tree
Confusion Matrix, but the Success output only includes two values of the
dependent variable, namely response versus non-response.
* Multi-Threshold Success result - This output is delivered in the function's XML
output string. The report can be thought of as a table in which each row is a
Prediction Success output computed at a different threshold value, as generated by
the "start_threshold", "end_threshold", and "increment_threshold" arguments. Here,
a threshold is the value above which the predicted probability indicates a
response.
* Lift result - This output is delivered in the function's XML output string. It
contains the information required to build a lift chart: the computed probability
values are split into deciles, with counts and percentages that show what happens
as more and more rows, ordered by probability, are accumulated.
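To make the quantities in these reports concrete, here is a minimal pure-Python
sketch. It does not call Vantage and does not reproduce the XML layout. The helper
names success_counts and lift_by_group, the sample data, and the use of ">=" for the
threshold comparison (borrowed from the "prob_threshold" description) are
illustrative assumptions, not the library's implementation.

```python
def success_counts(actual, probs, threshold):
    """Count predicted versus actual outcomes (1 = response) at one threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, p in zip(actual, probs):
        predicted = 1 if p >= threshold else 0  # assumption: ">=" marks a response
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def lift_by_group(actual, probs, groups=10):
    """Cumulative counts and lift over rows ordered by descending probability.

    groups=10 corresponds to deciles.
    """
    ordered = [y for _, y in sorted(zip(probs, actual),
                                    key=lambda pair: pair[0], reverse=True)]
    total_rows = len(ordered)
    total_responses = sum(ordered)
    if total_rows < groups or total_responses == 0:
        raise ValueError("need at least one row per group and one response")
    rows = []
    for g in range(1, groups + 1):
        upto = (g * total_rows) // groups  # rows accumulated through group g
        cum_responses = sum(ordered[:upto])
        rows.append({
            "group": g,
            "cum_rows": upto,
            "cum_responses": cum_responses,
            "cum_pct_captured": 100.0 * cum_responses / total_responses,
            "cum_lift": (cum_responses / upto) / (total_responses / total_rows),
        })
    return rows


# Hypothetical sample: actual responses and their predicted probabilities.
actual = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
probs = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]

# Success result at a single threshold.
print(success_counts(actual, probs, 0.5))  # {'tp': 3, 'fp': 2, 'fn': 1, 'tn': 4}

# Multi-Threshold Success result: one row of counts per threshold.
for t in (0.25, 0.5, 0.75):
    print(t, success_counts(actual, probs, t))

# Lift result, using 5 groups here because the sample has only 10 rows.
for row in lift_by_group(actual, probs, groups=5):
    print(row)
```

In this sample the top group holds two of the four responses, so its cumulative
lift is well above 1, and the lift falls to 1 once every row has been accumulated.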
PARAMETERS:
data:
Required Argument.
Specifies the input data to evaluate.
Types: teradataml DataFrame
model:
Required Argument.
Specifies the input containing the logistic model to use in scoring. This must
be the "model" teradataml DataFrame generated by LogReg() function from VALIB or
a teradataml DataFrame created on a table generated by 'logistic' function from
Vantage Analytic Library.
Types: teradataml DataFrame
estimate_column:
Optional Argument.
Specifies the name of a column in the score output containing the estimated value
of the dependent variable (column).
Notes:
1. Either "estimate_column" or "prob_column" must be requested.
2. If the estimate column is not unique in the score output, '_tm_' is
automatically placed in front of the name.
Types: str
index_columns:
Optional Argument.
Specifies the name(s) of the column(s) representing the primary index of the
score output. By default, the primary index columns of the score output are
the primary index columns of the input. In addition, the index columns must
form a unique key for the score output. Otherwise, there is more than one
score for a given observation.
Types: str OR list of Strings (str)
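As an illustration of the unique-key requirement, the following sketch checks a
small in-memory sample for repeated key values before scoring. It is plain Python
rather than a teradataml call; the helper duplicate_keys and the column names
cust_id and branch are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, index_columns):
    """Return the key tuples that occur more than once in rows (a list of dicts)."""
    if isinstance(index_columns, str):
        index_columns = [index_columns]
    counts = Counter(tuple(row[col] for col in index_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample rows.
rows = [
    {"cust_id": 1, "branch": "A"},
    {"cust_id": 2, "branch": "A"},
    {"cust_id": 1, "branch": "B"},
]

print(duplicate_keys(rows, "cust_id"))              # [(1,)]
print(duplicate_keys(rows, ["cust_id", "branch"]))  # []
```

An empty result means the chosen columns identify each observation exactly once.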
prob_column:
Optional Argument.
Specifies the name of a column in the score output containing the probability
that the dependent value is equal to the response value.
Notes:
1. Either "estimate_column" or "prob_column" must be requested.
2. If the probability column is not unique in the score output, '_tm_' is
automatically placed in front of the name.
Types: str
accumulate:
Optional Argument.
Specifies the name(s) of the column(s) from the input to retain in the output.
Types: str OR list of Strings (str)
prob_threshold:
Optional Argument.
Specifies the probability threshold value. When the probability of the dependent
variable being 1 is greater than or equal to this value, the estimated value of
the dependent variable is 1. If less than this value, the estimated value is 0.
Default Value: 0.5
Types: float, int
start_threshold:
Optional Argument.
Specifies the beginning threshold value utilized in the Multi-Threshold Success output.
Types: float, int
end_threshold:
Optional Argument.
Specifies the ending threshold value utilized in the Multi-Threshold Success output.
Types: float, int
increment_threshold:
Optional Argument.
Specifies the difference in threshold values between adjacent rows in the
Multi-Threshold Success output.
Types: float, int
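The way the three threshold arguments combine can be sketched as a simple
arithmetic sequence. The helper threshold_grid below is hypothetical, and treating
"end_threshold" as inclusive is an assumption made for illustration; the text above
does not state how the library handles the end point.

```python
from decimal import Decimal


def threshold_grid(start, end, increment):
    """List thresholds from start to end (assumed inclusive) in steps of increment."""
    # Decimal avoids floating-point drift when a step such as 0.1 is added repeatedly.
    start_d, end_d, step_d = (Decimal(str(v)) for v in (start, end, increment))
    if step_d <= 0:
        raise ValueError("increment must be positive")
    if end_d < start_d:
        raise ValueError("end must not be smaller than start")
    values = []
    current = start_d
    while current <= end_d:
        values.append(float(current))
        current += step_d
    return values


print(threshold_grid(0.1, 0.5, 0.1))  # [0.1, 0.2, 0.3, 0.4, 0.5]
```

Under that assumption, passing start_threshold=0.1, end_threshold=0.5, and
increment_threshold=0.1 to LogRegEvaluator would correspond to five rows in the
Multi-Threshold Success output.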
gen_sql_only:
Optional Argument.
Specifies whether to generate only the SQL for the function.
When set to True, the function SQL is generated but not executed, and it can be
retrieved using the show_query() method. Otherwise, the SQL is executed but not returned.
Default Value: False
Types: bool
charset:
Optional Argument.
Specifies the character set for the table name and column names.
If this argument is not set, the function uses the default value set by the
VAL library.
Permitted Values:
* 'UTF8'
* 'ASCII'
Types: str
RETURNS:
An instance of LogRegEvaluator.
Output teradataml DataFrames can be accessed using attribute references, such as
LogRegEvaluatorObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrame.
from teradataml import DataFrame
df = DataFrame("customer")
print(df)
# Example 1: Shows how evaluation on logistic model can be performed.
# Generate a logistic model.
log_reg_obj = valib.LogReg(data=df,
columns=["age", "years_with_bank", "income"],
response_column="nbr_children",
response_value=0)
# Print the model.
print(log_reg_obj.model)
# Evaluate the model generated above.
obj = valib.LogRegEvaluator(data=df,
model=log_reg_obj.model,
prob_column="Probability")
# Print the results.
print(obj.result)
# Example 2: Generate only the SQL for the function, without executing it.
obj = valib.LogRegEvaluator(data=df,
model=log_reg_obj.model,
gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call (with no argument, or with "sp").
print(obj.show_query())
print(obj.show_query("sp"))