| |
- DecisionTreePredict(data, model, include_confidence=False, index_columns=None, response_column=None, accumulate=None, targeted_value=None, gen_sql_only=False)
- DESCRIPTION:
The function predicts the values of the dependent variable in test data, using the model
created by the VALIB DecisionTree() function. The function also generates two profile
DataFrames containing details about the decisions made during the prediction.
Apart from the score and profile DataFrames, the function optionally generates:
1. Confidence factors
2. Targeted Binary Confidence
PARAMETERS:
data:
Required Argument.
Specifies the input data containing the columns to analyse, representing the
dependent and independent variables in the analysis.
Types: teradataml DataFrame
model:
Required Argument.
Specifies the teradataml DataFrame generated by VALIB DecisionTree() function,
containing the decision tree model in PMML format that is used to predict the data.
Types: teradataml DataFrame
include_confidence:
Optional Argument.
Specifies whether the output DataFrame contains a column indicating how likely it
is, for a particular leaf node on the tree, that the prediction is correct. If not
specified or set to False, the confidence column is not created.
Note:
This argument cannot be specified along with "targeted_value" argument.
Default Value: False
Types: bool
index_columns:
Optional Argument.
Specifies one or more columns to use, instead of the default, as the primary index of
the result output DataFrame. By default, the primary index columns of the result output
DataFrame are the primary index columns of the input DataFrame "data". The columns
specified in this argument must form a unique key for the result output DataFrame;
otherwise, there is more than one score for a given observation.
Types: str OR list of Strings (str)
response_column:
Optional Argument.
Specifies the name of the predicted value column. If this argument is not specified,
the name of the dependent column in "data" DataFrame is used.
Types: str
accumulate:
Optional Argument.
Specifies one or more columns from the "data" DataFrame that can be passed to the
result output DataFrame.
Types: str OR list of Strings (str)
targeted_value:
Optional Argument.
Specifies whether the result output DataFrame contains a column indicating how likely
it is, for a particular leaf node and targeted value of a predicted result with only
two values, that the prediction is correct.
Note:
This argument cannot be specified along with "include_confidence" argument.
Permitted values: One of the values in the dependent column used in the argument
"response_column".
Types: str
gen_sql_only:
Optional Argument.
Specifies whether to generate only the SQL for the function.
When set to True, the function SQL is generated but not executed, and can be accessed
using the show_query() method; otherwise, the SQL is executed but not returned.
Default Value: False
Types: bool
RETURNS:
An instance of DecisionTreePredict.
Output teradataml DataFrames can be accessed using attribute references, such as
DecisionTreePredObj.<attribute_name>.
Output teradataml DataFrame attribute names are
1. result
2. profile_result_1
3. profile_result_2
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create the required teradataml DataFrame.
from teradataml import DataFrame
df = DataFrame("customer_analysis")
print(df)
# Run DecisionTree() on columns "age", "income" and "nbr_children", with dependent
# variable "gender".
dt_obj = valib.DecisionTree(data=df,
columns=["age", "income", "nbr_children"],
response_column="gender",
algorithm="gainratio",
binning=False,
max_depth=5,
num_splits=2,
pruning="gainratio")
# Example 1: Predict the likelihood that the prediction is correct for a particular
# leaf node in the tree.
obj = valib.DecisionTreePredict(data=df,
model=dt_obj.result,
include_confidence=True,
accumulate=["city_name", "state_code"])
# Print the results.
print(obj.result)
print(obj.profile_result_1)
print(obj.profile_result_2)
# Example 2: Predict the likelihood that the prediction is correct for a particular
# leaf node in the tree and a binary targeted value.
obj = valib.DecisionTreePredict(data=df,
model=dt_obj.result,
targeted_value="F",
accumulate=["city_name", "state_code"])
# Print the results.
print(obj.result)
print(obj.profile_result_1)
print(obj.profile_result_2)
# Example 3: Generate only the SQL for the function, without executing it.
obj = valib.DecisionTreePredict(data=df,
model=dt_obj.result,
targeted_value="F",
accumulate=["city_name", "state_code"],
gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call.
print(obj.show_query())
print(obj.show_query("sp"))
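The notes on "include_confidence" and "targeted_value" state that the two arguments cannot be specified together. The following is a minimal sketch, not part of teradataml, of how a caller might check that rule before invoking the function. The helper name build_predict_kwargs is hypothetical; it only assembles a dictionary of keyword arguments and does not connect to Vantage.

```python
def build_predict_kwargs(data, model, include_confidence=False, targeted_value=None,
                         accumulate=None, response_column=None, index_columns=None):
    """Hypothetical helper: assemble keyword arguments for DecisionTreePredict.

    Raises ValueError when "include_confidence" and "targeted_value" are both
    supplied, mirroring the restriction described in the PARAMETERS section.
    """
    if include_confidence and targeted_value is not None:
        raise ValueError(
            '"include_confidence" cannot be specified along with "targeted_value".')

    # Required arguments are always passed.
    kwargs = {"data": data, "model": model}

    # Optional arguments are added only when the caller set them.
    if include_confidence:
        kwargs["include_confidence"] = True
    optional = (("targeted_value", targeted_value),
                ("accumulate", accumulate),
                ("response_column", response_column),
                ("index_columns", index_columns))
    for name, value in optional:
        if value is not None:
            kwargs[name] = value
    return kwargs


# Intended use (requires a Vantage connection, so it is left commented out):
# obj = valib.DecisionTreePredict(
#     **build_predict_kwargs(df, dt_obj.result, targeted_value="F",
#                            accumulate=["city_name", "state_code"]))
```

Checking the combination on the client side surfaces the conflict before any call is made; the function itself may report the same condition through one of the exceptions listed under RAISES.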
|