Description
The Gain Ratio Extreme Decision Tree function performs decision tree modeling. Its result is an object of class tbl_teradata containing one row with two columns. The second column contains an XML string representing the resulting decision tree model, described in Predictive Model Markup Language (PMML).
Usage
td_decision_tree_valib(data, columns, response.column, ...)
Arguments
data
Required Argument.
columns
Required Argument.
Types: character OR vector of Strings (character)
response.column
Required Argument.
...
Specifies other arguments supported by the function as described in the 'Other Arguments' section.
Value
Function returns an object of class "td_decision_tree_valib",
which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
Other Arguments
algorithm
Optional Argument.
Specifies the name of the algorithm that the decision tree uses
during building.
Permitted Values: "gainratio"
Default Value: "gainratio"
Types: character
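For readers unfamiliar with the "gainratio" criterion named in the algorithm argument, the following base R sketch shows the textbook gain ratio calculation (information gain divided by split information). It is an illustration only and does not reproduce the Vantage Analytic Library implementation; the helper names entropy() and gain_ratio() are hypothetical and are not part of tdplyr.

```r
# Illustrative only: textbook gain ratio, not the library's internal code.
# entropy() and gain_ratio() are hypothetical helper names.

# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Gain ratio obtained by partitioning `response` according to `branch`,
# where `branch` assigns each observation to a candidate branch.
gain_ratio <- function(response, branch) {
  n <- length(response)
  groups <- split(response, branch)
  # Entropy remaining after the split, weighted by branch size.
  weighted <- sum(sapply(groups, function(g) length(g) / n * entropy(g)))
  info_gain <- entropy(response) - weighted
  split_info <- entropy(branch)
  # A split that puts every observation in one branch carries no information.
  if (split_info == 0) {
    return(0)
  }
  info_gain / split_info
}

response <- c("M", "M", "F", "F")
separating <- gain_ratio(response, c("a", "a", "b", "b"))     # branches are pure
uninformative <- gain_ratio(response, c("a", "b", "a", "b"))  # branches are mixed
```

A split whose branches are all pure scores highest, while a split that leaves each branch as mixed as the parent node scores zero.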
binning
Optional Argument.
Specifies whether to perform binning on the continuous independent
variables automatically. When set to TRUE, continuous data is
separated into one hundred bins. If the column has fewer than one
hundred distinct values, this argument is ignored.
Default Value: FALSE
Types: logical
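The effect of the binning argument can be pictured with the base R sketch below. It assumes equal-width bins, which this page does not state; the actual binning method used by the library may differ. The helper name bin_continuous() is hypothetical.

```r
# Illustrative only: equal-width binning is an assumption, and
# bin_continuous() is a hypothetical helper, not a tdplyr function.
bin_continuous <- function(x, n_bins = 100) {
  # Columns with fewer than n_bins distinct values are left unchanged,
  # mirroring the rule described for the binning argument.
  if (length(unique(x)) < n_bins) {
    return(x)
  }
  # cut() with a single number for `breaks` creates equal-width intervals;
  # labels = FALSE returns the integer bin index of each value.
  cut(x, breaks = n_bins, labels = FALSE, include.lowest = TRUE)
}

income <- seq(0, 150000, length.out = 1000)  # made-up continuous values
binned <- bin_continuous(income)             # integer bin indices
children <- c(0, 1, 2, 3, 1, 0)              # few distinct values
unbinned <- bin_continuous(children)         # returned unchanged
```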
exclude.columns
Optional Argument.
Specifies the name(s) of the column(s) to exclude from the
decision tree building. If 'all', 'allnumeric' or 'allcharacter'
is used in the "columns" argument, this argument can be used
to exclude specific columns from tree building.
Types: character OR vector of Strings (character)
max.depth
Optional Argument.
Specifies the maximum number of levels the tree can grow.
Default Value: 100
Types: integer
num.splits
Optional Argument.
Specifies the minimum number of observations each branch must
contain for a node to be split. Unless a node is pure (meaning
all of its observations have the same dependent value), it splits
only if each branch that can come off the node contains at least
this many observations.
Default Value: 2
Types: integer
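The num.splits rule can be expressed as a small predicate. The base R sketch below is one reading of the description above, applied to a single candidate split; it is not the library's code, and the helper name can_split() is hypothetical.

```r
# Illustrative only: can_split() is a hypothetical helper that applies the
# rule described for num.splits to one candidate split of a node.
can_split <- function(response, branch, min_cases = 2) {
  # A pure node holds observations with a single dependent value.
  is_pure <- length(unique(response)) <= 1
  # Every branch produced by the split must hold at least min_cases rows.
  branches_large_enough <- all(table(branch) >= min_cases)
  !is_pure && branches_large_enough
}

mixed_ok  <- can_split(c("M", "F", "M", "F"), c("a", "a", "b", "b"))
pure_node <- can_split(c("M", "M", "M", "M"), c("a", "a", "b", "b"))
too_small <- can_split(c("M", "F", "M"), c("a", "a", "b"))
```

Here only the first node is split: the second is already pure, and the third would produce a branch with a single observation.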
operator.database
Optional Argument.
Specifies the database where the table operators called by
Vantage Analytic Library reside. If not specified, the
library searches the standard search path for table
operators, including the current database.
Types: character
pruning
Optional Argument.
Specifies the style of pruning to use after the tree is fully built.
Permitted Values: "gainratio", "none" (no pruning)
Default Value: "gainratio"
Types: character
Examples
# Notes:
# 1. To execute Vantage Analytic Library functions, set option 'val.install.location' to
#    the database name where Vantage Analytic Library functions are installed.
# 2. Datasets used in these examples can be loaded using the Vantage Analytic Library installer.
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Get remote data source connection.
con <- td_get_context()$connection
# Create an object of class "tbl_teradata".
df <- tbl(con, "customer_analysis")
print(df)
# Run td_decision_tree_valib() on columns "age", "income" and "nbr_children", with
# dependent variable "gender".
obj <- td_decision_tree_valib(data=df,
                              columns=c("age", "income", "nbr_children"),
                              response.column="gender",
                              algorithm="gainratio",
                              binning=FALSE,
                              max.depth=5,
                              num.splits=2,
                              pruning="gainratio")
# Print the results.
print(obj$result)