td_decision_tree_valib

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

The Gain Ratio Extreme Decision Tree function performs decision tree modeling and produces a result table (an object of class tbl_teradata) containing one row with two columns. The second column contains an XML string that describes the resulting decision tree model in Predictive Model Markup Language (PMML).
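This page does not define the gain ratio criterion. As a point of reference, the following base R sketch computes gain ratio as it is commonly defined for C4.5-style trees: information gain divided by split information. The helpers entropy() and gain_ratio() are hypothetical and are not part of tdplyr or the Vantage Analytic Library, which may compute the criterion differently (for example, in how it handles continuous columns).

```r
# Hypothetical helpers illustrating the gain ratio split criterion (C4.5 style).
# They are NOT part of tdplyr or the Vantage Analytic Library.

# Shannon entropy (base 2) of a vector of class labels.
entropy <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  p <- p[p > 0]  # drop unused factor levels, if any
  -sum(p * log2(p))
}

# Gain ratio of splitting 'response' on the categorical 'attribute':
# information gain divided by split information.
gain_ratio <- function(attribute, response) {
  groups  <- split(response, attribute, drop = TRUE)
  weights <- vapply(groups, length, integer(1)) / length(response)
  info_gain  <- entropy(response) -
    sum(weights * vapply(groups, entropy, numeric(1)))
  split_info <- -sum(weights * log2(weights))
  # An attribute with a single value does not split the data at all.
  if (split_info == 0) {
    return(0)
  }
  info_gain / split_info
}

gender <- c("F", "F", "M", "M")
gain_ratio(c("a", "a", "b", "b"), gender)  # perfect split: 1
gain_ratio(c("a", "a", "a", "a"), gender)  # no split: 0
```

Dividing by split information penalizes attributes that fragment the data into many small branches, which plain information gain would otherwise favor.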

Usage

td_decision_tree_valib(data, columns, response.column, ...)

Arguments

data

Required Argument.
Specifies the input data to be used for decision tree modeling.
Types: tbl_teradata

columns

Required Argument.
Specifies the name(s) of the column(s) to be used in decision tree building. Alternatively, it accepts one of the predefined strings below to select all columns, all numeric columns, or all character columns.
Permitted Values:

  1. Name(s) of the column(s) in "data".

  2. Pre-defined strings:

    1. 'all' - all columns

    2. 'allnumeric' - all numeric columns

    3. 'allcharacter' - all character columns

Types: character OR vector of Strings (character)

response.column

Required Argument.
Specifies the name of the column whose values are to be predicted (the dependent variable).
Types: character

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

The function returns an object of class "td_decision_tree_valib", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Other Arguments

algorithm

Optional Argument.
Specifies the name of the algorithm that the decision tree uses during building.
Permitted Values: "gainratio"
Default Value: "gainratio"
Types: character

binning

Optional Argument.
Specifies whether to perform binning on the continuous independent variables automatically. When set to TRUE, continuous data is separated into one hundred bins. A column that has fewer than one hundred distinct values is not binned, even when this argument is TRUE.
Default Value: FALSE
Types: logical
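The binning rule can be illustrated on a local numeric vector. This sketch assumes equal-width bins produced by base R's cut(); this page does not say how the library places its one hundred bin boundaries, and bin_column() is a hypothetical helper, not a tdplyr function.

```r
# Hypothetical helper, not a tdplyr function. Equal-width bins are an
# ASSUMPTION; the library's actual bin boundaries may differ.
bin_column <- function(x, n_bins = 100) {
  # Columns with fewer distinct values than bins are left unbinned.
  if (length(unique(x)) < n_bins) {
    return(x)
  }
  # cut() with a single number of breaks creates equal-width intervals and,
  # with labels = FALSE, returns the integer bin index (1..n_bins) per value.
  cut(x, breaks = n_bins, labels = FALSE)
}

income <- seq(20000, 119900, by = 100)  # 1000 distinct values
length(unique(bin_column(income)))      # 100
bin_column(c(1, 2, 2, 3))               # fewer than 100 distinct values: unchanged
```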

exclude.columns

Optional Argument.
Specifies the name(s) of the column(s) to exclude from the decision tree building. If 'all', 'allnumeric' or 'allcharacter' is used in the "columns" argument, this argument can be used to exclude specific columns from tree building.
Types: character OR vector of Strings (character)
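The interaction between "columns" and "exclude.columns" can be mimicked on a local data.frame. The helper resolve_columns() below is hypothetical and not part of tdplyr; it only restates the selection rules described above, and it assumes "numeric" and "character" map to R's is.numeric() and is.character(), which may not match how Vantage classifies column types.

```r
# Hypothetical helper, not part of tdplyr: resolve a "columns" value plus
# "exclude.columns" against a local data.frame, following the documented rules.
resolve_columns <- function(df, columns, exclude.columns = character(0)) {
  selected <- if (identical(columns, "all")) {
    names(df)
  } else if (identical(columns, "allnumeric")) {
    names(df)[vapply(df, is.numeric, logical(1))]
  } else if (identical(columns, "allcharacter")) {
    names(df)[vapply(df, is.character, logical(1))]
  } else {
    columns  # explicit column names are used as given
  }
  setdiff(selected, exclude.columns)
}

df <- data.frame(cust_id = 1:3,
                 age = c(30, 45, 52),
                 income = c(50000, 72000, 61000),
                 gender = c("F", "M", "F"),
                 stringsAsFactors = FALSE)

resolve_columns(df, "allnumeric", exclude.columns = "cust_id")  # "age" "income"
resolve_columns(df, "allcharacter")                             # "gender"
```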

max.depth

Optional Argument.
Specifies the maximum number of levels the tree can grow.
Default Value: 100
Types: integer

num.splits

Optional Argument.
Specifies how far the decision tree can be split. Unless a node is pure (that is, all of its observations have the same dependent-variable value), it is split only if each branch that can come off the node contains at least this many observations. The default is a minimum of two observations for each branch.
Default Value: 2
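The split rule for "num.splits" can be restated as a small predicate over one candidate split. This is a sketch of the rule as worded above, not the library's code; can_split() and its arguments are hypothetical names.

```r
# Hypothetical helper, not a tdplyr function: applies the rule described for
# "num.splits" to one candidate split of a node.
#   response: dependent-variable values of the observations in the node
#   branch:   the branch each observation would go to under the candidate split
can_split <- function(response, branch, num.splits = 2) {
  # A pure node (a single dependent value) is never split.
  if (length(unique(response)) <= 1) {
    return(FALSE)
  }
  # Otherwise every branch must receive at least 'num.splits' observations.
  all(table(branch) >= num.splits)
}

can_split(c("F", "M", "F", "M"), c("a", "a", "b", "b"))  # TRUE
can_split(c("F", "M", "F"), c("a", "a", "b"))            # FALSE: branch "b" has 1
can_split(c("F", "F", "F", "F"), c("a", "a", "b", "b"))  # FALSE: node is pure
```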
Types: integer

operator.database

Optional Argument.
Specifies the database where the table operators called by the Vantage Analytic Library reside. If not specified, the library searches the standard search path for table operators, including the current database.
Types: character

pruning

Optional Argument.
Specifies the style of pruning to use after the tree is fully built.
Permitted Values: "gainratio", "none" (no pruning)
Default Value: "gainratio"
Types: character

Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set the option 'val.install.location'
#      to the name of the database where the Vantage Analytic Library functions are installed.
#   2. The datasets used in these examples can be loaded using the Vantage Analytic Library installer.

# Load the Teradata Package for R.
library(tdplyr)

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

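# Get the remote data source connection from the current context, which must
# already have been created (for example, with td_create_context()).
con <- td_get_context()$connection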

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer_analysis")
print(df)

# Run td_decision_tree_valib() on columns "age", "income" and "nbr_children", with
# dependent variable "gender".
obj <- td_decision_tree_valib(data=df,
                              columns=c("age", "income", "nbr_children"),
                              response.column="gender",
                              algorithm="gainratio",
                              binning=FALSE,
                              max.depth=5,
                              num.splits=2,
                              pruning="gainratio")
# Print the results.
print(obj$result)