Teradata® Package for R Function Reference

Product: Teradata Package for R
Release Number: 17.00
Published: July 2021
Language: English (United States)
Last Update: 2023-08-08
dita:id: B700-4007
Product Category: Teradata Vantage

DecisionTreePredict

Description

The DecisionTreePredict function applies a decision tree model, such as one created by the DecisionTree (td_decision_tree_mle) function, to input data and outputs a predicted label for each data point.
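As a conceptual illustration only (this is not the algorithm Vantage runs internally), a fitted decision tree labels a data point by starting at the root and, at each internal node, following the branch chosen by a split test on one attribute until it reaches a leaf. The tree, its thresholds, and the helper predict_label() below are hypothetical; the attribute names only mimic the iris examples on this page.

```r
# Hypothetical tree: each internal node tests one attribute against a
# threshold; each leaf carries a predicted label.
tree <- list(
  attribute = "petal_length", threshold = 2.5,
  left  = list(label = "setosa"),
  right = list(
    attribute = "petal_width", threshold = 1.75,
    left  = list(label = "versicolor"),
    right = list(label = "virginica")
  )
)

# Walk from the root to a leaf for one data point, given as a named
# numeric vector of attribute values (hypothetical helper).
predict_label <- function(node, point) {
  while (is.null(node$label)) {
    if (point[[node$attribute]] <= node$threshold) {
      node <- node$left
    } else {
      node <- node$right
    }
  }
  node$label
}

predict_label(tree, c(petal_length = 1.4, petal_width = 0.2))  # "setosa"
predict_label(tree, c(petal_length = 5.1, petal_width = 2.3))  # "virginica"
```

In DecisionTreePredict the tree comes from the model in "object" and the attribute values come from "newdata", but the root-to-leaf walk shown here is the general idea behind producing one label per data point.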

Usage

  td_decision_tree_predict_sqle (
      object = NULL,
      newdata = NULL,
      attr.table.groupby.columns = NULL,
      attr.table.pid.columns = NULL,
      attr.table.val.column = NULL,
      accumulate = NULL,
      output.response.probdist = FALSE,
      output.responses = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL,
      object.order.column = NULL
  )
## S3 method for class 'td_decision_tree_mle'
predict(
      object = NULL,
      newdata = NULL,
      attr.table.groupby.columns = NULL,
      attr.table.pid.columns = NULL,
      attr.table.val.column = NULL,
      accumulate = NULL,
      output.response.probdist = FALSE,
      output.responses = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL,
      object.order.column = NULL)

Arguments

object

Required Argument.
Specifies the model tbl_teradata generated by the DecisionTree (td_decision_tree_mle) function.
This argument can accept either a tbl_teradata or an object of "td_decision_tree_mle" class.

object.order.column

Optional Argument.
Specifies Order By columns for "object".
Values for this argument can be provided as a vector if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

newdata

Required Argument.
Specifies the tbl_teradata containing the attribute names and their values.

newdata.partition.column

Required Argument.
Specifies Partition By columns for "newdata".
Values for this argument can be provided as a vector if multiple columns are used for partitioning.
Types: character OR vector of Strings (character)

newdata.order.column

Optional Argument.
Specifies Order By columns for "newdata".
Values for this argument can be provided as a vector if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

attr.table.groupby.columns

Required Argument.
Specifies the names of the columns on which "newdata" is partitioned. Each partition contains one attribute of the input data.
Types: character OR vector of Strings (character)

attr.table.pid.columns

Required Argument.
Specifies the names of the columns that define the data point identifiers.
Types: character OR vector of Strings (character)

attr.table.val.column

Required Argument.
Specifies the name of the column that contains the input values.
Types: character

accumulate

Optional Argument.
Specifies the names of "newdata" columns to copy to the output tbl_teradata.
Types: character OR vector of Strings (character)

output.response.probdist

Optional Argument.
Specifies whether to output probabilities.
Note: The "output.response.probdist" argument can be set to TRUE only when tdplyr is connected to Vantage 1.0 Maintenance Update 2 or later.
Default Value: FALSE
Types: logical

output.responses

Optional Argument. Required if "output.response.probdist" is TRUE.
Specifies all responses in "newdata".
Types: character OR vector of Strings (character)
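The "attr.table.*" arguments assume that "newdata" uses a long (attribute-value) layout with one row per data point and attribute. The base R sketch below builds such a layout from the first three rows of R's built-in iris data set, using the column names pid, attribute, and attrvalue from the examples on this page. The attribute labels here are simply R's iris column names and are an assumption; the actual iris_attribute_test table may use different labels. Grouping the rows by pid is analogous to newdata.partition.column = "pid": all attribute values of one data point end up together.

```r
# Take a few rows of R's built-in iris data (wide: one column per attribute).
wide <- head(iris[, 1:4], 3)
wide$pid <- seq_len(nrow(wide))  # data point identifier

# Convert to the long attribute-value layout: one row per (pid, attribute).
attrs <- setdiff(names(wide), "pid")
long <- data.frame(
  pid       = rep(wide$pid, times = length(attrs)),
  attribute = rep(attrs, each = nrow(wide)),
  attrvalue = unlist(wide[attrs], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[order(long$pid, long$attribute), ]

# Grouping by pid (compare newdata.partition.column = "pid") collects every
# attribute value of one data point; within each group, rows are sorted by
# attribute (compare newdata.order.column = "attribute").
by_point <- split(long, long$pid)
by_point[["1"]]
```

With this layout, "pid" plays the role of "attr.table.pid.columns", "attribute" names the attribute in each row, and "attrvalue" is the "attr.table.val.column".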

Value

The function returns an object of class "td_decision_tree_predict_sqle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

  
    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("decisiontreepredict_example", "iris_attribute_test")
    loadExampleData("decision_tree_example", "iris_attribute_train", "iris_response_train",
                    "iris_altinput")
    
    # Create object(s) of class "tbl_teradata".
    iris_attribute_train <- tbl(con, "iris_attribute_train")
    iris_response_train <- tbl(con, "iris_response_train")
    iris_attribute_test <- tbl(con, "iris_attribute_test")
    
    # Example: First, train a decision tree model on the training data.
    decision_tree_out <- td_decision_tree_mle(attribute.name.columns = c("attribute"),
                               attribute.value.column = "attrvalue",
                               id.columns = c("pid"),
                               attribute.table = iris_attribute_train,
                               response.table = iris_response_train,
                               response.column = "response",
                               num.splits = 3,
                               approx.splits = FALSE,
                               nodesize = 10,
                               max.depth = 10,
                               split.measure = "gini"
                               )
    
    # Run predict on the output of td_decision_tree_mle() function.
    td_decision_tree_predict_out <- td_decision_tree_predict_sqle(object = decision_tree_out,
                                       newdata = iris_attribute_test,
                                       newdata.partition.column = c("pid"),
                                       newdata.order.column = c("attribute"),
                                       attr.table.groupby.columns = c("attribute"),
                                       attr.table.pid.columns = c("pid"),
                                       attr.table.val.column = "attrvalue",
                                       accumulate = c("attrvalue")
                                       )
    
    # Print the tbl_teradata that holds the predictions.
    print(td_decision_tree_predict_out$result)
    
    # Alternatively use S3 predict function to run predict on the output of
    # td_decision_tree_mle() function.
    predict_out <- predict(decision_tree_out,
                      newdata = iris_attribute_test,
                      newdata.partition.column = c("pid"),
                      newdata.order.column = c("attribute"),
                      attr.table.groupby.columns = c("attribute"),
                      attr.table.pid.columns = c("pid"),
                      attr.table.val.column = "attrvalue",
                      accumulate = c("attrvalue")
                      )