Teradata® R Package Function Reference - 16.20 - DecisionTreePredict

prodname: Teradata R Package
vrm_release: 16.20
created_date: February 2020
category: Programming Reference
featnum: B700-4007-098K

Description

The DecisionTreePredict function applies a decision tree model to input data and outputs a predicted label for each data point.

Usage

  td_decision_tree_predict_sqle (
      object = NULL,
      newdata = NULL,
      attr.table.groupby.columns = NULL,
      attr.table.pid.columns = NULL,
      attr.table.val.column = NULL,
      accumulate = NULL,
      output.response.probdist = FALSE,
      output.responses = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL
  )
  ## S3 method for class 'td_decision_tree_mle'
  predict(
      object = NULL,
      newdata = NULL,
      attr.table.groupby.columns = NULL,
      attr.table.pid.columns = NULL,
      attr.table.val.column = NULL,
      accumulate = NULL,
      output.response.probdist = FALSE,
      output.responses = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL
  )

Arguments

object

Required Argument.
Specifies the object that contains the decision tree model, that is, the output of function td_decision_tree_mle. For td_decision_tree_predict_sqle, this can also be the tibble that contains the decision tree model.

newdata

Required Argument.
Specifies the teradata_tbl object pointing to a test dataset table in the Advanced SQL Engine.

newdata.partition.column

Specifies the Partition By columns for newdata.
If multiple columns are used for partitioning, provide them as a vector.

newdata.order.column

Specifies the Order By columns for newdata.
If multiple columns are used for ordering, provide them as a vector.
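The following is a rough base R analogy for what these two arguments do to newdata. It is not tdplyr code, and it reflects an assumption about the semantics. Every row that shares a pid value lands in the same partition, and the rows inside each partition are then sorted by attribute. The column names pid, attribute, and attrvalue come from the example on this page. The data values and attribute names are hypothetical.

```r
# Base R analogy (hypothetical data, not tdplyr):
#   newdata.partition.column = "pid"       ~ PARTITION BY pid
#   newdata.order.column     = "attribute" ~ ORDER BY attribute
newdata_local <- data.frame(
  pid       = c(2, 1, 2, 1),
  attribute = c("sepal_width", "sepal_length", "sepal_length", "sepal_width"),
  attrvalue = c(3.0, 5.1, 4.9, 3.5),
  stringsAsFactors = FALSE
)

# Put all rows for the same data point (pid) into one partition.
partitions <- split(newdata_local, newdata_local$pid)

# Sort the rows inside each partition by attribute name.
partitions <- lapply(partitions, function(p) p[order(p$attribute), ])

print(partitions[["1"]])
```

On the database, the engine performs this grouping, so the R session does not run any comparable loop.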

attr.table.groupby.columns

Required Argument.
Specifies the names of the columns on which the attribute table (newdata) is partitioned. Each partition contains one attribute of the input data.

attr.table.pid.columns

Required Argument.
Specifies the names of the columns that define the data point identifiers.

attr.table.val.column

Required Argument.
Specifies the name of the column that contains the input values.
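Taken together, the three attr.table.* arguments describe a long layout with one row per data point per attribute. The minimal sketch below uses only base R and the built-in iris data. It assumes this layout, which is inferred from the column names in the example on this page (pid, attribute, attrvalue), and shows which column each argument would name. It is an illustration only, because the actual iris_attribute_test table may use different attribute names and values.

```r
# Hypothetical illustration: reshape the first three iris rows into a long
# "attribute table" with one row per (data point, attribute) pair.
wide <- head(iris[, 1:4], 3)
wide$pid <- seq_len(nrow(wide))

attr_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
long <- data.frame(
  pid       = rep(wide$pid, times = length(attr_cols)),
  attribute = rep(attr_cols, each = nrow(wide)),
  # unlist() walks the columns in order, matching rep(..., each = ...) above
  attrvalue = unlist(wide[attr_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# How the columns map to the arguments:
#   attr.table.pid.columns     -> "pid"
#   attr.table.groupby.columns -> "attribute"
#   attr.table.val.column      -> "attrvalue"
print(long[order(long$pid, long$attribute), ])
```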

accumulate

Optional Argument.
Specifies the names of input table columns to copy to the output table.

output.response.probdist

Optional Argument.
Specifies whether to output probabilities. If this value is TRUE, you must specify every label in the input table in the 'output.responses' argument.
Default Value: FALSE

output.responses

Required Argument if 'output.response.probdist' is TRUE; otherwise, do not specify it.
Specifies the labels in the input table.

Value

The function returns an object of class "td_decision_tree_predict_sqle", which is a named list containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator using the name result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("decisiontreepredict_example", "iris_attribute_test", "iris_attribute_output")
    loadExampleData("decision_tree_example", "iris_attribute_train", "iris_response_train", "iris_altinput")
    
    # Create remote tibble objects.
    iris_attribute_train <- tbl(con, "iris_attribute_train")
    iris_response_train <- tbl(con, "iris_response_train")
    iris_attribute_test <- tbl(con, "iris_attribute_test")
    
    # Example:
    # First, train a model on the training data.
    decision_tree_out <- td_decision_tree_mle(attribute.name.columns = c("attribute"),
                               attribute.value.column = "attrvalue",
                               id.columns = c("pid"),
                               attribute.table = iris_attribute_train,
                               response.table = iris_response_train,
                               response.column = "response",
                               num.splits = 3,
                               approx.splits = FALSE,
                               nodesize = 10,
                               max.depth = 10,
                               split.measure = "gini"
                               )
    
    # Run predict on the output of td_decision_tree_mle.
    td_decision_tree_predict_out <- td_decision_tree_predict_sqle(object = decision_tree_out,
                                       newdata = iris_attribute_test,
                                       newdata.partition.column = c("pid"),
                                       newdata.order.column = c("attribute"),
                                       attr.table.groupby.columns = c("attribute"),
                                       attr.table.pid.columns = c("pid"),
                                       attr.table.val.column = "attrvalue",
                                       accumulate = c("attrvalue")
                                       )
    
    # Alternatively use S3 predict function to run predict on the output of td_decision_tree_mle.
    predict_out <- predict(decision_tree_out,
                      newdata = iris_attribute_test,
                      newdata.partition.column = c("pid"),
                      newdata.order.column = c("attribute"),
                      attr.table.groupby.columns = c("attribute"),
                      attr.table.pid.columns = c("pid"),
                      attr.table.val.column = "attrvalue",
                      accumulate = c("attrvalue")
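                      )
    
    # Print the result tbl of each call, referenced through the "result" member.
    td_decision_tree_predict_out$result
    predict_out$result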