Teradata R Package Function Reference - DecisionTreePredict

Description

The Decision Tree Predict function applies a decision tree model to input data and outputs a predicted label for each data point.

Usage

  td_decision_tree_predict_sqle (
      object = NULL,
      newdata = NULL,
      attr.table.groupby.columns = NULL,
      attr.table.pid.columns = NULL,
      attr.table.val.column = NULL,
      accumulate = NULL,
      output.response.probdist = FALSE,
      output.responses = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL
  )
## S3 method for class 'td_decision_tree_mle'
predict(
      object = NULL, 
      newdata = NULL,
      attr.table.groupby.columns = NULL,
      attr.table.pid.columns = NULL,
      attr.table.val.column = NULL, 
      accumulate = NULL,
      output.response.probdist = FALSE, 
      output.responses = NULL,
      newdata.partition.column = NULL, 
      newdata.order.column = NULL)

Arguments

`object`	Required Argument. Specifies the object that contains the decision tree model, that is, the output of the function `td_decision_tree_mle`. For `td_decision_tree_predict_sqle`, this can also be a tibble containing the decision tree model.
`newdata`	Required Argument. Specifies the teradata_tbl object pointing to a test dataset table in the Advanced SQL Engine.
`newdata.partition.column`	Specifies the Partition By columns for newdata. Provide the values as a vector if multiple columns are used for partitioning.
`newdata.order.column`	Specifies the Order By columns for newdata. Provide the values as a vector if multiple columns are used for ordering.
`attr.table.groupby.columns`	Required Argument. Specifies the names of the columns on which the attribute table (newdata) is partitioned. Each partition contains one attribute of the input data.
`attr.table.pid.columns`	Required Argument. Specifies the names of the columns that define the data point identifiers.
`attr.table.val.column`	Required Argument. Specifies the name of the column that contains the input values.
`accumulate`	Optional Argument. Specifies the names of input table columns to copy to the output table.
`output.response.probdist`	Optional Argument. Specifies whether to output probabilities. If this value is TRUE, you must list every label in the input table in the 'output.responses' argument. Default Value: FALSE
`output.responses`	Required if 'output.response.probdist' is TRUE, otherwise disallowed. Specifies the labels in the input table.
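
The three `attr.table.*` arguments describe a long-format attribute table: one row per data point and attribute, with a single value column. The following is a minimal local sketch of that layout in base R, using the built-in `iris` data frame as a stand-in. The column names `pid`, `attribute`, and `attrvalue` mirror the ones used in the Examples section, while the attribute labels taken from the local `iris` column names are an assumption and may differ from those in the Teradata example tables.

```r
# Local illustration only: no Teradata connection is used here.
# Reshape the built-in iris data (wide format) into a long attribute table
# with one row per (pid, attribute) pair.
measure_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
wide <- iris[, measure_cols]
wide$pid <- seq_len(nrow(wide))   # data point identifier

long <- do.call(rbind, lapply(measure_cols, function(col) {
  data.frame(
    pid = wide$pid,               # maps to attr.table.pid.columns
    attribute = col,              # maps to attr.table.groupby.columns
    attrvalue = wide[[col]],      # maps to attr.table.val.column
    stringsAsFactors = FALSE
  )
}))

# Sort so that all attributes of one data point appear together.
long <- long[order(long$pid, long$attribute), ]
rownames(long) <- NULL
head(long, 4)
```

Each data point contributes one row per attribute, so the four measurements of a single flower become four rows that share the same `pid`.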

Value

The function returns an object of class "td_decision_tree_predict_sqle", which is a named list containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("decisiontreepredict_example", "iris_attribute_test", "iris_attribute_output")
    loadExampleData("decision_tree_example", "iris_attribute_train", "iris_response_train", "iris_altinput")
    
    # Create remote tibble objects.
    iris_attribute_train <- tbl(con, "iris_attribute_train")
    iris_response_train <- tbl(con, "iris_response_train")
    iris_attribute_test <- tbl(con, "iris_attribute_test")
    
    # Example: first train a decision tree model on the training data.
    decision_tree_out <- td_decision_tree_mle(attribute.name.columns = c("attribute"),
                               attribute.value.column = "attrvalue",
                               id.columns = c("pid"),
                               attribute.table = iris_attribute_train,
                               response.table = iris_response_train,
                               response.column = "response",
                               num.splits = 3,
                               approx.splits = FALSE,
                               nodesize = 10,
                               max.depth = 10,
                               split.measure = "gini"
                               )
    
    # Run predict on the output of td_decision_tree_mle.
    td_decision_tree_predict_out <- td_decision_tree_predict_sqle(object = decision_tree_out,
                                       newdata = iris_attribute_test,
                                       newdata.partition.column = c("pid"),
                                       newdata.order.column = c("attribute"),
                                       attr.table.groupby.columns = c("attribute"),
                                       attr.table.pid.columns = c("pid"),
                                       attr.table.val.column = "attrvalue",
                                       accumulate = c("attrvalue")
                                       )
    
    # Alternatively use S3 predict function to run predict on the output of td_decision_tree_mle.
    predict_out <- predict(decision_tree_out,
                      newdata = iris_attribute_test,
                      newdata.partition.column = c("pid"),
                      newdata.order.column = c("attribute"),
                      attr.table.groupby.columns = c("attribute"),
                      attr.table.pid.columns = c("pid"),
                      attr.table.val.column = "attrvalue",
                      accumulate = c("attrvalue")
                      )
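
The examples above leave `output.response.probdist` at its default. The sketch below shows how the probability output might be requested and how the `result` member could be read from the returned list. The wrapper `predict_with_probabilities` and its parameter names are hypothetical, introduced only for illustration. It assumes the Teradata R package is loaded and a connection context exists when the function is called, and that the tables use the `pid`, `attribute`, and `attrvalue` columns from the examples above.

```r
# Hypothetical helper, not part of the Teradata R package.
# model:    output of td_decision_tree_mle
# test_tbl: remote tibble pointing to the attribute table to score
# labels:   character vector listing every response label
predict_with_probabilities <- function(model, test_tbl, labels) {
  out <- td_decision_tree_predict_sqle(
    object = model,
    newdata = test_tbl,
    newdata.partition.column = c("pid"),
    newdata.order.column = c("attribute"),
    attr.table.groupby.columns = c("attribute"),
    attr.table.pid.columns = c("pid"),
    attr.table.val.column = "attrvalue",
    output.response.probdist = TRUE,   # request probabilities
    output.responses = labels          # required when probdist is TRUE
  )
  # The returned object is a named list; the predictions are in "result".
  out$result
}

# Possible call, with placeholder label values that must be replaced by the
# labels actually present in the response column:
# probs <- predict_with_probabilities(decision_tree_out, iris_attribute_test,
#                                     labels = c("1", "2", "3"))
```

Passing the labels as a parameter keeps the required pairing of `output.response.probdist = TRUE` with `output.responses` in one place.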