td_pmml_predict_sqle - Teradata Package for R Function Reference | 17.00

Product: Teradata Package for R
Release Number: 17.00
Release Date: July 2021
Content Type: Programming Reference
Publication ID: B700-4007-090K
Language: English (United States)

Description

This function scores data in Vantage with a model that was created outside Vantage and exported to Vantage in PMML format.

Usage

  td_pmml_predict_sqle (
      modeldata = NULL,
      newdata = NULL,
      accumulate = NULL,
      model.output.fields = NULL,
      overwrite.cached.models = NULL,
      newdata.partition.column = "ANY",
      newdata.order.column = NULL,
      modeldata.order.column = NULL
  )

Arguments

modeldata

Required Argument.
Specifies the model tbl_teradata to be used for scoring.

modeldata.order.column

Optional Argument.
Specifies Order By columns for "modeldata".
Pass a character vector to order by multiple columns.
Types: character OR vector of Strings (character)

newdata

Required Argument.
Specifies the input tbl_teradata that contains the data to be scored.

newdata.partition.column

Optional Argument.
Specifies Partition By columns for "newdata".
Pass a character vector to partition by multiple columns.
Default Value: "ANY"
Types: character OR vector of Strings (character)

newdata.order.column

Optional Argument.
Specifies Order By columns for "newdata".
Pass a character vector to order by multiple columns.
Types: character OR vector of Strings (character)

accumulate

Required Argument.
Specifies the names of "newdata" columns to copy to the output tbl_teradata.
Types: character OR vector of Strings (character)

model.output.fields

Optional Argument.
Specifies the fields of the JSON output to return as individual columns instead of the entire JSON report.
Types: character OR vector of Strings (character)

overwrite.cached.models

Optional Argument.
Specifies the name of the model to remove from the cache. Use "*" to remove all cached models.
Types: character OR vector of Strings (character)
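
The column-style arguments above accept either a single string or a character vector. As a minimal sketch, the argument values can be assembled in plain R before any call is made. The helper `build_pmml_args` and the placeholder strings standing in for the tbl_teradata objects are hypothetical, and the `species` column is assumed to exist in "newdata".

```r
# Hypothetical helper: collect arguments for td_pmml_predict_sqle() in a list.
# It only assembles values; it does not contact Vantage.
build_pmml_args <- function(modeldata, newdata, accumulate,
                            model.output.fields = NULL,
                            overwrite.cached.models = NULL) {
  stopifnot(is.character(accumulate), length(accumulate) >= 1)
  args <- list(modeldata = modeldata,
               newdata = newdata,
               accumulate = accumulate)
  # Add optional arguments only when the caller supplies them.
  if (!is.null(model.output.fields)) {
    args$model.output.fields <- model.output.fields
  }
  if (!is.null(overwrite.cached.models)) {
    args$overwrite.cached.models <- overwrite.cached.models
  }
  args
}

# Placeholder strings stand in for tbl_teradata objects here.
pmml_args <- build_pmml_args(
  modeldata = "<model tbl_teradata>",
  newdata = "<input tbl_teradata>",
  accumulate = c("id", "species"),  # several columns as one vector (assumed names)
  model.output.fields = c("probability_0", "probability_1")
)
names(pmml_args)
```

With real tbl_teradata objects in place of the placeholders, `do.call(td_pmml_predict_sqle, pmml_args)` would pass the list as named arguments.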

Value

The function returns an object of class "td_pmml_predict_sqle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.
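
As a minimal sketch of the access pattern described above, the snippet below uses a mock list in place of a real return value so that it runs without a Vantage connection. The helper `get_scored_rows` and the mock contents are hypothetical; a real call returns a tbl_teradata in `result`, not a data.frame.

```r
# Hypothetical helper: return the "result" member of the function's output.
get_scored_rows <- function(pmml_predict_out) {
  stopifnot("result" %in% names(pmml_predict_out))
  pmml_predict_out$result
}

# Mock object for illustration only; a data.frame stands in for tbl_teradata.
mock_out <- structure(
  list(result = data.frame(id = c(5L, 10L))),
  class = "td_pmml_predict_sqle"
)

scored <- get_scored_rows(mock_out)
nrow(scored)
```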

Examples

    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Create the following table on Vantage.
    crt_tbl <- "CREATE SET TABLE pmml_models(model_id VARCHAR(40), model BLOB) 
                PRIMARY INDEX (model_id);"
    DBI::dbExecute(con, sql(crt_tbl))
    
    # Run the following commands through BTEQ or Teradata Studio to load the
    # models. 'load_pmml_model.txt' and the PMML files can be found under
    # 'inst/scripts' in the tdplyr installation directory. This file and the
    # PMML models to be loaded must be in the same directory.
    
    # .import vartext file load_pmml_model.txt
    # .repeat *
    # USING (c1 VARCHAR(40), c2 BLOB AS DEFERRED BY NAME) INSERT INTO pmml_models(:c1, :c2);
    
    # Load example data.
    loadExampleData("pmmlpredict_example", "iris_train", "iris_test")
    
    # Create object(s) of class "tbl_teradata".
    iris_train <- tbl(con, "iris_train")
    iris_test <- tbl(con, "iris_test")
    
    # Example 1 -
    # This example runs a query with an XGBoost model, with no prediction values.
    # It also uses the "overwrite.cached.models" argument.
    modeldata <- tbl(con, "pmml_models")
    ml_name <- "iris_db_xgb_model"
    pmml_predict_out <- td_pmml_predict_sqle(modeldata = modeldata,
                                             newdata = iris_test,
                                             accumulate = "id",
                                             overwrite.cached.models = ml_name)
    
    # Example 2 -
    # This example runs a query with a RandomForest model, with prediction values.
    # It also uses the "model.output.fields" argument.
    modeldata <- tbl(con, "pmml_models")
    ml_op_field <- c('probability_0', 'probability_1', 'probability_2')
    pmml_predict_out <- td_pmml_predict_sqle(modeldata = modeldata,
                                             newdata = iris_test,
                                             accumulate = "id",
                                             model.output.fields = ml_op_field)
    
    # Example 3 -
    # This example runs a query with an XGBoost model and sets
    # "overwrite.cached.models" to "*". This erases the entire cache.
    modeldata <- tbl(con, "pmml_models")
    pmml_predict_out <- td_pmml_predict_sqle(modeldata = modeldata,
                                             newdata = iris_test,
                                             accumulate = "id",
                                             overwrite.cached.models = "*")
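
The example comments name a specific model (XGBoost, RandomForest), while the calls pass the whole pmml_models table. If a single model row has to be selected first, which this reference does not state, one way is to filter on model_id with dplyr before scoring. The helper `select_model` is hypothetical, and the sketch assumes that dplyr is installed and that tbl_teradata objects accept dplyr verbs.

```r
# Hypothetical helper: keep only the row for one model in the model table.
# Assumes the dplyr package is installed; works on any dplyr-compatible table.
select_model <- function(models_tbl, model_name) {
  stopifnot(is.character(model_name), length(model_name) == 1)
  dplyr::filter(models_tbl, model_id == !!model_name)
}

# With a live connection (not run here):
# modeldata <- select_model(tbl(con, "pmml_models"), "iris_db_xgb_model")
# pmml_predict_out <- td_pmml_predict_sqle(modeldata = modeldata,
#                                          newdata = iris_test,
#                                          accumulate = "id")
```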