Scoring with XGBoost

Teradata® Package for R User Guide

Product: Teradata Package for R
Release Number: 17.00
Published: July 2021
Language: English (United States)
Last Update: 2023-08-08
Document ID: B700-4005
Product Category: Teradata Vantage

This example performs scoring with an XGBoost model. It assumes that the user has an open connection, con, to a database other than the one where BYOM is installed, and it runs a query with an XGBoost model with no prediction values.

Because BYOM is not installed in the current database, set the global option byom.install.location to point to the database where BYOM is installed ("mldb" in this example), as done in step 8.
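Because byom.install.location is an ordinary R global option, it can be set and read back with base R. The sketch below shows those mechanics only. byom_qualified_name() is a hypothetical helper, not part of tdplyr, and the idea that a client prefixes a BYOM function name such as PMMLPredict with the option's value is an assumption made for illustration.

```r
# Point to the database where BYOM is installed ("mldb" in this example).
options(byom.install.location = "mldb")

# Hypothetical helper (not part of tdplyr): qualify a BYOM function name
# with the database stored in the global option. When the option is unset,
# the name is returned unqualified, i.e. resolved in the current database.
byom_qualified_name <- function(fn_name) {
  db <- getOption("byom.install.location", default = NULL)
  if (is.null(db) || !nzchar(db)) {
    return(fn_name)
  }
  paste0(db, ".", fn_name)
}

byom_qualified_name("PMMLPredict")
# returns "mldb.PMMLPredict"
```

Resetting the option with options(byom.install.location = NULL) makes the helper fall back to the unqualified name.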

  1. Load the libraries.
    # Assumes an open Vantage connection, con.
    > library(tdplyr)
    > library(dplyr)
    > library(pmml)
    > library(xgboost)
  2. Convert the tbl_teradata object iris_train to a data.frame.
    # iris_train is the tbl_teradata object created in step 10; run steps 9
    # and 10 first if it does not exist yet.
    > iris_train_df <- as.data.frame(iris_train)
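  3. Generate a multinomial (multiclass) model using the iris training data.
    # Columns 2 to 5 are the features. Subtracting 1 from the species label in
    # column 6 yields the 0-based class indices that "multi:softprob" expects.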
    > xgb_model <- xgboost(data = as.matrix(iris_train_df[, 2:5]),
                           label = as.numeric(iris_train_df[, 6])-1,
                           max_depth = 2,
                           eta = 1,
                           nthread = 2,
                           nrounds = 2,
                           objective = "multi:softprob",
                           num_class = 3
                           )
    > xgb_model
  4. Convert the generated model to PMML format.
    1. Save the tree information in an external file.
      # Convert the generated model to "PMML" format.
      # Save the tree information in an external file.
      > xgb.dump(xgb_model, "xgb_model.dumped.trees")
    2. Convert the model to PMML, using the dumped tree file.
      # Convert to PMML.
      > train_data_colnames <- colnames(as.matrix(iris_train_df[, 2:5]))
      > xgb_model_pmml <- pmml(xgb_model,
                               input_feature_names = train_data_colnames,
                               output_label_name = "species",
                               output_categories = c(1, 2, 3),
                               xgb_dump_file = "xgb_model.dumped.trees"
                              )
    3. Save the PMML file.
      # save_pmml() writes the file to the current R working directory (see getwd()).
      > save_pmml(xgb_model_pmml, "xgb_model_tdplyr.pmml")
  5. Create a "pmml_models" table on Vantage to hold the PMML file in a BLOB column.
    # Create the following table on Vantage. The PMML file is inserted into
    # the BLOB column in step 7.
    > crt_tbl <- "CREATE SET TABLE pmml_models(model_id VARCHAR(40), model BLOB)
                  PRIMARY INDEX (model_id);"
    > DBI::dbExecute(con, crt_tbl)
  6. Create a file, load_pmml_model.txt, that contains a model ID and a model file name.
    # Create a file load_pmml_model.txt with one "model_id|model_file_name"
    # entry per model, such as:
    # xgb_model_tdplyr|xgb_model_tdplyr.pmml
    #
    # This file and the PMML models to be loaded should be in the same directory.
  7. Load the model into the pmml_models table.
    • Load the model with BTEQ.
      # Loading the model with BTEQ. AS DEFERRED BY NAME makes BTEQ treat c2
      # as the name of the file whose contents are loaded into the BLOB column.
      # .import vartext file load_pmml_model.txt
      # .repeat *
      # USING (c1 VARCHAR(40), c2 BLOB AS DEFERRED BY NAME) INSERT INTO pmml_models(:c1, :c2);
    • Load the model with Teradata Studio.
      # Loading the model with Teradata Studio. The '?' opens a dialog to browse
      # to the PMML file that needs to be uploaded.
      # insert into pmml_models values ('xgb_model_tdplyr', ?);
  8. Set the global option to point to the database where BYOM is installed.
    # Set the global option in order to point to the database
    # (mldb in this case) where byom is installed.
    > options(byom.install.location="mldb")
  9. Load the sample datasets.
    # Load the sample training and test datasets.
    > loadExampleData("pmmlpredict_example", "iris_train", "iris_test")
  10. Create objects of class "tbl_teradata" on the sample datasets.
    # Create objects of class "tbl_teradata" on these datasets.
    > iris_train <- tbl(con, "iris_train")
    > iris_test <- tbl(con, "iris_test")
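  11. Select the PMML model to be used for scoring.
    # Select the PMML model to be used for scoring. The model_id must match
    # the ID used when the model was loaded in step 7.
    > modeldata <- tbl(con, "pmml_models") %>% filter(model_id == "xgb_model_tdplyr")
  12. Perform scoring.
    # Perform scoring.
    # Run a query with the XGBoost model with no prediction values.
    # accumulate copies the "id" column of iris_test into the output.
    # overwrite.cached.models names the model to be replaced in the model
    # cache, so the current version in pmml_models is used instead of a
    # stale cached copy.
    > ml_name <- "xgb_model_tdplyr"
    > pmml_predict_out <- td_pmml_predict(modeldata = modeldata,
                                          newdata = iris_test,
                                          accumulate = "id",
                                          overwrite.cached.models = ml_name)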
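Steps 3 and 4 rely on a label convention: the "multi:softprob" objective expects class labels 0 to num_class - 1, which is why the training label is computed as as.numeric(...) - 1, while the PMML model reports the categories 1, 2, 3 passed as output_categories. The base-R sketch below mimics that round trip without a Vantage connection. It assumes a data frame laid out as the indexing in step 3 implies (column 1 an identifier, columns 2 to 5 features, column 6 the species) and builds one from R's built-in iris data; the margin matrix is a hypothetical stand-in for model output, and the softmax step reflects how multi:softprob turns per-class margins into probabilities.

```r
# Stand-in for iris_train_df: id, four feature columns, species.
# Built from R's built-in iris data; the real table comes from Vantage.
iris_train_df <- data.frame(id = seq_len(nrow(iris)), iris)

# Training labels: species 1..3 shifted to the 0-based class indices
# that objective = "multi:softprob" expects.
label <- as.numeric(iris_train_df[, 6]) - 1
stopifnot(all(label %in% 0:2))

# softprob turns each row of per-class margins (raw scores) into
# probabilities with a softmax; subtracting the row max keeps exp() stable.
softmax_rows <- function(margins) {
  shifted <- margins - apply(margins, 1, max)
  e <- exp(shifted)
  e / rowSums(e)
}

# Hypothetical margins for two rows and three classes (illustrative only).
margins <- matrix(c( 2.0, 0.1, -1.0,
                    -0.5, 0.3,  1.2),
                  nrow = 2, byrow = TRUE)
probs <- softmax_rows(margins)

# Map the most probable class (column index 1..3, i.e. 0-based index + 1)
# back to the categories passed as output_categories = c(1, 2, 3).
output_categories <- c(1, 2, 3)
predicted <- output_categories[max.col(probs, ties.method = "first")]
predicted
# returns c(1, 3)
```

Each row of probs sums to 1, and the predicted category is simply the column with the largest probability.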
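The load file from step 6 can also be written from R. The sketch below uses only base R; write_pmml_load_file() and read_pmml_bytes() are hypothetical helpers, not part of tdplyr. It assumes the pipe-delimited "model_id|pmml_file_name" layout shown in step 6 (the vertical bar is BTEQ's default VARTEXT separator) and writes the file next to the PMML files, as step 6 requires. The byte reader shows the raw content that ends up in the BLOB column.

```r
# Hypothetical helper (not part of tdplyr): write the BTEQ load file
# (one "model_id|pmml_file_name" line per model) into the directory
# that holds the PMML files, so BTEQ finds both side by side.
write_pmml_load_file <- function(model_ids, pmml_files, dir = ".",
                                 load_file = "load_pmml_model.txt") {
  stopifnot(length(model_ids) == length(pmml_files))
  # model_id is VARCHAR(40) in pmml_models, so reject longer IDs early.
  stopifnot(all(nchar(model_ids) <= 40))
  absent <- !file.exists(file.path(dir, pmml_files))
  if (any(absent)) {
    stop("PMML file(s) not found in ", dir, ": ",
         paste(pmml_files[absent], collapse = ", "))
  }
  path <- file.path(dir, load_file)
  writeLines(paste(model_ids, pmml_files, sep = "|"), path)
  invisible(path)
}

# Hypothetical helper: read a PMML file as raw bytes.
read_pmml_bytes <- function(path) {
  readBin(path, what = "raw", n = file.info(path)$size)
}

# Usage with a placeholder PMML file in a temporary directory.
dir <- tempdir()
writeLines('<?xml version="1.0"?><PMML/>', file.path(dir, "xgb_model_tdplyr.pmml"))
path <- write_pmml_load_file("xgb_model_tdplyr", "xgb_model_tdplyr.pmml", dir = dir)
readLines(path)
# returns "xgb_model_tdplyr|xgb_model_tdplyr.pmml"
```

Checking that each PMML file exists before writing the load file surfaces a wrong path in R rather than as a failed BTEQ import.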