Using ONNXPredict to Score using Externally Trained Models

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

This example uses the iris_input dataset to train a scikit-learn Random Forest model outside the database. It then converts the model to ONNX format, loads it into Vantage, and uses ONNXPredict to score each row of the test data.

  1. Set up the environment.
    1. Import required libraries.
      import tempfile
      import getpass
      from teradataml import DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom
      from teradataml.options.configure import configure
    2. Create the connection to the database.
      con = create_context(host=getpass.getpass("Hostname: "),
                           username=getpass.getpass("Username: "),
                           password=getpass.getpass("Password: "))
    3. Load example data.
      load_example_data("byom", "iris_input")
      iris_input = DataFrame("iris_input")
  2. Create the train and test datasets.
    1. Create two samples of input data.
      This step creates two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
      iris_sample = iris_input.sample(frac=[0.8, 0.2])
      iris_sample
    2. Create the train dataset.
      This step creates the train dataset from sample 1 by filtering on the "sampleid" column and then dropping that column, which is not needed to train the model.
      iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis=1)
      iris_train
    3. Create the test dataset.
      This step creates the test dataset from sample 2 by filtering on the "sampleid" column and then dropping that column, which is not needed for scoring.
      iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis=1)
      iris_test
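teradataml's sample(frac=[0.8, 0.2]) runs the sampling inside the database and tags every row with a "sampleid" of 1 or 2. The split-then-drop pattern from the steps above can be illustrated locally in plain Python. This is a minimal sketch only, not how teradataml implements sampling. It assumes rows are dictionaries and that the fractions sum to 1. The names split_by_fraction and select_sample are hypothetical, and the rows are synthetic.

```python
import random


def split_by_fraction(rows, fractions, seed=None):
    """Hypothetical local analogue of DataFrame.sample(frac=[...]).

    Shuffles the rows and tags each one with a 1-based "sampleid".
    Assumes the fractions sum to 1; the last sample takes any rows
    left over after rounding.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged = []
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        if sample_id == len(fractions):
            count = len(shuffled) - start  # remainder goes to the last sample
        else:
            count = round(frac * len(shuffled))
        for row in shuffled[start:start + count]:
            tagged.append({**row, "sampleid": sample_id})
        start += count
    return tagged


def select_sample(tagged, sample_id):
    """Keep the rows of one sample and drop the "sampleid" column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged
        if row["sampleid"] == sample_id
    ]


# Synthetic rows for illustration only.
rows = [{"id": i, "petal_length": 1.0 + i / 50} for i in range(150)]
tagged = split_by_fraction(rows, [0.8, 0.2], seed=42)
train = select_sample(tagged, 1)
test = select_sample(tagged, 2)
print(len(train), len(test))  # 120 30
```

The two samples are disjoint, so no row used for training is scored later. In this sketch "sampleid" is a Python integer, whereas the teradataml code above compares it with the strings "1" and "2".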
  3. Train the Random Forest model and perform prediction using ONNXPredict().
    1. Import required libraries.
      import numpy as np
      from skl2onnx import to_onnx
      from teradataml import ONNXPredict
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.ensemble import RandomForestClassifier
    2. Prepare the dataset for training the Random Forest model.
      Convert the teradataml DataFrame to a pandas DataFrame and separate the feature columns from the target column, "species".
      train_pd = iris_train.to_pandas()
      features = train_pd.columns.drop('species')
      target = 'species'
    3. Create a pipeline that standardizes the features before passing them to a Random Forest classifier.
      rf_pipe_obj = Pipeline([
          ('scaler', StandardScaler()),
          ("rf", RandomForestClassifier(max_depth=5))
      ])
    4. Train the Random Forest model.
      rf_pipe_obj.fit(train_pd[features], train_pd[target])
    5. Save the model in ONNX format.
      1. Create a temporary file path to save the model.
        temp_dir = tempfile.TemporaryDirectory()
        model_file_path = f"{temp_dir.name}/iris_db_rf_model.onnx"
      2. Convert the Random Forest pipeline to ONNX format and save it to the file.
        # The training features (cast to float32) are used to infer the ONNX input types.
        onx = to_onnx(rf_pipe_obj, train_pd[features].astype(np.float32))
        with open(model_file_path, "wb") as f:
            f.write(onx.SerializeToString())
    6. Save the ONNX model to the table "byom_models" in Vantage with the model ID "onnx_rf_iris".
      save_byom("onnx_rf_iris", model_file_path, "byom_models")
    7. List the models stored in the table "byom_models".
      list_byom("byom_models")
    8. Retrieve the model with model ID "onnx_rf_iris" from the table "byom_models".
      modeldata = retrieve_byom("onnx_rf_iris", "byom_models")
    9. Set "configure.byom_install_location" to the database where the BYOM functions are installed.
      configure.byom_install_location = getpass.getpass("byom_install_location: ")
    10. Score the test dataset using ONNXPredict() and the ONNX model retrieved from Vantage.
      predict_output = ONNXPredict(modeldata=modeldata,                 # model retrieved in step 8
                                   newdata=iris_test,                   # rows to score
                                   accumulate=['id', 'sepal_length', 'petal_length'],  # input columns copied to the output
                                   overwrite_cached_models='*',         # refresh cached models ('*' = all)
                                   model_output_fields="output_label")  # model output field(s) to return
    11. Print the equivalent SQL query and the score result.
      print(predict_output.show_query())
      predict_output.result
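ONNXPredict scores the rows inside Vantage. The Python client only builds and submits the query. Step 10 sets up a row-level contract: copy the accumulate columns through, and return only the model outputs named in model_output_fields. That contract can be imitated locally as a sketch.

The names score_rows and toy_predict are hypothetical. toy_predict is a hand-written threshold rule standing in for the ONNX model, not the trained Random Forest. The layout of ONNXPredict's real result table is defined by the function itself and may differ from these plain dictionaries.

```python
def score_rows(rows, predict, accumulate, output_fields):
    """Hypothetical local analogue of row-by-row scoring.

    For each input row, copy the `accumulate` columns and add only the
    model outputs named in `output_fields`.
    """
    scored = []
    for row in rows:
        outputs = predict(row)
        result_row = {col: row[col] for col in accumulate}
        result_row.update({field: outputs[field] for field in output_fields})
        scored.append(result_row)
    return scored


def toy_predict(row):
    """Stand-in for the ONNX model: a fixed threshold rule, not the trained
    Random Forest. Its outputs are named like a skl2onnx classifier's."""
    label = 0 if row["petal_length"] < 2.5 else 1
    return {"output_label": label, "output_probability": {label: 1.0}}


# Synthetic test rows for illustration only.
test_rows = [
    {"id": 1, "sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"id": 2, "sepal_length": 6.7, "sepal_width": 3.1, "petal_length": 4.7, "petal_width": 1.5},
]
result = score_rows(test_rows, toy_predict,
                    accumulate=["id", "sepal_length", "petal_length"],
                    output_fields=["output_label"])
print(result)
# [{'id': 1, 'sepal_length': 5.1, 'petal_length': 1.4, 'output_label': 0},
#  {'id': 2, 'sepal_length': 6.7, 'petal_length': 4.7, 'output_label': 1}]
```

Columns not listed in accumulate (sepal_width, petal_width) and outputs not listed in the output fields (output_probability) are left out of each scored row.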
  4. Clean up.
    1. Delete the model from the table "byom_models".
      delete_byom("onnx_rf_iris", "byom_models")
    2. Drop the models table.
      db_drop_table("byom_models")
    3. Drop the input data table.
      db_drop_table("iris_input")
      db_drop_table("iris_input")
    4. Remove the context.
      remove_context()
    5. Remove the temporary directory that holds the ONNX model file.
      temp_dir.cleanup()
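The ONNX file from step 3.5.1 lives in a tempfile.TemporaryDirectory. That directory is deleted when temp_dir.cleanup() is called or, less predictably, when the object is garbage-collected. If the file is only needed until save_byom() has stored the model, the directory can instead be scoped with a with block, as in this standard-library sketch.

The names write_and_upload and fake_upload are hypothetical. fake_upload stands in for the save_byom() call, and the two placeholder bytes stand in for onx.SerializeToString().

```python
import os
import tempfile


def write_and_upload(model_bytes, upload):
    """Write serialized model bytes into a temporary directory, pass the
    file path to `upload`, and delete the directory afterwards."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # In a with block, TemporaryDirectory yields the directory path as a string.
        model_file_path = os.path.join(tmp_dir, "iris_db_rf_model.onnx")
        with open(model_file_path, "wb") as f:
            f.write(model_bytes)
        # In the workflow above this would be, for example,
        # save_byom("onnx_rf_iris", model_file_path, "byom_models").
        upload(model_file_path)
        # The directory and file are removed when the with block exits.
        return model_file_path


seen = {}


def fake_upload(path):
    """Hypothetical stand-in for save_byom: record the file size while it exists."""
    seen["size"] = os.path.getsize(path)


path = write_and_upload(b"\x08\x07", fake_upload)
print(seen["size"], os.path.exists(path))  # 2 False
```

Scoping the directory this way ties its lifetime to the upload. The file no longer exists once the model is stored in Vantage, so no separate clean-up call is needed.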