Using ONNXPredict to Score using Externally Trained Models | teradataml

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

This example uses the iris_input dataset and scores each row of the input table with a model that was trained outside the database, converted to ONNX format, and then loaded into the database.

  1. Set up the environment.
    1. Import required libraries.
      import tempfile
      import getpass
      from teradataml import DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom
      from teradataml.options.configure import configure
    2. Create the connection to the database.
      con = create_context(host=getpass.getpass("Hostname: "),
                           username=getpass.getpass("Username: "),
                           password=getpass.getpass("Password: "))
    3. Load example data.
      load_example_data("byom", "iris_input")
      iris_input = DataFrame("iris_input")
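For unattended runs, the interactive getpass prompts in the connection step can be replaced by values read from the environment. The sketch below is one possible way to do that and is not a teradataml feature: the helper name connection_kwargs and the variable names TD_HOST, TD_USERNAME, and TD_PASSWORD are hypothetical. The dictionary it returns carries the same three keyword arguments that the create_context call above uses.

```python
import os

# Hypothetical environment variable names; adjust to your own conventions.
REQUIRED_VARS = {
    "host": "TD_HOST",
    "username": "TD_USERNAME",
    "password": "TD_PASSWORD",
}


def connection_kwargs(env=None):
    """Build the host/username/password arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in REQUIRED_VARS.items()}


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "TD_HOST": "example-host",
    "TD_USERNAME": "example-user",
    "TD_PASSWORD": "example-password",
}
kwargs = connection_kwargs(demo_env)
print(sorted(kwargs))  # ['host', 'password', 'username']

# With teradataml imported as in the steps above, the call would then be:
# con = create_context(**connection_kwargs())
```

Failing early on a missing variable keeps an incomplete configuration from reaching the connection call.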
  2. Create train dataset and test dataset.
    1. Create two samples of input data.
      This step creates two samples of the input data: sample 1 holds 80% of the rows and sample 2 holds the remaining 20%.
      iris_sample = iris_input.sample(frac=[0.8, 0.2])
      iris_sample
    2. Create the train dataset.
      This step creates the train dataset from sample 1 by filtering on "sampleid" and then dropping the "sampleid" column, which is not needed to train the model.
      iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
      iris_train
    3. Create the test dataset.
      This step creates the test dataset from sample 2 by filtering on "sampleid" and then dropping the "sampleid" column, which is not needed for scoring.
      iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
      iris_test
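The split above relies on a "sampleid" column that tags each row with the sample it belongs to. The following pure-Python analogue illustrates that idea on plain dictionaries. It is only a local sketch under stated assumptions: teradataml performs the sampling in the database, and the helper sample_with_ids, the fixed seed, and the 150 stand-in rows are hypothetical.

```python
import random


def sample_with_ids(rows, fracs, seed=0):
    """Shuffle rows and tag each with a 1-based sampleid, sized by fracs."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged = []
    start = 0
    for sample_id, frac in enumerate(fracs, start=1):
        size = round(len(rows) * frac)
        for row in shuffled[start:start + size]:
            tagged.append({**row, "sampleid": sample_id})
        start += size
    return tagged


def select_sample(tagged, sample_id):
    """Keep rows of one sample and drop the helper sampleid column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged
        if row["sampleid"] == sample_id
    ]


# Stand-in rows; only the id column is modelled here.
rows = [{"id": i} for i in range(1, 151)]
tagged = sample_with_ids(rows, [0.8, 0.2])
train = select_sample(tagged, 1)
test = select_sample(tagged, 2)
print(len(train), len(test))  # 120 30
```

Because every row receives exactly one sampleid, the train and test sets never share a row.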
  3. Train the Random Forest model and perform the Prediction using ONNXPredict().
    1. Import required libraries.
      import numpy as np
      from teradataml import ONNXPredict
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.ensemble import RandomForestClassifier
      from skl2onnx import to_onnx
    2. Prepare dataset for training Random Forest model.

      Convert the teradataml DataFrame to a pandas DataFrame, then separate the feature columns from the target column.

      train_pd = iris_train.to_pandas()
      features = train_pd.columns.drop('species')
      target = 'species'
    3. Generate Random Forest model.
      rf_pipe_obj = Pipeline([
          ('scaler', StandardScaler()),
          ("rf", RandomForestClassifier(max_depth=5))
      ])
    4. Train the Random Forest model.
      rf_pipe_obj.fit(train_pd[features], train_pd[target])
    5. Save the model in ONNX format.
      1. Create temporary file path to save the model.
        temp_dir = tempfile.TemporaryDirectory()
        model_file_path = f"{temp_dir.name}/iris_db_rf_model.onnx"
        
      2. Convert and save the Random Forest model in ONNX format.
        # Pass the same feature columns used for training, cast to float32, as the sample input.
        onx = to_onnx(rf_pipe_obj, train_pd[features].astype(np.float32))
        
        with open(model_file_path, "wb") as f:
            f.write(onx.SerializeToString())
    6. Save the model in Vantage.
      save_byom("onnx_rf_iris", model_file_path, "byom_models")
    7. List the model in Vantage.
      list_byom("byom_models")
    8. Retrieve the model from table "byom_models", using the model id 'onnx_rf_iris'.
      modeldata = retrieve_byom("onnx_rf_iris", "byom_models")
    9. Set "configure.byom_install_location" to the database where BYOM functions are installed.
      configure.byom_install_location = getpass.getpass("byom_install_location: ")
    10. Perform prediction using ONNXPredict() function and the ONNX model stored in Vantage.
      predict_output = ONNXPredict(
                                  modeldata = modeldata,
                                  newdata = iris_test,
                                  accumulate = ['id', 'sepal_length', 'petal_length'],
                                  overwrite_cached_models = '*',
                                  model_output_fields = "output_label"
                                  )
    11. Print the equivalent SQL query and the scoring result.
      print(predict_output.show_query())
      predict_output.result
  4. Clean up.
    1. Delete the model from table "byom_models".
      delete_byom("onnx_rf_iris", "byom_models")
    2. Delete models table.
      db_drop_table("byom_models")
    3. Drop input data table.
      db_drop_table("iris_input")
    4. Remove context.
      remove_context()
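The cleanup steps above remove the database objects. The temporary directory created earlier with tempfile.TemporaryDirectory() lives on the client and can be released explicitly as well. The sketch below shows that behaviour in isolation; the bytes written are a placeholder standing in for a serialized ONNX model.

```python
import os
import tempfile

# Same pattern as in the example: a temporary directory holding the model file.
temp_dir = tempfile.TemporaryDirectory()
model_file_path = os.path.join(temp_dir.name, "iris_db_rf_model.onnx")

with open(model_file_path, "wb") as f:
    # Placeholder content, not a real ONNX model.
    f.write(b"placeholder bytes")

existed_before = os.path.exists(model_file_path)

# cleanup() deletes the directory and everything inside it.
temp_dir.cleanup()
exists_after = os.path.exists(temp_dir.name)

print(existed_before, exists_after)  # True False
```

In the example above, the equivalent call would be temp_dir.cleanup() once the model has been saved to Vantage.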