Using PMMLPredict to Score using Externally Trained Models - Teradata Package for Python

Teradata® Package for Python User Guide

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2022-01-14
dita:id: B700-4006
lifecycle: previous
Product Category: Teradata Vantage

This example uses the iris_input dataset and scores each row of the input table with a model that was trained outside the database, exported in PMML format, and then loaded into the database.

  1. Set up the environment.
    1. Import required libraries.
      import tempfile
      import getpass
      from teradataml import PMMLPredict, DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom
      from teradataml.options.configure import configure
    2. Create the connection to the database.
      con = create_context(host=getpass.getpass("Hostname: "),
                           username=getpass.getpass("Username: "),
                           password=getpass.getpass("Password: "))
    3. Load example data.
      load_example_data("byom", "iris_input")
      iris_input = DataFrame("iris_input")
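The connection step above prompts for every value, including the hostname and username, with hidden input. As a minimal sketch of one alternative, the helper below reads the three values from environment variables and prompts only for what is missing. The function name and the variable names TD_HOST, TD_USER, and TD_PASSWORD are hypothetical choices for this illustration; teradataml does not read them on its own.

```python
import getpass
import os


def read_connection_settings(env=None, ask=input, ask_secret=getpass.getpass):
    """Collect the host, username, and password needed by create_context().

    TD_HOST, TD_USER, and TD_PASSWORD are hypothetical names used only in
    this sketch; they are not part of teradataml.
    """
    env = os.environ if env is None else env
    # Fall back to an interactive prompt when a variable is unset or empty.
    host = env.get("TD_HOST") or ask("Hostname: ")
    username = env.get("TD_USER") or ask("Username: ")
    # Only the password is read with hidden input.
    password = env.get("TD_PASSWORD") or ask_secret("Password: ")
    return {"host": host, "username": username, "password": password}


# Demonstration with an explicit mapping, so nothing is prompted for.
demo = read_connection_settings(env={
    "TD_HOST": "vantage.example.com",
    "TD_USER": "alice",
    "TD_PASSWORD": "secret",
})
print(sorted(demo))
```

With teradataml installed, the returned dictionary could be passed on as create_context(**settings), since its keys match the host, username, and password arguments used in this example.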
  2. Create the training and test datasets.
    1. Create two samples of input data.
      This step creates two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
      iris_sample = iris_input.sample(frac=[0.8, 0.2])
      iris_sample
    2. Create the training dataset.
      This step creates the training dataset from sample 1 by filtering on "sampleid", then drops the "sampleid" column because it is not required for training the model.
      iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
      iris_train
    3. Create the test dataset.
      This step creates the test dataset from sample 2 by filtering on "sampleid", then drops the "sampleid" column because it is not required for scoring.
      iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
      iris_test
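The split in this step can be pictured with a small local sketch. It is only an illustration of the idea in plain Python (shuffle the rows, tag each one with a 1-based "sampleid", then filter and drop the tag); it is not how teradataml or Vantage implements sample(), and the helper names split_by_fraction and select_sample are hypothetical.

```python
import random


def split_by_fraction(rows, fractions, seed=None):
    """Return copies of the rows tagged with a 1-based "sampleid".

    Sample i receives roughly fractions[i - 1] of the rows; rows left over
    when the fractions sum to less than 1 are not returned.
    """
    if sum(fractions) > 1 + 1e-9:
        raise ValueError("fractions must not sum to more than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    tagged, start = [], 0
    for sample_id, frac in enumerate(fractions, start=1):
        size = round(frac * len(shuffled))
        for row in shuffled[start:start + size]:
            tagged.append({**row, "sampleid": sample_id})
        start += size
    return tagged


def select_sample(tagged_rows, sample_id):
    """Keep the rows of one sample and drop the helper "sampleid" column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged_rows
        if row["sampleid"] == sample_id
    ]


# 150 placeholder rows stand in for the iris_input table.
rows = [{"id": i} for i in range(1, 151)]
tagged = split_by_fraction(rows, [0.8, 0.2], seed=42)
train_rows = select_sample(tagged, 1)
test_rows = select_sample(tagged, 2)
print(len(train_rows), len(test_rows))
```

Because the two samples are cut from one shuffled list, no row lands in both the training and the test set.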
  3. Train the Random Forest model and perform the prediction using PMMLPredict().
    1. Import required libraries.
      import numpy as np
      from sklearn import tree
      from nyoka import skl_to_pmml
      from sklearn.pipeline import Pipeline
      from sklearn_pandas import DataFrameMapper
      from sklearn.impute import SimpleImputer
      from sklearn.preprocessing import StandardScaler
      
      from sklearn.ensemble import RandomForestClassifier
    2. Prepare dataset to create a Random Forest model.
      # features : Names of the feature columns used for training.
      # target : Name of the target column.
      train_pd = iris_train.to_pandas()
      features = train_pd.columns.drop('species')
      target = 'species'
    3. Generate the Random Forest model.
      imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
      rf_pipe_obj = Pipeline([
          ("mapping", DataFrameMapper([
              (['sepal_length', 'sepal_width'], StandardScaler()),
              (['petal_length', 'petal_width'], imputer)
          ])),
          ("rfc", RandomForestClassifier(n_estimators = 100))
      ])
      rf_pipe_obj.fit(train_pd[features], train_pd[target])
    4. Save the model in PMML format.
      temp_dir = tempfile.TemporaryDirectory()
      model_file_path = f"{temp_dir.name}/iris_rf_class_model.pmml"
      skl_to_pmml(rf_pipe_obj, features, target, model_file_path)
    5. Save the model in Vantage.
      # Save the PMML Model in Vantage.
      save_byom("pmml_random_forest_iris", model_file_path, "byom_models")
    6. List the models in Vantage.
      # List the models saved in table "byom_models".
      list_byom("byom_models")
    7. Retrieve the model from Vantage.
      # Retrieve the model from table "byom_models", using the model id 'pmml_random_forest_iris'.
      modeldata = retrieve_byom("pmml_random_forest_iris", "byom_models")
    8. Score the test data using the PMMLPredict function with the retrieved model.
      # Perform prediction using PMMLPredict().
      result = PMMLPredict(
                          modeldata = modeldata,
                          newdata = iris_test,
                          accumulate = ['id', 'sepal_length', 'petal_length'],
                          overwrite_cached_models = '*',
                          )
    9. Print the equivalent SQL query and the score result.
      # Print the query.
      print(result.show_query())
      # Print the result.
      result.result
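Before a PMML file is saved to Vantage, it can be useful to confirm locally that the file is well-formed XML with a PMML root element and to see which data fields it declares. The sketch below does this with the Python standard library only. The helper name summarize_pmml and the hand-written sample document are assumptions made for illustration; the sample is not the output of skl_to_pmml, and the check is not part of teradataml or nyoka.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(pmml_text):
    """Return the PMML version and the declared DataField names."""
    root = ET.fromstring(pmml_text)
    if local_name(root.tag) != "PMML":
        raise ValueError("root element is not PMML")
    fields = [
        element.get("name")
        for element in root.iter()
        if local_name(element.tag) == "DataField"
    ]
    return {"version": root.get("version"), "fields": fields}


# Hand-written sample; a real file would be read from model_file_path.
SAMPLE_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="hand-written sample"/>
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

summary = summarize_pmml(SAMPLE_PMML.strip())
print(summary)
```

For the file written in this example, the same function could be applied to the text read from model_file_path before calling save_byom.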
  4. Clean up.
    # Delete the model from table "byom_models", using the model id 'pmml_random_forest_iris'.
    delete_byom("pmml_random_forest_iris", "byom_models")
    # Drop models table.
    db_drop_table("byom_models")
    # Drop input data tables.
    db_drop_table("iris_input")
    # Run remove_context() to close the connection and garbage collect internally generated objects.
    remove_context()