Using PMMLPredict to Score Using Externally Trained Models

Teradata® Package for Python User Guide

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2022-01-14

dita:id

B700-4006

Product Category

Teradata Vantage

This example uses the iris_input dataset and scores each row of the input table with a model that was trained outside the database, exported to PMML format, and then loaded into the database.

Set up the environment.

Import required libraries.

import tempfile

import getpass

from teradataml import PMMLPredict, DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom

from teradataml.options.configure import configure

Create the connection to database.

con = create_context(host=getpass.getpass("Hostname: "),
                     username=getpass.getpass("Username: "),
                     password=getpass.getpass("Password: "))

Load example data.

load_example_data("byom", "iris_input")

iris_input = DataFrame("iris_input")

Create train dataset and test dataset.
1. Create two samples of input data.
  This step creates two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
```
iris_sample = iris_input.sample(frac=[0.8, 0.2])
```
```
iris_sample
```
2. Create train dataset.
  This step creates the training dataset from sample 1 by filtering on "sampleid", then drops the "sampleid" column because it is not needed to train the model.
```
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
```
```
iris_train
```
3. Create test dataset.
  This step creates the test dataset from sample 2 by filtering on "sampleid", then drops the "sampleid" column because it is not needed for scoring.
```
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
```
```
iris_test
```
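
The split-then-filter pattern above can be pictured with a small local sketch. This is only a conceptual illustration in plain Python, not how Vantage or teradataml performs sampling: the helper names `sample_with_ids` and `select_sample` are hypothetical, and the rows are toy dictionaries rather than a teradataml DataFrame.

```python
import random


def sample_with_ids(rows, fracs, seed=None):
    """Shuffle rows and tag each one with a 1-based "sampleid" (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged = []
    start = 0
    for sample_id, frac in enumerate(fracs, start=1):
        count = round(len(rows) * frac)
        for row in shuffled[start:start + count]:
            tagged.append({**row, "sampleid": sample_id})
        start += count
    return tagged


def select_sample(tagged, sample_id):
    """Keep the rows of one sample and drop the helper "sampleid" column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged
        if row["sampleid"] == sample_id
    ]


rows = [{"id": i} for i in range(10)]
tagged = sample_with_ids(rows, [0.8, 0.2], seed=0)
train_rows = select_sample(tagged, 1)
test_rows = select_sample(tagged, 2)
print(len(train_rows), len(test_rows))
```

Because every row receives exactly one "sampleid", the two filtered sets never overlap, which is what keeps the test rows unseen during training.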

Train the Random Forest model and perform the prediction using PMMLPredict().

Import required libraries.

import numpy as np

from sklearn import tree

from nyoka import skl_to_pmml

from sklearn.pipeline import Pipeline

from sklearn_pandas import DataFrameMapper

from sklearn.impute import SimpleImputer

from sklearn.preprocessing import StandardScaler

from sklearn.ensemble import RandomForestClassifier

Prepare dataset to create a Random Forest model.

# features: names of the feature columns (every column except 'species').
# target: name of the column to predict.
train_pd = iris_train.to_pandas()
features = train_pd.columns.drop('species')
target = 'species'

Generate the Random Forest model.

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

rf_pipe_obj = Pipeline([
    ("mapping", DataFrameMapper([
        (['sepal_length', 'sepal_width'], StandardScaler()),
        (['petal_length', 'petal_width'], imputer)
    ])),
    ("rfc", RandomForestClassifier(n_estimators=100))
])

rf_pipe_obj.fit(train_pd[features], train_pd[target])

Save the model in PMML format.

temp_dir = tempfile.TemporaryDirectory()

model_file_path = f"{temp_dir.name}/iris_rf_class_model.pmml"

skl_to_pmml(rf_pipe_obj, features, target, model_file_path)
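
Since a PMML file is plain XML, it can be worth checking the exported file before uploading it. The following is a minimal sketch using only the standard library. It runs against a hand-written, heavily trimmed PMML fragment rather than real nyoka output, so the field list, the namespace version, and the helper name `data_field_names` are all assumptions made for illustration. It also shows the temporary directory used as a context manager, which removes the directory when the block exits.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hand-written, trimmed PMML fragment for illustration only (not real nyoka output).
SAMPLE_PMML = """<?xml version="1.0" encoding="UTF-8"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="integer"/>
  </DataDictionary>
</PMML>
"""


def data_field_names(pmml_path):
    """Return the names of all DataField elements in a PMML file."""
    root = ET.parse(pmml_path).getroot()
    # Compare local tag names so the check does not depend on the PMML namespace version.
    return [
        element.get("name")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "DataField"
    ]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample_model.pmml"
    path.write_text(SAMPLE_PMML, encoding="utf-8")
    fields = data_field_names(path)

print(fields)
```

Pointing `data_field_names` at the exported model file should list the columns the model expects, which can then be compared with the columns of the table to be scored.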

Save the model in Vantage.

# Save the PMML Model in Vantage.
save_byom("pmml_random_forest_iris", model_file_path, "byom_models")

List the models in Vantage.

# List the models saved in the "byom_models" table.
list_byom("byom_models")

Retrieve the model from Vantage.

# Retrieve the model from table "byom_models", using the model id 'pmml_random_forest_iris'.
modeldata = retrieve_byom("pmml_random_forest_iris", "byom_models")

Score the test data using the PMMLPredict function with the retrieved model.

# Perform prediction using PMMLPredict().
result = PMMLPredict(
    modeldata=modeldata,
    newdata=iris_test,
    accumulate=['id', 'sepal_length', 'petal_length'],
    overwrite_cached_models='*',
)

Print the equivalent SQL query and the score result.

# Print the query.
print(result.show_query())

# Print the result.
result.result
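
Once scored rows are pulled to the client, a JSON report string attached to each row can be reduced to a single label. The sketch below assumes the report exposes per-class probabilities under keys that start with `probability_`. That key naming, the sample string, and the helper name `top_class` are assumptions for illustration, so check the actual output of your model before relying on them.

```python
import json


def top_class(json_report, prefix="probability_"):
    """Return (label, probability) for the most probable class in a report string."""
    report = json.loads(json_report)
    # Collect only the keys that carry per-class probabilities (assumed naming).
    probabilities = {
        key[len(prefix):]: value
        for key, value in report.items()
        if key.startswith(prefix)
    }
    if not probabilities:
        raise ValueError("no probability fields found in report")
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Hypothetical report string, not copied from real PMMLPredict output.
sample_report = '{"probability_1": 0.02, "probability_2": 0.95, "probability_3": 0.03}'
label, probability = top_class(sample_report)
print(label, probability)
```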

Clean up.

# Delete the model from table "byom_models", using the model id 'pmml_random_forest_iris'.
delete_byom("pmml_random_forest_iris", "byom_models")

# Drop the models table.
db_drop_table("byom_models")

# Drop the input data table.
db_drop_table("iris_input")

# Run remove_context() to close the connection and garbage-collect internally generated objects.
remove_context()
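
If an earlier step raises an exception, the cleanup call may never run. One way to guard against that is a small context manager that always disconnects. This is a sketch, not part of teradataml: `managed_session` is a hypothetical helper, and the connect and disconnect callables are stand-ins so the example runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(connect, disconnect):
    """Open a connection and always call disconnect(), even if the body fails."""
    connection = connect()
    try:
        yield connection
    finally:
        disconnect()


# Stand-ins that only record what happened (hypothetical, for demonstration).
events = []


def fake_connect():
    events.append("connect")
    return "connection"


def fake_disconnect():
    events.append("disconnect")


try:
    with managed_session(fake_connect, fake_disconnect) as con:
        events.append("work")
        raise RuntimeError("scoring failed")
except RuntimeError:
    events.append("error handled")

print(events)  # ['connect', 'work', 'disconnect', 'error handled']
```

With teradataml, the same pattern would pass a callable that wraps create_context(...) as `connect` and remove_context as `disconnect`.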