Using Open Analytics to Score using Externally Trained Models using Apply

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

This example uses Open Analytics and Apply to score data in VantageCloud Lake with a model trained outside it: a scikit-learn Random Forest exported to ONNX format.

This example works only on VantageCloud Lake.

Set up the environment.

Import required libraries.

```
from teradataml import (create_context, remove_context, list_base_envs, list_user_envs,
                        create_env, remove_env, get_env, DataFrame, copy_to_sql, Apply,
                        configure, read_csv, set_config_params)
from teradataml.options.display import display

import getpass
import os
from collections import OrderedDict

import pandas as pd
from teradatasqlalchemy.types import BIGINT, VARCHAR, INTEGER, FLOAT
```

Set the authentication token and the UES URL.

```
set_config_params(ues_url=getpass.getpass("UES URL: "),
                  auth_token=getpass.getpass("JWT Token: "))
```

Create the connection.

```
con = create_context(host=getpass.getpass("Hostname: "),
                     username=getpass.getpass("Username: "),
                     password=getpass.getpass("Password: "))
```

You can also use the same JWT token instead of a password to create the context. See create_context for more details.

Generate the model.

Import required libraries.

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
```

Load the Iris dataset from the scikit-learn package.

```
iris = load_iris()
X, y = iris.data, iris.target
```

Split the data into training and test sets, and train a Random Forest classifier.

```
X_train, X_test, y_train, y_test = train_test_split(X, y)
clr = RandomForestClassifier()
clr.fit(X_train, y_train)
```

Convert the model to ONNX format and save it as the model file "rf_iris.onnx".

```
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# One input named 'float_input': a float32 matrix with any number of rows (None) and 4 features.
initial_type = [('float_input', FloatTensorType([None, 4]))]
onx = convert_sklearn(clr, initial_types=initial_type)
with open("rf_iris.onnx", "wb") as f:
    f.write(onx.SerializeToString())

print("RF model trained and saved in 'rf_iris.onnx'.")
```

Load the test data into VantageCloud Lake and create a teradataml DataFrame for the input table.

```
# Column order must match the field names the scoring script expects.
dfIn = pd.DataFrame(X_test, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
copy_to_sql(dfIn, table_name='onnx_test_table_dataset', if_exists='replace')

onnx_test_data = DataFrame.from_table("onnx_test_table_dataset")
onnx_test_data.head(n=5)
```

Create a Python script to score the data with the model.

Create a file named 'sklearn_onnx_scoring.py' on the local client with the following code. The script loads the ONNX model file installed in the user environment; it does not retrain the model.

```
# Score the input rows with the ONNX model file 'rf_iris.onnx'.
import csv
import sys

import numpy
import onnxruntime as rt
import pandas as pd

COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Read input data from stdin into a dataframe.
_reader = csv.DictReader(sys.stdin.readlines(), fieldnames=COLUMNS)
data = pd.DataFrame(_reader, columns=COLUMNS)

# For AMPs that receive no data, exit the script instance gracefully.
if data.empty:
    sys.exit()

# Compute the prediction with ONNX Runtime.
sess = rt.InferenceSession("rf_iris.onnx")
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
pred_onx = sess.run([label_name], {input_name: data.values.astype(numpy.float32)})[0]

# Print all predictions for this AMP as one space-separated line.
listToStr = ' '.join([str(elem) for elem in pred_onx])
print(listToStr)
```
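To check the script's input handling without a VantageCloud Lake connection, you can exercise the same parsing steps locally. The sketch below is a hypothetical, standard-library-only stand-in: parse_rows() and format_output() are illustrative helpers, not part of teradataml, and they assume Apply passes comma-delimited rows, which is what the script's csv.DictReader expects by default. It skips the ONNX Runtime call and covers only reading rows and formatting the output line.

```python
import csv
import io

# Same order as the columns of onnx_test_table_dataset; rows arrive
# positionally, so this order must match the table's column order.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def parse_rows(lines):
    """Parse comma-delimited input lines the way the scoring script does.

    Returns a list of four-element float rows; an empty list means the
    AMP received no data (the real script exits in that case).
    """
    reader = csv.DictReader(lines, fieldnames=COLUMNS)
    # float() stands in for the script's astype(numpy.float32) conversion.
    return [[float(record[name]) for name in COLUMNS] for record in reader]


def format_output(predictions):
    """Join predicted labels into one space-separated line, as the script prints it."""
    return " ".join(str(p) for p in predictions)


# Hypothetical sample of the rows one AMP might receive on stdin.
sample_stdin = io.StringIO("5.1,3.5,1.4,0.2\n6.7,3.0,5.2,2.3\n")
rows = parse_rows(sample_stdin.readlines())
print(rows)                   # [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
print(parse_rows([]))         # []
print(format_output([0, 2]))  # 0 2
```

Because the field names are assigned by position, a table whose columns were created in a different order would silently feed the wrong features to the model; keeping one shared column list avoids that.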

Create the environment and install the required libraries and files in it.
1. List the base Python environments.
```
list_base_envs()
```
  This example assumes a new Python environment is needed.
2. Create a new Python user environment for Python 3.8.13.
  The create_env() function returns an object of class 'UserEnv'.
```
demo_env = create_env(env_name = 'oaf_usecase_2c_env',
                      base_env = 'python_3.8.13',
                      desc = 'OAF Demo Use Case 2c Environment')
```
3. Verify the new environment has been created.
```
list_user_envs()
```
4. Install the Python add-ons used by the script in the user environment, synchronously, using the object 'demo_env' of class 'UserEnv'. Use the PyPI package name "scikit-learn"; the "sklearn" package name is deprecated and no longer installs.
```
demo_env.install_lib(["skl2onnx", "scikit-learn", "onnxruntime", "pandas"])
```
5. Verify the Python libraries have been installed correctly.
```
demo_env.libs
```
6. Install the model file and the scoring script in the user environment so they are available for scoring the data inside VantageCloud Lake.
```
demo_env.install_file(file_path = 'rf_iris.onnx', replace = True)
demo_env.install_file(file_path = 'sklearn_onnx_scoring.py', replace = True)
```
7. Verify the files have been installed correctly.
```
demo_env.files
```

Score the data inside VantageCloud Lake.

Use Apply to create an object for the Random Forest based prediction.

```
applyRF_obj = Apply(data=onnx_test_data,
                    apply_command='python3 sklearn_onnx_scoring.py',
                    returns={"Predicted_Class_RF": VARCHAR(200)},
                    env_name=demo_env)
```
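Because the scoring script prints all predictions for an AMP as one space-separated line, each AMP that receives data returns a single Predicted_Class_RF value whose length grows with the number of rows it scores. Iris labels are single digits (0, 1, 2), so n rows produce n + (n - 1) characters, and the value fits in VARCHAR(W) only while n <= (W + 1) / 2. The hypothetical helpers below work out that limit; how Vantage handles a longer value is not covered here.

```python
def output_length(n_rows, label_len=1):
    """Length of n_rows labels, each label_len characters, joined by single spaces."""
    if n_rows == 0:
        return 0
    return n_rows * label_len + (n_rows - 1)


def max_rows_per_amp(varchar_width=200, label_len=1):
    """Largest row count whose joined output still fits in VARCHAR(varchar_width).

    Solves n * label_len + (n - 1) <= varchar_width for the largest integer n.
    """
    return (varchar_width + 1) // (label_len + 1)


print(max_rows_per_amp())                      # 100
print(output_length(100), output_length(101))  # 199 201
```

If an AMP could hold more rows than this, printing one prediction per line (one output row each) or widening the VARCHAR are the usual alternatives.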

Run the Python script inside the remote user environment.
```
applyRF_obj.execute_script()
```
You can display the underlying SQL query by setting 'display.print_sqlmr_query = True' before calling execute_script().
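execute_script() runs the script in the user environment rather than on the client. Roughly, each AMP starts its own instance of the script, streams that AMP's rows to standard input as delimited text, and turns each line the script prints into an output row with the columns given in returns. The sketch below imitates that contract locally with subprocess and a hypothetical stand-in script that prints a placeholder label per row instead of calling ONNX Runtime; run_like_apply() is an illustrative helper, and this is an approximation for testing, not how Open Analytics is implemented.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for sklearn_onnx_scoring.py: it reads stdin the same
# way but emits a placeholder label "0" per row, so it needs no ONNX model.
STAND_IN_SCRIPT = textwrap.dedent("""
    import csv, sys
    rows = list(csv.reader(sys.stdin.readlines()))
    if not rows:
        sys.exit()
    print(" ".join("0" for _ in rows))
""")


def run_like_apply(partitions):
    """Run the stand-in script once per partition, piping its rows to stdin.

    Each partition imitates the rows one AMP holds; each line the script
    prints becomes one output row.
    """
    output_rows = []
    for lines in partitions:
        result = subprocess.run(
            [sys.executable, "-c", STAND_IN_SCRIPT],
            input="".join(lines),
            capture_output=True,
            text=True,
            check=True,  # raise if the script instance fails
        )
        output_rows.extend(result.stdout.splitlines())
    return output_rows


partitions = [
    ["5.1,3.5,1.4,0.2\n", "6.7,3.0,5.2,2.3\n"],  # AMP with two rows
    [],                                          # AMP with no rows
]
print(run_like_apply(partitions))  # ['0 0']
```

The empty partition produces no output row because the script exits before printing, which matches the graceful-exit branch in the scoring script.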

Remove the environment and disconnect from VantageCloud Lake.
1. After scoring the data, remove the environment.
```
remove_env('oaf_usecase_2c_env')
```
2. Verify the specified environment has been removed.
```
list_user_envs()
```
3. Disconnect from VantageCloud Lake.
```
remove_context()
```