This example uses Open Analytics to score data with an externally trained model through Apply.
This example works only on VantageCloud Lake.
- Set up the environment.
- Import required libraries.
from teradataml import create_context, remove_context, list_base_envs, list_user_envs, create_env, remove_env, get_env, DataFrame, copy_to_sql, Apply, configure, read_csv, set_config_params
from teradataml.options.display import display
import pandas as pd, getpass, os
from collections import OrderedDict
from teradatasqlalchemy.types import BIGINT, VARCHAR, INTEGER, FLOAT
- Set the authentication token and UES URL.
set_config_params(ues_url=getpass.getpass("UES URL: "), auth_token=getpass.getpass("JWT Token: "))
- Create the connection.
con = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
You can use the same JWT token instead of a password to create a context. See create_context for more details.
- Generate model.
- Import required libraries.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
- Read the data from the scikit-learn package.
iris = load_iris()
X, y = iris.data, iris.target
- Train a model with Random Forests.
X_train, X_test, y_train, y_test = train_test_split(X, y)
clr = RandomForestClassifier()
clr.fit(X_train, y_train)
- Convert the model into ONNX format and generate the ONNX model file "rf_iris.onnx".
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# The model takes batches of any size with 4 float features per row.
initial_type = [('float_input', FloatTensorType([None, 4]))]
onx = convert_sklearn(clr, initial_types = initial_type)
with open("rf_iris.onnx", "wb") as f:
    f.write(onx.SerializeToString())
print("RF model trained and saved in 'rf_iris.onnx'.")
- Load the test data into VantageCloud Lake and create a teradataml DataFrame for the input table.
dfIn = pd.DataFrame(X_test, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
copy_to_sql(dfIn, table_name = 'onnx_test_table_dataset', if_exists = 'replace')
onnx_test_data = DataFrame.from_table("onnx_test_table_dataset")
onnx_test_data.head(n=5)
- Create a Python file to score the data with the model. Create a file named 'sklearn_onnx_scoring.py' on the local client with the following code.
# Train a model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import csv
import sys

# Read input data from stdin into a dataframe.
_reader = csv.DictReader(sys.stdin.readlines(), fieldnames = ["sepal_length","sepal_width","petal_length","petal_width"])
data = pd.DataFrame(_reader, columns = ["sepal_length","sepal_width","petal_length","petal_width"])

# For AMPs that receive no data, exit the script instance gracefully.
if data.empty:
    sys.exit()

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
clr = RandomForestClassifier()
clr.fit(X_train, y_train)

# Compute the prediction with ONNX Runtime
import onnxruntime as rt
import numpy
sess = rt.InferenceSession("rf_iris.onnx")
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
pred_onx = sess.run([label_name], {input_name: data.values.astype(numpy.float32)})[0]

# Print all predictions of this script instance as one space-separated line.
listToStr = ' '.join([str(elem) for elem in pred_onx])
print(listToStr)
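The reading step of the scoring script can be tried locally before the file is installed. The following is a minimal sketch using only the standard library, assuming the rows arrive on stdin comma-delimited (the default that csv.DictReader expects). The helper name parse_rows and the sample input are hypothetical and are not part of the example script.

```python
import csv
import io

# Column names, in the same order as in the scoring script.
FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def parse_rows(stream):
    """Read comma-delimited rows from a text stream into lists of floats."""
    reader = csv.DictReader(stream.readlines(), fieldnames=FIELDS)
    return [[float(row[name]) for name in FIELDS] for row in reader]

# Hypothetical sample standing in for the data a script instance receives.
sample = io.StringIO("5.1,3.5,1.4,0.2\n6.2,2.9,4.3,1.3\n")
rows = parse_rows(sample)
print(rows)

# An instance that receives no data yields an empty list,
# which corresponds to the early-exit branch of the script.
empty_rows = parse_rows(io.StringIO(""))
print(empty_rows)
```

In the script itself the stream is sys.stdin; io.StringIO is used here only so the sketch runs without a database connection.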
- Create Environment and install the corresponding files in the environment.
- List the base Python environments.
list_base_envs()
Assume a new Python environment is needed.
- Create a new Python user environment for Python 3.8.13. The function create_env() returns an object of class 'UserEnv'.
demo_env = create_env(env_name = 'oaf_usecase_2c_env', base_env = 'python_3.8.13', desc = 'OAF Demo Use Case 2c Environment')
- Verify the new environment has been created.
list_user_envs()
- Install the Python add-ons that the script needs in the user environment, synchronously, using the object 'demo_env' of class 'UserEnv'.
demo_env.install_lib(["skl2onnx", "scikit-learn", "onnxruntime", "pandas"])
- Verify the Python libraries have been installed correctly.
demo_env.libs
- Install the model file and Python file to score the data inside VantageCloud Lake.
demo_env.install_file(file_path = 'rf_iris.onnx', replace = True)
demo_env.install_file(file_path = 'sklearn_onnx_scoring.py', replace = True)
- Verify the files have been installed correctly.
demo_env.files
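The install step assumes that both files exist in the client's working directory. As a complement to the remote listing, a local pre-flight check might look like the sketch below. The helper missing_files is hypothetical (it is not a teradataml function), and the temporary directory with a placeholder file only simulates a client directory so the sketch can run on its own.

```python
import os
import tempfile

def missing_files(paths, base_dir="."):
    """Return the paths that are absent or empty under base_dir."""
    missing = []
    for path in paths:
        full = os.path.join(base_dir, path)
        if not os.path.isfile(full) or os.path.getsize(full) == 0:
            missing.append(path)
    return missing

# Simulated client directory holding only the model file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "rf_iris.onnx"), "wb") as f:
        f.write(b"placeholder bytes, not a real model")
    result = missing_files(["rf_iris.onnx", "sklearn_onnx_scoring.py"], base_dir=tmp)

print(result)
```

On a real client, calling missing_files with the default base_dir before install_file would report any file that still needs to be created.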
- Score the data inside VantageCloud Lake.
- Use Apply to create an object for the Random Forest based prediction.
applyRF_obj = Apply(data = onnx_test_data,
                    apply_command = 'python3 sklearn_onnx_scoring.py',
                    returns = {"Predicted_Class_RF": VARCHAR(200)},
                    env_name = demo_env
                    )
- Run the Python script inside the remote user environment.
applyRF_obj.execute_script()
You can display the underlying SQL by setting 'display.print_sqlmr_query = True'.
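Because the scoring script prints all predictions of one script instance as a single space-separated string, each returned value of Predicted_Class_RF can hold several class labels. The sketch below shows one way to split such strings on the client, assuming the values have already been fetched into a Python list. The helper split_predictions and the sample strings are hypothetical; the class names are those of the scikit-learn iris dataset.

```python
# Class names of the scikit-learn iris dataset, indexed by label.
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]

def split_predictions(cells):
    """Flatten space-separated label strings into a list of integer labels."""
    labels = []
    for cell in cells:
        # str.split() with no argument ignores empty strings and extra spaces.
        labels.extend(int(token) for token in cell.split())
    return labels

# Hypothetical values of the Predicted_Class_RF column, one per script instance.
sample_cells = ["0 2 1", "1 1", ""]
labels = split_predictions(sample_cells)
names = [IRIS_CLASSES[label] for label in labels]
print(labels)
print(names)
```

The order of the labels follows the order in which each script instance received its rows, so this flattening does not by itself map a prediction back to a specific input row.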
- Remove the environment and disconnect from VantageCloud Lake.
- After scoring the data, remove the environment.
remove_env('oaf_usecase_2c_env')
- Verify the specified environment has been removed.
list_user_envs()
- Disconnect from VantageCloud Lake.
remove_context()