This example uses the iris_input dataset and scores each row of the input table with a model that is trained locally, saved in ONNX format, and then loaded into the database.
- Set up the environment.
- Import required libraries.
import tempfile
import getpass
from teradataml import DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom
from teradataml.options.configure import configure
- Create the connection to the database.
con = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
- Load example data.
load_example_data("byom", "iris_input")
iris_input = DataFrame("iris_input")
- Create train dataset and test dataset.
- Create two samples of input data. Sample 1 holds 80% of the rows and sample 2 holds the remaining 20%.
iris_sample = iris_input.sample(frac=[0.8, 0.2])
iris_sample
- Create train dataset. This step builds the train dataset from sample 1 by filtering on "sampleid" and then dropping the "sampleid" column, which is not needed for training the model.
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
iris_train
- Create test dataset. This step builds the test dataset from sample 2 by filtering on "sampleid" and then dropping the "sampleid" column, which is not needed for scoring.
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
iris_test
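To make the role of the "sampleid" column concrete, the following is a small local analogue in plain Python. It is only a sketch of the idea, not how teradataml implements sample(); the helper names split_by_fraction and select_sample and the 150 stand-in rows are hypothetical.

```python
import random


def split_by_fraction(rows, fractions, seed=0):
    """Shuffle rows and tag each one with a 1-based sampleid, one id per fraction."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged, start = [], 0
    for sample_id, frac in enumerate(fractions, start=1):
        count = round(frac * len(shuffled))
        for row in shuffled[start:start + count]:
            tagged.append({**row, "sampleid": sample_id})
        start += count
    return tagged


def select_sample(tagged_rows, sample_id):
    """Keep the rows of one sample and drop the helper sampleid column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged_rows
        if row["sampleid"] == sample_id
    ]


rows = [{"id": i} for i in range(1, 151)]  # hypothetical stand-in rows
tagged = split_by_fraction(rows, [0.8, 0.2])
train_rows = select_sample(tagged, 1)  # plays the role of iris_train
test_rows = select_sample(tagged, 2)   # plays the role of iris_test
print(len(train_rows), len(test_rows))  # 120 30
```

The two selections never share a row, which is the property the train/test split relies on.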
- Train the Random Forest model and perform the prediction using ONNXPredict().
- Import required libraries.
import numpy as np
from teradataml import ONNXPredict
from skl2onnx import to_onnx
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
- Prepare dataset for training Random Forest model.
- Convert teradataml DataFrame to pandas DataFrame.
train_pd = iris_train.to_pandas()
features = train_pd.columns.drop('species')
target = 'species'
- Generate Random Forest model.
rf_pipe_obj = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(max_depth=5)),
])
- Train the Random Forest model.
rf_pipe_obj.fit(train_pd[features], train_pd[target])
- Save the model in ONNX format.
- Create temporary file path to save the model.
temp_dir = tempfile.TemporaryDirectory()
model_file_path = f"{temp_dir.name}/iris_db_rf_model.onnx"
- Convert and save the Random Forest model in ONNX format.
onx = to_onnx(rf_pipe_obj, train_pd.iloc[:,:4].astype(np.float32))
with open(model_file_path, "wb") as f:
    f.write(onx.SerializeToString())
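Before uploading the file, it can help to confirm that it exists and is not empty, and to keep in mind that everything inside a tempfile.TemporaryDirectory disappears once cleanup() is called. The following is a minimal sketch using only the standard library; the helper name model_file_size is hypothetical, and the bytes written here are a placeholder, not a real ONNX model.

```python
import tempfile
from pathlib import Path


def model_file_size(path):
    """Return the file size in bytes; raise if the file is missing or empty."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    size = file_path.stat().st_size
    if size == 0:
        raise ValueError(f"Model file is empty: {path}")
    return size


demo_dir = tempfile.TemporaryDirectory()
demo_path = f"{demo_dir.name}/demo_model.onnx"

with open(demo_path, "wb") as f:
    f.write(b"\x08\x01")  # placeholder bytes, not a real ONNX model

size = model_file_size(demo_path)  # 2 bytes were written above

# Removing the temporary directory also removes the file inside it,
# so any upload has to happen before this point.
demo_dir.cleanup()
exists_after_cleanup = Path(demo_path).exists()
print(size, exists_after_cleanup)  # 2 False
```

In this example the same check would apply to model_file_path, and temp_dir.cleanup() would only be called after the model has been saved in Vantage.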
- Save the model in Vantage.
save_byom("onnx_rf_iris", model_file_path, "byom_models")
- List the model in Vantage.
list_byom("byom_models")
- Retrieve the model from table "byom_models", using the model id 'onnx_rf_iris'.
modeldata = retrieve_byom("onnx_rf_iris", "byom_models")
- Set "configure.byom_install_location" to the database where BYOM functions are installed.
configure.byom_install_location = getpass.getpass("byom_install_location: ")
- Perform prediction using ONNXPredict() function and the ONNX model stored in Vantage.
predict_output = ONNXPredict(
    modeldata=modeldata,
    newdata=iris_test,
    accumulate=['id', 'sepal_length', 'petal_length'],
    overwrite_cached_models='*',
    model_output_fields="output_label"
)
- Print the equivalent SQL query and Score result.
print(predict_output.show_query())
predict_output.result
- Clean up.
- Delete the model from table "byom_models".
delete_byom("onnx_rf_iris", "byom_models")
- Delete models table.
db_drop_table("byom_models")
- Drop input data table.
db_drop_table("iris_input")
- Remove context.
remove_context()