PMML (Predictive Model Markup Language) is a widely used standard serialization format for exchanging machine learning models. Most customers train their models in tools external to Vantage, such as scikit-learn. Vantage Analytics enables customers to bring their models to Vantage (by inserting the model as a BLOB into a table) and apply them to data stored in SQL Engine for scoring. From teradataml, users can score data with these external models by using the PMMLPredict() function.
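The bring-your-own-model flow described above amounts to storing the serialized model bytes in a table row keyed by a model identifier, then reading them back before scoring. The sketch below illustrates only that idea, using Python's built-in sqlite3 module as a local stand-in for Vantage. The helper names save_model_blob and load_model_blob, the table name byom_models, and the column names model_id and model are assumptions for illustration. In teradataml, the save_byom() and retrieve_byom() functions perform the real work against Vantage.

```python
import sqlite3


def save_model_blob(conn, table, model_id, model_bytes):
    """Store serialized model bytes as a BLOB keyed by model_id (hypothetical helper).

    The table name is interpolated directly because SQL identifiers cannot be
    bound as parameters, so pass only trusted names. In this sketch a duplicate
    model_id raises sqlite3.IntegrityError because model_id is the primary key.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (model_id TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute(
        f"INSERT INTO {table} (model_id, model) VALUES (?, ?)",
        (model_id, model_bytes),
    )
    conn.commit()


def load_model_blob(conn, table, model_id):
    """Return the stored model bytes, or None if model_id is not present (hypothetical helper)."""
    row = conn.execute(
        f"SELECT model FROM {table} WHERE model_id = ?", (model_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder PMML bytes; a real file could be read with open(path, "rb").read().
    pmml_bytes = b'<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4"/>'
    save_model_blob(conn, "byom_models", "toy_model", pmml_bytes)
    print(load_model_blob(conn, "byom_models", "toy_model") == pmml_bytes)
```

Keying the row by an identifier is what lets a later step fetch exactly one model by name, which mirrors how the examples below pass a model ID to retrieve_byom().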
The following are examples of PMMLPredict() function calls.
Example Setup
- Import necessary modules.
>>> import os, teradataml
>>> from teradataml.options.configure import configure
>>> from teradataml import PMMLPredict, DataFrame, load_example_data, save_byom, retrieve_byom
- Load example data.
>>> load_example_data("byom", "iris_input")
- Create a teradataml DataFrame object.
>>> iris_test = DataFrame("iris_input")
- Set the install location of the BYOM functions.
>>> configure.byom_install_location = "mldb"
Example 1: Score data with a GLM model and overwrite cached models
- Get the path of the GLM model file that ships with teradataml.
>>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_db_glm_model.pmml")
- Save the model to the byom_models table in Vantage.
>>> save_byom("iris_db_glm_model", model_file, "byom_models")
- Retrieve the model from the byom_models table.
>>> modeldata = retrieve_byom("iris_db_glm_model", table_name="byom_models")
- Pass the output of the retrieve_byom() function as the modeldata input to the PMMLPredict() function to score data.
>>> result = PMMLPredict(modeldata=modeldata, newdata=iris_test,
...                      accumulate=['id', 'sepal_length', 'petal_length'],
...                      overwrite_cached_models='*')
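Conceptually, PMMLPredict() evaluates the stored PMML model against each row of newdata and copies the columns named in accumulate into the output. The minimal sketch below mimics that locally for a hand-written PMML RegressionModel, using only xml.etree.ElementTree. It is not how PMMLPredict() is implemented (scoring runs inside Vantage), it supports only NumericPredictor terms with no normalization, and the toy model, its coefficients, the row values, the helper names, and the "prediction" output key are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written PMML: y = 0.5 + 0.4 * sepal_length + 0.3 * petal_length
TOY_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="sepal_length"/>
      <MiningField name="petal_length"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="sepal_length" coefficient="0.4"/>
      <NumericPredictor name="petal_length" coefficient="0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_linear_model(pmml_text):
    """Return (intercept, {field: (coefficient, exponent)}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    terms = {
        # PMML's NumericPredictor exponent defaults to 1 when omitted.
        p.get("name"): (float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, terms


def score(rows, pmml_text, accumulate):
    """Score each row and copy the accumulate columns alongside the prediction."""
    intercept, terms = load_linear_model(pmml_text)
    results = []
    for row in rows:
        prediction = intercept + sum(c * row[name] ** e for name, (c, e) in terms.items())
        record = {col: row[col] for col in accumulate}
        record["prediction"] = prediction
        results.append(record)
    return results


if __name__ == "__main__":
    rows = [{"id": 1, "sepal_length": 5.0, "petal_length": 1.4}]
    print(score(rows, TOY_PMML, ["id", "sepal_length", "petal_length"]))
```

Because the model travels as a self-describing XML document, the same file can be evaluated by any PMML-aware engine; that portability is what makes PMML suitable for moving models trained elsewhere into Vantage.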
Example 2: Score data with an XGBoost model and overwrite cached models
- Get the path of the XGBoost model file that ships with teradataml.
>>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_db_xgb_model.pmml")
- Save the model to the byom_models table in Vantage.
>>> save_byom("iris_db_xgb_model", model_file, "byom_models")
- Retrieve the model from the byom_models table.
>>> modeldata = retrieve_byom("iris_db_xgb_model", table_name="byom_models")
- Pass the output of the retrieve_byom() function as the modeldata input to the PMMLPredict() function to score data.
>>> result = PMMLPredict(modeldata=modeldata, newdata=iris_test,
...                      accumulate=['id', 'sepal_length', 'petal_length'],
...                      overwrite_cached_models='*')