H2OPredict | Supported External Model Types | Teradata Package for Python - H2OPredict

Teradata® Package for Python User Guide

Product: Teradata Package for Python

Release Number: 17.00

Published: November 2021

Language: English (United States)

Last Update: 2022-01-14

Product Category: Teradata Vantage

H2OPredict scores each row of the input table using a model that was previously trained in H2O and then loaded into the database. The model must be in the MOJO interchange format, and the user loads it as a BLOB into a table in the Teradata database.
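To make "loaded as a BLOB into a table" concrete, the following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. It does not show how teradataml or Vantage stores models internally. The table layout (a `model_id` key plus a `model` BLOB column), the helper names `store_model_blob` and `fetch_model_blob`, and the placeholder bytes are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def store_model_blob(conn, table_name, model_id, model_path):
    """Read a model file as raw bytes and store it as one BLOB row."""
    with open(model_path, "rb") as f:
        blob = f.read()
    # The table name is interpolated directly, so pass trusted identifiers only.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        "(model_id TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute(
        f"INSERT OR REPLACE INTO {table_name} (model_id, model) VALUES (?, ?)",
        (model_id, blob),
    )
    conn.commit()
    return len(blob)


def fetch_model_blob(conn, table_name, model_id):
    """Return the stored bytes for model_id, or None if no such row exists."""
    row = conn.execute(
        f"SELECT model FROM {table_name} WHERE model_id = ?", (model_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


# Hypothetical stand-in for a MOJO file: arbitrary bytes in a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_mojo_model")
    demo_bytes = b"placeholder model bytes"
    with open(demo_path, "wb") as f:
        f.write(demo_bytes)

    demo_conn = sqlite3.connect(":memory:")
    stored_size = store_model_blob(demo_conn, "byom_models", "demo_model", demo_path)
    round_trip = fetch_model_blob(demo_conn, "byom_models", "demo_model")
```

The point of the sketch is only that the model travels as opaque bytes keyed by an identifier; in the examples below, `save_byom` and `retrieve_byom` handle this step against Vantage.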

The following examples show how to call the H2OPredict() function.

Example Setup

Import necessary modules.

>>> import os, teradataml

>>> from teradataml.options.configure import configure

>>> from teradataml import H2OPredict, DataFrame, load_example_data, save_byom, retrieve_byom

Load example data.

>>> load_example_data("byom", "iris_input")

Create teradataml DataFrame object.
```
>>> iris_test = DataFrame("iris_input")
```

Set install location of the BYOM functions.

>>> # Set install location of BYOM functions.
>>> configure.byom_install_location = "mldb"

Example 1: Run a query with GLM model and overwrite cached models

The call also sets the model_type, enable_options, and model_output_fields arguments.

Get the path to the model file that is packaged with teradataml.

>>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_glm_h2o_model")
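Because the path above is only assembled, not checked, a missing model file would surface later, when the model is saved. As a minimal sketch, the hypothetical helper `resolve_model_file` below (not part of teradataml) builds the same `data/models/<name>` path and fails early if the file is absent. The demonstration uses a temporary directory that mimics the package layout, so it runs without teradataml installed.

```python
import os
import tempfile


def resolve_model_file(package_dir, model_name):
    """Build <package_dir>/data/models/<model_name> and verify the file exists."""
    path = os.path.join(package_dir, "data", "models", model_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


# Temporary directory standing in for the installed package directory.
with tempfile.TemporaryDirectory() as fake_pkg:
    models_dir = os.path.join(fake_pkg, "data", "models")
    os.makedirs(models_dir)
    # Create an empty placeholder file with the example model's name.
    open(os.path.join(models_dir, "iris_mojo_glm_h2o_model"), "wb").close()

    found = resolve_model_file(fake_pkg, "iris_mojo_glm_h2o_model")
    found_name = os.path.basename(found)

    try:
        resolve_model_file(fake_pkg, "no_such_model")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

With teradataml installed, the first argument would be `os.path.dirname(teradataml.__file__)`, as in the statement above.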

Save the model to the byom_models table in Vantage.

>>> save_byom("iris_mojo_glm_h2o_model", model_file, "byom_models")

Retrieve the model.

>>> modeldata = retrieve_byom("iris_mojo_glm_h2o_model", table_name="byom_models")

Pass the output of the retrieve_byom function as input to the H2OPredict function to score the data.

>>> result = H2OPredict(newdata=iris_test,
                        newdata_partition_column='id',
                        newdata_order_column='id',
                        modeldata=modeldata,
                        modeldata_order_column='model_id',
                        model_output_fields=['label', 'classProbabilities'],
                        accumulate=['id', 'sepal_length', 'petal_length'],
                        overwrite_cached_models='*',
                        enable_options='stageProbabilities',
                        model_type='OpenSource')

Example 2: Run a query with XGBoost model and overwrite cached models

The call also sets the model_type, enable_options, and model_output_fields arguments.

Get the path to the model file that is packaged with teradataml.

>>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_xgb_h2o_model")

Save the model to the byom_models table in Vantage.

>>> save_byom("iris_mojo_xgb_h2o_model", model_file, "byom_models")

Retrieve the model.

>>> modeldata = retrieve_byom("iris_mojo_xgb_h2o_model", table_name="byom_models")

Pass the output of the retrieve_byom function as input to the H2OPredict function to score the data.

>>> result = H2OPredict(newdata=iris_test,
                        newdata_partition_column='id',
                        newdata_order_column='id',
                        modeldata=modeldata,
                        modeldata_order_column='model_id',
                        model_output_fields=['label', 'classProbabilities'],
                        accumulate=['id', 'sepal_length', 'petal_length'],
                        overwrite_cached_models='*',
                        enable_options='stageProbabilities',
                        model_type='OpenSource')
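Examples 1 and 2 pass identical scoring arguments and differ only in which model is retrieved. One way to avoid repeating them, sketched here under that assumption, is a small helper that returns the keyword arguments as a dictionary. The helper name `build_h2o_predict_args` is hypothetical and not part of teradataml; the argument names and values are copied from the calls above.

```python
def build_h2o_predict_args(newdata, modeldata, **overrides):
    """Return the keyword arguments used in Examples 1 and 2.

    Any keyword passed in `overrides` replaces the corresponding default.
    """
    args = {
        "newdata": newdata,
        "newdata_partition_column": "id",
        "newdata_order_column": "id",
        "modeldata": modeldata,
        "modeldata_order_column": "model_id",
        "model_output_fields": ["label", "classProbabilities"],
        "accumulate": ["id", "sepal_length", "petal_length"],
        "overwrite_cached_models": "*",
        "enable_options": "stageProbabilities",
        "model_type": "OpenSource",
    }
    args.update(overrides)
    return args


# Placeholder strings stand in for the teradataml DataFrame objects here,
# so the sketch runs without a Vantage connection.
demo_args = build_h2o_predict_args(
    "iris_test_placeholder", "modeldata_placeholder", accumulate=["id"]
)
```

In a connected session, the call would then read `result = H2OPredict(**build_h2o_predict_args(iris_test, modeldata))`, with `modeldata` coming from `retrieve_byom` for whichever model is being scored.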