H2OPredict | Supported External Model Types | Teradata Package for Python

Teradata® Package for Python User Guide

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2022-01-14
dita:mapPath: bol1585763678431.ditamap
dita:ditavalPath: ayr1485454803741.ditaval
dita:id: B700-4006
lifecycle: previous
Product Category: Teradata Vantage

H2OPredict performs a prediction on each row of the input table using a model that was previously trained in H2O and then loaded into the database. The model must be in MOJO (Model Object, Optimized), an H2O interchange format, and the user loads it as a BLOB into a table in the Teradata database.
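
To make the storage model concrete, here is a minimal, database-agnostic sketch that uses Python's built-in sqlite3 module as a stand-in for Vantage: a serialized model is stored once as a BLOB, read back, and applied to every row of an input table. The JSON "model", the toy_model ID, and the score_rows() helper are hypothetical. A real MOJO is a binary archive produced by H2O, and this is not how save_byom() or H2OPredict() are implemented.

```python
import json
import sqlite3

# Hypothetical stand-in for a trained model: a tiny linear scorer serialized to bytes.
# A real H2O MOJO is a binary archive, not JSON.
model_bytes = json.dumps(
    {"bias": 2.0, "weights": {"sepal_length": 0.5, "petal_length": -1.0}}
).encode("utf-8")

conn = sqlite3.connect(":memory:")  # in-memory SQLite database standing in for Vantage

# Store the serialized model as a BLOB keyed by a model ID (loosely analogous to save_byom()).
conn.execute("CREATE TABLE byom_models (model_id TEXT PRIMARY KEY, model BLOB)")
conn.execute("INSERT INTO byom_models VALUES (?, ?)", ("toy_model", model_bytes))

# A small input table to score.
conn.execute("CREATE TABLE iris_input (id INTEGER, sepal_length REAL, petal_length REAL)")
conn.executemany("INSERT INTO iris_input VALUES (?, ?, ?)", [(1, 5.1, 1.4), (2, 7.0, 4.7)])


def score_rows(conn, model_id):
    """Read the model BLOB once, then produce one prediction per input row."""
    (blob,) = conn.execute(
        "SELECT model FROM byom_models WHERE model_id = ?", (model_id,)
    ).fetchone()
    model = json.loads(blob)  # sqlite3 returns BLOB columns as bytes; json.loads accepts bytes
    rows = conn.execute("SELECT id, sepal_length, petal_length FROM iris_input ORDER BY id")
    return [
        (row_id, model["bias"]
         + model["weights"]["sepal_length"] * sepal_length
         + model["weights"]["petal_length"] * petal_length)
        for row_id, sepal_length, petal_length in rows
    ]


for row_id, score in score_rows(conn, "toy_model"):
    print(row_id, round(score, 2))
```

Keeping the model in a table, rather than on a client machine, is what lets the scoring step run next to the data; the sketch only mimics that shape, not Vantage's distributed execution.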

The following examples show how to call H2OPredict().

Example Setup

  • Import necessary modules.
    >>> import os, teradataml
    >>> from teradataml.options.configure import configure
    >>> from teradataml import H2OPredict, DataFrame, load_example_data, save_byom, retrieve_byom
  • Load example data.
    >>> load_example_data("byom", "iris_input")
  • Create teradataml DataFrame object.
    >>> iris_test = DataFrame("iris_input")
  • Set install location of the BYOM functions.
    >>> # Set install location of BYOM functions.
    >>> configure.byom_install_location = "mldb"

Example 1: Run a query with a GLM model and overwrite cached models

The query also specifies the arguments model_type, enable_options, and model_output_fields.

  • Get the path to the example model file installed with teradataml.
    >>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_glm_h2o_model")
  • Save the model.
    >>> save_byom("iris_mojo_glm_h2o_model", model_file, "byom_models")
  • Retrieve the model.
    >>> modeldata = retrieve_byom("iris_mojo_glm_h2o_model", table_name="byom_models")
  • Pass the output of the retrieve_byom API as input to the H2OPredict function to score the data.
    >>> result = H2OPredict(newdata=iris_test,
                            newdata_partition_column='id',
                            newdata_order_column='id',
                            modeldata=modeldata,
                            modeldata_order_column='model_id',
                            model_output_fields=['label', 'classProbabilities'],
                            accumulate=['id', 'sepal_length', 'petal_length'],
                            overwrite_cached_models='*',
                            enable_options='stageProbabilities',
                            model_type='OpenSource')
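
The two model_output_fields requested above are related: for a multinomial classifier such as one trained on the iris data, the predicted label is normally the class with the highest value in classProbabilities. The following sketch shows that relationship in plain Python, assuming a hypothetical dictionary of per-class probabilities. It does not reproduce the column layout that H2OPredict actually returns, and H2O binomial models may apply a decision threshold instead of a plain argmax.

```python
import math


def label_from_probabilities(class_probabilities):
    """Return the class with the highest probability (argmax).

    class_probabilities is a hypothetical {class_name: probability} mapping.
    On ties, max() keeps the first class in iteration order.
    """
    total = sum(class_probabilities.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical probabilities for one iris row.
probs = {"setosa": 0.02, "versicolor": 0.91, "virginica": 0.07}
print(label_from_probabilities(probs))
```

Checking that the probabilities sum to 1 is a cheap sanity test when post-processing scored output.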

Example 2: Run a query with an XGBoost model and overwrite cached models

The query also specifies the arguments model_type, enable_options, and model_output_fields.

  • Get the path to the example model file installed with teradataml.
    >>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_xgb_h2o_model")
  • Save the model.
    >>> save_byom("iris_mojo_xgb_h2o_model", model_file, "byom_models")
  • Retrieve the model.
    >>> modeldata = retrieve_byom("iris_mojo_xgb_h2o_model", table_name="byom_models")
  • Pass the output of the retrieve_byom API as input to the H2OPredict function to score the data.
    >>> result = H2OPredict(newdata=iris_test,
                            newdata_partition_column='id',
                            newdata_order_column='id',
                            modeldata=modeldata,
                            modeldata_order_column='model_id',
                            model_output_fields=['label', 'classProbabilities'],
                            accumulate=['id', 'sepal_length', 'petal_length'],
                            overwrite_cached_models='*',
                            enable_options='stageProbabilities',
                            model_type='OpenSource')
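
Setting enable_options='stageProbabilities' requests per-iteration (staged) class probabilities, which are most meaningful for boosted tree ensembles such as XGBoost. As a generic illustration, and not H2O's implementation or output format, the sketch below assumes a multiclass booster in which each iteration adds a margin per class; the staged probabilities are then the softmax of the running margin totals after each iteration. The margin values and the helper names are hypothetical.

```python
import math


def softmax(margins):
    """Numerically stable softmax over a list of raw class scores."""
    top = max(margins)
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def stage_probabilities(base_margin, iteration_outputs):
    """Class probabilities after each boosting iteration.

    iteration_outputs[t][k] is the (hypothetical) margin that iteration t adds for class k.
    """
    margins = list(base_margin)
    stages = []
    for contributions in iteration_outputs:
        # Accumulate this iteration's contribution, then convert margins to probabilities.
        margins = [m + c for m, c in zip(margins, contributions)]
        stages.append(softmax(margins))
    return stages


# Three classes, three boosting iterations; all numbers are made up for illustration.
stages = stage_probabilities(
    [0.0, 0.0, 0.0],
    [[0.4, 0.1, -0.2], [0.3, 0.0, -0.1], [0.2, -0.1, 0.0]],
)
for iteration, probs in enumerate(stages, start=1):
    print(iteration, [round(p, 3) for p in probs])
```

In this sketch the last stage equals the full ensemble's prediction, and the earlier stages show how the probabilities converged as iterations were added.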