H2OPredict - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.10
Published
May 2022
Language
English (United States)
Last Update
2022-08-18
Product Category
Teradata Vantage

H2OPredict performs a prediction on each row of the input table using a model that was previously trained in H2O and then loaded into the database. The model is stored in an interchange format called MOJO, and the user loads it as a BLOB into a table in the Teradata database.

The following examples show H2OPredict() function calls.

Example Setup

  • Import necessary modules.
    >>> import os, teradataml
    >>> from teradataml.options.configure import configure
    >>> from teradataml import H2OPredict, DataFrame, load_example_data, save_byom, retrieve_byom
  • Load example data.
    >>> load_example_data("byom", "iris_input")
  • Create a teradataml DataFrame object.
    >>> iris_test = DataFrame("iris_input")
  • Set install location of the BYOM functions.
    >>> configure.byom_install_location = "mldb"

Example 1: Run a query with a GLM model and overwrite cached models

The query also passes the model_type, enable_options, and model_output_fields arguments.

  • Locate the model file included with teradataml.
    >>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_glm_h2o_model")
  • Save the model.
    >>> save_byom("iris_mojo_glm_h2o_model", model_file, "byom_models")
  • Retrieve the model.
    >>> modeldata = retrieve_byom("iris_mojo_glm_h2o_model", table_name="byom_models")
  • Pass the output of the retrieve_byom function as an input to the H2OPredict function to score data.
    >>> result = H2OPredict(newdata=iris_test,
                            newdata_partition_column='id',
                            newdata_order_column='id',
                            modeldata=modeldata,
                            modeldata_order_column='model_id',
                            model_output_fields=['label', 'classProbabilities'],
                            accumulate=['id', 'sepal_length', 'petal_length'],
                            overwrite_cached_models='*',
                            enable_options='stageProbabilities',
                            model_type='OpenSource')
  • Print the results.
    >>> print(result.result)
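
The model file path in this example is built from the teradataml installation directory. As a minimal sketch, the helper below (bundled_model_path is a hypothetical name, not part of teradataml) builds the same path and checks that the file exists, so a missing file is reported before save_byom is called. It takes the package directory as an argument and uses only the standard library.

```python
import os


def bundled_model_path(package_dir, model_name):
    """Return the path of a model file under <package_dir>/data/models.

    Hypothetical helper: raises FileNotFoundError when the file is missing,
    so the problem surfaces before the model is saved to Vantage.
    """
    path = os.path.join(package_dir, "data", "models", model_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


# Usage (assumes teradataml is installed):
# import teradataml
# model_file = bundled_model_path(os.path.dirname(teradataml.__file__),
#                                 "iris_mojo_glm_h2o_model")
```

The returned path can then be passed to save_byom in the same way as model_file above.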

Example 2: Run a query with an XGBoost model and overwrite cached models

The query also passes the model_type, enable_options, and model_output_fields arguments.

  • Locate the model file included with teradataml.
    >>> model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_xgb_h2o_model")
  • Save the model.
    >>> save_byom("iris_mojo_xgb_h2o_model", model_file, "byom_models")
  • Retrieve the model.
    >>> modeldata = retrieve_byom("iris_mojo_xgb_h2o_model", table_name="byom_models")
  • Pass the output of the retrieve_byom function as an input to the H2OPredict function to score data.
    >>> result = H2OPredict(newdata=iris_test,
                            newdata_partition_column='id',
                            newdata_order_column='id',
                            modeldata=modeldata,
                            modeldata_order_column='model_id',
                            model_output_fields=['label', 'classProbabilities'],
                            accumulate=['id', 'sepal_length', 'petal_length'],
                            overwrite_cached_models='*',
                            enable_options='stageProbabilities',
                            model_type='OpenSource')
  • Print the results.
    >>> print(result.result)
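
Examples 1 and 2 call H2OPredict with the same arguments and differ only in the model that is retrieved. One possible way to avoid repeating the argument list is sketched below. COMMON_PREDICT_ARGS and score are hypothetical names introduced for illustration. The prediction function is passed in as a parameter, so the sketch itself does not import teradataml.

```python
# Keyword arguments shared by the H2OPredict calls in Examples 1 and 2.
COMMON_PREDICT_ARGS = {
    "newdata_partition_column": "id",
    "newdata_order_column": "id",
    "modeldata_order_column": "model_id",
    "model_output_fields": ["label", "classProbabilities"],
    "accumulate": ["id", "sepal_length", "petal_length"],
    "overwrite_cached_models": "*",
    "enable_options": "stageProbabilities",
    "model_type": "OpenSource",
}


def score(predict_fn, newdata, modeldata, **overrides):
    """Call predict_fn with the shared arguments plus any overrides.

    predict_fn is expected to be H2OPredict (assumption); any callable that
    accepts the same keyword arguments works.
    """
    kwargs = {**COMMON_PREDICT_ARGS, **overrides}
    return predict_fn(newdata=newdata, modeldata=modeldata, **kwargs)


# Usage (assumes the setup steps and retrieve_byom call shown above):
# result = score(H2OPredict, iris_test, modeldata)
# print(result.result)
```

Passing a keyword argument to score, for example model_output_fields=['label'], replaces the shared value for that call only.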

Example 3: Run a query with a licensed model

This example runs a query with a licensed model whose ID is 'licensed_model1', stored in the table 'byom_licensed_models'. The associated license key is stored in the column 'license_key' of the table 'license' in the schema 'mldb'.

  • Retrieve the model.
    >>> modeldata = retrieve_byom('licensed_model1',
                                  table_name='byom_licensed_models',
                                  license='license_key',
                                  is_license_column=True,
                                  license_table_name='license',
                                  license_schema_name='mldb')
  • Pass the output of the retrieve_byom function as an input to the H2OPredict function to score data.
    >>> result = H2OPredict(newdata=iris_test,
                            newdata_partition_column='id',
                            newdata_order_column='id',
                            modeldata=modeldata,
                            modeldata_order_column='model_id',
                            model_output_fields=['label', 'classProbabilities'],
                            accumulate=['id', 'sepal_length', 'petal_length'],
                            overwrite_cached_models='*',
                            enable_options='stageProbabilities',
                            model_type='OpenSource')
  • Print the results.
    >>> print(result.result)