Using H2OPredict to Score using Externally Trained Models | teradataml - 17.00

Teradata® Package for Python User Guide

Product: Teradata Package for Python
Release Number: 17.00
Release Date: November 2021
Content Type: User Guide
Language: English (United States)

This example uses the iris_input dataset and performs a prediction on each row of the input table using a model previously trained in H2O and then loaded into the database.

  1. Set up the environment.
    1. Import required libraries.
      import tempfile
      import getpass
      from teradataml import H2OPredict, DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom
      from teradataml.options.configure import configure
    2. Create the connection to the database.
      con = create_context(host=getpass.getpass("Hostname: "),
                           username=getpass.getpass("Username: "),
                           password=getpass.getpass("Password: "))
    3. Load example data.
      load_example_data("byom", "iris_input")
      iris_input = DataFrame("iris_input")
  2. Create train dataset and test dataset.
    1. Create two samples of input data.
      This step creates two samples of input data: sample 1 has 80% of total rows and sample 2 has 20% of total rows.
      iris_sample = iris_input.sample(frac=[0.8, 0.2])
    2. Create train dataset.
      This step creates the train dataset from sample 1 by filtering on "sampleid" and dropping the "sampleid" column, which is not required for training the model.
      iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
    3. Create test dataset.
      This step creates the test dataset from sample 2 by filtering on "sampleid" and dropping the "sampleid" column, which is not required for scoring.
      iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
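The split in step 2 relies on each row being tagged with a "sampleid" and on filtering by that tag. As a rough illustration of the idea only, the following pure-Python sketch mimics it on 150 made-up rows. The helper names `tag_samples` and `select_sample` and the row layout are assumptions for this example; the sketch does not reproduce how teradataml or Vantage actually draws samples.

```python
import random

def tag_samples(rows, fractions, seed=0):
    """Return copies of rows, each tagged with a 1-based 'sampleid'.

    Hypothetical helper for illustration: shuffles the rows, then cuts
    them into consecutive groups sized by the given fractions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged, start = [], 0
    for sample_id, frac in enumerate(fractions, start=1):
        size = round(len(shuffled) * frac)
        for row in shuffled[start:start + size]:
            tagged.append({**row, "sampleid": sample_id})
        start += size
    return tagged

def select_sample(tagged, sample_id):
    """Keep the rows of one sample and drop the helper 'sampleid' column."""
    return [
        {k: v for k, v in row.items() if k != "sampleid"}
        for row in tagged
        if row["sampleid"] == sample_id
    ]

# Made-up stand-in for iris_input: 150 rows with only an 'id' column.
rows = [{"id": i} for i in range(1, 151)]
tagged = tag_samples(rows, [0.8, 0.2])
train = select_sample(tagged, 1)
test = select_sample(tagged, 2)
print(len(train), len(test))
```

The two selections never share a row, which is the property the train and test datasets depend on.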
  3. Train the Gradient Boosting Machine model and perform the Prediction using H2OPredict().
    1. Import required libraries.
      import h2o
      from teradataml import H2OPredict
      from h2o.estimators import H2OGradientBoostingEstimator
    2. Prepare dataset to create a Gradient Boosting Machine model.
      # Start or connect to a local H2O cluster before creating any H2OFrame.
      h2o.init()
      # H2OFrame accepts a pandas DataFrame, so convert the teradataml DataFrame to pandas first.
      iris_train_pd = iris_train.to_pandas()
      h2o_df = h2o.H2OFrame(iris_train_pd)
    3. Train the Gradient Boosting Machine model.
      # Treat the response column as categorical, then train the model.
      h2o_df["species"] = h2o_df["species"].asfactor()
      predictors = h2o_df.columns
      response = "species"
      gbm_model = H2OGradientBoostingEstimator(nfolds=5, seed=1111, keep_cross_validation_predictions = True)
      gbm_model.train(x=predictors, y=response, training_frame=h2o_df)
    4. Save the model in MOJO format.
      # Saving H2O Model to a file.
      temp_dir = tempfile.TemporaryDirectory()
      model_file_path = gbm_model.save_mojo(path=temp_dir.name, force=True)
    5. Save the model in Vantage.
      # Save the H2O Model in Vantage.
      save_byom(model_id="h2o_gbm_iris", model_file=model_file_path, table_name="byom_models")
    6. List the model in Vantage.
      # List the models saved in the 'byom_models' table.
      list_byom("byom_models")
    7. Retrieve the model from Vantage.
      # Retrieve the model from Vantage using the model ID 'h2o_gbm_iris'.
      model = retrieve_byom("h2o_gbm_iris", "byom_models")
    8. Set "configure.byom_install_location" to the database where BYOM functions are installed.
      configure.byom_install_location = getpass.getpass("byom_install_location: ")
    9. Score the test data using H2OPredict function with the retrieved model.
      # Score the model on 'iris_test' data.
      result = H2OPredict(newdata=iris_test,
                          modeldata=model,
                          model_output_fields=['label', 'classProbabilities'],
                          accumulate=['id', 'sepal_length', 'petal_length'])
    10. Print the equivalent SQL query and the score result.
      # Print the query.
      print(result.show_query())
      # Print the result.
      print(result.result)
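Step 3.4 writes the MOJO into a tempfile.TemporaryDirectory. That directory is removed when the object is cleaned up, so the model file should be uploaded with save_byom() before that happens. A minimal standard-library sketch of that lifetime, using a placeholder file (the name model.zip and its contents are assumptions for this example, not a real MOJO):

```python
import os
import tempfile

temp_dir = tempfile.TemporaryDirectory()

# Placeholder standing in for the file that save_mojo() would write.
model_file_path = os.path.join(temp_dir.name, "model.zip")
with open(model_file_path, "wb") as f:
    f.write(b"placeholder bytes, not a real MOJO")

exists_before = os.path.exists(model_file_path)

# In the walkthrough, save_byom() would read the file at this point.

temp_dir.cleanup()  # deletes the directory and everything in it
exists_after = os.path.exists(model_file_path)

print(exists_before, exists_after)
```

Keeping a reference to temp_dir until the upload has finished avoids the directory disappearing early.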
  4. Clean up.
    # Delete the saved Model.
    delete_byom("h2o_gbm_iris", table_name="byom_models")
    # Drop models table.
    db_drop_table("byom_models")
    # Drop input data tables.
    db_drop_table("iris_input")
    # Run remove_context() to close the connection and garbage collect internally generated objects.
    remove_context()
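If an earlier step fails, the cleanup calls may never run. One possible way to guard against that is to register the cleanup steps with contextlib.ExitStack so they run even when an exception is raised. The sketch below is an assumption-laden illustration, not part of the teradataml workflow: `delete_model`, `drop_table`, and `close_connection` are hypothetical stand-ins for the real cleanup calls so that the example runs without a Vantage connection.

```python
from contextlib import ExitStack

calls = []

# Hypothetical stand-ins so the sketch runs without a Vantage connection.
def delete_model(model_id, table_name):
    calls.append(f"delete_model:{model_id}")

def drop_table(table_name):
    calls.append(f"drop_table:{table_name}")

def close_connection():
    calls.append("close_connection")

def run_with_cleanup(work):
    """Run work(), then always run the registered cleanup callbacks."""
    with ExitStack() as stack:
        # Callbacks run in reverse order of registration (last in, first out),
        # so the connection teardown is registered first to make it run last.
        stack.callback(close_connection)
        stack.callback(drop_table, "iris_input")
        stack.callback(drop_table, "byom_models")
        stack.callback(delete_model, "h2o_gbm_iris", table_name="byom_models")
        return work()

def failing_work():
    # Simulates a failure in the middle of the scoring workflow.
    raise RuntimeError("scoring failed")

try:
    run_with_cleanup(failing_work)
except RuntimeError:
    pass

print(calls)
```

Registering the connection teardown first means it runs last, after the model and tables have been removed.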