Using H2OPredict to Score using Externally Trained Models - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

This example uses the iris_input dataset and scores each row of the input table with a model that was previously trained in H2O and then loaded into the database.

  1. Set up the environment.
    1. Import required libraries.
      import tempfile
      import getpass
      from teradataml import (H2OPredict, DataFrame, load_example_data, create_context,
                              db_drop_table, remove_context, save_byom, delete_byom,
                              retrieve_byom, list_byom)
      from teradataml.options.configure import configure
    2. Create the connection to the database.
      con = create_context(host=getpass.getpass("Hostname: "),
                           username=getpass.getpass("Username: "),
                           password=getpass.getpass("Password: "))
    3. Load example data.
      load_example_data("byom", "iris_input")
      iris_input = DataFrame("iris_input")
  2. Create train dataset and test dataset.
    1. Create two samples of input data.
      This step splits the input data into two samples: sample 1 holds 80% of the rows and sample 2 holds 20%. The "sampleid" column records which sample each row belongs to.
      iris_sample = iris_input.sample(frac=[0.8, 0.2])
      iris_sample
    2. Create train dataset.
      This step creates the train dataset from sample 1 by filtering on "sampleid" and then dropping the "sampleid" column, which is not required for training the model.
      iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
      iris_train
    3. Create test dataset.
      This step creates the test dataset from sample 2 by filtering on "sampleid" and then dropping the "sampleid" column, which is not required for scoring.
      iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
      iris_test
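The split in step 2 can be pictured with a small local sketch. It is an analogue written in plain Python, not the teradataml implementation: the helper `split_rows`, the seed, and the 150 stand-in row ids are assumptions made for illustration. It tags every row with a sample id, then filters on that id in the same way the steps above filter on "sampleid".

```python
import random


def split_rows(rows, fractions=(0.8, 0.2), seed=0):
    """Tag each row with a 1-based sample id (hypothetical helper).

    Sample i receives roughly fractions[i - 1] of the rows, and no row
    lands in more than one sample.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    tagged = []
    start = 0
    cumulative = 0.0
    for sample_id, frac in enumerate(fractions, start=1):
        cumulative += frac
        end = round(cumulative * len(shuffled))
        tagged.extend((row, sample_id) for row in shuffled[start:end])
        start = end
    return tagged


rows = list(range(1, 151))  # stand-in row ids, not the real table
tagged = split_rows(rows)

# Filter on the sample id and drop it, mirroring the train/test steps.
train = [row for row, sample_id in tagged if sample_id == 1]
test = [row for row, sample_id in tagged if sample_id == 2]
```

Because every row carries exactly one sample id, the two filtered sets cannot overlap, which is what keeps the test rows out of training.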
  3. Train the Gradient Boosting Machine model and perform the prediction using H2OPredict().
    1. Import required libraries.
      import h2o
      from h2o.estimators import H2OGradientBoostingEstimator
    2. Prepare dataset to create a Gradient Boosting Machine model.

      Convert the teradataml DataFrame to a pandas DataFrame, because H2OFrame accepts a pandas DataFrame as input.

      h2o.init()
      iris_train_pd = iris_train.to_pandas()
      h2o_df = h2o.H2OFrame(iris_train_pd)
      h2o_df
    3. Train the Gradient Boosting Machine model.

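      Convert the response column to a factor so that H2O treats the task as classification, then train the model with 5-fold cross-validation.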

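      h2o_df["species"] = h2o_df["species"].asfactor()
      response = "species"
      # Exclude the response and the "id" row identifier from the predictors.
      predictors = [col for col in h2o_df.columns if col not in (response, "id")]
      gbm_model = H2OGradientBoostingEstimator(nfolds=5, seed=1111, keep_cross_validation_predictions=True)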
      gbm_model.train(x=predictors, y=response, training_frame=h2o_df)
    4. Save the model to a file in MOJO format.
      temp_dir = tempfile.TemporaryDirectory()
      # save_mojo() returns the full path of the MOJO file it writes.
      model_file_path = gbm_model.save_mojo(path=temp_dir.name, force=True)
    5. Save the model in Vantage.
      save_byom(model_id="h2o_gbm_iris", model_file=model_file_path, table_name="byom_models")
    6. List the model in Vantage.
      list_byom("byom_models")
    7. Retrieve the model from Vantage.
      model = retrieve_byom("h2o_gbm_iris", "byom_models")
    8. Set "configure.byom_install_location" to the database where BYOM functions are installed.
      configure.byom_install_location = getpass.getpass("byom_install_location: ")
    9. Score the test data using the H2OPredict function with the retrieved model.
      result = H2OPredict(newdata=iris_test,
                          newdata_partition_column='id',
                          newdata_order_column='id',
                          modeldata=model,
                          modeldata_order_column='model_id',
                          model_output_fields=['label', 'classProbabilities'],
                          accumulate=['id', 'sepal_length', 'petal_length'],
                          overwrite_cached_models='*',
                          enable_options='stageProbabilities',
                          model_type='OpenSource'
                         )
    10. Print the equivalent SQL query and the scoring result.
      print(result.show_query())
      result.result
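Steps 5 through 7 above store the model file in a table under a model identifier, list the stored identifiers, and read the model back. The following is a conceptual sketch of that round trip only. It uses an in-memory SQLite database as a stand-in for Vantage, and the helpers `save_model`, `list_models`, and `retrieve_model`, the two-column table layout, and the placeholder bytes are all assumptions for illustration, not the teradataml functions or their storage format.

```python
import sqlite3


def save_model(conn, model_id, model_bytes, table_name="byom_models"):
    """Store the raw model bytes under model_id (hypothetical helper)."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        "(model_id TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute(
        f"INSERT INTO {table_name} (model_id, model) VALUES (?, ?)",
        (model_id, model_bytes),
    )


def list_models(conn, table_name="byom_models"):
    """Return the identifiers of all stored models."""
    cursor = conn.execute(f"SELECT model_id FROM {table_name} ORDER BY model_id")
    return [row[0] for row in cursor]


def retrieve_model(conn, model_id, table_name="byom_models"):
    """Return the stored bytes for model_id, or raise KeyError."""
    row = conn.execute(
        f"SELECT model FROM {table_name} WHERE model_id = ?", (model_id,)
    ).fetchone()
    if row is None:
        raise KeyError(model_id)
    return row[0]


conn = sqlite3.connect(":memory:")       # stand-in for the Vantage connection
fake_mojo = b"placeholder model bytes"   # stand-in for the MOJO file contents

save_model(conn, "h2o_gbm_iris", fake_mojo)
stored_ids = list_models(conn)
restored = retrieve_model(conn, "h2o_gbm_iris")
```

The point of the sketch is that the model travels as an opaque binary value keyed by its identifier, so the scoring function can look it up by that key without the original training environment.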
  4. Clean up.
    # Delete the saved Model.
    delete_byom("h2o_gbm_iris", table_name="byom_models")
    # Drop models table.
    db_drop_table("byom_models")
    # Drop input data tables.
    db_drop_table("iris_input")
    # Run remove_context() to close the connection and garbage collect internally generated objects.
    remove_context()
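The cleanup step above covers the database objects but does not explicitly remove the temporary directory that holds the exported model file. A minimal sketch, assuming the directory is only needed until the model has been uploaded: using `tempfile.TemporaryDirectory()` as a context manager deletes the directory and its contents when the block exits. The file name `model.zip` and the bytes written are placeholders, not output from H2O.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    # Placeholder for the exported model file.
    model_file_path = os.path.join(temp_dir, "model.zip")
    with open(model_file_path, "wb") as handle:
        handle.write(b"placeholder model bytes")

    # Inside the block the file is available, for example for an upload.
    existed_inside = os.path.exists(model_file_path)

# After the block the directory and everything in it are gone.
existed_after = os.path.exists(temp_dir)
```

When the directory object is created without a `with` block, as in the steps above, calling its `cleanup()` method at the end has the same effect.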