This example uses the iris_input dataset and performs a prediction on each row of the input table using a model previously trained in H2O and then loaded into the database.
- Set up the environment.
- Import required libraries.
import tempfile
import getpass
from teradataml import H2OPredict, DataFrame, load_example_data, create_context, db_drop_table, remove_context, save_byom, delete_byom, retrieve_byom, list_byom
from teradataml.options.configure import configure
- Create the connection to database.
con = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
- Load example data.
load_example_data("byom", "iris_input")
iris_input = DataFrame("iris_input")
- Create train dataset and test dataset.
- Create two samples of input data. This step creates two samples of the input data: sample 1 has 80% of the total rows and sample 2 has 20% of the total rows.
iris_sample = iris_input.sample(frac=[0.8, 0.2])
iris_sample
- Create the train dataset. This step creates the train dataset from sample 1 by filtering on "sampleid" and dropping the "sampleid" column, as it is not required for training the model.
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis=1)
iris_train
- Create the test dataset. This step creates the test dataset from sample 2 by filtering on "sampleid" and dropping the "sampleid" column, as it is not required for scoring.
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis=1)
iris_test
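Optionally, sanity-check the split before training; the shape property of a teradataml DataFrame returns its (row count, column count), so the two splits should roughly follow the 80/20 ratio.
# Optional check: row and column counts of the train and test splits.
print(iris_train.shape)
print(iris_test.shape)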
- Train the Gradient Boosting Machine model and perform the Prediction using H2OPredict().
- Import required libraries.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
- Prepare dataset to create a Gradient Boosting Machine model.
Convert the teradataml DataFrame to a pandas DataFrame, since H2OFrame accepts a pandas DataFrame.
h2o.init()
iris_train_pd = iris_train.to_pandas()
h2o_df = h2o.H2OFrame(iris_train_pd)
h2o_df
- Train the Gradient Boosting Machine model.
Add the code to train the model.
h2o_df["species"] = h2o_df["species"].asfactor() predictors = h2o_df.columns response = "species"
gbm_model = H2OGradientBoostingEstimator(nfolds=5, seed=1111, keep_cross_validation_predictions=True)
gbm_model.train(x=predictors, y=response, training_frame=h2o_df)
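Optionally, review the cross-validated performance of the trained model before exporting it; this check uses H2O's model_performance() method and is not part of the original workflow.
# Optional: print cross-validated performance metrics for the trained GBM model.
perf = gbm_model.model_performance(xval=True)
print(perf)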
- Save the model to a file in MOJO format.
temp_dir = tempfile.TemporaryDirectory()
model_file_path = gbm_model.save_mojo(path=f"{temp_dir.name}", force=True)
- Save the model in Vantage.
save_byom(model_id="h2o_gbm_iris", model_file=model_file_path, table_name="byom_models")
- List the model in Vantage.
list_byom("byom_models")
- Retrieve the model from Vantage.
model = retrieve_byom("h2o_gbm_iris", "byom_models")
- Set "configure.byom_install_location" to the database where BYOM functions are installed.
configure.byom_install_location = getpass.getpass("byom_install_location: ")
- Score the test data using H2OPredict function with the retrieved model.
result = H2OPredict(newdata=iris_test,
                    newdata_partition_column='id',
                    newdata_order_column='id',
                    modeldata=model,
                    modeldata_order_column='model_id',
                    model_output_fields=['label', 'classProbabilities'],
                    accumulate=['id', 'sepal_length', 'petal_length'],
                    overwrite_cached_models='*',
                    enable_options='stageProbabilities',
                    model_type='OpenSource')
- Print the equivalent SQL query and the score result.
print(result.show_query())
result.result
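If you want to inspect the scored rows on the client, result.result is a teradataml DataFrame and can be converted to pandas; a minimal optional sketch:
# Optional: pull the scored rows to the client as a pandas DataFrame.
result_pd = result.result.to_pandas()
print(result_pd.head())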
- Clean up.
# Delete the saved model.
delete_byom("h2o_gbm_iris", table_name="byom_models")
# Drop the models table.
db_drop_table("byom_models")
# Drop the input data table.
db_drop_table("iris_input")
# Run remove_context() to close the connection and garbage collect internally generated objects.
remove_context()
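Optionally, also shut down the local H2O cluster started by h2o.init() and remove the temporary MOJO directory; this extra clean-up is not part of the original example and assumes the objects from the earlier steps are still in scope.
# Optional: stop the local H2O cluster and delete the temporary directory holding the MOJO file.
h2o.cluster().shutdown()
temp_dir.cleanup()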