H2OPredict() using KMeans model.¶
Setup¶
In [1]:
import tempfile
import getpass
import teradataml as td
from teradataml import create_context, remove_context, load_example_data, DataFrame, \
db_drop_table, save_byom, retrieve_byom, delete_byom, list_byom
from teradataml.options.configure import configure
from teradataml.analytics.byom.H2OPredict import H2OPredict
import h2o
In [2]:
# Create the connection.
host = getpass.getpass("Host: ")
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
con = create_context(host=host, username=username, password=password)
Load example data and use sample() to split it into training and testing datasets.¶
In [3]:
load_example_data("byom", "iris_input")
iris_input = DataFrame("iris_input")
# Create 2 samples of input data - sample 1 will have 80% of total rows and sample 2 will have 20% of total rows.
iris_sample = iris_input.sample(frac=[0.8, 0.2])
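To make the split above concrete, here is a minimal pure-Python sketch of what an 80/20 fractional sample amounts to: each row lands in exactly one sample, identified by a 1-based sample id. The helper `split_by_fraction`, the seed, and the 150 row ids are assumptions made for illustration. This is not how teradataml's `sample()` is implemented, since that function runs the sampling inside Vantage.

```python
import random


def split_by_fraction(row_ids, fractions, seed=42):
    """Shuffle the row ids and cut them into consecutive samples keyed by 1-based sample id."""
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)
    samples = {}
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        size = round(len(shuffled) * frac)
        samples[sample_id] = shuffled[start:start + size]
        start += size
    return samples


# Hypothetical input: 150 row ids, split 80% / 20%.
samples = split_by_fraction(range(1, 151), [0.8, 0.2])
train_ids, test_ids = samples[1], samples[2]
print(len(train_ids), len(test_ids))
```

Filtering on the sample id afterwards, as the next cells do with "sampleid", then yields disjoint training and testing sets.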
In [4]:
# Create train dataset from sample 1 by filtering on "sampleid" and drop "sampleid" column as it is not required for training model.
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
iris_train
Out[4]:
In [5]:
# Create test dataset from sample 2 by filtering on "sampleid" and drop "sampleid" column as it is not required for scoring.
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
iris_test
Out[5]:
Prepare dataset for creating a KMeans model.¶
In [6]:
h2o.init()
# H2OFrame accepts a pandas DataFrame, so convert the teradataml DataFrame to pandas first.
iris_train_pd = iris_train.to_pandas()
h2o_df_train = h2o.H2OFrame(iris_train_pd)
h2o_df_train
Out[6]:
In [7]:
# H2OFrame accepts a pandas DataFrame, so convert the teradataml DataFrame to pandas first.
iris_test_pd = iris_test.to_pandas()
h2o_df_test = h2o.H2OFrame(iris_test_pd)
h2o_df_test
Out[7]:
Train KMeans Model.¶
In [8]:
# Import required libraries.
from h2o.estimators import H2OKMeansEstimator
In [9]:
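# KMeans is unsupervised, so train() needs only the feature columns and no response column.
# Exclude the label "species" and the row identifier "id" (if present) so that neither influences the clusters.
predictors = [col for col in h2o_df_train.columns if col not in ("id", "species")]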
In [10]:
iris_kmeans = H2OKMeansEstimator(k=10, estimate_k=True, standardize=False, seed=1234)
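The estimator above is created with standardize=False, so distances are computed on the raw feature values. The sketch below uses three made-up rows to show why that choice matters: without standardization, the feature with the larger numeric range dominates the Euclidean distance, and the nearest neighbour of a row can change once the columns are z-scored. The helper names and the data are hypothetical and this is not H2O code.

```python
import math


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest_other(rows, i):
    """Return the index of the row closest to rows[i], excluding rows[i] itself."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


# Hypothetical rows: (large-scale feature, small-scale feature).
rows = [[100.0, 5.0], [110.0, 7.0], [115.0, 5.1]]

nearest_raw = nearest_other(rows, 0)
nearest_standardized = nearest_other(zscore_columns(rows), 0)
print(nearest_raw, nearest_standardized)
```

On raw values, row 0 is closest to row 1 because the first feature dominates. After standardization it is closest to row 2, which has almost the same value in the second feature.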
In [11]:
iris_kmeans.train(x=predictors, training_frame=h2o_df_train, validation_frame=h2o_df_test)
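For readers who want to see what a k-means fit does conceptually, the following is a minimal pure-Python sketch of Lloyd's algorithm: assign every point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until nothing changes. It assumes fixed, hand-picked starting centroids and made-up 2-D points. H2O's estimator differs in several ways (for example its initialization and the estimate_k option), so treat this as an illustration of the idea only.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

Scoring new rows with a fitted k-means model corresponds to the assignment step alone: each row receives the index of its nearest centroid.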
Save the model in MOJO format.¶
In [12]:
# Saving H2O Model to a file.
temp_dir = tempfile.TemporaryDirectory()
model_file_path = iris_kmeans.save_mojo(path=temp_dir.name, force=True)
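The MOJO file lives inside a `tempfile.TemporaryDirectory`, so it exists only as long as that directory does. The standard-library sketch below shows the lifecycle with a placeholder file standing in for the MOJO (the file name `model.zip` and its contents are assumptions): the file is readable until `cleanup()` is called and is gone afterwards. In this notebook that means the directory should stay alive until the model file has been read for saving to Vantage.

```python
import os
import tempfile

temp_dir = tempfile.TemporaryDirectory()

# Hypothetical stand-in for the model file that would be written into the directory.
artifact_path = os.path.join(temp_dir.name, "model.zip")
with open(artifact_path, "wb") as f:
    f.write(b"placeholder bytes")

exists_before = os.path.exists(artifact_path)

# Explicit cleanup removes the directory and everything in it.
temp_dir.cleanup()
exists_after = os.path.exists(artifact_path)

print(exists_before, exists_after)
```

Calling `cleanup()` explicitly once the file is no longer needed avoids relying on garbage collection to remove the directory.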
Save the model in Vantage.¶
In [13]:
# Save the H2O Model in Vantage.
save_byom("h2o_kmeans_iris", model_file_path, "byom_models")
List the models from Vantage.¶
In [14]:
# List the models from "byom_models".
list_byom("byom_models")
Retrieve the model from Vantage.¶
In [15]:
# Retrieve the model from table "byom_models", using the model id 'h2o_kmeans_iris'.
modeldata = retrieve_byom("h2o_kmeans_iris", "byom_models")
Set "configure.byom_install_location" to the database where BYOM functions are installed.¶
In [16]:
configure.byom_install_location = getpass.getpass("byom_install_location: ")
Score the model.¶
In [17]:
result = H2OPredict(newdata=iris_test,
                    newdata_partition_column='id',
                    newdata_order_column='id',
                    modeldata=modeldata,
                    modeldata_order_column='model_id',
                    accumulate=['id', 'sepal_length', 'petal_length'],
                    overwrite_cached_models='*',
                    model_type='OpenSource'
                    )
In [18]:
# Print the query.
print(result.show_query())
In [19]:
# Print the result.
result.result
Out[19]:
Cleanup.¶
In [20]:
# Delete the model from table "byom_models", using the model id 'h2o_kmeans_iris'.
delete_byom("h2o_kmeans_iris", "byom_models")
In [21]:
# Drop models table.
db_drop_table("byom_models")
Out[21]:
In [22]:
# Drop input data table.
db_drop_table("iris_input")
Out[22]:
In [23]:
# One must run remove_context() to close the connection and garbage collect internally generated objects.
remove_context()
Out[23]: