H2OPredict() using a Deep Learning model.¶
Setup¶
In [1]:
import getpass
import tempfile
from teradataml import create_context, DataFrame, save_byom, retrieve_byom, \
delete_byom, list_byom, remove_context, load_example_data, db_drop_table
from teradataml.options.configure import configure
from teradataml.analytics.byom.H2OPredict import H2OPredict
import h2o
In [2]:
# Create the connection.
host = getpass.getpass("Host: ")
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
con = create_context(host=host, username=username, password=password)
Host: ········ Username: ········ Password: ········
Load example data and use sample() to split the input data into training and testing datasets.¶
In [3]:
load_example_data("byom", "iris_input")
WARNING: Skipped loading table iris_input since it already exists in the database.
In [4]:
iris_input = DataFrame("iris_input")
In [5]:
# Create 2 samples of input data - sample 1 will have 80% of total rows and sample 2 will have 20% of total rows.
iris_sample = iris_input.sample(frac=[0.8, 0.2])
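The split above happens inside Vantage, but its logic can be sketched with the standard library alone. This is a minimal illustration (not the teradataml implementation) assuming 150 hypothetical row ids, showing how `sample(frac=[0.8, 0.2])` conceptually partitions rows into two samples tagged by `sampleid`:

```python
import random

# Hypothetical stand-in for the 150 iris row ids.
row_ids = list(range(1, 151))

random.seed(42)  # for reproducibility of this sketch only
random.shuffle(row_ids)

# sample(frac=[0.8, 0.2]) conceptually tags ~80% of the rows with
# sampleid "1" and the remaining ~20% with sampleid "2".
cut = int(len(row_ids) * 0.8)
sample_1 = row_ids[:cut]   # training rows (sampleid == "1")
sample_2 = row_ids[cut:]   # testing rows  (sampleid == "2")

print(len(sample_1), len(sample_2))  # 120 30
```

Filtering on `sampleid` and dropping the column, as done in the next two cells, then yields disjoint train and test sets.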
In [6]:
# Create the train dataset from sample 1 by filtering on "sampleid", and drop the "sampleid" column as it is not required for training the model.
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis=1)
iris_train
Out[6]:
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
17 | 5.4 | 3.9 | 1.3 | 0.4 | 1 |
38 | 4.9 | 3.6 | 1.4 | 0.1 | 1 |
76 | 6.6 | 3.0 | 4.4 | 1.4 | 2 |
122 | 5.6 | 2.8 | 4.9 | 2.0 | 3 |
59 | 6.6 | 2.9 | 4.6 | 1.3 | 2 |
80 | 5.7 | 2.6 | 3.5 | 1.0 | 2 |
120 | 6.0 | 2.2 | 5.0 | 1.5 | 3 |
118 | 7.7 | 3.8 | 6.7 | 2.2 | 3 |
19 | 5.7 | 3.8 | 1.7 | 0.3 | 1 |
61 | 5.0 | 2.0 | 3.5 | 1.0 | 2 |
In [7]:
# Create the test dataset from sample 2 by filtering on "sampleid", and drop the "sampleid" column as it is not required for scoring.
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis=1)
iris_test
Out[7]:
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
66 | 6.7 | 3.1 | 4.4 | 1.4 | 2 |
97 | 5.7 | 2.9 | 4.2 | 1.3 | 2 |
146 | 6.7 | 3.0 | 5.2 | 2.3 | 3 |
118 | 7.7 | 3.8 | 6.7 | 2.2 | 3 |
62 | 5.9 | 3.0 | 4.2 | 1.5 | 2 |
38 | 4.9 | 3.6 | 1.4 | 0.1 | 1 |
108 | 7.3 | 2.9 | 6.3 | 1.8 | 3 |
24 | 5.1 | 3.3 | 1.7 | 0.5 | 1 |
150 | 5.9 | 3.0 | 5.1 | 1.8 | 3 |
122 | 5.6 | 2.8 | 4.9 | 2.0 | 3 |
Prepare the dataset for creating a Deep Learning model.¶
In [8]:
h2o.init()
# H2OFrame accepts a pandas DataFrame, so convert the teradataml DataFrame to pandas first.
iris_train_pd = iris_train.to_pandas()
h2o_df = h2o.H2OFrame(iris_train_pd)
h2o_df
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)
Starting server from C:\Users\pg255042\Anaconda3\envs\teraml\lib\site-packages\h2o\backend\bin\h2o.jar
Ice root: C:\Users\pg255042\AppData\Local\Temp\tmpncr7fdcl
JVM stdout: C:\Users\pg255042\AppData\Local\Temp\tmpncr7fdcl\h2o_pg255042_started_from_python.out
JVM stderr: C:\Users\pg255042\AppData\Local\Temp\tmpncr7fdcl\h2o_pg255042_started_from_python.err
Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
Warning: Your H2O cluster version is too old (5 months and 1 day)! Please download and install the latest version from http://h2o.ai/download/
H2O_cluster_uptime: | 02 secs |
H2O_cluster_timezone: | Asia/Kolkata |
H2O_data_parsing_timezone: | UTC |
H2O_cluster_version: | 3.34.0.1 |
H2O_cluster_version_age: | 5 months and 1 day !!! |
H2O_cluster_name: | H2O_from_python_pg255042_n0d5ha |
H2O_cluster_total_nodes: | 1 |
H2O_cluster_free_memory: | 7.052 Gb |
H2O_cluster_total_cores: | 0 |
H2O_cluster_allowed_cores: | 0 |
H2O_cluster_status: | locked, healthy |
H2O_connection_url: | http://127.0.0.1:54321 |
H2O_connection_proxy: | {"http": null, "https": null} |
H2O_internal_security: | False |
H2O_API_Extensions: | Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4 |
Python_version: | 3.6.12 final |
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|
5 | 2 | 3.5 | 1 | 2 |
6.3 | 3.3 | 6 | 2.5 | 3 |
5.1 | 3.4 | 1.5 | 0.2 | 1 |
5.7 | 3.8 | 1.7 | 0.3 | 1 |
6.7 | 3 | 5 | 1.7 | 2 |
6.7 | 3.1 | 5.6 | 2.4 | 3 |
6.3 | 3.3 | 4.7 | 1.6 | 2 |
6.6 | 2.9 | 4.6 | 1.3 | 2 |
6.4 | 3.2 | 5.3 | 2.3 | 3 |
5.4 | 3.9 | 1.3 | 0.4 | 1 |
Out[8]:
Train Deep Learning Model.¶
In [9]:
# Import the required estimator.
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
In [10]:
# Convert the response column to a factor so H2O treats this as a classification problem.
h2o_df["species"] = h2o_df["species"].asfactor()
predictors = h2o_df.columns
response = "species"
In [11]:
dl_model = H2ODeepLearningEstimator()
In [12]:
dl_model.train(x=predictors, y=response, training_frame=h2o_df)
deeplearning Model Build progress: |█████████████████████████████████████████████| (done) 100%
Model Details
=============
H2ODeepLearningEstimator : Deep Learning
Model Key: DeepLearning_model_python_1644984159615_1

Status of Neuron Layers: predicting species, 3-class classification, multinomial distribution, CrossEntropy loss, 41,803 weights/biases, 498.3 KB, 1,200 training samples, mini-batch size 1
layer | units | type | dropout | l1 | l2 | mean_rate | rate_rms | momentum | mean_weight | weight_rms | mean_bias | bias_rms | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 4 | Input | 0 | ||||||||||
1 | 2 | 200 | Rectifier | 0 | 0 | 0 | 0.00446281 | 0.00367723 | 0 | -0.00802765 | 0.107465 | 0.491997 | 0.0113614 | |
2 | 3 | 200 | Rectifier | 0 | 0 | 0 | 0.0263239 | 0.0926596 | 0 | -0.00342238 | 0.0713205 | 0.982641 | 0.158966 | |
3 | 4 | 3 | Softmax | 0 | 0 | 0.0154025 | 0.0718914 | 0 | -0.0056864 | 0.400361 | -0.000312151 | 0.00322719 |
ModelMetricsMultinomial: deeplearning
** Reported on train data. **

MSE: 0.07083642778722152
RMSE: 0.26615113711427485
LogLoss: 0.2858728266432885
Mean Per-Class Error: 0.08333333333333333
AUC: NaN
AUCPR: NaN
Multinomial auc values: Table is not computed because it is disabled (model parameter 'auc_type' is set to AUTO or NONE) or due to domain size (maximum is 50 domains).
Multinomial auc_pr values: Table is not computed because it is disabled (model parameter 'auc_type' is set to AUTO or NONE) or due to domain size (maximum is 50 domains).

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
1 | 2 | 3 | Error | Rate | |
---|---|---|---|---|---|
0 | 43.0 | 0.0 | 0.0 | 0.000000 | 0 / 43 |
1 | 0.0 | 37.0 | 0.0 | 0.000000 | 0 / 37 |
2 | 0.0 | 10.0 | 30.0 | 0.250000 | 10 / 40 |
3 | 43.0 | 47.0 | 30.0 | 0.083333 | 10 / 120 |
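The reported Mean Per-Class Error can be checked against the confusion matrix: it is the average of the per-class error rates, which here happens to coincide with the overall error rate of 10/120. A quick verification:

```python
# Per-class error rates read off the confusion matrix above:
# class 1: 0 errors out of 43, class 2: 0 out of 37, class 3: 10 out of 40.
per_class_error = [0 / 43, 0 / 37, 10 / 40]

mean_per_class_error = sum(per_class_error) / len(per_class_error)
print(mean_per_class_error)  # 0.08333333333333333, matching the reported metric

# The overall error rate agrees here only because of the near class balance.
overall_error = 10 / 120
```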
Top-3 Hit Ratios:
k | hit_ratio | |
---|---|---|
0 | 1 | 0.916667 |
1 | 2 | 1.000000 |
2 | 3 | 1.000000 |
Scoring History:
timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_logloss | training_r2 | training_classification_error | training_auc | training_pr_auc | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-02-16 09:32:43 | 0.000 sec | None | 0.0 | 0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | |
1 | 2022-02-16 09:32:44 | 1.183 sec | 779 obs/sec | 1.0 | 1 | 120.0 | 0.312775 | 0.394377 | 0.858434 | 0.141667 | NaN | NaN | |
2 | 2022-02-16 09:32:45 | 1.293 sec | 4580 obs/sec | 10.0 | 10 | 1200.0 | 0.266151 | 0.285873 | 0.897493 | 0.083333 | NaN | NaN |
Variable Importances:
variable | relative_importance | scaled_importance | percentage | |
---|---|---|---|---|
0 | petal_width | 1.000000 | 1.000000 | 0.268083 |
1 | sepal_length | 0.931197 | 0.931197 | 0.249638 |
2 | petal_length | 0.924395 | 0.924395 | 0.247814 |
3 | sepal_width | 0.874601 | 0.874601 | 0.234465 |
Out[12]:
Save the model in MOJO format.¶
In [13]:
# Save the trained H2O model to a file in MOJO format.
temp_dir = tempfile.TemporaryDirectory()
model_file_path = dl_model.save_mojo(path=f"{temp_dir.name}", force=True)
Save the model in Vantage.¶
In [16]:
# Save the H2O model in Vantage.
save_byom(model_id="h2o_dl_iris", model_file=model_file_path, table_name="byom_models")
Model is saved.
List the models from Vantage.¶
In [17]:
# List the models from "byom_models".
list_byom("byom_models")
model_id | model |
---|---|
h2o_dl_iris | b'504B03041400080808...' |
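The leading hex bytes `504B0304` shown by list_byom are not arbitrary: a MOJO is a ZIP archive, so the BLOB stored by save_byom starts with the standard ZIP local-file-header signature. A small check:

```python
# First bytes of the stored model, as shown by list_byom above.
magic = bytes.fromhex("504B0304")

print(magic)  # b'PK\x03\x04' -- the ZIP local-file-header signature
```

This is a handy sanity check that the bytes saved in the `byom_models` table really are the MOJO archive.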
Retrieve the model from Vantage.¶
In [18]:
model=retrieve_byom(model_id="h2o_dl_iris", table_name="byom_models")
Set "configure.byom_install_location" to the database where BYOM functions are installed.¶
In [19]:
configure.byom_install_location = getpass.getpass("byom_install_location: ")
byom_install_location: ········
Score the model.¶
In [20]:
result = H2OPredict(newdata=iris_test,
newdata_partition_column='id',
newdata_order_column='id',
modeldata=model,
modeldata_order_column='model_id',
model_output_fields=['label', 'classProbabilities'],
accumulate=['id', 'sepal_length', 'petal_length'],
overwrite_cached_models='*',
enable_options='stageProbabilities',
model_type='OpenSource'
)
In [21]:
# Print the query.
print(result.show_query())
SELECT * FROM "mldb".H2OPredict(
    ON "MLDB"."ml__select__1644990039784978" AS InputTable
    PARTITION BY "id" ORDER BY "id"
    ON (select model_id,model from "MLDB"."ml__filter__1644985511527604") AS ModelTable
    DIMENSION ORDER BY "model_id"
    USING
    Accumulate('id','sepal_length','petal_length')
    ModelOutputFields('label','classProbabilities')
    OverwriteCachedModel('*')
    EnableOptions('stageProbabilities')
) as sqlmr
In [22]:
# Print the result.
result.result
Out[22]:
id | sepal_length | petal_length | prediction | label | classprobabilities |
---|---|---|---|---|---|
65 | 5.6 | 3.6 | 2 | 2 | {"1": 2.913569781015937E-6,"2": 0.9999970798430343,"3": 6.5871847449533734E-9} |
24 | 5.1 | 1.7 | 1 | 1 | {"1": 0.9997385917776745,"2": 2.614082223254326E-4,"3": 3.965376132206562E-21} |
73 | 6.3 | 4.9 | 2 | 2 | {"1": 3.7928238391692515E-12,"2": 0.9996728607633336,"3": 3.271392328735716E-4} |
30 | 4.7 | 1.6 | 1 | 1 | {"1": 0.9999850304779964,"2": 1.4969522003627505E-5,"3": 1.1243029210372163E-23} |
47 | 5.1 | 1.6 | 1 | 1 | {"1": 0.9999996053508485,"2": 3.946491514700323E-7,"3": 3.1879575130755833E-25} |
12 | 4.8 | 1.6 | 1 | 1 | {"1": 0.999996809388207,"2": 3.1906117930260754E-6,"3": 2.1626858803931788E-24} |
35 | 4.9 | 1.5 | 1 | 1 | {"1": 0.999831072526406,"2": 1.6892747359393979E-4,"3": 4.731732528448486E-23} |
37 | 5.5 | 1.3 | 1 | 1 | {"1": 0.9998930293707516,"2": 1.069706292484114E-4,"3": 6.175983138020294E-24} |
36 | 5.0 | 1.2 | 1 | 1 | {"1": 0.9999369345961132,"2": 6.306540388681125E-5,"3": 6.3237409338068505E-24} |
1 | 5.1 | 1.4 | 1 | 1 | {"1": 0.9999935881770767,"2": 6.411822923278436E-6,"3": 1.293839606039162E-24} |
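The classprobabilities column is a JSON string mapping each class label to its predicted probability, so it can be post-processed with the standard json module. A minimal sketch, using the probabilities copied from the row with id 65 above:

```python
import json

# classprobabilities for the row with id 65, copied from the result above.
probs_json = '{"1": 2.913569781015937E-6,"2": 0.9999970798430343,"3": 6.5871847449533734E-9}'

probs = json.loads(probs_json)

# The class with the highest probability matches the prediction/label columns.
predicted_label = max(probs, key=probs.get)
print(predicted_label)  # 2
```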
Cleanup.¶
In [23]:
# Delete the saved Model.
delete_byom("h2o_dl_iris", table_name="byom_models")
Model is deleted.
In [24]:
# Drop model table.
db_drop_table("byom_models")
Out[24]:
True
In [25]:
# Drop input data table.
db_drop_table("iris_input")
Out[25]:
True
In [26]:
# Run remove_context() to close the connection and garbage collect internally generated objects.
remove_context()
Out[26]:
True
In [ ]: