H2OPredict
Description
The td_h2o_predict_sqle() function performs a prediction on each row of the
input table using a model previously trained in H2O and then loaded into the
database. The model uses an interchange format called MOJO, and the user loads
it into a table in the Teradata database as a BLOB.
The model data prepared by the user should have a model ID for each model
(stored as a MOJO object).
td_h2o_predict_sqle() supports Driverless AI and H2O-3 MOJO models.
H2O Driverless AI (DAI) provides a number of transformations.
The following transformers are available for regression and classification
(multiclass and binary) experiments:
Numeric
Categorical
Time and Date
Time Series
NLP (text)
Image
Usage
td_h2o_predict_sqle (
modeldata = NULL,
newdata = NULL,
accumulate = NULL,
model.output.fields = NULL,
overwrite.cached.models = NULL,
model.type = "OpenSource",
enable.options = NULL,
is.debug = FALSE,
...
)
Arguments
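modeldata
Required Argument.
Specifies the tbl_teradata containing the model data (one MOJO model per model ID) to be used for scoring.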
newdata
Required Argument.
Specifies the tbl_teradata containing the input data to be scored.
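accumulate
Required Argument.
Specifies the names of the input columns to copy from "newdata" to the output, for example c('id', 'sepal_length', 'petal_length').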
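model.output.fields
Optional Argument.
Specifies the fields of the JSON output to return as individual columns instead of the entire JSON report, for example c('label', 'classProbabilities').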
overwrite.cached.models
Optional Argument.
Specifies the model(s) to remove from the model cache. Specifying '*' erases the entire cache.
model.type
Optional Argument.
Specifies the type of the H2O model.
Default Value: "OpenSource"
enable.options
Optional Argument.
Specifies the feature options to enable. When no feature options are specified, all features are treated as disabled and the corresponding values are not populated in the output JSON.
Permitted Values: "contributions", "stageProbabilities", "leafNodeAssignments"
is.debug
Optional Argument.
Specifies whether to write debug information to the trace table.
Default Value: FALSE
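...
Specifies the generic keyword arguments SQLE functions accept. The generic keyword arguments are:
volatile: Specifies whether to put the results of the function in a volatile table.
The function also allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and are named after that argument, for example newdata.partition.column, newdata.order.column and modeldata.order.column, as used in the examples.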
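Because enable.options accepts only the permitted values listed above, and NULL means that no feature option is enabled, a caller could check the value before calling td_h2o_predict_sqle(). The following is a minimal base R sketch under that assumption; check_enable_options() is a hypothetical helper, not part of tdplyr, and it does not reproduce the function's own validation.

```r
# Hypothetical helper (not part of tdplyr): checks that the values intended for
# enable.options are among the permitted values documented for the argument.
check_enable_options <- function(opts) {
  permitted <- c("contributions", "stageProbabilities", "leafNodeAssignments")
  # NULL means no feature option is enabled, so nothing is populated.
  if (is.null(opts)) {
    return(character(0))
  }
  # Exact matching only: setdiff() does not accept abbreviations.
  bad <- setdiff(opts, permitted)
  if (length(bad) > 0) {
    stop("Unsupported enable.options value(s): ", paste(bad, collapse = ", "))
  }
  unique(opts)
}

check_enable_options("stageProbabilities")
check_enable_options(c("contributions", "leafNodeAssignments"))
```

Exact matching with setdiff() is used instead of match.arg(), which would silently accept partial names such as "contrib".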
Value
Function returns an object of class "td_h2o_predict_sqle",
which is a named list containing an object of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the name: result
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load example data.
loadExampleData("pmmlpredict_example", "iris_test")
# Create tbl_teradata object.
iris_test <- tbl(con, "iris_test")
# Set install location of BYOM functions.
options(byom.install.location = "mldb")
# Check the list of available analytic functions.
display_analytic_functions(type="BYOM")
# Example 1: Score the data in Vantage using a GLM model generated outside of
# Vantage. The example performs prediction with the td_h2o_predict_sqle()
# function using this GLM model in MOJO format generated by H2O.
# Values are specified for "model.type", "enable.options",
# "model.output.fields" and "overwrite.cached.models". Setting
# overwrite.cached.models='*' erases the entire model cache.
# Load the model files into Vantage.
# Create the following table in Vantage if it does not exist.
crt_tbl <- "CREATE SET TABLE byom_models(model_id VARCHAR(40), model BLOB)
PRIMARY INDEX (model_id);"
DBI::dbExecute(con, sql(crt_tbl))
# Run the following query through BTEQ or Teradata Studio to load the
# models. 'load_byom_model.txt' and the BYOM model files can be found under
# 'inst/scripts' in the tdplyr installation directory. This file and the
# models to be loaded must be in the same directory.
# .import vartext file load_byom_model.txt
# .repeat *
# USING (c1 VARCHAR(40), c2 BLOB AS DEFERRED BY NAME) INSERT INTO byom_models(:c1, :c2);
# Retrieve model.
modeldata <- tbl(con, "byom_models")
result <- td_h2o_predict_sqle(
newdata=iris_test,
newdata.partition.column='id',
newdata.order.column='id',
modeldata=modeldata,
modeldata.order.column='model_id',
model.output.fields=c('label', 'classProbabilities'),
accumulate=c('id', 'sepal_length', 'petal_length'),
overwrite.cached.models='*',
enable.options='stageProbabilities',
model.type='OpenSource'
)
# Print the results.
print(result$result)
# Example 2: Score the data in Vantage using an XGBoost model generated
# outside of Vantage. The example performs prediction with the
# td_h2o_predict_sqle() function using this XGBoost model in MOJO format
# generated by H2O. Values are specified for "model.type", "enable.options",
# "model.output.fields" and "overwrite.cached.models". Setting
# overwrite.cached.models='*' erases the entire model cache.
# Retrieve model.
modeldata <- tbl(con, "byom_models")
result <- td_h2o_predict_sqle(
newdata=iris_test,
newdata.partition.column='id',
newdata.order.column='id',
modeldata=modeldata,
modeldata.order.column='model_id',
model.output.fields=c('label', 'classProbabilities'),
accumulate=c('id', 'sepal_length', 'petal_length'),
overwrite.cached.models='*',
enable.options='stageProbabilities',
model.type='OpenSource'
)
# Print the results.
print(result$result)
# Example 3: Demonstrate trace table usage with
# is.debug=TRUE.
# Create the trace table if not present.
crt_tbl_query <- 'CREATE GLOBAL TEMPORARY TRACE TABLE BYOM_Trace
(vproc_ID BYTE(2)
,Sequence INTEGER
,Trace_Output VARCHAR(31000) CHARACTER SET LATIN NOT CASESPECIFIC)
ON COMMIT PRESERVE ROWS;'
DBI::dbExecute(con, sql(crt_tbl_query))
# Turn on tracing for the session.
DBI::dbExecute(con, "SET SESSION FUNCTION TRACE USING '' FOR TABLE BYOM_Trace;")
modeldata <- tbl(con, "byom_models")
# Execute the td_h2o_predict_sqle() function using is.debug=TRUE.
result <- td_h2o_predict_sqle(
newdata=iris_test,
newdata.partition.column='id',
newdata.order.column='id',
modeldata=modeldata,
modeldata.order.column='model_id',
model.output.fields=c('label', 'classProbabilities'),
accumulate=c('id', 'sepal_length', 'petal_length'),
overwrite.cached.models='*',
enable.options='stageProbabilities',
model.type='OpenSource',
is.debug=TRUE
)
# Print the results.
print(result$result)
# View the trace table information.
trace_df <- DBI::dbGetQuery(con, "select * from BYOM_Trace")
print(trace_df)
# Turn off tracing for the session.
DBI::dbExecute(con, "SET SESSION FUNCTION TRACE OFF;")