Teradata Package for Python Function Reference | 17.10 - H2OPredict - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

teradataml.analytics.byom.H2OPredict.__init__ = __init__(self, modeldata=None, newdata=None, accumulate=None, model_output_fields=None, overwrite_cached_models=None, model_type='OpenSource', enable_options=None, newdata_partition_column='ANY', newdata_order_column=None, modeldata_order_column=None)

DESCRIPTION:
    The H2OPredict function performs a prediction on each row of the input
    table using a model previously trained in H2O and then loaded into the
    database. The model uses an interchange format called MOJO, and the user
    loads it into a table in the Teradata database as a BLOB. The model table
    prepared by the user should have a model id for each model (residing as a
    MOJO object) created by the user.

PARAMETERS:
    modeldata:
        Required Argument.
        Specifies the model teradataml DataFrame to be used for scoring.

    modeldata_order_column:
        Optional Argument.
        Specifies Order By columns for "modeldata".
        Values to this argument can be provided as a list, if multiple
        columns are used for ordering.
        Types: str OR list of Strings (str)

    newdata:
        Required Argument.
        Specifies the input teradataml DataFrame that contains the data
        to be scored.

    newdata_partition_column:
        Optional Argument.
        Specifies Partition By columns for "newdata".
        Values to this argument can be provided as a list, if multiple
        columns are used for partition.
        Default Value: ANY
        Types: str OR list of Strings (str)

    newdata_order_column:
        Optional Argument.
        Specifies Order By columns for "newdata".
        Values to this argument can be provided as a list, if multiple
        columns are used for ordering.
        Types: str OR list of Strings (str)

    accumulate:
        Required Argument.
        Specifies the names of the input columns from "newdata" DataFrame
        to copy to the output DataFrame.
        Types: str OR list of Strings (str)

    model_output_fields:
        Optional Argument.
        Specifies the fields of the JSON output to return as individual
        columns instead of the entire JSON report.
        Types: str OR list of Strings (str)

    overwrite_cached_models:
        Optional Argument.
        Specifies the name of the model to remove from the cache.
        Use * to remove all cached models.
        Types: str OR list of Strings (str)

    model_type:
        Optional Argument.
        Specifies the model type for H2O model prediction.
        Default Value: "OpenSource"
        Permitted Values: DAI, OpenSource
        Types: str OR list of Strings (str)

    enable_options:
        Optional Argument.
        Specifies the options to be enabled for H2O model prediction.
        Permitted Values: contributions, stageProbabilities, leafNodeAssignments
        Types: str OR list of Strings (str)

RETURNS:
    Instance of H2OPredict.
    Output teradataml DataFrame can be accessed using attribute
    references, such as H2OPredictObj.<attribute_name>.
    Output teradataml DataFrame attribute name is:
        result

RAISES:
    TeradataMlException, TypeError, ValueError

EXAMPLES:
    # Note:
    #   To execute BYOM functions, set 'configure.byom_install_location' to the
    #   database name where BYOM functions are installed.

    # Import required libraries / functions.
    import os
    import teradataml
    from teradataml import DataFrame, H2OPredict, configure, load_example_data
    from teradataml import save_byom, retrieve_byom

    # Load example data.
    load_example_data("byom", "iris_test")

    # Create teradataml DataFrame objects.
    iris_test = DataFrame.from_table("iris_test")

    # Set install location of BYOM functions.
    configure.byom_install_location = "mldb"

    # Example 1: This example runs a query with a GLM model, "model_type",
    #            "enable_options", "model_output_fields" and
    #            "overwrite_cached_models". Passing '*' to
    #            "overwrite_cached_models" erases the entire cache.

    # Load model file into Vantage.
    model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_glm_h2o_model")
    save_byom("iris_mojo_glm_h2o_model", model_file, "byom_models")

    # Retrieve model.
    modeldata = retrieve_byom("iris_mojo_glm_h2o_model", table_name="byom_models")

    result = H2OPredict(newdata=iris_test,
                        newdata_partition_column='id',
                        newdata_order_column='id',
                        modeldata=modeldata,
                        modeldata_order_column='model_id',
                        model_output_fields=['label', 'classProbabilities'],
                        accumulate=['id', 'sepal_length', 'petal_length'],
                        overwrite_cached_models='*',
                        enable_options='stageProbabilities',
                        model_type='OpenSource'
                        )

    # Print the results.
    print(result.result)

    # Example 2: This example runs a query with an XGBoost model, "model_type",
    #            "enable_options", "model_output_fields" and
    #            "overwrite_cached_models". Passing '*' to
    #            "overwrite_cached_models" erases the entire cache.

    # Load model file into Vantage.
    model_file = os.path.join(os.path.dirname(teradataml.__file__), "data", "models", "iris_mojo_xgb_h2o_model")
    save_byom("iris_mojo_xgb_h2o_model", model_file, "byom_models")

    # Retrieve model.
    modeldata = retrieve_byom("iris_mojo_xgb_h2o_model", table_name="byom_models")

    result = H2OPredict(newdata=iris_test,
                        newdata_partition_column='id',
                        newdata_order_column='id',
                        modeldata=modeldata,
                        modeldata_order_column='model_id',
                        model_output_fields=['label', 'classProbabilities'],
                        accumulate=['id', 'sepal_length', 'petal_length'],
                        overwrite_cached_models='*',
                        enable_options='stageProbabilities',
                        model_type='OpenSource'
                        )

    # Print the results.
    print(result.result)

    # Example 3: This example runs a query with a licensed model with id
    #            'licensed_model1' from the table 'byom_licensed_models' and
    #            the associated license key stored in column 'license_key' of
    #            the table 'license' present in the schema 'mldb'.

    # Retrieve model.
    modeldata = retrieve_byom('licensed_model1',
                              table_name='byom_licensed_models',
                              license='license_key',
                              is_license_column=True,
                              license_table_name='license',
                              license_schema_name='mldb')

    result = H2OPredict(newdata=iris_test,
                        newdata_partition_column='id',
                        newdata_order_column='id',
                        modeldata=modeldata,
                        modeldata_order_column='model_id',
                        model_output_fields=['label', 'classProbabilities'],
                        accumulate=['id', 'sepal_length', 'petal_length'],
                        overwrite_cached_models='*',
                        enable_options='stageProbabilities',
                        model_type='OpenSource'
                        )

    # Print the results.
    print(result.result)
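
The permitted values documented for "model_type" and "enable_options" can be checked on the client before a query is sent to Vantage. The following is a minimal sketch of such a check, not part of teradataml: the helper names build_h2opredict_kwargs and _as_list are hypothetical, and exact, case-sensitive matching against the permitted values is an assumption.

```python
from typing import Dict, List, Optional, Union

# Permitted values copied from the PARAMETERS section of this page.
PERMITTED_MODEL_TYPES = ("DAI", "OpenSource")
PERMITTED_ENABLE_OPTIONS = ("contributions", "stageProbabilities", "leafNodeAssignments")

StrOrList = Union[str, List[str]]


def _as_list(value: Optional[StrOrList]) -> List[str]:
    """Normalize None, a single string, or a list of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def build_h2opredict_kwargs(
    accumulate: StrOrList,
    model_output_fields: Optional[StrOrList] = None,
    enable_options: Optional[StrOrList] = None,
    model_type: str = "OpenSource",
    overwrite_cached_models: Optional[StrOrList] = None,
) -> Dict[str, object]:
    """Hypothetical helper: validate option values locally and return keyword
    arguments suitable for unpacking into H2OPredict(...)."""
    if not _as_list(accumulate):
        raise ValueError("'accumulate' is a required argument.")
    # Assumption: values must match the documented spelling exactly.
    if model_type not in PERMITTED_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {PERMITTED_MODEL_TYPES}, got {model_type!r}."
        )
    invalid = [o for o in _as_list(enable_options) if o not in PERMITTED_ENABLE_OPTIONS]
    if invalid:
        raise ValueError(
            f"enable_options must be drawn from {PERMITTED_ENABLE_OPTIONS}, got {invalid!r}."
        )

    kwargs: Dict[str, object] = {"accumulate": accumulate, "model_type": model_type}
    # Pass optional arguments through only when the caller supplied them.
    optional = (
        ("model_output_fields", model_output_fields),
        ("enable_options", enable_options),
        ("overwrite_cached_models", overwrite_cached_models),
    )
    for name, value in optional:
        if value is not None:
            kwargs[name] = value
    return kwargs


# Same option values as the examples on this page.
kwargs = build_h2opredict_kwargs(
    accumulate=["id", "sepal_length", "petal_length"],
    model_output_fields=["label", "classProbabilities"],
    enable_options="stageProbabilities",
    overwrite_cached_models="*",
)
print(kwargs)
```

With an active Vantage connection, the returned dictionary could be unpacked into the call, for example H2OPredict(newdata=iris_test, modeldata=modeldata, **kwargs). An unsupported option fails locally with ValueError, the same exception type that appears under RAISES.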