fit | GridSearch | Hyperparameter Tuning in teradataml - fit - Teradata Package for Python

Teradata® Package for Python User Guide

Teradata Package for Python
March 2024
Product Category: Teradata Vantage
Use the fit() method to perform hyperparameter tuning using the GridSearch algorithm.
  • For model trainer functions, the best parameters are selected based on the training results.
  • For non-model trainer functions, the first executed parameter set is selected as the best parameters.
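
The selection rule above can be sketched in plain Python. This is an illustrative sketch only, not teradataml's implementation: expand_grid and pick_best are hypothetical helpers, the parameter names are made up, and the sketch assumes a metric where a higher value is better (such as Accuracy).

```python
from itertools import product

def expand_grid(params):
    """Expand {name: (values...)} into a list of parameter sets (hypothetical helper)."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*(params[k] for k in keys))]

def pick_best(param_sets, scores=None):
    """Pick the best parameter set (hypothetical helper).

    scores is None for a non-model trainer function: the first executed set wins.
    Otherwise the set with the highest score wins (assumption: higher is better).
    """
    if scores is None:
        return param_sets[0]
    best_index = max(range(len(param_sets)), key=lambda i: scores[i])
    return param_sets[best_index]

# Made-up hyperparameter names and values, for illustration only.
grid = expand_grid({"iter_max": (10, 100), "batch_size": (32, 64)})
best_non_trainer = pick_best(grid)
best_trainer = pick_best(grid, scores=[0.70, 0.90, 0.80, 0.60])
```

For an error metric such as MAE, where a lower value is better, the same sketch would use min instead of max.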

Optional Arguments:

  • data: Specifies the input teradataml DataFrame for the model trainer function.
    • This is a required argument for model trainer functions.
    • This argument is ignored for non-model trainer functions.
    This argument can be a dictionary, a tuple, or a teradataml DataFrame.
    • If it is a dictionary, each key is the label for a DataFrame and each value is the DataFrame itself.
    • If it is a tuple, teradataml converts it to a dictionary by generating the labels internally.
    • If it is a teradataml DataFrame, teradataml labels it "DF_0".
    • When this argument is passed as a model hyperparameter (in params), the DataFrame need not be passed to the fit() method.
    • This argument can contain a single DataFrame or multiple DataFrames. When multiple DataFrames are passed, hyperparameter tuning is performed on every DataFrame for every model parameter set.
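
The labeling rules for data can be illustrated with a small plain-Python sketch. It is not teradataml code: normalize_data is a hypothetical helper, strings stand in for DataFrames, and the "DF_<index>" pattern for tuple labels is an assumption (the text above only confirms "DF_0" for a single DataFrame).

```python
def normalize_data(data):
    """Return a {label: DataFrame} dictionary (hypothetical helper)."""
    if isinstance(data, dict):
        # Keys are already the labels.
        return dict(data)
    if isinstance(data, tuple):
        # Assumption: generated labels follow the "DF_<index>" pattern.
        return {f"DF_{i}": df for i, df in enumerate(data)}
    # A single DataFrame is labeled "DF_0".
    return {"DF_0": data}

# Strings stand in for teradataml DataFrames.
labeled = normalize_data(("train_2023", "train_2024"))

# Every DataFrame is combined with every parameter set.
param_sets = [{"alpha": 0.1}, {"alpha": 0.5}, {"alpha": 1.0}]
runs = [(label, p) for label in labeled for p in param_sets]
```

With two DataFrames and three parameter sets, the sketch produces six (label, parameter set) combinations.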
  • evaluation_metric: Specifies the evaluation metric to be considered for model evaluation.
    • This argument is applicable to model trainer functions.
    • The best model is not selected when the evaluation returns non-finite values.
    Permitted Values:
    • Classification: Accuracy, Micro-Precision, Micro-Recall, Micro-F1, Macro-Precision, Macro-Recall, Macro-F1, Weighted-Precision, Weighted-Recall, Weighted-F1.
    • Regression: MAE, MSE, MSLE, MAPE, MPE, RMSE, RMSLE, ME, R2, EV, MPD, MGD.
    Default Value:
    • Classification: Accuracy
    • Regression: MAE
  • early_stop: Specifies the early stop threshold for model trainer functions. Hyperparameter tuning stops training further models once a trained model's evaluation metric attains the value of this argument.
    Early stopping is supported only when the evaluation returns a finite value.
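
The early stop behavior can be sketched as a loop that halts once a finite score reaches the threshold. This is an illustrative sketch, not teradataml's implementation: run_until_early_stop is a hypothetical helper, the scores are made up, and "attains" is interpreted here as "greater than or equal to", which assumes a metric where higher is better.

```python
import math

def run_until_early_stop(scores, early_stop=None):
    """Consume scores in order and stop once one attains early_stop (hypothetical helper)."""
    evaluated = []
    for score in scores:
        evaluated.append(score)
        # Non-finite scores (NaN, inf) never trigger early stopping.
        if early_stop is not None and math.isfinite(score) and score >= early_stop:
            break
    return evaluated

# Made-up metric values for four candidate models.
evaluated = run_until_early_stop([0.60, float("nan"), 0.92, 0.95], early_stop=0.9)
```

Here the third score reaches the threshold, so the fourth candidate is never evaluated.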
  • frac: Specifies the fraction of rows to be sampled for the training dataset. The remaining rows form the testing dataset.

    The value must be in the range (0, 1). Default value is 0.8.

    This argument is not supported for non-model trainer functions.
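
As a rough illustration of what frac means, the sketch below splits a list of row identifiers into training and testing parts. It is plain Python under stated assumptions, not the sampling that teradataml performs in the database: split_rows is a hypothetical helper, and the exact row counts it produces are a simplification.

```python
import random

def split_rows(row_ids, frac=0.8, seed=None):
    """Shuffle row ids and split them into (train, test) lists (hypothetical helper)."""
    if not 0 < frac < 1:
        raise ValueError("frac must be in the range (0, 1)")
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    shuffled = list(row_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]

train_ids, test_ids = split_rows(range(10), frac=0.8, seed=42)
```

With 10 rows and frac=0.8, 8 rows go to training and 2 to testing.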
  • run_parallel: Specifies whether hyperparameter tuning runs the model functions in parallel.

    When set to 'True', model functions are executed concurrently. Otherwise, model functions are executed sequentially. Default value is 'True'.
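
The difference between concurrent and sequential execution can be sketched with Python's standard library. This is only an analogy for the run_parallel switch, not how teradataml schedules work: train_one and run_all are hypothetical names, and the "score" is a made-up stand-in for a model function execution.

```python
from concurrent.futures import ThreadPoolExecutor

def train_one(param_set):
    """Stand-in for one model function execution; returns a fake score."""
    return param_set["alpha"] * 2

def run_all(param_sets, run_parallel=True):
    """Run every parameter set concurrently or one after another (hypothetical helper)."""
    if run_parallel:
        with ThreadPoolExecutor() as pool:
            # map() returns results in the same order as the inputs.
            return list(pool.map(train_one, param_sets))
    return [train_one(p) for p in param_sets]

param_sets = [{"alpha": 1}, {"alpha": 2}, {"alpha": 3}]
parallel_scores = run_all(param_sets, run_parallel=True)
sequential_scores = run_all(param_sets, run_parallel=False)
```

Both modes yield the same results; only the scheduling of the individual executions differs.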

  • wait: Specifies whether to wait for hyperparameter tuning to complete.

    When set to 'False', hyperparameter tuning is executed in the background and you can use the is_running() method to check its status. Otherwise, fit() returns control to the user only after the execution is complete. Default value is 'True'.
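
The wait semantics resemble a background thread that the caller can poll. The sketch below models this pattern in plain Python. BackgroundJob is a hypothetical class written for this illustration and does not reflect teradataml internals; only the method names fit and is_running mirror the text above.

```python
import threading

class BackgroundJob:
    """Hypothetical stand-in for a tuner that can run in the background."""

    def __init__(self, work):
        self._work = work
        self._thread = None
        self.result = None

    def _run(self):
        self.result = self._work()

    def fit(self, wait=True):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        if wait:
            # Block until the work is done before returning control.
            self._thread.join()

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def join(self):
        self._thread.join()

# An Event lets this demo decide when the simulated tuning finishes.
gate = threading.Event()
job = BackgroundJob(lambda: gate.wait() and "best_params")
job.fit(wait=False)                 # returns immediately
running_before = job.is_running()   # still running, blocked on the gate
gate.set()                          # let the simulated tuning finish
job.join()
running_after = job.is_running()
```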

  • verbose: Specifies whether to log the model training information and display the logs.
    • When set to 0, nothing is logged to the console.
    • When set to 1, only the progress bar is logged to the console.
    • When set to 2, the progress bar, execution steps, and execution time are logged to the console.
    Default value is 0.
    This argument has no effect when the wait argument is set to 'False'.
  • sample_seed: Specifies the seed value that controls the shuffling applied to the data before the train-test split. Pass an integer for reproducible output across multiple function calls.

    When this argument is not specified, different runs of the query generate different outputs.

    Permitted value is an integer in the range [0, 2147483647].

    Seed is supported only when stratify_column is passed; otherwise it is optional.
  • stratify_column: Specifies the name of the column that contains the labels used to stratify the data for the train-test split.
  • sample_id_column: Specifies the name of the input data column that contains the unique identifier for each row in the input.

    This argument is required when the sample_seed argument is specified.
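
How a seed, a stratification column, and a row identifier can work together is sketched below in plain Python. This is an assumption-laden illustration, not teradataml's sampling logic: stratified_split is a hypothetical helper, the column names are made up, and sorting by the identifier before shuffling is just one way to make a seeded split reproducible.

```python
import random
from collections import defaultdict

def stratified_split(rows, id_col, label_col, frac=0.8, seed=None):
    """Split rows into (train, test) while keeping label proportions (hypothetical helper)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    # Sort by the unique id so the starting order is stable across runs.
    for row in sorted(rows, key=lambda r: r[id_col]):
        groups[row[label_col]].append(row)
    train, test = [], []
    for label in sorted(groups):
        members = groups[label]
        rng.shuffle(members)
        cut = int(len(members) * frac)  # same fraction taken from every label
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Made-up data: 20 rows, two labels with 10 rows each.
rows = [{"id": i, "label": "a" if i < 10 else "b"} for i in range(20)]
train, test = stratified_split(rows, "id", "label", frac=0.8, seed=7)
```

Each label contributes the same fraction of its rows to the training set, and repeating the call with the same seed returns the same split.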

  • max_time: Specifies the maximum time allowed for hyperparameter tuning execution to complete.
  • kwargs: Specifies the keyword arguments. Accepts additional arguments required for the teradataml analytic function.