Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- Product Category
- Teradata Vantage
- teradataml.hyperparameter_tuner.optimizer.RandomSearch.fit = fit(self, data=None, evaluation_metric=None, early_stop=None, frac=0.8, run_parallel=True, wait=True, verbose=0, stratify_column=None, sample_id_column=None, sample_seed=None, max_time=None, **kwargs)
- DESCRIPTION:
Function to perform hyperparameter tuning using RandomSearch algorithm.
Notes:
* For model trainer functions, the best parameters are
selected based on training results.
* For non-model trainer functions, the first executed parameter
set is selected as the best parameters.
PARAMETERS:
data:
Optional Argument.
Specifies the input teradataml DataFrame for model trainer function.
Notes:
* A DataFrame need not be passed to fit() when "data" is
passed as a model hyperparameter ("params").
* "data" is a required argument for model trainer functions.
* "data" is ignored for non-model trainer functions.
* "data" can contain a single DataFrame or multiple DataFrames.
* One can pass multiple DataFrames to "data". Hyperparameter
tuning is performed on all the DataFrames for every model
parameter.
* "data" can be either a dictionary, a tuple, or a DataFrame.
* If it is a dictionary, then the key represents the label for
the DataFrame and the value represents the DataFrame.
* If it is a tuple, then teradataml converts it to a dictionary
by generating the labels internally.
* If it is a DataFrame, then teradataml labels it as "DF_0".
Types: teradataml DataFrame, dictionary, tuple
evaluation_metric:
Optional Argument.
Specifies the evaluation metric to be considered for model
evaluation.
Notes:
* "evaluation_metric" is applicable only for model trainer functions.
* Best model is not selected when evaluation returns
non-finite values.
Permitted Values:
* Classification: Accuracy, Micro-Precision, Micro-Recall,
Micro-F1, Macro-Precision, Macro-Recall, Macro-F1,
Weighted-Precision, Weighted-Recall, Weighted-F1
* Regression: MAE, MSE, MSLE, MAPE, MPE, RMSE, RMSLE, ME,
R2, EV, MPD, MGD
Default Value:
* Classification: Accuracy
* Regression: MAE
Types: str
early_stop:
Optional Argument.
Specifies the early-stop threshold for model trainer
functions. Hyperparameter tuning stops model training when
the model evaluation metric attains the "early_stop" value.
Note:
* Early stopping is supported only when evaluation returns
a finite value.
Types: int or float
frac:
Optional Argument.
Specifies the split percentage of rows to be sampled for training
and testing dataset. "frac" argument value must range between (0, 1).
Notes:
* This "frac" argument is not supported for non-model trainer
function.
* The "frac" value is considered as train split percentage and
The remaining percentage is taken into account for test splitting.
Default Value: 0.8
Types: float
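For illustration, a minimal sketch of a custom split ("train_df" and
"eval_params" are hypothetical placeholders):
>>> # Hypothetical sketch: 75% of rows for training, remaining 25% for testing.
>>> optimizer_obj.fit(data=train_df, frac=0.75, **eval_params)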
run_parallel:
Optional Argument.
Specifies whether hyperparameter tuning executes in parallel.
When "run_parallel" is set to True, model functions are
executed concurrently. Otherwise, model functions are executed
sequentially.
Default Value: True
Types: bool
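For illustration, a minimal sketch of sequential execution (object
names are hypothetical placeholders):
>>> # Hypothetical sketch: execute the model functions one at a time.
>>> optimizer_obj.fit(data=train_df, run_parallel=False, **eval_params)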
wait:
Optional Argument.
Specifies whether to wait for the completion of hyperparameter
tuning execution. When set to False, hyperparameter tuning is
executed in the background and the user can use the "is_running()"
method to check the status. Otherwise, it waits until the execution
is complete before returning control to the user.
Default Value: True
Types: bool
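For illustration, a minimal sketch of background execution using the
"is_running()" status check described above (object names are
hypothetical placeholders):
>>> # Hypothetical sketch: start tuning in the background, then poll the status.
>>> optimizer_obj.fit(data=train_df, wait=False, **eval_params)
>>> optimizer_obj.is_running()  # check whether tuning is still executing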
verbose:
Optional Argument.
Specifies whether to log the model training information and display
the logs. When set to 1, only the progress bar is logged in the
console. When set to 2, along with the progress bar, execution
steps and execution time are logged in the console. When set
to 0, nothing is logged in the console.
Note:
* "verbose" is not significant when "wait" is False.
Default Value: 0
Types: int
sample_seed:
Optional Argument.
Specifies the seed value that controls the shuffling applied
to the data before applying the Train-Test split. Pass an int for
reproducible output across multiple function calls.
Notes:
* When the argument is not specified, different
runs of the query generate different outputs.
* It must be in the range [0, 2147483647].
* The seed is supported for the "stratify_column".
Types: int
stratify_column:
Optional Argument.
Specifies the column name that contains the labels indicating
which data needs to be stratified for the train-test split.
Notes:
* The seed is supported for the stratify column.
Types: str
sample_id_column:
Optional Argument.
Specifies the input data column name that has the
unique identifier for each row in the input.
Note:
* Mandatory when "sample_seed" argument is present.
Types: str
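Taken together, a reproducible stratified split can be requested as
in this minimal sketch (column and object names are hypothetical
placeholders):
>>> # Hypothetical sketch: reproducible, stratified train-test split.
>>> # "sample_id_column" is mandatory because "sample_seed" is given.
>>> optimizer_obj.fit(data=train_df,
...                   stratify_column="species",
...                   sample_id_column="id",
...                   sample_seed=42,
...                   **eval_params)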
max_time:
Optional Argument.
Specifies the maximum time, in seconds, for the completion of hyperparameter tuning execution.
Default Value: None
Types: int or float
kwargs:
Optional Argument.
Specifies the keyword arguments. Accepts additional arguments
required for the teradataml analytic function.
RETURNS:
None
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
>>> # Create an instance of the RandomSearch algorithm called "optimizer_obj"
>>> optimizer_obj = RandomSearch(func=SVM, params=params, n_iter=3)
>>> eval_params = {"id_column": "id",
"accumulate": "MedHouseVal"}
>>> # Example 1: Passing single DataFrame for model trainer function.
>>> optimizer_obj.fit(data=train_df,
...                   evaluation_metric="MAE",
...                   early_stop=70.9,
...                   **eval_params)
>>> # Example 2: Passing multiple datasets as tuple of DataFrames for
>>> # model trainer function.
>>> optimizer_obj.fit(data=(train_df_1, train_df_2),
...                   evaluation_metric="MAE",
...                   early_stop=70.9,
...                   **eval_params)
>>> # Example 3: Passing multiple datasets as dictionary of DataFrames
>>> # for model trainer function.
>>> optimizer_obj.fit(data={"Data-1":train_df_1, "Data-2":train_df_2},
evaluation_metric="MAE",
early_stop=70.9,
**eval_params)
>>> # Example 4: No data argument passed in fit() method for model trainer function.
>>> # Note: data argument must be passed while creating HPT object as
>>> # model hyperparameters.
>>> # Define parameter space for model training with "data" argument.
>>> params = {"data":(df1, df2),
"input_columns":['MedInc', 'HouseAge', 'AveRooms',
'AveBedrms', 'Population', 'AveOccup',
'Latitude', 'Longitude'],
"response_column":"MedHouseVal",
"model_type":"regression",
"batch_size":(11, 50, 75),
"iter_max":(100, 301),
"intercept":False,
"learning_rate":"INVTIME",
"nesterov":True,
"local_sgd_iterations":1}
>>> # Create "optimizer_obj" using RandomSearch algorithm and perform
>>> # fit() method without any "data" argument for model trainer function.
>>> optimizer_obj.fit(evaluation_metric="MAE",
early_stop=70.9,
**eval_params)
>>> # Example 5: Do not pass data argument in fit() method for
>>> # non-model trainer function.
>>> # Note: data argument must be passed while creating HPT
>>> # object as model hyperparameters.
>>> optimizer_obj.fit()
>>> # Example 6: Passing "verbose" argument value '1' in fit() method to
>>> # display model log.
>>> optimizer_obj.fit(data=train_df, evaluation_metric="R2",
verbose=1, **eval_params)
completed: |████████████████████████████████████████████████████████████| 100% - 6/6
>>> # Example 7: max_time argument is passed in fit() method.
>>> # Model training parameters
>>> model_params = {"input_columns":['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
... "response_column" : 'species',
... "max_depth":(5,10,15),
... "lambda1" : (1000.0,0.001),
... "model_type" :"Classification",
... "seed":32,
... "shrinkage_factor":0.1,
... "iter_num":(5, 50)}
>>>
>>> eval_params = {"id_column": "id",
... "accumulate": "species",
... "model_type":'Classification',
... "object_order_column":['task_index', 'tree_num', 'iter','class_num', 'tree_order']
... }
>>>
>>> # Import model trainer and optimizer
>>> from teradataml import XGBoost, RandomSearch
>>>
>>> # Initialize the RandomSearch optimizer with model trainer
>>> # function and parameter space required for model training.
>>> rs_obj = RandomSearch(func=XGBoost, params=model_params, n_iter=5)
>>>
>>> # fit() method with max_time argument(in seconds) for model trainer function.
>>> rs_obj.fit(data=data, max_time=30, verbose=2, **eval_params)
Model_id:XGBOOST_3 - Run time:28.292s - Status:PASS - ACCURACY:0.8
Model_id:XGBOOST_0 - Run time:28.291s - Status:PASS - ACCURACY:0.867
Model_id:XGBOOST_2 - Run time:28.289s - Status:PASS - ACCURACY:0.867
Model_id:XGBOOST_1 - Run time:28.291s - Status:PASS - ACCURACY:0.867
Computing: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾| 80% - 4/5
>>>
>>> # Models that do not complete within max_time are reported with status 'SKIP'.
>>> rs_obj.models
MODEL_ID DATA_ID PARAMETERS STATUS ACCURACY
0 XGBOOST_3 DF_0 {'input_columns': ['sepal_length', 'sepal_widt... PASS 0.800000
1 XGBOOST_4 DF_0 {'input_columns': ['sepal_length', 'sepal_widt... SKIP NaN
2 XGBOOST_0 DF_0 {'input_columns': ['sepal_length', 'sepal_widt... PASS 0.866667
3 XGBOOST_2 DF_0 {'input_columns': ['sepal_length', 'sepal_widt... PASS 0.866667
4 XGBOOST_1 DF_0 {'input_columns': ['sepal_length', 'sepal_widt... PASS 0.866667
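>>> # Once fit() completes, the tuned results can be inspected further.
>>> # A minimal sketch, assuming the optimizer exposes "best_params_" and
>>> # "best_score_" attributes (these attribute names are assumptions,
>>> # not confirmed by this page):
>>> rs_obj.best_params_   # assumed: winning parameter combination
>>> rs_obj.best_score_    # assumed: its evaluation score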