Teradata® Package for Python Function Reference - 20.00
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Product Category: Teradata Vantage
- teradataml.hyperparameter_tuner.optimizer.RandomSearch.__init__ = __init__(self, func, params, n_iter=10, **kwargs)
- DESCRIPTION:
The RandomSearch algorithm performs random sampling over the hyperparameter
space to identify optimal hyperparameters. It works with teradataml analytic
functions from the SQLE, BYOM, VAL, and UAF features.
teradataml RandomSearch lets users perform hyperparameter tuning for
all model trainer and non-model trainer functions.
When used with model trainer functions:
* The search determines the best model based on evaluation metrics.
* All methods and properties can be used.
When used with non-model trainer functions:
* Only the fit() method is supported.
* Users choose whichever output best suits their needs.
teradataml RandomSearch also lets users pass input data as a
hyperparameter. This option is suitable when the user wants to
identify the best model for a set of input data. When the user passes a
set of data as a hyperparameter to a model trainer function, the search
determines the best data along with the best model based on the
evaluation metrics.
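A condensed model-trainer workflow, pieced together from the examples further
below (an illustrative sketch only; "my_params" and "train_df" are placeholder
names for a parameter dictionary and a training teradataml DataFrame):
>>> from teradataml import SVM, RandomSearch
>>> # Initialize the optimizer with the trainer function and the search space.
>>> rs = RandomSearch(func=SVM, params=my_params, n_iter=5)
>>> # Run the search; training data and evaluation settings go to fit().
>>> rs.fit(data=train_df, evaluation_metric="R2", id_column="id")
>>> # Inspect the winner and score data with the best model.
>>> print(rs.best_model_id, rs.best_score_)
>>> predictions = rs.predict(newdata=train_df, id_column="id")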
PARAMETERS:
func:
Required Argument.
Specifies a teradataml analytic function from SQLE, VAL, and UAF.
Types:
teradataml Analytic Functions
* Advanced analytic functions
* UAF
* VAL
Refer to the display_analytic_functions() function for the list of available functions.
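For instance, to see which analytic functions can be passed as "func"
(assuming an active Vantage connection has already been created):
>>> from teradataml import display_analytic_functions
>>> # Lists the analytic functions exposed through teradataml.
>>> display_analytic_functions()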
params:
Required Argument.
Specifies the parameter(s) of a teradataml analytic function.
The parameters must be passed as a dictionary, where keys are the
argument names and values are the corresponding argument values.
Notes:
* Specify an argument value as a tuple to run hyperparameter tuning
  with multiple candidate values for that argument (see the sketch
  after this parameter's description).
* Model trainer function arguments "id_column", "input_columns",
  and "target_columns" must be passed to the fit() method.
* All required arguments of a non-model trainer function must be
  passed when the RandomSearch object is created.
Types: dict
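A minimal sketch of a parameter dictionary, reusing values from Example 1
below; tuple-valued entries define the candidate values the search samples
from, while scalar entries stay fixed across all models:
>>> params = {"response_column":"MedHouseVal",  # fixed argument
              "model_type":"regression",        # fixed argument
              "batch_size":(11, 50, 75),        # tuple: three candidate values
              "iter_max":(100, 301)}            # tuple: two candidate values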
n_iter:
Optional Argument.
Specifies the number of random search iterations to perform.
Note:
* "n_iter" must be less than the size of the parameter population
  (see the sketch below).
Default Value: 10
Types: int
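A rough client-side check of that constraint, under the assumption that the
parameter population size is the number of distinct argument combinations,
i.e. the product of the lengths of all tuple-valued entries in "params":
>>> from math import prod
>>> # "params" is the dictionary sketched above; count the combinations
>>> # implied by its tuple-valued arguments.
>>> population = prod(len(v) for v in params.values() if isinstance(v, tuple))
>>> n_iter = 3
>>> assert n_iter < population, "n_iter must be less than the parameter population size"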
RETURNS:
None
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
>>> # Example 1: Model trainer function. Performing hyperparameter-tuning
>>> # on SVM model trainer function using random search algorithm.
>>> # Load the example data.
>>> load_example_data("teradataml", ["cal_housing_ex_raw"])
>>> # Create teradataml DataFrame objects.
>>> data_input = DataFrame.from_table("cal_housing_ex_raw")
>>> # Scale "target_columns" with respect to 'STD' value of the column.
>>> fit_obj = ScaleFit(data=data_input,
target_columns=['MedInc', 'HouseAge', 'AveRooms',
'AveBedrms', 'Population', 'AveOccup',
'Latitude', 'Longitude'],
scale_method="STD")
>>> # Transform the data.
>>> transform_obj = ScaleTransform(data=data_input,
object=fit_obj.output,
accumulate=["id", "MedHouseVal"])
>>> # Define parameter space for model training.
>>> # Note: batch_size and iter_max define 6 parameter combinations;
>>> # with n_iter=3 below, the random search trains 3 of them.
>>> params = {"input_columns":['MedInc', 'HouseAge', 'AveRooms',
'AveBedrms', 'Population', 'AveOccup',
'Latitude', 'Longitude'],
"response_column":"MedHouseVal",
"model_type":"regression",
"batch_size":(11, 50, 75),
"iter_max":(100, 301),
"lambda1":0.1,
"alpha":0.5,
"iter_num_no_change":60,
"tolerance":0.01,
"intercept":False,
"learning_rate":"INVTIME",
"initial_data":0.5,
"decay_rate":0.5,
"momentum":0.6,
"nesterov":True,
"local_sgd_iterations":1}
>>> # Import trainer function and optimizer.
>>> from teradataml import SVM, RandomSearch
>>> # Initialize the random search optimizer with model trainer
>>> # function and parameter space required for model training.
>>> rs_obj = RandomSearch(func=SVM, params=params, n_iter=3)
>>> # Perform model optimization for SVM function.
>>> # Evaluation and prediction arguments are passed along with
>>> # training dataframe.
>>> rs_obj.fit(data=transform_obj.result, evaluation_metric="R2",
id_column="id", verbose=1)
completed: |████████████████████████████████████████████████████████████| 100% - 3/3
>>> # View trained models.
>>> rs_obj.models
MODEL_ID DATA_ID PARAMETERS STATUS R2
0 SVM_2 DF_0 {'input_columns': ['MedInc', 'HouseAge', 'AveR... PASS -3.668091
1 SVM_1 DF_0 {'input_columns': ['MedInc', 'HouseAge', 'AveR... PASS -3.668091
2 SVM_0 DF_0 {'input_columns': ['MedInc', 'HouseAge', 'AveR... PASS -3.668091
>>> # View model evaluation stats.
>>> rs_obj.model_stats
MODEL_ID MAE MSE MSLE MAPE ... ME R2 EV MPD MGD
0 SVM_2 2.354167 6.715689 0.0 120.054758 ... 3.801619 -3.668091 0.184238 NaN NaN
1 SVM_1 2.354167 6.715689 0.0 120.054758 ... 3.801619 -3.668091 0.184238 NaN NaN
2 SVM_0 2.354167 6.715689 0.0 120.054758 ... 3.801619 -3.668091 0.184238 NaN NaN
[3 rows x 13 columns]
>>> # Performing prediction on sampled data using best trained model.
>>> test_data = transform_obj.result.iloc[:5]
>>> rs_pred = rs_obj.predict(newdata=test_data, id_column="id")
>>> print("Prediction result:
", rs_pred.result)
Prediction result:
id prediction
0 686 -0.024033
1 2018 -0.069738
2 1754 -0.117881
3 670 -0.021818
4 244 -0.187346
>>> # Perform evaluation using best model.
>>> rs_obj.evaluate()
############ result Output ############
MAE MSE MSLE MAPE MPE RMSE RMSLE ME R2 EV MPD MGD
0 2.354167 6.715689 0.0 120.054758 120.054758 2.591465 0.0 3.801619 -3.668091 0.184238 NaN NaN
>>> # Retrieve any trained model.
>>> rs_obj.get_model("SVM_1")
############ output_data Output ############
iterNum loss eta bias
0 3 2.012817 0.028868 0.0
1 5 2.010455 0.022361 0.0
2 6 2.009331 0.020412 0.0
3 7 2.008276 0.018898 0.0
4 9 2.006384 0.016667 0.0
5 10 2.005518 0.015811 0.0
6 8 2.007302 0.017678 0.0
7 4 2.011636 0.025000 0.0
8 2 2.014326 0.035355 0.0
9 1 2.016398 0.050000 0.0
############ result Output ############
predictor estimate value
attribute
-7 Alpha 0.500000 Elasticnet
-3 Number of Observations 55.000000 None
5 Population 0.000000 None
0 (Intercept) 0.000000 None
-17 OneClass SVM NaN FALSE
-16 Kernel NaN LINEAR
-1 Loss Function NaN EPSILON_INSENSITIVE
7 Latitude -0.076648 None
-9 Learning Rate (Initial) 0.050000 None
-14 Epsilon 0.100000 None
>>> # View best data, model ID, score and parameters.
>>> print("Best data ID: ", rs_obj.best_data_id)
Best data ID: DF_0
>>> print("Best model ID: ", rs_obj.best_model_id)
Best model ID: SVM_2
>>> print("Best model score: ", rs_obj.best_score_)
Best model score: -3.6680912444156455
>>> print("Best model parameters: ", rs_obj.best_params_)
Best model parameters: {'input_columns': ['MedInc', 'HouseAge', 'AveRooms',
'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'],
'response_column': 'MedHouseVal', 'model_type': 'regression',
'batch_size': 50, 'iter_max': 301, 'lambda1': 0.1, 'alpha': 0.5,
'iter_num_no_change': 60, 'tolerance': 0.01, 'intercept': False,
'learning_rate': 'INVTIME', 'initial_data': 0.5, 'decay_rate': 0.5,
'momentum': 0.6, 'nesterov': True, 'local_sgd_iterations': 1,
'data': '"ALICE"."ml__select__1696595493985650"'}
>>> # Update the default model.
>>> rs_obj.set_model("SVM_1")
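>>> # Assumption: after set_model(), later calls such as evaluate() and
>>> # predict() use "SVM_1" as the default model instead of the
>>> # best-scoring one (output omitted here).
>>> rs_obj.evaluate()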
>>> # Example 2: Non-model trainer function. Performing random search
>>> #            on the Antiselect non-model trainer function using the
>>> #            random search algorithm.
>>> # Load the example dataset.
>>> load_example_data("teradataml", "titanic")
>>> # Create a teradataml DataFrame.
>>> titanic = DataFrame.from_table("titanic")
>>> # Define the non-model trainer function parameter space.
>>> # Include input data in parameter space for non-model trainer function.
>>> # Note: The "exclude" tuple defines nine parameter combinations;
>>> # with n_iter=4 below, the random search runs four of them.
>>> params = {"data":titanic, "exclude":(['survived', 'age'],['age'],
['survived', 'name', 'age'],
['ticket'],['parch'],['sex','age'],
['survived'], ['ticket','parch'],
["ticket", "parch", "sex", "age"])}
>>> # Import non-model trainer function and optimizer.
>>> from teradataml import Antiselect, RandomSearch
>>> # Initialize the random search optimizer with non-model trainer
>>> # function and parameter space required for non-model training.
>>> rs_obj = RandomSearch(func=Antiselect, params=params, n_iter=4)
>>> # Perform execution of Antiselect function.
>>> rs_obj.fit()
>>> # Note: Since this is a non-model trainer function, the best model ID,
>>> # score, and parameters are not applicable here.
>>> # View trained models.
>>> rs_obj.models
MODEL_ID PARAMETERS STATUS
0 ANTISELECT_1 {'data': '"titanic"', 'exclude': ['survived', ... PASS
1 ANTISELECT_3 {'data': '"titanic"', 'exclude': ['ticket', 'p... PASS
2 ANTISELECT_2 {'data': '"titanic"', 'exclude': ['survived']} PASS
3 ANTISELECT_0 {'data': '"titanic"', 'exclude': ['sex', 'age']} PASS
>>> # Retrieve any trained model using "MODEL_ID".
>>> rs_obj.get_model("ANTISELECT_0")
############ result Output ############
passenger survived pclass name sibsp parch ticket fare cabin embarked
0 162 1 2 Watt, Mrs. James (Elizabeth "Bessie" Inglis Milne) 0 0 C.A. 33595 15.7500 None S
1 591 0 3 Rintamaki, Mr. Matti 0 0 STON/O 2. 3101273 7.1250 None S
2 387 0 3 Goodwin, Master. Sidney Leonard 5 2 CA 2144 46.9000 None S
3 469 0 3 Scanlan, Mr. James 0 0 36209 7.7250 None Q
4 326 1 1 Young, Miss. Marie Grice 0 0 PC 17760 135.6333 C32 C
5 265 0 3 Henry, Miss. Delia 0 0 382649 7.7500 None Q
6 530 0 2 Hocking, Mr. Richard George 2 1 29104 11.5000 None S
7 244 0 3 Maenpaa, Mr. Matti Alexanteri 0 0 STON/O 2. 3101275 7.1250 None S
8 61 0 3 Sirayanian, Mr. Orsen 0 0 2669 7.2292 None C
9 122 0 3 Moore, Mr. Leonard Charles 0 0 A4. 54510 8.0500 None S
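>>> # For non-model trainer functions the search does not pick a winner;
>>> # a user would typically inspect "models" and retrieve whichever
>>> # output suits them (the MODEL_ID below is an arbitrary choice).
>>> print(rs_obj.models)
>>> chosen = rs_obj.get_model("ANTISELECT_3")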