AutoML.__init__ | AutoML | teradataml - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last edition: 2026-02-20
Product Category: Teradata Vantage

AutoML is an approach that automates the process of building, training, and validating machine learning models. It uses several algorithms to automate stages of the machine learning workflow, such as data preparation, feature engineering, model selection, hyperparameter tuning, and model deployment. The goal is to simplify model building by automating the most time-consuming and labor-intensive tasks.

AutoML is designed to handle regression, classification (binary and multiclass), and clustering tasks. You can specify the task type for the provided dataset. By default, AutoML decides automatically whether the task is regression or classification. For clustering, you must specify the task type explicitly.

AutoML can also be run specifically for fraud detection and churn prediction scenarios (binary classification). You can set the available parameters to leverage specialized workflows and model selection tailored for these use cases, enabling more effective handling of fraud and churn-related datasets.

By default, AutoML trains every model algorithm applicable to the task type.

For example,
  • For clustering, only two models ("KMeans", "GaussianMixture") are used.
  • For multiclass classification problems, only three models, "knn", "decision_forest", "xgboost", are available to train by default, because "glm" and "svm" do not support multiclass classification.
  • For regression and binary classification problems, all five models, "glm", "svm", "knn", "decision_forest", "xgboost", are available to train by default.

AutoML can also train with specific model algorithms only. You can provide either an include list or an exclude list.
  • For include, only the specified models are trained.
  • For exclude, all models except the specified models are trained.
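
The default model lists and the include/exclude rules above can be sketched in plain Python. The helper resolve_models and the DEFAULT_MODELS table are hypothetical and not part of teradataml. Treating include and exclude as mutually exclusive, and silently dropping names that do not apply to the task, are assumptions made for illustration; the library may behave differently.

```python
# Hypothetical sketch of the documented defaults; not teradataml code.
ALL_FIVE = ["glm", "svm", "knn", "decision_forest", "xgboost"]

DEFAULT_MODELS = {
    "regression": list(ALL_FIVE),
    "binary": list(ALL_FIVE),
    # "glm" and "svm" do not support multiclass classification.
    "multiclass": ["knn", "decision_forest", "xgboost"],
    "clustering": ["KMeans", "GaussianMixture"],
}


def resolve_models(problem, include=None, exclude=None):
    """Return the models that would be trained for a problem type."""
    if include and exclude:
        # Assumption: the two options are alternatives, not combined.
        raise ValueError("Provide either include or exclude, not both.")
    candidates = DEFAULT_MODELS[problem]
    if include:
        # Assumption: names that do not apply to the task are dropped.
        return [m for m in candidates if m in include]
    if exclude:
        return [m for m in candidates if m not in exclude]
    return list(candidates)


multiclass_models = resolve_models("multiclass")
regression_without_glm = resolve_models("regression", exclude=["glm"])
```

Here multiclass_models holds the three multiclass-capable models, and regression_without_glm holds the remaining four regression models.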

AutoML also provides an option to customize the processes within the feature engineering, data preparation, and model training phases. For a custom run, you customize these processes by passing the path of a JSON file. AutoML also supports early stopping of model training based on a stopping metric and a maximum running time.

When configure.temp_object_type is set to "VT", AutoML follows sequential execution.

Optional Arguments

task_type
Required for clustering tasks; otherwise optional.

Specifies the machine learning task type for AutoML: regression, classification, or clustering.

Set this argument to "Default" if you want AutoML to decide between regression and classification automatically. You must set task_type to "Clustering" for clustering tasks.

Permitted values are "Regression", "Classification", "Default", "Clustering".

Default value is "Default".

include
Specifies the model algorithms to be used in the model training phase.

By default, all five models are used for regression and binary classification problems, while only three models ("knn", "decision_forest", "xgboost") are used for multiclass classification.

For clustering, only two models ("KMeans", "GaussianMixture") are used.

Permitted values are "glm", "svm", "knn", "decision_forest", "xgboost", "KMeans", "GaussianMixture".

exclude
Specifies the model algorithms to be excluded from the model training phase.

No model is excluded by default.

Permitted values are "glm", "svm", "knn", "decision_forest", "xgboost", "KMeans", "GaussianMixture".

verbose
Specifies the level of detail printed during execution.
Permitted values are:
  • 0: prints the progress bar and leaderboard.
  • 1: prints the execution steps of AutoML.
  • 2: prints the intermediate data between the execution of each step of AutoML.

Default value is 0.

max_runtime_secs
Specifies the time limit, in seconds, for model training.

stopping_metric
Specifies the stopping metric used with the stopping tolerance in model training.
This argument is required if stopping_tolerance is set; otherwise, optional.
Permitted values are:
  • For task_type "Regression": "R2", "MAE", "MAPE", "MSE", "MSLE", "RMSE", "RMSLE", "MPE", "ME", "EV", "MPD", "MGD".
  • For task_type "Classification": "MICRO-F1", "MACRO-F1", "MICRO-RECALL", "MACRO-RECALL", "MICRO-PRECISION", "MACRO-PRECISION", "WEIGHTED-PRECISION", "WEIGHTED-RECALL", "WEIGHTED-F1", "ACCURACY".
  • For task_type "Clustering": "SILHOUETTE", "CALINSKI", "DAVIES".

max_models
Specifies the maximum number of models to be trained.

stopping_tolerance
Specifies the stopping tolerance for the stopping metric in model training.
This argument is required if stopping_metric is set; otherwise, optional.
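
Because stopping_metric and stopping_tolerance are each required when the other is set, it can help to check the pair before constructing AutoML. The following is a minimal sketch of such a check; check_stopping_args is a hypothetical helper, not a teradataml function. Accepting both regression and classification metrics when task_type is "Default" is an assumption, since the task type is not yet known at that point.

```python
# Hypothetical pre-flight check; not part of teradataml.
STOPPING_METRICS = {
    "Regression": {"R2", "MAE", "MAPE", "MSE", "MSLE", "RMSE", "RMSLE",
                   "MPE", "ME", "EV", "MPD", "MGD"},
    "Classification": {"MICRO-F1", "MACRO-F1", "MICRO-RECALL", "MACRO-RECALL",
                       "MICRO-PRECISION", "MACRO-PRECISION",
                       "WEIGHTED-PRECISION", "WEIGHTED-RECALL",
                       "WEIGHTED-F1", "ACCURACY"},
    "Clustering": {"SILHOUETTE", "CALINSKI", "DAVIES"},
}


def check_stopping_args(task_type, stopping_metric=None, stopping_tolerance=None):
    """Raise ValueError if the stopping arguments are inconsistent."""
    # The two arguments must be given together or omitted together.
    if (stopping_metric is None) != (stopping_tolerance is None):
        raise ValueError(
            "stopping_metric and stopping_tolerance must be set together.")
    if stopping_metric is None:
        return True
    if task_type == "Default":
        # Assumption: either regression or classification metrics may apply.
        allowed = STOPPING_METRICS["Regression"] | STOPPING_METRICS["Classification"]
    else:
        allowed = STOPPING_METRICS[task_type]
    if stopping_metric not in allowed:
        raise ValueError(
            f"{stopping_metric!r} is not permitted for task_type {task_type!r}.")
    return True


# The tolerance value 0.01 is illustrative only.
ok = check_stopping_args("Regression", "R2", 0.01)
```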

custom_config_file
Specifies the path of the JSON file to use for a custom run.

is_fraud
Specifies whether the use case is for fraud detection.

Default value: False

is_churn
Specifies whether the use case is for churn prediction.

Default value: False

**kwargs
Specifies additional arguments for AutoML.

volatile
Specifies whether to put the interim results of the functions in a volatile table. When set to True, results are stored in a volatile table; otherwise, they are not.

Default value: False

Types: bool

persist
Specifies whether to persist the interim results of the functions in a table or not. When set to True, results are persisted in a table; otherwise, results are garbage collected at the end of the session.

You must handle cleanup of persisted tables. Use get_persisted_tables() to view the list of persisted tables in the current session.

Default value: False

Types: bool

seed
Specifies the random seed for reproducibility.

Default value: 42

Types: int

imbalance_handling_method
Specifies which data imbalance method to use for classification problems.

Default value: "SMOTE"

Permitted values: "SMOTE", "ADASYN", "SMOTETomek", "NearMiss".

enable_lasso
Specifies whether to use Lasso regression for feature selection in addition to the defaults. By default, only RFE and PCA are used for feature selection.

Default value: False

raise_errors
Specifies whether non-blocking errors are raised as errors or as warnings. When set to True, errors are raised; otherwise, warnings are issued.

Default value: False
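
As a usage sketch, the arguments described in this section can be collected in a dictionary and passed to the constructor. The values below are illustrative, not recommendations, and build_automl is a hypothetical wrapper. The example assumes teradataml is installed and that a Vantage connection has already been created before build_automl is called; the import is deferred so that the dictionary can be prepared and inspected without a connection.

```python
# Illustrative argument values only; adjust for your own data and task.
automl_args = {
    "task_type": "Classification",
    "exclude": ["glm"],            # train every applicable model except glm
    "verbose": 1,                  # print the execution steps
    "max_runtime_secs": 600,
    "max_models": 10,
    "stopping_metric": "MICRO-F1", # must be paired with stopping_tolerance
    "stopping_tolerance": 0.01,
    # The remaining entries are passed through **kwargs.
    "seed": 42,
    "persist": False,
    "imbalance_handling_method": "SMOTE",
    "raise_errors": True,
}


def build_automl(args):
    """Create an AutoML object from a dictionary of arguments.

    Hypothetical wrapper: imports teradataml only when called, and
    assumes a connection to Vantage already exists.
    """
    from teradataml import AutoML
    return AutoML(**args)
```

Calling build_automl(automl_args) then returns the configured AutoML object.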