Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number: 20.00.00.08
- Published: November 2025
- Last Edition: 2025-12-05
- Product Category: Teradata Vantage
- teradataml.automl.AutoML.__init__ = __init__(self, task_type='Default', include=None, exclude=None, verbose=0, max_runtime_secs=None, stopping_metric=None, stopping_tolerance=None, max_models=None, custom_config_file=None, is_fraud=False, is_churn=False, **kwargs)
- DESCRIPTION:
AutoML (Automated Machine Learning) is an approach that automates the process
of building, training, and validating machine learning models. It involves
various algorithms to automate various aspects of the machine learning workflow,
such as data preparation, feature engineering, model selection, hyperparameter
tuning, and model deployment. It aims to simplify the process of building
machine learning models, by automating some of the more time-consuming
and labor-intensive tasks involved in the process.
AutoML is designed to handle regression, classification (binary and multiclass),
and clustering tasks. The user can specify the task type to apply regression,
classification, or clustering algorithms on the provided dataset. By default,
AutoML will automatically decide whether the task is regression or classification.
For clustering, it is mandatory for the user to specify the task type explicitly.
AutoML can also be run specifically for fraud detection and churn prediction
scenarios (binary classification). By setting the available parameters, users
can leverage specialized workflows and model selection tailored for these use
cases, enabling more effective handling of fraud- and churn-related datasets.
By default, AutoML trains using all model algorithms applicable to
the selected task type. AutoML also provides the option to restrict
training to specific model algorithms: the user can provide either
"include" or "exclude" models. With "include", only the specified models
are trained, while with "exclude", all applicable models except the
specified ones are trained.
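The include/exclude semantics described above can be sketched in plain Python. This is an illustrative helper only, not part of the teradataml API; the default model lists are taken from the "include" parameter documentation below.

```python
# Illustrative sketch (not teradataml internals): how include/exclude
# resolve to the set of models actually trained for each task type.
DEFAULT_MODELS = {
    "regression": ["glm", "svm", "knn", "decision_forest", "xgboost"],
    "binary_classification": ["glm", "svm", "knn", "decision_forest", "xgboost"],
    "multiclass_classification": ["knn", "decision_forest", "xgboost"],
    "clustering": ["kmeans", "gaussianmixture"],
}

def resolve_models(task, include=None, exclude=None):
    """Return the model list AutoML would train, per the documented rules."""
    candidates = DEFAULT_MODELS[task]
    if include is not None:
        # With include, only the specified models are trained.
        wanted = [include] if isinstance(include, str) else include
        return [m for m in candidates if m in wanted]
    if exclude is not None:
        # With exclude, all applicable models except the specified ones are trained.
        dropped = {exclude} if isinstance(exclude, str) else set(exclude)
        return [m for m in candidates if m not in dropped]
    return list(candidates)
```

For example, `resolve_models("multiclass_classification", exclude="xgboost")` leaves only "knn" and "decision_forest", mirroring `AutoML(exclude="xgboost")` on a multi-class problem.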
AutoML also provides options to customize the processes within the feature
engineering, data preparation, and model training phases. The user can
customize these processes by passing the path of a JSON configuration file
for a custom run. AutoML also supports early stopping of model training
based on a stopping metric, a maximum runtime, and a maximum number of
models to be trained.
Note:
* configure.temp_object_type="VT" follows sequential execution.
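The three early-stopping controls (stopping metric with tolerance, maximum runtime, and maximum model count) can be sketched as a training loop. This is a simplified illustration of the documented behavior, not the actual teradataml implementation; the function name and signature are invented for the sketch.

```python
import time

def train_with_early_stopping(model_trainers, stopping_tolerance=None,
                              max_runtime_secs=None, max_models=None,
                              higher_is_better=True):
    """Illustrative sketch: stop training when the stopping metric crosses
    the tolerance, the time budget is spent, or the model cap is reached."""
    start = time.monotonic()
    trained = []
    for name, train_fn in model_trainers:
        # Stop once the maximum number of models has been trained.
        if max_models is not None and len(trained) >= max_models:
            break
        # Stop once the runtime budget is exhausted.
        if max_runtime_secs is not None and time.monotonic() - start > max_runtime_secs:
            break
        score = train_fn()  # returns the stopping-metric value for this model
        trained.append((name, score))
        # Stop early once the performance criterion is met.
        if stopping_tolerance is not None:
            met = score >= stopping_tolerance if higher_is_better else score <= stopping_tolerance
            if met:
                break
    return trained
```

With `stopping_metric="R2"` and `stopping_tolerance=0.7` (as in Example 4 below), training would halt as soon as a model reaches an R2 of 0.7, even if more candidate models remain.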
PARAMETERS:
task_type:
Required when clustering is involved, otherwise optional.
Specifies the type of machine learning task for AutoML: regression, classification, or
clustering. If set to "Default", AutoML will automatically determine whether to perform
regression or classification based on the target column. For clustering tasks, user must
explicitly set this parameter to "Clustering".
Default Value: "Default"
Permitted Values: "Regression", "Classification", "Default", "Clustering"
Types: str
include:
Optional Argument.
Specifies the model algorithms to be used for model training phase.
By default, all 5 models ("glm", "svm", "knn", "decision_forest", "xgboost") are
used for training for regression and binary classification problem, while only 3
models ("knn", "decision_forest", "xgboost") are used for multi-class.
For clustering, only 2 models ("kmeans", "gaussianmixture") are used.
Permitted Values: "glm", "svm", "knn", "decision_forest", "xgboost", "kmeans", "gaussianmixture"
Types: str OR list of str
exclude:
Optional Argument.
Specifies the model algorithms to be excluded from model training phase.
No model is excluded by default.
Permitted Values: "glm", "svm", "knn", "decision_forest", "xgboost", "kmeans", "gaussianmixture"
Types: str OR list of str
verbose:
Optional Argument.
Specifies the detailed execution steps based on verbose level.
Default Value: 0
Permitted Values:
* 0: prints the progress bar and leaderboard.
* 1: prints the execution steps of AutoML.
* 2: prints the intermediate data between the execution of each step of AutoML.
Types: int
max_runtime_secs:
Optional Argument.
Specifies the time limit in seconds for model training.
Types: int
stopping_metric:
Required when "stopping_tolerance" is set, otherwise optional.
Specifies the stopping metrics for stopping tolerance in model training.
Permitted Values:
* For task_type "Regression": "R2", "MAE", "MSE", "MSLE", "MAPE", "MPE",
"RMSE", "RMSLE", "ME", "EV", "MPD", "MGD"
* For task_type "Classification": "MICRO-F1", "MACRO-F1", "MICRO-RECALL", "MACRO-RECALL",
"MICRO-PRECISION", "MACRO-PRECISION", "WEIGHTED-PRECISION",
"WEIGHTED-RECALL", "WEIGHTED-F1", "ACCURACY"
* For task_type "Clustering": "SILHOUETTE", "CALINSKI", "DAVIES"
Types: str
stopping_tolerance:
Required when "stopping_metric" is set, otherwise optional.
Specifies the stopping tolerance for stopping metrics in model training.
Types: float
max_models:
Optional Argument.
Specifies the maximum number of models to be trained.
Types: int
custom_config_file:
Optional Argument.
Specifies the path of JSON file in case of custom run.
Types: str
is_fraud:
Optional Argument.
Specifies whether the use case is fraud detection.
Default Value: False
Types: bool
is_churn:
Optional Argument.
Specifies whether the use case is churn prediction.
Default Value: False
Types: bool
**kwargs:
Specifies the additional arguments for AutoML. Below
are the additional arguments:
volatile:
Optional Argument.
Specifies whether to put the interim results of the
functions in a volatile table or not. When set to
True, results are stored in a volatile table,
otherwise not.
Default Value: False
Types: bool
persist:
Optional Argument.
Specifies whether to persist the interim results of the
functions in a table or not. When set to True,
results are persisted in a table; otherwise,
results are garbage collected at the end of the
session.
Note:
* User is responsible for cleanup of the persisted tables. List of persisted tables
in current session can be viewed using get_persisted_tables() method.
Default Value: False
Types: bool
seed:
Optional Argument.
Specifies the random seed for reproducibility.
Default Value: 42
Types: int
imbalance_handling_method:
Optional Argument.
Specifies which data imbalance method to use for classification
problems.
Default Value: "SMOTE"
Permitted Values: "SMOTE", "ADASYN", "SMOTETomek", "NearMiss"
Types: str
enable_lasso:
Optional Argument.
Specifies whether to use lasso regression for feature selection.
By default, only RFE and PCA are used for feature selection.
Default Value: False
Types: bool
raise_errors:
Optional Argument.
Specifies whether to raise errors or warnings for
non-blocking errors. When set to True, raises errors,
otherwise raises warnings.
Default Value: False
Types: bool
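The documented "raise_errors" behavior (raise on non-blocking errors versus downgrade to warnings) can be sketched as follows. This is an illustrative stand-alone helper, not teradataml internals; the function name and exception type are invented for the sketch.

```python
import warnings

def report_problem(problem, raise_errors=False):
    """Illustrative sketch of the raise_errors semantics: a non-blocking
    problem either raises an error or is surfaced as a warning."""
    if raise_errors:
        # raise_errors=True: escalate the problem to a hard error.
        raise RuntimeError(problem)
    # raise_errors=False (default): continue, but warn the user.
    warnings.warn(problem)
```

With the default `raise_errors=False`, an AutoML run would continue past such problems and only emit warnings; setting it to True makes the run fail fast instead.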
RETURNS:
Instance of AutoML.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. Get the connection to Vantage to execute the function.
# 2. One must import the required functions mentioned in
# the example from teradataml.
# 3. Function raises error if not supported on the Vantage
# user is connected to.
# Load the example data.
>>> load_example_data("GLMPredict", ["admissions_test", "admissions_train"])
>>> load_example_data("decisionforestpredict", ["housing_train", "housing_test"])
>>> load_example_data("teradataml", "iris_input")
>>> load_example_data("teradataml", "payment_fraud_dataset")
>>> load_example_data("teradataml", "bank_churn")
>>> load_example_data("teradataml", "bank_marketing")
# Create teradataml DataFrames.
>>> admissions_train = DataFrame.from_table("admissions_train")
>>> admissions_test = DataFrame.from_table("admissions_test")
>>> housing_train = DataFrame.from_table("housing_train")
>>> housing_test = DataFrame.from_table("housing_test")
>>> iris_input = DataFrame.from_table("iris_input")
>>> payment_fraud_df = DataFrame.from_table("payment_fraud_dataset")
>>> churn_df = DataFrame.from_table("bank_churn")
>>> bank_df = DataFrame.from_table("bank_marketing")
# Example 1: Run AutoML for classification problem.
# Scenario: Predict whether a student will be admitted to a university
# based on different factors. Run AutoML to get the best
# performing model out of available models.
# Create an instance of AutoML.
>>> automl_obj = AutoML(task_type="Classification")
# Fit the data.
>>> automl_obj.fit(admissions_train, "admitted", id_column="id")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(admissions_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(admissions_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(admissions_test)
>>> performance_metrics
# Run evaluate to get performance metrics using model rank 3.
>>> performance_metrics = automl_obj.evaluate(admissions_test, rank=3)
>>> performance_metrics
# Example 2: Run AutoML for regression problem.
# Scenario: Predict the price of a house based on different factors.
# Run AutoML to get the best performing model using a custom
# configuration file to customize different processes of the
# AutoML run. Use include to specify the "xgboost" and
# "decision_forest" models to be used for training.
# Generate custom JSON file
>>> AutoML.generate_custom_config("custom_housing")
# Create instance of AutoML.
>>> automl_obj = AutoML(task_type="Regression",
...                     verbose=1,
...                     include=["decision_forest", "xgboost"],
...                     custom_config_file="custom_housing.json")
# Fit the data.
>>> automl_obj.fit(housing_train, "price")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(housing_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(housing_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(housing_test)
>>> performance_metrics
# Run evaluate to get performance metrics using second best performing model.
>>> performance_metrics = automl_obj.evaluate(housing_test, rank=2)
>>> performance_metrics
# Example 3 : Run AutoML for multiclass classification problem.
# Scenario : Predict the species of iris flower based on different
# factors. Use custom configuration file to customize
# different processes of AutoML Run to get the best
# performing model out of available models.
# Split the data into train and test.
>>> iris_sample = iris_input.sample(frac = [0.8, 0.2])
>>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Generate custom JSON file
>>> AutoML.generate_custom_config()
# Create instance of AutoML.
>>> automl_obj = AutoML(verbose=2,
...                     exclude="xgboost",
...                     custom_config_file="custom.json")
# Fit the data.
>>> automl_obj.fit(iris_train, iris_train.species)
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(iris_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(iris_test)
>>> performance_metrics
# Example 4 : Run AutoML for regression problem with early stopping metric and tolerance.
# Scenario : Predict the price of house based on different factors.
# Use custom configuration file to customize different
# processes of AutoML Run. Define performance threshold
# to acquire for the available models, and terminate training
# upon meeting the stipulated performance criteria.
# Generate custom JSON file
>>> AutoML.generate_custom_config("custom_housing")
# Create instance of AutoML.
>>> automl_obj = AutoML(verbose=2,
...                     exclude="xgboost",
...                     stopping_metric="R2",
...                     stopping_tolerance=0.7,
...                     max_models=10,
...                     custom_config_file="custom_housing.json")
# Fit the data.
>>> automl_obj.fit(housing_train, "price")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(housing_test)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(housing_test)
>>> performance_metrics
# Example 5 : Run AutoML for regression problem with maximum runtime.
# Scenario : Predict the species of iris flower based on different factors.
# Run AutoML to get the best performing model in specified time.
# Split the data into train and test.
>>> iris_sample = iris_input.sample(frac = [0.8, 0.2])
>>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Create instance of AutoML.
>>> automl_obj = AutoML(verbose=2,
...                     exclude="xgboost",
...                     max_runtime_secs=500,
...                     max_models=3)
# Fit the data.
>>> automl_obj.fit(iris_train, iris_train.species)
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(iris_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(iris_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(iris_test)
>>> performance_metrics
# Run evaluate to get performance metrics using model rank 4.
>>> performance_metrics = automl_obj.evaluate(iris_test, 4)
>>> performance_metrics
# Example 6 : Run AutoML for fraud detection problem.
# Scenario : Predict whether a transaction is fraudulent or not.
# Split the data into train and test.
>>> payment_fraud_sample = payment_fraud_df.sample(frac = [0.8, 0.2])
>>> payment_fraud_train = payment_fraud_sample[payment_fraud_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> payment_fraud_test = payment_fraud_sample[payment_fraud_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Create instance of AutoML with is_fraud set to True.
>>> automl_obj = AutoML(is_fraud=True)
# Fit the data.
>>> automl_obj.fit(payment_fraud_train, "isFraud")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(payment_fraud_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(payment_fraud_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(payment_fraud_test)
>>> performance_metrics
# Run evaluate to get performance metrics using model rank 4.
>>> performance_metrics = automl_obj.evaluate(payment_fraud_test, 4)
>>> performance_metrics
# Example 7 : Run AutoML for churn prediction problem.
# Scenario : Predict whether a bank customer will churn or not.
# Split the data into train and test.
>>> churn_sample = churn_df.sample(frac = [0.8, 0.2])
>>> churn_train = churn_sample[churn_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> churn_test = churn_sample[churn_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Create instance of AutoML with is_churn=True
>>> automl_obj = AutoML(is_churn=True)
# Fit the data.
>>> automl_obj.fit(churn_train, "churn")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(churn_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(churn_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(churn_test)
>>> performance_metrics
# Run evaluate to get performance metrics using model rank 4.
>>> performance_metrics = automl_obj.evaluate(churn_test, 4)
>>> performance_metrics
# Example 8: Use AutoML for unsupervised clustering task based on bank data.
# Scenario: Automatically group similar records in the dataset into clusters.
# Split the data into train and test.
>>> bank_sample = bank_df.sample(frac = [0.8, 0.2])
>>> bank_train = bank_sample[bank_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> bank_test = bank_sample[bank_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Create instance of AutoML.
>>> automl_obj = AutoML(task_type="Clustering")
# Fit the data.
>>> automl_obj.fit(bank_train)
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(bank_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(bank_test, rank=2)
>>> prediction