Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment: VantageCloud
- Edition: Lake
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- ft:locale: en-US
- ft:lastEdition: 2024-12-19
- dita:id: TeradataPython_FxRef_Lake_2000
- Product Category: Teradata Vantage
- teradataml.automl.__init__.AutoClassifier.__init__ = __init__(self, include=None, exclude=None, verbose=0, max_runtime_secs=None, stopping_metric=None, stopping_tolerance=None, max_models=None, custom_config_file=None, **kwargs)
- DESCRIPTION:
AutoClassifier is a special purpose AutoML feature to run classification specific tasks.
PARAMETERS:
include:
Optional Argument.
Specifies the model algorithms to be used in the model training phase.
By default, all 5 models are used for training for regression and binary
classification problems, while only 3 models are used for multi-class classification.
Permitted Values: "glm", "svm", "knn", "decision_forest", "xgboost"
Types: str OR list of str
exclude:
Optional Argument.
Specifies the model algorithms to be excluded from the model training phase.
No model is excluded by default.
Permitted Values: "glm", "svm", "knn", "decision_forest", "xgboost"
Types: str OR list of str
verbose:
Optional Argument.
Specifies the level of detail of execution steps displayed, based on the verbose level.
Default Value: 0
Permitted Values:
* 0: prints the progress bar and leaderboard.
* 1: prints the execution steps of AutoML.
* 2: prints the intermediate data between the execution of each step of AutoML.
Types: int
max_runtime_secs:
Optional Argument.
Specifies the time limit in seconds for model training.
Types: int
stopping_metric:
Required, when "stopping_tolerance" is set, otherwise optional.
Specifies the stopping mertics for stopping tolerance in model training.
Permitted Values:
* For task_type "Regression": "R2", "MAE", "MSE", "MSLE",
"MAPE", "MPE", "RMSE", "RMSLE",
"ME", "EV", "MPD", "MGD"
* For task_type "Classification": 'MICRO-F1','MACRO-F1',
'MICRO-RECALL','MACRO-RECALL',
'MICRO-PRECISION', 'MACRO-PRECISION',
'WEIGHTED-PRECISION','WEIGHTED-RECALL',
'WEIGHTED-F1', 'ACCURACY'
Types: str
stopping_tolerance:
Required, when "stopping_metric" is set, otherwise optional.
Specifies the stopping tolerance for stopping metrics in model training.
Types: float
max_models:
Optional Argument.
Specifies the maximum number of models to be trained.
Types: int
custom_config_file:
Optional Argument.
Specifies the path of the JSON file in case of a custom run.
Types: str
**kwargs:
Specifies the additional arguments for AutoClassifier (see the
illustrative sketch after this parameter list). Below are the
additional arguments:
volatile:
Optional Argument.
Specifies whether to put the interim results of the
functions in a volatile table or not. When set to
True, results are stored in a volatile table,
otherwise not.
Default Value: False
Types: bool
persist:
Optional Argument.
Specifies whether to persist the interim results of the
functions in a table or not. When set to True,
results are persisted in a table; otherwise,
results are garbage collected at the end of the
session.
Default Value: False
Types: bool
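NOTE:
    The following is a minimal illustrative sketch of how the arguments above can
    be combined. It assumes an active Vantage connection and an existing
    teradataml DataFrame "train_df" with a categorical target column "target";
    both names are placeholders, not objects defined in this reference.
    >>> from teradataml import AutoClassifier
    # Train only GLM and XGBoost models, stop early once the WEIGHTED-F1
    # score reaches 0.85, and keep interim results in volatile tables.
    >>> automl_obj = AutoClassifier(include=["glm", "xgboost"],
    ...                             stopping_metric="WEIGHTED-F1",
    ...                             stopping_tolerance=0.85,
    ...                             max_models=5,
    ...                             volatile=True)
    >>> automl_obj.fit(train_df, "target")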
RETURNS:
Instance of AutoClassifier.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. Get the connection to Vantage to execute the function.
# 2. One must import the required functions mentioned in
# the example from teradataml.
# 3. Function will raise an error if it is not supported on the Vantage
# the user is connected to.
# Load the example data.
>>> load_example_data("teradataml", ["titanic", "iris_input"])
>>> load_example_data("GLMPredict", ["admissions_test", "admissions_train"])
# Create teradataml DataFrame object.
>>> admissions_train = DataFrame.from_table("admissions_train")
>>> titanic = DataFrame.from_table("titanic")
>>> iris_input = DataFrame.from_table("iris_input")
>>> admissions_test = DataFrame.from_table("admissions_test")
# Example 1 : Run AutoClassifier for binary classification problem
# Scenario : Predict whether a student will be admitted to a university
# based on different factors. Run AutoML to get the best performing model
# out of available models.
# Create instance of AutoClassifier.
>>> automl_obj = AutoClassifier()
# Fit the data.
>>> automl_obj.fit(admissions_train, "admitted")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(admissions_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(admissions_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(admissions_test)
>>> performance_metrics
# Run evaluate to get performance metrics using model rank 4.
>>> performance_metrics = automl_obj.evaluate(admissions_test, 4)
>>> performance_metrics
# Example 2 : Run AutoClassifier for binary classification.
# Scenario : Predict whether a passenger aboard the RMS Titanic survived
# or not based on different factors. Run AutoML to get the
# best performing model out of available models. Use custom
# configuration file to customize different processes of
# AutoML Run.
# Split the data into train and test.
>>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
>>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Generate custom configuration file.
>>> AutoClassifier.generate_custom_config("custom_titanic")
# Create instance of AutoClassifier.
>>> automl_obj = AutoClassifier(verbose=2,
...                             custom_config_file="custom_titanic.json")
# Fit the data.
>>> automl_obj.fit(titanic_train, titanic_train.survived)
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(titanic_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(titanic_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(titanic_test)
>>> performance_metrics
# Example 3 : Run AutoClassifier for multiclass classification problem.
# Scenario : Predict the species of iris flower based on different factors.
# Run AutoML to get the best performing model out of available
# models. Use custom configuration file to customize different
# processes of AutoML Run.
# Split the data into train and test.
>>> iris_sample = iris_input.sample(frac = [0.8, 0.2])
>>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Generate custom configuration file.
>>> AutoClassifier.generate_custom_config("custom_iris")
# Create instance of AutoClassifier.
>>> automl_obj = AutoClassifier(verbose=1,
...                             custom_config_file="custom_iris.json")
# Fit the data.
>>> automl_obj.fit(iris_train, "species")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Predict on test data using best performing model.
>>> prediction = automl_obj.predict(iris_test)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(iris_test)
>>> performance_metrics
# Example 4 : Run AutoClassifier for classification problem with stopping metric and tolerance.
# Scenario : Predict whether a passenger aboard the RMS Titanic survived
# or not based on different factors. Use custom configuration
# file to customize different processes of AutoML Run. Define a
# performance threshold for the available models, and
# terminate training upon meeting the stipulated performance criteria.
# Split the data into train and test.
>>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
>>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Generate custom configuration file.
>>> AutoClassifier.generate_custom_config("custom_titanic")
# Create instance of AutoClassifier.
>>> automl_obj = AutoClassifier(verbose=2,
...                             exclude="xgboost",
...                             stopping_metric="MICRO-F1",
...                             stopping_tolerance=0.7,
...                             max_models=8,
...                             custom_config_file="custom_titanic.json")
# Fit the data.
>>> automl_obj.fit(titanic_train, titanic_train.survived)
# Display leaderboard.
>>> automl_obj.leaderboard()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(titanic_test)
>>> prediction
# Run evaluate to get performance metrics using best performing model.
>>> performance_metrics = automl_obj.evaluate(titanic_test)
>>> performance_metrics
# Example 5 : Run AutoClassifier for classification problem with maximum runtime.
# Scenario : Predict the species of iris flower based on different factors.
# Run AutoML to get the best performing model in the specified time.
# Split the data into train and test.
>>> iris_sample = iris_input.sample(frac = [0.8, 0.2])
>>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
# Create instance of AutoClassifier.
>>> automl_obj = AutoClassifier(verbose=2,
...                             exclude="xgboost",
...                             max_runtime_secs=500,
...                             max_models=3)
# Fit the data.
>>> automl_obj.fit(iris_train, iris_train.species)
# Display leaderboard.
>>> automl_obj.leaderboard()
# Display best performing model.
>>> automl_obj.leader()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(iris_test)
>>> prediction
# Run predict on test data using second best performing model.
>>> prediction = automl_obj.predict(iris_test, rank=2)
>>> prediction
# Run evaluate to get performance metrics using model rank 3.
>>> performance_metrics = automl_obj.evaluate(iris_test, 3)
>>> performance_metrics
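# Example 6 : A minimal illustrative sketch, assuming the titanic_train and
# titanic_test DataFrames created in Example 2, showing how to restrict
# the candidate models with "include" and persist interim results with
# the "persist" keyword argument.
# Create instance of AutoClassifier that trains only decision forest and
# xgboost models and persists interim results in tables.
>>> automl_obj = AutoClassifier(verbose=1,
...                             include=["decision_forest", "xgboost"],
...                             persist=True)
# Fit the data.
>>> automl_obj.fit(titanic_train, "survived")
# Display leaderboard.
>>> automl_obj.leaderboard()
# Run predict on test data using best performing model.
>>> prediction = automl_obj.predict(titanic_test)
>>> prediction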