AutoML is an approach that automates the process of building, training, and validating machine learning models. It applies a range of algorithms to automate stages of the machine learning workflow such as data preparation, feature engineering, model selection, hyperparameter tuning, and model deployment. Its aim is to simplify model building by automating the most time-consuming and labor-intensive tasks in the process.
AutoML handles both regression and classification (binary and multiclass) tasks. You can specify the task type for the provided dataset; by default, AutoML decides the task type itself.
By default, AutoML trains all model algorithms applicable to the task type:
- For multiclass classification problems, only three models, "knn", "decision_forest", and "xgboost", are trained by default, because "glm" and "svm" do not support multiclass classification.
- For regression and binary classification problems, all five models, "glm", "svm", "knn", "decision_forest", and "xgboost", are trained by default.
- If include is specified, only the specified models are trained.
- If exclude is specified, all models except the specified ones are trained.
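The selection rules above can be summarized in a small, hypothetical Python sketch. The function `models_to_train` and the problem labels "regression", "binary", and "multiclass" are illustrative assumptions, not part of the AutoML API. The sketch also assumes that include and exclude may be combined and that models in include which the task does not support are silently dropped; the description above does not say how AutoML handles those cases.

```python
# Hypothetical sketch of the default model-selection rules; not the AutoML API.
ALL_MODELS = ["glm", "svm", "knn", "decision_forest", "xgboost"]
MULTICLASS_UNSUPPORTED = {"glm", "svm"}  # these do not support multiclass


def models_to_train(problem, include=None, exclude=None):
    """Return the model algorithms that would be trained.

    problem: "regression", "binary" or "multiclass" (assumed labels).
    include / exclude: optional sequences of model names.
    """
    if problem not in ("regression", "binary", "multiclass"):
        raise ValueError(f"unknown problem type: {problem!r}")
    # Reject names outside the five permitted model algorithms.
    for name in [*(include or []), *(exclude or [])]:
        if name not in ALL_MODELS:
            raise ValueError(f"unknown model: {name!r}")

    # Start from every model applicable to the problem type.
    if problem == "multiclass":
        candidates = [m for m in ALL_MODELS if m not in MULTICLASS_UNSUPPORTED]
    else:
        candidates = list(ALL_MODELS)

    if include:
        # Keep only the requested models (assumption: unsupported ones are dropped).
        candidates = [m for m in candidates if m in include]
    if exclude:
        # Drop the excluded models from whatever remains.
        candidates = [m for m in candidates if m not in exclude]
    return candidates
```

For example, `models_to_train("multiclass")` yields `["knn", "decision_forest", "xgboost"]`, and `models_to_train("regression", exclude=["svm"])` yields the remaining four models in their listed order.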
AutoML also lets you customize the processes within the feature engineering, data preparation, and model training phases by passing the path of a JSON configuration file for a custom run. It also supports early stopping of model training based on a stopping metric and a maximum running time.
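A minimal sketch of how early stopping on a metric, a running-time limit (max_runtime_secs), and a model-count limit (max_models) might interact in a training loop. This is not the AutoML implementation. It assumes that `train_and_score` is a hypothetical callback that trains one candidate and returns its score, that stopping_tolerance acts as a target value for the stopping metric, that each candidate counts as one trial, and that R2, EV, accuracy, F1, recall, and precision are "higher is better" while every other metric is an error to minimize.

```python
import time


def higher_is_better(metric):
    # Assumption: score-type metrics improve upward; all others are errors.
    return metric in {"R2", "EV", "ACCURACY"} or metric.endswith(
        ("F1", "RECALL", "PRECISION"))


def train_with_early_stopping(candidates, train_and_score, stopping_metric=None,
                              stopping_tolerance=None, max_runtime_secs=None,
                              max_models=None, clock=time.monotonic):
    """Train candidates in order until a stopping condition is met.

    Returns a list of (model_name, score) pairs for the models trained.
    `train_and_score(name, metric)` is a hypothetical callback.
    """
    start = clock()
    results = []
    for name in candidates:
        # Stop once the model-count limit is reached.
        if max_models is not None and len(results) >= max_models:
            break
        # Stop once the time budget is used up.
        if max_runtime_secs is not None and clock() - start >= max_runtime_secs:
            break
        score = train_and_score(name, stopping_metric)
        results.append((name, score))
        # Stop as soon as a model reaches the target value of the metric.
        if stopping_metric is not None and stopping_tolerance is not None:
            if higher_is_better(stopping_metric):
                reached = score >= stopping_tolerance
            else:
                reached = score <= stopping_tolerance
            if reached:
                break
    return results
```

Injecting `clock` keeps the time limit testable without real waiting; a real run would use the default monotonic clock.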
Optional Arguments:
- task_type: Specifies the task type for AutoML, that is, whether to apply regression or classification to the provided dataset.
Set this argument to "Default" to let AutoML decide the task type automatically.
Permitted values are "Regression", "Classification", "Default".
Default value is "Default".
- include: Specifies the model algorithms to be used for model training phase.
By default, all five models are used for training in regression and binary classification problems, while only three models are used for multiclass classification.
Permitted values are "glm", "svm", "knn", "decision_forest", "xgboost".
- exclude: Specifies the model algorithms to be excluded from model training phase.
No model is excluded by default.
Permitted values are "glm", "svm", "knn", "decision_forest", "xgboost".
- verbose: Specifies the level of detail at which execution steps are reported. Permitted values are:
- 0: prints the progress bar and leaderboard.
- 1: prints the execution steps of AutoML.
- 2: prints the intermediate data between the execution of each step of AutoML.
- max_runtime_secs: Specifies the time limit in seconds for model training.
- stopping_metric: Specifies the metric used for early stopping in model training. This argument is required if stopping_tolerance is set; otherwise, it is optional. Permitted values are:
- For task_type "Regression": "R2", "MAE", "MSE", "MSLE", "RMSE", "RMSLE", "MAPE", "MPE", "ME", "EV", "MPD", "MGD".
- For task_type "Classification": "MICRO-F1", "MACRO-F1", "MICRO-RECALL", "MACRO-RECALL", "MICRO-PRECISION", "MACRO-PRECISION", "WEIGHTED-PRECISION", "WEIGHTED-RECALL", "WEIGHTED-F1", "ACCURACY".
- max_models: Specifies the maximum number of models to be trained.
- stopping_tolerance: Specifies the tolerance for the stopping metric in model training. This argument is required if stopping_metric is set; otherwise, it is optional.
- custom_config_file: Specifies the path of the JSON file used for a custom run.
- volatile: Specifies whether to store the interim results of the functions in a volatile table. When set to True, results are stored in a volatile table; otherwise, they are not.
Default value: False
Types: bool
- persist: Specifies whether to persist the interim results of the functions in a table or not. When set to True, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default value: False
Types: bool
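The argument rules listed above can be checked before any training starts. The following hypothetical `validate_automl_args` function sketches such checks; it is not the library's own validation. It assumes permitted values are case-sensitive, uses a verbose default of 0 chosen only for the sketch, and, because the description does not say how stopping_metric is validated when task_type is "Default", accepts any regression or classification metric in that case.

```python
# Hypothetical up-front validation of the documented arguments; not the AutoML API.
MODELS = {"glm", "svm", "knn", "decision_forest", "xgboost"}
REGRESSION_METRICS = {"R2", "MAE", "MSE", "MSLE", "RMSE", "RMSLE",
                      "MAPE", "MPE", "ME", "EV", "MPD", "MGD"}
CLASSIFICATION_METRICS = {"MICRO-F1", "MACRO-F1", "MICRO-RECALL", "MACRO-RECALL",
                          "MICRO-PRECISION", "MACRO-PRECISION",
                          "WEIGHTED-PRECISION", "WEIGHTED-RECALL", "WEIGHTED-F1",
                          "ACCURACY"}


def validate_automl_args(task_type="Default", include=None, exclude=None,
                         verbose=0, stopping_metric=None, stopping_tolerance=None,
                         volatile=False, persist=False):
    """Raise ValueError/TypeError if the arguments break the documented rules."""
    if task_type not in ("Regression", "Classification", "Default"):
        raise ValueError(f"task_type must be Regression, Classification or Default, got {task_type!r}")

    # include / exclude may only name the five permitted model algorithms.
    for arg_name, models in (("include", include), ("exclude", exclude)):
        unknown = set(models or []) - MODELS
        if unknown:
            raise ValueError(f"{arg_name}: unsupported model(s) {sorted(unknown)}")

    if verbose not in (0, 1, 2):
        raise ValueError(f"verbose must be 0, 1 or 2, got {verbose!r}")

    # stopping_metric and stopping_tolerance must be supplied together.
    if (stopping_metric is None) != (stopping_tolerance is None):
        raise ValueError("stopping_metric and stopping_tolerance must be set together")

    if stopping_metric is not None:
        if task_type == "Regression":
            allowed = REGRESSION_METRICS
        elif task_type == "Classification":
            allowed = CLASSIFICATION_METRICS
        else:  # Assumption for "Default": accept metrics of either task type.
            allowed = REGRESSION_METRICS | CLASSIFICATION_METRICS
        if stopping_metric not in allowed:
            raise ValueError(f"stopping_metric {stopping_metric!r} is not valid for {task_type}")

    # volatile and persist are documented as bool.
    for arg_name, value in (("volatile", volatile), ("persist", persist)):
        if not isinstance(value, bool):
            raise TypeError(f"{arg_name} must be a bool, got {type(value).__name__}")
```

Checking the stopping_metric and stopping_tolerance pairing early surfaces a configuration mistake before any model is trained, rather than partway through a long run.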