This example trains an AutoClassifier on the admissions dataset, with an id column specified during training, and then generates predictions on the test data using the id column mapping. Run AutoML to get the best performing model with the following specifications:
- Set max_models to 6.
- Include only xgboost and svm for model training.
- Opt for verbose level 2 to get detailed logging.
- Set the flag to enable lasso feature selection.
- Set the flag to raise an error on any issue rather than skipping the failing step.
- Load the admissions data and split it into training and test sets.
>>> load_example_data("dataframe", "admissions_train")
>>> admissions = DataFrame('admissions_train')
>>> admissions_sample = admissions.sample(frac=[0.8, 0.2])
>>> adm_train = admissions_sample[admissions_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> adm_test = admissions_sample[admissions_sample['sampleid'] == 2].drop('sampleid', axis=1)
- Create an AutoML instance.
>>> aml = AutoClassifier(verbose=2,
...                      include=['xgboost', 'svm'],
...                      max_models=6,
...                      enable_lasso=True,
...                      raise_errors=True)
- Fit the data.
>>> aml.fit(data=adm_train, target_column="admitted", id_column='id')
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:58:37,709 | INFO | Feature Exploration started
2025-11-04 04:58:37,709 | INFO | Data Overview:
2025-11-04 04:58:37,730 | INFO | Total Rows in the data: 32
2025-11-04 04:58:37,771 | INFO | Total Columns in the data: 6
2025-11-04 04:58:38,565 | INFO | Column Summary:
    ColumnName                         Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
0        stats  VARCHAR(30) CHARACTER SET LATIN            32          0         0.0        NaN            NaN            NaN             0.0              100.0
1  programming  VARCHAR(30) CHARACTER SET LATIN            32          0         0.0        NaN            NaN            NaN             0.0              100.0
2           id                          INTEGER            32          0         NaN        0.0           32.0            0.0             0.0              100.0
3     admitted                          INTEGER            32          0         NaN       13.0           19.0            0.0             0.0              100.0
4          gpa                            FLOAT            32          0         NaN        0.0           32.0            0.0             0.0              100.0
5      masters   VARCHAR(5) CHARACTER SET LATIN            32          0         0.0        NaN            NaN            NaN             0.0              100.0
2025-11-04 04:58:39,360 | INFO | Statistics of Data:
  ATTRIBUTE            StatName  StatValue
0  admitted             MAXIMUM   1.000000
1  admitted  STANDARD DEVIATION   0.498991
2  admitted     PERCENTILES(25)   0.000000
3  admitted     PERCENTILES(50)   1.000000
4        id               COUNT  32.000000
5        id             MINIMUM   1.000000
6       gpa               COUNT  32.000000
7       gpa             MINIMUM   1.870000
8       gpa             MAXIMUM   4.000000
9       gpa                MEAN   3.577500
2025-11-04 04:58:39,506 | INFO | Categorical Columns with their Distinct values:
   ColumnName  DistinctValueCount
      masters                   2
        stats                   3
  programming                   3
2025-11-04 04:58:41,775 | INFO | No Futile columns found.
2025-11-04 04:58:44,668 | INFO | Columns with outlier percentage :-
  ColumnName  OutlierPercentage
0        gpa              9.375
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:58:44,988 | INFO | Feature Engineering started ...
2025-11-04 04:58:44,988 | INFO | Handling duplicate records present in dataset ...
2025-11-04 04:58:45,124 | INFO | Analysis completed. No action taken.
2025-11-04 04:58:45,124 | INFO | Total time to handle duplicate records: 0.14 sec
2025-11-04 04:58:45,124 | INFO | Handling less significant features from data ...
2025-11-04 04:58:46,999 | INFO | Analysis indicates all categorical columns are significant. No action Needed.
2025-11-04 04:58:46,999 | INFO | Total time to handle less significant features: 1.87 sec
2025-11-04 04:58:46,999 | INFO | Handling Date Features ...
2025-11-04 04:58:46,999 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
2025-11-04 04:58:46,999 | INFO | Total time to handle date features: 0.00 sec
2025-11-04 04:58:46,999 | INFO | Checking Missing values in dataset ...
2025-11-04 04:58:48,116 | INFO | Analysis Completed. No Missing Values Detected.
2025-11-04 04:58:48,116 | INFO | Total time to find missing values in data: 1.12 sec
2025-11-04 04:58:48,116 | INFO | Imputing Missing Values ...
2025-11-04 04:58:48,116 | INFO | Analysis completed. No imputation required.
2025-11-04 04:58:48,117 | INFO | Time taken to perform imputation: 0.00 sec
2025-11-04 04:58:48,117 | INFO | Performing encoding for categorical columns ...
2025-11-04 04:58:50,561 | INFO | ONE HOT Encoding these Columns: ['masters', 'stats', 'programming']
2025-11-04 04:58:50,561 | INFO | Sample of dataset after performing one hot encoding:
    masters_0  masters_1   gpa  stats_0  stats_1  stats_2  programming_0  programming_1  programming_2  admitted
id
13          1          0  4.00        1        0        0              0              0              1         1
7           0          1  2.33        0        0        1              0              0              1         1
39          0          1  3.75        1        0        0              0              1              0         0
19          0          1  1.98        1        0        0              1              0              0         0
15          0          1  4.00        1        0        0              1              0              0         1
5           1          0  3.44        0        0        1              0              0              1         0
24          1          0  1.87        1        0        0              0              0              1         1
3           1          0  3.70        0        0        1              0              1              0         1
36          1          0  3.00        1        0        0              0              0              1         0
40          0          1  3.95        0        0        1              0              1              0         0
32 rows X 11 columns
2025-11-04 04:58:50,651 | INFO | Time taken to encode the columns: 2.53 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:58:50,651 | INFO | Data preparation started ...
2025-11-04 04:58:50,652 | INFO | Outlier preprocessing ...
2025-11-04 04:58:53,575 | INFO | Columns with outlier percentage :-
  ColumnName  OutlierPercentage
0        gpa              9.375
2025-11-04 04:58:54,006 | INFO | median inplace of outliers: ['gpa']
2025-11-04 04:58:56,068 | INFO | Sample of dataset after performing MEDIAN inplace:
    masters_0  masters_1    gpa  stats_0  stats_1  stats_2  programming_0  programming_1  programming_2  admitted
id
13          1          0  4.000        1        0        0              0              0              1         1
24          1          0  3.755        1        0        0              0              0              1         1
3           1          0  3.700        0        0        1              0              1              0         1
19          0          1  3.755        1        0        0              1              0              0         0
15          0          1  4.000        1        0        0              1              0              0         1
40          0          1  3.950        0        0        1              0              1              0         0
7           0          1  3.755        0        0        1              0              0              1         1
39          0          1  3.750        1        0        0              0              1              0         0
36          1          0  3.000        1        0        0              0              0              1         0
5           1          0  3.440        0        0        1              0              0              1         0
32 rows X 11 columns
2025-11-04 04:58:56,179 | INFO | Time Taken by Outlier processing: 5.53 sec
2025-11-04 04:58:56,180 | INFO | Checking imbalance data ...
2025-11-04 04:58:56,242 | INFO | Imbalance Not Found.
2025-11-04 04:58:56,963 | INFO | Feature selection using lasso ...
2025-11-04 04:58:57,613 | INFO | feature selected by lasso: ['gpa', 'masters_0', 'stats_0', 'masters_1', 'programming_1', 'programming_2', 'stats_1', 'programming_0', 'stats_2']
2025-11-04 04:58:57,614 | INFO | Total time taken by feature selection: 0.65 sec
2025-11-04 04:58:57,896 | INFO | Scaling Features of lasso data ...
2025-11-04 04:58:59,359 | INFO | columns that will be scaled: ['gpa']
2025-11-04 04:59:01,328 | INFO | Dataset sample after scaling:
   masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted    gpa
0          1        0   3          0              1              0        0              0        1         1  0.700
1          1        0   5          0              0              1        0              0        1         0  0.440
2          0        0   7          1              0              1        0              0        1         1  0.755
3          1        0   8          0              0              0        1              1        0         1  0.600
4          1        1  10          0              0              0        0              1        0         1  0.710
5          1        0  12          0              0              1        0              0        1         1  0.650
6          1        1   9          0              0              0        0              1        0         1  0.820
7          0        0   4          1              0              1        1              0        0         1  0.500
8          0        0   2          1              1              0        1              0        0         0  0.760
9          0        0   1          1              1              0        1              0        0         0  0.950
32 rows X 11 columns
2025-11-04 04:59:01,916 | INFO | Total time taken by feature scaling: 4.02 sec
2025-11-04 04:59:01,917 | INFO | Feature selection using rfe ...
2025-11-04 04:59:14,194 | INFO | feature selected by RFE: ['masters_0', 'masters_1', 'programming_1', 'gpa']
2025-11-04 04:59:14,195 | INFO | Total time taken by feature selection: 12.28 sec
2025-11-04 04:59:14,494 | INFO | Scaling Features of rfe data ...
2025-11-04 04:59:15,445 | INFO | columns that will be scaled: ['r_gpa']
2025-11-04 04:59:17,293 | INFO | Dataset sample after scaling:
   id  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa
0   3                1            0            1         1  0.700
1   5                0            0            1         0  0.440
2   7                0            1            0         1  0.755
3   8                0            0            1         1  0.600
4  10                0            0            1         1  0.710
5  12                0            0            1         1  0.650
6   9                0            0            1         1  0.820
7   4                0            1            0         1  0.500
8   2                1            1            0         0  0.760
9   1                1            1            0         0  0.950
32 rows X 6 columns
2025-11-04 04:59:17,760 | INFO | Total time taken by feature scaling: 3.27 sec
2025-11-04 04:59:17,760 | INFO | Scaling Features of pca data ...
2025-11-04 04:59:18,644 | INFO | columns that will be scaled: ['gpa']
2025-11-04 04:59:20,557 | INFO | Dataset sample after scaling:
   masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted    gpa
0          1        0   3          0              1              0        0              0        1         1  0.700
1          0        1  34          1              1              0        0              0        0         0  0.850
2          1        1  13          0              0              1        0              0        0         1  1.000
3          0        0  40          1              1              0        0              0        1         0  0.950
4          0        1  39          1              1              0        0              0        0         0  0.750
5          0        1  19          1              0              0        0              1        0         0  0.755
6          1        1  36          0              0              1        0              0        0         0  0.000
7          0        1  15          1              0              0        0              1        0         1  1.000
8          0        0   7          1              0              1        0              0        1         1  0.755
9          1        1  17          0              0              0        0              1        0         1  0.830
32 rows X 11 columns
2025-11-04 04:59:21,204 | INFO | Total time taken by feature scaling: 3.44 sec
2025-11-04 04:59:21,204 | INFO | Dimension Reduction using pca ...
2025-11-04 04:59:21,824 | INFO | PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4']
2025-11-04 04:59:21,824 | INFO | Total time taken by PCA: 0.62 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:59:22,220 | INFO | Model Training started ...
2025-11-04 04:59:22,305 | INFO | Hyperparameters used for model training:
2025-11-04 04:59:22,305 | INFO | Model: svm
2025-11-04 04:59:22,306 | INFO | Hyperparameters: {'response_column': 'admitted', 'name': 'svm', 'model_type': 'Classification', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'tolerance': (0.001, 0.01), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'nesterov': True, 'intercept': True, 'iter_num_no_change': (5, 10, 50), 'local_sgd_iterations ': (10, 20), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
2025-11-04 04:59:22,306 | INFO | Total number of models for svm: 5184
--------------------------------------------------------------------------------
2025-11-04 04:59:22,306 | INFO | Model: xgboost
2025-11-04 04:59:22,307 | INFO | Hyperparameters: {'response_column': 'admitted', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
2025-11-04 04:59:22,308 | INFO | Total number of models for xgboost: 5832
--------------------------------------------------------------------------------
2025-11-04 04:59:22,308 | INFO | Performing hyperparameter tuning ...
2025-11-04 04:59:24,073 | INFO | Model training for svm
2025-11-04 04:59:34,260 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:59:34,261 | INFO | Model training for xgboost
2025-11-04 04:59:45,689 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:59:45,692 | INFO | Leaderboard
   RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_1               rfe  0.857143         0.857143  ...      0.875000  0.857143            0.892857         0.857143     0.857143
1     2      SVM_2               pca  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
2     3      SVM_1               rfe  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
3     4  XGBOOST_0             lasso  0.714286         0.714286  ...      0.666667  0.650000            0.809524         0.714286     0.671429
4     5      SVM_0             lasso  0.571429         0.571429  ...      0.583333  0.571429            0.595238         0.571429     0.571429
5     6  XGBOOST_2               pca  0.571429         0.571429  ...      0.500000  0.363636            0.326531         0.571429     0.415584
[6 rows x 13 columns]
6 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 14/14
- Display model leaderboard.
>>> aml.leaderboard()
   RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_1               rfe  0.857143         0.857143  ...      0.875000  0.857143            0.892857         0.857143     0.857143
1     2      SVM_2               pca  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
2     3      SVM_1               rfe  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
3     4  XGBOOST_0             lasso  0.714286         0.714286  ...      0.666667  0.650000            0.809524         0.714286     0.671429
4     5      SVM_0             lasso  0.571429         0.571429  ...      0.583333  0.571429            0.595238         0.571429     0.571429
5     6  XGBOOST_2               pca  0.571429         0.571429  ...      0.500000  0.363636            0.326531         0.571429     0.415584
[6 rows x 13 columns]
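A side note on the leaderboard numbers: every accuracy above is a multiple of 1/7 (6/7, 5/7, 4/7), which is consistent with the internal validation split holding out 7 of the 32 training rows. This is an observation about the printed values, not a statement about teradataml internals; a quick check:

```python
from fractions import Fraction

# Distinct accuracy values from the leaderboard above.
accuracies = [0.857143, 0.714286, 0.571429]

# Snap each value to the nearest fraction with denominator 7.
sevenths = [Fraction(round(a * 7), 7) for a in accuracies]

# Rounding the exact fractions back to 6 decimals reproduces the log output.
reproduced = [round(float(f), 6) for f in sevenths]
```

Here `sevenths` comes out as 6/7, 5/7 and 4/7, and `reproduced` matches the printed accuracies exactly.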
- Display best performing model.
>>> aml.leader()
   RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_1               rfe  0.857143         0.857143  ...         0.875  0.857143            0.892857         0.857143     0.857143
[1 rows x 13 columns]
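The "Total number of models" figures logged during training (5184 for svm, 5832 for xgboost) are simply the size of each hyperparameter grid: the product of the number of candidate values per tuple-valued parameter, with scalar parameters contributing a factor of 1. A worked check of that arithmetic:

```python
from math import prod

# Candidate counts per tuple-valued svm hyperparameter from the training log:
# lambda1=3, alpha=2, tolerance=2, initial_eta=2, momentum=3,
# iter_num_no_change=3, local_sgd_iterations=2, iter_max=3, batch_size=4.
svm_choices = [3, 2, 2, 2, 3, 3, 2, 3, 4]

# Likewise for xgboost: column_sampling=2, min_impurity=3, lambda1=3,
# shrinkage_factor=3, max_depth=4, min_node_size=3, iter_num=3,
# num_boosted_trees=3.
xgboost_choices = [2, 3, 3, 3, 4, 3, 3, 3]

print(prod(svm_choices), prod(xgboost_choices))  # 5184 5832
```

The hyperparameter tuner then samples from these grids; only max_models=6 finished models make the leaderboard.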
- Display model hyperparameters for rank 1.
>>> aml.model_hyperparameters(rank=1)
{'response_column': 'admitted', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': 0.6, 'min_impurity': 0.1, 'lambda1': 0.01, 'shrinkage_factor': 0.5, 'max_depth': 6, 'min_node_size': 2, 'iter_num': 30, 'num_boosted_trees': 5, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0'], 'max_models': 1}
- Generate prediction on test dataset using best performing model.
>>> prediction = aml.predict(adm_test)
2025-11-04 05:02:25,938 | INFO | Data Transformation started ...
2025-11-04 05:02:25,939 | INFO | Performing transformation carried out in feature engineering phase ...
2025-11-04 05:02:25,940 | INFO | Updated dataset after performing target column transformation :
    masters   gpa     stats programming  admitted
id
38      yes  2.65  Advanced    Beginner         1
31      yes  3.50  Advanced    Beginner         1
6       yes  3.50  Beginner    Advanced         1
11       no  3.13  Advanced    Advanced         1
16       no  3.70  Advanced    Advanced         1
26      yes  3.57  Advanced    Advanced         1
35       no  3.68    Novice    Beginner         1
22      yes  3.46    Novice    Beginner         0
8 rows X 6 columns
2025-11-04 05:02:27,364 | INFO | Updated dataset after performing categorical encoding :
    masters_0  masters_1   gpa  stats_0  stats_1  stats_2  programming_0  programming_1  programming_2  admitted
id
22          0          1  3.46        0        0        1              0              1              0         0
11          1          0  3.13        1        0        0              1              0              0         1
16          1          0  3.70        1        0        0              1              0              0         1
31          0          1  3.50        1        0        0              0              1              0         1
6           0          1  3.50        0        1        0              1              0              0         1
35          1          0  3.68        0        0        1              0              1              0         1
26          0          1  3.57        1        0        0              1              0              0         1
38          0          1  2.65        1        0        0              0              1              0         1
8 rows X 11 columns
2025-11-04 05:02:27,539 | INFO | Performing transformation carried out in data preparation phase ...
2025-11-04 05:02:28,344 | INFO | Updated dataset after performing Lasso feature selection:
id gpa stats_0 masters_1 programming_1 programming_2 stats_1 programming_0 stats_2 admitted masters_0 0 6 3.50 0 1 0 0 1 1 0 1 0 38 2.65 1 1 1 0 0 0 0 1 1 16 3.70 1 0 0 0 0 1 0 1 1 11 3.13 1 0 0 0 0 1 0 1 1 35 3.68 0 0 1 0 0 0 1 1 0 26 3.57 1 1 0 0 0 1 0 1 0 22 3.46 0 1 1 0 0 0 1 0 0 31 3.50 1 1 1 0 0 0 0 1
8 rows X 11 columns
2025-11-04 05:02:29,209 | INFO | Updated dataset after performing scaling on Lasso selected features :
   masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted   gpa
0          1        0  35          0              1              0        0              0        1         1  0.68
1          0        0  22          1              1              0        0              0        1         0  0.46
2          0        0   6          1              0              0        1              1        0         1  0.50
3          0        1  26          1              0              0        0              1        0         1  0.57
4          0        1  38          1              1              0        0              0        0         1 -0.35
5          0        1  31          1              1              0        0              0        0         1  0.50
6          1        1  11          0              0              0        0              1        0         1  0.13
7          1        1  16          0              0              0        0              1        0         1  0.70
8 rows X 11 columns
2025-11-04 05:02:29,848 | INFO | Updated dataset after performing RFE feature selection:
id masters_1 programming_1 gpa admitted masters_0 0 6 1 0 3.50 1 0 38 1 1 2.65 1 1 16 0 0 3.70 1 1 11 0 0 3.13 1 1 35 0 1 3.68 1 0 26 1 0 3.57 1 0 22 1 1 3.46 0 0 31 1 1 3.50 1
8 rows X 6 columns
2025-11-04 05:02:31,038 | INFO | Updated dataset after performing scaling on RFE selected features :
   id  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa
0  35                1            0            1         1   0.68
1  22                1            1            0         0   0.46
2   6                0            1            0         1   0.50
3  26                0            1            0         1   0.57
4  38                1            1            0         1  -0.35
5  31                1            1            0         1   0.50
6  11                0            0            1         1   0.13
7  16                0            0            1         1   0.70
8 rows X 6 columns
2025-11-04 05:02:32,357 | INFO | Updated dataset after performing scaling for PCA feature selection :
   masters_0  id  stats_0  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted   gpa
0          1  35        0          0              1              0        0              0        1         1  0.68
1          0  22        0          1              1              0        0              0        1         0  0.46
2          0   6        0          1              0              0        1              1        0         1  0.50
3          0  26        1          1              0              0        0              1        0         1  0.57
4          0  38        1          1              1              0        0              0        0         1 -0.35
5          0  31        1          1              1              0        0              0        0         1  0.50
6          1  11        1          0              0              0        0              1        0         1  0.13
7          1  16        1          0              0              0        0              1        0         1  0.70
8 rows X 11 columns
2025-11-04 05:02:32,819 | INFO | Updated dataset after performing PCA feature selection :
   id     col_0     col_1     col_2     col_3     col_4  admitted
0  16 -0.076593  1.103297 -0.423129 -0.018144 -0.057361         1
1  31 -0.731760 -0.612164 -0.021522 -0.579707 -0.437160         1
2  11 -0.022041  1.112804 -0.334358  0.011074 -0.126759         1
3  22  0.060695 -1.283824 -0.394887 -0.233696  0.368872         0
4  35  0.974680 -0.477419 -0.949009 -0.359877 -0.029963         1
5   6 -0.633256 -0.261056 -0.171215  1.298137  0.126677         1
6  26 -0.999192  0.295391  0.116976  0.103424  0.352431         1
7  38 -0.650410 -0.597987  0.110855 -0.536135 -0.540648         1
8 rows X 7 columns
2025-11-04 05:02:33,168 | INFO | Data Transformation completed.
2025-11-04 05:02:33,714 | INFO | Following model is being picked for evaluation:
2025-11-04 05:02:33,715 | INFO | Model ID : XGBOOST_1
2025-11-04 05:02:33,715 | INFO | Feature Selection Method : rfe
2025-11-04 05:02:34,406 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 05:02:36,451 | INFO | SHAP Analysis Completed. Feature Importance Available.
2025-11-04 05:02:36,583 | INFO | Prediction :
   id  Prediction  admitted    prob_0    prob_1
0  35           1         1  0.123971  0.876029
1  22           0         0  0.553171  0.446829
2   6           0         1  0.553171  0.446829
3  26           0         1  0.553171  0.446829
4  38           0         1  0.553171  0.446829
5  31           0         1  0.553171  0.446829
6  11           1         1  0.123971  0.876029
7  16           1         1  0.123971  0.876029
2025-11-04 05:02:38,354 | INFO | ROC-AUC :
       GINI       AUC
   0.714286  0.428571
   threshold_value  tpr  fpr
0         0.040816  1.0  1.0
1         0.081633  1.0  1.0
2         0.102041  1.0  1.0
3         0.122449  1.0  1.0
4         0.163265  1.0  1.0
5         0.183673  1.0  1.0
6         0.142857  1.0  1.0
7         0.061224  1.0  1.0
8         0.020408  1.0  1.0
9         0.000000  1.0  1.0
2025-11-04 05:02:38,859 | INFO | Confusion Matrix :
[[1 0]
 [4 3]]
>>> prediction.head()
   id  Prediction  admitted    prob_0    prob_1
0  35           1         1  0.123971  0.876029
1  22           0         0  0.553171  0.446829
2   6           0         1  0.553171  0.446829
3  26           0         1  0.553171  0.446829
4  38           0         1  0.553171  0.446829
5  31           0         1  0.553171  0.446829
6  11           1         1  0.123971  0.876029
7  16           1         1  0.123971  0.876029
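The Prediction column is the class with the larger probability, and the confusion matrix printed during predict() can be rebuilt from these rows by hand. A pure-Python check using the values copied from the output above:

```python
# (admitted, prob_0, prob_1) triples copied from the prediction output above.
rows = [
    (1, 0.123971, 0.876029), (0, 0.553171, 0.446829),
    (1, 0.553171, 0.446829), (1, 0.553171, 0.446829),
    (1, 0.553171, 0.446829), (1, 0.553171, 0.446829),
    (1, 0.123971, 0.876029), (1, 0.123971, 0.876029),
]

# Predicted class = argmax of the two probabilities.
preds = [1 if p1 > p0 else 0 for _, p0, p1 in rows]

# Confusion matrix with rows = actual class, columns = predicted class.
cm = [[0, 0], [0, 0]]
for (actual, _, _), pred in zip(rows, preds):
    cm[actual][pred] += 1

print(cm)  # [[1, 0], [4, 3]]
```

This reproduces the logged Confusion Matrix [[1 0] [4 3]]: one true negative, no false positives, four false negatives, three true positives.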
- Generate prediction on the test dataset using the best performing model, with preserve_columns set to True.
>>> prediction = aml.predict(adm_test, preserve_columns=True)
2025-11-04 05:22:49,288 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 05:22:49,833 | INFO | Following model is being picked for evaluation:
2025-11-04 05:22:49,834 | INFO | Model ID : XGBOOST_1
2025-11-04 05:22:49,834 | INFO | Feature Selection Method : rfe
2025-11-04 05:22:50,443 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 05:22:52,404 | INFO | SHAP Analysis Completed. Feature Importance Available.
2025-11-04 05:22:52,535 | INFO | Prediction :
   id  Prediction  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa    prob_0    prob_1
0  35           1                1            0            1         1   0.68  0.123971  0.876029
1  22           0                1            1            0         0   0.46  0.553171  0.446829
2   6           0                0            1            0         1   0.50  0.553171  0.446829
3  26           0                0            1            0         1   0.57  0.553171  0.446829
4  38           0                1            1            0         1  -0.35  0.553171  0.446829
5  31           0                1            1            0         1   0.50  0.553171  0.446829
6  11           1                0            0            1         1   0.13  0.123971  0.876029
7  16           1                0            0            1         1   0.70  0.123971  0.876029
2025-11-04 05:22:54,753 | INFO | ROC-AUC :
       GINI       AUC
   0.714286  0.428571
   threshold_value  tpr  fpr
0         0.040816  1.0  1.0
1         0.081633  1.0  1.0
2         0.102041  1.0  1.0
3         0.122449  1.0  1.0
4         0.163265  1.0  1.0
5         0.183673  1.0  1.0
6         0.142857  1.0  1.0
7         0.061224  1.0  1.0
8         0.020408  1.0  1.0
9         0.000000  1.0  1.0
2025-11-04 05:22:55,358 | INFO | Confusion Matrix :
[[1 0]
 [4 3]]
>>> prediction
   id  Prediction  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa    prob_0    prob_1
0   6           0                0            1            0         1   0.50  0.553171  0.446829
1  38           0                1            1            0         1  -0.35  0.553171  0.446829
2  16           1                0            0            1         1   0.70  0.123971  0.876029
3  11           1                0            0            1         1   0.13  0.123971  0.876029
4  35           1                1            0            1         1   0.68  0.123971  0.876029
5  26           0                0            1            0         1   0.57  0.553171  0.446829
6  22           0                1            1            0         0   0.46  0.553171  0.446829
7  31           0                1            1            0         1   0.50  0.553171  0.446829
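The r_gpa values preserved above are consistent with min-max scaling fitted on the training split, where gpa ranged from 3.0 to 4.0 after the median outlier replacement shown in the training log. The min/max values here are read off that log, and this sketch only illustrates the arithmetic; it is not teradataml's internal scaling code:

```python
# Assumed fit range, taken from the training data after median inplace
# (min gpa = 3.0, max gpa = 4.0).
train_min, train_max = 3.0, 4.0

def minmax(x):
    """Min-max scale x using the training-data range."""
    return round((x - train_min) / (train_max - train_min), 2)

# Raw test gpa values for ids 16, 38 and 22 reproduce the logged
# r_gpa values 0.70, -0.35 and 0.46 (values below the training
# minimum scale to negative numbers).
scaled = [minmax(g) for g in (3.70, 2.65, 3.46)]
```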
- Generate evaluation metrics on test dataset using second best performing model.
>>> performance_metrics = aml.evaluate(adm_test, 2)
2025-11-04 05:04:48,154 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 05:04:48,905 | INFO | Following model is being picked for evaluation:
2025-11-04 05:04:48,906 | INFO | Model ID : SVM_2
2025-11-04 05:04:48,906 | INFO | Feature Selection Method : pca
2025-11-04 05:04:52,600 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1        1        3       0.25  1.000000  0.400000        1
1                1  CLASS_2        0        4       1.00  0.571429  0.727273        7
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.625000
1       5     Macro-Precision     0.625000
2       6        Macro-Recall     0.785714
3       7            Macro-F1     0.563636
4       9     Weighted-Recall     0.625000
5      10         Weighted-F1     0.686364
6       8  Weighted-Precision     0.906250
7       4            Micro-F1     0.625000
8       2    Micro-Precision     0.625000
9       1            Accuracy     0.625000
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1        1        3       0.25  1.000000  0.400000        1
1                1  CLASS_2        0        4       1.00  0.571429  0.727273        7
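The aggregate metrics in the evaluate() log follow directly from this per-class table: macro averages weight both classes equally, while weighted averages weight each class by its support (1 and 7 of the 8 test rows). A worked check of that arithmetic using the values above:

```python
# Per-class precision/recall and support, copied from the table above.
precision = [0.25, 1.00]
recall = [1.000000, 0.571429]
support = [1, 7]

# Macro averages: unweighted mean over classes.
macro_precision = sum(precision) / 2   # 0.625, as logged
macro_recall = sum(recall) / 2         # ~0.785714, as logged

# Weighted averages: mean weighted by class support.
total = sum(support)
weighted_precision = sum(p * s for p, s in zip(precision, support)) / total  # 0.90625
```

Note how the single minority-class row (support 1) dominates the macro figures but barely moves the weighted ones.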
- Get raw data with id mapping.
>>> raw_data = aml.get_raw_data_with_id(adm_test)
>>> raw_data
    masters   gpa     stats programming  admitted
id
38      yes  2.65  Advanced    Beginner         1
11       no  3.13  Advanced    Advanced         1
16       no  3.70  Advanced    Advanced         1
22      yes  3.46    Novice    Beginner         0
35       no  3.68    Novice    Beginner         1
26      yes  3.57  Advanced    Advanced         1
6       yes  3.50  Beginner    Advanced         1
31      yes  3.50  Advanced    Beginner         1
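Because both the raw data and the prediction output carry the id column, predictions can be mapped back to the original, untransformed rows. A pure-Python sketch of that join using plain dicts (real code would join the teradataml DataFrames on id instead; the values here are copied from the outputs above):

```python
# id -> (masters, gpa), a subset of the raw rows shown above.
raw = {38: ("yes", 2.65), 22: ("yes", 3.46), 16: ("no", 3.70)}

# id -> Prediction, from the earlier predict() output.
preds = {38: 0, 22: 0, 16: 1}

# Join on id: each raw row gains its model prediction.
joined = {i: raw[i] + (preds[i],) for i in raw}
```

The id column is what makes this mapping possible even though the model itself was trained on heavily transformed features.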
- Get the transformed data for all feature selection methods.
>>> transformed_data = aml.get_transformed_data(adm_test)
>>> transformed_data
{'lasso_test':
   masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted   gpa
0          0        0   6          1              0              0        1              1        0         1  0.50
1          0        1  38          1              1              0        0              0        0         1 -0.35
2          1        1  16          0              0              0        0              1        0         1  0.70
3          1        1  11          0              0              0        0              1        0         1  0.13
4          1        0  35          0              1              0        0              0        1         1  0.68
5          0        1  26          1              0              0        0              1        0         1  0.57
6          0        0  22          1              1              0        0              0        1         0  0.46
7          0        1  31          1              1              0        0              0        0         1  0.50,
 'rfe_test':
   id  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa
0   6                0            1            0         1   0.50
1  38                1            1            0         1  -0.35
2  16                0            0            1         1   0.70
3  11                0            0            1         1   0.13
4  35                1            0            1         1   0.68
5  26                0            1            0         1   0.57
6  22                1            1            0         0   0.46
7  31                1            1            0         1   0.50,
 'pca_test':
   id     col_0     col_1     col_2     col_3     col_4  admitted
0  11 -0.022041  1.112804 -0.334358  0.011074 -0.126759         1
1  35  0.974680 -0.477419 -0.949009 -0.359877 -0.029963         1
2   6 -0.633256 -0.261056 -0.171215  1.298137  0.126677         1
3  26 -0.999192  0.295391  0.116976  0.103424  0.352431         1
4  38 -0.650410 -0.597987  0.110855 -0.536135 -0.540648         1
5  22  0.060695 -1.283824 -0.394887 -0.233696  0.368872         0
6  31 -0.731760 -0.612164 -0.021522 -0.579707 -0.437160         1
7  16 -0.076593  1.103297 -0.423129 -0.018144 -0.057361         1}
- Get list of failed models during training.
>>> failed_models = aml.get_error_logs()
>>> failed_models
Empty DataFrame
Columns: [MODEL_ID, ERROR_MSG]
Index: []