Example 11: Run AutoClassifier with specified id column alongside lasso feature selection - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Product
Teradata Package for Python
Release Number
20.00
Published
March 2025
Product Category
Teradata Vantage
This example trains AutoClassifier on the admissions dataset with an id column specified during training, then generates predictions on the test data using the id column mapping. Run AutoML to get the best performing model with the following specifications:
  • Set max_models to 6.
  • Include only xgboost and svm for model training.
  • Use verbose level 2 to get detailed logging.
  • Enable lasso feature selection.
  • Raise an error on any issue rather than skipping the step.
  1. Load the admissions data.
    >>> load_example_data("dataframe", "admissions_train")
    >>> admissions = DataFrame('admissions_train')
    >>> admissions_sample = admissions.sample(frac = [0.8, 0.2])
    >>> adm_train = admissions_sample[admissions_sample['sampleid'] == 1].drop('sampleid', axis=1)
    >>> adm_test = admissions_sample[admissions_sample['sampleid'] == 2].drop('sampleid', axis=1)
  2. Create an AutoML instance.
    >>> aml = AutoClassifier(verbose=2,
    ...                      include=['xgboost', 'svm'],
    ...                      max_models=6,
    ...                      enable_lasso=True,
    ...                      raise_errors=True)
  3. Fit the data.
    >>> aml.fit(data=adm_train, target_column="admitted", id_column='id')
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:58:37,709 | INFO     | Feature Exploration started
    2025-11-04 04:58:37,709 | INFO     | Data Overview:
    2025-11-04 04:58:37,730 | INFO     | Total Rows in the data: 32
    2025-11-04 04:58:37,771 | INFO     | Total Columns in the data: 6
    2025-11-04 04:58:38,565 | INFO     | Column Summary:
        ColumnName                         Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
    0        stats  VARCHAR(30) CHARACTER SET LATIN            32          0         0.0        NaN            NaN            NaN             0.0              100.0
    1  programming  VARCHAR(30) CHARACTER SET LATIN            32          0         0.0        NaN            NaN            NaN             0.0              100.0
    2           id                          INTEGER            32          0         NaN        0.0           32.0            0.0             0.0              100.0
    3     admitted                          INTEGER            32          0         NaN       13.0           19.0            0.0             0.0              100.0
    4          gpa                            FLOAT            32          0         NaN        0.0           32.0            0.0             0.0              100.0
    5      masters   VARCHAR(5) CHARACTER SET LATIN            32          0         0.0        NaN            NaN            NaN             0.0              100.0
    2025-11-04 04:58:39,360 | INFO     | Statistics of Data:
      ATTRIBUTE            StatName  StatValue
    0  admitted             MAXIMUM   1.000000
    1  admitted  STANDARD DEVIATION   0.498991
    2  admitted     PERCENTILES(25)   0.000000
    3  admitted     PERCENTILES(50)   1.000000
    4        id               COUNT  32.000000
    5        id             MINIMUM   1.000000
    6       gpa               COUNT  32.000000
    7       gpa             MINIMUM   1.870000
    8       gpa             MAXIMUM   4.000000
    9       gpa                MEAN   3.577500
    2025-11-04 04:58:39,506 | INFO     | Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    masters                   2
    stats                     3
    programming               3
    2025-11-04 04:58:41,775 | INFO     | No Futile columns found.
    2025-11-04 04:58:44,668 | INFO     | Columns with outlier percentage :-
      ColumnName  OutlierPercentage
    0        gpa              9.375
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:58:44,988 | INFO     | Feature Engineering started ...
    2025-11-04 04:58:44,988 | INFO     | Handling duplicate records present in dataset ...
    2025-11-04 04:58:45,124 | INFO     | Analysis completed. No action taken.
    2025-11-04 04:58:45,124 | INFO     | Total time to handle duplicate records: 0.14 sec
    2025-11-04 04:58:45,124 | INFO     | Handling less significant features from data ...
    2025-11-04 04:58:46,999 | INFO     | Analysis indicates all categorical columns are significant. No action Needed.
    2025-11-04 04:58:46,999 | INFO     | Total time to handle less significant features: 1.87 sec
    2025-11-04 04:58:46,999 | INFO     | Handling Date Features ...
    2025-11-04 04:58:46,999 | INFO     | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    2025-11-04 04:58:46,999 | INFO     | Total time to handle date features: 0.00 sec
    2025-11-04 04:58:46,999 | INFO     | Checking Missing values in dataset ...
    2025-11-04 04:58:48,116 | INFO     | Analysis Completed. No Missing Values Detected.
    2025-11-04 04:58:48,116 | INFO     | Total time to find missing values in data: 1.12 sec
    2025-11-04 04:58:48,116 | INFO     | Imputing Missing Values ...
    2025-11-04 04:58:48,116 | INFO     | Analysis completed. No imputation required.
    2025-11-04 04:58:48,117 | INFO     | Time taken to perform imputation: 0.00 sec
    2025-11-04 04:58:48,117 | INFO     | Performing encoding for categorical columns ...
    2025-11-04 04:58:50,561 | INFO     | ONE HOT Encoding these Columns:
    ['masters', 'stats', 'programming']
    2025-11-04 04:58:50,561 | INFO     | Sample of dataset after performing one hot encoding:
        masters_0  masters_1   gpa  stats_0  stats_1  stats_2  programming_0  programming_1  programming_2  admitted
    id
    13          1          0  4.00        1        0        0              0              0              1         1
    7           0          1  2.33        0        0        1              0              0              1         1
    39          0          1  3.75        1        0        0              0              1              0         0
    19          0          1  1.98        1        0        0              1              0              0         0
    15          0          1  4.00        1        0        0              1              0              0         1
    5           1          0  3.44        0        0        1              0              0              1         0
    24          1          0  1.87        1        0        0              0              0              1         1
    3           1          0  3.70        0        0        1              0              1              0         1
    36          1          0  3.00        1        0        0              0              0              1         0
    40          0          1  3.95        0        0        1              0              1              0         0
    32 rows X 11 columns
    2025-11-04 04:58:50,651 | INFO     | Time taken to encode the columns: 2.53 sec
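The one-hot encoding step above expands each categorical column (masters, stats, programming) into one 0/1 indicator column per distinct value. A minimal plain-Python sketch of the idea (illustrative only, not the teradataml internals, which derive the category vocabulary during fit):

```python
def one_hot(values):
    """Map each value of a categorical column to a 0/1 indicator vector.

    Categories are indexed in first-seen order here; a fitted encoder
    would use a vocabulary learned from the training data.
    """
    categories = []
    for v in values:
        if v not in categories:
            categories.append(v)
    # One indicator column per category, one row per input value.
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

cats, enc = one_hot(["yes", "no", "yes"])
print(cats)  # ['yes', 'no']
print(enc)   # [[1, 0], [0, 1], [1, 0]]
```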
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:58:50,651 | INFO     | Data preparation started ...
    2025-11-04 04:58:50,652 | INFO     | Outlier preprocessing ...
    2025-11-04 04:58:53,575 | INFO     | Columns with outlier percentage :-
      ColumnName  OutlierPercentage
    0        gpa              9.375
    2025-11-04 04:58:54,006 | INFO     | median inplace of outliers:
    ['gpa']
    2025-11-04 04:58:56,068 | INFO     | Sample of dataset after performing MEDIAN inplace:
        masters_0  masters_1    gpa  stats_0  stats_1  stats_2  programming_0  programming_1  programming_2  admitted
    id
    13          1          0  4.000        1        0        0              0              0              1         1
    24          1          0  3.755        1        0        0              0              0              1         1
    3           1          0  3.700        0        0        1              0              1              0         1
    19          0          1  3.755        1        0        0              1              0              0         0
    15          0          1  4.000        1        0        0              1              0              0         1
    40          0          1  3.950        0        0        1              0              1              0         0
    7           0          1  3.755        0        0        1              0              0              1         1
    39          0          1  3.750        1        0        0              0              1              0         0
    36          1          0  3.000        1        0        0              0              0              1         0
    5           1          0  3.440        0        0        1              0              0              1         0
    32 rows X 11 columns
    2025-11-04 04:58:56,179 | INFO     | Time Taken by Outlier processing: 5.53 sec
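The outlier step above replaces flagged gpa values with the column median (for example, 2.33 and 1.87 become 3.755). A hedged sketch, assuming the common 1.5×IQR flagging rule; the exact rule AutoML applies may differ:

```python
import statistics

def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    med = statistics.median(values)
    return [med if (v < lo or v > hi) else v for v in values]

# 1.9 falls below Q1 - 1.5*IQR and is replaced by the median (~3.55);
# the in-range values are left untouched.
print(replace_outliers_with_median([3.4, 3.5, 3.5, 3.6, 3.7, 3.7, 3.8, 1.9]))
```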
    2025-11-04 04:58:56,180 | INFO     | Checking imbalance data ...
    2025-11-04 04:58:56,242 | INFO     | Imbalance Not Found.
    2025-11-04 04:58:56,963 | INFO     | Feature selection using lasso ...
    2025-11-04 04:58:57,613 | INFO     | feature selected by lasso:
    ['gpa', 'masters_0', 'stats_0', 'masters_1', 'programming_1', 'programming_2', 'stats_1', 'programming_0', 'stats_2']
    2025-11-04 04:58:57,614 | INFO     | Total time taken by feature selection: 0.65 sec
    2025-11-04 04:58:57,896 | INFO     | Scaling Features of lasso data ...
    2025-11-04 04:58:59,359 | INFO     | columns that will be scaled:
    ['gpa']
    2025-11-04 04:59:01,328 | INFO     | Dataset sample after scaling:
       masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted    gpa
    0          1        0   3          0              1              0        0              0        1         1  0.700
    1          1        0   5          0              0              1        0              0        1         0  0.440
    2          0        0   7          1              0              1        0              0        1         1  0.755
    3          1        0   8          0              0              0        1              1        0         1  0.600
    4          1        1  10          0              0              0        0              1        0         1  0.710
    5          1        0  12          0              0              1        0              0        1         1  0.650
    6          1        1   9          0              0              0        0              1        0         1  0.820
    7          0        0   4          1              0              1        1              0        0         1  0.500
    8          0        0   2          1              1              0        1              0        0         0  0.760
    9          0        0   1          1              1              0        1              0        0         0  0.950
    32 rows X 11 columns
    2025-11-04 04:59:01,916 | INFO     | Total time taken by feature scaling: 4.02 sec
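The scaled gpa values above are consistent with min-max scaling over the training range: after median replacement gpa spans roughly 3.0 to 4.0, so 3.70 maps to 0.70 and 4.00 to 1.00. A minimal sketch, assuming plain min-max scaling (the specific scaler and its fitted range are inferred from the log, not documented here):

```python
def min_max_scale(values, lo=None, hi=None):
    """Scale values to [0, 1] using the (training) min and max."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    return [(v - lo) / (hi - lo) for v in values]

# Using the assumed training range (min 3.0, max 4.0):
print(min_max_scale([3.70, 3.44, 3.755, 4.00, 3.00], lo=3.0, hi=4.0))
# ~ [0.70, 0.44, 0.755, 1.0, 0.0]
```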
    2025-11-04 04:59:01,917 | INFO     | Feature selection using rfe ...
    2025-11-04 04:59:14,194 | INFO     | feature selected by RFE:
    ['masters_0', 'masters_1', 'programming_1', 'gpa']
    2025-11-04 04:59:14,195 | INFO     | Total time taken by feature selection: 12.28 sec
    2025-11-04 04:59:14,494 | INFO     | Scaling Features of rfe data ...
    2025-11-04 04:59:15,445 | INFO     | columns that will be scaled:
    ['r_gpa']
    2025-11-04 04:59:17,293 | INFO     | Dataset sample after scaling:
       id  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa
    0   3                1            0            1         1  0.700
    1   5                0            0            1         0  0.440
    2   7                0            1            0         1  0.755
    3   8                0            0            1         1  0.600
    4  10                0            0            1         1  0.710
    5  12                0            0            1         1  0.650
    6   9                0            0            1         1  0.820
    7   4                0            1            0         1  0.500
    8   2                1            1            0         0  0.760
    9   1                1            1            0         0  0.950
    32 rows X 6 columns
    2025-11-04 04:59:17,760 | INFO     | Total time taken by feature scaling: 3.27 sec
    2025-11-04 04:59:17,760 | INFO     | Scaling Features of pca data ...
    2025-11-04 04:59:18,644 | INFO     | columns that will be scaled:
    ['gpa']
    2025-11-04 04:59:20,557 | INFO     | Dataset sample after scaling:
       masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted    gpa
    0          1        0   3          0              1              0        0              0        1         1  0.700
    1          0        1  34          1              1              0        0              0        0         0  0.850
    2          1        1  13          0              0              1        0              0        0         1  1.000
    3          0        0  40          1              1              0        0              0        1         0  0.950
    4          0        1  39          1              1              0        0              0        0         0  0.750
    5          0        1  19          1              0              0        0              1        0         0  0.755
    6          1        1  36          0              0              1        0              0        0         0  0.000
    7          0        1  15          1              0              0        0              1        0         1  1.000
    8          0        0   7          1              0              1        0              0        1         1  0.755
    9          1        1  17          0              0              0        0              1        0         1  0.830
    32 rows X 11 columns
    2025-11-04 04:59:21,204 | INFO     | Total time taken by feature scaling: 3.44 sec
    2025-11-04 04:59:21,204 | INFO     | Dimension Reduction using pca ...
    2025-11-04 04:59:21,824 | INFO     | PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4']
    2025-11-04 04:59:21,824 | INFO     | Total time taken by PCA: 0.62 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:59:22,220 | INFO     | Model Training started ...
    2025-11-04 04:59:22,305 | INFO     | Hyperparameters used for model training:
    2025-11-04 04:59:22,305 | INFO     | Model: svm
    2025-11-04 04:59:22,306 | INFO     | Hyperparameters: {'response_column': 'admitted', 'name': 'svm', 'model_type': 'Classification', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'tolerance': (0.001, 0.01), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'nesterov': True, 'intercept': True, 'iter_num_no_change': (5, 10, 50), 'local_sgd_iterations ': (10, 20), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
    2025-11-04 04:59:22,306 | INFO     | Total number of models for svm: 5184
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:59:22,306 | INFO     | Model: xgboost
    2025-11-04 04:59:22,307 | INFO     | Hyperparameters: {'response_column': 'admitted', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
    2025-11-04 04:59:22,308 | INFO     | Total number of models for xgboost: 5832
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
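The model counts reported above follow directly from the size of each hyperparameter grid: every tuple-valued parameter contributes one factor, and the total is the product of the tuple lengths. A quick check, with the tuple lengths copied from the logged hyperparameters:

```python
from math import prod

# Number of candidate values per tuple-valued hyperparameter (from the log).
svm_grid = {'lambda1': 3, 'alpha': 2, 'tolerance': 2, 'initial_eta': 2,
            'momentum': 3, 'iter_num_no_change': 3, 'local_sgd_iterations': 2,
            'iter_max': 3, 'batch_size': 4}
xgb_grid = {'column_sampling': 2, 'min_impurity': 3, 'lambda1': 3,
            'shrinkage_factor': 3, 'max_depth': 4, 'min_node_size': 3,
            'iter_num': 3, 'num_boosted_trees': 3}

print(prod(svm_grid.values()))  # 5184, matching "Total number of models for svm"
print(prod(xgb_grid.values()))  # 5832, matching "Total number of models for xgboost"
```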
    2025-11-04 04:59:22,308 | INFO     | Performing hyperparameter tuning ...
    2025-11-04 04:59:24,073 | INFO     | Model training for svm
    2025-11-04 04:59:34,260 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:59:34,261 | INFO     | Model training for xgboost
    2025-11-04 04:59:45,689 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:59:45,692 | INFO     | Leaderboard
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_1               rfe  0.857143         0.857143  ...      0.875000  0.857143            0.892857         0.857143     0.857143
    1     2      SVM_2               pca  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
    2     3      SVM_1               rfe  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
    3     4  XGBOOST_0             lasso  0.714286         0.714286  ...      0.666667  0.650000            0.809524         0.714286     0.671429
    4     5      SVM_0             lasso  0.571429         0.571429  ...      0.583333  0.571429            0.595238         0.571429     0.571429
    5     6  XGBOOST_2               pca  0.571429         0.571429  ...      0.500000  0.363636            0.326531         0.571429     0.415584
    [6 rows x 13 columns]
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 14/14
  4. Display model leaderboard.
    >>> aml.leaderboard()
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_1               rfe  0.857143         0.857143  ...      0.875000  0.857143            0.892857         0.857143     0.857143
    1     2      SVM_2               pca  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
    2     3      SVM_1               rfe  0.714286         0.714286  ...      0.708333  0.708333            0.714286         0.714286     0.714286
    3     4  XGBOOST_0             lasso  0.714286         0.714286  ...      0.666667  0.650000            0.809524         0.714286     0.671429
    4     5      SVM_0             lasso  0.571429         0.571429  ...      0.583333  0.571429            0.595238         0.571429     0.571429
    5     6  XGBOOST_2               pca  0.571429         0.571429  ...      0.500000  0.363636            0.326531         0.571429     0.415584
    [6 rows x 13 columns]
  5. Display best performing model.
    >>> aml.leader()
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_1               rfe  0.857143         0.857143  ...         0.875  0.857143            0.892857         0.857143     0.857143
    [1 rows x 13 columns]
  6. Display model hyperparameters for rank 1.
    >>> aml.model_hyperparameters(rank=1)
    {'response_column': 'admitted', 
      'name': 'xgboost', 
      'model_type': 'Classification', 
      'column_sampling': 0.6, 
      'min_impurity': 0.1, 
      'lambda1': 0.01, 
      'shrinkage_factor': 0.5, 
      'max_depth': 6, 
      'min_node_size': 2, 
      'iter_num': 30, 
      'num_boosted_trees': 5, 
      'seed': 42, 
      'persist': False, 
      'output_prob': True, 
      'output_responses': ['1', '0'], 
      'max_models': 1}
  7. Generate predictions on the test dataset using the best performing model.
    >>> prediction = aml.predict(adm_test)
    2025-11-04 05:02:25,938 | INFO     | Data Transformation started ...
    2025-11-04 05:02:25,939 | INFO     | Performing transformation carried out in feature engineering phase ...
    2025-11-04 05:02:25,940 | INFO     | Updated dataset after performing target column transformation :
       masters   gpa     stats programming  admitted
    id
    38     yes  2.65  Advanced    Beginner         1
    31     yes  3.50  Advanced    Beginner         1
    6      yes  3.50  Beginner    Advanced         1
    11      no  3.13  Advanced    Advanced         1
    16      no  3.70  Advanced    Advanced         1
    26     yes  3.57  Advanced    Advanced         1
    35      no  3.68    Novice    Beginner         1
    22     yes  3.46    Novice    Beginner         0
    8 rows X 6 columns
    2025-11-04 05:02:27,364 | INFO     | Updated dataset after performing categorical encoding :
        masters_0  masters_1   gpa  stats_0  stats_1  stats_2  programming_0  programming_1  programming_2  admitted
    id
    22          0          1  3.46        0        0        1              0              1              0         0
    11          1          0  3.13        1        0        0              1              0              0         1
    16          1          0  3.70        1        0        0              1              0              0         1
    31          0          1  3.50        1        0        0              0              1              0         1
    6           0          1  3.50        0        1        0              1              0              0         1
    35          1          0  3.68        0        0        1              0              1              0         1
    26          0          1  3.57        1        0        0              1              0              0         1
    38          0          1  2.65        1        0        0              0              1              0         1
    8 rows X 11 columns
    2025-11-04 05:02:27,539 | INFO     | Performing transformation carried out in data preparation phase ...
    2025-11-04 05:02:28,344 | INFO     | Updated dataset after performing Lasso feature selection:
               id   gpa  stats_0  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted
    masters_0
    0           6  3.50        0          1              0              0        1              1        0         1
    0          38  2.65        1          1              1              0        0              0        0         1
    1          16  3.70        1          0              0              0        0              1        0         1
    1          11  3.13        1          0              0              0        0              1        0         1
    1          35  3.68        0          0              1              0        0              0        1         1
    0          26  3.57        1          1              0              0        0              1        0         1
    0          22  3.46        0          1              1              0        0              0        1         0
    0          31  3.50        1          1              1              0        0              0        0         1
    8 rows X 11 columns
    2025-11-04 05:02:29,209 | INFO     | Updated dataset after performing scaling on Lasso selected features :
       masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted   gpa
    0          1        0  35          0              1              0        0              0        1         1  0.68
    1          0        0  22          1              1              0        0              0        1         0  0.46
    2          0        0   6          1              0              0        1              1        0         1  0.50
    3          0        1  26          1              0              0        0              1        0         1  0.57
    4          0        1  38          1              1              0        0              0        0         1 -0.35
    5          0        1  31          1              1              0        0              0        0         1  0.50
    6          1        1  11          0              0              0        0              1        0         1  0.13
    7          1        1  16          0              0              0        0              1        0         1  0.70
    8 rows X 11 columns
    2025-11-04 05:02:29,848 | INFO     | Updated dataset after performing RFE feature selection:
               id  masters_1  programming_1   gpa  admitted
    masters_0
    0           6          1              0  3.50         1
    0          38          1              1  2.65         1
    1          16          0              0  3.70         1
    1          11          0              0  3.13         1
    1          35          0              1  3.68         1
    0          26          1              0  3.57         1
    0          22          1              1  3.46         0
    0          31          1              1  3.50         1
    8 rows X 6 columns
    2025-11-04 05:02:31,038 | INFO     | Updated dataset after performing scaling on RFE selected features :
       id  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa
    0  35                1            0            1         1   0.68
    1  22                1            1            0         0   0.46
    2   6                0            1            0         1   0.50
    3  26                0            1            0         1   0.57
    4  38                1            1            0         1  -0.35
    5  31                1            1            0         1   0.50
    6  11                0            0            1         1   0.13
    7  16                0            0            1         1   0.70
    8 rows X 6 columns
    2025-11-04 05:02:32,357 | INFO     | Updated dataset after performing scaling for PCA feature selection :
       masters_0  id  stats_0  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted   gpa
    0          1  35        0          0              1              0        0              0        1         1  0.68
    1          0  22        0          1              1              0        0              0        1         0  0.46
    2          0   6        0          1              0              0        1              1        0         1  0.50
    3          0  26        1          1              0              0        0              1        0         1  0.57
    4          0  38        1          1              1              0        0              0        0         1 -0.35
    5          0  31        1          1              1              0        0              0        0         1  0.50
    6          1  11        1          0              0              0        0              1        0         1  0.13
    7          1  16        1          0              0              0        0              1        0         1  0.70
    8 rows X 11 columns
    2025-11-04 05:02:32,819 | INFO     | Updated dataset after performing PCA feature selection :
       id     col_0     col_1     col_2     col_3     col_4  admitted
    0  16 -0.076593  1.103297 -0.423129 -0.018144 -0.057361         1
    1  31 -0.731760 -0.612164 -0.021522 -0.579707 -0.437160         1
    2  11 -0.022041  1.112804 -0.334358  0.011074 -0.126759         1
    3  22  0.060695 -1.283824 -0.394887 -0.233696  0.368872         0
    4  35  0.974680 -0.477419 -0.949009 -0.359877 -0.029963         1
    5   6 -0.633256 -0.261056 -0.171215  1.298137  0.126677         1
    6  26 -0.999192  0.295391  0.116976  0.103424  0.352431         1
    7  38 -0.650410 -0.597987  0.110855 -0.536135 -0.540648         1
    8 rows X 7 columns
    2025-11-04 05:02:33,168 | INFO     | Data Transformation completed.
    2025-11-04 05:02:33,714 | INFO     | Following model is being picked for evaluation:
    2025-11-04 05:02:33,715 | INFO     | Model ID : XGBOOST_1
    2025-11-04 05:02:33,715 | INFO     | Feature Selection Method : rfe
    2025-11-04 05:02:34,406 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 05:02:36,451 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    2025-11-04 05:02:36,583 | INFO     | Prediction :
       id  Prediction  admitted    prob_0    prob_1
    0  35           1         1  0.123971  0.876029
    1  22           0         0  0.553171  0.446829
    2   6           0         1  0.553171  0.446829
    3  26           0         1  0.553171  0.446829
    4  38           0         1  0.553171  0.446829
    5  31           0         1  0.553171  0.446829
    6  11           1         1  0.123971  0.876029
    7  16           1         1  0.123971  0.876029
    2025-11-04 05:02:38,354 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.714286  0.428571
       threshold_value  tpr  fpr
    0         0.040816  1.0  1.0
    1         0.081633  1.0  1.0
    2         0.102041  1.0  1.0
    3         0.122449  1.0  1.0
    4         0.163265  1.0  1.0
    5         0.183673  1.0  1.0
    6         0.142857  1.0  1.0
    7         0.061224  1.0  1.0
    8         0.020408  1.0  1.0
    9         0.000000  1.0  1.0
    2025-11-04 05:02:38,859 | INFO     | Confusion Matrix :
    [[1 0]
     [4 3]]
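The GINI value printed with the ROC-AUC output follows the standard relation Gini = 2·AUC − 1. The reported AUC of 0.714286 is consistent with 5/7, whose Gini of 3/7 rounds to the logged 0.428571:

```python
def gini_from_auc(auc):
    """Gini coefficient from ROC AUC via Gini = 2*AUC - 1."""
    return 2 * auc - 1

# 2*(5/7) - 1 = 3/7, which rounds to the logged GINI value.
print(round(gini_from_auc(5 / 7), 6))  # 0.428571
```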
    >>> prediction.head()
       id  Prediction  admitted    prob_0    prob_1
    0  35           1         1  0.123971  0.876029
    1  22           0         0  0.553171  0.446829
    2   6           0         1  0.553171  0.446829
    3  26           0         1  0.553171  0.446829
    4  38           0         1  0.553171  0.446829
    5  31           0         1  0.553171  0.446829
    6  11           1         1  0.123971  0.876029
    7  16           1         1  0.123971  0.876029
  8. Generate predictions on the test dataset using the best performing model, with preserve_columns set to True.
    >>> prediction = aml.predict(adm_test, preserve_columns=True)
    2025-11-04 05:22:49,288 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 05:22:49,833 | INFO     | Following model is being picked for evaluation:
    2025-11-04 05:22:49,834 | INFO     | Model ID : XGBOOST_1
    2025-11-04 05:22:49,834 | INFO     | Feature Selection Method : rfe
    2025-11-04 05:22:50,443 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 05:22:52,404 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    2025-11-04 05:22:52,535 | INFO     | Prediction :
       id  Prediction  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa    prob_0    prob_1
    0  35           1                1            0            1         1   0.68  0.123971  0.876029
    1  22           0                1            1            0         0   0.46  0.553171  0.446829
    2   6           0                0            1            0         1   0.50  0.553171  0.446829
    3  26           0                0            1            0         1   0.57  0.553171  0.446829
    4  38           0                1            1            0         1  -0.35  0.553171  0.446829
    5  31           0                1            1            0         1   0.50  0.553171  0.446829
    6  11           1                0            0            1         1   0.13  0.123971  0.876029
    7  16           1                0            0            1         1   0.70  0.123971  0.876029
    2025-11-04 05:22:54,753 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.714286  0.428571
       threshold_value  tpr  fpr
    0         0.040816  1.0  1.0
    1         0.081633  1.0  1.0
    2         0.102041  1.0  1.0
    3         0.122449  1.0  1.0
    4         0.163265  1.0  1.0
    5         0.183673  1.0  1.0
    6         0.142857  1.0  1.0
    7         0.061224  1.0  1.0
    8         0.020408  1.0  1.0
    9         0.000000  1.0  1.0
    2025-11-04 05:22:55,358 | INFO     | Confusion Matrix :
    [[1 0]
     [4 3]]
    >>> prediction
       id  Prediction  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa    prob_0    prob_1
    0   6           0                0            1            0         1   0.50  0.553171  0.446829
    1  38           0                1            1            0         1  -0.35  0.553171  0.446829
    2  16           1                0            0            1         1   0.70  0.123971  0.876029
    3  11           1                0            0            1         1   0.13  0.123971  0.876029
    4  35           1                1            0            1         1   0.68  0.123971  0.876029
    5  26           0                0            1            0         1   0.57  0.553171  0.446829
    6  22           0                1            1            0         0   0.46  0.553171  0.446829
    7  31           0                1            1            0         1   0.50  0.553171  0.446829
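The logged confusion matrix can be verified directly from the prediction table above. A minimal plain-pandas sketch (illustrative only, not teradataml) that rebuilds the `id`, `Prediction`, and `admitted` columns shown and recomputes the matrix and sample accuracy:

```python
import pandas as pd

# Prediction output from above, reduced to the columns needed for the check.
pred = pd.DataFrame({
    "id":         [35, 22, 6, 26, 38, 31, 11, 16],
    "Prediction": [1, 0, 0, 0, 0, 0, 1, 1],
    "admitted":   [1, 0, 1, 1, 1, 1, 1, 1],
})

# Confusion matrix: rows = actual class, columns = predicted class.
cm = pd.crosstab(pred["admitted"], pred["Prediction"])
print(cm.values.tolist())   # [[1, 0], [4, 3]] -- matches the logged matrix

# Accuracy on this 8-row sample.
accuracy = (pred["Prediction"] == pred["admitted"]).mean()
print(accuracy)             # 0.5
```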
  9. Generate evaluation metrics on the test dataset using the second best performing model.
    >>> performance_metrics = aml.evaluate(adm_test, 2)
    2025-11-04 05:04:48,154 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 05:04:48,905 | INFO     | Following model is being picked for evaluation:
    2025-11-04 05:04:48,906 | INFO     | Model ID : SVM_2
    2025-11-04 05:04:48,906 | INFO     | Feature Selection Method : pca
    2025-11-04 05:04:52,600 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1        1        3       0.25  1.000000  0.400000        1
    1               1  CLASS_2        0        4       1.00  0.571429  0.727273        7
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall     0.625000
    1       5     Macro-Precision     0.625000
    2       6        Macro-Recall     0.785714
    3       7            Macro-F1     0.563636
    4       9     Weighted-Recall     0.625000
    5      10         Weighted-F1     0.686364
    6       8  Weighted-Precision     0.906250
    7       4            Micro-F1     0.625000
    8       2     Micro-Precision     0.625000
    9       1            Accuracy     0.625000
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1        1        3       0.25  1.000000  0.400000        1
    1               1  CLASS_2        0        4       1.00  0.571429  0.727273        7
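The summary metrics logged above follow from the per-class counts in the table, read as predicted class (rows) by actual class (columns). A plain-Python arithmetic sketch (illustrative, not teradataml) reproducing them:

```python
# Per-class counts from the metrics table: predicted x actual.
cm = [[1, 3],   # predicted CLASS_1: 1 actual CLASS_1, 3 actual CLASS_2
      [0, 4]]   # predicted CLASS_2: 0 actual CLASS_1, 4 actual CLASS_2

tp = [cm[0][0], cm[1][1]]                             # true positives per class
predicted = [sum(cm[0]), sum(cm[1])]                  # row sums: predicted totals
support = [cm[0][0] + cm[1][0], cm[0][1] + cm[1][1]]  # column sums: actual totals

precision = [tp[i] / predicted[i] for i in range(2)]  # [0.25, 1.0]
recall = [tp[i] / support[i] for i in range(2)]       # [1.0, 0.571428...]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

accuracy = sum(tp) / sum(support)                     # 0.625 (= all micro metrics)
macro_f1 = sum(f1) / 2                                # 0.563636...
weighted_precision = sum(s * p for s, p in zip(support, precision)) / sum(support)  # 0.90625
weighted_f1 = sum(s * v for s, v in zip(support, f1)) / sum(support)                # 0.686364...
```

For binary classification, micro-precision, micro-recall, micro-F1, and accuracy coincide, which is why four rows in the table all show 0.625.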
  10. Get the raw data with id column mapping.
    >>> raw_data = aml.get_raw_data_with_id(adm_test)
    >>> raw_data
       masters   gpa     stats programming  admitted
    id
    38     yes  2.65  Advanced    Beginner         1
    11      no  3.13  Advanced    Advanced         1
    16      no  3.70  Advanced    Advanced         1
    22     yes  3.46    Novice    Beginner         0
    35      no  3.68    Novice    Beginner         1
    26     yes  3.57  Advanced    Advanced         1
    6      yes  3.50  Beginner    Advanced         1
    31     yes  3.50  Advanced    Beginner         1
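Because the raw rows are keyed by the same id column as the predictions, the two can be joined for side-by-side review. A plain-pandas sketch of that join, using a subset of the values shown above (illustrative only; the real objects are teradataml DataFrames):

```python
import pandas as pd

# Raw test rows keyed by id (subset of the output above).
raw = pd.DataFrame({
    "id":      [38, 11, 16, 22, 35, 26, 6, 31],
    "masters": ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"],
    "gpa":     [2.65, 3.13, 3.70, 3.46, 3.68, 3.57, 3.50, 3.50],
})

# Predictions keyed by the same id column.
pred = pd.DataFrame({
    "id":         [35, 22, 6, 26, 38, 31, 11, 16],
    "Prediction": [1, 0, 0, 0, 0, 0, 1, 1],
})

# Inner join on id aligns each raw row with its prediction.
review = raw.merge(pred, on="id")
print(review.loc[review["id"] == 38, "Prediction"].item())  # 0
```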
  11. Get the transformed data for all feature selection methods.
    >>> transformed_data = aml.get_transformed_data(adm_test)
    >>> transformed_data
    {'lasso_test':    masters_0  stats_0  id  masters_1  programming_1  programming_2  stats_1  programming_0  stats_2  admitted   gpa
    0          0        0   6          1              0              0        1              1        0         1  0.50
    1          0        1  38          1              1              0        0              0        0         1 -0.35
    2          1        1  16          0              0              0        0              1        0         1  0.70
    3          1        1  11          0              0              0        0              1        0         1  0.13
    4          1        0  35          0              1              0        0              0        1         1  0.68
    5          0        1  26          1              0              0        0              1        0         1  0.57
    6          0        0  22          1              1              0        0              0        1         0  0.46
    7          0        1  31          1              1              0        0              0        0         1  0.50, 'rfe_test':    id  r_programming_1  r_masters_1  r_masters_0  admitted  r_gpa
    0   6                0            1            0         1   0.50
    1  38                1            1            0         1  -0.35
    2  16                0            0            1         1   0.70
    3  11                0            0            1         1   0.13
    4  35                1            0            1         1   0.68
    5  26                0            1            0         1   0.57
    6  22                1            1            0         0   0.46
    7  31                1            1            0         1   0.50, 'pca_test':    id     col_0     col_1     col_2     col_3     col_4  admitted
    0  11 -0.022041  1.112804 -0.334358  0.011074 -0.126759         1
    1  35  0.974680 -0.477419 -0.949009 -0.359877 -0.029963         1
    2   6 -0.633256 -0.261056 -0.171215  1.298137  0.126677         1
    3  26 -0.999192  0.295391  0.116976  0.103424  0.352431         1
    4  38 -0.650410 -0.597987  0.110855 -0.536135 -0.540648         1
    5  22  0.060695 -1.283824 -0.394887 -0.233696  0.368872         0
    6  31 -0.731760 -0.612164 -0.021522 -0.579707 -0.437160         1
    7  16 -0.076593  1.103297 -0.423129 -0.018144 -0.057361         1}
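The returned value is a dictionary keyed by feature selection method ('lasso_test', 'rfe_test', 'pca_test' above), each mapping to a transformed view of the test data. A small plain-dict sketch (illustrative stand-in data, not teradataml) of how to inspect each view:

```python
import pandas as pd

# Stand-in for aml.get_transformed_data(adm_test): one DataFrame per method.
transformed_data = {
    "lasso_test": pd.DataFrame({"id": [6, 38], "gpa":   [0.50, -0.35]}),
    "rfe_test":   pd.DataFrame({"id": [6, 38], "r_gpa": [0.50, -0.35]}),
    "pca_test":   pd.DataFrame({"id": [11, 35], "col_0": [-0.022041, 0.974680]}),
}

# Iterate the views; every one carries the id column for row mapping.
for method, df in transformed_data.items():
    print(method, df.shape, "id" in df.columns)
```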
  12. Get the list of models that failed during training.
    >>> failed_models=aml.get_error_logs()
    >>> failed_models
    Empty DataFrame
    Columns: [MODEL_ID, ERROR_MSG]
    Index: []