Example 10: Run AutoML for clustering with early stopping timer

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2025-12-05
Product Category: Teradata Vantage
This example segments mall customers into clusters based on their age, gender, annual income, and spending score. Run AutoML to get the best performing model with the following specifications:
  • Set task_type to "Clustering".
  • Set the early stopping criterion, a time limit of 300 seconds, using max_runtime_secs.
  • Set verbose to 2 to get detailed logging.
  1. Load the dataset.
    >>> from teradataml import AutoML, DataFrame, load_example_data
    >>> load_example_data('teradataml','Mall_Customers')
    >>> cluster_df = DataFrame("Mall_Customers")
    >>> cluster_df_sample = cluster_df.sample(frac = [0.8, 0.2])
    >>> cluster_train = cluster_df_sample[cluster_df_sample['sampleid'] == 1].drop('sampleid', axis=1)
    >>> cluster_test = cluster_df_sample[cluster_df_sample['sampleid'] == 2].drop('sampleid', axis=1)
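As a rough, database-free illustration of what the split above produces, the following sketch tags rows with a sampleid of 1 or 2 in an 80/20 proportion. The helper name split_by_fraction is hypothetical and not part of teradataml, and the 200-row input is an assumption taken from the 160 training rows and 40 test rows reported in the logs of this example. DataFrame.sample runs inside Vantage and may select rows differently.

```python
import random

def split_by_fraction(rows, fractions=(0.8, 0.2), seed=42):
    """Hypothetical helper: shuffle rows and tag each with a 1-based sampleid."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged = []
    start = 0
    for sample_id, fraction in enumerate(fractions, start=1):
        size = round(len(shuffled) * fraction)
        for row in shuffled[start:start + size]:
            tagged.append((row, sample_id))
        start += size
    return tagged

rows = list(range(200))  # stand-in for 200 customer rows (assumed size)
tagged = split_by_fraction(rows)

# Mirror the filters used above: sampleid 1 is training data, sampleid 2 is test data.
train = [row for row, sample_id in tagged if sample_id == 1]
test = [row for row, sample_id in tagged if sample_id == 2]
print(len(train), len(test))
```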
  2. Create an AutoML instance.
    >>> cl = AutoML(verbose=2,
    ...             task_type="Clustering",
    ...             max_runtime_secs=300,
    ...             seed=42)
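The idea behind a time-based early stopping criterion can be sketched in a few lines: candidates are trained one after another, and the search stops once the time budget is used up, keeping whatever has been trained so far. This is a generic illustration, not a description of how AutoML enforces max_runtime_secs internally. The names run_with_time_budget, FakeClock, and fake_train are hypothetical, and the fake clock only exists to make the example deterministic.

```python
import time

def run_with_time_budget(candidates, train_one, max_runtime_secs, clock=time.monotonic):
    """Train candidates in order until the time budget is exhausted."""
    deadline = clock() + max_runtime_secs
    results = []
    for candidate in candidates:
        if clock() >= deadline:
            break  # budget used up: stop early and keep what was trained
        results.append(train_one(candidate))
    return results

class FakeClock:
    """Deterministic stand-in for time.monotonic (illustration only)."""
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

clock = FakeClock()

def fake_train(n_clusters):
    clock.now += 100.0  # pretend every model takes 100 seconds to train
    return n_clusters

# Nine candidates, but a 300-second budget only leaves room for three of them.
trained = run_with_time_budget(range(2, 11), fake_train, 300, clock)
print(trained)
```

A check before each candidate, as in this sketch, can overshoot the limit by the duration of the last training run.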
  3. Fit the data.
    >>> cl.fit(cluster_train)
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 05:49:57,298 | INFO     | Feature Exploration started
    2025-11-04 05:49:57,299 | INFO     | Data Overview:
    2025-11-04 05:49:57,403 | INFO     | Total Rows in the data: 160
    2025-11-04 05:49:57,425 | INFO     | Total Columns in the data: 4
    2025-11-04 05:49:58,883 | INFO     | Column Summary:
           ColumnName                         Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
    0  Spending_Score                            FLOAT           160          0         NaN        0.0          160.0            0.0             0.0              100.0
    1          Gender  VARCHAR(40) CHARACTER SET LATIN           160          0         0.0        NaN            NaN            NaN             0.0              100.0
    2   Annual_Income                            FLOAT           160          0         NaN        0.0          160.0            0.0             0.0              100.0
    3             Age                          INTEGER           160          0         NaN        0.0          160.0            0.0             0.0              100.0
    2025-11-04 05:49:59,676 | INFO     | Statistics of Data:
           ATTRIBUTE            StatName   StatValue
    0            Age             MAXIMUM   70.000000
    1            Age  STANDARD DEVIATION   14.279857
    2            Age     PERCENTILES(25)   28.750000
    3  Annual_Income               COUNT  160.000000
    4  Annual_Income             MAXIMUM  137.000000
    5  Annual_Income                MEAN   60.468750
    6  Annual_Income  STANDARD DEVIATION   25.635059
    7  Annual_Income     PERCENTILES(25)   42.750000
    8  Annual_Income             MINIMUM   15.000000
    9            Age                MEAN   39.187500
    2025-11-04 05:49:59,825 | INFO     | Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    Gender                    2
    2025-11-04 05:50:01,521 | INFO     | No Futile columns found.
    2025-11-04 05:50:04,517 | INFO     | Columns with outlier percentage :-
          ColumnName  OutlierPercentage
    0  Annual_Income              0.625
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 05:50:04,725 | INFO     | Feature Engineering started ...
    2025-11-04 05:50:04,725 | INFO     | Handling duplicate records present in dataset ...
    2025-11-04 05:50:04,845 | INFO     | Analysis completed. No action taken.
    2025-11-04 05:50:04,846 | INFO     | Total time to handle duplicate records: 0.12 sec
    2025-11-04 05:50:04,846 | INFO     | Handling less significant features from data ...
    2025-11-04 05:50:07,666 | INFO     | Analysis indicates all categorical columns are significant. No action Needed.
    2025-11-04 05:50:07,666 | INFO     | Total time to handle less significant features: 2.82 sec
    2025-11-04 05:50:07,667 | INFO     | Handling Date Features ...
    2025-11-04 05:50:07,667 | INFO     | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    2025-11-04 05:50:07,667 | INFO     | Total time to handle date features: 0.00 sec
    2025-11-04 05:50:07,668 | INFO     | Checking Missing values in dataset ...
    2025-11-04 05:50:08,872 | INFO     | Analysis Completed. No Missing Values Detected.
    2025-11-04 05:50:08,873 | INFO     | Total time to find missing values in data: 1.20 sec
    2025-11-04 05:50:08,873 | INFO     | Imputing Missing Values ...
    2025-11-04 05:50:08,873 | INFO     | Analysis completed. No imputation required.
    2025-11-04 05:50:08,873 | INFO     | Time taken to perform imputation: 0.00 sec
    2025-11-04 05:50:08,873 | INFO     | Performing encoding for categorical columns ...
    2025-11-04 05:50:11,407 | INFO     | ONE HOT Encoding these Columns:
    ['Gender']
    2025-11-04 05:50:11,408 | INFO     | Sample of dataset after performing one hot encoding:
              Gender_1  Age  Annual_Income  Spending_Score  automl_id
    Gender_0
    0                1   20           21.0            66.0         13
    0                1   39           71.0            75.0         19
    0                1   67           19.0            14.0         21
    0                1   18           59.0            41.0         23
    0                1   28           77.0            97.0         27
    0                1   37           20.0            13.0         29
    0                1   21           15.0            81.0         25
    0                1   22           20.0            79.0         15
    0                1   18           48.0            59.0         11
    0                1   33          113.0             8.0          7
    160 rows X 6 columns
    2025-11-04 05:50:11,499 | INFO     | Time taken to encode the columns: 2.63 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 05:50:11,500 | INFO     | Data preparation started ...
    2025-11-04 05:50:11,500 | INFO     | Outlier preprocessing ...
    2025-11-04 05:50:14,378 | INFO     | Columns with outlier percentage :-
          ColumnName  OutlierPercentage
    0  Annual_Income              0.625
    2025-11-04 05:50:14,732 | INFO     | median inplace of outliers:
    ['Annual_Income']
    2025-11-04 05:50:16,731 | INFO     | Sample of dataset after performing MEDIAN inplace:
              Gender_1  Age  Annual_Income  Spending_Score  automl_id
    Gender_0
    0                1   67           19.0            14.0         21
    0                1   28           77.0            97.0         27
    0                1   37           20.0            13.0         29
    0                1   58           88.0            15.0         31
    0                1   27           88.0            69.0         37
    0                1   59           71.0            11.0         39
    0                1   36           87.0            10.0         35
    0                1   18           59.0            41.0         23
    0                1   39           71.0            75.0         19
    0                1   20           21.0            66.0         13
    160 rows X 6 columns
    2025-11-04 05:50:16,847 | INFO     | Time Taken by Outlier processing: 5.35 sec
    2025-11-04 05:50:17,587 | INFO     | Scaling Features of non_pca data ...
    2025-11-04 05:50:18,042 | INFO     | columns that will be scaled:
    ['Age', 'Annual_Income', 'Spending_Score']
    2025-11-04 05:50:19,988 | INFO     | Dataset sample after scaling:
       Gender_0  automl_id  Gender_1       Age  Annual_Income  Spending_Score
    0         1         20         0  1.883540       0.121104       -0.034080
    1         1         26         0 -0.504912      -0.483156       -0.150098
    2         1         28         0 -1.066900      -1.611108        1.010079
    3         0         13         1 -1.347895      -1.570824        0.584681
    4         0         21         1  1.953789      -1.651392       -1.426293
    5         0         23         1 -1.488392      -0.040032       -0.382133
    6         0         27         1 -0.785906       0.685080        1.783531
    7         0         29         1 -0.153669      -1.611108       -1.464966
    8         0         19         1 -0.013172       0.443376        0.932734
    9         1         22         0 -1.277646       0.080820       -0.343461
    160 rows X 6 columns
    2025-11-04 05:50:20,493 | INFO     | Total time taken by feature scaling: 2.91 sec
    2025-11-04 05:50:20,494 | INFO     | Scaling Features of pca data ...
    2025-11-04 05:50:20,982 | INFO     | columns that will be scaled:
    ['Age', 'Annual_Income', 'Spending_Score']
    2025-11-04 05:50:22,924 | INFO     | Dataset sample after scaling:
       Gender_0  automl_id  Gender_1       Age  Annual_Income  Spending_Score
    0         0         21         1  1.953789      -1.651392       -1.426293
    1         0         27         1 -0.785906       0.685080        1.783531
    2         0         29         1 -0.153669      -1.611108       -1.464966
    3         1         12         0  1.040557      -1.288836       -1.426293
    4         1         20         0  1.883540       0.121104       -0.034080
    5         1         22         0 -1.277646       0.080820       -0.343461
    6         1         26         0 -0.504912      -0.483156       -0.150098
    7         1         28         0 -1.066900      -1.611108        1.010079
    8         1         18         0 -0.645409       1.128204        1.358133
    9         0         23         1 -1.488392      -0.040032       -0.382133
    160 rows X 6 columns
    2025-11-04 05:50:23,408 | INFO     | Total time taken by feature scaling: 2.91 sec
    2025-11-04 05:50:23,410 | INFO     | Dimension Reduction using pca ...
    2025-11-04 05:50:24,026 | INFO     | PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3']
    2025-11-04 05:50:24,027 | INFO     | Total time taken by PCA: 0.62 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 05:50:24,447 | INFO     | Model Training started ...
    2025-11-04 05:50:24,447 | INFO     | Hyperparameters used for model training:
    2025-11-04 05:50:24,447 | INFO     | Model: kmeans
    2025-11-04 05:50:24,447 | INFO     | Hyperparameter Grid: {'n_clusters': (2, 3, 4, 5, 6, 7, 8, 9, 10), 'init': ('k-means++', 'random'), 'n_init': (5, 10), 'max_iter': (100, 200), 'tol': (0.001, 0.01), 'algorithm': ('lloyd', 'elkan')}
    2025-11-04 05:50:24,447 | INFO     | Total number of models for kmeans: 288
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 05:50:24,448 | INFO     | Model: gaussianmixture
    2025-11-04 05:50:24,448 | INFO     | Hyperparameter Grid: {'n_components': (2, 3, 4, 5, 6, 7, 8, 9, 10), 'covariance_type': ('full', 'tied', 'diag', 'spherical'), 'max_iter': (100, 300)}
    2025-11-04 05:50:24,448 | INFO     | Total number of models for gaussianmixture: 72
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 05:50:24,448 | INFO     | Performing hyperparameter tuning ...
    2025-11-04 05:50:25,416 | INFO     | Model training for kmeans
    2025-11-04 05:53:05,913 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 05:53:05,914 | INFO     | Model training for gaussianmixture
    2025-11-04 05:55:46,170 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 05:55:46,173 | INFO     | Leaderboard
        RANK            MODEL_ID FEATURE_SELECTION  SILHOUETTE    CALINSKI    DAVIES
    0      1            KMEANS_3           non_pca    0.606146  427.546072  0.520285
    1      2            KMEANS_1           non_pca    0.606146  427.546072  0.520285
    2      3            KMEANS_9           non_pca    0.606146  427.546072  0.520285
    3      4           KMEANS_13           non_pca    0.606146  427.546072  0.520285
    4      5           KMEANS_15           non_pca    0.606146  427.546072  0.520285
    5      6            KMEANS_7           non_pca    0.606146  427.546072  0.520285
    6      7            KMEANS_0               pca    0.606146  427.546072  0.520285
    7      8            KMEANS_4               pca    0.606146  427.546072  0.520285
    8      9            KMEANS_6               pca    0.606146  427.546072  0.520285
    9     10           KMEANS_10               pca    0.606146  427.546072  0.520285
    10    11           KMEANS_14               pca    0.606146  427.546072  0.520285
    11    12           KMEANS_16               pca    0.606146  427.546072  0.520285
    12    13           KMEANS_11           non_pca    0.606050  427.522226  0.520512
    13    14            KMEANS_2               pca    0.606050  427.522226  0.520512
    14    15            KMEANS_8               pca    0.606050  427.522226  0.520512
    15    16           KMEANS_12               pca    0.606050  427.522226  0.520512
    16    17  GAUSSIANMIXTURE_16               pca    0.602497  420.199827  0.520709
    17    18   GAUSSIANMIXTURE_8               pca    0.574867  384.583206  0.542858
    18    19  GAUSSIANMIXTURE_10               pca    0.574867  384.583206  0.542858
    19    20  GAUSSIANMIXTURE_13           non_pca    0.574867  384.583206  0.542858
    20    21  GAUSSIANMIXTURE_12               pca    0.570246  378.581039  0.546390
    21    22  GAUSSIANMIXTURE_14               pca    0.570246  378.581039  0.546390
    22    23  GAUSSIANMIXTURE_15           non_pca    0.570246  378.581039  0.546390
    23    24   GAUSSIANMIXTURE_0               pca    0.561870  357.772553  0.558782
    24    25   GAUSSIANMIXTURE_1           non_pca    0.561870  357.772553  0.558782
    25    26   GAUSSIANMIXTURE_3           non_pca    0.550086  339.655383  0.570790
    26    27   GAUSSIANMIXTURE_5           non_pca    0.550086  339.655383  0.570790
    27    28   GAUSSIANMIXTURE_2               pca    0.550086  339.655383  0.570790
    28    29   GAUSSIANMIXTURE_4               pca    0.550086  339.655383  0.570790
    29    30   GAUSSIANMIXTURE_6               pca    0.550086  339.655383  0.570790
    30    31   GAUSSIANMIXTURE_7           non_pca    0.550086  339.655383  0.570790
    31    32  GAUSSIANMIXTURE_17           non_pca    0.012148    6.313395  4.208284
    32 rows X 6 columns
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 13/13
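The model counts reported in the training log follow from the size of each hyperparameter grid: the number of combinations is the product of the number of values per hyperparameter. A minimal sketch of that arithmetic, with the grids copied from the log above (the helper name grid_size is illustrative):

```python
from math import prod

# Hyperparameter grids as printed in the training log.
kmeans_grid = {
    "n_clusters": (2, 3, 4, 5, 6, 7, 8, 9, 10),
    "init": ("k-means++", "random"),
    "n_init": (5, 10),
    "max_iter": (100, 200),
    "tol": (0.001, 0.01),
    "algorithm": ("lloyd", "elkan"),
}
gaussianmixture_grid = {
    "n_components": (2, 3, 4, 5, 6, 7, 8, 9, 10),
    "covariance_type": ("full", "tied", "diag", "spherical"),
    "max_iter": (100, 300),
}

def grid_size(grid):
    """Number of distinct combinations in a full grid."""
    return prod(len(values) for values in grid.values())

kmeans_total = grid_size(kmeans_grid)
gaussianmixture_total = grid_size(gaussianmixture_grid)
print(kmeans_total, gaussianmixture_total)
```

Together the two grids describe 360 candidate configurations, while the leaderboard lists 32 models. That is consistent with the 300-second limit ending the search before the full grid was explored, although the log does not state how candidates are chosen within the time budget.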
  4. Display leaderboard.
    >>> cl.leaderboard()
        RANK            MODEL_ID FEATURE_SELECTION  SILHOUETTE    CALINSKI    DAVIES
    0      1            KMEANS_3           non_pca    0.606146  427.546072  0.520285
    1      2            KMEANS_1           non_pca    0.606146  427.546072  0.520285
    2      3            KMEANS_9           non_pca    0.606146  427.546072  0.520285
    3      4           KMEANS_13           non_pca    0.606146  427.546072  0.520285
    4      5           KMEANS_15           non_pca    0.606146  427.546072  0.520285
    5      6            KMEANS_7           non_pca    0.606146  427.546072  0.520285
    6      7            KMEANS_0               pca    0.606146  427.546072  0.520285
    7      8            KMEANS_4               pca    0.606146  427.546072  0.520285
    8      9            KMEANS_6               pca    0.606146  427.546072  0.520285
    9     10           KMEANS_10               pca    0.606146  427.546072  0.520285
    10    11           KMEANS_14               pca    0.606146  427.546072  0.520285
    11    12           KMEANS_16               pca    0.606146  427.546072  0.520285
    12    13           KMEANS_11           non_pca    0.606050  427.522226  0.520512
    13    14            KMEANS_2               pca    0.606050  427.522226  0.520512
    14    15            KMEANS_8               pca    0.606050  427.522226  0.520512
    15    16           KMEANS_12               pca    0.606050  427.522226  0.520512
    16    17  GAUSSIANMIXTURE_16               pca    0.602497  420.199827  0.520709
    17    18   GAUSSIANMIXTURE_8               pca    0.574867  384.583206  0.542858
    18    19  GAUSSIANMIXTURE_10               pca    0.574867  384.583206  0.542858
    19    20  GAUSSIANMIXTURE_13           non_pca    0.574867  384.583206  0.542858
    20    21  GAUSSIANMIXTURE_12               pca    0.570246  378.581039  0.546390
    21    22  GAUSSIANMIXTURE_14               pca    0.570246  378.581039  0.546390
    22    23  GAUSSIANMIXTURE_15           non_pca    0.570246  378.581039  0.546390
    23    24   GAUSSIANMIXTURE_0               pca    0.561870  357.772553  0.558782
    24    25   GAUSSIANMIXTURE_1           non_pca    0.561870  357.772553  0.558782
    25    26   GAUSSIANMIXTURE_3           non_pca    0.550086  339.655383  0.570790
    26    27   GAUSSIANMIXTURE_5           non_pca    0.550086  339.655383  0.570790
    27    28   GAUSSIANMIXTURE_2               pca    0.550086  339.655383  0.570790
    28    29   GAUSSIANMIXTURE_4               pca    0.550086  339.655383  0.570790
    29    30   GAUSSIANMIXTURE_6               pca    0.550086  339.655383  0.570790
    30    31   GAUSSIANMIXTURE_7           non_pca    0.550086  339.655383  0.570790
    31    32  GAUSSIANMIXTURE_17           non_pca    0.012148    6.313395  4.208284
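The SILHOUETTE, CALINSKI, and DAVIES columns appear to correspond to the silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin index. For the first two, higher values indicate better separated clusters; for Davies-Bouldin, lower is better. As a reading aid, here is a small standard-library sketch of the silhouette score on toy points. It is not the code AutoML uses, and the function name and sample data are made up for illustration.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # groups nearby points
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # groups distant points
print(round(good, 3), round(bad, 3))
```

A score close to 1 means points sit much closer to their own cluster than to the next one, and a negative score means the assignment is worse than the alternative.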
  5. Display best performing model.
    >>> cl.leader()
       RANK  MODEL_ID FEATURE_SELECTION  SILHOUETTE    CALINSKI    DAVIES
    0     1  KMEANS_3           non_pca    0.606146  427.546072  0.520285
  6. Display model hyperparameters for rank 1.
    >>> cl.model_hyperparameters(rank=1)
    {'n_clusters': 2, 
      'init': 'k-means++', 
      'n_init': 5, 
      'max_iter': 100, 
      'tol': 0.001, 
      'algorithm': 'lloyd'}
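To make the roles of n_clusters, max_iter, and tol concrete, the following is a minimal sketch of Lloyd-style k-means in plain Python. It is a teaching aid under simplifying assumptions, not the implementation AutoML runs: the initial centers are passed in explicitly instead of using k-means++, n_init restarts are omitted, and tol is compared against the largest center movement. The function name and toy data are hypothetical.

```python
from math import dist

def kmeans(points, initial_centers, max_iter=100, tol=0.001):
    """Lloyd iterations: assign points to the nearest center, then recompute centers."""
    centers = list(initial_centers)  # n_clusters equals len(initial_centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        shift = max(dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break  # centers barely moved: treat as converged
    return centers, labels

# Toy data with two obvious groups (made-up values).
customers = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, labels = kmeans(customers, initial_centers=[customers[0], customers[3]])
print(labels)
```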
    
  7. Generate prediction on test dataset using best performing model.
    >>> prediction = cl.predict(cluster_test)
    2025-11-04 05:59:57,369 | INFO     | Data Transformation started ...
    2025-11-04 05:59:57,370 | INFO     | Performing transformation carried out in feature engineering phase ...
    2025-11-04 05:59:59,599 | INFO     | Updated dataset after performing categorical encoding :
              Gender_1  Age  Annual_Income  Spending_Score  automl_id
    Gender_0
    0                1   48           54.0            46.0         13
    0                1   40           71.0            95.0         19
    0                1   32           73.0            73.0         21
    1                0   23           29.0            87.0          6
    1                0   27           40.0            47.0         12
    1                0   31           43.0            54.0         14
    1                0   60           50.0            49.0         18
    1                0   22           57.0            55.0         20
    1                0   31           39.0            61.0         10
    0                1   65           63.0            52.0         15
    40 rows X 6 columns
    2025-11-04 05:59:59,732 | INFO     | Performing transformation carried out in data preparation phase ...
    2025-11-04 06:00:00,933 | INFO     | Updated dataset after performing scaling for PCA feature selection :
       Gender_0  automl_id  Gender_1       Age  Annual_Income  Spending_Score
    0         0         37         1 -1.418143      -1.812528       -0.459479
    1         0         17         1 -1.418143       0.161388       -0.188771
    2         0          9         1  1.391800      -0.684576        0.352646
    3         0         21         1 -0.504912       0.523944        0.855389
    4         1         38         0 -0.504912       1.490760        1.358133
    5         0         29         1 -0.504912       3.102120       -1.271603
    6         1         28         0 -0.785906       0.644796       -0.420806
    7         1         34         0  0.548817       0.725364       -1.348948
    8         0         25         1 -0.504912       1.087920        0.468663
    9         0         11         1 -1.418143      -0.563724        0.159283
    40 rows X 6 columns
    2025-11-04 06:00:01,272 | INFO     | Updated dataset after performing PCA feature selection :
       automl_id     col_0     col_1     col_2     col_3
    0         27  1.037314  1.275501 -0.981280  0.547659
    1         21 -0.945170  0.537739  0.232898  0.810681
    2         19 -1.141862  0.415388  1.227232  0.872484
    3         25 -0.717832  1.123170 -0.020854  0.716752
    4         37 -0.477862 -1.711002 -1.414514  0.991762
    5         38 -1.469477  1.292374  0.665142 -0.657746
    6         11 -1.011856 -0.517949 -0.939467  0.901320
    7         29  0.346639  3.239135 -1.179548  0.352675
    8         17 -0.824468  0.224307 -1.160862  0.792440
    9         28 -0.349081  0.570877 -0.811958 -0.697328
    10 rows X 5 columns
    2025-11-04 06:00:01,640 | INFO     | Running Non-PCA feature selection transformation for clustering...
    2025-11-04 06:00:02,211 | INFO     | Updated dataset after performing Non-PCA scaling for clustering:
       Gender_0  automl_id  Gender_1       Age  Annual_Income  Spending_Score
    0         0         37         1 -1.418143      -1.812528       -0.459479
    1         0         17         1 -1.418143       0.161388       -0.188771
    2         0          9         1  1.391800      -0.684576        0.352646
    3         0         21         1 -0.504912       0.523944        0.855389
    4         1         38         0 -0.504912       1.490760        1.358133
    5         0         29         1 -0.504912       3.102120       -1.271603
    6         1         28         0 -0.785906       0.644796       -0.420806
    7         1         34         0  0.548817       0.725364       -1.348948
    8         0         25         1 -0.504912       1.087920        0.468663
    9         0         11         1 -1.418143      -0.563724        0.159283
    40 rows X 6 columns
    2025-11-04 06:00:02,762 | INFO     | Data Transformation completed.
    2025-11-04 06:00:02,763 | INFO     | Following model is being picked for evaluation of clustering:
    2025-11-04 06:00:02,763 | INFO     | Model ID : KMEANS_3
    2025-11-04 06:00:02,763 | INFO     | Feature Selection Method : non_pca
    2025-11-04 06:00:06,022 | INFO     | Visualizing Clusters for interpretability...
       Gender_0  automl_id  Gender_1       Age  Annual_Income  Spending_Score
    0         0         27         1  0.057077       1.087920       -1.464966
    1         0         21         1 -0.504912       0.523944        0.855389
    2         0         19         1  0.057077       0.443376        1.706186
    3         0         25         1 -0.504912       1.087920        0.468663
    4         0         37         1 -1.418143      -1.812528       -0.459479
    2025-11-04 06:00:06,079 | INFO     | Selection Criteria: Top 2 High Variance Features
    2025-11-04 06:00:06,079 | INFO     | Selected Features: Annual_Income, Spending_Score
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:488: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 06:00:08,555 | INFO     | Cluster Assignment:
       automl_id  cluster_assignment
    0         35                   0
    1         24                   0
    2         13                   0
    3          7                   0
    4         15                   0
    5         27                   0
    6         12                   0
    7         14                   0
    8          2                   0
    9         34                   0
    >>> prediction
       automl_id  cluster_assignment
    0         35                   0
    1         24                   0
    2         13                   0
    3          7                   0
    4         15                   0
    5         27                   0
    6         12                   0
    7         14                   0
    8          2                   0
    9         34                   0
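The prediction log shows that the transformations learned during fit are reapplied to the test data. The scaled Age values can be reproduced by hand from the training statistics printed earlier (mean 39.1875, standard deviation 14.279857, 160 rows). The sketch below assumes z-score scaling with the population standard deviation; that choice is inferred from the numbers in the logs and is not stated in the source.

```python
from math import sqrt

# Training statistics for Age, copied from the "Statistics of Data" log.
TRAIN_ROWS = 160
AGE_MEAN = 39.1875
AGE_SAMPLE_STD = 14.279857

# Convert the sample standard deviation to the population one (assumption, see above).
age_population_std = AGE_SAMPLE_STD * sqrt((TRAIN_ROWS - 1) / TRAIN_ROWS)

def scale_age(age):
    """Z-score an Age value using the training-set statistics."""
    return (age - AGE_MEAN) / age_population_std

# Training row with automl_id 21 has Age 67; the training log shows 1.953789 after scaling.
train_scaled = scale_age(67)
# Test row with automl_id 21 has Age 32; the prediction log shows -0.504912 after scaling.
test_scaled = scale_age(32)
print(round(train_scaled, 4), round(test_scaled, 4))
```

Reusing the training mean and standard deviation for the test rows keeps both datasets on the same scale, which is what lets the fitted model assign test customers to the clusters it learned.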
  8. Generate prediction on test dataset using third best performing model.
    >>> prediction = cl.predict(cluster_test, rank=3)
    2025-11-04 06:00:45,735 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 06:00:45,736 | INFO     | Following model is being picked for evaluation of clustering:
    2025-11-04 06:00:45,736 | INFO     | Model ID : KMEANS_9
    2025-11-04 06:00:45,736 | INFO     | Feature Selection Method : non_pca
    2025-11-04 06:00:49,040 | INFO     | Visualizing Clusters for interpretability...
       Gender_0  automl_id  Gender_1       Age  Annual_Income  Spending_Score
    0         0         27         1  0.057077       1.087920       -1.464966
    1         0         21         1 -0.504912       0.523944        0.855389
    2         0         19         1  0.057077       0.443376        1.706186
    3         0         25         1 -0.504912       1.087920        0.468663
    4         0         37         1 -1.418143      -1.812528       -0.459479
    2025-11-04 06:00:49,099 | INFO     | Selection Criteria: Top 2 High Variance Features
    2025-11-04 06:00:49,099 | INFO     | Selected Features: Annual_Income, Spending_Score
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:488: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 06:00:49,237 | INFO     | Cluster Assignment:
       automl_id  cluster_assignment
    0         35                   1
    1         24                   1
    2         13                   1
    3          7                   1
    4         15                   1
    5         27                   1
    6         12                   1
    7         14                   1
    8          2                   1
    9         34                   1
    >>> prediction
       automl_id  cluster_assignment
    0         44                   1
    1         42                   1
    2         40                   1
    3         18                   1
    4          5                   1
    5         30                   1
    6         36                   1
    7         16                   1
    8         26                   1
    9          9                   1
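Cluster numbers are arbitrary labels, so a row assigned to cluster 0 by one model and cluster 1 by another may still sit in the same group. In the two "Cluster Assignment" logs above, the same ten automl_id values receive 0 from KMEANS_3 and 1 from KMEANS_9, and both models have identical leaderboard metrics, which is consistent with the two models finding the same grouping under different labels. The helper below is a hypothetical sketch for checking this on any two labelings; ten rows from a single cluster are of course not proof for the whole dataset.

```python
def same_partition(labels_a, labels_b):
    """True if both labelings group the rows identically, ignoring label names."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same rows")
    forward, backward = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label in A must map to exactly one label in B, and vice versa.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

# Rows shown in the two "Cluster Assignment" logs, keyed by automl_id.
automl_ids = [35, 24, 13, 7, 15, 27, 12, 14, 2, 34]
kmeans_3 = {row_id: 0 for row_id in automl_ids}
kmeans_9 = {row_id: 1 for row_id in automl_ids}

logged_rows_match = same_partition(
    [kmeans_3[row_id] for row_id in automl_ids],
    [kmeans_9[row_id] for row_id in automl_ids],
)
print(logged_rows_match)

# Toy counterexample: the second row changes groups, so the partitions differ.
print(same_partition([0, 0, 1], [0, 1, 1]))
```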