This example predicts whether a passenger aboard the RMS Titanic survived, based on several factors.
Run AutoClassifier to get the best-performing model from the available models, with the following specifications:
- Use all default models except 'knn' and 'svm'.
- Set the early stopping timer to 100 seconds.
- Use verbose level 2 to get detailed logs.
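A note on the first specification: `exclude` should receive the two algorithm names as a list, for example `exclude=['knn', 'svm']`. Python joins adjacent string literals at compile time, so `'knn' 'svm'` written without brackets and a comma is the single string `'knnsvm'`, not two names. The plain-Python snippet below (no teradataml required) only demonstrates that pitfall.

```python
# Adjacent string literals are joined by the Python compiler into one string.
joined = 'knn' 'svm'
print(joined)  # knnsvm

# A list keeps the two algorithm names separate.
excluded = ['knn', 'svm']
print(excluded)  # ['knn', 'svm']
```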
- Load the data and split it into training and test datasets.
- Load the example data and create a teradataml DataFrame.
>>> from teradataml import AutoClassifier, DataFrame, load_example_data
>>> load_example_data("teradataml", "titanic")
>>> titanic = DataFrame.from_table("titanic")
- Perform sampling to get 80% of the rows for training and 20% for testing.
>>> titanic_sample = titanic.sample(frac=[0.8, 0.2])
- Fetch the train and test data.
>>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
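The `sample(frac=[0.8, 0.2])` call tags each row with a `sampleid` of 1 or 2, and the two filters keep one group each. As a local illustration only (this is not teradataml's sampling algorithm, which runs in Vantage), the sketch below assigns sample ids to row indices in plain Python with a fixed seed; the helper `assign_sample_ids` is hypothetical. The logs in this example report 713 training rows and 178 test rows (891 in total), and rounding 0.8 × 891 gives the same split sizes.

```python
import random


def assign_sample_ids(n_rows, fracs, seed=42):
    """Hypothetical helper: give each row index a sample id (1, 2, ...).

    Group sizes are rounded from the fractions; the last group takes the
    remainder so that every row is assigned exactly once.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    ids = [0] * n_rows
    start = 0
    for group, frac in enumerate(fracs, start=1):
        size = n_rows - start if group == len(fracs) else round(frac * n_rows)
        for idx in indices[start:start + size]:
            ids[idx] = group
        start += size
    return ids


ids = assign_sample_ids(891, [0.8, 0.2])
train_rows = [i for i, s in enumerate(ids) if s == 1]
test_rows = [i for i, s in enumerate(ids) if s == 2]
print(len(train_rows), len(test_rows))  # 713 178
```

Unlike a per-row random draw, fixing the group sizes up front makes the split sizes exact, which keeps the sketch easy to check against the logged row counts.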
- Create an AutoClassifier instance.
>>> aml = AutoClassifier(exclude=['knn', 'svm'], verbose=2, max_runtime_secs=100)
- Fit the data.
>>> aml.fit(titanic_train, 'survived')
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 01:44:50,473 | INFO | Feature Exploration started
2025-11-04 01:44:50,473 | INFO | Data Overview:
2025-11-04 01:44:50,619 | INFO | Total Rows in the data: 713
2025-11-04 01:44:50,661 | INFO | Total Columns in the data: 12
2025-11-04 01:44:51,316 | INFO | Column Summary:
ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage
0 embarked VARCHAR(20) CHARACTER SET LATIN 711 2 0.0 NaN NaN NaN 0.280505 99.719495
1 sibsp INTEGER 713 0 NaN 486.0 227.0 0.0 0.000000 100.000000
2 survived INTEGER 713 0 NaN 445.0 268.0 0.0 0.000000 100.000000
3 pclass INTEGER 713 0 NaN 0.0 713.0 0.0 0.000000 100.000000
4 age INTEGER 573 140 NaN 5.0 568.0 0.0 19.635344 80.364656
5 ticket VARCHAR(20) CHARACTER SET LATIN 713 0 0.0 NaN NaN NaN 0.000000 100.000000
6 sex VARCHAR(20) CHARACTER SET LATIN 713 0 0.0 NaN NaN NaN 0.000000 100.000000
7 parch INTEGER 713 0 NaN 540.0 173.0 0.0 0.000000 100.000000
8 name VARCHAR(1000) CHARACTER SET LATIN 713 0 0.0 NaN NaN NaN 0.000000 100.000000
9 passenger INTEGER 713 0 NaN 0.0 713.0 0.0 0.000000 100.000000
10 cabin VARCHAR(20) CHARACTER SET LATIN 154 559 0.0 NaN NaN NaN 78.401122 21.598878
11 fare FLOAT 713 0 NaN 10.0 703.0 0.0 0.000000 100.000000
2025-11-04 01:44:52,148 | INFO | Statistics of Data:
ATTRIBUTE StatName StatValue
0 survived MAXIMUM 1.000000
1 survived STANDARD DEVIATION 0.484688
2 survived PERCENTILES(25) 0.000000
3 survived PERCENTILES(50) 0.000000
4 fare COUNT 713.000000
5 fare MINIMUM 0.000000
6 fare MAXIMUM 512.329200
7 fare MEAN 32.204125
8 fare STANDARD DEVIATION 51.384597
9 fare PERCENTILES(25) 7.925000
2025-11-04 01:44:52,477 | INFO | Categorical Columns with their Distinct values:
ColumnName DistinctValueCount
name 713
sex 2
ticket 565
cabin 124
embarked 3
2025-11-04 01:44:55,757 | INFO | Futile columns in dataset:
ColumnName
0 ticket
1 name
2025-11-04 01:44:59,620 | INFO | Columns with outlier percentage :-
ColumnName OutlierPercentage
0 sibsp 5.329593
1 fare 12.762973
2 age 20.476858
3 parch 24.263675
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 01:45:00,004 | INFO | Feature Engineering started ...
2025-11-04 01:45:00,005 | INFO | Handling duplicate records present in dataset ...
2025-11-04 01:45:00,183 | INFO | Analysis completed. No action taken.
2025-11-04 01:45:00,183 | INFO | Total time to handle duplicate records: 0.18 sec
2025-11-04 01:45:00,184 | INFO | Handling less significant features from data ...
2025-11-04 01:45:07,148 | INFO | Removing Futile columns: ['ticket', 'name']
2025-11-04 01:45:07,148 | INFO | Sample of Data after removing Futile columns:
passenger survived pclass sex age sibsp parch fare cabin embarked automl_id
0 795 0 3 male 25.0 0 0 7.8958 None S 12
1 591 0 3 male 35.0 0 0 7.1250 None S 11
2 387 0 3 male 1.0 5 2 46.9000 None S 15
3 530 0 2 male 23.0 2 1 11.5000 None S 5
4 570 1 3 male 32.0 0 0 7.8542 None S 13
5 40 1 3 female 14.0 1 0 11.2417 None C 6
6 162 1 2 female 40.0 0 0 15.7500 None S 10
7 631 1 1 male 80.0 0 0 30.0000 A23 S 14
8 305 0 3 male NaN 0 0 8.0500 None S 9
9 122 0 3 male NaN 0 0 8.0500 None S 7
713 rows X 11 columns
2025-11-04 01:45:07,948 | INFO | Total time to handle less significant features: 7.76 sec
2025-11-04 01:45:07,948 | INFO | Handling Date Features ...
2025-11-04 01:45:07,948 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
2025-11-04 01:45:07,948 | INFO | Total time to handle date features: 0.00 sec
2025-11-04 01:45:07,948 | INFO | Checking Missing values in dataset ...
2025-11-04 01:45:09,342 | INFO | Columns with their missing values:
cabin: 559
age: 140
embarked: 2
2025-11-04 01:45:10,421 | INFO | Deleting rows of these columns for handling missing values: ['embarked']
2025-11-04 01:45:11,071 | INFO | Sample of dataset after removing 2 rows:
passenger survived pclass sex age sibsp parch fare cabin embarked automl_id
0 795 0 3 male 25.0 0 0 7.8958 None S 12
1 162 1 2 female 40.0 0 0 15.7500 None S 10
2 631 1 1 male 80.0 0 0 30.0000 A23 S 14
3 530 0 2 male 23.0 2 1 11.5000 None S 5
4 570 1 3 male 32.0 0 0 7.8542 None S 13
5 122 0 3 male NaN 0 0 8.0500 None S 7
6 591 0 3 male 35.0 0 0 7.1250 None S 11
7 387 0 3 male 1.0 5 2 46.9000 None S 15
8 305 0 3 male NaN 0 0 8.0500 None S 9
9 40 1 3 female 14.0 1 0 11.2417 None C 6
711 rows X 11 columns
2025-11-04 01:45:12,023 | INFO | Dropping these columns for handling missing values: ['cabin']
2025-11-04 01:45:12,023 | INFO | Sample of dataset after removing 1 columns:
passenger survived pclass sex age sibsp parch fare embarked automl_id
0 387 0 3 male 1.0 5 2 46.9000 S 15
1 326 1 1 female 36.0 0 0 135.6333 C 8
2 795 0 3 male 25.0 0 0 7.8958 S 12
3 530 0 2 male 23.0 2 1 11.5000 S 5
4 570 1 3 male 32.0 0 0 7.8542 S 13
5 40 1 3 female 14.0 1 0 11.2417 C 6
6 162 1 2 female 40.0 0 0 15.7500 S 10
7 631 1 1 male 80.0 0 0 30.0000 S 14
8 305 0 3 male NaN 0 0 8.0500 S 9
9 469 0 3 male NaN 0 0 7.7250 Q 4
711 rows X 10 columns
2025-11-04 01:45:12,885 | INFO | Total time to find missing values in data: 4.94 sec
2025-11-04 01:45:12,885 | INFO | Imputing Missing Values ...
2025-11-04 01:45:13,164 | INFO | Columns with their imputation method:
age: mean
2025-11-04 01:45:15,562 | INFO | Sample of dataset after Imputation:
passenger survived pclass sex age sibsp parch fare embarked automl_id
0 570 1 3 male 32 0 0 7.8542 S 13
1 591 0 3 male 35 0 0 7.1250 S 11
2 387 0 3 male 1 5 2 46.9000 S 15
3 469 0 3 male 29 0 0 7.7250 Q 4
4 795 0 3 male 25 0 0 7.8958 S 12
5 40 1 3 female 14 1 0 11.2417 C 6
6 162 1 2 female 40 0 0 15.7500 S 10
7 631 1 1 male 80 0 0 30.0000 S 14
8 326 1 1 female 36 0 0 135.6333 C 8
9 122 0 3 male 29 0 0 8.0500 S 7
711 rows X 10 columns
2025-11-04 01:45:16,322 | INFO | Time taken to perform imputation: 3.44 sec
2025-11-04 01:45:16,323 | INFO | Performing encoding for categorical columns ...
2025-11-04 01:45:24,872 | INFO | ONE HOT Encoding these Columns: ['sex', 'embarked']
2025-11-04 01:45:24,873 | INFO | Sample of dataset after performing one hot encoding:
survived pclass sex_0 sex_1 age sibsp parch fare embarked_0 embarked_1 embarked_2 automl_id
passenger
387 0 3 0 1 1 5 2 46.900 0 0 1 15
448 1 1 0 1 34 0 0 26.550 0 0 1 23
713 1 1 0 1 48 1 0 52.000 0 0 1 27
753 0 3 0 1 33 0 0 9.500 0 0 1 31
59 1 2 1 0 5 1 2 27.750 0 0 1 39
324 1 2 1 0 22 1 1 29.000 0 0 1 43
263 0 1 0 1 52 1 1 79.650 0 0 1 35
856 1 3 1 0 18 0 1 9.350 0 0 1 19
591 0 3 0 1 35 0 0 7.125 0 0 1 11
122 0 3 0 1 29 0 0 8.050 0 0 1 7
711 rows X 13 columns
2025-11-04 01:45:24,966 | INFO | Time taken to encode the columns: 8.64 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 01:45:24,967 | INFO | Data preparation started ...
2025-11-04 01:45:24,967 | INFO | Outlier preprocessing ...
2025-11-04 01:45:28,116 | INFO | Columns with outlier percentage :-
ColumnName OutlierPercentage
0 sibsp 5.344585
1 fare 12.517581
2 age 7.172996
3 parch 24.331927
2025-11-04 01:45:28,547 | INFO | Deleting rows of these columns: ['age', 'sibsp']
2025-11-04 01:45:30,611 | INFO | Sample of dataset after removing outlier rows:
survived pclass sex_0 sex_1 age sibsp parch fare embarked_0 embarked_1 embarked_2 automl_id
passenger
795 0 3 0 1 25 0 0 7.8958 0 0 1 12
509 0 3 0 1 28 0 0 22.5250 0 0 1 24
774 0 3 0 1 29 0 0 7.2250 1 0 0 28
366 0 3 0 1 30 0 0 7.2500 0 0 1 32
242 1 3 1 0 29 1 0 15.5000 0 1 0 48
38 0 3 0 1 21 0 0 8.0500 0 0 1 52
467 0 2 0 1 29 0 0 0.0000 0 0 1 40
652 1 2 1 0 18 0 1 23.0000 0 0 1 20
326 1 1 1 0 36 0 0 135.6333 1 0 0 8
469 0 3 0 1 29 0 0 7.7250 0 1 0 4
629 rows X 13 columns
2025-11-04 01:45:30,722 | INFO | median inplace of outliers: ['parch', 'fare']
2025-11-04 01:45:32,775 | INFO | Sample of dataset after performing MEDIAN inplace:
survived pclass sex_0 sex_1 age sibsp parch fare embarked_0 embarked_1 embarked_2 automl_id
passenger
856 1 3 1 0 18 0 0 9.3500 0 0 1 19
713 1 1 0 1 48 1 0 52.0000 0 0 1 27
753 0 3 0 1 33 0 0 9.5000 0 0 1 31
263 0 1 0 1 52 1 0 13.0000 0 0 1 35
324 1 2 1 0 22 1 0 29.0000 0 0 1 43
385 0 3 0 1 29 0 0 7.8958 0 0 1 47
59 1 2 1 0 5 1 0 27.7500 0 0 1 39
448 1 1 0 1 34 0 0 26.5500 0 0 1 23
591 0 3 0 1 35 0 0 7.1250 0 0 1 11
122 0 3 0 1 29 0 0 8.0500 0 0 1 7
629 rows X 13 columns
2025-11-04 01:45:32,932 | INFO | Time Taken by Outlier processing: 7.96 sec
2025-11-04 01:45:32,932 | INFO | Checking imbalance data ...
2025-11-04 01:45:33,014 | INFO | Imbalance Not Found.
2025-11-04 01:45:33,973 | INFO | Feature selection using rfe ...
2025-11-04 01:45:53,930 | INFO | feature selected by RFE: ['passenger', 'age', 'sex_1', 'pclass', 'sex_0', 'embarked_0', 'embarked_1', 'sibsp', 'embarked_2', 'fare']
2025-11-04 01:45:53,931 | INFO | Total time taken by feature selection: 19.96 sec
2025-11-04 01:45:54,256 | INFO | Scaling Features of rfe data ...
2025-11-04 01:45:55,870 | INFO | columns that will be scaled: ['r_passenger', 'r_age', 'r_pclass', 'r_sibsp', 'r_fare']
2025-11-04 01:45:58,415 | INFO | Dataset sample after scaling:
r_embarked_0 survived r_sex_1 r_sex_0 automl_id r_embarked_1 r_embarked_2 r_passenger r_age r_pclass r_sibsp r_fare
0 1 1 0 1 6 0 0 0.043870 0.215686 1.0 0.5 0.197223
1 1 1 0 1 8 0 0 0.365579 0.647059 0.0 0.0 0.228070
2 0 0 1 0 9 0 1 0.341957 0.509804 1.0 0.0 0.141228
3 0 1 0 1 10 0 1 0.181102 0.725490 0.5 0.0 0.276316
4 0 0 1 0 12 0 1 0.893138 0.431373 1.0 0.0 0.138523
5 0 1 1 0 13 0 1 0.640045 0.568627 1.0 0.0 0.137793
6 0 0 1 0 11 0 1 0.663667 0.627451 1.0 0.0 0.125000
7 0 0 1 0 7 0 1 0.136108 0.509804 1.0 0.0 0.141228
8 0 0 1 0 5 0 1 0.595051 0.392157 0.5 1.0 0.201754
9 0 0 1 0 4 1 0 0.526434 0.509804 1.0 0.0 0.135526
629 rows X 12 columns
2025-11-04 01:45:59,105 | INFO | Total time taken by feature scaling: 4.85 sec
2025-11-04 01:45:59,106 | INFO | Scaling Features of pca data ...
2025-11-04 01:46:00,204 | INFO | columns that will be scaled: ['passenger', 'pclass', 'age', 'sibsp', 'fare']
2025-11-04 01:46:02,578 | INFO | Dataset sample after scaling:
survived parch embarked_0 sex_1 sex_0 embarked_1 automl_id embarked_2 passenger pclass age sibsp fare
0 0 0 0 1 0 0 18 1 0.249719 1.0 0.941176 0.0 0.141228
1 0 0 0 1 0 0 9 1 0.341957 1.0 0.509804 0.0 0.141228
2 1 0 0 1 0 0 13 1 0.640045 1.0 0.568627 0.0 0.137793
3 0 0 0 1 0 1 4 0 0.526434 1.0 0.509804 0.0 0.135526
4 0 0 0 1 0 0 12 1 0.893138 1.0 0.431373 0.0 0.138523
5 0 0 0 1 0 0 7 1 0.136108 1.0 0.509804 0.0 0.141228
6 0 0 0 1 0 0 11 1 0.663667 1.0 0.627451 0.0 0.125000
7 1 0 0 0 1 0 19 1 0.961755 1.0 0.294118 0.0 0.164035
8 1 0 1 0 1 0 8 0 0.365579 0.0 0.647059 0.0 0.228070
9 0 0 0 1 0 0 5 1 0.595051 0.5 0.392157 1.0 0.201754
629 rows X 13 columns
2025-11-04 01:46:03,190 | INFO | Total time taken by feature scaling: 4.08 sec
2025-11-04 01:46:03,191 | INFO | Dimension Reduction using pca ...
2025-11-04 01:46:03,851 | INFO | PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
2025-11-04 01:46:03,852 | INFO | Total time taken by PCA: 0.66 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 01:46:04,269 | INFO | Model Training started ...
2025-11-04 01:46:04,313 | INFO | Hyperparameters used for model training:
2025-11-04 01:46:04,314 | INFO | Model: decision_forest
2025-11-04 01:46:04,314 | INFO | Hyperparameters: {'response_column': 'survived', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'num_trees': (-1,), 'seed': 42}
2025-11-04 01:46:04,314 | INFO | Total number of models for decision_forest: 36
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 01:46:04,315 | INFO | Model: xgboost
2025-11-04 01:46:04,315 | INFO | Hyperparameters: {'response_column': 'survived', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
2025-11-04 01:46:04,318 | INFO | Total number of models for xgboost: 5832
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 01:46:04,318 | INFO | Model: glm
2025-11-04 01:46:04,319 | INFO | Hyperparameters: {'response_column': 'survived', 'name': 'glm', 'family': 'BINOMIAL', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
2025-11-04 01:46:04,319 | INFO | Total number of models for glm: 1296
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 01:46:04,319 | INFO | Performing hyperparameter tuning ...
2025-11-04 01:46:05,563 | INFO | Model training for decision_forest
2025-11-04 01:46:25,904 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:46:25,904 | INFO | Model training for xgboost
2025-11-04 01:46:45,404 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:46:45,404 | INFO | Model training for glm
2025-11-04 01:47:04,963 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:47:04,966 | INFO | Leaderboard
RANK MODEL_ID FEATURE_SELECTION ACCURACY MICRO-PRECISION ... MACRO-RECALL MACRO-F1 WEIGHTED-PRECISION WEIGHTED-RECALL WEIGHTED-F1
0 1 DECISIONFOREST_2 rfe 0.825397 0.825397 ... 0.808905 0.813358 0.824143 0.825397 0.823892
1 2 DECISIONFOREST_4 rfe 0.825397 0.825397 ... 0.808905 0.813358 0.824143 0.825397 0.823892
2 3 DECISIONFOREST_0 rfe 0.817460 0.817460 ... 0.802412 0.805699 0.816153 0.817460 0.816322
3 4 XGBOOST_2 rfe 0.809524 0.809524 ... 0.814471 0.804601 0.819985 0.809524 0.811493
4 5 XGBOOST_0 rfe 0.793651 0.793651 ... 0.790353 0.785882 0.797852 0.793651 0.794946
5 6 XGBOOST_6 rfe 0.793651 0.793651 ... 0.790353 0.785882 0.797852 0.793651 0.794946
6 7 GLM_2 rfe 0.785714 0.785714 ... 0.776438 0.775401 0.786579 0.785714 0.786096
7 8 DECISIONFOREST_1 pca 0.785714 0.785714 ... 0.739332 0.751079 0.801736 0.785714 0.771713
8 9 DECISIONFOREST_5 pca 0.785714 0.785714 ... 0.739332 0.751079 0.801736 0.785714 0.771713
9 10 GLM_4 rfe 0.777778 0.777778 ... 0.758813 0.762456 0.775583 0.777778 0.775863
10 11 DECISIONFOREST_3 pca 0.777778 0.777778 ... 0.732839 0.743605 0.789323 0.777778 0.764406
11 12 GLM_7 pca 0.769841 0.769841 ... 0.752319 0.755012 0.767874 0.769841 0.768406
12 13 XGBOOST_4 rfe 0.769841 0.769841 ... 0.748609 0.752891 0.767246 0.769841 0.767273
13 14 GLM_6 rfe 0.761905 0.761905 ... 0.768089 0.756944 0.776963 0.761905 0.764660
14 15 GLM_0 rfe 0.761905 0.761905 ... 0.764378 0.755751 0.772947 0.761905 0.764366
15 16 XGBOOST_5 pca 0.761905 0.761905 ... 0.742115 0.745489 0.759396 0.761905 0.759853
16 17 GLM_3 pca 0.761905 0.761905 ... 0.738404 0.743207 0.758943 0.761905 0.758605
17 18 GLM_5 pca 0.746032 0.746032 ... 0.710575 0.717568 0.743834 0.746032 0.737493
18 19 XGBOOST_3 pca 0.738095 0.738095 ... 0.730056 0.727362 0.741511 0.738095 0.739383
19 20 XGBOOST_1 pca 0.730159 0.730159 ... 0.712430 0.713942 0.728372 0.730159 0.729078
20 21 XGBOOST_7 pca 0.730159 0.730159 ... 0.712430 0.713942 0.728372 0.730159 0.729078
21 22 GLM_1 pca 0.682540 0.682540 ... 0.725417 0.682219 0.772840 0.682540 0.679978
[22 rows x 13 columns]
22 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation >>> Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 14/14
- Display the model leaderboard.
>>> aml.leaderboard()
RANK MODEL_ID FEATURE_SELECTION ACCURACY MICRO-PRECISION ... MACRO-RECALL MACRO-F1 WEIGHTED-PRECISION WEIGHTED-RECALL WEIGHTED-F1
0 1 DECISIONFOREST_2 rfe 0.825397 0.825397 ... 0.808905 0.813358 0.824143 0.825397 0.823892
1 2 DECISIONFOREST_4 rfe 0.825397 0.825397 ... 0.808905 0.813358 0.824143 0.825397 0.823892
2 3 DECISIONFOREST_0 rfe 0.817460 0.817460 ... 0.802412 0.805699 0.816153 0.817460 0.816322
3 4 XGBOOST_2 rfe 0.809524 0.809524 ... 0.814471 0.804601 0.819985 0.809524 0.811493
4 5 XGBOOST_0 rfe 0.793651 0.793651 ... 0.790353 0.785882 0.797852 0.793651 0.794946
5 6 XGBOOST_6 rfe 0.793651 0.793651 ... 0.790353 0.785882 0.797852 0.793651 0.794946
6 7 GLM_2 rfe 0.785714 0.785714 ... 0.776438 0.775401 0.786579 0.785714 0.786096
7 8 DECISIONFOREST_1 pca 0.785714 0.785714 ... 0.739332 0.751079 0.801736 0.785714 0.771713
8 9 DECISIONFOREST_5 pca 0.785714 0.785714 ... 0.739332 0.751079 0.801736 0.785714 0.771713
9 10 GLM_4 rfe 0.777778 0.777778 ... 0.758813 0.762456 0.775583 0.777778 0.775863
10 11 DECISIONFOREST_3 pca 0.777778 0.777778 ... 0.732839 0.743605 0.789323 0.777778 0.764406
11 12 GLM_7 pca 0.769841 0.769841 ... 0.752319 0.755012 0.767874 0.769841 0.768406
12 13 XGBOOST_4 rfe 0.769841 0.769841 ... 0.748609 0.752891 0.767246 0.769841 0.767273
13 14 GLM_6 rfe 0.761905 0.761905 ... 0.768089 0.756944 0.776963 0.761905 0.764660
14 15 GLM_0 rfe 0.761905 0.761905 ... 0.764378 0.755751 0.772947 0.761905 0.764366
15 16 XGBOOST_5 pca 0.761905 0.761905 ... 0.742115 0.745489 0.759396 0.761905 0.759853
16 17 GLM_3 pca 0.761905 0.761905 ... 0.738404 0.743207 0.758943 0.761905 0.758605
17 18 GLM_5 pca 0.746032 0.746032 ... 0.710575 0.717568 0.743834 0.746032 0.737493
18 19 XGBOOST_3 pca 0.738095 0.738095 ... 0.730056 0.727362 0.741511 0.738095 0.739383
19 20 XGBOOST_1 pca 0.730159 0.730159 ... 0.712430 0.713942 0.728372 0.730159 0.729078
20 21 XGBOOST_7 pca 0.730159 0.730159 ... 0.712430 0.713942 0.728372 0.730159 0.729078
21 22 GLM_1 pca 0.682540 0.682540 ... 0.725417 0.682219 0.772840 0.682540 0.679978
[22 rows x 13 columns]
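During `fit()`, `sex` and `embarked` are one-hot encoded into `sex_0`/`sex_1` and `embarked_0` to `embarked_2`. The logged sample rows are consistent with categories being numbered in sorted order: female -> `sex_0`, male -> `sex_1`, and C, Q, S -> `embarked_0`, `embarked_1`, `embarked_2` (for example, passenger 326, female and embarked at C, appears with `sex_0 = 1` and `embarked_0 = 1`, and passenger 469, embarked at Q, has `embarked_1 = 1`). The sketch below reproduces that mapping in plain Python under this sorted-order assumption; `one_hot` is a hypothetical helper, not a teradataml function.

```python
def one_hot(values, column):
    """Hypothetical helper: encode values as 0/1 indicator columns.

    Indicator f"{column}_{i}" is 1 when the value equals the i-th category
    in sorted order (an ordering assumption inferred from the logged samples).
    """
    categories = sorted(set(values))
    rows = [{f"{column}_{i}": int(v == c) for i, c in enumerate(categories)}
            for v in values]
    return categories, rows


sex_cats, sex_rows = one_hot(["male", "female", "male"], "sex")
print(sex_cats)     # ['female', 'male']
print(sex_rows[1])  # {'sex_0': 1, 'sex_1': 0}

emb_cats, emb_rows = one_hot(["S", "C", "Q", "S"], "embarked")
print(emb_cats)     # ['C', 'Q', 'S']
print(emb_rows[0])  # {'embarked_0': 0, 'embarked_1': 0, 'embarked_2': 1}
```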
- Display the best-performing model.
>>> aml.leader()
RANK MODEL_ID FEATURE_SELECTION ACCURACY MICRO-PRECISION ... MACRO-RECALL MACRO-F1 WEIGHTED-PRECISION WEIGHTED-RECALL WEIGHTED-F1
0 1 DECISIONFOREST_2 rfe 0.825397 0.825397 ... 0.808905 0.813358 0.824143 0.825397 0.823892
[1 rows x 13 columns]
- Display the hyperparameters of the trained models.
- Display model hyperparameters for rank 1.
>>> aml.model_hyperparameters(rank=1)
{'response_column': 'survived', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': 0.0, 'max_depth': 5, 'min_node_size': 2, 'num_trees': -1, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0']}
- Display model hyperparameters for rank 4.
>>> aml.model_hyperparameters(rank=4)
{'response_column': 'survived', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': 1, 'min_impurity': 0.0, 'lambda1': 1.0, 'shrinkage_factor': 0.5, 'max_depth': 5, 'min_node_size': 1, 'iter_num': 10, 'num_boosted_trees': 5, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0']}
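Each set of hyperparameters returned for a ranked model is one point from the candidate grids logged during `fit()`. The "Total number of models" figures in that log equal the product of the number of candidate values for each tuple-valued parameter; scalar settings such as `seed` or `'learning_rate': 'OPTIMAL'` do not multiply the count. The sketch below recomputes those totals from the logged grids; `grid_size` is a hypothetical helper.

```python
from math import prod


def grid_size(params):
    """Hypothetical helper: count combinations in a hyperparameter grid.

    Tuple values are candidate lists; any other value is a fixed setting.
    """
    return prod(len(v) for v in params.values() if isinstance(v, tuple))


# Grids copied from the fit() log (response_column, name and type settings omitted).
decision_forest = {'min_impurity': (0.0, 0.1, 0.2), 'max_depth': (5, 6, 8, 10),
                   'min_node_size': (1, 2, 3), 'num_trees': (-1,), 'seed': 42}
xgboost = {'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2),
           'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3),
           'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3),
           'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
glm = {'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85),
       'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1),
       'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50),
       'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}

for name, grid in [("decision_forest", decision_forest), ("xgboost", xgboost), ("glm", glm)]:
    print(name, grid_size(grid))
# decision_forest 36
# xgboost 5832
# glm 1296
```

The leaderboard lists 22 trained models in total, far fewer than these grid sizes.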
- Generate predictions on the test dataset using the best-performing model.
>>> prediction = aml.predict(titanic_test)
2025-11-04 01:49:52,113 | INFO | Data Transformation started ...
2025-11-04 01:49:52,113 | INFO | Performing transformation carried out in feature engineering phase ...
2025-11-04 01:49:52,736 | INFO | Updated dataset after dropping futile columns :
passenger survived pclass sex age sibsp parch fare cabin embarked automl_id
0 793 0 3 female NaN 8 2 69.5500 None S 14
1 814 0 3 female 6.0 4 2 31.2750 None S 8
2 812 0 3 male 39.0 0 0 24.1500 None S 12
3 265 0 3 female NaN 0 0 7.7500 None Q 5
4 101 0 3 female 28.0 0 0 7.8958 None S 13
5 19 0 3 female 31.0 1 0 18.0000 None S 7
6 730 0 3 female 25.0 1 0 7.9250 None S 11
7 137 1 1 female 19.0 0 2 26.2833 D47 S 15
8 244 0 3 male 22.0 0 0 7.1250 None S 9
9 61 0 3 male 22.0 0 0 7.2292 None C 4
178 rows X 11 columns
2025-11-04 01:49:53,072 | INFO | Updated dataset after performing target column transformation :
passenger survived pclass sex age sibsp parch fare cabin embarked automl_id
0 101 0 3 female 28.0 0 0 7.8958 None S 13
1 730 0 3 female 25.0 1 0 7.9250 None S 11
2 137 1 1 female 19.0 0 2 26.2833 D47 S 15
3 61 0 3 male 22.0 0 0 7.2292 None C 4
4 812 0 3 male 39.0 0 0 24.1500 None S 12
5 734 0 2 male 23.0 0 0 13.0000 None S 6
6 345 0 2 male 36.0 0 0 13.0000 None S 10
7 793 0 3 female NaN 8 2 69.5500 None S 14
8 814 0 3 female 6.0 4 2 31.2750 None S 8
9 19 0 3 female 31.0 1 0 18.0000 None S 7
178 rows X 11 columns
2025-11-04 01:49:53,317 | INFO | Updated dataset after dropping missing value containing columns :
passenger survived pclass sex age sibsp parch fare embarked automl_id
0 101 0 3 female 28.0 0 0 7.8958 S 13
1 730 0 3 female 25.0 1 0 7.9250 S 11
2 137 1 1 female 19.0 0 2 26.2833 S 15
3 61 0 3 male 22.0 0 0 7.2292 C 4
4 812 0 3 male 39.0 0 0 24.1500 S 12
5 734 0 2 male 23.0 0 0 13.0000 S 6
6 345 0 2 male 36.0 0 0 13.0000 S 10
7 793 0 3 female NaN 8 2 69.5500 S 14
8 814 0 3 female 6.0 4 2 31.2750 S 8
9 19 0 3 female 31.0 1 0 18.0000 S 7
178 rows X 10 columns
2025-11-04 01:49:54,303 | INFO | Updated dataset after imputing missing value containing columns :
passenger survived pclass sex age sibsp parch fare embarked automl_id
0 793 0 3 female 29 8 2 69.5500 S 14
1 730 0 3 female 25 1 0 7.9250 S 11
2 137 1 1 female 19 0 2 26.2833 S 15
3 61 0 3 male 22 0 0 7.2292 C 4
4 812 0 3 male 39 0 0 24.1500 S 12
5 265 0 3 female 29 0 0 7.7500 Q 5
6 244 0 3 male 22 0 0 7.1250 S 9
7 101 0 3 female 28 0 0 7.8958 S 13
8 814 0 3 female 6 4 2 31.2750 S 8
9 19 0 3 female 31 1 0 18.0000 S 7
178 rows X 10 columns
2025-11-04 01:49:59,120 | INFO | Updated dataset after performing categorical encoding :
survived pclass sex_0 sex_1 age sibsp parch fare embarked_0 embarked_1 embarked_2 automl_id
passenger
101 0 3 1 0 28 0 0 7.8958 0 0 1 13
610 1 1 1 0 40 0 0 153.4625 0 0 1 21
404 0 3 0 1 28 1 0 15.8500 0 0 1 25
873 0 1 0 1 33 0 0 5.0000 0 0 1 29
747 0 3 0 1 16 1 1 20.2500 0 0 1 37
604 0 3 0 1 44 0 0 8.0500 0 0 1 41
34 0 2 0 1 66 0 0 10.5000 0 0 1 33
835 0 3 0 1 18 0 0 8.3000 0 0 1 17
244 0 3 0 1 22 0 0 7.1250 0 0 1 9
265 0 3 1 0 29 0 0 7.7500 0 1 0 5
178 rows X 13 columns
2025-11-04 01:49:59,273 | INFO | Performing transformation carried out in data preparation phase ...
2025-11-04 01:49:59,998 | INFO | Updated dataset after performing RFE feature selection:
automl_id passenger age sex_1 pclass sex_0 embarked_0 embarked_1 sibsp embarked_2 fare
survived
1 64 188 45 1 1 0 0 0 0 1 26.5500
1 84 756 0 1 2 0 0 0 1 1 14.5000
1 108 75 32 1 3 0 0 0 0 1 56.4958
1 116 258 30 0 1 1 0 0 0 1 86.5000
1 132 521 30 0 1 1 0 0 0 1 93.5000
1 148 866 42 0 2 1 0 0 0 1 13.0000
0 13 101 28 0 3 1 0 0 0 1 7.8958
0 25 404 28 1 3 0 0 0 1 1 15.8500
0 29 873 33 1 1 0 0 0 0 1 5.0000
0 33 34 66 1 2 0 0 0 0 1 10.5000
178 rows X 12 columns
2025-11-04 01:50:00,864 | INFO | Updated dataset after performing scaling on RFE selected features :
survived r_embarked_0 r_sex_1 r_sex_0 automl_id r_embarked_1 r_embarked_2 r_passenger r_age r_pclass r_sibsp r_fare
0 0 0 1 0 29 0 1 0.980877 0.588235 0.0 0.0 0.087719
1 0 0 1 0 41 0 1 0.678290 0.803922 1.0 0.0 0.141228
2 0 0 1 0 65 0 1 0.602925 0.823529 0.0 0.0 0.465789
3 0 0 0 1 81 1 0 0.735658 0.294118 1.0 0.0 0.118421
4 0 0 1 0 89 0 1 0.179978 0.803922 1.0 0.0 0.282456
5 0 0 1 0 97 1 0 0.314961 1.215686 1.0 0.0 0.135965
6 1 1 1 0 40 0 0 0.673791 0.901961 0.0 0.5 0.998758
7 1 1 0 1 52 0 0 0.532058 0.392157 0.5 0.0 0.241960
8 1 0 1 0 64 0 1 0.210349 0.823529 0.0 0.0 0.465789
9 1 1 0 1 72 0 0 0.988751 1.039216 0.0 0.0 1.458918
178 rows X 12 columns
2025-11-04 01:50:02,234 | INFO | Updated dataset after performing scaling for PCA feature selection :
survived parch embarked_0 sex_1 sex_0 embarked_1 automl_id embarked_2 passenger pclass age sibsp fare
0 1 0 0 1 0 0 64 1 0.210349 0.0 0.823529 0.0 0.465789
1 1 1 0 1 0 0 84 1 0.849269 0.5 -0.058824 0.5 0.254386
2 1 0 0 1 0 0 108 1 0.083240 1.0 0.568627 0.0 0.991154
3 1 0 0 0 1 0 116 1 0.289089 0.0 0.529412 0.0 1.517544
4 1 0 0 0 1 0 132 1 0.584927 0.0 0.529412 0.0 1.640351
5 1 0 0 0 1 0 148 1 0.973003 0.5 0.764706 0.0 0.228070
6 0 0 0 0 1 0 13 1 0.112486 1.0 0.490196 0.0 0.138523
7 0 0 0 1 0 0 25 1 0.453318 1.0 0.490196 0.5 0.278070
8 0 0 0 1 0 0 29 1 0.980877 0.0 0.588235 0.0 0.087719
9 0 0 0 1 0 0 33 1 0.037120 0.5 1.235294 0.0 0.184211
178 rows X 13 columns
2025-11-04 01:50:02,690 | INFO | Updated dataset after performing PCA feature selection :
automl_id col_0 col_1 col_2 col_3 col_4 col_5 survived
0 40 0.115488 -1.108618 -0.961301 0.189160 0.037975 0.449300 1
1 13 0.644199 0.662373 0.419392 -0.274127 -0.270581 -0.327601 0
2 52 1.242898 -0.612770 -0.061959 -0.352359 0.154754 -0.207412 1
3 25 -0.587362 0.155686 0.136922 -0.108206 -0.201738 0.375208 0
4 64 -0.467396 0.116995 -0.719125 0.350888 -0.227753 -0.312627 1
5 29 -0.494907 0.122419 -0.637915 0.252426 0.496492 -0.111965 0
6 72 1.362560 -0.551915 -0.871712 0.129982 0.552677 0.112351 1
7 33 -0.567285 0.105638 -0.284682 0.180347 -0.359583 -0.357239 0
8 84 -0.503757 0.158937 -0.218662 -0.006225 0.146552 0.436244 1
9 41 -0.652225 0.137441 0.163958 -0.100921 0.209487 -0.025870 0
10 rows X 8 columns
2025-11-04 01:50:02,981 | INFO | Data Transformation completed.█████| 100% - 9/9
2025-11-04 01:50:03,526 | INFO | Following model is being picked for evaluation:
2025-11-04 01:50:03,527 | INFO | Model ID : DECISIONFOREST_2
2025-11-04 01:50:03,527 | INFO | Feature Selection Method : rfe
2025-11-04 01:50:04,320 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 01:50:06,531 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 01:50:06,623 | INFO | Prediction :
automl_id prediction prob_1 prob_0 survived
0 29 0 0.0 1.0 0
1 41 0 0.0 1.0 0
2 65 1 1.0 0.0 0
3 81 1 1.0 0.0 0
4 89 0 0.0 1.0 0
5 97 0 0.0 1.0 0
6 40 1 1.0 0.0 1
7 52 1 1.0 0.0 1
8 64 1 1.0 0.0 1
9 72 1 1.0 0.0 1
2025-11-04 01:50:08,738 | INFO | ROC-AUC :
GINI AUC
0.672362 0.344725
threshold_value tpr fpr
0 0.040816 0.77027 0.240385
1 0.081633 0.77027 0.240385
2 0.102041 0.77027 0.240385
3 0.122449 0.77027 0.240385
4 0.163265 0.77027 0.240385
5 0.183673 0.77027 0.240385
6 0.142857 0.77027 0.240385
7 0.061224 0.77027 0.240385
8 0.020408 0.77027 0.240385
9 0.000000 1.00000 1.000000
2025-11-04 01:50:09,226 | INFO | Confusion Matrix :
[[79 25]
 [17 57]]
>>> prediction.head()
automl_id prediction prob_1 prob_0 survived
0 64 1 1.0 0.0 1
1 84 1 1.0 0.0 1
2 108 0 0.0 1.0 1
3 116 1 1.0 0.0 1
4 132 1 1.0 0.0 1
5 148 1 1.0 0.0 1
6 13 0 0.0 1.0 0
7 25 0 0.0 1.0 0
8 29 0 0.0 1.0 0
9 33 0 0.0 1.0 0
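The scaled columns in the `fit()` and `predict()` logs are consistent with min-max (range) scaling whose bounds are learned from the training data: `pclass` values 1, 2 and 3 appear as 0.0, 0.5 and 1.0, and the `age` values match (age - 3) / 51, i.e. bounds of 3 and 54 (for example, age 14 -> 0.215686 in training). Because test rows reuse the training bounds, they can fall outside [0, 1]: age 66 appears as 1.235294 and age 0 as -0.058824. These bounds are inferred from the logged values, not reported by AutoML, and `fit_min_max` / `apply_min_max` are hypothetical helpers.

```python
def fit_min_max(values):
    """Hypothetical helper: learn (min, max) bounds from training values."""
    return min(values), max(values)


def apply_min_max(value, bounds):
    """Scale with training bounds; unseen values may land outside [0, 1]."""
    lo, hi = bounds
    return (value - lo) / (hi - lo)


pclass_bounds = fit_min_max([1, 2, 3])
print([apply_min_max(p, pclass_bounds) for p in (1, 2, 3)])  # [0.0, 0.5, 1.0]

# Assumed age bounds (3, 54), inferred from the logged scaled values.
age_bounds = (3, 54)
for age in (14, 66, 0):
    print(age, round(apply_min_max(age, age_bounds), 6))
# 14 0.215686
# 66 1.235294
# 0 -0.058824
```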
- Generate evaluation metrics on the test dataset using the best-performing model.
>>> performance_metrics = aml.evaluate(titanic_test)
2025-11-04 01:50:49,987 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 01:50:50,536 | INFO | Following model is being picked for evaluation:
2025-11-04 01:50:50,536 | INFO | Model ID : DECISIONFOREST_2
2025-11-04 01:50:50,536 | INFO | Feature Selection Method : rfe
2025-11-04 01:50:54,574 | INFO | Performance Metrics :
Prediction Mapping CLASS_1 CLASS_2 Precision Recall F1 Support
SeqNum
0 0 CLASS_1 79 17 0.822917 0.759615 0.790000 104
1 1 CLASS_2 25 57 0.695122 0.770270 0.730769 74
--------------------------------------------------------------------------------
SeqNum Metric MetricValue
0 3 Micro-Recall 0.764045
1 5 Macro-Precision 0.759019
2 6 Macro-Recall 0.764943
3 7 Macro-F1 0.760385
4 9 Weighted-Recall 0.764045
5 10 Weighted-F1 0.765376
6 8 Weighted-Precision 0.769789
7 4 Micro-F1 0.764045
8 2 Micro-Precision 0.764045
9 1 Accuracy 0.764045
>>> performance_metrics
Prediction Mapping CLASS_1 CLASS_2 Precision Recall F1 Support
SeqNum
0 0 CLASS_1 79 17 0.822917 0.759615 0.790000 104
1 1 CLASS_2 25 57 0.695122 0.770270 0.730769 74
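The evaluation numbers can be checked by hand from the confusion matrix logged by `predict()`, `[[79 25] [17 57]]`. Reading it with rows as the actual class (0, 1) and columns as the predicted class is the orientation that reproduces the reported per-class precision and recall. The plain-Python sketch below recomputes per-class precision, recall and F1, then accuracy and the macro and support-weighted F1 averages.

```python
# Confusion matrix from the predict() log: rows = actual class, columns = predicted class.
cm = [[79, 25],
      [17, 57]]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
support = [sum(cm[k]) for k in range(n_classes)]  # actual rows per class
predicted = [sum(cm[i][k] for i in range(n_classes)) for k in range(n_classes)]

precision = [cm[k][k] / predicted[k] for k in range(n_classes)]
recall = [cm[k][k] / support[k] for k in range(n_classes)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

accuracy = sum(cm[k][k] for k in range(n_classes)) / total
macro_f1 = sum(f1) / n_classes                                  # unweighted mean
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total   # weighted by support

print([round(p, 6) for p in precision])  # [0.822917, 0.695122]
print([round(r, 6) for r in recall])     # [0.759615, 0.77027]
print(round(accuracy, 6), round(macro_f1, 6), round(weighted_f1, 6))
# 0.764045 0.760385 0.765376
```

With two classes and every row assigned a single label, micro-averaged precision, recall and F1 all collapse to accuracy, which is why the log reports 0.764045 for all of them.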