This example predicts whether a passenger aboard the RMS Titanic survived, based on several factors.
Run AutoML to get the best-performing model using the following specifications:
- Add customization for specific processes in the AutoML run.
- Use only two models, 'xgboost' and 'decision_forest', for AutoML training.
- Set the early stopping timer to 100 seconds and max_models to 5.
- Use verbose level 2 to get a detailed log.
- Load the data and split it into train and test datasets.
- Load the example data and create a teradataml DataFrame.
>>> from teradataml import AutoML, DataFrame, load_example_data
>>> load_example_data("teradataml", "titanic")
>>> titanic = DataFrame.from_table("titanic")
- Perform sampling to get 80% for training and 20% for testing.
>>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
- Fetch train and test data.
>>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
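The split step can be sketched without a database connection. The snippet below is a minimal illustration in plain Python, not the teradataml implementation: the helper `split_by_sampleid` is hypothetical, and it assumes that `sample(frac=[0.8, 0.2])` tags each sampled row with `sampleid` 1 or 2 in roughly those proportions. That assumption is consistent with the 713 training rows and 178 test rows reported in the logs of this example.

```python
import random


def split_by_sampleid(rows, fractions=(0.8, 0.2), seed=42):
    """Shuffle rows, then assign consecutive slices to 1-based sample ids.

    Hypothetical helper: it mimics the idea of sample(frac=[0.8, 0.2])
    followed by filtering on sampleid. It is not how teradataml samples.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    groups = {}
    start = 0
    for sampleid, frac in enumerate(fractions, start=1):
        size = round(len(shuffled) * frac)
        groups[sampleid] = shuffled[start:start + size]
        start += size
    return groups


# 713 + 178 = 891 rows in total, matching the row counts in the logs.
passengers = list(range(1, 892))
groups = split_by_sampleid(passengers)
train, test = groups[1], groups[2]
print(len(train), len(test))  # prints: 713 178
```

The two groups are disjoint by construction, which is the property the `sampleid == 1` and `sampleid == 2` filters rely on.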
- Add customization and generate custom config JSON file.
>>> AutoML.generate_custom_config("custom_titanic")
Generating custom config JSON for AutoML ...

Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 1

Customizing Feature Engineering Phase ...

Available options for customization of feature engineering phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Missing Value Handling
Index 2: Customize Bincode Encoding
Index 3: Customize String Manipulation
Index 4: Customize Categorical Encoding
Index 5: Customize Mathematical Transformation
Index 6: Customize Nonlinear Transformation
Index 7: Customize Antiselect Features
Index 8: Back to main menu
Index 9: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in feature engineering phase: 1,2,4,6,7,8

Customizing Missing Value Handling ...
Provide the following details to customize missing value handling:
Available missing value handling methods with corresponding indices:
Index 1: Drop Columns
Index 2: Drop Rows
Index 3: Impute Missing values
Enter the list of indices for missing value handling methods : 1,3
Enter the feature or list of features for dropping columns with missing values: cabin
Available missing value imputation methods with corresponding indices:
Index 1: Statistical Imputation
Index 2: Literal Imputation
Enter the list of corresponding index missing value imputation methods you want to use: 1
Enter the feature or list of features for imputing missing values using statistic values: age
Available statistical methods with corresponding indices:
Index 1: min
Index 2: max
Index 3: mean
Index 4: median
Index 5: mode
Enter the index of corresponding statistic imputation method for feature age: 4
Available options for generic arguments:
Index 0: Default
Index 1: volatile
Index 2: persist
Enter the indices for generic arguments : 0
Customization of missing value handling has been completed successfully.

Customizing Bincode Encoding ...
Provide the following details to customize binning and coding encoding:
Available binning methods with corresponding indices:
Index 1: Equal-Width
Index 2: Variable-Width
Enter the feature or list of features for binning: pclass
Enter the index of corresponding binning method for feature pclass: 1
Enter the number of bins for feature pclass: 3
Available options for generic arguments:
Index 0: Default
Index 1: volatile
Index 2: persist
Enter the indices for generic arguments : 0
Customization of bincode encoding has been completed successfully.

Customizing Categorical Encoding ...
Provide the following details to customize categorical encoding:
Available categorical encoding methods with corresponding indices:
Index 1: OneHotEncoding
Index 2: OrdinalEncoding
Index 3: TargetEncoding
Enter the list of corresponding index categorical encoding methods you want to use: 2,3
Enter the feature or list of features for OrdinalEncoding: pclass
Enter the feature or list of features for TargetEncoding: embarked
Available target encoding methods with corresponding indices:
Index 1: CBM_BETA
Index 2: CBM_DIRICHLET
Index 3: CBM_GAUSSIAN_INVERSE_GAMMA
Enter the index of target encoding method for feature embarked: 3
Enter the response column for target encoding method for feature embarked: survived
Available options for generic arguments:
Index 0: Default
Index 1: volatile
Index 2: persist
Enter the indices for generic arguments : 0
Customization of categorical encoding has been completed successfully.

Customizing Nonlinear Transformation ...
Provide the following details to customize nonlinear transformation:
Enter number of non-linear combination you want to make: 1
Provide the details for non-linear combination 1:
Enter the list of target feature/s for non-linear combination 1: parch,sibsp
Enter the formula for non-linear combination 1: Y=(X0+X1+1)
Enter the resultant feature for non-linear combination 1: family_count
Available options for generic arguments:
Index 0: Default
Index 1: volatile
Index 2: persist
Enter the indices for generic arguments : 0
Customization of nonlinear transformation has been completed successfully.

Customizing Antiselect Features ...
Enter the feature or list of features for antiselect: passenger
Available options for generic arguments:
Index 0: Default
Index 1: volatile
Index 2: persist
Enter the indices for generic arguments : 0
Customization of antiselect features has been completed successfully.

Customization of feature engineering phase has been completed successfully.

Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 4

Generating custom json and exiting ...
Process of generating custom config file for AutoML has been completed successfully.
'custom_titanic.json' file is generated successfully under the current working directory.
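To make the result of the interactive session more concrete, the sketch below writes and reloads a config with the same structure that AutoML echoes in the fit log of this example ("Received below input for customization"), then applies the non-linear formula to one row. It is an illustration only: the temporary file stands in for `custom_titanic.json`, the on-disk layout of the real file is assumed to match the logged structure, only part of the config is reproduced, and `apply_formula` is a hypothetical helper rather than a teradataml function.

```python
import json
import os
import tempfile

# Subset of the structure echoed in the fit log (assumed to mirror the file).
custom_config = {
    "MissingValueHandlingIndicator": True,
    "MissingValueHandlingParam": {
        "DroppingColumnIndicator": True,
        "DroppingColumnList": ["cabin"],
        "ImputeMissingIndicator": True,
        "StatImputeList": ["age"],
        "StatImputeMethod": ["median"],
    },
    "NonLinearTransformationIndicator": True,
    "NonLinearTransformationParam": {
        "Combination_1": {
            "target_columns": ["parch", "sibsp"],
            "formula": "Y=(X0+X1+1)",
            "result_column": "family_count",
        }
    },
}


def apply_formula(formula, values):
    """Evaluate 'Y=<expr>' with X0, X1, ... bound to values (hypothetical helper)."""
    expression = formula.split("=", 1)[1]
    names = {f"X{i}": value for i, value in enumerate(values)}
    # eval sees only the X-variables; use this on trusted demo strings only.
    return eval(expression, {"__builtins__": {}}, names)


# Round-trip through a temporary JSON file standing in for custom_titanic.json.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "custom_titanic.json")
    with open(path, "w") as fh:
        json.dump(custom_config, fh, indent=4)
    with open(path) as fh:
        loaded = json.load(fh)

combo = loaded["NonLinearTransformationParam"]["Combination_1"]
row = {"parch": 2, "sibsp": 4}  # one passenger's values, as seen in the prediction log
family_count = apply_formula(combo["formula"], [row[c] for c in combo["target_columns"]])
print(combo["result_column"], "=", family_count)  # prints: family_count = 7
```

`X0` and `X1` follow the order of `target_columns`, so `parch` maps to `X0` and `sibsp` to `X1`.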
- Create an AutoML instance.
>>> aml = AutoML(task_type="Classification",
...              include=['decision_forest', 'xgboost'],
...              verbose=2,
...              max_runtime_secs=100,
...              max_models=5,
...              custom_config_file='custom_titanic.json')
- Fit the data.
>>> aml.fit(titanic_train, titanic_train.survived)
2025-11-04 01:58:05,914 | INFO | Received below input for customization :
{
    "MissingValueHandlingIndicator": true,
    "MissingValueHandlingParam": {
        "DroppingColumnIndicator": true,
        "DroppingColumnList": [
            "cabin"
        ],
        "ImputeMissingIndicator": true,
        "StatImputeList": [
            "age"
        ],
        "StatImputeMethod": [
            "median"
        ]
    },
    "BincodeIndicator": true,
    "BincodeParam": {
        "pclass": {
            "Type": "Equal-Width",
            "NumOfBins": 3
        }
    },
    "CategoricalEncodingIndicator": true,
    "CategoricalEncodingParam": {
        "OrdinalEncodingIndicator": true,
        "OrdinalEncodingList": [
            "pclass"
        ],
        "TargetEncodingIndicator": true,
        "TargetEncodingList": {
            "embarked": {
                "encoder_method": "CBM_GAUSSIAN_INVERSE_GAMMA",
                "response_column": "survived"
            }
        }
    },
    "NonLinearTransformationIndicator": true,
    "NonLinearTransformationParam": {
        "Combination_1": {
            "target_columns": [
                "parch",
                "sibsp"
            ],
            "formula": "Y=(X0+X1+1)",
            "result_column": "family_count"
        }
    },
    "AntiselectIndicator": true,
    "AntiselectParam": {
        "excluded_columns": [
            "passenger"
        ]
    }
}
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. 
Model Training & Evaluation 2025-11-04 01:58:05,915 | INFO | Feature Exploration started 2025-11-04 01:58:05,915 | INFO | Data Overview: 2025-11-04 01:58:05,956 | INFO | Total Rows in the data: 713 2025-11-04 01:58:05,998 | INFO | Total Columns in the data: 12 2025-11-04 01:58:06,608 | INFO | Column Summary: ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage 0 name VARCHAR(1000) CHARACTER SET LATIN 713 0 0.0 NaN NaN NaN 0.000000 100.000000 1 ticket VARCHAR(20) CHARACTER SET LATIN 713 0 0.0 NaN NaN NaN 0.000000 100.000000 2 passenger INTEGER 713 0 NaN 0.0 713.0 0.0 0.000000 100.000000 3 sibsp INTEGER 713 0 NaN 486.0 227.0 0.0 0.000000 100.000000 4 parch INTEGER 713 0 NaN 540.0 173.0 0.0 0.000000 100.000000 5 fare FLOAT 713 0 NaN 10.0 703.0 0.0 0.000000 100.000000 6 cabin VARCHAR(20) CHARACTER SET LATIN 154 559 0.0 NaN NaN NaN 78.401122 21.598878 7 embarked VARCHAR(20) CHARACTER SET LATIN 711 2 0.0 NaN NaN NaN 0.280505 99.719495 8 sex VARCHAR(20) CHARACTER SET LATIN 713 0 0.0 NaN NaN NaN 0.000000 100.000000 9 age INTEGER 573 140 NaN 5.0 568.0 0.0 19.635344 80.364656 10 pclass INTEGER 713 0 NaN 0.0 713.0 0.0 0.000000 100.000000 11 survived INTEGER 713 0 NaN 445.0 268.0 0.0 0.000000 100.000000 2025-11-04 01:58:07,380 | INFO | Statistics of Data: ATTRIBUTE StatName StatValue 0 survived MAXIMUM 1.000000 1 survived STANDARD DEVIATION 0.484688 2 survived PERCENTILES(25) 0.000000 3 survived PERCENTILES(50) 0.000000 4 fare COUNT 713.000000 5 fare MINIMUM 0.000000 6 fare MAXIMUM 512.329200 7 fare MEAN 32.204125 8 fare STANDARD DEVIATION 51.384597 9 fare PERCENTILES(25) 7.925000 2025-11-04 01:58:07,548 | INFO | Categorical Columns with their Distinct values: ColumnName DistinctValueCount name 713 sex 2 ticket 565 cabin 124 embarked 3 2025-11-04 01:58:10,125 | INFO | Futile columns in dataset: ColumnName 0 name 1 ticket 2025-11-04 01:58:13,554 | INFO | Columns with outlier percentage :- ColumnName 
OutlierPercentage 0 fare 12.762973 1 parch 24.263675 2 sibsp 5.329593 3 age 20.476858 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:58:13,902 | INFO | Feature Engineering started ... 2025-11-04 01:58:13,902 | INFO | Handling duplicate records present in dataset ... 2025-11-04 01:58:14,041 | INFO | Analysis completed. No action taken. 2025-11-04 01:58:14,041 | INFO | Total time to handle duplicate records: 0.14 sec 2025-11-04 01:58:14,041 | INFO | Starting customized anti-select columns ... 2025-11-04 01:58:14,629 | INFO | Updated dataset sample after performing anti-select columns: survived pclass name sex age sibsp parch ticket fare cabin embarked 0 0 3 Dantcheff, Mr. Ristiu male 25.0 0 0 349203 7.8958 None S 1 1 2 Watt, Mrs. James (Elizabeth "Bessie" Inglis Milne) female 40.0 0 0 C.A. 33595 15.7500 None S 2 1 1 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0 27042 30.0000 A23 S 3 0 2 Hocking, Mr. Richard George male 23.0 2 1 29104 11.5000 None S 4 1 3 Jonsson, Mr. Carl male 32.0 0 0 350417 7.8542 None S 5 0 3 Moore, Mr. Leonard Charles male NaN 0 0 A4. 54510 8.0500 None S 6 0 3 Rintamaki, Mr. Matti male 35.0 0 0 STON/O 2. 3101273 7.1250 None S 7 0 3 Goodwin, Master. Sidney Leonard male 1.0 5 2 CA 2144 46.9000 None S 8 0 3 Williams, Mr. Howard Hugh "Harry" male NaN 0 0 A/5 2466 8.0500 None S 9 1 3 Nicola-Yarred, Miss. Jamila female 14.0 1 0 2651 11.2417 None C 713 rows X 11 columns 2025-11-04 01:58:14,964 | INFO | Handling less significant features from data ... 
2025-11-04 01:58:19,867 | INFO | Removing Futile columns: ['ticket', 'name'] 2025-11-04 01:58:19,867 | INFO | Sample of Data after removing Futile columns: survived pclass sex age sibsp parch fare cabin embarked automl_id 0 1 1 male 80.0 0 0 30.0000 A23 S 14 1 1 1 female 36.0 0 0 135.6333 C32 C 8 2 0 3 male 25.0 0 0 7.8958 None S 12 3 0 3 male NaN 0 0 8.0500 None S 7 4 0 3 male 1.0 5 2 46.9000 None S 15 5 0 2 male 23.0 2 1 11.5000 None S 5 6 0 3 male NaN 0 0 8.0500 None S 9 7 1 3 male 32.0 0 0 7.8542 None S 13 8 0 3 male 35.0 0 0 7.1250 None S 11 9 0 3 male NaN 0 0 7.7250 None Q 4 713 rows X 10 columns 2025-11-04 01:58:20,323 | INFO | Total time to handle less significant features: 5.36 sec 2025-11-04 01:58:20,323 | INFO | Handling Date Features ... 2025-11-04 01:58:20,323 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed. 2025-11-04 01:58:20,323 | INFO | Total time to handle date features: 0.00 sec 2025-11-04 01:58:20,324 | INFO | Dropping these columns for handling customized missing value: ['cabin'] 2025-11-04 01:58:23,129 | INFO | Updated dataset sample after performing customized missing value imputation: pclass sex age sibsp parch fare embarked automl_id survived 1 2 female 19 1 0 26.0000 S 88 1 1 female 33 1 0 53.1000 S 116 1 1 female 35 0 0 512.3292 C 120 1 1 female 44 0 1 57.9792 C 124 1 1 female 28 1 0 89.1042 C 136 1 1 male 48 1 0 76.7292 C 140 0 3 male 1 5 2 46.9000 S 15 0 1 male 52 1 1 79.6500 S 35 0 3 male 28 0 0 7.8958 S 47 0 3 male 19 0 0 0.0000 S 59 713 rows X 9 columns 2025-11-04 01:58:23,242 | INFO | Proceeding with default option for handling remaining missing values. 2025-11-04 01:58:23,242 | INFO | Checking Missing values in dataset ... 
2025-11-04 01:58:24,028 | INFO | Columns with their missing values: embarked: 2 2025-11-04 01:58:24,432 | INFO | Deleting rows of these columns for handling missing values: ['embarked'] 2025-11-04 01:58:24,516 | INFO | Sample of dataset after removing 2 rows: pclass sex age sibsp parch fare embarked automl_id survived 1 2 female 19 1 0 26.0000 S 88 1 1 female 33 1 0 53.1000 S 116 1 1 female 35 0 0 512.3292 C 120 1 1 female 44 0 1 57.9792 C 124 1 1 female 28 1 0 89.1042 C 136 1 1 male 48 1 0 76.7292 C 140 0 3 male 1 5 2 46.9000 S 15 0 1 male 52 1 1 79.6500 S 35 0 3 male 28 0 0 7.8958 S 47 0 3 male 19 0 0 0.0000 S 59 711 rows X 9 columns 2025-11-04 01:58:24,606 | INFO | Total time to find missing values in data: 1.36 sec 2025-11-04 01:58:24,606 | INFO | Imputing Missing Values ... 2025-11-04 01:58:24,606 | INFO | Analysis completed. No imputation required. 2025-11-04 01:58:24,606 | INFO | Time taken to perform imputation: 0.00 sec 2025-11-04 01:58:25,987 | INFO | Updated dataset sample after performing Equal-Width binning :- age parch fare embarked automl_id sex sibsp pclass survived 0 18 1 20.2125 S 75 male 1 pclass_3 0 71 0 34.6542 C 91 male 0 pclass_1 0 24 0 24.1500 S 95 male 2 pclass_3 0 29 0 27.7208 C 99 male 1 pclass_2 0 26 0 8.0500 S 115 male 0 pclass_3 0 26 0 7.8958 S 119 male 0 pclass_3 1 19 0 26.0000 S 88 female 1 pclass_2 1 33 0 53.1000 S 116 female 1 pclass_1 1 35 0 512.3292 C 120 female 0 pclass_1 1 44 1 57.9792 C 124 female 0 pclass_1 711 rows X 9 columns 2025-11-04 01:58:26,099 | INFO | No information provided for Variable-Width Transformation. 2025-11-04 01:58:26,099 | INFO | Skipping customized string manipulation. 2025-11-04 01:58:26,099 | INFO | Starting Customized Categorical Feature Encoding ... 
2025-11-04 01:58:28,132 | INFO | Updated dataset sample after performing ordinal encoding: age parch fare embarked automl_id sex sibsp pclass survived 1 48 0 76.7292 C 140 male 1 0 1 19 0 30.0000 S 184 female 0 0 1 36 2 71.0000 S 192 female 0 0 1 27 0 13.8583 C 200 female 1 1 1 14 2 120.0000 S 248 female 1 0 1 18 0 227.5250 C 252 female 1 0 0 18 1 20.2125 S 75 male 1 2 0 71 0 34.6542 C 91 male 0 0 0 24 0 24.1500 S 95 male 2 2 0 29 0 27.7208 C 99 male 1 1 711 rows X 9 columns 2025-11-04 01:58:31,099 | INFO | Updated dataset sample after performing target encoding: survived age parch fare pclass automl_id sex sibsp embarked 0.533835 1 17 0 108.9000 0 340 female 1 0.533835 1 23 1 63.3583 0 452 male 0 0.533835 1 29 0 7.8958 2 604 male 0 0.533835 1 49 0 89.1042 0 624 male 1 0.533835 1 40 1 134.5000 0 105 female 1 0.533835 1 26 0 30.0000 0 125 male 0 0.533835 1 28 0 24.0000 1 37 female 1 0.533835 1 60 0 75.2500 0 368 female 1 0.533835 1 18 0 227.5250 0 252 female 1 0.533835 1 48 0 76.7292 0 140 male 1 711 rows X 9 columns 2025-11-04 01:58:31,215 | INFO | Performing encoding for categorical columns ... 2025-11-04 01:58:33,242 | INFO | ONE HOT Encoding these Columns: ['sex'] 2025-11-04 01:58:33,242 | INFO | Sample of dataset after performing one hot encoding: survived age parch fare pclass automl_id sex_0 sex_1 sibsp embarked 0.4 1 28 0 7.7500 2 653 1 0 0 0.4 1 28 0 15.5000 2 346 1 0 1 0.4 1 28 0 7.7500 2 546 1 0 0 0.4 1 28 0 7.7333 2 650 1 0 0 0.4 1 19 0 7.8792 2 295 1 0 0 0.4 1 28 0 23.2500 2 419 0 1 2 0.4 1 16 0 7.7500 2 279 1 0 0 0.4 1 28 0 7.8792 2 106 1 0 0 0.4 1 28 0 7.7500 2 565 1 0 0 0.4 1 28 0 7.7875 2 329 1 0 0 711 rows X 10 columns 2025-11-04 01:58:33,372 | INFO | Time taken to encode the columns: 2.16 sec 2025-11-04 01:58:33,372 | INFO | Starting customized mathematical transformation ... 2025-11-04 01:58:33,372 | INFO | Skipping customized mathematical transformation. 2025-11-04 01:58:33,372 | INFO | Starting customized non-linear transformation ... 
2025-11-04 01:58:33,372 | INFO | Possible combination : ['Combination_1'] 2025-11-04 01:58:35,367 | INFO | Updated dataset sample after performing non-liner transformation: survived age parch fare pclass automl_id sex_0 sex_1 sibsp family_count embarked 0.327519 1 28 0.0 16.1000 2 352 1 0 1.0 2.0 0.327519 1 35 0.0 52.0000 0 440 1 0 1.0 2.0 0.327519 1 28 0.0 35.5000 0 476 0 1 0.0 1.0 0.327519 1 39 0.0 7.9250 2 500 0 1 0.0 1.0 0.327519 1 35 0.0 26.2875 0 568 0 1 0.0 1.0 0.327519 1 41 1.0 19.5000 1 588 1 0 0.0 2.0 0.400000 1 28 0.0 7.7500 2 653 1 0 0.0 1.0 0.400000 1 28 0.0 15.5000 2 346 1 0 1.0 2.0 0.400000 1 28 0.0 7.7500 2 546 1 0 0.0 1.0 0.400000 1 28 0.0 7.7333 2 650 1 0 0.0 1.0 711 rows X 11 columns 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:58:35,481 | INFO | Data preparation started ... 2025-11-04 01:58:35,481 | INFO | No information provided for performing customized feature scaling. Proceeding with default option. 2025-11-04 01:58:35,481 | INFO | No information provided for performing customized imbalanced dataset sampling. AutoML will Proceed with default option. 2025-11-04 01:58:35,481 | INFO | Starting customized outlier processing ... 2025-11-04 01:58:35,481 | INFO | No information provided for customized outlier processing. AutoML will proceed with default settings. 2025-11-04 01:58:35,481 | INFO | Outlier preprocessing ... 
2025-11-04 01:58:38,454 | INFO | Columns with outlier percentage :- ColumnName OutlierPercentage 0 fare 12.517581 1 embarked 18.565401 2 family_count 10.829817 3 sibsp 5.344585 4 parch 24.331927 5 age 7.172996 2025-11-04 01:58:38,844 | INFO | Deleting rows of these columns: ['age', 'sibsp'] 2025-11-04 01:58:41,003 | INFO | Sample of dataset after removing outlier rows: survived age parch fare pclass automl_id sex_0 sex_1 sibsp family_count embarked 0.327519 1 28 0.0 35.5000 0 476 0 1 0.0 1.0 0.327519 1 35 0.0 26.2875 0 568 0 1 0.0 1.0 0.327519 1 41 1.0 19.5000 1 588 1 0 0.0 2.0 0.327519 1 24 2.0 16.7000 2 600 1 0 0.0 3.0 0.327519 1 24 0.0 7.1417 2 620 0 1 0.0 1.0 0.327519 1 24 2.0 65.0000 1 652 1 0 1.0 4.0 0.327519 1 39 0.0 55.9000 0 612 1 0 1.0 2.0 0.327519 1 39 0.0 7.9250 2 500 0 1 0.0 1.0 0.327519 1 35 0.0 52.0000 0 440 1 0 1.0 2.0 0.327519 1 28 0.0 16.1000 2 352 1 0 1.0 2.0 629 rows X 11 columns 2025-11-04 01:58:41,138 | INFO | median inplace of outliers: ['embarked', 'parch', 'family_count', 'fare'] 2025-11-04 01:58:43,191 | INFO | Sample of dataset after performing MEDIAN inplace: survived age parch fare pclass automl_id sex_0 sex_1 sibsp family_count embarked 0.327519 1 26 0.0 18.7875 2 625 0 1 0.0 1.0 0.327519 1 13 0.0 7.2292 2 258 1 0 0.0 1.0 0.327519 1 30 0.0 56.9292 0 294 1 0 0.0 1.0 0.327519 1 39 0.0 13.0000 0 382 1 0 1.0 3.0 0.327519 1 41 0.0 13.0000 0 522 1 0 0.0 1.0 0.327519 1 22 0.0 49.5000 0 550 1 0 0.0 3.0 0.400000 1 28 0.0 23.2500 2 419 0 1 2.0 3.0 0.400000 1 29 0.0 7.7500 2 724 0 1 0.0 1.0 0.400000 1 28 0.0 7.7500 2 148 0 1 0.0 1.0 0.400000 1 28 0.0 15.5000 2 48 1 0 1.0 2.0 629 rows X 11 columns 2025-11-04 01:58:43,306 | INFO | Time Taken by Outlier processing: 7.82 sec 2025-11-04 01:58:43,306 | INFO | Checking imbalance data ... 2025-11-04 01:58:43,369 | INFO | Imbalance Not Found. 2025-11-04 01:58:44,144 | INFO | Feature selection using rfe ... 
2025-11-04 01:58:59,868 | INFO | feature selected by RFE: ['age', 'sex_1', 'pclass', 'sex_0', 'fare', 'family_count'] 2025-11-04 01:58:59,870 | INFO | Total time taken by feature selection: 15.73 sec 2025-11-04 01:59:00,180 | INFO | Scaling Features of rfe data ... 2025-11-04 01:59:01,334 | INFO | columns that will be scaled: ['r_age', 'r_pclass', 'r_fare', 'r_family_count'] 2025-11-04 01:59:03,143 | INFO | Dataset sample after scaling: survived r_sex_0 r_sex_1 automl_id r_age r_pclass r_fare r_family_count 0 1 1 0 6 0.215686 1.0 0.197223 0.5 1 1 1 0 8 0.647059 0.0 0.228070 0.0 2 0 0 1 9 0.490196 1.0 0.141228 0.0 3 1 1 0 10 0.725490 0.5 0.276316 0.0 4 0 0 1 12 0.431373 1.0 0.138523 0.0 5 1 0 1 13 0.568627 1.0 0.137793 0.0 6 0 0 1 11 0.627451 1.0 0.125000 0.0 7 0 0 1 7 0.490196 1.0 0.141228 0.0 8 0 0 1 5 0.392157 0.5 0.201754 0.0 9 0 0 1 4 0.490196 1.0 0.135526 0.0 629 rows X 8 columns 2025-11-04 01:59:03,693 | INFO | Total time taken by feature scaling: 3.51 sec 2025-11-04 01:59:03,693 | INFO | Scaling Features of pca data ... 2025-11-04 01:59:04,612 | INFO | columns that will be scaled: ['embarked', 'age', 'fare', 'pclass', 'sibsp', 'family_count'] 2025-11-04 01:59:06,517 | INFO | Dataset sample after scaling: survived parch sex_1 sex_0 automl_id embarked age fare pclass sibsp family_count 0 1 0.0 1 0 148 1.0 0.490196 0.135965 1.0 0.0 0.0 1 0 0.0 1 0 255 1.0 0.490196 0.271930 1.0 0.5 0.5 2 0 0.0 1 0 343 1.0 0.490196 0.135965 1.0 0.0 0.0 3 0 0.0 1 0 355 1.0 0.313725 0.118421 1.0 0.0 0.0 4 0 0.0 0 1 98 1.0 0.490196 0.133846 1.0 0.0 0.0 5 0 0.0 0 1 190 1.0 0.705882 0.510965 1.0 0.0 0.0 6 1 0.0 0 1 361 0.0 0.960784 0.228070 0.0 0.5 0.5 7 1 0.0 0 1 545 0.0 0.490196 0.228070 0.0 0.5 0.5 8 1 0.0 1 0 625 0.0 0.450980 0.329605 1.0 0.0 0.0 9 1 0.0 0 1 6 0.0 0.215686 0.197223 1.0 0.5 0.5 629 rows X 11 columns 2025-11-04 01:59:07,065 | INFO | Total time taken by feature scaling: 3.37 sec 2025-11-04 01:59:07,065 | INFO | Dimension Reduction using pca ... 
2025-11-04 01:59:07,692 | INFO | PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5'] 2025-11-04 01:59:07,692 | INFO | Total time taken by PCA: 0.63 sec 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:59:08,078 | INFO | Model Training started ... 2025-11-04 01:59:08,141 | INFO | Starting customized hyperparameter update ... 2025-11-04 01:59:08,141 | INFO | Skipping customized hyperparameter tuning 2025-11-04 01:59:08,144 | INFO | Hyperparameters used for model training: 2025-11-04 01:59:08,144 | INFO | Model: decision_forest 2025-11-04 01:59:08,144 | INFO | Hyperparameters: {'response_column': 'survived', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'num_trees': (-1,), 'seed': 42} 2025-11-04 01:59:08,145 | INFO | Total number of models for decision_forest: 36 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2025-11-04 01:59:08,145 | INFO | Model: xgboost 2025-11-04 01:59:08,145 | INFO | Hyperparameters: {'response_column': 'survived', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42} 2025-11-04 01:59:08,146 | INFO | Total number of models for xgboost: 5832 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2025-11-04 01:59:08,146 | INFO | Performing hyperparameter tuning ... 
2025-11-04 01:59:09,386 | INFO | Model training for decision_forest
2025-11-04 01:59:20,913 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:59:20,913 | INFO | Model training for xgboost
2025-11-04 01:59:27,012 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:59:27,015 | INFO | Leaderboard
   RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  DECISIONFOREST_0                rfe  0.833333         0.833333  ...      0.822820  0.824011            0.832799         0.833333     0.833012
1     2  DECISIONFOREST_1                rfe  0.809524         0.809524  ...      0.799629  0.799629            0.809524         0.809524     0.809524
2     3  DECISIONFOREST_3                pca  0.809524         0.809524  ...      0.769944  0.783070            0.821440         0.809524     0.799904
3     4  DECISIONFOREST_2                pca  0.785714         0.785714  ...      0.754174  0.763009            0.786184         0.785714     0.779310
4     5         XGBOOST_1                pca  0.761905         0.761905  ...      0.764378  0.755751            0.772947         0.761905     0.764366
[5 rows x 13 columns]
5 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 18/18
- Display model leaderboard.
>>> aml.leaderboard()
   RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  DECISIONFOREST_0                rfe  0.833333         0.833333  ...      0.822820  0.824011            0.832799         0.833333     0.833012
1     2  DECISIONFOREST_1                rfe  0.809524         0.809524  ...      0.799629  0.799629            0.809524         0.809524     0.809524
2     3  DECISIONFOREST_3                pca  0.809524         0.809524  ...      0.769944  0.783070            0.821440         0.809524     0.799904
3     4  DECISIONFOREST_2                pca  0.785714         0.785714  ...      0.754174  0.763009            0.786184         0.785714     0.779310
4     5         XGBOOST_1                pca  0.761905         0.761905  ...      0.764378  0.755751            0.772947         0.761905     0.764366
[5 rows x 13 columns]
- Display the best performing model.
>>> aml.leader()
   RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  DECISIONFOREST_0                rfe  0.833333         0.833333  ...       0.82282  0.824011            0.832799         0.833333     0.833012
[1 rows x 13 columns]
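The relationship between the two outputs can be illustrated with plain Python: in this run, the leader is the rank 1 row of the leaderboard. The sketch below copies four columns from the leaderboard output and picks the row with the smallest rank. It also checks that, for this run, ordering the rows by accuracy reproduces the ranks. That AutoML ranks on accuracy in general is an assumption here, not something the output confirms.

```python
# Four columns copied from the leaderboard output of this example.
leaderboard = [
    {"RANK": 1, "MODEL_ID": "DECISIONFOREST_0", "FEATURE_SELECTION": "rfe", "ACCURACY": 0.833333},
    {"RANK": 2, "MODEL_ID": "DECISIONFOREST_1", "FEATURE_SELECTION": "rfe", "ACCURACY": 0.809524},
    {"RANK": 3, "MODEL_ID": "DECISIONFOREST_3", "FEATURE_SELECTION": "pca", "ACCURACY": 0.809524},
    {"RANK": 4, "MODEL_ID": "DECISIONFOREST_2", "FEATURE_SELECTION": "pca", "ACCURACY": 0.785714},
    {"RANK": 5, "MODEL_ID": "XGBOOST_1", "FEATURE_SELECTION": "pca", "ACCURACY": 0.761905},
]

# The leader corresponds to the row with the smallest rank.
leader = min(leaderboard, key=lambda row: row["RANK"])

# sorted() is stable, so the tie between ranks 2 and 3 keeps its original order.
by_accuracy = sorted(leaderboard, key=lambda row: row["ACCURACY"], reverse=True)
ranks_match = [row["RANK"] for row in by_accuracy] == [1, 2, 3, 4, 5]

print(leader["MODEL_ID"], leader["FEATURE_SELECTION"], ranks_match)
```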
- Display hyperparameters for trained model.
- Display model hyperparameters for rank 1.
>>> aml.model_hyperparameters(rank=1)
{'response_column': 'survived', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': 0.1, 'max_depth': 6, 'min_node_size': 3, 'num_trees': -1, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0'], 'max_models': 2}
- Display model hyperparameters for rank 4.
>>> aml.model_hyperparameters(rank=4)
{'response_column': 'survived', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': 0.1, 'max_depth': 8, 'min_node_size': 2, 'num_trees': -1, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0'], 'max_models': 2}
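The model counts in the training log follow from the hyperparameter search space: each tuple lists candidate values, and the number of candidate models is the product of the tuple lengths. The sketch below recomputes the two totals reported in the log (36 for decision_forest, 5832 for xgboost) from the logged search spaces. `grid_size` is a hypothetical helper, and reading the search space as an exhaustive grid is an assumption, although it reproduces both logged totals.

```python
from math import prod

# Tunable entries copied from the "Hyperparameters used for model training" log.
decision_forest_space = {
    "min_impurity": (0.0, 0.1, 0.2),
    "max_depth": (5, 6, 8, 10),
    "min_node_size": (1, 2, 3),
    "num_trees": (-1,),
}
xgboost_space = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1, 0.2),
    "lambda1": (1.0, 0.01, 0.1),
    "shrinkage_factor": (0.5, 0.1, 0.3),
    "max_depth": (5, 6, 8, 10),
    "min_node_size": (1, 2, 3),
    "iter_num": (10, 20, 30),
    "num_boosted_trees": (-1, 5, 10),
}


def grid_size(space):
    """Number of combinations in a full grid over the candidate values."""
    return prod(len(candidates) for candidates in space.values())


# Tunable values reported by model_hyperparameters(rank=1).
rank_1 = {"min_impurity": 0.1, "max_depth": 6, "min_node_size": 3, "num_trees": -1}
rank_1_in_space = all(rank_1[name] in decision_forest_space[name] for name in rank_1)

print(grid_size(decision_forest_space))  # prints: 36
print(grid_size(xgboost_space))          # prints: 5832
print(rank_1_in_space)                   # prints: True
```

Each dictionary returned by `model_hyperparameters` therefore holds one value per tuple, together with fixed settings such as `seed` and `response_column`.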
- Generate predictions on the test dataset using the best-performing model.
>>> prediction = aml.predict(titanic_test)
2025-11-04 02:03:17,753 | INFO | Data Transformation started ... 2025-11-04 02:03:17,754 | INFO | Performing transformation carried out in feature engineering phase ... 2025-11-04 02:03:18,354 | INFO | Updated dataset after dropping futile columns : passenger survived pclass sex age sibsp parch fare cabin embarked automl_id 0 137 1 1 female 19.0 0 2 26.2833 D47 S 15 1 814 0 3 female 6.0 4 2 31.2750 None S 8 2 812 0 3 male 39.0 0 0 24.1500 None S 12 3 734 0 2 male 23.0 0 0 13.0000 None S 6 4 793 0 3 female NaN 8 2 69.5500 None S 14 5 265 0 3 female NaN 0 0 7.7500 None Q 5 6 244 0 3 male 22.0 0 0 7.1250 None S 9 7 101 0 3 female 28.0 0 0 7.8958 None S 13 8 345 0 2 male 36.0 0 0 13.0000 None S 10 9 61 0 3 male 22.0 0 0 7.2292 None C 4 178 rows X 11 columns 2025-11-04 02:03:18,648 | INFO | Updated dataset after performing target column transformation : passenger survived pclass sex age sibsp parch fare cabin embarked automl_id 0 793 0 3 female NaN 8 2 69.5500 None S 14 1 730 0 3 female 25.0 1 0 7.9250 None S 11 2 137 1 1 female 19.0 0 2 26.2833 D47 S 15 3 265 0 3 female NaN 0 0 7.7500 None Q 5 4 101 0 3 female 28.0 0 0 7.8958 None S 13 5 61 0 3 male 22.0 0 0 7.2292 None C 4 6 814 0 3 female 6.0 4 2 31.2750 None S 8 7 812 0 3 male 39.0 0 0 24.1500 None S 12 8 244 0 3 male 22.0 0 0 7.1250 None S 9 9 19 0 3 female 31.0 1 0 18.0000 None S 7 178 rows X 11 columns 2025-11-04 02:03:18,930 | INFO | Updated dataset after dropping customized missing value containing columns : passenger survived pclass sex age sibsp parch fare embarked automl_id 0 137 1 1 female 19.0 0 2 26.2833 S 15 1 345 0 2 male 36.0 0 0 13.0000 S 10 2 793 0 3 female NaN 8 2 69.5500 S 14 3 265 0 3 female NaN 0 0 7.7500 Q 5 4 101 0 3 female 28.0 0 0 7.8958 S 13 5 61 0 3 male 22.0 0 0 7.2292 C 4 6 814 0 3 female 6.0 4 2 31.2750 S 8 7 812 0 3 male 39.0 0 0 24.1500 S 12 8 244 0 3 male 22.0 0 0 7.1250 S 9 9 734 0 2 male 23.0 0 0 13.0000 S 6 178 rows X 10 columns 2025-11-04 02:03:19,892 | INFO | Updated dataset 
after imputing customized missing value containing columns :
   passenger  survived  pclass     sex  age  sibsp  parch     fare  embarked  automl_id
0        101         0       3  female   28      0      0   7.8958         S         13
1        345         0       2    male   36      0      0  13.0000         S         10
2        793         0       3  female   28      8      2  69.5500         S         14
3         19         0       3  female   31      1      0  18.0000         S          7
4        137         1       1  female   19      0      2  26.2833         S         15
5         61         0       3    male   22      0      0   7.2292         C          4
6        814         0       3  female    6      4      2  31.2750         S          8
7        812         0       3    male   39      0      0  24.1500         S         12
8        730         0       3  female   25      1      0   7.9250         S         11
9        734         0       2    male   23      0      0  13.0000         S          6
178 rows X 10 columns
2025-11-04 02:03:21,745 | INFO | Updated dataset after performing customized equal width bin-code transformation :
          passenger  age  parch      fare  embarked  automl_id     sex  sibsp    pclass
survived
1                89   23      2  263.0000         S         61  female      3  pclass_1
1               371   25      0   55.4417         C         77    male      1  pclass_1
1               752    6      1   12.4750         S         93    male      0  pclass_3
1               872   47      1   52.5542         S        101  female      1  pclass_1
1               805   27      0    6.9750         S        157    male      0  pclass_3
1               517   34      0   10.5000         S        165  female      0  pclass_2
0               101   28      0    7.8958         S         13  female      0  pclass_3
0               404   28      0   15.8500         S         25    male      1  pclass_3
0               873   33      0    5.0000         S         29    male      0  pclass_1
0                34   66      0   10.5000         S         33    male      0  pclass_2
178 rows X 10 columns
2025-11-04 02:03:23,335 | INFO | Updated dataset after performing customized categorical encoding :
          survived  passenger  age  parch      fare  pclass  automl_id     sex  sibsp
embarked
0.533835         0        378   27      2  211.5000       0        136    male      0
0.533835         0        558   28      0  227.5250       0         66    male      0
0.533835         0        525   28      0    7.2292       2         98    male      0
0.533835         0        790   46      0   79.2000       0        102    male      0
0.533835         0        621   27      0   14.4542       2         35    male      1
0.533835         0        178   50      0   28.7125       0        109  female      0
0.400000         0        422   21      0    7.7333       2        113    male      0
0.400000         0        791   28      0    7.7500       2         26    male      0
0.400000         0        891   32      0    7.7500       2        106    male      0
0.400000         0        768   30      0    7.7500       2         23  female      0
178 rows X 10 columns
2025-11-04 02:03:24,331 | INFO | Updated dataset after performing categorical encoding :
          survived  passenger  age  parch      fare  pclass  automl_id  sex_0  sex_1  sibsp
embarked
0.400000         1         29   28      0    7.8792       2        127      1      0      0
0.400000         1        413   33      0   90.0000       0         48      1      0      1
0.400000         0        526   40      0    7.7500       2         16      0      1      0
0.400000         0        779   28      0    7.7375       2         68      0      1      0
0.400000         0        655   18      0    6.7500       2         81      1      0      0
0.400000         1        157   16      0    7.7333       2        115      1      0      0
0.533835         0        378   27      2  211.5000       0        136      0      1      0
0.533835         0        558   28      0  227.5250       0         66      0      1      0
0.533835         0        525   28      0    7.2292       2         98      0      1      0
0.533835         0        790   46      0   79.2000       0        102      0      1      0
178 rows X 11 columns
2025-11-04 02:03:25,179 | INFO | Updated dataset after performing customized non-linear transformation :
          survived  passenger  age  parch     fare  pclass  automl_id  sex_0  sex_1  sibsp  family_count
embarked
0.327519         0        656   24    0.0  73.5000       1        177      0      1    2.0           3.0
0.327519         0        564   28    0.0   8.0500       2         28      0      1    0.0           1.0
0.327519         0        522   22    0.0   7.8958       2         32      0      1    0.0           1.0
0.327519         0        739   28    0.0   7.8958       2         56      0      1    0.0           1.0
0.327519         0        651   28    0.0   7.8958       2         92      0      1    0.0           1.0
0.327519         0        180   36    0.0   0.0000       2         96      0      1    0.0           1.0
0.327519         0        611   39    5.0  31.2750       2         88      1      0    1.0           7.0
0.327519         0        814    6    2.0  31.2750       2          8      1      0    4.0           7.0
0.327519         0        641   20    0.0   7.8542       2        161      0      1    0.0           1.0
0.327519         0        214   30    0.0  13.0000       1        145      0      1    0.0           1.0
178 rows X 12 columns
2025-11-04 02:03:25,825 | INFO | Updated dataset after performing customized anti-selection :
   embarked  survived  age  parch     fare  pclass  automl_id  sex_0  sex_1  sibsp  family_count
0  0.533835         0   50    0.0  28.7125       0        109      1      0    0.0           1.0
1  0.533835         1   56    1.0  83.1583       0         72      1      0    0.0           2.0
2  0.533835         1   32    0.0  30.5000       0        176      0      1    0.0           1.0
3  0.533835         1   15    0.0  14.4542       2         38      1      0    1.0           2.0
4  0.533835         1   25    0.0  91.0792       0         86      0      1    1.0           2.0
5  0.533835         1    1    2.0  37.0042       1        114      0      1    0.0           3.0
6  0.400000         1   28    0.0   7.8792       2        127      1      0    0.0           1.0
7  0.400000         1   33    0.0  90.0000       0         48      1      0    1.0           2.0
8  0.400000         0   40    0.0   7.7500       2         16      0      1    0.0           1.0
9  0.400000         0   28    0.0   7.7375       2         68      0      1    0.0           1.0
178 rows X 11 columns
2025-11-04 02:03:26,057 | INFO | Performing transformation carried out in data preparation phase ...
2025-11-04 02:03:26,803 | INFO | Updated dataset after performing RFE feature selection:
          automl_id  age  sex_1  pclass  sex_0     fare  family_count
survived
1               169   17      0       0      1  57.0000           2.0
1                36   63      0       0      1  77.9583           2.0
1                64   45      1       0      0  26.5500           1.0
1                76   26      0       2      1   7.9250           1.0
1               108   32      1       2      0  56.4958           1.0
1               116   30      0       0      1  86.5000           1.0
0               177   24      1       1      0  73.5000           3.0
0                28   28      1       2      0   8.0500           1.0
0                32   22      1       2      0   7.8958           1.0
0                56   28      1       2      0   7.8958           1.0
178 rows X 8 columns
2025-11-04 02:03:27,562 | INFO | Updated dataset after performing scaling on RFE selected features :
   survived  r_sex_0  r_sex_1  automl_id     r_age  r_pclass    r_fare  r_family_count
0         0        0        1         32  0.372549       1.0  0.138523             0.0
1         0        0        1         92  0.490196       1.0  0.138523             0.0
2         0        0        1         96  0.647059       1.0  0.000000             0.0
3         0        0        1        104  0.490196       1.0  0.141228             0.0
4         0        0        1        144  0.411765       1.0  0.282456             0.5
5         0        1        0        152  0.490196       1.0  1.220175             5.0
6         1        1        0        101  0.862745       0.0  0.922004             1.0
7         1        1        0        165  0.607843       0.5  0.184211             0.0
8         1        1        0        169  0.274510       0.0  1.000000             0.5
9         1        0        1        181  0.647059       0.0  0.461184             0.0
178 rows X 8 columns
2025-11-04 02:03:28,740 | INFO | Updated dataset after performing scaling for PCA feature selection :
   survived  parch  sex_1  sex_0  automl_id   embarked       age      fare  pclass  sibsp  family_count
0         1    0.0      0      1        169  -0.000267  0.274510  1.000000     0.0    0.5           0.5
1         1    0.0      0      1         36  -0.000267  1.176471  1.367689     0.0    0.5           0.5
2         1    0.0      1      0         64  -0.000267  0.823529  0.465789     0.0    0.0           0.0
3         1    0.0      0      1         76  -0.000267  0.450980  0.139035     1.0    0.0           0.0
4         1    0.0      1      0        108  -0.000267  0.568627  0.991154     1.0    0.0           0.0
5         1    0.0      0      1        116  -0.000267  0.529412  1.517544     0.0    0.0           0.0
6         0    0.0      1      0        177  -0.000267  0.411765  1.289474     0.5    1.0           1.0
7         0    0.0      1      0         28  -0.000267  0.490196  0.141228     1.0    0.0           0.0
8         0    0.0      1      0         32  -0.000267  0.372549  0.138523     1.0    0.0           0.0
9         0    0.0      1      0         56  -0.000267  0.490196  0.138523     1.0    0.0           0.0
178 rows X 11 columns
2025-11-04 02:03:29,109 | INFO | Updated dataset after performing PCA feature selection :
   automl_id      col_0      col_1      col_2      col_3      col_4      col_5  survived
0        101   1.219747  -0.881510   0.328327   0.102559   0.086625   0.192121         1
1        177  -0.124017  -0.926226   1.024970   0.066431   0.471474  -0.425253         0
2        165   0.855218   0.148332  -0.454413  -0.120779  -0.011061   0.136053         1
3         28  -0.584001   0.264865  -0.047374  -0.146007  -0.008639   0.028028         0
4        169   1.148406  -0.670955   0.014315   0.021998   0.178162  -0.459917         1
5         32  -0.581800   0.281829  -0.036110  -0.162284  -0.047981  -0.077502         0
6        181  -0.402240  -0.649165  -0.455126   0.147058  -0.005639  -0.032572         1
7         56  -0.584185   0.265557  -0.047415  -0.146158  -0.009595   0.028910         0
8         36   1.155128  -0.889723  -0.066752   0.166061   0.602480   0.236110         1
9         92  -0.584185   0.265557  -0.047415  -0.146158  -0.009595   0.028910         0
10 rows X 8 columns
2025-11-04 02:03:29,439 | INFO | Data Transformation completed.█████| 100% - 14/14
2025-11-04 02:03:29,985 | INFO | Following model is being picked for evaluation:
2025-11-04 02:03:29,985 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 02:03:29,985 | INFO | Feature Selection Method : rfe
2025-11-04 02:03:30,722 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 02:03:32,813 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 02:03:32,886 | INFO | Prediction :
   automl_id  prediction  prob_1  prob_0  survived
0        169           1     1.0     0.0         1
1         36           1     1.0     0.0         1
2         64           1     1.0     0.0         1
3         76           0     0.0     1.0         1
4        108           1     1.0     0.0         1
5        116           1     1.0     0.0         1
6        177           1     1.0     0.0         0
7         28           0     0.0     1.0         0
8         32           0     0.0     1.0         0
9         56           0     0.0     1.0         0
2025-11-04 02:03:34,773 | INFO | ROC-AUC :
              GINI
AUC
0.665346  0.330691
   threshold_value       tpr       fpr
0         0.040816  0.797297  0.259615
1         0.081633  0.797297  0.259615
2         0.102041  0.797297  0.259615
3         0.122449  0.797297  0.259615
4         0.163265  0.797297  0.259615
5         0.183673  0.797297  0.259615
6         0.142857  0.797297  0.259615
7         0.061224  0.797297  0.259615
8         0.020408  0.797297  0.259615
9         0.000000  1.000000  1.000000
2025-11-04 02:03:35,155 | INFO | Confusion Matrix :
[[77 27]
 [15 59]]
>>> prediction.head()
   automl_id  prediction  prob_1  prob_0  survived
0        169           1     1.0     0.0         1
1         36           1     1.0     0.0         1
2         64           1     1.0     0.0         1
3         76           0     0.0     1.0         1
4        108           1     1.0     0.0         1
5        116           1     1.0     0.0         1
6        177           1     1.0     0.0         0
7         28           0     0.0     1.0         0
8         32           0     0.0     1.0         0
9         56           0     0.0     1.0         0
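The ROC-AUC and confusion matrix figures in the prediction log can be cross-checked by hand. The sketch below is plain Python rather than a teradataml call. It assumes that the confusion matrix rows are actual classes and the columns are predicted classes, and that GINI is defined as 2 * AUC - 1. Both assumptions agree with the logged numbers, but the log itself does not state them.

```python
# Values copied from the prediction log.
auc = 0.665346
confusion = [
    [77, 27],  # actual 0: predicted 0, predicted 1 (assumed layout)
    [15, 59],  # actual 1: predicted 0, predicted 1 (assumed layout)
]

(tn, fp), (fn, tp) = confusion

# Gini coefficient under the assumed definition 2 * AUC - 1.
gini = 2 * auc - 1

# True positive rate and false positive rate at the model's decision threshold.
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)

print(f"GINI = {gini:.6f}")  # log reports 0.330691; the gap is rounding of AUC
print(f"tpr  = {tpr:.6f}")   # log reports 0.797297
print(f"fpr  = {fpr:.6f}")   # log reports 0.259615
```

Because every prob_1 value shown is either 0.0 or 1.0, all thresholds strictly between 0 and 1 give the same (tpr, fpr) pair, which would explain why the threshold table repeats a single point.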
- Generate evaluation metrics on test dataset using best performing model.
>>> performance_metrics = aml.evaluate(titanic_test)
2025-11-04 02:04:10,024 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 02:04:10,567 | INFO | Following model is being picked for evaluation:
2025-11-04 02:04:10,567 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 02:04:10,567 | INFO | Feature Selection Method : rfe
2025-11-04 02:04:13,988 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
1                1  CLASS_2       27       59   0.686047  0.797297  0.737500       74
0                0  CLASS_1       77       15   0.836957  0.740385  0.785714      104
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.764045
1       5     Macro-Precision     0.761502
2       6        Macro-Recall     0.768841
3       7            Macro-F1     0.761607
4       9     Weighted-Recall     0.764045
5      10         Weighted-F1     0.765670
6       8  Weighted-Precision     0.774219
7       4            Micro-F1     0.764045
8       2     Micro-Precision     0.764045
9       1            Accuracy     0.764045
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1       77       15   0.836957  0.740385  0.785714      104
1                1  CLASS_2       27       59   0.686047  0.797297  0.737500       74
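The per-class and averaged metrics reported by evaluate() can be reproduced from the four counts in the performance table. The following is a minimal sketch in plain Python, not part of the teradataml API. It assumes each table row is a predicted class and the CLASS_1 / CLASS_2 columns count the actual classes, a reading that matches the Support column.

```python
# Counts copied from the performance table.
# counts[p][a] = test rows predicted as class p whose actual class is a
# (assumed layout: rows are predictions, columns are actual classes).
counts = [
    [77, 15],  # predicted 0 (CLASS_1)
    [27, 59],  # predicted 1 (CLASS_2)
]

n_classes = len(counts)
total = sum(sum(row) for row in counts)

precision, recall, f1, support = [], [], [], []
for c in range(n_classes):
    true_pos = counts[c][c]
    predicted_as_c = sum(counts[c])                           # row sum
    actually_c = sum(counts[p][c] for p in range(n_classes))  # column sum
    p = true_pos / predicted_as_c
    r = true_pos / actually_c
    precision.append(p)
    recall.append(r)
    f1.append(2 * p * r / (p + r))
    support.append(actually_c)

accuracy = sum(counts[c][c] for c in range(n_classes)) / total


def macro(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted(values):
    """Mean over classes, weighted by each class's support."""
    return sum(v * s for v, s in zip(values, support)) / total


print(f"Accuracy           {accuracy:.6f}")
print(f"Macro-Precision    {macro(precision):.6f}")
print(f"Macro-Recall       {macro(recall):.6f}")
print(f"Macro-F1           {macro(f1):.6f}")
print(f"Weighted-Precision {weighted(precision):.6f}")
print(f"Weighted-F1        {weighted(f1):.6f}")
```

In single-label classification the micro-averaged precision, recall, and F1 all equal accuracy, which is why those three rows and Weighted-Recall share the value 0.764045 in the metrics table.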