This example predicts whether a passenger aboard the RMS Titanic survived, based on different factors.
Run AutoML to get the best performing model using the following specifications:
- Add customization for specific processes in the AutoML run.
- Use only two models, 'decision_forest' and 'xgboost', for AutoML training.
- Set the early stopping timer to 300 seconds.
- Opt for verbose level 2 to get detailed logs.
- Load data and split it into training and testing datasets.
- Load the example data and create a teradataml DataFrame.
>>> load_example_data("teradataml", "titanic")
>>> titanic = DataFrame.from_table("titanic")
- Perform sampling to get 80% for training and 20% for testing.
>>> titanic_sample = titanic.sample(frac=[0.8, 0.2])
- Fetch train and test data.
>>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
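The sampling step above tags each row with a `sampleid` (1 for the 80% bucket, 2 for the 20% bucket), and the two filters materialize the splits. A minimal local sketch of the same idea in plain Python (the `rows` data is made up for illustration; this is not the teradataml implementation, which samples inside Vantage):

```python
import random

def sample_split(rows, frac=(0.8, 0.2), seed=42):
    """Assign each row to one of two buckets with the given fractions,
    mimicking DataFrame.sample(frac=[0.8, 0.2]) followed by the
    sampleid == 1 / sampleid == 2 filters."""
    rng = random.Random(seed)
    shuffled = rows[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * frac[0])
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))         # stand-in for the titanic rows
train, test = sample_split(rows)
print(len(train), len(test))    # 80 20
```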
- Add customizations and generate the custom config JSON file.
>>> AutoML.generate_custom_config("custom_titanic")
Generating custom config JSON for AutoML ...

Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the index you want to customize: 1

Customizing Feature Engineering Phase ...

Available options for customization of feature engineering phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Missing Value Handling
Index 2: Customize Bincode Encoding
Index 3: Customize String Manipulation
Index 4: Customize Categorical Encoding
Index 5: Customize Mathematical Transformation
Index 6: Customize Nonlinear Transformation
Index 7: Customize Antiselect Features
Index 8: Back to main menu
Index 9: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the list of indices you want to customize in feature engineering phase: 1,2,4,6,7,8

Customizing Missing Value Handling ...

Provide the following details to customize missing value handling:

Available missing value handling methods with corresponding indices:
Index 1: Drop Columns
Index 2: Drop Rows
Index 3: Impute Missing values

Enter the list of indices for missing value handling methods: 1,3
Enter the feature or list of features for dropping columns with missing values: cabin

Available missing value imputation methods with corresponding indices:
Index 1: Statistical Imputation
Index 2: Literal Imputation

Enter the list of corresponding index missing value imputation methods you want to use: 1
Enter the feature or list of features for imputing missing values using statistic values: age, embarked

Available statistical methods with corresponding indices:
Index 1: min
Index 2: max
Index 3: mean
Index 4: median
Index 5: mode

Enter the index of corresponding statistic imputation method for feature age: 4
Enter the index of corresponding statistic imputation method for feature embarked: 5
Customization of missing value handling has been completed successfully.

Customizing Bincode Encoding ...

Provide the following details to customize binning and coding encoding:

Available binning methods with corresponding indices:
Index 1: Equal-Width
Index 2: Variable-Width

Enter the feature or list of features for binning: pclass
Enter the index of corresponding binning method for feature pclass: 2
Enter the number of bins for feature pclass: 2

Available value type of feature for variable binning with corresponding indices:
Index 1: int
Index 2: float

Provide the range for bin 1 of feature pclass:
Enter the index of corresponding value type of feature pclass: 1
Enter the minimum value for bin 1 of feature pclass: 0
Enter the maximum value for bin 1 of feature pclass: 1
Enter the label for bin 1 of feature pclass: low

Provide the range for bin 2 of feature pclass:
Enter the index of corresponding value type of feature pclass: 1
Enter the minimum value for bin 2 of feature pclass: 2
Enter the maximum value for bin 2 of feature pclass: 3
Enter the label for bin 2 of feature pclass: high
Customization of bincode encoding has been completed successfully.

Customizing Categorical Encoding ...

Provide the following details to customize categorical encoding:

Available categorical encoding methods with corresponding indices:
Index 1: OneHotEncoding
Index 2: OrdinalEncoding
Index 3: TargetEncoding

Enter the list of corresponding index categorical encoding methods you want to use: 2,3
Enter the feature or list of features for OrdinalEncoding: pclass
Enter the feature or list of features for TargetEncoding: embarked

Available target encoding methods with corresponding indices:
Index 1: CBM_BETA
Index 2: CBM_DIRICHLET
Index 3: CBM_GAUSSIAN_INVERSE_GAMMA

Enter the index of target encoding method for feature embarked: 3
Enter the response column for target encoding method for feature embarked: survived
Customization of categorical encoding has been completed successfully.

Customizing Nonlinear Transformation ...

Provide the following details to customize nonlinear transformation:

Enter number of non-linear combinations you want to make: 1

Provide the details for non-linear combination 1:
Enter the list of target feature/s for non-linear combination 1: parch, sibsp
Enter the formula for non-linear combination 1: Y=(X0+X1+1)
Enter the resultant feature for non-linear combination 1: Family_count
Customization of nonlinear transformation has been completed successfully.

Customizing Antiselect Features ...

Enter the feature or list of features for antiselect: passenger
Customization of antiselect features has been completed successfully.
Customization of feature engineering phase has been completed successfully.

Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the index you want to customize: 2

Customizing Data Preparation Phase ...

Available options for customization of data preparation phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Train Test Split
Index 2: Customize Data Imbalance Handling
Index 3: Customize Outlier Handling
Index 4: Customize Feature Scaling
Index 5: Back to main menu
Index 6: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the list of indices you want to customize in data preparation phase: 1,2,3,4,5

Customizing Train Test Split ...

Enter the train size for train test split: 0.75
Customization of train test split has been completed successfully.

Customizing Data Imbalance Handling ...

Available data sampling methods with corresponding indices:
Index 1: SMOTE
Index 2: NearMiss

Enter the corresponding index data imbalance handling method: 1
Customization of data imbalance handling has been completed successfully.

Customizing Outlier Handling ...

Available outlier detection methods with corresponding indices:
Index 1: percentile
Index 2: tukey
Index 3: carling

Enter the corresponding index outlier handling method: 1
Enter the lower percentile value for outlier handling: 0.1
Enter the upper percentile value for outlier handling: 0.9
Enter the feature or list of features for outlier handling: fare

Available outlier replacement methods with corresponding indices:
Index 1: delete
Index 2: median
Index 3: Any Numeric Value

Enter the index of corresponding replacement method for feature fare: 2
Customization of outlier handling has been completed successfully.

Available feature scaling methods with corresponding indices:
Index 1: maxabs
Index 2: mean
Index 3: midrange
Index 4: range
Index 5: rescale
Index 6: std
Index 7: sum
Index 8: ustd

Enter the corresponding index feature scaling method: 6
Customization of feature scaling has been completed successfully.
Customization of data preparation phase has been completed successfully.

Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the index you want to customize: 3

Customizing Model Training Phase ...

Available options for customization of model training phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Model Hyperparameter
Index 2: Back to main menu
Index 3: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the list of indices you want to customize in model training phase: 1,2

Customizing Model Hyperparameter ...

Available models for hyperparameter tuning with corresponding indices:
Index 1: decision_forest
Index 2: xgboost
Index 3: knn
Index 4: glm
Index 5: svm

Available hyperparameter update methods with corresponding indices:
Index 1: ADD
Index 2: REPLACE

Enter the list of model indices for performing hyperparameter tuning: 2

Available hyperparameters for model 'xgboost' with corresponding indices:
Index 1: min_impurity
Index 2: max_depth
Index 3: min_node_size
Index 4: shrinkage_factor
Index 5: iter_num

Enter the list of hyperparameter indices for model 'xgboost': 3
Enter the index of corresponding update method for hyperparameter 'min_node_size' for model 'xgboost': 1
Enter the list of values for hyperparameter 'min_node_size' for model 'xgboost': 1,5
Customization of model hyperparameter has been completed successfully.
Customization of model training phase has been completed successfully.

Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------

Enter the index you want to customize: 4
Generating custom json and exiting ...
Process of generating custom config file for AutoML has been completed successfully.
'custom_titanic.json' file is generated successfully under the current working directory.
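The generated file is ordinary JSON, so it can be inspected or tweaked with the standard `json` module before being passed to AutoML. A hedged sketch of a round-trip through JSON text (the dict below is a small, hand-picked subset of settings; the key names follow the config echoed during `fit()`):

```python
import json

# A small subset of the settings captured in 'custom_titanic.json'
# (subset chosen here for illustration).
custom = {
    "TrainTestSplitIndicator": True,
    "TrainingSize": 0.75,
    "DataImbalanceIndicator": True,
    "DataImbalanceMethod": "SMOTE",
}

# Round-trip through JSON text, as AutoML would read it back from disk.
text = json.dumps(custom, indent=4)
loaded = json.loads(text)
print(loaded["TrainingSize"])    # 0.75
```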
- Create an AutoML instance.
>>> aml = AutoML(task_type="Classification",
...              include=['decision_forest', 'xgboost'],
...              verbose=2,
...              max_runtime_secs=300,
...              custom_config_file='custom_titanic.json')
- Fit the data.
>>> aml.fit(titanic_train, titanic_train.survived)
Received below input for customization :
{
    "MissingValueHandlingIndicator": true,
    "MissingValueHandlingParam": {
        "DroppingColumnIndicator": true,
        "DroppingColumnList": ["cabin"],
        "ImputeMissingIndicator": true,
        "StatImputeList": ["age", "embarked"],
        "StatImputeMethod": ["median", "mode"]
    },
    "BincodeIndicator": true,
    "BincodeParam": {
        "pclass": {
            "Type": "Variable-Width",
            "NumOfBins": 2,
            "Bin_1": {"min_value": 0, "max_value": 1, "label": "low"},
            "Bin_2": {"min_value": 2, "max_value": 3, "label": "high"}
        }
    },
    "CategoricalEncodingIndicator": true,
    "CategoricalEncodingParam": {
        "OrdinalEncodingIndicator": true,
        "OrdinalEncodingList": ["pclass"],
        "TargetEncodingIndicator": true,
        "TargetEncodingList": {
            "embarked": {
                "encoder_method": "CBM_GAUSSIAN_INVERSE_GAMMA",
                "response_column": "survived"
            }
        }
    },
    "NonLinearTransformationIndicator": true,
    "NonLinearTransformationParam": {
        "Combination_1": {
            "target_columns": ["parch", "sibsp"],
            "formula": "Y=(X0+X1+1)",
            "result_column": "Family_count"
        }
    },
    "AntiselectIndicator": true,
    "AntiselectParam": ["passenger"],
    "TrainTestSplitIndicator": true,
    "TrainingSize": 0.75,
    "DataImbalanceIndicator": true,
    "DataImbalanceMethod": "SMOTE",
    "OutlierFilterIndicator": true,
    "OutlierFilterMethod": "percentile",
    "OutlierLowerPercentile": 0.1,
    "OutlierUpperPercentile": 0.9,
    "OutlierFilterParam": {
        "fare": {"replacement_value": "median"}
    },
    "FeatureScalingIndicator": true,
    "FeatureScalingMethod": "std",
    "HyperparameterTuningIndicator": true,
    "HyperparameterTuningParam": {
        "xgboost": {
            "min_node_size": {"Method": "ADD", "Value": [1, 5]}
        }
    }
}

1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation

Feature Exploration started ...
Data Overview: Total Rows in the data: 713 Total Columns in the data: 12 Column Summary: ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage cabin VARCHAR(20) CHARACTER SET LATIN 167 546 0 None None None 76.57784011220197 23.422159887798035 pclass INTEGER 713 0 None 0 713 0 0.0 100.0 sex VARCHAR(20) CHARACTER SET LATIN 713 0 0 None None None 0.0 100.0 sibsp INTEGER 713 0 None 483 230 0 0.0 100.0 parch INTEGER 713 0 None 535 178 0 0.0 100.0 passenger INTEGER 713 0 None 0 713 0 0.0 100.0 embarked VARCHAR(20) CHARACTER SET LATIN 712 1 0 None None None 0.1402524544179523 99.85974754558205 fare FLOAT 713 0 None 12 701 0 0.0 100.0 name VARCHAR(1000) CHARACTER SET LATIN 713 0 0 None None None 0.0 100.0 survived INTEGER 713 0 None 439 274 0 0.0 100.0 age INTEGER 570 143 None 7 563 0 20.05610098176718 79.94389901823281 ticket VARCHAR(20) CHARACTER SET LATIN 713 0 0 None None None 0.0 100.0 Statistics of Data: func passenger survived pclass age sibsp parch fare 50% 456 0 3 28 0 0 14.5 count 713 713 713 570 713 713 713 mean 452.764 0.384 2.293 29.335 0.53 0.394 33.635 min 1 0 1 0 0 0 0 max 891 1 3 71 8 5 512.329 75% 673 1 3 38 1 0 31.275 25% 235 0 2 20 0 0 7.925 std 257.37 0.487 0.839 14.481 1.111 0.797 52.824 Categorical Columns with their Distinct values: ColumnName DistinctValueCount name 713 sex 2 ticket 561 cabin 130 embarked 3 Futile columns in dataset: ColumnName ticket name Target Column Distribution: Columns with outlier percentage :- ColumnName OutlierPercentage 0 age 20.757363 1 sibsp 5.189341 2 fare 14.165498 3 parch 24.964937 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation Feature Engineering started ... Handling duplicate records present in dataset ... Analysis completed. No action taken. Total time to handle duplicate records: 1.64 sec Handling less significant features from data ... 
Removing Futile columns: ['ticket', 'name'] Sample of Data after removing Futile columns: passenger survived pclass sex age sibsp parch fare cabin embarked id 80 1 3 female 30 0 0 12.475 None S 12 183 0 3 male 9 4 2 31.3875 None S 8 509 0 3 male 28 0 0 22.525 None S 16 305 0 3 male None 0 0 8.05 None S 13 835 0 3 male 18 0 0 8.3 None S 15 162 1 2 female 40 0 0 15.75 None S 23 265 0 3 female None 0 0 7.75 None Q 9 530 0 2 male 23 2 1 11.5 None S 17 61 0 3 male 22 0 0 7.2292 None C 14 652 1 2 female 18 0 1 23.0 None S 22 713 rows X 11 columns Total time to handle less significant features: 20.57 sec Handling Date Features ... Analysis Completed. Dataset does not contain any feature related to dates. No action needed. Total time to handle date features: 0.00 sec Dropping these columns for handling customized missing value: ['cabin'] result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845516787654"'20 Updated dataset sample after performing customized missing value imputation: passenger survived pclass sex age sibsp parch fare embarked id 427 1 2 female 28 1 0 26.0 S 31 671 1 2 female 40 1 1 39.0 S 47 528 0 1 male 28 0 0 221.7792 S 55 793 0 3 female 28 8 2 69.55 S 63 833 0 3 male 28 0 0 7.2292 C 79 282 0 3 male 28 0 0 7.8542 S 87 589 0 3 male 22 0 0 8.05 S 71 692 1 3 female 4 0 1 13.4167 C 39 162 1 2 female 40 0 0 15.75 S 23 835 0 3 male 18 0 0 8.3 S 15 713 rows X 10 columns Proceeding with default option for handling remaining missing values. Checking Missing values in dataset ... Analysis Completed. No Missing Values Detected. Total time to find missing values in data: 6.47 sec Imputing Missing Values ... Analysis completed. No imputation required. Time taken to perform imputation: 0.01 sec No information provided for Equal-Width Transformation. 
Variable-Width binning information:- ColumnName MinValue MaxValue Label 0 pclass 0 1 low 1 pclass 2 3 high 2 rows X 4 columns result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845626085490"'20 Updated dataset sample after performing Variable-Width binning: sibsp parch fare age id survived passenger sex embarked pclass 3 0 15.85 33 691 1 86 female S high 3 1 25.4667 28 92 0 177 male S high 3 1 21.075 3 220 0 375 female S high 3 1 25.4667 28 700 0 410 female S high 3 2 27.9 10 646 0 820 male S high 3 2 27.9 9 718 0 635 female S high 0 0 7.25 30 32 0 366 male S high 0 0 8.05 16 56 1 221 male S high 0 0 24.15 39 80 0 812 male S high 0 0 8.05 43 88 0 669 male S high 713 rows X 10 columns Skipping customized string manipulation.⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾| 25% - 5/20 Starting Customized Categorical Feature Encoding ... result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845980398494"'20 Updated dataset sample after performing ordinal encoding: sibsp parch fare age id survived passenger sex embarked pclass 4 1 29.125 2 49 0 17 male Q 0 4 1 29.125 4 558 0 172 male Q 0 4 2 31.275 9 539 0 542 female S 0 4 2 31.275 6 40 0 814 female S 0 4 1 39.6875 14 429 0 687 male S 0 4 1 39.6875 1 302 0 165 male S 0 4 1 29.125 8 445 0 788 male Q 0 4 2 31.3875 5 180 1 234 female S 0 4 2 7.925 17 627 1 69 female S 0 4 1 39.6875 16 735 0 267 male S 0 713 rows X 10 columns result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713844678774235"'20 Updated dataset sample after performing target encoding: embarked sibsp parch fare age id survived passenger sex pclass 0.36363636363636365 1 0 15.5 28 398 0 365 male 0 0.36363636363636365 1 0 24.15 28 178 1 110 female 0 0.36363636363636365 1 0 24.15 28 457 0 769 male 0 0.36363636363636365 1 1 15.5 40 585 0 189 male 0 0.36363636363636365 2 0 23.25 28 420 1 302 male 0 0.36363636363636365 2 0 90.0 44 742 0 246 male 1 0.3300395256916996 1 1 29.0 22 67 1 324 female 0 0.3300395256916996 1 
5 31.3875 38 280 1 26 female 0 0.3300395256916996 1 2 65.0 48 774 1 755 female 0 0.3300395256916996 0 0 7.8958 22 160 0 522 male 0 713 rows X 10 columns Performing encoding for categorical columns ... result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845375023505"'20 ONE HOT Encoding these Columns: ['sex'] Sample of dataset after performing one hot encoding: embarked sibsp parch fare age id survived passenger sex_0 sex_1 pclass 0.36363636363636365 1 0 24.15 28 457 0 769 0 1 0 0.36363636363636365 2 0 23.25 28 420 1 302 0 1 0 0.36363636363636365 2 0 90.0 44 742 0 246 0 1 1 0.36363636363636365 0 0 7.7375 28 304 0 779 0 1 0 0.36363636363636365 0 0 7.75 28 42 0 791 0 1 0 0.36363636363636365 0 0 7.75 32 466 0 891 0 1 0 0.3300395256916996 1 1 11.1333 1 197 1 173 1 0 0 0.3300395256916996 0 0 6.2375 61 298 0 327 0 1 0 0.3300395256916996 1 2 65.0 24 662 1 616 1 0 0 0.3300395256916996 0 0 14.0 54 139 0 318 0 1 0 713 rows X 11 columns Time taken to encode the columns: 13.91 sec Starting customized mathematical transformation ... Skipping customized mathematical transformation. Starting customized non-linear transformation ... 
Possible combination : ['Combination_1'] result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713844655147652"'20 Updated dataset sample after performing non-liner transformation: embarked sibsp parch fare age id survived passenger sex_0 sex_1 pclass Family_count 0.3300395256916996 0.0 0.0 0.0 28 106 0 278 0 1 0 1.0 0.3300395256916996 0.0 0.0 50.4958 31 518 0 868 0 1 1 1.0 0.3300395256916996 1.0 1.0 20.525 33 41 0 549 0 1 0 3.0 0.3300395256916996 2.0 0.0 133.65 50 203 1 661 0 1 1 3.0 0.3300395256916996 1.0 1.0 39.0 60 502 0 685 0 1 0 3.0 0.3300395256916996 1.0 0.0 26.0 25 444 0 729 0 1 0 2.0 0.3300395256916996 0.0 0.0 14.5 28 592 0 761 0 1 0 1.0 0.3300395256916996 1.0 0.0 17.8 18 509 0 50 1 0 0 2.0 0.3300395256916996 0.0 0.0 8.05 28 544 0 416 1 0 0 1.0 0.3300395256916996 0.0 0.0 8.05 22 71 0 589 0 1 0 1.0 713 rows X 12 columns Starting customized anti-select columns ... Updated dataset sample after performing anti-select columns: embarked sibsp parch fare age id survived sex_0 sex_1 pclass Family_count 0.36363636363636365 2.0 0.0 90.0 44 742 0 0 1 1 3.0 0.36363636363636365 0.0 0.0 7.75 28 42 0 0 1 0 1.0 0.36363636363636365 0.0 0.0 7.75 32 466 0 0 1 0 1.0 0.36363636363636365 0.0 0.0 7.75 28 530 1 1 0 0 1.0 0.36363636363636365 0.0 0.0 7.75 30 131 0 1 0 0 1.0 0.36363636363636365 0.0 0.0 7.8792 19 283 1 1 0 0 1.0 0.3300395256916996 0.0 0.0 8.05 22 71 0 0 1 0 1.0 0.3300395256916996 0.0 0.0 8.05 28 544 0 1 0 0 1.0 0.3300395256916996 0.0 0.0 0.0 28 106 0 0 1 0 1.0 0.3300395256916996 1.0 0.0 17.8 18 509 0 1 0 0 2.0 713 rows X 11 columns 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation Data preparation started ... Spliting of dataset into training and testing ... 
Training size : 0.75 Testing size : 0.25 Training data sample embarked sibsp parch fare age id survived sex_0 sex_1 pclass Family_count 0.3300395256916996 0.0 0.0 8.05 28 13 0 0 1 0 1.0 0.3300395256916996 0.0 0.0 22.525 28 16 0 0 1 0 1.0 0.3300395256916996 2.0 1.0 11.5 23 17 0 0 1 0 4.0 0.3300395256916996 4.0 2.0 31.275 2 18 0 1 0 0 7.0 0.3300395256916996 0.0 0.0 13.0 36 20 0 0 1 0 1.0 0.5763888888888888 0.0 0.0 18.7875 11 21 0 0 1 0 1.0 0.36363636363636365 0.0 0.0 7.75 28 9 0 1 0 0 1.0 0.36363636363636365 4.0 1.0 29.125 2 49 0 0 1 0 6.0 0.36363636363636365 0.0 0.0 7.75 28 73 0 0 1 0 1.0 0.36363636363636365 0.0 0.0 7.75 40 96 0 0 1 0 1.0 534 rows X 11 columns Testing data sample embarked sibsp parch fare age id survived sex_0 sex_1 pclass Family_count 0.5763888888888888 0.0 0.0 7.2292 22 14 0 0 1 0 1.0 0.3300395256916996 0.0 0.0 7.25 30 32 0 0 1 0 1.0 0.3300395256916996 0.0 0.0 13.0 24 34 0 1 0 0 1.0 0.3300395256916996 0.0 0.0 26.55 34 35 1 0 1 1 1.0 0.3300395256916996 0.0 0.0 30.5 27 57 1 0 1 1 1.0 0.5763888888888888 1.0 0.0 24.0 28 62 1 1 0 0 2.0 0.36363636363636365 1.0 0.0 15.5 28 28 1 1 0 0 2.0 0.36363636363636365 0.0 0.0 7.75 28 42 0 0 1 0 1.0 0.36363636363636365 0.0 0.0 7.75 28 81 1 1 0 0 1.0 0.36363636363636365 0.0 5.0 29.125 39 196 0 1 0 0 6.0 179 rows X 11 columns Time taken for spliting of data: 11.91 sec Starting customized outlier processing ... Columns with outlier percentage :- ColumnName OutlierPercentage 0 id 9.817672 1 age 9.256662 2 fare 9.116410 3 Family_count 2.805049 4 parch 1.683029 5 sibsp 3.225806 result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845708648720"' result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713853330820560"'/20 result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846364308763"' Checking imbalance data ... Imbalance Not Found. Feature selection using lasso ... 
feature selected by lasso: ['sibsp', 'sex_1', 'fare', 'sex_0', 'Family_count', 'age', 'pclass'] Total time taken by feature selection: 2.80 sec scaling Features of lasso data ... columns that will be scaled: ['sibsp', 'fare', 'Family_count', 'age', 'pclass'] Training dataset sample after scaling: id survived sex_1 sex_0 sibsp fare Family_count age pclass 163 0 1 0 -0.4733327294262123 -0.7081553190636275 -0.5661773649877848 -0.09018879949184125 -0.5759086932341287 27 0 1 0 3.6567273606652484 0.91382440495327 3.5194809174916357 -2.2466433783863144 -0.5759086932341287 53 1 0 1 -0.4733327294262123 -0.08005100856639845 0.01748810393784668 0.38902332915137505 -0.5759086932341287 210 0 1 0 -0.4733327294262123 1.1201887036809417 -0.5661773649877848 1.9863970912954294 1.7363863608036514 461 0 1 0 -0.4733327294262123 0.33579644478911136 0.01748810393784668 -0.09018879949184125 -0.5759086932341287 43 0 0 1 0.3526792885920798 -0.28797473524415335 0.01748810393784668 0.1494172648297669 -0.5759086932341287 63 0 0 1 6.134763414720124 1.8557188868034997 5.270477324268531 -0.09018879949184125 -0.5759086932341287 25 0 0 1 -0.4733327294262123 -0.7081553190636275 -0.5661773649877848 -0.09018879949184125 -0.5759086932341287 60 1 0 1 -0.4733327294262123 2.560580320241089 -0.5661773649877848 -1.0486130567782739 1.7363863608036514 17 0 1 0 1.1786913066103721 -0.5582755799252347 1.1848190417891098 -0.48953224002785484 -0.5759086932341287 534 rows X 9 columns Testing dataset sample after scaling: id survived sex_1 sex_0 sibsp fare Family_count age pclass 57 1 1 0 -0.4733327294262123 0.23183458145023392 -0.5661773649877848 -0.17005748759904396 1.7363863608036514 71 0 1 0 -0.4733327294262123 -0.7017429513328856 -0.5661773649877848 -0.5694009281350575 -0.5759086932341287 67 1 0 1 0.3526792885920798 0.16945746344690746 0.6011535728634781 -0.5694009281350575 -0.5759086932341287 79 0 1 0 -0.4733327294262123 -0.7358757103043059 -0.5661773649877848 -0.09018879949184125 -0.5759086932341287 82 0 1 0 
0.3526792885920798 1.171649826033686 0.01748810393784668 -0.8090069924566656 1.7363863608036514 126 0 0 1 -0.4733327294262123 -0.713178756300162 -0.5661773649877848 -0.8888756805638683 -0.5759086932341287 99 1 0 1 -0.4733327294262123 7.751915966067934 0.01748810393784668 -1.1284817448854765 1.7363863608036514 62 1 0 1 0.3526792885920798 -0.03846626323084746 0.01748810393784668 -0.09018879949184125 -0.5759086932341287 34 0 0 1 -0.4733327294262123 -0.4958984619219083 -0.5661773649877848 -0.4096635519206521 -0.5759086932341287 32 0 1 0 -0.4733327294262123 -0.7350107476013265 -0.5661773649877848 0.06954857672256418 -0.5759086932341287 179 rows X 9 columns Total time taken by feature scaling: 44.70 sec Feature selection using rfe ... feature selected by RFE: ['sex_1', 'sex_0', 'age', 'pclass', 'fare', 'Family_count'] Total time taken by feature selection: 22.47 sec scaling Features of rfe data ... columns that will be scaled: ['r_age', 'r_pclass', 'r_fare', 'r_Family_count'] Training dataset sample after scaling: r_sex_1 id survived r_sex_0 r_age r_pclass r_fare r_Family_count 1 163 0 0 -0.09018879949184125 -0.5759086932341287 -0.7081553190636275 -0.5661773649877848 1 27 0 0 -2.2466433783863144 -0.5759086932341287 0.91382440495327 3.5194809174916357 0 53 1 1 0.38902332915137505 -0.5759086932341287 -0.08005100856639845 0.01748810393784668 1 210 0 0 1.9863970912954294 1.7363863608036514 1.1201887036809417 -0.5661773649877848 1 461 0 0 -0.09018879949184125 -0.5759086932341287 0.33579644478911136 0.01748810393784668 0 43 0 1 0.1494172648297669 -0.5759086932341287 -0.28797473524415335 0.01748810393784668 0 63 0 1 -0.09018879949184125 -0.5759086932341287 1.8557188868034997 5.270477324268531 0 25 0 1 -0.09018879949184125 -0.5759086932341287 -0.7081553190636275 -0.5661773649877848 0 60 1 1 -1.0486130567782739 1.7363863608036514 2.560580320241089 -0.5661773649877848 1 17 0 0 -0.48953224002785484 -0.5759086932341287 -0.5582755799252347 1.1848190417891098 534 rows X 8 columns 
Testing dataset sample after scaling: r_sex_1 id survived r_sex_0 r_age r_pclass r_fare r_Family_count 1 57 1 0 -0.17005748759904396 1.7363863608036514 0.23183458145023392 -0.5661773649877848 1 71 0 0 -0.5694009281350575 -0.5759086932341287 -0.7017429513328856 -0.5661773649877848 0 67 1 1 -0.5694009281350575 -0.5759086932341287 0.16945746344690746 0.6011535728634781 1 79 0 0 -0.09018879949184125 -0.5759086932341287 -0.7358757103043059 -0.5661773649877848 1 82 0 0 -0.8090069924566656 1.7363863608036514 1.171649826033686 0.01748810393784668 0 126 0 1 -0.8888756805638683 -0.5759086932341287 -0.713178756300162 -0.5661773649877848 0 99 1 1 -1.1284817448854765 1.7363863608036514 7.751915966067934 0.01748810393784668 0 62 1 1 -0.09018879949184125 -0.5759086932341287 -0.03846626323084746 0.01748810393784668 0 34 0 1 -0.4096635519206521 -0.5759086932341287 -0.4958984619219083 -0.5661773649877848 1 32 0 0 0.06954857672256418 -0.5759086932341287 -0.7350107476013265 -0.5661773649877848 179 rows X 8 columns Total time taken by feature scaling: 42.68 sec scaling Features of pca data ... 
columns that will be scaled: ['embarked', 'sibsp', 'parch', 'fare', 'age', 'pclass', 'Family_count']
Training dataset sample after scaling:
id  survived  sex_1  sex_0  embarked  sibsp  parch  fare  age  pclass  Family_count
17   0  1  0  -0.5232459472817322   1.1786913066103721   0.7672497196248651  -0.5582755799252346   -0.4895322400278547   -0.5759086932341301  1.1848190417891102
31   1  0  1  -0.5232459472817322   0.3526792885920798   -0.50514577813811   0.04470322744025449   -0.09018879949184123  -0.5759086932341301  0.017488103937846687
60   1  0  1  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   2.560580320241088     -1.0486130567782737   1.7363863608036554   -0.566177364987785
65   0  1  0  -0.5232459472817322   0.3526792885920798   -0.50514577813811   -0.37738193771558787  -0.09018879949184123  -0.5759086932341301  0.017488103937846687
25   0  0  1  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   -0.7081553190636273   -0.09018879949184123  -0.5759086932341301  -0.566177364987785
163  0  1  0  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   -0.7081553190636273   -0.09018879949184123  -0.5759086932341301  -0.566177364987785
73   0  1  0  -0.17262793722408595  -0.4733327294262123  -0.50514577813811   -0.7142183749335507   -0.09018879949184123  -0.5759086932341301  -0.566177364987785
97   0  1  0  -0.17262793722408595  -0.4733327294262123  -0.50514577813811   -0.5229285463900163   2.226003155617037     -0.5759086932341301  -0.566177364987785
131  0  0  1  -0.17262793722408595  -0.4733327294262123  -0.50514577813811   -0.7142183749335507   0.06954857672256418   -0.5759086932341301  -0.566177364987785
133  1  0  1  -0.17262793722408595  -0.4733327294262123  -0.50514577813811   -0.7088456258361975   -0.09018879949184123  -0.5759086932341301  -0.566177364987785
534 rows X 11 columns
Testing dataset sample after scaling:
id  survived  sex_1  sex_0  embarked  sibsp  parch  fare  age  pclass  Family_count
14   0  1  0  2.0476663405184086    -0.4733327294262123  -0.50514577813811   -0.7358757103043057   -0.5694009281350575   -0.5759086932341301  -0.566177364987785
32   0  1  0  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   -0.7350107476013262   0.06954857672256418   -0.5759086932341301  -0.566177364987785
34   0  0  1  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   -0.4958984619219081   -0.40966355192065207  -0.5759086932341301  -0.566177364987785
35   1  1  0  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   0.06757483737480756   0.389023329151375     1.7363863608036554   -0.566177364987785
57   1  1  0  -0.5232459472817322   -0.4733327294262123  -0.50514577813811   0.23183458145023386   -0.17005748759904393  1.7363863608036554   -0.566177364987785
62   1  0  1  2.0476663405184086    0.3526792885920798   -0.50514577813811   -0.03846626323084745  -0.09018879949184123  -0.5759086932341301  0.017488103937846687
28   1  0  1  -0.17262793722408595  0.3526792885920798   -0.50514577813811   -0.3919365985830307   -0.09018879949184123  -0.5759086932341301  0.017488103937846687
42   0  1  0  -0.17262793722408595  -0.4733327294262123  -0.50514577813811   -0.7142183749335507   -0.09018879949184123  -0.5759086932341301  -0.566177364987785
81   1  0  1  -0.17262793722408595  -0.4733327294262123  -0.50514577813811   -0.7142183749335507   -0.09018879949184123  -0.5759086932341301  -0.566177364987785
196  0  0  1  -0.17262793722408595  -0.4733327294262123  5.856831710676766   0.17465555661385126   0.7883667696873885    -0.5759086932341301  2.3521499796403735
179 rows X 11 columns
Total time taken by feature scaling: 42.52 sec
Dimension Reduction using pca ...
PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
Total time taken by PCA: 11.87 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Model Training started ...
Starting customized hyperparameter update ...
Completed customized hyperparameter update.
Hyperparameters used for model training:
response_column : survived
name : decision_forest
tree_type : Classification
min_impurity : (0.0, 0.1, 0.2)
max_depth : (5, 6, 8, 10)
min_node_size : (1, 2, 3)
num_trees : (-1, 20, 30)
Total number of models for decision_forest : 108
--------------------------------------------------------------------------------
response_column : survived
name : xgboost
model_type : Classification
column_sampling : (1, 0.6)
min_impurity : (0.0, 0.1, 0.2)
lambda1 : (0.01, 0.1, 1, 10)
shrinkage_factor : (0.5, 0.1, 0.3)
max_depth : (5, 6, 8, 10)
min_node_size : (1, 2, 3, 5)
iter_num : (10, 20, 30)
Total number of models for xgboost : 3456
--------------------------------------------------------------------------------
Performing hyperParameter tuning ...
decision_forest
--------------------------------------------------------------------------------
xgboost
--------------------------------------------------------------------------------
Evaluating models performance ...
Evaluation completed.
Leaderboard
   Rank  Model-ID          Feature-Selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
0  1     XGBOOST_2         pca                0.793296  0.793296         0.793296      0.793296  0.781918         0.788603      0.784583  0.796967            0.793296         0.794506
1  2     DECISIONFOREST_3  lasso              0.793296  0.793296         0.793296      0.793296  0.785015         0.772398      0.777281  0.791188            0.793296         0.790961
2  3     XGBOOST_0         lasso              0.782123  0.782123         0.782123      0.782123  0.774160         0.757905      0.763684  0.779693            0.782123         0.778804
3  4     XGBOOST_3         lasso              0.782123  0.782123         0.782123      0.782123  0.774160         0.757905      0.763684  0.779693            0.782123         0.778804
4  5     DECISIONFOREST_0  lasso              0.770950  0.770950         0.770950      0.770950  0.763252         0.743412      0.749838  0.768262            0.770950         0.766484
5  6     DECISIONFOREST_2  pca                0.765363  0.765363         0.765363      0.765363  0.752372         0.752372      0.752372  0.765363            0.765363         0.765363
6  7     XGBOOST_1         rfe                0.664804  0.664804         0.664804      0.664804  0.646978         0.605731      0.602222  0.654215            0.664804         0.638361
7  8     DECISIONFOREST_1  rfe                0.664804  0.664804         0.664804      0.664804  0.653798         0.597628      0.588695  0.657792            0.664804         0.629221
8 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 20/20
- Display model leaderboard.
>>> aml.leaderboard()
   Rank  Model-ID          Feature-Selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
0  1     XGBOOST_2         pca                0.793296  0.793296         0.793296      0.793296  0.781918         0.788603      0.784583  0.796967            0.793296         0.794506
1  2     DECISIONFOREST_3  lasso              0.793296  0.793296         0.793296      0.793296  0.785015         0.772398      0.777281  0.791188            0.793296         0.790961
2  3     XGBOOST_0         lasso              0.782123  0.782123         0.782123      0.782123  0.774160         0.757905      0.763684  0.779693            0.782123         0.778804
3  4     XGBOOST_3         lasso              0.782123  0.782123         0.782123      0.782123  0.774160         0.757905      0.763684  0.779693            0.782123         0.778804
4  5     DECISIONFOREST_0  lasso              0.770950  0.770950         0.770950      0.770950  0.763252         0.743412      0.749838  0.768262            0.770950         0.766484
5  6     DECISIONFOREST_2  pca                0.765363  0.765363         0.765363      0.765363  0.752372         0.752372      0.752372  0.765363            0.765363         0.765363
6  7     XGBOOST_1         rfe                0.664804  0.664804         0.664804      0.664804  0.646978         0.605731      0.602222  0.654215            0.664804         0.638361
7  8     DECISIONFOREST_1  rfe                0.664804  0.664804         0.664804      0.664804  0.653798         0.597628      0.588695  0.657792            0.664804         0.629221
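The leaderboard reports both micro- and macro-averaged metrics. As a plain-Python sketch (not part of the teradataml API), the difference can be reproduced from a binary confusion matrix: micro-averaging pools all true positives over all predictions (and so equals accuracy for single-label classification), while macro-averaging takes an unweighted mean of the per-class scores. Feeding in the confusion matrix that XGBOOST_2 later produces on the validation set recovers the leaderboard's rank-1 numbers.

```python
def precision_scores(cm):
    """Micro- and macro-averaged precision for a confusion matrix
    (rows = actual class, columns = predicted class)."""
    classes = range(len(cm))
    # Per-class precision: true positives / predicted-as-that-class count.
    per_class = [cm[c][c] / sum(cm[r][c] for r in classes) for c in classes]
    # Macro: unweighted mean of the per-class precisions.
    macro = sum(per_class) / len(per_class)
    # Micro: pooled true positives over all predictions (equals accuracy).
    micro = sum(cm[c][c] for c in classes) / sum(map(sum, cm))
    return micro, macro

# Validation confusion matrix of XGBOOST_2 from the predict() output below.
micro, macro = precision_scores([[89, 21], [16, 53]])
# round(micro, 6) -> 0.793296 (Accuracy / Micro-Precision)
# round(macro, 6) -> 0.781918 (Macro-Precision)
```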
- Display the best performing model.
>>> aml.leader()
   Rank  Model-ID   Feature-Selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
0  1     XGBOOST_2  pca                0.793296  0.793296         0.793296      0.793296  0.781918         0.788603      0.784583  0.796967            0.793296         0.794506
- Generate predictions on the validation dataset using the best performing model. In the data preparation phase, AutoML generates the validation dataset by splitting the data provided during fitting into training and testing sets. Model training uses the training data, while the testing data serves as the validation dataset for model evaluation.
>>> prediction = aml.predict()
Following model is being used for generating prediction :
Model ID : XGBOOST_2
Feature Selection Method : pca
Prediction :
   survived  id   Prediction  Confidence_Lower  Confidence_upper
0  0         42   0           1.000             1.000
1  1         81   1           1.000             1.000
2  0         14   0           0.875             0.875
3  0         196  0           0.500             0.500
4  0         337  1           0.875             0.875
5  0         32   0           0.875             0.875
6  1         23   1           0.750             0.750
7  0         11   0           0.750             0.750
8  1         10   1           0.750             0.750
9  1         28   1           0.625             0.625
Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision  Recall    F1        Support
SeqNum
0       0           CLASS_1  89       16       0.847619   0.809091  0.827907  110
1       1           CLASS_2  21       53       0.716216   0.768116  0.741259  69
ROC-AUC :
AUC                 GINI
0.7152832674571804  0.4305665349143608
threshold_value       tpr                 fpr
0.04081632653061224   0.7681159420289855  0.19090909090909092
0.08163265306122448   0.7681159420289855  0.19090909090909092
0.1020408163265306    0.7681159420289855  0.19090909090909092
0.12244897959183673   0.7681159420289855  0.19090909090909092
0.16326530612244897   0.7681159420289855  0.19090909090909092
0.18367346938775508   0.7681159420289855  0.19090909090909092
0.14285714285714285   0.7681159420289855  0.19090909090909092
0.061224489795918366  0.7681159420289855  0.19090909090909092
0.02040816326530612   0.7681159420289855  0.19090909090909092
0.0                   1.0                 1.0
Confusion Matrix :
array([[89, 21],
       [16, 53]], dtype=int64)
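The GINI coefficient printed alongside ROC-AUC above is a simple rescaling of the AUC. A one-line plain-Python check (not a teradataml call) using the values from this output:

```python
# GINI = 2 * AUC - 1, using the AUC printed in the predict() output above.
auc = 0.7152832674571804
gini = 2 * auc - 1  # matches the printed GINI of 0.4305665349143608
```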
>>> prediction.head()
   survived  id   Prediction  Confidence_Lower  Confidence_upper
0  32   0  0.875  0.875
0  380  0  1.0    1.0
0  413  0  0.75   0.75
0  556  0  1.0    1.0
0  731  0  0.875  0.875
0  79   0  1.0    1.0
0  71   0  0.875  0.875
0  355  0  0.75   0.75
0  337  1  0.875  0.875
0  14   0  0.875  0.875
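To score these predictions yourself, you can compare the `survived` and `Prediction` columns (for example after pulling the result client-side). A plain-Python sketch of that comparison, using the ten rows shown by `prediction.head()` above (where only id 337 is misclassified):

```python
# Actual labels ('survived') and predicted labels ('Prediction') from the
# ten rows of prediction.head() shown above.
actual    = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# Accuracy = fraction of rows where the prediction matches the label.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
# accuracy -> 0.9 (9 of the 10 sampled rows are correct)
```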
- Generate prediction on test dataset using best performing model.
>>> prediction = aml.predict(titanic_test)
Data Transformation started ...
Performing transformation carried out in feature engineering phase ...
Updated dataset after dropping futile columns :
passenger  survived  pclass  sex  age  sibsp  parch  fare  cabin  embarked  id
469  0  3  male    None  0  0  7.725     None  Q  8
570  1  3  male    32    0  0  7.8542    None  S  15
223  0  3  male    51    0  0  8.05      None  S  23
856  1  3  female  18    0  1  9.35      None  S  11
326  1  1  female  36    0  0  135.6333  C32   C  13
650  1  3  female  23    0  0  7.55      None  S  21
734  0  2  male    23    0  0  13.0      None  S  14
795  0  3  male    25    0  0  7.8958    None  S  22
631  1  1  male    80    0  0  30.0      A23   S  10
57   1  2  female  21    0  0  10.5      None  S  18
Updated dataset after performing target column transformation :
sibsp  cabin  parch  fare  age  id  passenger  sex  pclass  embarked  survived
0  None         1  9.35      18    11  856  female  3  S  1
0  B51 B53 B55  0  5.0       33    9   873  male    1  S  0
0  C106         0  30.5      None  17  299  male    1  S  1
0  None         0  8.05      21    12  38   male    3  S  0
0  A23          0  30.0      80    10  631  male    1  S  1
0  None         0  10.5      21    18  57   female  2  S  1
0  C32          0  135.6333  36    13  326  female  1  C  1
0  None         0  7.55      23    21  650  female  3  S  1
0  None         0  13.0      23    14  734  male    2  S  0
0  None         0  7.8958    25    22  795  male    3  S  0
Updated dataset after dropping customized missing value containing columns :
sibsp  parch  fare  age  id  passenger  sex  pclass  embarked  survived
0  1  9.35      18  11  856  female  3  S  1
0  0  30.0      80  10  631  male    1  S  1
0  0  10.5      21  18  57   female  2  S  1
0  0  13.0      23  14  734  male    2  S  0
0  0  135.6333  36  13  326  female  1  C  1
0  0  7.55      23  21  650  female  3  S  1
0  0  8.05      21  12  38   male    3  S  0
0  0  7.8542    48  20  772  male    3  S  0
0  0  7.8542    32  15  570  male    3  S  1
0  0  8.05      51  23  223  male    3  S  0
Updated dataset after imputing customized missing value containing columns :
sibsp  parch  fare  age  id  passenger  sex  pclass  embarked  survived
4  2  31.275   4   116  851  male    3  S  0
4  1  39.6875  7   63   51   male    3  S  0
4  1  39.6875  2   62   825  male    3  S  0
4  2  31.275   11  42   543  female  3  S  0
0  0  8.05     44  49   697  male    3  S  0
0  0  7.75     22  57   142  female  3  S  1
0  0  7.75     65  65   281  male    3  Q  0
0  0  7.8542   21  105  624  male    3  S  0
0  0  8.05     55  113  153  male    3  S  0
0  5  39.6875  41  121  639  female  3  S  0
result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846972810707"'
Updated dataset after performing customized variable width bin-code transformation :
sibsp  parch  fare  age  id  survived  passenger  sex  embarked  pclass
0  0  8.05     55  113  0  153  male    S  high
0  0  26.2875  36  153  1  513  male    S  low
0  0  7.2292   28  161  1  368  female  C  high
0  0  13.0     23  14   0  734  male    S  high
0  0  7.8958   24  54   0  295  male    S  high
0  0  0.0      49  70   0  598  male    S  high
4  1  39.6875  7   63   0  51   male    S  high
4  2  31.275   11  42   0  543  female  S  high
4  1  39.6875  2   62   0  825  male    S  high
4  1  29.125   7   143  0  279  male    Q  high
result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846171306458"'
result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713850847022753"'
Updated dataset after performing customized categorical encoding :
embarked  sibsp  parch  fare  age  id  survived  passenger  sex  pclass
0.36363636363636365  0  0  7.6292  28  47   0  503  female  0
0.36363636363636365  1  0  90.0    33  48   1  413  female  1
0.36363636363636365  1  0  15.5    28  154  1  187  female  0
0.36363636363636365  2  0  23.25   28  84   1  331  female  0
0.36363636363636365  0  0  7.75    65  65   0  281  male    0
0.36363636363636365  0  0  7.725   28  8    0  469  male    0
0.3300395256916996   0  0  7.8542  20  158  0  641  male    0
0.3300395256916996   0  0  7.925   39  126  0  529  male    0
0.3300395256916996   0  0  8.05    28  117  0  88   male    0
0.3300395256916996   0  0  0.0     28  190  0  675  male    0
result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846624114222"'
Updated dataset after performing categorical encoding :
embarked  sibsp  parch  fare  age  id  survived  passenger  sex_0  sex_1  pclass
0.3300395256916996   0  0  56.4958  26  61   1  510  0  1  0
0.3300395256916996   0  0  7.8958   24  54   0  295  0  1  0
0.3300395256916996   0  0  10.5     21  18   1  57   1  0  0
0.3300395256916996   0  0  10.1708  19  34   0  688  0  1  0
0.3300395256916996   0  0  7.8      21  122  0  52   0  1  0
0.3300395256916996   0  0  7.4958   36  120  0  664  0  1  0
0.36363636363636365  0  0  7.6292   28  47   0  503  1  0  0
0.36363636363636365  1  0  90.0     33  48   1  413  1  0  1
0.36363636363636365  1  0  15.5     28  154  1  187  1  0  0
0.36363636363636365  2  0  23.25    28  84   1  331  1  0  0
result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846909702579"'
Updated dataset after performing customized non-linear transformation :
embarked  sibsp  parch  fare  age  id  survived  passenger  sex_0  sex_1  pclass  Family_count
0.3300395256916996   0.0  0.0  10.5     21  18   1  57   1  0  0  1.0
0.3300395256916996   0.0  0.0  8.05     45  60   1  339  0  1  0  1.0
0.3300395256916996   0.0  0.0  7.4958   36  120  0  664  0  1  0  1.0
0.3300395256916996   0.0  1.0  9.35     18  11   1  856  1  0  0  2.0
0.3300395256916996   0.0  0.0  26.2875  35  144  1  702  0  1  1  1.0
0.3300395256916996   0.0  0.0  10.5     26  138  0  620  0  1  0  1.0
0.36363636363636365  1.0  0.0  15.5     28  154  1  187  1  0  0  2.0
0.36363636363636365  0.0  0.0  7.75     65  65   0  281  0  1  0  1.0
0.36363636363636365  0.0  0.0  7.725    28  8    0  469  0  1  0  1.0
0.36363636363636365  0.0  0.0  7.75     28  90   0  127  0  1  0  1.0
Updated dataset after performing customized anti-selection :
embarked  sibsp  parch  fare  age  id  survived  sex_0  sex_1  pclass  Family_count
0.3300395256916996   0.0  0.0  10.5     21  18   1  1  0  0  1.0
0.3300395256916996   0.0  0.0  8.05     45  60   1  0  1  0  1.0
0.3300395256916996   0.0  0.0  7.4958   36  120  0  0  1  0  1.0
0.3300395256916996   0.0  1.0  9.35     18  11   1  1  0  0  2.0
0.3300395256916996   0.0  0.0  26.2875  35  144  1  0  1  1  1.0
0.3300395256916996   0.0  0.0  10.5     26  138  0  0  1  0  1.0
0.36363636363636365  1.0  0.0  15.5     28  154  1  1  0  0  2.0
0.36363636363636365  0.0  0.0  7.75     65  65   0  0  1  0  1.0
0.36363636363636365  0.0  0.0  7.725    28  8    0  0  1  0  1.0
0.36363636363636365  0.0  0.0  7.75     28  90   0  0  1  0  1.0
Performing transformation carried out in data preparation phase ...
result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713849399082665"'
Updated dataset after performing Lasso feature selection:
id  sibsp  sex_1  fare  sex_0  Family_count  age  pclass  survived
190  0.0  1  0.0      0  1.0  28  0  0
28   1.0  1  20.2125  0  3.0  18  0  0
26   0.0  1  8.05     0  1.0  30  0  0
79   0.0  1  7.775    0  1.0  28  0  1
98   0.0  1  13.0     0  1.0  34  0  1
114  0.0  1  26.2875  0  1.0  42  1  1
142  0.0  0  7.75     1  1.0  45  0  0
16   0.0  0  7.55     1  1.0  28  0  0
134  1.0  0  26.0     1  2.0  29  0  1
110  0.0  0  55.0     1  2.0  28  1  1
Updated dataset after performing scaling on Lasso selected features :
id  survived  sex_1  sex_0  sibsp  fare  Family_count  age  pclass
190  0  1  0  -0.4733327294262123  -1.036500151284071     -0.5661773649877848  -0.09018879949184125   -0.5759086932341287
28   0  1  0  0.3526792885920798   -0.19596848618924687   0.6011535728634781   -0.8888756805638683    -0.5759086932341287
26   0  1  0  -0.4733327294262123  -0.7017429513328856    -0.5661773649877848  0.06954857672256418    -0.5759086932341287
79   1  1  0  -0.4733327294262123  -0.713178756300162     -0.5661773649877848  -0.09018879949184125   -0.5759086932341287
98   1  1  0  -0.4733327294262123  -0.4958984619219083    -0.5661773649877848  0.38902332915137505    -0.5759086932341287
114  1  1  0  -0.4733327294262123  0.056658841724225466   -0.5661773649877848  1.0279728340089966     1.7363863608036514
142  0  0  1  -0.4733327294262123  -0.7142183749335509    -0.5661773649877848  1.2675788983306049     -0.5759086932341287
16   0  0  1  -0.4733327294262123  -0.7225353240006611    -0.5661773649877848  -0.09018879949184125   -0.5759086932341287
134  1  0  1  0.3526792885920798   0.044703227440254505   0.01748810393784668  -0.010320111384638534  -0.5759086932341287
110  1  0  1  -0.4733327294262123  1.250660842171233      0.01748810393784668  -0.09018879949184125   1.7363863608036514
Updated dataset after performing RFE feature selection:
id  sex_1  sex_0  age  pclass  fare  Family_count  survived
190  1  0  28  0  0.0      1.0  0
28   1  0  18  0  20.2125  3.0  0
26   1  0  30  0  8.05     1.0  0
79   1  0  28  0  7.775    1.0  1
98   1  0  34  0  13.0     1.0  1
114  1  0  42  1  26.2875  1.0  1
142  0  1  45  0  7.75     1.0  0
16   0  1  28  0  7.55     1.0  0
134  0  1  29  0  26.0     2.0  1
110  0  1  28  1  55.0     2.0  1
Updated dataset after performing scaling on RFE selected features :
r_sex_1  id  survived  r_sex_0  r_age  r_pclass  r_fare  r_Family_count
1  190  0  0  -0.09018879949184125   -0.5759086932341287  -1.036500151284071    -0.5661773649877848
1  28   0  0  -0.8888756805638683    -0.5759086932341287  -0.19596848618924687  0.6011535728634781
1  26   0  0  0.06954857672256418    -0.5759086932341287  -0.7017429513328856   -0.5661773649877848
1  79   1  0  -0.09018879949184125   -0.5759086932341287  -0.713178756300162    -0.5661773649877848
1  98   1  0  0.38902332915137505    -0.5759086932341287  -0.4958984619219083   -0.5661773649877848
1  114  1  0  1.0279728340089966     1.7363863608036514   0.056658841724225466  -0.5661773649877848
0  142  0  1  1.2675788983306049     -0.5759086932341287  -0.7142183749335509   -0.5661773649877848
0  16   0  1  -0.09018879949184125   -0.5759086932341287  -0.7225353240006611   -0.5661773649877848
0  134  1  1  -0.010320111384638534  -0.5759086932341287  0.044703227440254505  0.01748810393784668
0  110  1  1  -0.09018879949184125   1.7363863608036514   1.250660842171233     0.01748810393784668
Updated dataset after performing scaling for PCA feature selection :
id  survived  sex_1  sex_0  embarked  sibsp  parch  fare  age  pclass  Family_count
190  0  1  0  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   -1.0365001512840708   -0.09018879949184123   -0.5759086932341301  -0.566177364987785
28   0  1  0  -0.5236584390582704  0.3526792885920798   0.7672497196248651  -0.1959684861892468   -0.8888756805638682    -0.5759086932341301  0.6011535728634785
26   0  1  0  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   -0.7017429513328853   0.06954857672256418    -0.5759086932341301  -0.566177364987785
79   1  1  0  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   -0.7131787563001618   -0.09018879949184123   -0.5759086932341301  -0.566177364987785
98   1  1  0  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   -0.4958984619219081   0.389023329151375      -0.5759086932341301  -0.566177364987785
114  1  1  0  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   0.05665884172422545   1.0279728340089966     1.7363863608036554   -0.566177364987785
142  0  0  1  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   -0.7142183749335507   1.2675788983306047     -0.5759086932341301  -0.566177364987785
16   0  0  1  -0.5236584390582704  -0.4733327294262123  -0.50514577813811   -0.7225353240006609   -0.09018879949184123   -0.5759086932341301  -0.566177364987785
134  1  0  1  -0.5236584390582704  0.3526792885920798   -0.50514577813811   0.04470322744025449   -0.010320111384638533  -0.5759086932341301  0.017488103937846687
110  1  0  1  -0.5236584390582704  -0.4733327294262123  0.7672497196248651  1.2506608421712326    -0.09018879949184123   1.7363863608036554   0.017488103937846687
Updated dataset after performing PCA feature selection :
     id   col_0      col_1      col_2      col_3      col_4      col_5      survived
0    34   -0.885025  -1.180737  0.100885   -0.635803  -0.007118  0.330957   0
1    142  -1.149064  -0.329648  -0.857521  0.961232   0.258437   -1.077695  0
2    120  -1.174492  -0.715732  -0.605833  0.365936   -0.214848  0.235200   0
3    16   -0.891443  -0.859990  -0.138386  0.002481   0.458539   -0.989043  0
4    190  -1.135244  -1.133515  -0.235449  0.025433   -0.140323  0.258624   0
...  ...  ...        ...        ...        ...        ...        ...        ...
173  183  1.321608   -1.392658  0.622355   -0.251383  1.497978   1.361819   1
174  138  -0.988370  -0.956751  -0.196295  -0.244715  -0.088857  0.295064   0
175  60   -1.305914  -0.424775  -0.988488  0.866806   -0.319594  0.189375   1
176  72   5.525947   -1.038043  -0.239991  -0.357263  -1.256634  0.570059   0
177  61   -0.478894  0.088547   -0.394947  -0.932471  0.033191   0.408747   1
178 rows × 8 columns
Data Transformation completed.
Following model is being used for generating prediction :
Model ID : XGBOOST_2
Feature Selection Method : pca
Prediction :
   survived  id   Prediction  Confidence_Lower  Confidence_upper
0  0         120  0           1.000             1.000
1  0         190  0           1.000             1.000
2  1         134  1           0.625             0.625
3  1         144  0           0.750             0.750
4  0         28   0           0.750             0.750
5  1         168  1           0.750             0.750
6  1         110  1           1.000             1.000
7  0         16   1           0.750             0.750
8  0         142  1           0.750             0.750
9  0         34   0           0.750             0.750
Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision  Recall    F1        Support
SeqNum
0       0           CLASS_1  99       25       0.798387   0.900000  0.846154  110
1       1           CLASS_2  11       43       0.796296   0.632353  0.704918  68
ROC-AUC :
AUC                 GINI
0.7345588235294118  0.46911764705882364
threshold_value       tpr                 fpr
0.04081632653061224   0.6323529411764706  0.1
0.08163265306122448   0.6323529411764706  0.1
0.1020408163265306    0.6323529411764706  0.1
0.12244897959183673   0.6323529411764706  0.1
0.16326530612244897   0.6323529411764706  0.1
0.18367346938775508   0.6323529411764706  0.1
0.14285714285714285   0.6323529411764706  0.1
0.061224489795918366  0.6323529411764706  0.1
0.02040816326530612   0.6323529411764706  0.1
0.0                   1.0                 1.0
Confusion Matrix :
array([[99, 11],
       [25, 43]], dtype=int64)
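The per-class Precision, Recall, and F1 values in the performance metrics can be derived directly from the confusion matrix printed above. A plain-Python sketch (not a teradataml call), treating rows as actual classes and columns as predicted classes:

```python
def per_class_metrics(cm):
    """Per-class (precision, recall, f1) from a confusion matrix
    (rows = actual class, columns = predicted class)."""
    metrics = []
    for c in range(len(cm)):
        tp = cm[c][c]
        precision = tp / sum(row[c] for row in cm)  # TP / predicted-as-c
        recall = tp / sum(cm[c])                    # TP / actually-c
        f1 = 2 * precision * recall / (precision + recall)
        metrics.append((precision, recall, f1))
    return metrics

# Confusion matrix from the test-set predict() output above.
m = per_class_metrics([[99, 11], [25, 43]])
# m[0] rounds to (0.798387, 0.900000, 0.846154), matching the CLASS_1 row;
# m[1] rounds to (0.796296, 0.632353, 0.704918), matching the CLASS_2 row.
```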
>>> prediction.head()
   survived  id   Prediction  Confidence_Lower  Confidence_upper
0  28   0  0.75   0.75
0  152  0  0.625  0.625
0  103  0  1.0    1.0
0  31   0  0.875  0.875
0  43   0  0.875  0.875
0  37   0  0.875  0.875
0  127  1  0.875  0.875
0  26   0  1.0    1.0
0  190  0  1.0    1.0
0  120  0  1.0    1.0