This example predicts the species of an iris flower based on different factors.
Run AutoML to acquire the most effective model with the following specifications:
- Set the early stopping timer to 300 sec.
- Include only the 'xgboost' model for training.
- Opt for verbosity level 2 to get detailed logs.
- Add customization for some specific processes of AutoClassifier.
- Load data and split it into train and test datasets.
- Load the example data and create teradataml DataFrame.
>>> load_example_data("teradataml", "iris_input")
- Perform sampling to get 80% for training and 20% for testing.
>>> iris_sample = iris.sample(frac = [0.8, 0.2])
- Fetch train and test data.
>>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
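The two-fraction sampling above can be illustrated in plain Python. This is a conceptual sketch only, not teradataml's implementation; `tag_sample_ids` is a hypothetical helper that mimics how `sample(frac=[0.8, 0.2])` tags each row with a `sampleid` of 1 or 2 in roughly 80/20 proportion:

```python
import random

def tag_sample_ids(n_rows, fractions=(0.8, 0.2), seed=42):
    """Tag each row index with a 1-based sampleid drawn in the given
    proportions, mimicking DataFrame.sample(frac=[0.8, 0.2])."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    cut = round(n_rows * fractions[0])
    # sampleid 1 -> training subset, sampleid 2 -> testing subset
    return {row: (1 if pos < cut else 2) for pos, row in enumerate(rows)}

tags = tag_sample_ids(150)
train_rows = [r for r, s in tags.items() if s == 1]
test_rows = [r for r, s in tags.items() if s == 2]
```

Filtering on `sampleid` and dropping the column, as done above, then yields disjoint train and test DataFrames.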
- Add customization.
>>> AutoClassifier.generate_custom_config("custom_iris")
Generating custom config JSON for AutoML ...
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 1
Customizing Feature Engineering Phase ...
Available options for customization of feature engineering phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Missing Value Handling
Index 2: Customize Bincode Encoding
Index 3: Customize String Manipulation
Index 4: Customize Categorical Encoding
Index 5: Customize Mathematical Transformation
Index 6: Customize Nonlinear Transformation
Index 7: Customize Antiselect Features
Index 8: Back to main menu
Index 9: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in feature engineering phase: 8
Customization of feature engineering phase has been completed successfully.
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 2
Customizing Data Preparation Phase ...
Available options for customization of data preparation phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Train Test Split
Index 2: Customize Data Imbalance Handling
Index 3: Customize Outlier Handling
Index 4: Customize Feature Scaling
Index 5: Back to main menu
Index 6: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in data preparation phase: 1, 4, 5
Customizing Train Test Split ...
Enter the train size for train test split: 0.85
Customization of train test split has been completed successfully.
Available feature scaling methods with corresponding indices:
Index 1: maxabs
Index 2: mean
Index 3: midrange
Index 4: range
Index 5: rescale
Index 6: std
Index 7: sum
Index 8: ustd
Enter the corresponding index feature scaling method: 4
Customization of feature scaling has been completed successfully.
Customization of data preparation phase has been completed successfully.
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 3
Customizing Model Training Phase ...
Available options for customization of model training phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Model Hyperparameter
Index 2: Back to main menu
Index 3: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in model training phase: 1
Customizing Model Hyperparameter ...
Available models for hyperparameter tuning with corresponding indices:
Index 1: decision_forest
Index 2: xgboost
Index 3: knn
Index 4: glm
Index 5: svm
Available hyperparameter update methods with corresponding indices:
Index 1: ADD
Index 2: REPLACE
Enter the list of model indices for performing hyperparameter tuning: 2
Available hyperparameters for model 'xgboost' with corresponding indices:
Index 1: min_impurity
Index 2: max_depth
Index 3: min_node_size
Index 4: shrinkage_factor
Index 5: iter_num
Enter the list of hyperparameter indices for model 'xgboost': 2
Enter the index of corresponding update method for hyperparameters 'max_depth' for model 'xgboost': 2
Enter the list of values for hyperparameter 'max_depth' for model 'xgboost': 3,4
Customization of model hyperparameter has been completed successfully.
Available options for customization of model training phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Model Hyperparameter
Index 2: Back to main menu
Index 3: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in model training phase: 3
Customization of model training phase has been completed successfully.
Process of generating custom config file for AutoML has been completed successfully.
'custom_iris.json' file is generated successfully under the current working directory.
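Based on the customization input echoed back later during fit(), the generated 'custom_iris.json' contains roughly the following keys. This sketch only illustrates the shape of the file; it is not a substitute for `generate_custom_config()`, and the exact layout AutoML writes may differ:

```python
import json

# Key names taken from the customization input echoed back by fit();
# treat this content as illustrative, not the exact generated file.
custom_config = {
    "TrainTestSplitIndicator": True,
    "TrainingSize": 0.85,
    "FeatureScalingIndicator": True,
    "FeatureScalingMethod": "range",
    "HyperparameterTuningIndicator": True,
    "HyperparameterTuningParam": {
        "xgboost": {"max_depth": {"Method": "ADD", "Value": [3, 4]}}
    },
}

with open("custom_iris.json", "w") as f:
    json.dump(custom_config, f, indent=4)
```

The file path is what later gets passed to AutoClassifier via `custom_config_file`.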
- Create an AutoML instance.
>>> aml = AutoClassifier(include=['xgboost'],
...                      verbose=2,
...                      max_runtime_secs=300,
...                      custom_config_file='custom_iris.json')
- Fit training data.
>>> aml.fit(iris_train, iris_train.species)
Received below input for customization :
{
    "TrainTestSplitIndicator": true,
    "TrainingSize": 0.85,
    "DataImbalanceIndicator": true,
    "DataImbalanceMethod": "SMOTE",
    "FeatureScalingIndicator": true,
    "FeatureScalingMethod": "range",
    "HyperparameterTuningIndicator": true,
    "HyperparameterTuningParam": {
        "xgboost": {
            "max_depth": {
                "Method": "ADD",
                "Value": [3, 4]
            }
        }
    }
}
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Feature Exploration started ...
Data Overview:
Total Rows in the data: 120
Total Columns in the data: 6
Column Summary:
ColumnName    Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
species       INTEGER   120           0          None        0          120            0              0.0             100.0
sepal_length  FLOAT     120           0          None        0          120            0              0.0             100.0
petal_length  FLOAT     120           0          None        0          120            0              0.0             100.0
sepal_width   FLOAT     120           0          None        0          120            0              0.0             100.0
petal_width   FLOAT     120           0          None        0          120            0              0.0             100.0
id            INTEGER   120           0          None        0          120            0              0.0             100.0
Statistics of Data:
func   id      sepal_length  sepal_width  petal_length  petal_width  species
std    42.746  0.828         0.445        1.784         0.764        0.825
25%    37.75   5.1           2.775        1.5           0.275        1
50%    72.5    5.8           3            4.2           1.3          2
75%    110.25  6.4           3.3          5.1           1.8          3
max    149     7.7           4.4          6.9           2.5          3
min    2       4.3           2            1             0.1          1
mean   73.642  5.818         3.033        3.683         1.158        1.975
count  120     120           120          120           120          120
Target Column Distribution:
Columns with outlier percentage :-
   ColumnName   OutlierPercentage
0  sepal_width  0.833333
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Feature Engineering started ...
Handling duplicate records present in dataset ...
Analysis completed. No action taken.
Total time to handle duplicate records: 1.57 sec
Handling less significant features from data ...
Total time to handle less significant features: 5.83 sec
Handling Date Features ...
Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
Total time to handle date features: 0.00 sec
Proceeding with default option for missing value imputation.
Proceeding with default option for handling remaining missing values.
Checking Missing values in dataset ...
Analysis Completed. No Missing Values Detected.
Total time to find missing values in data: 7.15 sec
Imputing Missing Values ...
Analysis completed. No imputation required.
Time taken to perform imputation: 0.01 sec
No information provided for Variable-Width Transformation. Skipping customized string manipulation.
Starting Customized Categorical Feature Encoding ...
AutoML will proceed with default encoding technique.
Performing encoding for categorical columns ...
Analysis completed. No categorical columns were found.
Time taken to encode the columns: 1.42 sec
Starting customized mathematical transformation ...
Skipping customized mathematical transformation.
Starting customized non-linear transformation ...
Skipping customized non-linear transformation.
Starting customized anti-select columns ...
Skipping customized anti-select columns.
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Data preparation started ...
Splitting of dataset into training and testing ...
Training size : 0.85
Testing size : 0.15
Training data sample
sepal_length  sepal_width  petal_length  petal_width  species  id
5.1           3.4          1.5           0.2          1        10
5.0           2.0          3.5           1.0          2        14
5.0           3.2          1.2           0.2          1        22
5.7           2.6          3.5           1.0          2        12
6.3           3.3          6.0           2.5          3        9
5.4           3.9          1.3           0.4          1        17
5.1           2.5          3.0           1.1          2        13
5.6           2.7          4.2           1.3          2        21
6.7           3.0          5.0           1.7          2        15
6.0           3.0          4.8           1.8          3        23
102 rows X 6 columns
Testing data sample
sepal_length  sepal_width  petal_length  petal_width  species  id
6.4           3.2          5.3           2.3          3        30
5.7           2.9          4.2           1.3          2        31
6.3           2.9          5.6           1.8          3        103
6.3           3.4          5.6           2.4          3        27
6.5           2.8          4.6           1.5          2        28
6.7           2.5          5.8           1.8          3        108
6.4           2.7          5.3           1.9          3        29
5.4           3.9          1.7           0.4          1        85
6.2           2.2          4.5           1.5          2        107
5.6           2.9          3.6           1.3          2        110
18 rows X 6 columns
Time taken for splitting of data: 11.26 sec
Starting customized outlier processing ...
No information provided for customized outlier processing. AutoML will proceed with default settings.
Outlier preprocessing ...
Columns with outlier percentage :-
   ColumnName   OutlierPercentage
0  sepal_width  0.833333
Deleting rows of these columns: ['sepal_width']
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713845938369288"'
Sample of training dataset after removing outlier rows:
sepal_length  sepal_width  petal_length  petal_width  species  id
7.2           3.6          6.1           2.5          3        34
7.4           2.8          6.1           1.9          3        24
6.2           3.4          5.4           2.3          3        106
6.1           2.8          4.7           1.2          2        35
7.3           2.9          6.3           1.8          3        71
7.0           3.2          4.7           1.4          2        63
6.7           3.3          5.7           2.1          3        95
6.7           3.0          5.2           2.3          3        87
5.4           3.9          1.3           0.4          1        17
5.4           3.4          1.5           0.4          1        47
101 rows X 6 columns
Time Taken by Outlier processing: 35.55 sec
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713844347449263"'
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713849039953629"'
Checking imbalance data ...
Imbalance Not Found.
Feature selection using lasso ...
feature selected by lasso: ['petal_width', 'sepal_width', 'sepal_length', 'petal_length']
Total time taken by feature selection: 3.37 sec
scaling Features of lasso data ...
columns that will be scaled: ['petal_width', 'sepal_width', 'sepal_length', 'petal_length']
Training dataset sample after scaling:
id   species  petal_width          sepal_width          sepal_length         petal_length
40   3        0.7916666666666666   0.4761904761904763   0.6470588235294118   0.711864406779661
80   1        0.08333333333333333  0.7142857142857144   0.23529411764705874  0.06779661016949151
99   1        0.0                  0.4761904761904763   0.0                  0.016949152542372895
61   3        0.7083333333333334   0.4761904761904763   0.6470588235294118   0.7627118644067796
93   2        0.625                0.3333333333333335   0.5                  0.6949152542372881
34   3        1.0                  0.7619047619047621   0.8529411764705882   0.8644067796610169
78   2        0.5                  0.1428571428571428   0.35294117647058826  0.5084745762711864
19   2        0.5                  0.4285714285714286   0.676470588235294    0.6101694915254237
17   1        0.12500000000000003  0.9047619047619049   0.323529411764706    0.05084745762711865
76   1        0.04166666666666667  0.6666666666666667   0.14705882352941174  0.1016949152542373
101 rows X 6 columns
Testing dataset sample after scaling:
id   species  petal_width          sepal_width          sepal_length         petal_length
110  2        0.5                  0.4285714285714286   0.3823529411764705   0.4406779661016949
29   3        0.75                 0.3333333333333335   0.6176470588235295   0.7288135593220338
116  1        0.08333333333333333  0.4761904761904763   0.14705882352941174  0.06779661016949151
108  3        0.7083333333333334   0.23809523809523814  0.7058823529411765   0.8135593220338982
30   3        0.9166666666666666   0.5714285714285716   0.6176470588235295   0.7288135593220338
28   2        0.5833333333333334   0.38095238095238093  0.6470588235294118   0.6101694915254237
127  1        0.0                  0.523809523809524    0.17647058823529427  0.0847457627118644
122  3        0.8750000000000001   0.4761904761904763   0.6470588235294118   0.8135593220338982
101  2        0.375                0.09523809523809534  0.5                  0.5084745762711864
26   2        0.625                0.6190476190476191   0.588235294117647    0.6271186440677966
18 rows X 6 columns
Total time taken by feature scaling: 42.16 sec
Feature selection using rfe ...
feature selected by RFE: ['petal_length', 'petal_width']
Total time taken by feature selection: 10.35 sec
scaling Features of rfe data ...
columns that will be scaled: ['r_petal_length', 'r_petal_width']
Training dataset sample after scaling:
id   species  r_petal_length        r_petal_width
40   3        0.711864406779661     0.7916666666666666
80   1        0.06779661016949151   0.08333333333333333
99   1        0.016949152542372895  0.0
61   3        0.7627118644067796    0.7083333333333334
93   2        0.6949152542372881    0.625
34   3        0.8644067796610169    1.0
78   2        0.5084745762711864    0.5
19   2        0.6101694915254237    0.5
17   1        0.05084745762711865   0.12500000000000003
76   1        0.1016949152542373    0.04166666666666667
101 rows X 4 columns
Testing dataset sample after scaling:
id   species  r_petal_length        r_petal_width
110  2        0.4406779661016949    0.5
29   3        0.7288135593220338    0.75
116  1        0.06779661016949151   0.08333333333333333
108  3        0.8135593220338982    0.7083333333333334
30   3        0.7288135593220338    0.9166666666666666
28   2        0.6101694915254237    0.5833333333333334
127  1        0.0847457627118644    0.0
122  3        0.8135593220338982    0.8750000000000001
101  2        0.5084745762711864    0.375
26   2        0.6271186440677966    0.625
18 rows X 4 columns
Total time taken by feature scaling: 40.20 sec
scaling Features of pca data ...
columns that will be scaled: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
Training dataset sample after scaling:
id   species  sepal_length         sepal_width          petal_length         petal_width
95   3        0.7058823529411765   0.6190476190476191   0.7966101694915254   0.8333333333333334
24   3        0.911764705882353    0.38095238095238093  0.8644067796610169   0.75
106  3        0.5588235294117647   0.6666666666666667   0.7457627118644068   0.9166666666666666
35   2        0.5294117647058822   0.38095238095238093  0.6271186440677966   0.4583333333333333
34   3        0.8529411764705882   0.7619047619047621   0.8644067796610169   1.0
112  3        0.8529411764705882   0.4761904761904763   0.8135593220338982   0.625
17   1        0.323529411764706    0.9047619047619049   0.05084745762711865  0.12500000000000003
47   1        0.323529411764706    0.6666666666666667   0.0847457627118644   0.12500000000000003
71   3        0.8823529411764705   0.4285714285714286   0.8983050847457626   0.7083333333333334
63   2        0.7941176470588235   0.5714285714285716   0.6271186440677966   0.5416666666666666
101 rows X 6 columns
Testing dataset sample after scaling:
id   species  sepal_length         sepal_width          petal_length         petal_width
27   3        0.588235294117647    0.6666666666666667   0.7796610169491525   0.9583333333333333
30   3        0.6176470588235295   0.5714285714285716   0.7288135593220338   0.9166666666666666
110  2        0.3823529411764705   0.4285714285714286   0.4406779661016949   0.5
31   2        0.411764705882353    0.4285714285714286   0.5423728813559322   0.5
29   3        0.6176470588235295   0.3333333333333335   0.7288135593220338   0.75
85   1        0.323529411764706    0.9047619047619049   0.11864406779661016  0.12500000000000003
28   2        0.6470588235294118   0.38095238095238093  0.6101694915254237   0.5833333333333334
108  3        0.7058823529411765   0.23809523809523814  0.8135593220338982   0.7083333333333334
103  3        0.588235294117647    0.4285714285714286   0.7796610169491525   0.7083333333333334
107  2        0.5588235294117647   0.09523809523809534  0.5932203389830508   0.5833333333333334
18 rows X 6 columns
Total time taken by feature scaling: 36.83 sec
Dimension Reduction using pca ...
PCA columns: ['col_0', 'col_1']
Total time taken by PCA: 11.03 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Model Training started ...
Starting customized hyperparameter update ...
Completed customized hyperparameter update.
Hyperparameters used for model training:
response_column : species
name : xgboost
model_type : Classification
column_sampling : (1, 0.6)
min_impurity : (0.0, 0.1)
lambda1 : (0.01, 0.1, 1, 10)
shrinkage_factor : (0.5, 0.1, 0.2)
max_depth : (3, 4, 5, 6, 7, 8)
min_node_size : (1, 2)
iter_num : (10, 20)
Total number of models for xgboost : 1152
--------------------------------------------------------------------------------
Performing hyperparameter tuning ...
xgboost
--------------------------------------------------------------------------------
Evaluating models performance ...
Evaluation completed.
Leaderboard
   Rank  Model-ID   Feature-Selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
0  1     XGBOOST_0  lasso              1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
1  2     XGBOOST_1  rfe                1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
2  3     XGBOOST_3  lasso              1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
3  4     XGBOOST_4  rfe                1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
4  5     XGBOOST_7  rfe                1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
5  6     XGBOOST_6  lasso              0.944444  0.944444         0.944444      0.944444  0.952381         0.944444      0.944056  0.952381            0.944444         0.944056
6  7     XGBOOST_2  pca                0.888889  0.888889         0.888889      0.888889  0.916667         0.888889      0.885714  0.916667            0.888889         0.885714
7  8     XGBOOST_5  pca                0.888889  0.888889         0.888889      0.888889  0.916667         0.888889      0.885714  0.916667            0.888889         0.885714
8 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: 100% - 19/19
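The 'range' scaling method selected in the custom config appears in the scaled samples above as min-max scaling onto [0, 1]. A minimal sketch, assuming that interpretation and using the petal_width min (0.1) and max (2.5) from the statistics table:

```python
def range_scale(x, lo, hi):
    """Min-max ('range') scaling: maps the interval [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

# petal_width spans 0.1 .. 2.5 in the data overview above, so a raw
# value of 2.0 scales to roughly 0.79167, matching the scaled samples.
scaled = range_scale(2.0, 0.1, 2.5)
```

This is an illustration of the arithmetic only; inside AutoML the transformation is carried out in-database, not in client-side Python.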
- Display model leaderboard.
>>> aml.leaderboard()
   Rank  Model-ID   Feature-Selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
0  1     XGBOOST_0  lasso              1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
1  2     XGBOOST_1  rfe                1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
2  3     XGBOOST_3  lasso              1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
3  4     XGBOOST_4  rfe                1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
4  5     XGBOOST_7  rfe                1.000000  1.000000         1.000000      1.000000  1.000000         1.000000      1.000000  1.000000            1.000000         1.000000
5  6     XGBOOST_6  lasso              0.944444  0.944444         0.944444      0.944444  0.952381         0.944444      0.944056  0.952381            0.944444         0.944056
6  7     XGBOOST_2  pca                0.888889  0.888889         0.888889      0.888889  0.916667         0.888889      0.885714  0.916667            0.888889         0.885714
7  8     XGBOOST_5  pca                0.888889  0.888889         0.888889      0.888889  0.916667         0.888889      0.885714  0.916667            0.888889         0.885714
- Display the best performing model.
>>> aml.leader()
   Rank  Model-ID   Feature-Selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
0  1     XGBOOST_0  lasso              1.0       1.0              1.0           1.0       1.0              1.0           1.0       1.0                 1.0              1.0
- Generate prediction on the validation dataset using the best performing model. In the data preparation phase, AutoML generates the validation dataset by splitting the data provided during fitting into training and testing sets. Model training uses the training data, with the testing data acting as the validation dataset for model evaluation.
>>> prediction = aml.predict()
Following model is being used for generating prediction :
Model ID : XGBOOST_0
Feature Selection Method : lasso
Prediction :
   id   Prediction  Confidence_Lower  Confidence_upper  species
0  110  2           0.750             0.750             2
1  29   3           0.750             0.750             3
2  116  1           0.750             0.750             1
3  108  3           0.750             0.750             3
4  30   3           1.000             1.000             3
5  28   2           0.625             0.625             2
6  127  1           0.750             0.750             1
7  122  3           1.000             1.000             3
8  101  2           1.000             1.000             2
9  26   2           0.500             0.500             2
Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall  F1   Support
SeqNum
0       1           CLASS_1  6        0        0        1.0        1.0     1.0  6
2       3           CLASS_3  0        0        6        1.0        1.0     1.0  6
1       2           CLASS_2  0        6        0        1.0        1.0     1.0  6
Confusion Matrix :
array([[6, 0, 0],
       [0, 6, 0],
       [0, 0, 6]], dtype=int64)
>>> prediction.head()
id   Prediction  Confidence_Lower  Confidence_upper  species
28   2           0.625             0.625             2
30   3           1.0               1.0               3
31   2           0.75              0.75              2
82   1           0.75              0.75              1
101  2           1.0               1.0               2
103  3           1.0               1.0               3
85   1           0.875             0.875             1
29   3           0.75              0.75              3
27   3           1.0               1.0               3
26   2           0.5               0.5               2
- Generate prediction on the test dataset using the best performing model.
>>> prediction = aml.predict(iris_test)
Data Transformation started ...
Performing transformation carried out in feature engineering phase ...
Updated dataset after dropping irrelevant columns :
sepal_length  sepal_width  petal_length  petal_width  species
7.7           3.8          6.7           2.2          3
5.9           3.2          4.8           1.8          2
4.6           3.2          1.4           0.2          1
5.1           3.5          1.4           0.2          1
5.7           3.8          1.7           0.3          1
6.8           3.2          5.9           2.3          3
6.7           3.1          5.6           2.4          3
5.8           2.8          5.1           2.4          3
5.9           3.0          5.1           1.8          3
4.6           3.4          1.4           0.3          1
Updated dataset after performing target column transformation :
sepal_length  id  sepal_width  petal_width  petal_length  species
5.5           9   4.2          0.2          1.4           1
5.7           11  3.8          0.3          1.7           1
6.8           19  3.2          2.3          5.9           3
5.9           10  3.0          1.8          5.1           3
5.9           13  3.2          1.8          4.8           2
4.6           21  3.2          0.2          1.4           1
7.7           12  3.8          2.2          6.7           3
5.7           20  2.5          2.0          5.0           3
6.7           14  3.1          2.4          5.6           3
5.8           22  2.8          2.4          5.1           3
Performing transformation carried out in data preparation phase ...
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713849678727662"'
Updated dataset after performing Lasso feature selection:
id  petal_width  sepal_width  sepal_length  petal_length  species
26  0.4          3.4          5.0           1.6           1
22  2.4          2.8          5.8           5.1           3
35  1.6          3.4          6.0           4.5           2
36  1.8          3.0          6.1           4.9           3
17  1.5          3.1          6.7           4.7           2
34  0.2          3.1          4.6           1.5           1
15  0.5          3.3          5.1           1.7           1
32  1.2          3.0          5.7           4.2           2
38  2.1          3.0          7.1           5.9           3
12  2.2          3.8          7.7           6.7           3
Updated dataset after performing scaling on Lasso selected features :
id  species  petal_width          sepal_width          sepal_length         petal_length
26  1        0.12500000000000003  0.6666666666666667   0.2058823529411765   0.1016949152542373
17  2        0.5833333333333334   0.523809523809524    0.7058823529411765   0.6271186440677966
34  1        0.04166666666666667  0.523809523809524    0.088235294117647    0.0847457627118644
38  3        0.8333333333333334   0.4761904761904763   0.8235294117647057   0.8305084745762712
36  3        0.7083333333333334   0.4761904761904763   0.5294117647058822   0.6610169491525424
28  2        0.5416666666666666   0.523809523809524    0.7058823529411765   0.576271186440678
19  3        0.9166666666666666   0.5714285714285716   0.7352941176470588   0.8305084745762712
30  2        0.5                  0.23809523809523814  0.35294117647058826  0.5084745762711864
15  1        0.16666666666666669  0.6190476190476191   0.23529411764705874  0.11864406779661016
32  2        0.4583333333333333   0.4761904761904763   0.411764705882353    0.5423728813559322
Updated dataset after performing RFE feature selection:
id  petal_length  petal_width  species
26  1.6           0.4          1
17  4.7           1.5          2
34  1.5           0.2          1
36  4.9           1.8          3
22  5.1           2.4          3
35  4.5           1.6          2
38  5.9           2.1          3
12  6.7           2.2          3
15  1.7           0.5          1
32  4.2           1.2          2
Updated dataset after performing scaling on RFE selected features :
id  species  r_petal_length       r_petal_width
22  3        0.6949152542372881   0.9583333333333333
17  2        0.6271186440677966   0.5833333333333334
34  1        0.0847457627118644   0.04166666666666667
38  3        0.8305084745762712   0.8333333333333334
15  1        0.11864406779661016  0.16666666666666669
32  2        0.5423728813559322   0.4583333333333333
36  3        0.6610169491525424   0.7083333333333334
28  2        0.576271186440678    0.5416666666666666
19  3        0.8305084745762712   0.9166666666666666
30  2        0.5084745762711864   0.5
Updated dataset after performing scaling for PCA feature selection :
id  species  sepal_length         sepal_width          petal_length         petal_width
22  3        0.4411764705882352   0.38095238095238093  0.6949152542372881   0.9583333333333333
36  3        0.5294117647058822   0.4761904761904763   0.6610169491525424   0.7083333333333334
28  2        0.7058823529411765   0.523809523809524    0.576271186440678    0.5416666666666666
19  3        0.7352941176470588   0.5714285714285716   0.8305084745762712   0.9166666666666666
17  2        0.7058823529411765   0.523809523809524    0.6271186440677966   0.5833333333333334
34  1        0.088235294117647    0.523809523809524    0.0847457627118644   0.04166666666666667
38  3        0.8235294117647057   0.4761904761904763   0.8305084745762712   0.8333333333333334
12  3        1.0                  0.8571428571428572   0.9661016949152542   0.8750000000000001
15  1        0.23529411764705874  0.6190476190476191   0.11864406779661016  0.16666666666666669
32  2        0.411764705882353    0.4761904761904763   0.5423728813559322   0.4583333333333333
Updated dataset after performing PCA feature selection :
   id  col_0      col_1      species
0  26  -0.552814  -0.064726  1
1  17  0.306241   -0.131285  2
2  22  0.488587   0.103787   3
3  19  0.641246   -0.184527  3
4  38  0.648012   -0.133717  3
5  36  0.333749   -0.014838  3
6  15  -0.493909  -0.034387  1
7  20  0.388955   0.248874   3
8  34  -0.640616  0.115578   1
9  35  0.190191   -0.175907  2
Data Transformation completed.
Following model is being used for generating prediction :
Model ID : XGBOOST_0
Feature Selection Method : lasso
Prediction :
   id  Prediction  Confidence_Lower  Confidence_upper  species
0  26  1           0.875             0.875             1
1  36  3           1.000             1.000             3
2  28  2           0.750             0.750             2
3  15  1           0.875             0.875             1
4  17  2           0.625             0.625             2
5  34  1           0.750             0.750             1
6  38  3           1.000             1.000             3
7  12  3           1.000             1.000             3
8  19  3           1.000             1.000             3
9  30  2           1.000             1.000             2
Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall    F1        Support
SeqNum
0       1           CLASS_1  8        0        0        1.000000   1.000000  1.000000  8
2       3           CLASS_3  0        1        11       0.916667   1.000000  0.956522  11
1       2           CLASS_2  0        10       0        1.000000   0.909091  0.952381  11
Confusion Matrix :
array([[ 8,  0,  0],
       [ 0, 10,  1],
       [ 0,  0, 11]], dtype=int64)
>>> prediction.head()
id   Prediction  Confidence_Lower  Confidence_upper  species
10   3           0.875             0.875             3
12   3           1.0               1.0               3
13   3           0.875             0.875             2
14   3           1.0               1.0               3
16   2           0.75              0.75              2
17   2           0.625             0.625             2
15   1           0.875             0.875             1
11   1           0.875             0.875             1
9    1           0.875             0.875             1
8    1           0.875             0.875             1
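The precision and recall figures in the performance metrics above can be re-derived from the printed confusion matrix (rows are actual classes, columns are predicted classes). `precision_recall` below is an illustrative helper, not part of teradataml:

```python
# Confusion matrix from the test-set prediction above:
# rows = actual classes 1..3, columns = predicted classes 1..3.
cm = [[8, 0, 0],
      [0, 10, 1],
      [0, 0, 11]]

def precision_recall(cm):
    """Per-class precision (diagonal / column sum) and
    recall (diagonal / row sum) from a square confusion matrix."""
    n = len(cm)
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    row_sums = [sum(row) for row in cm]
    precision = [cm[c][c] / col_sums[c] for c in range(n)]
    recall = [cm[r][r] / row_sums[r] for r in range(n)]
    return precision, recall

prec, rec = precision_recall(cm)
# prec[2] is about 0.916667 and rec[1] about 0.909091,
# matching the CLASS_3 precision and CLASS_2 recall shown above.
```

The single misclassification (one actual class-2 row predicted as class 3) accounts for both imperfect values.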
- Generate prediction on the test dataset using the second-best performing model.
>>> prediction = aml.predict(iris_test, 2)
Data Transformation started ...
Performing transformation carried out in feature engineering phase ...
Updated dataset after dropping irrelevant columns :
sepal_length  sepal_width  petal_length  petal_width  species
5.5           4.2          1.4           0.2          1
7.7           3.8          6.7           2.2          3
5.7           2.5          5.0           2.0          3
6.7           3.1          5.6           2.4          3
5.9           3.2          4.8           1.8          2
4.6           3.2          1.4           0.2          1
5.7           3.8          1.7           0.3          1
6.8           3.2          5.9           2.3          3
5.9           3.0          5.1           1.8          3
4.6           3.4          1.4           0.3          1
Updated dataset after performing target column transformation :
sepal_length  id  sepal_width  petal_width  petal_length  species
5.9           13  3.2          1.8          4.8           2
6.7           14  3.1          2.4          5.6           3
5.8           22  2.8          2.4          5.1           3
5.1           8   3.5          0.2          1.4           1
5.9           10  3.0          1.8          5.1           3
4.6           18  3.4          0.3          1.4           1
7.7           12  3.8          2.2          6.7           3
5.7           20  2.5          2.0          5.0           3
5.7           11  3.8          0.3          1.7           1
6.8           19  3.2          2.3          5.9           3
Performing transformation carried out in data preparation phase ...
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713844745578509"'
Updated dataset after performing Lasso feature selection:
id  petal_width  sepal_width  sepal_length  petal_length  species
19  2.3          3.2          6.8           5.9           3
22  2.4          2.8          5.8           5.1           3
35  1.6          3.4          6.0           4.5           2
26  0.4          3.4          5.0           1.6           1
36  1.8          3.0          6.1           4.9           3
28  1.4          3.1          6.7           4.4           2
15  0.5          3.3          5.1           1.7           1
32  1.2          3.0          5.7           4.2           2
38  2.1          3.0          7.1           5.9           3
12  2.2          3.8          7.7           6.7           3
Updated dataset after performing scaling on Lasso selected features :
id  species  petal_width          sepal_width          sepal_length         petal_length
17  2        0.5833333333333334   0.523809523809524    0.7058823529411765   0.6271186440677966
19  3        0.9166666666666666   0.5714285714285716   0.7352941176470588   0.8305084745762712
30  2        0.5                  0.23809523809523814  0.35294117647058826  0.5084745762711864
15  1        0.16666666666666669  0.6190476190476191   0.23529411764705874  0.11864406779661016
26  1        0.12500000000000003  0.6666666666666667   0.2058823529411765   0.1016949152542373
20  3        0.7916666666666666   0.23809523809523814  0.411764705882353    0.6779661016949152
36  3        0.7083333333333334   0.4761904761904763   0.5294117647058822   0.6610169491525424
28  2        0.5416666666666666   0.523809523809524    0.7058823529411765   0.576271186440678
38  3        0.8333333333333334   0.4761904761904763   0.8235294117647057   0.8305084745762712
12  3        0.8750000000000001   0.8571428571428572   1.0                  0.9661016949152542
Updated dataset after performing RFE feature selection:
id  petal_length  petal_width  species
26  1.6           0.4          1
22  5.1           2.4          3
35  4.5           1.6          2
36  4.9           1.8          3
19  5.9           2.3          3
30  4.0           1.3          2
15  1.7           0.5          1
32  4.2           1.2          2
38  5.9           2.1          3
12  6.7           2.2          3
Updated dataset after performing scaling on RFE selected features :
id  species  r_petal_length       r_petal_width
22  3        0.6949152542372881   0.9583333333333333
38  3        0.8305084745762712   0.8333333333333334
12  3        0.9661016949152542   0.8750000000000001
17  2        0.6271186440677966   0.5833333333333334
19  3        0.8305084745762712   0.9166666666666666
30  2        0.5084745762711864   0.5
15  1        0.11864406779661016  0.16666666666666669
32  2        0.5423728813559322   0.4583333333333333
36  3        0.6610169491525424   0.7083333333333334
28  2        0.576271186440678    0.5416666666666666
Updated dataset after performing scaling for PCA feature selection :
id  species  sepal_length         sepal_width          petal_length         petal_width
17  2        0.7058823529411765   0.523809523809524    0.6271186440677966   0.5833333333333334
15  1        0.23529411764705874  0.6190476190476191   0.11864406779661016  0.16666666666666669
32  2        0.411764705882353    0.4761904761904763   0.5423728813559322   0.4583333333333333
36  3        0.5294117647058822   0.4761904761904763   0.6610169491525424   0.7083333333333334
22  3        0.4411764705882352   0.38095238095238093  0.6949152542372881   0.9583333333333333
35  2        0.5                  0.6666666666666667   0.5932203389830508   0.625
19  3        0.7352941176470588   0.5714285714285716   0.8305084745762712   0.9166666666666666
30  2        0.35294117647058826  0.23809523809523814  0.5084745762711864   0.5
38  3        0.8235294117647057   0.4761904761904763   0.8305084745762712   0.8333333333333334
12  3        1.0                  0.8571428571428572   0.9661016949152542   0.8750000000000001
Updated dataset after performing PCA feature selection :
   id  col_0      col_1      species
0  26  -0.552814  -0.064726  1
1  17  0.306241   -0.131285  2
2  22  0.488587   0.103787   3
3  19  0.641246   -0.184527  3
4  38  0.648012   -0.133717  3
5  36  0.333749   -0.014838  3
6  15  -0.493909  -0.034387  1
7  20  0.388955   0.248874   3
8  34  -0.640616  0.115578   1
9  35  0.190191   -0.175907  2
Data Transformation completed.
Following model is being used for generating prediction :
Model ID : XGBOOST_1
Feature Selection Method : rfe
Prediction :
   id  Prediction  Confidence_Lower  Confidence_upper  species
0  17  2           0.625             0.625             2
1  38  3           1.000             1.000             3
2  12  3           1.000             1.000             3
3  36  2           0.500             0.500             3
4  22  3           0.875             0.875             3
5  35  2           0.625             0.625             2
6  19  3           1.000             1.000             3
7  30  2           1.000             1.000             2
8  15  1           0.875             0.875             1
9  32  2           1.000             1.000             2
Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall    F1        Support
SeqNum
0       1           CLASS_1  8        0        0        1.000000   1.000000  1.000000  8
2       3           CLASS_3  0        0        10       1.000000   0.909091  0.952381  11
1       2           CLASS_2  0        11       1        0.916667   1.000000  0.956522  11
Confusion Matrix :
array([[ 8,  0,  0],
       [ 0, 11,  0],
       [ 0,  1, 10]], dtype=int64)
>>> prediction.head()
id   Prediction  Confidence_Lower  Confidence_upper  species
10   3           0.875             0.875             3
12   3           1.0               1.0               3
13   2           0.5               0.5               2
14   3           1.0               1.0               3
16   2           1.0               1.0               2
17   2           0.625             0.625             2
15   1           0.875             0.875             1
11   1           0.875             0.875             1
9    1           0.875             0.875             1
8    1           0.875             0.875             1