After the teradataml DataFrames are created, create the required Dataset objects. The following example creates two Dataset objects, one for training and the other for validation. Note that, for distributed (multi-model) training, teradataml introduces a new argument, partition_columns.
- The teradataml DataFrames that are passed should be created from the same parent teradataml DataFrame using the select() API.
- The partition columns should be present in the parent DataFrame from which the required DataFrames are derived, as illustrated in the sketch after this list.
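A minimal sketch of deriving such DataFrames is shown below. The table name "classif_data", the feature and label column names, and the number of features are illustrative assumptions, not part of the example that follows; the parent DataFrame is assumed to already contain partition_column_1 and partition_column_2.

# Hypothetical parent DataFrame; assumes the table "classif_data" holds the features,
# the label, and both partition columns.
>>> from teradataml import DataFrame
>>> df_classif = DataFrame("classif_data")
# Feature and label DataFrames derived from the same parent using select().
>>> df_x_classif = df_classif.select(["feature_1", "feature_2", "feature_3", "feature_4"])
>>> df_y_classif = df_classif.select(["label"])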
# Training Dataset.
>>> obj_m = td_lightgbm.Dataset(df_x_classif, df_y_classif, silent=True,
                                partition_columns=["partition_column_1", "partition_column_2"],
                                free_raw_data=False)
>>> obj_m
   partition_column_1  partition_column_2                                               model
0                   1                  11  <lightgbm.basic.Dataset object at 0x7f555198e640>
1                   0                  11  <lightgbm.basic.Dataset object at 0x7f55519523a0>
2                   1                  10  <lightgbm.basic.Dataset object at 0x7f5551968eb0>
3                   0                  10  <lightgbm.basic.Dataset object at 0x7f55518f6430>

# Validation dataset.
>>> obj_m_v = td_lightgbm.Dataset(df_x_classif, df_y_classif, free_raw_data=False,
                                  partition_columns=["partition_column_1", "partition_column_2"])
>>> obj_m_v
   partition_column_1  partition_column_2                                               model
0                   1                  11  <lightgbm.basic.Dataset object at 0x7f555198e9d0>
1                   0                  11  <lightgbm.basic.Dataset object at 0x7f55684c6d90>
2                   1                  10  <lightgbm.basic.Dataset object at 0x7f55518de640>
3                   0                  10  <lightgbm.basic.Dataset object at 0x7f55518dec40>
After creating the Dataset objects, run the training function with the record_evaluation callback:
# Training with valid_sets and callbacks argument.
>>> rec = {}
>>> opt_tr_m = td_lightgbm.train(params={}, train_set=obj_m, num_boost_round=30,
                                 callbacks=[td_lightgbm.record_evaluation(rec)],
                                 valid_sets=[obj_m_v, obj_m_v])
>>> opt_tr_m
   partition_column_1  partition_column_2  \
0                   1                  11
1                   0                  11
2                   1                  10
3                   0                  10

                                                model  \
0  <lightgbm.basic.Booster object at 0x7f55518bd040>
1  <lightgbm.basic.Booster object at 0x7f55500a6be0>
2  <lightgbm.basic.Booster object at 0x7f55500a61c0>
3  <lightgbm.basic.Booster object at 0x7f55500a6670>

                                       console_output  \
0  [LightGBM] [Warning] Auto-choosing col-wise mu...
1  [LightGBM] [Warning] Auto-choosing col-wise mu...
2  [LightGBM] [Warning] Auto-choosing col-wise mu...
3  [LightGBM] [Warning] Auto-choosing row-wise mu...

                             record_evaluation_result
0  {'valid_0': {'l2': [0.21963737683498566, 0.196...
1  {'valid_0': {'l2': [0.2229904865477988, 0.2008...
2  {'valid_0': {'l2': [0.2151413809523807, 0.1919...
3  {'valid_0': {'l2': [0.2195184911242605, 0.1948...
Since multiple models are involved, each trained model has its own console output and record_evaluation result. These are provided in a pandas DataFrame that can be accessed using the model_info attribute.
Access the pandas DataFrame, and then the individual record evaluation results or console outputs:
>>> opt_tr_m.model_info
   partition_column_1  partition_column_2                                   model                                      console_output                            record_evaluation_result
0                   1                  11  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing col-wise mu...  {'valid_0': {'l2': [0.21963737683498566, 0.196...
1                   0                  11  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing col-wise mu...  {'valid_0': {'l2': [0.2229904865477988, 0.2008...
2                   1                  10  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing col-wise mu...  {'valid_0': {'l2': [0.2151413809523807, 0.1919...
3                   0                  10  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing row-wise mu...  {'valid_0': {'l2': [0.2195184911242605, 0.1948...

# console output.
>>> print(opt_tr_m.model_info.iloc[0]["console_output"])
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000037 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 136
[LightGBM] [Info] Number of data points in the train set: 97, number of used features: 4
[LightGBM] [Info] Start training from score 0.556701
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
...
...
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

# record_evaluation result.
>>> print(opt_tr_m.model_info.iloc[0]["record_evaluation_result"])
{'valid_0': OrderedDict([('l2', [0.21963737683498566, 0.1965251124298607, ..., 0.06526645509697025])]),
 'valid_1': OrderedDict([('l2', [0.21963737683498566, 0.1965251124298607, ..., 0.06526645509697025])])}
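The fitted Booster for any partition combination can be retrieved from the same DataFrame through its model column; the following sketch assumes the first row is the partition of interest.

# Booster trained for the first partition combination.
>>> bst = opt_tr_m.model_info.iloc[0]["model"]
>>> bst
<lightgbm.basic.Booster object at 0x7f55518bd040>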