Once the required teradataml DataFrames are created, you need to create Dataset objects.
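For context, the feature and label teradataml DataFrames used below could be derived from a table in Vantage. The following is a minimal sketch using teradataml's DataFrame and select() APIs; the table name classif_data and the column names are hypothetical placeholders, not part of the example that follows.
>>> from teradataml import DataFrame, td_lightgbm
>>> df = DataFrame("classif_data")  # hypothetical Vantage table
>>> df_x_classif = df.select(["col1", "col2", "col3", "col4"])  # feature columns (illustrative names)
>>> df_y_classif = df.select(["label"])  # label column (illustrative name)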
>>> obj_s1 = td_lightgbm.Dataset(df_x_classif, df_y_classif, silent=True, free_raw_data=False)
>>> obj_s1
<lightgbm.basic.Dataset object at 0x7f5553c28c40>
After creating the Dataset object, run the training function with the record_evaluation and early_stopping callbacks:
>>> rec = {}  # Empty dictionary to pass to the record_evaluation callback.
>>> # Training with the valid_sets and callbacks arguments.
>>> opt_tr_s = td_lightgbm.train(params={}, train_set=obj_s1, num_boost_round=30, callbacks=[td_lightgbm.record_evaluation(rec), td_lightgbm.early_stopping(3)], valid_sets=[obj_s1])
>>> opt_tr_s
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000065 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 532
[LightGBM] [Info] Number of data points in the train set: 400, number of used features: 4
Training until validation scores don't improve for 3 rounds
Did not meet early stopping. Best iteration is:
[30] valid_0's l2: 0.0416953
<lightgbm.basic.Booster object at 0x7f5553d235b0>
Like the train() function of lightgbm, OpensourceML's td_lightgbm also displays the console output and returns a Booster object. However, this Booster object is not lightgbm's Booster object; it is OpensourceML's internal Booster wrapper object.
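Because the wrapper is intended to mirror lightgbm's Booster interface, its methods can be used in the familiar way. The following is a hedged sketch, assuming the wrapper forwards predict() to lightgbm's Booster and accepts a teradataml DataFrame of features; the exact arguments may differ.
>>> # Sketch only: assumes predict() is exposed by the wrapper as in lightgbm.
>>> predictions = opt_tr_s.predict(df_x_classif)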
The differences between the training functionality of lightgbm and td_lightgbm are as follows:
- lightgbm populates the record evaluation results in the dictionary passed to the record_evaluation callback.
- td_lightgbm instead provides the record_evaluation_result attribute on the object returned by the train() function, which can be accessed as shown in the following example:
>>> opt_tr_s.record_evaluation_result
{'valid_0': OrderedDict([('l2', [0.21581071275252517, 0.18813848372931546, 0.16614597654803748,
...
0.04169529314351532])])}
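The record_evaluation_result attribute is a plain nested dictionary, so individual metric histories can be read with ordinary dictionary access. For example, based on the output shown above, the per-iteration l2 values for valid_0 can be pulled out as follows:
>>> l2_history = opt_tr_s.record_evaluation_result['valid_0']['l2']
>>> len(l2_history)  # one value per boosting round
30
>>> l2_history[-1]  # l2 at the final (best) iteration
0.04169529314351532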