Multi model training

After the teradataml DataFrames are created, create the required Dataset objects. The following example creates two Dataset objects: one for training and one for validation. Note that, in these examples, teradataml introduces a new argument, partition_columns, for distributed/multi model training.

  • The teradataml DataFrames that are passed should be created from the same parent teradataml DataFrame using the select() API, as in the sketch below.
  • The partition columns must be present in the parent DataFrame from which the required DataFrames are derived.
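
The following is a minimal sketch of how such DataFrames might be derived. The table name (classif_data) and the feature and label column names are hypothetical, and the sketch assumes the partition columns are carried along in both selections; only the shared-parent select() pattern follows directly from the notes above.

# Hypothetical setup: both DataFrames derived from one parent via select().
>>> from teradataml import DataFrame, td_lightgbm
>>> df = DataFrame("classif_data")  # parent teradataml DataFrame
>>> df_x_classif = df.select(["feature_1", "feature_2", "feature_3", "feature_4",
...                           "partition_column_1", "partition_column_2"])
>>> df_y_classif = df.select(["label",
...                           "partition_column_1", "partition_column_2"])
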
# Training Dataset.
>>> obj_m = td_lightgbm.Dataset(df_x_classif, df_y_classif, silent=True,
                                partition_columns=["partition_column_1", "partition_column_2"],
                                free_raw_data=False)
>>> obj_m
   partition_column_1  partition_column_2                                              model
0                   1                  11  <lightgbm.basic.Dataset object at 0x7f555198e640>
1                   0                  11  <lightgbm.basic.Dataset object at 0x7f55519523a0>
2                   1                  10  <lightgbm.basic.Dataset object at 0x7f5551968eb0>
3                   0                  10  <lightgbm.basic.Dataset object at 0x7f55518f6430>

# Validation dataset.
>>> obj_m_v = td_lightgbm.Dataset(df_x_classif, df_y_classif, free_raw_data=False,
                                  partition_columns=["partition_column_1", "partition_column_2"])
>>> obj_m_v
   partition_column_1  partition_column_2                                              model
0                   1                  11  <lightgbm.basic.Dataset object at 0x7f555198e9d0>
1                   0                  11  <lightgbm.basic.Dataset object at 0x7f55684c6d90>
2                   1                  10  <lightgbm.basic.Dataset object at 0x7f55518de640>
3                   0                  10  <lightgbm.basic.Dataset object at 0x7f55518dec40>

After creating the Dataset objects, run the training function with the record_evaluation callback:

# Training with the valid_sets and callbacks arguments.
>>> rec = {}
>>> opt_tr_m = td_lightgbm.train(params={}, train_set=obj_m, num_boost_round=30,
                                 callbacks=[td_lightgbm.record_evaluation(rec)],
                                 valid_sets=[obj_m_v, obj_m_v])
>>> opt_tr_m
   partition_column_1  partition_column_2  \
0                   1                  11   
1                   0                  11   
2                   1                  10   
3                   0                  10   

                                               model  \
0  <lightgbm.basic.Booster object at 0x7f55518bd040>   
1  <lightgbm.basic.Booster object at 0x7f55500a6be0>   
2  <lightgbm.basic.Booster object at 0x7f55500a61c0>   
3  <lightgbm.basic.Booster object at 0x7f55500a6670>   

                                      console_output  \
0  [LightGBM] [Warning] Auto-choosing col-wise mu...   
1  [LightGBM] [Warning] Auto-choosing col-wise mu...   
2  [LightGBM] [Warning] Auto-choosing col-wise mu...   
3  [LightGBM] [Warning] Auto-choosing row-wise mu...   

                            record_evaluation_result  
0  {'valid_0': {'l2': [0.21963737683498566, 0.196...  
1  {'valid_0': {'l2': [0.2229904865477988, 0.2008...  
2  {'valid_0': {'l2': [0.2151413809523807, 0.1919...  
3  {'valid_0': {'l2': [0.2195184911242605, 0.1948...

Since there are multiple models involved, each trained model has its own console output and record_evaluation result. These are provided in a pandas DataFrame and can be accessed using the model_info attribute.

Access the pandas DataFrame and the individual record evaluation results or console outputs:

>>> opt_tr_m.model_info
   partition_column_1  partition_column_2                                   model                                     console_output                           record_evaluation_result
0                   1                  11  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing col-wise mu...  {'valid_0': {'l2': [0.21963737683498566, 0.196...
1                   0                  11  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing col-wise mu...  {'valid_0': {'l2': [0.2229904865477988, 0.2008...
2                   1                  10  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing col-wise mu...  {'valid_0': {'l2': [0.2151413809523807, 0.1919...
3                   0                  10  <lightgbm.basic.Booster object at ...>  [LightGBM] [Warning] Auto-choosing row-wise mu...  {'valid_0': {'l2': [0.2195184911242605, 0.1948...
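
To fetch the artifacts for a specific partition, ordinary pandas filtering on model_info works. The following is a short sketch using the partition values (1, 11) from the first row above:

# Plain pandas boolean indexing on the model_info DataFrame.
>>> info = opt_tr_m.model_info
>>> row = info[(info["partition_column_1"] == 1)
...            & (info["partition_column_2"] == 11)].iloc[0]
>>> row["model"]
<lightgbm.basic.Booster object at ...>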

# Console output.
>>> print(opt_tr_m.model_info.iloc[0]["console_output"])
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000037 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 136
[LightGBM] [Info] Number of data points in the train set: 97, number of used features: 4
[LightGBM] [Info] Start training from score 0.556701
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
...
...
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

# record_evaluation result.
>>> print(opt_tr_m.model_info.iloc[0]["record_evaluation_result"])
{'valid_0': OrderedDict([('l2', [0.21963737683498566, 0.1965251124298607, ..., 0.06526645509697025])]), 'valid_1': OrderedDict([('l2', [0.21963737683498566, 0.1965251124298607, ..., 0.06526645509697025])])}
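
Because record_evaluation_result holds plain dictionaries of per-round metric values, standard Python and pandas access applies. A small sketch, assuming the run above (num_boost_round=30, so record_evaluation collects 30 values per metric):

# Standard pandas/dict access; values match the truncated output above.
>>> l2_curve = opt_tr_m.model_info.iloc[0]["record_evaluation_result"]["valid_0"]["l2"]
>>> len(l2_curve)  # one value per boosting round
30
>>> l2_curve[0], l2_curve[-1]
(0.21963737683498566, 0.06526645509697025)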