Execute lightGBM functions using standard supported arguments

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Product Category: Teradata Vantage

Using training APIs for training

To run the training APIs train() and cv(), create a Dataset object, then pass the object as input to these functions. The following subsections cover both single-model and multi-model training, and highlight the differences between training with actual lightGBM and with td_lightgbm.

Whether it is single-model training or multi-model training, you must first create the required teradataml DataFrames. These DataFrames should be created from the same parent teradataml DataFrame using the select() API.

# Load the example dataset.
>>> load_example_data("openml", ["multi_model_classification", "multi_model_regression"])

## Classification data.
# Create DataFrames.
>>> df_train_classif = DataFrame('multi_model_classification')
>>> df_train_classif
       col1          col2          col3          col4  label  group_column  partition_column_1  partition_column_2
-2.86976401   -1.45650077  -0.096175758  -1.831834743      0            11                   1                  11
-2.69651383  -1.244829178  -0.110505514  -1.670524168      0             9                   0                  10
0.939206169  -0.618203487   0.209642108   0.150728526      0            11                   0                  10
1.413963995   0.329342266   0.110572078   0.743405742      0            10                   0                  11
0.599510805   1.298906549  -0.141761534   0.790378168      1             8                   1                  10
-1.17273888    0.05006965  -0.144305532  -0.484090357      1            12                   1                  11
1.087219107  -1.006162386   0.289957882   0.055393647      0             9                   1                  10
0.669401117   0.965276402  -0.079356706   0.683697322      1             8                   1                  10
1.441799942  -0.007863088    0.16867607   0.617164007      0             9                   1                  10
1.619580063   0.479070866   0.110079835   0.893252734      1             8                   0                  10

# Create required classification DataFrames.
>>> df_x_classif = df_train_classif.select(["col1", "col2", "col3", "col4"])
>>> df_y_classif = df_train_classif.select("label")
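
For example, the classification DataFrames created above can then be fed to the training APIs. The following is a minimal sketch, assuming td_lightgbm exposes the same Dataset and train signatures as the open-source lightgbm package; the empty params dictionary and the num_boost_round value are illustrative placeholders, not required settings.

# Import the td_lightgbm interface.
>>> from teradataml import td_lightgbm

# Create a Dataset object from the feature and label DataFrames.
>>> obj_classif = td_lightgbm.Dataset(df_x_classif, df_y_classif, free_raw_data=False)

# Pass the Dataset object to train(); params and num_boost_round shown here are illustrative.
>>> model = td_lightgbm.train(params={}, train_set=obj_classif, num_boost_round=30)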

## Regression data.
# Create DataFrames.
>>> df_train_reg = DataFrame('multi_model_regression')
>>> df_train_reg.head(4)
       col1          col2          col3          col4  label  group_column  partition_column_1  partition_column_2
-2.86976401   -1.45650077  -0.096175758  -1.831834743   -141            11                   1                  11
-2.69651383  -1.244829178  -0.110505514  -1.670524168    -89             9                   0                  10
0.939206169  -0.618203487   0.209642108   0.150728526   -137            11                   0                  10
1.413963995   0.329342266   0.110572078   0.743405742      8            10                   0                  11

# Create required regression DataFrames.
>>> df_x_reg = df_train_reg.select(["col1", "col2", "col3", "col4"])
>>> df_y_reg = df_train_reg.select("label")
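
Similarly, the regression DataFrames can be wrapped in a Dataset object and passed to train() or cv(). This sketch rests on the same assumption that td_lightgbm mirrors the lightgbm functional API; the objective parameter and nfold value are illustrative.

# Create a Dataset object from the regression DataFrames.
>>> obj_reg = td_lightgbm.Dataset(df_x_reg, df_y_reg, free_raw_data=False)

# Pass the Dataset object to cv() for cross validation; params and nfold shown here are illustrative.
>>> cv_results = td_lightgbm.cv(params={"objective": "regression"}, train_set=obj_reg, num_boost_round=30, nfold=5)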