td_lightgbm (teradataml open-source machine learning functions)

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2025-01-23
Product Category: Teradata Vantage

teradataml OpenSourceML exposes the LightGBM package through the interface object td_lightgbm. Use this object to run all supported LightGBM functions with the same syntax and arguments as in LightGBM itself. The functions run in Vantage using its MPP capabilities, so the data is not pulled to the client.

teradataml OpenSourceML's td_lightgbm trains and scores models using either a single-model approach or a distributed (multi-model) approach. Note the following when working with LightGBM in distributed model training:
  • teradataml OpenSourceML adds the argument partition_columns, which can be used with any LightGBM function.
  • The partition_columns argument accepts the names of the columns used for partitioning.
  • A separate model is generated for each unique partition.
  • The specified column names must be present in the parent teradataml DataFrame from which the input teradataml DataFrames are derived.
  • If the parent DataFrame does not contain the columns, teradataml raises an exception.
  • When fit() or train() has generated distributed models, one per unique partition, partition_columns is optional in predict and other functions. If the argument is omitted, teradataml OpenSourceML uses the partition_columns stored with the trained model.
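
The partitioning rules above can be illustrated with a small, self-contained sketch in plain Python. It is a conceptual model only and is not teradataml's implementation: the class PartitionedMeanModel, its fit and predict signatures, and the sample rows are hypothetical, and the per-partition "model" is simply the mean of the target column. It shows one model per unique partition, an exception when a partition column is missing, and predict falling back to the partition columns stored at training time.

```python
from collections import defaultdict
from statistics import mean


class PartitionedMeanModel:
    """Hypothetical toy model; a stand-in for per-partition training, not teradataml code."""

    def __init__(self):
        self.partition_columns = None
        self.models = {}

    def fit(self, rows, target, partition_columns):
        # Every partition column must exist in the input data, otherwise raise.
        missing = [c for c in partition_columns if any(c not in r for r in rows)]
        if missing:
            raise ValueError(f"partition columns not found: {missing}")
        # Group target values by the tuple of partition column values.
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[c] for c in partition_columns)
            groups[key].append(row[target])
        # One "model" (here just the target mean) per unique partition.
        self.models = {key: mean(vals) for key, vals in groups.items()}
        # Remember the partition columns so predict() can reuse them.
        self.partition_columns = list(partition_columns)
        return self

    def predict(self, rows, partition_columns=None):
        # If partition_columns is omitted, use the ones stored at fit time.
        cols = partition_columns or self.partition_columns
        return [self.models[tuple(row[c] for c in cols)] for row in rows]


# Hypothetical sample data with a single partition column, "region".
rows = [
    {"region": "east", "x": 1.0, "y": 10.0},
    {"region": "east", "x": 2.0, "y": 20.0},
    {"region": "west", "x": 1.0, "y": 100.0},
]
model = PartitionedMeanModel().fit(rows, target="y", partition_columns=["region"])

# partition_columns is not passed here; the stored value is used.
predictions = model.predict([{"region": "west", "x": 5.0}, {"region": "east", "x": 5.0}])
```

In this sketch two models are built, one for "east" and one for "west", and each prediction row is routed to the model for its own partition. With td_lightgbm the same idea applies to real LightGBM models trained in Vantage.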

The following sections describe how to use teradataml's td_lightgbm to run the supported LightGBM functions (Dataset, Booster, train, cv, and all scikit-learn functions) to generate either a single model or, through the partition_columns argument, distributed (multi-model) models. They also provide supportability information.