TD_XGBoost Function | XGBoost | Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

The TD_XGBoost function, also known as eXtreme Gradient Boosting, is an implementation of the gradient boosted decision tree algorithm designed for speed and performance, and it is widely used in applied machine learning.

In gradient boosting, each iteration fits a model to the residuals (errors) of the previous iteration to correct the errors made by the existing models. The predicted residual is multiplied by a learning rate and then added to the previous prediction. Models are added sequentially until no further improvement can be made. The technique is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models.
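
In generic notation (not specific to TD_XGBoost parameter names), the additive update at iteration m can be written as:

    F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)

where F_{m-1}(x) is the model built from the first m-1 iterations and h_m(x) is the weak learner fitted to its residuals.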

Gradient boosting involves three elements:
  • A loss function to be optimized.
  • A weak learner to make predictions.
  • An additive model to add weak learners to minimize the loss function.
The loss function used depends on the type of problem being solved. For example, regression may use a squared error loss and binary classification may use a binomial (logistic) loss. A benefit of the gradient boosting framework is that a new boosting algorithm does not have to be derived for each loss function; the framework is generic enough that any differentiable loss function can be used. The TD_XGBoost function supports both regression and classification predictive modeling problems. The model that it creates is used by the TD_XGBoostPredict function to make predictions.
  • Regression: The prediction is based on continuous values. XGBoost regression calculates the difference between the current prediction and the known correct target value. This difference is called the residual. XGBoost regression then trains a weak model that maps features to that residual. The residual predicted by the weak model is added to the existing model's prediction, which nudges the model toward the correct target. Repeating this step improves the overall model prediction.
  • Classification: Similar to regression, XGBoost uses regression trees for classification. In this case, the odds (the ratio between the number of events and non-events) are converted to a probability, the prediction is expressed as log odds, and the residuals are computed from the resulting probability. For example, if your data contains three spam emails and two non-spam emails, the odds are 3:2, that is, 1.5 in decimal notation.
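
Continuing the spam example, the standard log-loss bookkeeping works out as follows (a generic illustration of the gradient boosting arithmetic, not TD_XGBoost output):

    odds        = 3 / 2               = 1.5
    probability = odds / (1 + odds)   = 1.5 / 2.5 = 0.6
    log odds    = ln(1.5)             ≈ 0.405  (the initial prediction in log-odds space)
    residual    = label - probability = 1 - 0.6 = 0.4   for a spam email
                                        0 - 0.6 = -0.6  for a non-spam email
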
The TD_XGBoost function supports the following features and usage notes.
  • Regression
  • Multiple class and binary classification
  • When a dataset is small, best practice is to distribute the data to one AMP. To do this, create an identifier column as a primary index, and use the same value for each row.
  • For Classification (softmax), a maximum of 500 classes is supported.
  • For Classification, when using a SELECT statement as the function input, the SELECT statement must have a deterministic output. Otherwise, the function may not run successfully or may not return correct output. For example, do not use an ON clause such as "SELECT TOP 500 * FROM table_t", because TOP does not guarantee the same rows on every execution.
  • The processing time is controlled by (proportional to):
    • The number of boosted trees (controlled by NumBoostedTrees, TreeSize, and CoverageFactor).
    • The number of iterations (sub-trees) in each boosted tree (controlled by IterNum).
    • The complexity of an iteration (controlled by MaxDepth, MinNodeSize, ColumnSampling, MinImpurity).

    A careful choice of these parameters controls the processing time. For example, changing CoverageFactor from 1.0 to 2.0 doubles the number of boosted trees, which roughly doubles the execution time.
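
    As a rough sketch of how these parameters appear in a function call (the ResponseColumn, InputColumns, and ModelType arguments and the table and column names are illustrative assumptions; see the TD_XGBoost syntax reference for the exact signature):

      SELECT * FROM TD_XGBoost (
        ON housing_train AS InputTable
        USING
          ResponseColumn ('price')
          InputColumns ('bedrooms', 'bathrooms', 'sqft')
          ModelType ('Regression')
          MaxDepth (5)            -- complexity of each iteration
          MinNodeSize (1)
          ColumnSampling (1.0)
          IterNum (3)             -- sub-trees in each boosted tree
          CoverageFactor (1.0)    -- 2.0 would roughly double the tree count and run time
      ) AS dt;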

It is recommended that you redistribute the input table rows to fewer AMPs (or to a single AMP) in the following cases, for improved training results:
  1. Classification with imbalanced datasets: When dealing with classification tasks and encountering an imbalanced dataset, such as having 1% rows for minority class 1 and 99% rows labeled as majority class 0, it is advisable to reduce the number of AMPs that hold the input table rows. This redistribution increases the likelihood of each AMP containing minority class data rows, allowing for better training of sub-tree models on the minority class.
  2. Large cluster with smaller datasets: In situations where the cluster consists of a large number of AMPs (for example, 200 AMPs) and the dataset size (total number of input table rows) is relatively small (for example, 1000 or fewer rows), redistributing the data rows to fewer AMPs can improve the quality of the model. This redistribution to fewer AMPs enables better training of sub-tree models because each model trains on a more substantial training sample.
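
A minimal sketch of this redistribution technique, assuming illustrative table and column names: a constant-valued identifier column used as the primary index hashes every row to the same AMP.

    -- Copy the training data into a table whose primary index is a constant,
    -- so that every row hashes to the same AMP.
    CREATE TABLE housing_train_1amp AS (
      SELECT 1 AS part_id, t.*
      FROM housing_train AS t
    ) WITH DATA
    PRIMARY INDEX (part_id);

Using a handful of distinct part_id values instead of a single constant spreads the rows across roughly that many AMPs, which covers the fewer-AMPs case described above.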