XGBoost Functions (ML Engine) - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

The XGBoost (ML Engine) function trains a classification or regression model using gradient boosting with decision trees as the base classifiers. It has a corresponding predict function, XGBoostPredict (ML Engine).

In gradient boosting, each iteration fits a model to the residuals (errors) of the previous iteration. It also provides a general framework for adding a loss function and a regularization term.
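The residual-fitting idea above can be sketched in a few lines of Python. This is a minimal illustration of gradient boosting with depth-1 decision trees (stumps) under squared-error loss, not the ML Engine implementation; all function names here are hypothetical.

```python
import numpy as np

def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (stump) minimizing squared error."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    n, d = X.shape
    for j in range(d):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if mask.all() or not mask.any():
                continue  # split must leave rows on both sides
            left = residuals[mask].mean()
            right = residuals[~mask].mean()
            sse = (((residuals[mask] - left) ** 2).sum()
                   + ((residuals[~mask] - right) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    return best[1:]

def stump_predict(stump, X):
    j, t, left, right = stump
    return np.where(X[:, j] <= t, left, right)

def gradient_boost(X, y, rounds=20, shrinkage=0.3):
    """Each round fits a stump to the residuals of the current ensemble."""
    pred = np.full(len(y), y.mean())      # start from the mean prediction
    stumps = []
    for _ in range(rounds):
        residuals = y - pred              # negative gradient of MSE loss
        stump = fit_stump(X, residuals)
        pred += shrinkage * stump_predict(stump, X)  # shrinkage damps each step
        stumps.append(stump)
    return stumps, pred
```

With each round, the residuals shrink, so the ensemble's training error decreases monotonically; the `shrinkage` factor corresponds to the shrinkage feature listed below.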

The ML Engine implementation of the XGBoost algorithm includes:
  • Loss functions:
    • Binomial (for binary classification)
    • Softmax (for multiple-class classification)
    • Mean squared error (MSE) (for regression)
  • L2 regularization
  • Shrinkage
  • Column subsampling
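To illustrate how L2 regularization and column subsampling enter the algorithm, here is a hedged sketch based on the XGBoost paper cited below (not the ML Engine code). In the paper, the optimal weight of a leaf with gradient sum G and Hessian sum H is w* = -G / (H + λ), where λ is the L2 regularization term; column subsampling simply restricts each tree to a random subset of features. The helper names are assumptions for this example.

```python
import numpy as np

def leaf_weight(grads, hess, reg_lambda=1.0):
    """Optimal leaf weight with L2 regularization, per the XGBoost paper:
    w* = -sum(g) / (sum(h) + lambda).  Larger lambda shrinks leaf weights."""
    return -grads.sum() / (hess.sum() + reg_lambda)

def subsample_columns(X, frac=0.8, seed=0):
    """Column subsampling: each tree sees only a random subset of features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    cols = rng.choice(d, size=max(1, int(frac * d)), replace=False)
    return X[:, np.sort(cols)], np.sort(cols)
```

For squared-error loss the per-row gradient is (prediction - label) and the Hessian is 1, so with λ = 0 the leaf weight reduces to the mean residual, matching the plain gradient boosting view.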

Row subsampling is implemented by randomly partitioning the input data set among the available vworkers. Because the input data set is distributed across vworkers, the function trains multiple gradient boosting trees in parallel, each on a subset of the data, and combines their results into a final prediction by majority vote.
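The partition-and-vote scheme can be sketched as follows. This is an illustrative Python model of the idea, assuming each "worker" is just a row partition and each per-worker model outputs class labels; it is not the vworker implementation itself.

```python
import numpy as np
from collections import Counter

def partition_rows(X, y, n_workers, seed=0):
    """Randomly partition rows among workers (row subsampling by partitioning)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    return [(X[p], y[p]) for p in np.array_split(idx, n_workers)]

def majority_vote(predictions):
    """Combine per-worker class predictions column-wise by majority vote.

    `predictions` has shape (n_workers, n_rows); the result is one label
    per row, the class predicted by the most workers."""
    preds = np.asarray(predictions)
    return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])
```

Each worker would train its own boosted ensemble on its partition; at scoring time every worker predicts each row, and `majority_vote` picks the final label.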

You can use the XGBoost functions to create prediction input for the Receiver Operating Characteristic (ROC) (ML Engine) function.

For a general description of gradient boosting, see https://statweb.stanford.edu/~jhf/ftp/trebst.pdf. For more details about the XGBoost algorithm, see http://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf.

Function Description
XGBoost (ML Engine) Takes a training data set in dense or sparse format and uses gradient boosting to create a strong classification model.
XGBoostPredict (ML Engine) Applies the model output by XGBoost (ML Engine) to a new data set, outputting a predicted label for each data point.