XGBoost Example: Regression, Sparse Format | Teradata Vantage - XGBoost Example: Regression, Sparse Format - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

Input

  • InputTable: boston_sparse, created by inputting boston (InputTable for XGBoost Example: Regression, Dense Format) to the Unpivoting function:
    CREATE MULTISET TABLE boston_sparse AS (
      SELECT * FROM Unpivoting (
        ON boston
        USING
        TargetColumns ('crim','zn','indus','chas','nox','rm',
          'age','dis','rad','tax','ptratio','black','lstat')
        AttributeColumn ('attribute')
        ValueColumn ('value1')
        InputTypes ('false')
        Accumulate ('id', 'medv')
      ) AS dt
    ) WITH data;
    id  medv attribute value1
    --- ---- --------- ------
    265   36 crim           1
    265   36 ptratio       13
    265   36 lstat          8
    265   36 age           92
    265   36 black        388
    265   36 dis            2
    265   36 nox            1
    265   36 chas           0
    265   36 zn            20
    244   24 lstat          5
    244   24 rad            6
    244   24 black        375
    244   24 rm             6
    244   24 age            8
    244   24 dis            7
    244   24 zn            30
    265   36 indus          4
    265   36 rm             7
    265   36 rad            5
    265   36 tax          264
  • AttributeTable: sparse_boston_attributes
    attributename attributetype
    ------------- -------------
    dis                       1
    rad                       1
    ptratio                   1
    age                       1
    chas                      1
    nox                       1
    zn                        1
    crim                      1
    indus                     1
    tax                       1
    rm                        1
    black                     1

SQL Call

SELECT *  FROM XGBoost (
  ON boston AS InputTable
  ON sparse_boston_attributes AS AttributeTable
  OUT TABLE OutputTable (xgboost_regression_model)
  USING
  ResponseColumn ('medv')
  PredictionType ('regression')

  LossFunction ('mse')
  AttributeNameColumn ('attribute')
  AttributeValueColumn ('value1')
  IterNum (5)
  MaxDepth (10)
  MinNodeSize (1)
  RegularizationLambda ('1')
  ShrinkageFactor ('0.1')
  IDColumn ('id')
  NumBoostedTrees ('1')
) AS dt;

Output

message
----------------------------------------------------------------
Parameters:
Number of boosting iterations : 5
Number of boosted trees : 2
Number of total trees (all subtrees): 10
Prediction Type : REGRESSION
LossFunction : MSE
Regularization : 1.0
Shrinkage : 0.1
MaxDepth : 10
MinNodeSize : 1
Variance : 0.0
Seed : null
ColumnSubSampling Features: 12
XGBoost model created in table specified in OutputTable argument

This query returns the following table:

SELECT tree_id, iter, class_num, CAST (tree AS VARCHAR(30)) AS tree,
  CAST(region_prediction AS VARCHAR(30)) AS region_pred
    FROM xgboost_regression_model ORDER BY 1,2;

The column tree is shown truncated to 80 characters.

tree_id iter class_num tree
------- ---- --------- --------------------------------------------------------
      0    4         0 {"sum_":-7.259742420000081,"sumSq_":20959.84902615756,"s
      0    2         0 {"sum_":-4.656698140000127,"sumSq_":20907.042531013678,"
      0    3         0 {"sum_":-5.843054700000081,"sumSq_":20932.994992147258,"
      0    1         0 {"sum_":-1.297999986085152E-4,"sumSq_":20895.56742119848
      1    3         0 {"sum_":-11.721289920000102,"sumSq_":21945.56968946764,"
      1    1         0 {"sum_":-2.065200004339185E-4,"sumSq_":21818.30039547843
      1    2         0 {"sum_":-9.681986649999978,"sumSq_":21913.84304678934,"s
      0    5         0 {"sum_":-8.692206329999848,"sumSq_":20986.778941833785,"
      1    5         0 {"sum_":-14.901439650000153,"sumSq_":22007.848794685153,
     -1   -1        -1 {"classifier":"REGRESSION","lossType":"MSE","numIteratio
      1    4         0 {"sum_":-13.322867280000047,"sumSq_":21976.772296942378,

Download a zip file of all examples and a SQL script file that creates their input tables.