XGBoost Example: Regression, Dense Format | Teradata Vantage - XGBoost Example: Regression, Dense Format - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example uses U. S. Census Service data about housing in Boston, Massachusetts in 1978 to create a regression model that predicts median home value, which can be input to XGBoostPredict Example: Regression, Dense Format.

Input Data Descriptions
Column Description
crim Per capita crime rate by town
zn Proportion of residential land zoned for lots over 25,000 square feet
indus Proportion of nonretail business acres for each town
chas Charles River dummy variable (1 if tract bounds river, 0 otherwise
nox Nitrous oxides concentration (parts/10 million)
rm Average number of rooms in each dwelling
age Proportion of owner-occupied units built before 1940
dis Weighted distances to five Boston employment centers
rad Index of accessibility to radial highways
tax Full-value property tax rate for each $10,000
ptratio Pupil-teacher ratio by town
black 1000(Bk - 0.63)2 where Bk is number of black residents in each town
lstat Percent lower status of population
medv Median value of owner-occupied homes (in $1000s)

Input

InputTable: boston

id  crim zn   indus chas nox rm  age  dis rad tax   ptratio black lstat medv
--- ---- ---- ----- ---- --- --- ---- --- --- ----- ------- ----- ----- ----
469   16.  0.   18.   0.   1. 6.  71.  3.  24. 666.      20. 369.    18.  19.
265    1. 20.    4.   0.   1. 7.  92.  2.   5. 264.      13. 388.     8.  36.
 40    0. 75.    3.   0.   0. 7.  22.  5.   3. 252.      18. 396.     4.  31.
122    0.  0.   26.   0.   1. 6.  84.  2.   2. 188.      19. 378.    14.  20.
 61    0. 25.    5.   0.   0. 6.  66.  7.   8. 284.      20. 395.    13.  19.
244    0. 30.    5.   0.   0. 6.   8.  7.   6. 300.      17. 375.     5.  24.
162    1.  0.   20.   0.   1. 7.  91.  2.   5. 403.      15. 374.     2.  50.
387   24.  0.   18.   0.   1. 5. 100.  1.  24. 666.      20. 397.    28.  10.
326    0.  0.    7.   0.   0. 6.  15.  5.   5. 287.      20. 394.     5.  25.
305    0. 33.    2.   0.   0. 7.  41.  4.   7. 222.      18. 394.     7.  36.
223    1.  0.    6.   1.   1. 7.  78.  3.   8. 307.      17. 390.    10.  28.
448   10.  0.   18.   0.   1. 6.  97.  2.  24. 666.      20. 389.    16.  13.
183    0.  0.    2.   0.   0. 7.  92.  3.   3. 193.      18. 394.     5.  38.
101    0.  0.    9.   0.   1. 7.  80.  3.   5. 384.      21. 395.     9.  28.
488    5.  0.   18.   0.   1. 6.  53.  3.  24. 666.      20. 388.    11.  21.

SQL Call

SELECT *  FROM XGBoost (
  ON boston AS InputTable
  OUT TABLE OutputTable (xgboost_regression_model)
  USING
  ResponseColumn ('medv')
  PredictionType ('regression')

  LossFunction ('mse')
  NumericInputs ('crim','zn','indus','chas','nox','rm',
    'age','dis','rad','tax','ptratio','black','lstat')
  IterNum (5)
  MaxDepth (10)
  MinNodeSize (1)
  RegularizationLambda ('1')
  ShrinkageFactor ('0.1')
  IDColumn ('id')
  NumBoostedTrees ('1')
) AS dt;

Output

message
----------------------------------------------------------------
Parameters:
Number of boosting iterations : 5
Number of boosted trees : 1
Number of total trees (all subtrees): 5
Prediction Type : REGRESSION
LossFunction : MSE
Regularization : 1.0
Shrinkage : 0.1
MaxDepth : 10
MinNodeSize : 1
Variance : 0.0
Seed : null
ColumnSubSampling Features: 13
XGBoost model created in table specified in OutputTable argument

This query returns the following table:

SELECT tree_id, iter, class_num, CAST (tree AS VARCHAR(30)),
  CAST (region_prediction AS VARCHAR(30))
  FROM xgboost_regression_model ORDER BY 1,2,3;

The column tree is shown truncated to 80 characters.

tree_id  iter class_num tree
------- ----- --------- -------------------------------------------------------
      0     4         0 {"sum_":6.50351156,"sumSq_":43051.688035353036,"size_":
      0     2         0 {"sum_":1.8182168800000742,"sumSq_":42820.70821381819,"
      0     1         0 {"sum_":1.421199996389788E-4,"sumSq_":42719.33465255344
      0     3         0 {"sum_":-0.568932420000074,"sumSq_":42870.20826191229,"
     -1    -1        -1 {"classifier":"REGRESSION","lossType":"MSE","numIterati
      0     5         0 {"sum_":4.517953350000042,"sumSq_":43106.41661851385,"s

Download a zip file of all examples and a SQL script file that creates their input tables.