XGBoost Example 1: Binary Classification

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

This example uses home sales data to create a classification model that predicts home style. The resulting model table can be used as input to XGBoostPredict Example 1: Binary Classification.

Input

For descriptions of InputTable columns, see DecisionForest Example 1: Classification Tree without Out-of-Bag Error.

InputTable: housing_train_binary
sn  price  lotsize  bedrooms  bathrms  stories  driveway  recroom  fullbase  gashw  airco  garagepl  prefarea  homestyle
2   38500  4000     2         1        1        yes       no       no        no     no     0         no        Classic
4   60500  6650     3         1        2        yes       yes      no        no     no     0         no        Classic
6   66000  4160     3         1        1        yes       yes      yes       no     yes    0         no        Eclectic
8   69000  4160     3         1        3        yes       no       no        no     no     0         no        Eclectic
10  88500  5500     3         2        4        yes       yes      no        no     yes    1         no        Eclectic
12  30500  3000     2         1        1        no        no       no        no     no     0         no        Eclectic
14  36000  2880     3         1        1        no        no       no        no     no     0         no        Classic
18  40750  5200     4         1        3        yes       no       no        no     no     0         no        Classic
20  45000  3986     3         2        2        no        yes      yes       no     no     1         no        Classic
22  65900  4510     4         2        1        yes       no       yes       no     no     0         no        Classic
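
Because this example is a binary classification, the response column homestyle should contain exactly two distinct values (Classic and Eclectic in the rows shown). A quick check, sketched here on the assumption that housing_train_binary is already loaded in the current database, is to count the rows per class. The alias num_homes is only an illustrative name.

```sql
-- Count training rows per response class.
-- Assumption: housing_train_binary exists in the current database.
SELECT homestyle, COUNT(*) AS num_homes
FROM housing_train_binary
GROUP BY homestyle
ORDER BY homestyle;
```

If the query returns more than two rows, the response column is not binary and the binomial loss function used in this example would not be an appropriate choice.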

SQL Call

SELECT * FROM XGBoost (
  ON housing_train_binary AS InputTable
  OUT TABLE OutputTable (xgboost_model)
  USING
  ResponseColumn ('homestyle')
  PredictionType ('classification') 
  NumericInputs ('price','lotsize','bedrooms','bathrms','stories','garagepl')
  CategoricalInputs ('driveway','recroom','fullbase','gashw','airco',
    'prefarea')
  LossFunction ('binomial')
  IterNum (10)
  MaxDepth (10)
  MinNodeSize (1)
  RegularizationLambda (1)
  ShrinkageFactor (0.1)
  IDColumn ('sn')
  NumBoostedTrees (2)
) AS dt;

Output

message
Parameters: 
Number of boosting iterations : 10
Number of boosted trees : 2
Number of total trees (all subtrees): 20
Prediction Type : CLASSIFICATION
LossFunction : BINOMIAL
Regularization : 1.0
Shrinkage : 0.1
MaxDepth : 10
MinNodeSize : 1
Variance : 0.0
Seed : 1
ColumnSubSampling Features: 12
XGBoost model created in table specified in OutputTable argument
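
The total of 20 trees reported in the message is the product of IterNum (10) and NumBoostedTrees (2). As a sketch, assuming the model table holds one row per boosted tree and iteration plus one row with tree_id = -1 (as the model table listing in this example suggests), the count can be cross-checked against xgboost_model. The aliases num_iterations, first_iter, and last_iter are illustrative names.

```sql
-- Count the stored iterations for each boosted tree.
-- Assumption: the row with tree_id = -1 holds model-level
-- information rather than a tree, so it is excluded.
SELECT tree_id,
       COUNT(*)  AS num_iterations,
       MIN(iter) AS first_iter,
       MAX(iter) AS last_iter
FROM xgboost_model
WHERE tree_id >= 0
GROUP BY tree_id
ORDER BY tree_id;
```

For the model in this example, the query should return one row for tree_id 0 and one for tree_id 1, each covering iterations 1 through 10.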

The following query displays the contents of the model table:

SELECT tree_id, iter, class_num,
  CAST(tree AS VARCHAR(30)),
  CAST(region_prediction AS VARCHAR(30))
FROM xgboost_model
ORDER BY 1, 2, 3;

For simplicity, the last two output columns show only the first 30 characters of each value.

xgboost_model
tree_id  iter  class_num  cast(tree as character varying(30))  cast(region_prediction as character varying(30))
-1       -1    -1         {"classifier":"CLASSIFICATION"
0        1                {"sum_":1.200000101064802E-6,"       {"1792":0.06969074,"1280":-0.1
0        2                {"sum_":-0.1465209100000503,"s       {"384":0.042531442,"385":0.030
0        3                {"sum_":0.281402000000032,"sum       {"1664":0.052631423,"1665":0.0
0        4                {"sum_":1.2547231599999822,"su       {"1280":-0.06937855,"1281":-0.
0        5                {"sum_":1.915482170000011,"sum       {"768":0.027756682,"1538":0.03
0        6                {"sum_":1.9837604200000016,"su       {"768":0.026932025,"1538":0.03
0        7                {"sum_":2.091817570000022,"sum       {"768":0.026147524,"769":0.035
0        8                {"sum_":2.360585519999998,"sum       {"768":0.023378344,"769":0.025
0        9                {"sum_":2.6234908400000014,"su       {"768":0.02468738,"769":0.0227
0        10               {"sum_":3.1594640700000056,"su       {"770":0.024006711,"771":0.023
1        1                {"sum_":-2.400000108204736E-6,       {"1664":0.07176345,"1536":0.07
1        2                {"sum_":1.2196362000000556,"su       {"1536":0.067304045,"1152":-0.
1        3                {"sum_":1.693821100000001,"sum       {"1536":0.06236268,"512":-0.07
1        4                {"sum_":1.534895909999999,"sum       {"1536":0.053727582,"769":0.02
1        5                {"sum_":1.6560535500000007,"su       {"256":-0.04895391,"257":-0.06
1        6                {"sum_":1.8388899100000082,"su       {"512":-0.046615,"513":-0.0458
1        7                {"sum_":2.205078229999988,"sum       {"1540":0.026542466,"1541":0.0
1        8                {"sum_":2.621658340000005,"sum       {"258":-0.040129118,"259":-0.0
1        9                {"sum_":2.92402716000001,"sumS       {"16":-0.042044364,"20":-0.040
1        10               {"sum_":3.2108298899999914,"su       {"260":-0.03580759,"522":-0.03
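
The model table is intended as input to XGBoostPredict. The call below is a minimal sketch of that step, not the documented example itself. It assumes a test table named housing_test_binary (a hypothetical name here) with the same columns as housing_train_binary, and it assumes the XGBoostPredict clauses and arguments shown (InputTable, ModelTable, IDColumn, Accumulate). Confirm the exact syntax against the XGBoostPredict reference before use.

```sql
-- Sketch: score new rows with the model created in this example.
-- Assumptions: housing_test_binary is a hypothetical test table;
-- clause and argument names follow the XGBoostPredict syntax as
-- best understood and should be verified in its reference section.
SELECT * FROM XGBoostPredict (
  ON housing_test_binary AS InputTable PARTITION BY ANY
  ON xgboost_model AS ModelTable DIMENSION
    ORDER BY tree_id, iter, class_num
  USING
  IDColumn ('sn')
  Accumulate ('homestyle')
) AS dt;
```

Accumulating homestyle copies the known label into the output, which makes it straightforward to compare predicted and actual home styles row by row.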