XGBoost Example: Binary Classification - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

This example uses home sales data to train an XGBoost classification model that predicts home style. The resulting model table, xgboost_model, can be used as input to XGBoostPredict Example: Binary Classification.

Input

For descriptions of InputTable columns, see DecisionForest Example: TreeType ('classification'), OutOfBag ('false').

InputTable: housing_train_binary
 sn price lotsize bedrooms bathrms stories driveway recroom fullbase gashw airco garagepl prefarea homestyle
 -- ----- ------- -------- ------- ------- -------- ------- -------- ----- ----- -------- -------- ---------
  2 38500    4000        2       1       1 yes      no      no       no    no           0 no       Classic
  4 60500    6650        3       1       2 yes      yes     no       no    no           0 no       Classic
  6 66000    4160        3       1       1 yes      yes     yes      no    yes          0 no       Eclectic
  8 69000    4160        3       1       3 yes      no      no       no    no           0 no       Eclectic
 10 88500    5500        3       2       4 yes      yes     no       no    yes          1 no       Eclectic
 12 30500    3000        2       1       1 no       no      no       no    no           0 no       Eclectic
 14 36000    2880        3       1       1 no       no      no       no    no           0 no       Classic
 18 40750    5200        4       1       3 yes      no      no       no    no           0 no       Classic
 20 45000    3986        3       2       2 no       yes     yes      no    no           1 no       Classic
 22 65900    4510        4       2       1 yes      no      yes      no    no           0 no       Classic
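
Because this is a binary classification example, it can be useful to confirm that the response column, homestyle, holds exactly two distinct values before training. The query below is a minimal sketch in standard SQL against the housing_train_binary table; the alias row_count is illustrative and not part of the original example.

```sql
-- Count the rows in each homestyle class.
-- A binary classification problem should yield exactly two result rows.
SELECT homestyle, COUNT(*) AS row_count
FROM housing_train_binary
GROUP BY homestyle
ORDER BY homestyle;
```

If the query returns more than two rows, the response column is not binary and the training data may need to be filtered or recoded first.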

SQL Call

SELECT * FROM XGBoost (
  ON housing_train_binary AS InputTable
  OUT TABLE OutputTable (xgboost_model)
  USING
  ResponseColumn ('homestyle')
  PredictionType ('classification') 
  NumericInputs ('price','lotsize','bedrooms','bathrms','stories','garagepl')
  CategoricalInputs ('driveway','recroom','fullbase','gashw','airco',
    'prefarea')
  LossFunction ('binomial')
  IterNum (10)
  MaxDepth (10)
  MinNodeSize (1)
  RegularizationLambda (1)
  ShrinkageFactor (0.1)
  IDColumn ('sn')
  NumBoostedTrees (2)
) AS dt;

Output

 message                                                          
 ---------------------------------------------------------------- 
 Parameters:                                                     
 	Number of boosting iterations : 10                             
 	Number of boosted trees : 2                                    
 	Number of total trees (all subtrees): 20                       
 	Prediction Type : CLASSIFICATION                               
 	LossFunction : BINOMIAL                                        
 	Regularization : 1.0                                           
 	Shrinkage : 0.1                                                
 	MaxDepth : 10                                                  
 	MinNodeSize : 1                                                
 	Variance : 0.0                                                 
 	Seed : 1                                                       
 	ColumnSubSampling Features: 12                                 
 XGBoost model created in table specified in OutputTable argument

This query returns the following table:

SELECT tree_id, iter, class_num, CAST (tree AS VARCHAR(30)),
  CAST (region_prediction AS VARCHAR(30))
  FROM xgboost_model ORDER BY 1,2,3;

For simplicity, the last two output columns show only the first 30 characters of each value.

 tree_id iter class_num tree                           region_prediction              
 ------- ---- --------- ------------------------------ ------------------------------ 
      -1   -1        -1 {"classifier":"CLASSIFICATION"                               
       0    1         0 {"sum_":1.1999999649514592E-6, {"1792":0.06969074,"1280":-0.1
       0    2         0 {"sum_":-0.14652091000000556," {"384":0.042531442,"385":0.030
       0    3         0 {"sum_":0.28140193999999746,"s {"1664":0.052631423,"1665":0.0
       0    4         0 {"sum_":1.2547231600000008,"su {"1280":-0.06937855,"1281":-0.
       0    5         0 {"sum_":1.9154820900000036,"su {"768":0.027756682,"1538":0.03
       0    6         0 {"sum_":1.9837604199999974,"su {"768":0.026932025,"1538":0.03
       0    7         0 {"sum_":2.0918175700000003,"su {"768":0.026147524,"769":0.035
       0    8         0 {"sum_":2.360585519999998,"sum {"768":0.023378344,"769":0.025
       0    9         0 {"sum_":2.623491119999999,"sum {"768":0.02468738,"769":0.0227
       0   10         0 {"sum_":3.1594645900000033,"su {"770":0.024006711,"771":0.023
       1    1         0 {"sum_":-2.3999999940738093E-6 {"1664":0.07176345,"1536":0.07
       1    2         0 {"sum_":1.2196361999999998,"su {"1536":0.067304045,"1024":-0.
       1    3         0 {"sum_":1.693821099999995,"sum {"1536":0.06236268,"512":-0.07
       1    4         0 {"sum_":1.5348959100000008,"su {"1536":0.053727582,"769":0.02
       1    5         0 {"sum_":1.6560536700000057,"su {"256":-0.04895391,"257":-0.06
       1    6         0 {"sum_":1.8388901200000038,"su {"512":-0.046615,"513":-0.0458
       1    7         0 {"sum_":2.2050783699999994,"su {"1540":0.026542466,"1541":0.0
       1    8         0 {"sum_":2.6216585700000024,"su {"258":-0.040129118,"259":-0.0
       1    9         0 {"sum_":2.92402738,"sumSq_":65 {"16":-0.042044364,"20":-0.040
       1   10         0 {"sum_":3.210829989999998,"sum {"260":-0.03580759,"522":-0.03
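
The Output message reports 20 total trees, which matches IterNum (10) multiplied by NumBoostedTrees (2). One way to cross-check this against the model table is to count the rows stored for each tree_id. The query below is a sketch in standard SQL; the alias iterations is illustrative, and it assumes the row with tree_id -1 holds model-level settings rather than a tree, as the listing above suggests.

```sql
-- Count the boosting iterations stored for each boosted tree.
-- The row with tree_id = -1 is assumed to be model metadata and is excluded.
SELECT tree_id, COUNT(*) AS iterations
FROM xgboost_model
WHERE tree_id >= 0
GROUP BY tree_id
ORDER BY tree_id;
```

For the model table listed above, this should return 10 iterations for tree_id 0 and 10 for tree_id 1, for a total of 20.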
