XGBoost Example 3: Sparse Input Format - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
Product Category
Teradata Vantage™

Input

  • InputTable: sparse_housing_train_binary, created by applying the Unpivoting function to housing_train_binary (the InputTable for XGBoost Example 1: Binary Classification); sanity-check queries on the result follow the sample tables below:
    DROP TABLE sparse_housing_train_binary;
    
    CREATE MULTISET TABLE sparse_housing_train_binary AS ( 
      SELECT sn, homestyle, attribute, value_col AS value1 FROM 
        Unpivoting (
          ON housing_train_binary
          USING
          TargetColumns (
            'price','lotsize','bedrooms','bathrms','stories',
            'garagepl','driveway','recroom','fullbase','gashw','airco','prefarea'
          )
          Accumulate ('sn','homestyle')
        ) AS dt
    ) WITH DATA;
    
  • AttributeTable: sparse_housing_attributes, created and populated with this code:
    DROP TABLE IF EXISTS sparse_housing_attributes;
    
    CREATE MULTISET TABLE sparse_housing_attributes (
      attributename VARCHAR(20), attributetype NUMERIC
    );
    
    INSERT INTO sparse_housing_attributes VALUES
    ('price','1'),
    ('lotsize','1'),
    ('bedrooms','1'),
    ('bathrms','1'),
    ('stories','1'),
    ('garagepl','1'),
    ('driveway','0'),
    ('recroom','0'),
    ('fullbase','0'),
    ('gashw','0'),
    ('airco','0'),
    ('prefarea','0');
    
InputTable: sparse_housing_train_binary
sn homestyle attribute value1
1 Classic airco no
1 Classic bathrms 3
1 Classic bedrooms 1
1 Classic driveway yes
1 Classic fullbase yes
1 Classic garagepl 1
1 Classic gashw no
1 Classic lotsize 5850.0
1 Classic prefarea no
1 Classic price 42000.0
1 Classic recroom no
1 Classic stories 2
2 Classic airco no
2 Classic bathrms 1
2 Classic bedrooms 2
2 Classic driveway yes
... ... ... ...
AttributeTable: sparse_housing_attributes
attributename attributetype
price 1
lotsize 1
bedrooms 1
bathrms 1
stories 1
garagepl 1
driveway 0
recroom 0
fullbase 0
gashw 0
airco 0
prefarea 0
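
A quick way to confirm that the unpivoted table is a well-formed sparse input is to check that every sn carries one row per attribute and that every attribute name is registered in sparse_housing_attributes. The queries below are an added sanity-check sketch, not part of the original example; they assume only the two tables created above.

-- Each observation should contribute exactly 12 attribute rows;
-- this returns no rows if the unpivoting worked as expected.
SELECT sn, COUNT(*) AS attribute_rows
FROM sparse_housing_train_binary
GROUP BY sn
HAVING COUNT(*) <> 12;

-- List any attribute name in the sparse input that is not registered
-- in the AttributeTable; this should also return no rows.
SELECT DISTINCT t.attribute
FROM sparse_housing_train_binary t
LEFT JOIN sparse_housing_attributes a ON t.attribute = a.attributename
WHERE a.attributename IS NULL;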

SQL Call

DROP TABLE IF EXISTS sparse_housing_xgboost_model;

SELECT * FROM XGBoost (
  ON sparse_housing_train_binary AS InputTable
  ON sparse_housing_attributes AS AttributeTable
  OUT TABLE OutputTable (sparse_housing_xgboost_model)
  USING
  ResponseColumn ('homestyle')
  PredictionType ('classification') 
  AttributeNameColumn ('attribute')
  AttributeValueColumn ('value1')
  LossFunction ('binomial')
  IterNum (10)
  MaxDepth (10)
  MinNodeSize (1)
  RegularizationLambda (1)
  ShrinkageFactor (0.1)
  IDColumn ('sn')
  NumBoostedTrees (2)
) AS dt;

Output

message
Parameters: 
 	Number of boosting iterations : 10
 	Number of boosted trees : 2
 	Number of total trees (all subtrees): 20
 	Prediction Type : CLASSIFICATION
 	LossFunction : BINOMIAL
 	Regularization : 1.0
 	Shrinkage : 0.1
 	MaxDepth : 10
 	MinNodeSize : 1
 	Variance : 0.0
 	Seed : 1
 	ColumnSubSampling Features: 12
 XGBoost model created in table specified in OutputTable argument

This query returns the following table:

SELECT tree_id, iter, class_num, CAST(tree AS VARCHAR(30)) AS tree,
  CAST(region_prediction AS VARCHAR(30)) AS region_pred
    FROM sparse_housing_xgboost_model ORDER BY 1, 2;

For simplicity, the last two output columns show only the first 30 characters of each value.

sparse_housing_xgboost_model
tree_id iter class_num tree region_pred
-1 -1 -1 {"classifier":"CLASSIFICATION"  
0 1 0 {"sum_":1.2000001063938726E-6, {"1792":0.06969074,"1280":-0.1
0 2 0 {"sum_":-0.14652091000004508," {"384":0.042531442,"385":0.030
0 3 0 {"sum_":0.2814020000000276,"su {"1664":0.052631423,"1665":0.0
0 4 0 {"sum_":1.2547231599999824,"su {"1280":-0.06937855,"1281":-0.
0 5 0 {"sum_":1.9154821700000046,"su {"768":0.027756682,"1538":0.03
0 6 0 {"sum_":1.98376042000001,"sumS {"768":0.026932025,"1538":0.03
0 7 0 {"sum_":2.091817570000011,"sum {"768":0.026147524,"769":0.035
0 8 0 {"sum_":2.360585519999989,"sum {"768":0.023378344,"769":0.025
0 9 0 {"sum_":2.623490840000003,"sum {"768":0.02468738,"769":0.0227
0 10 0 {"sum_":3.1594640700000087,"su {"770":0.024006711,"771":0.023
1 1 0 {"sum_":-2.4000000620194584E-6 {"1664":0.07176345,"1792":0.07
1 2 0 {"sum_":1.2196362000000476,"su {"1152":-0.108543135,"1024":-0
1 3 0 {"sum_":1.693821100000016,"sum {"512":-0.07265504,"1536":0.06
1 4 0 {"sum_":1.5348959100000301,"su {"1536":0.053727582,"1537":0.0
1 5 0 {"sum_":1.6560536700000048,"su {"256":-0.04895391,"257":-0.06
1 6 0 {"sum_":1.8388901200000012,"su {"512":-0.046615,"513":-0.0458
1 7 0 {"sum_":2.205078370000002,"sum {"1540":0.026542466,"1541":0.0
1 8 0 {"sum_":2.621658569999993,"sum {"258":-0.040129118,"259":-0.0
1 9 0 {"sum_":2.9240273799999925,"su {"16":-0.042044364,"20":-0.040
1 10 0 {"sum_":3.2108299899999864,"su {"260":-0.03580759,"522":-0.03
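
The model table layout matches the parameter summary above: 2 boosted trees times 10 boosting iterations yields the 20 subtrees reported, plus one header row with tree_id = -1 that carries the classifier metadata shown in the first row. The query below is an added sketch, not part of the original example, that confirms this breakdown.

-- For the call above this is expected to return three rows:
-- tree_id -1 with 1 row (model header), and tree_ids 0 and 1 with 10 rows each.
SELECT tree_id, COUNT(*) AS row_count
FROM sparse_housing_xgboost_model
GROUP BY tree_id
ORDER BY tree_id;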