This section shows the input table, SQL query, and output tables of an example using TD_XGBoost for classification.
Input Table
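The input is a sample of a diabetes dataset with three feature columns (col_1, col_2, col_3) and a target (response) column, outcome. This is a binary classification problem: the two classes are labeled 1 and 2 in the outcome column, and the classMapping entry in the model output maps them to 0 and 1.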
ID | outcome | col_1 | col_2 | col_3 |
---|---|---|---|---|
1 | 1 | 23 | 8.14 | 0.84054 |
1 | 1 | 20 | 3.97 | 0.66351 |
1 | 1 | 0 | 4.05 | 0.07022 |
1 | 1 | 15 | 10.59 | 0.21719 |
1 | 1 | 0 | 6.91 | 0.18836 |
1 | 2 | 20 | 3.97 | 0.66351 |
1 | 2 | 20 | 3.97 | 0.5405 |
1 | 2 | 16 | 1.95 | 2.77974 |
1 | 2 | 0 | 4.49 | 0.05735 |
SQL Call
SELECT * FROM TD_XGBoost (
    ON diabetes_sample PARTITION BY ANY
    OUT TABLE MetaInformationTable(xgb_out)
    USING
    ResponseColumn('outcome')
    InputColumns('[2:4]')
    MaxDepth(3)
    MinNodeSize(1)
    NumParallelTrees(2)
    ModelType('CLASSIFICATION')
    Seed(1)
    RegularizationLambda(1)
    LearningRate(0.5)
    NumBoostRounds(2)
    MinImpurity(0)
    ColumnSampling(1.0)
) AS dt;
Output
task_index | tree_num | iter | class_num | tree_order | tree |
---|---|---|---|---|---|
0 | 1 | 1 | 0 | 0 | {"id_":1,"sum_":-2.000000,"sumSq_":1.000000,"size_":4,"maxDepth_":0,"value_":-0.500000,"nodeType_":"REGRESSION_LEAF","prediction_":-0.500000} |
0 | 1 | 2 | 0 | 0 | {"id_":1,"sum_":-1.510163,"sumSq_":0.570148,"size_":4,"maxDepth_":0,"value_":-0.377541,"nodeType_":"REGRESSION_LEAF","prediction_":-0.389214} |
0 | 2 | 1 | 0 | 0 | {"id_":1,"sum_":1.000000,"sumSq_":1.000000,"size_":4,"maxDepth_":3,"nodeType_":"REGRESSION_NODE","split_": {"splitValue_":8.000000,"attr_":"col_1","type_":"REGRESSION_NUMERIC_SPLIT","score_":0.750000,"scoreImprove_":0.750000,"leftNodeSize_":1,"rightNod {"id_":2,"sum_":-0.500000,"sumSq_":0.250000,"size_":1,"maxDepth_":0,"value_":-0.500000,"nodeType_":"REGRESSION_LEAF","prediction_":-0.200000},"r {"id_":3,"sum_":1.500000,"sumSq_":0.750000,"size_":3,"maxDepth_":0,"value_":0.500000,"nodeType_":"REGRESSION_LEAF","prediction_":0.428571}} |
0 | 2 | 2 | 0 | 0 | {"id_":1,"sum_":0.733237,"sumSq_":0.669463,"size_":4,"maxDepth_":3,"nodeType_":"REGRESSION_NODE","split_": {"splitValue_":8.000000,"attr_":"col_1","type_":"REGRESSION_NUMERIC_SPLIT","score_":0.535054,"scoreImprove_":0.535054,"leftNodeSize_":1,"rightNod {"id_":2,"sum_":-0.450166,"sumSq_":0.202649,"size_":1,"maxDepth_":0,"value_":-0.450166,"nodeType_":"REGRESSION_LEAF","prediction_":-0.180425},"r {"id_":3,"sum_":1.183403,"sumSq_":0.466814,"size_":3,"maxDepth_":0,"value_":0.394468,"nodeType_":"REGRESSION_LEAF","prediction_":0.344696}} |
0 | -1 | -1 | -1 | -1 | {"lossType":"LOGISTIC","numBoostedTrees":2,"iterNum":2,"avgResponses":0.000000,"classMapping":{"1":0,"2":1}} |
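The prediction_ values in the tree column can be checked by hand. The Python sketch below is an illustration, not the function's documented implementation: it assumes the usual gradient-boosting leaf update for logistic loss, learning_rate * G / (H + lambda), where G is the leaf's sum_ of residuals and H is the sum of p * (1 - p) over the rows in the leaf. It also assumes the raw score starts at 0 before the first iteration. The names leaf_prediction and CASES are hypothetical; the numbers are copied from the SQL call and the output table.

```python
import math

# Hyperparameters copied from the SQL call: LearningRate(0.5), RegularizationLambda(1).
LEARNING_RATE = 0.5
REG_LAMBDA = 1.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_prediction(sum_, size_, prior_score):
    """Assumed leaf update: learning_rate * G / (H + lambda).

    G is the leaf's sum_ of residuals (label - p). H is the sum of
    p * (1 - p) over the leaf's rows, with p = sigmoid(prior_score).
    In this example every row in a given leaf shares the same prior score.
    """
    p = sigmoid(prior_score)
    hessian_sum = size_ * p * (1.0 - p)
    return LEARNING_RATE * sum_ / (hessian_sum + REG_LAMBDA)


# (description, sum_, size_, raw score before this iteration, prediction_ in the table)
CASES = [
    ("tree 1, iter 1, leaf id 1", -2.000000, 4, 0.0, -0.500000),
    ("tree 1, iter 2, leaf id 1", -1.510163, 4, -0.500000, -0.389214),
    ("tree 2, iter 1, leaf id 2", -0.500000, 1, 0.0, -0.200000),
    ("tree 2, iter 1, leaf id 3", 1.500000, 3, 0.0, 0.428571),
    ("tree 2, iter 2, leaf id 2", -0.450166, 1, -0.200000, -0.180425),
    ("tree 2, iter 2, leaf id 3", 1.183403, 3, 0.428571, 0.344696),
]

for description, sum_, size_, prior, expected in CASES:
    computed = leaf_prediction(sum_, size_, prior)
    print(f"{description}: computed {computed:.6f}, table {expected:.6f}")
```

Under these assumptions, the computed values agree with the six prediction_ entries to the precision shown in the table.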
Output MetaInformationTable (xgb_out)
task_index | tree_num | iter | accuracy | deviance |
---|---|---|---|---|
0 | 1 | 1 | 1 | 0.378 |
0 | 1 | 2 | 1 | 0.291 |
0 | 2 | 1 | 1 | 0.408 |
0 | 2 | 2 | 1 | 0.338 |
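As a cross-check between the two output tables, the deviance values in this example happen to equal the mean absolute residual, |label - sigmoid(score)|, after each boosting round. The Python sketch below reproduces them. This is an observation about these particular numbers, not a documented definition of the deviance column. The per-leaf class labels are inferred from the sum_ and size_ fields of the model output (tree_num 1 holds four rows of mapped class 0; tree_num 2 holds one row of mapped class 0 in leaf id 2 and three rows of mapped class 1 in leaf id 3). The helper mean_abs_residual is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_abs_residual(rows):
    """rows: list of (mapped_label, raw_score) pairs, with mapped_label 0 or 1."""
    return sum(abs(label - sigmoid(score)) for label, score in rows) / len(rows)


# Raw scores are cumulative sums of the prediction_ values from the model output.
ROUNDS = {
    ("tree 1", 1): [(0, -0.500000)] * 4,
    ("tree 1", 2): [(0, -0.500000 - 0.389214)] * 4,
    ("tree 2", 1): [(0, -0.200000)] + [(1, 0.428571)] * 3,
    ("tree 2", 2): [(0, -0.200000 - 0.180425)] + [(1, 0.428571 + 0.344696)] * 3,
}

for (tree, iteration), rows in ROUNDS.items():
    print(tree, "iter", iteration, round(mean_abs_residual(rows), 3))
```

Rounded to three decimals, the four results are 0.378, 0.291, 0.408, and 0.338, matching the deviance column row for row.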