AdaBoostPredict Example 1 - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example uses test data and the model output by the AdaBoost Example 1 to use real estate sales data to predict home style.

Input

  • attributetable: housing_test_att, created by inputting housing_train (the raw data for DecisionForest Example 1: Classification Tree without Out-of-Bag Error) to the Unpivoting function:
    CREATE MULTISET TABLE housing_test_att AS (
      SELECT * FROM Unpivoting (
        ON housing_test
        USING
        TargetColumns ('price', 'lotsize', 'bedrooms', 'bathrms',
          'stories', 'driveway', 'recroom', 'fullbase', 'gashw', 'airco',
          'garagepl', 'prefarea')
        Accumulate ('sn')
      ) AS dt
    ) WITH DATA;
  • model: abd_model, output by AdaBoost Example 1

This query returns the following table:

SELECT * FROM housing_test_att ORDER BY 1, 2;
housing_test_att
sn attribute value_col
13 airco no
13 bathrms 1
13 bedrooms 3
13 driveway yes
13 fullbase no
13 garagepl 0
13 gashw no
13 lotsize 1700.0
13 prefarea no
13 price 27000.0
13 recroom no
13 stories 2
16 airco yes
16 bathrms 1
16 bedrooms 2
16 driveway yes
16 fullbase no
16 garagepl 0
16 gashw no
16 lotsize 3185.0
16 prefarea no
16 price 37900.0
16 recroom no
16 stories 1
... ... ...

SQL Call

CREATE MULTISET TABLE housing_prediction AS (
  SELECT * FROM AdaBoostPredict (
    ON housing_test_att AS attributetable PARTITION BY sn
    ON abd_model AS model DIMENSION
    USING
    AttrTableGroupbyColumns ('attribute')
    AttrTablePidColumns ('sn')
    AttrTableValColumn ('value')
  ) AS dt
) WITH DATA;

Output

This query returns the following table:

SELECT * FROM housing_prediction ORDER BY 1;

The pred_label column contains the predicted response.

housing_prediction
sn pred_label
13 Classic
16 Classic
25 Classic
38 Eclectic
53 Eclectic
104 Bungalow
111 Classic
117 Eclectic
132 Classic
140 Classic
142 Classic
157 Eclectic
161 Eclectic
162 Bungalow
176 Eclectic
177 Eclectic
195 Classic
198 Classic
224 Eclectic
234 Classic
237 Classic
239 Classic
249 Classic
251 Classic
254 Eclectic
255 Eclectic
260 Classic
274 Eclectic
294 Classic
301 Eclectic
306 Eclectic
317 Eclectic
329 Bungalow
339 Bungalow
340 Eclectic
353 Eclectic
355 Eclectic
364 Eclectic
367 Bungalow
377 Bungalow
401 Eclectic
403 Eclectic
408 Eclectic
411 Eclectic
440 Eclectic
441 Eclectic
443 Eclectic
459 Classic
463 Classic
469 Eclectic
472 Eclectic
527 Bungalow
530 Eclectic
540 Eclectic

Prediction Accuracy

This query returns the prediction accuracy:

SELECT (SELECT COUNT(*) FROM housing_prediction, housing_test
  WHERE housing_prediction.sn = housing_test.sn
  AND housing_prediction.pred_label =
    housing_test.homestyle) / (SELECT COUNT(SN) FROM housing_prediction)
  ) AS PA;
PA
0.98148148148148148148

The prediction accuracy is 98.1%, a large improvement over the Forest_Predict function, whose prediction accuracy is 77.7% on the same input.