This example uses U. S. Census Service data about housing in Boston, Massachusetts in 1978 to create a regression model that predicts median home value, which can be input to XGBoostPredict Example: Regression, Dense Format.
Column | Description |
---|---|
crim | Per capita crime rate by town |
zn | Proportion of residential land zoned for lots over 25,000 square feet |
indus | Proportion of nonretail business acres for each town |
chas | Charles River dummy variable (1 if tract bounds river, 0 otherwise |
nox | Nitrous oxides concentration (parts/10 million) |
rm | Average number of rooms in each dwelling |
age | Proportion of owner-occupied units built before 1940 |
dis | Weighted distances to five Boston employment centers |
rad | Index of accessibility to radial highways |
tax | Full-value property tax rate for each $10,000 |
ptratio | Pupil-teacher ratio by town |
black | 1000(Bk - 0.63)2 where Bk is number of black residents in each town |
lstat | Percent lower status of population |
medv | Median value of owner-occupied homes (in $1000s) |
Input
InputTable: boston
id crim zn indus chas nox rm age dis rad tax ptratio black lstat medv --- ---- ---- ----- ---- --- --- ---- --- --- ----- ------- ----- ----- ---- 469 16. 0. 18. 0. 1. 6. 71. 3. 24. 666. 20. 369. 18. 19. 265 1. 20. 4. 0. 1. 7. 92. 2. 5. 264. 13. 388. 8. 36. 40 0. 75. 3. 0. 0. 7. 22. 5. 3. 252. 18. 396. 4. 31. 122 0. 0. 26. 0. 1. 6. 84. 2. 2. 188. 19. 378. 14. 20. 61 0. 25. 5. 0. 0. 6. 66. 7. 8. 284. 20. 395. 13. 19. 244 0. 30. 5. 0. 0. 6. 8. 7. 6. 300. 17. 375. 5. 24. 162 1. 0. 20. 0. 1. 7. 91. 2. 5. 403. 15. 374. 2. 50. 387 24. 0. 18. 0. 1. 5. 100. 1. 24. 666. 20. 397. 28. 10. 326 0. 0. 7. 0. 0. 6. 15. 5. 5. 287. 20. 394. 5. 25. 305 0. 33. 2. 0. 0. 7. 41. 4. 7. 222. 18. 394. 7. 36. 223 1. 0. 6. 1. 1. 7. 78. 3. 8. 307. 17. 390. 10. 28. 448 10. 0. 18. 0. 1. 6. 97. 2. 24. 666. 20. 389. 16. 13. 183 0. 0. 2. 0. 0. 7. 92. 3. 3. 193. 18. 394. 5. 38. 101 0. 0. 9. 0. 1. 7. 80. 3. 5. 384. 21. 395. 9. 28. 488 5. 0. 18. 0. 1. 6. 53. 3. 24. 666. 20. 388. 11. 21.
SQL Call
SELECT * FROM XGBoost ( ON boston AS InputTable OUT TABLE OutputTable (xgboost_regression_model) USING ResponseColumn ('medv') PredictionType ('regression') LossFunction ('mse') NumericInputs ('crim','zn','indus','chas','nox','rm', 'age','dis','rad','tax','ptratio','black','lstat') IterNum (5) MaxDepth (10) MinNodeSize (1) RegularizationLambda ('1') ShrinkageFactor ('0.1') IDColumn ('id') NumBoostedTrees ('1') ) AS dt;
Output
message ---------------------------------------------------------------- Parameters: Number of boosting iterations : 5 Number of boosted trees : 1 Number of total trees (all subtrees): 5 Prediction Type : REGRESSION LossFunction : MSE Regularization : 1.0 Shrinkage : 0.1 MaxDepth : 10 MinNodeSize : 1 Variance : 0.0 Seed : null ColumnSubSampling Features: 13 XGBoost model created in table specified in OutputTable argument
This query returns the following table:
SELECT tree_id, iter, class_num, CAST (tree AS VARCHAR(30)), CAST (region_prediction AS VARCHAR(30)) FROM xgboost_regression_model ORDER BY 1,2,3;
The column tree is shown truncated to 80 characters.
tree_id iter class_num tree ------- ----- --------- ------------------------------------------------------- 0 4 0 {"sum_":6.50351156,"sumSq_":43051.688035353036,"size_": 0 2 0 {"sum_":1.8182168800000742,"sumSq_":42820.70821381819," 0 1 0 {"sum_":1.421199996389788E-4,"sumSq_":42719.33465255344 0 3 0 {"sum_":-0.568932420000074,"sumSq_":42870.20826191229," -1 -1 -1 {"classifier":"REGRESSION","lossType":"MSE","numIterati 0 5 0 {"sum_":4.517953350000042,"sumSq_":43106.41661851385,"s
Download a zip file of all examples and a SQL script file that creates their input tables.