This example calculates the cross-validation error for four GLM models based on the Gaussian family.
Input
The input table, housing_train, is from GLM Example: Gaussian Distribution Analysis.
SQL Call
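The LinkFunction and Intercept syntax elements each list four values, which are paired by position to specify the four GLM models to validate. For example, the first model uses the identity link function with an intercept, and the second uses the log link function without one.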
SELECT * FROM CrossValidation (
  ON housing_train AS InputTable
  OUT TABLE CrossValidationErrorTable (glmcvtable)
  USING
  Family ('gaussian')
  FunctionName ('glm')
  InputColumns ('price ','lotsize ','bedrooms ','bathrms ',
    'stories ','garagepl','driveway ','recroom ','fullbase ','gashw ',
    'airco ','prefarea','homestyle')
  CategoricalColumns ('driveway ','recroom ','fullbase ','gashw ',
    'airco ','prefarea','homestyle')
  LinkFunction ('identity','log','identity','log')
  Intercept ('t','f','f','t')
  FoldNum (3)
  CVParams ('LinkFunction','Intercept')
  Metric ('MSE')
) AS dt;
Output
message
--------------------------------------------------------------------------
Finished. Results can be found in table specified in the argument CVTable
SELECT * FROM glmcvtable;
linkfunction intercept  model              cverror
------------ ---------- ------------------ ----------------------
identity     t          "cv_outputtable_0" 1.15206987686268E+008
log          f          "cv_outputtable_1" 5.33687796406253E+009
log          t          "cv_outputtable_3" 5.33687796401766E+009
identity     f          "cv_outputtable_2" 1.15206981069037E+008
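The cross-validation error (MSE) shows that the default link function, identity, performs better than the log link function: about 1.15E+008 versus about 5.34E+009, roughly 46 times lower. For either link function, including or omitting the intercept makes almost no difference to the error.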
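To compare the candidate models at a glance, the results table can be sorted by error. The query below is a minimal sketch, assuming the cverror column of glmcvtable is stored as a numeric type, so that ORDER BY sorts by value rather than by text.

```sql
-- List the candidate models from lowest to highest cross-validation error.
-- Assumption: cverror is a numeric column, so the sort is by value.
SELECT linkfunction, intercept, model, cverror
FROM glmcvtable
ORDER BY cverror ASC;
```

With the results shown above, the two identity-link models would appear first.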