This example calculates the cross-validation error for four GLM models based on the Gaussian family.
Input
The input table, housing_train, is from GLM Example 3: Gaussian Distribution Analysis with Default Options.
SQL Call
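The LinkFunction and Intercept arguments each list four values and together specify the four GLM models to validate. Judging by the model numbering in the output table, the values are paired by position: (identity, t), (log, f), (identity, f), and (log, t).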
SELECT * FROM CrossValidation (
  ON housing_train AS InputTable
  OUT TABLE CVTable (glmcvtable)
  USING
  Family ('gaussian')
  FunctionName ('glm')
  InputColumns ('price ','lotsize ','bedrooms ','bathrms ',
                'stories ','garagepl','driveway ','recroom ','fullbase ','gashw ',
                'airco ','prefarea','homestyle')
  CategoricalColumns ('driveway ','recroom ','fullbase ','gashw ',
                      'airco ','prefarea','homestyle')
  LinkFunction ('identity','log','identity','log')
  Intercept ('t','f','f','t')
  FoldNum (3)
  CVParams ('LinkFunction','Intercept')
  Metric ('MSE')
) AS dt;
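The following Python sketch illustrates how the two argument lists appear to combine into four candidate models. It is not part of the CrossValidation function; positional pairing and the model naming are assumptions inferred from the output table in this example, and the variable names are hypothetical.

```python
# Values copied from the LinkFunction and Intercept arguments of the call above.
link_functions = ["identity", "log", "identity", "log"]
intercepts = ["t", "f", "f", "t"]

# Assumption: the i-th LinkFunction value is paired with the i-th Intercept
# value, and the resulting model is named cv_outputtable_<i>.
candidates = [
    {"model": f"cv_outputtable_{i}", "linkfunction": link, "intercept": icpt}
    for i, (link, icpt) in enumerate(zip(link_functions, intercepts))
]

for candidate in candidates:
    print(candidate)
```

Under this reading, the lists describe four specific combinations rather than every possible combination of the listed values.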
Output
message |
---|
Finished. Results can be found in table specified in the argument CVTable |
This query returns the following table:
SELECT * FROM glmcvtable;
linkfunction intercept model              cverror
------------ --------- ------------------ ---------------------
identity     t         "cv_outputtable_0" 1.15206987686268E 008
log          f         "cv_outputtable_1" 5.33687796406253E 009
log          t         "cv_outputtable_3" 5.33687796401766E 009
identity     f         "cv_outputtable_2" 1.15206981069037E 008
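The cross-validation error (MSE, where lower is better) is about 1.15E 008 for both identity models and about 5.34E 009 for both log models, so the default link function, identity, performs better than the log link function. The Intercept setting makes almost no difference to the error for either link function.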
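To make the cverror column concrete, here is a minimal Python sketch of a k-fold cross-validated MSE, with k = 3 to mirror FoldNum (3) and Metric ('MSE'). It is a generic illustration, not the CrossValidation function's implementation: the contiguous fold split, the toy data, and all function names are assumptions made for this sketch.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (an assumed scheme)."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_mse(xs, ys, fit, k=3):
    """Train on k-1 folds, score MSE on the held-out fold, and average over folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)
        fold_errors.append(mean((ys[i] - predict(xs[i])) ** 2 for i in held_out))
    return mean(fold_errors)

def fit_mean_only(xs, ys):
    """Toy model 1: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy model 2: least-squares straight line with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

# Toy data lying exactly on y = 2x, so the line model should have no held-out error.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]

line_error = cross_validated_mse(xs, ys, fit_line, k=3)
mean_error = cross_validated_mse(xs, ys, fit_mean_only, k=3)
print("line model:", line_error)
print("mean-only model:", mean_error)
```

As with the glmcvtable results, the candidate with the smaller cross-validated error is the one to prefer.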