This example uses the GLM function to create two models and then compares them with the LikelihoodRatioTest function.
Input
This is the input table for the two GLM calls:
id | temp | damage |
---|---|---|
1 | 53 | 1 |
2 | 57 | 1 |
3 | 58 | 1 |
4 | 63 | 1 |
5 | 66 | 0 |
6 | 67 | 0 |
7 | 67 | 0 |
8 | 67 | 0 |
9 | 68 | 0 |
10 | 69 | 0 |
11 | 70 | 1 |
12 | 70 | 0 |
13 | 70 | 1 |
14 | 70 | 0 |
15 | 72 | 0 |
16 | 73 | 0 |
17 | 75 | 0 |
18 | 75 | 1 |
19 | 76 | 0 |
20 | 76 | 0 |
21 | 78 | 0 |
22 | 79 | 0 |
This GLM call creates a model, damage_glm1, using the predictor variable temp (in the second row of damage_glm1):
SELECT * FROM GLM (
ON glm_tempdamage AS InputTable
OUT TABLE OutputTable (damage_glm1)
USING
InputColumns ('damage','temp')
Family ('LOGISTIC')
Link ('CANONICAL')
StopThreshold ('0.01')
MaxIterNum ('10')
) AS dt;
predictor | estimate | std_error | z_score | p_value | significance |
---|---|---|---|---|---|
(Intercept) | 14.8054 | 7.44367 | 1.98899 | 0.0467025 | * |
temp | -0.228562 | 0.109324 | -2.09069 | 0.036556 | * |
ITERATIONS # | 4 | 0 | 0 | 0 | Number of Fisher Scoring iterations |
ROWS # | 22 | 0 | 0 | 0 | Number of rows |
Residual deviance | 20.268 | 0 | 0 | 0 | on 20 degrees of freedom |
Pearson goodness of fit | 22.7719 | 0 | 0 | 0 | on 20 degrees of freedom |
AIC | 24.268 | 0 | 0 | 0 | Akaike information criterion |
BIC | 26.4501 | 0 | 0 | 0 | Bayesian information criterion |
Wald Test | 6.03947 | 0 | 0 | 0.0488142 | * |
Dispersion parameter | 1 | 0 | 0 | 0 | Taken to be 1 for BINOMIAL and POISSON. |
This GLM call creates the null model, produced with only the intercept:
SELECT * FROM GLM (
ON glm_tempdamage AS InputTable
OUT TABLE OutputTable (damage_glm2)
USING
InputColumns ('damage')
Family ('LOGISTIC')
Link ('CANONICAL')
StopThreshold ('0.01')
MaxIterNum ('10')
) AS dt;
predictor | estimate | std_error | z_score | p_value | significance |
---|---|---|---|---|---|
(Intercept) | -0.76214 | 0.457738 | -1.66502 | 0.0959097 | . |
ITERATIONS # | 3 | 0 | 0 | 0 | Number of Fisher Scoring iterations |
ROWS # | 22 | 0 | 0 | 0 | Number of rows |
Residual deviance | 27.5216 | 0 | 0 | 0 | on 21 degrees of freedom |
Pearson goodness of fit | 22 | 0 | 0 | 0 | on 21 degrees of freedom |
AIC | 29.5216 | 0 | 0 | 0 | Akaike information criterion |
BIC | 30.6126 | 0 | 0 | 0 | Bayesian information criterion |
Wald Test | 2.77228 | 0 | 0 | 0.0959095 | . |
Dispersion parameter | 1 | 0 | 0 | 0 | Taken to be 1 for BINOMIAL and POISSON. |
SQL Call
SELECT * FROM LikelihoodRatioTest ( ON (SELECT * FROM damage_glm1 WHERE attribute = -1) AS model1 PARTITION BY 1 ON (SELECT * FROM damage_glm2 WHERE attribute = -1) AS model2 PARTITION BY 1 USING Statistic ('predictor') Loglik ('estimate') ObsNum ('std_err') ParamNum ('z_score') ) AS dt;
Output
distribution | statistic | p_value |
---|---|---|
Chi-squared 1 d.f. | 7.2536 | 0.0070759 |
The final output compares the two GLM distributions and displays a chi squared statistic and a p-value. The chi square value suggests, based on all predictors, that the data was more likely created by model 1 than by the null model. The result is statistically significant at the 95% confidence level (p-value < 0.05).