This example uses the GLM function to create two models and then compares them with the LikelihoodRatioTest function.
The first model, damage_glm1, is built using the independent variable "temp" and is the alternative hypothesis.
The second model, damage_glm2, is the null model. It includes no independent variables and is built using only the intercept.
The LikelihoodRatioTest query compares the two models to test whether "temp" is significant in the model.
Input
This is the input table for the two GLM calls:
id | temp | damage |
---|---|---|
1 | 53 | 1 |
2 | 57 | 1 |
3 | 58 | 1 |
4 | 63 | 1 |
5 | 66 | 0 |
6 | 67 | 0 |
7 | 67 | 0 |
8 | 67 | 0 |
9 | 68 | 0 |
10 | 69 | 0 |
11 | 70 | 1 |
12 | 70 | 0 |
13 | 70 | 1 |
14 | 70 | 0 |
15 | 72 | 0 |
16 | 73 | 0 |
17 | 75 | 0 |
18 | 75 | 1 |
19 | 76 | 0 |
20 | 76 | 0 |
21 | 78 | 0 |
22 | 79 | 0 |
This GLM call creates a model, damage_glm1, using the predictor variable temp (in the second row of damage_glm1):
SELECT * FROM GLM (
ON glm_tempdamage AS InputTable
OUT TABLE OutputTable (damage_glm1)
USING
TargetColumns ('damage','temp')
Family ('LOGISTIC')
Link ('CANONICAL')
StopThreshold ('0.01')
MaxIterNum ('10')
) AS dt;
predictor estimate std_error z_score p_value significance ----------------------- -------------------- ------------------- ------------------- -------------------- --------------------------------------- (Intercept) 14.805399894714355 7.44366979598999 1.9889899492263794 0.04670250043272972 * temp -0.22856199741363525 0.10932400077581406 -2.0906898975372314 0.036556001752614975 * ITERATIONS # 4.0 0.0 0.0 0.0 Number of Fisher Scoring iterations ROWS # 22.0 0.0 0.0 0.0 Number of rows Residual deviance 20.26799964904785 0.0 0.0 0.0 on 20 degrees of freedom Pearson goodness of fit 22.771900177001953 0.0 0.0 0.0 on 20 degrees of freedom AIC 24.26799964904785 0.0 0.0 0.0 Akaike information criterion BIC 26.450084686279297 0.0 0.0 0.0 Bayesian information criterion Wald Test 6.039470195770264 0.0 0.0 0.04881415143609047 * Dispersion parameter 1.0 0.0 0.0 0.0 Taken to be 1 for BINOMIAL and POISSON.
SELECT * FROM damage_glm1 ORDER BY 1;
attribute predictor category estimate std_err z_score p_value significance family --------- ----------- -------- -------------------- ------------------- ------------------- -------------------- ------------ -------- -1 Loglik NULL -10.133999824523926 22.0 1.0 0.0 NULL LOGISTIC 0 (Intercept) NULL 14.805399894714355 7.44366979598999 1.9889899492263794 0.04670250043272972 * LOGISTIC 1 temp NULL -0.22856199741363525 0.10932400077581406 -2.0906898975372314 0.036556001752614975 * LOGISTIC
This GLM call creates the null model, produced with only the intercept:
SELECT * FROM GLM (
ON glm_tempdamage AS InputTable
OUT TABLE OutputTable(damage_glm2)
USING
InputColumns ('damage')
Family ('LOGISTIC')
LinkFunction ('CANONICAL')
StopThreshold ('0.01')
MaxIterNum ('10')
) AS dt;
predictor estimate std_error z_score p_value significance ----------------------- ------------------- ------------------ ------------------- ------------------- --------------------------------------- (Intercept) -0.7621399760246277 0.4577380120754242 -1.6650199890136719 0.09590969979763031 . ITERATIONS # 3.0 0.0 0.0 0.0 Number of Fisher Scoring iterations ROWS # 22.0 0.0 0.0 0.0 Number of rows Residual deviance 27.5216007232666 0.0 0.0 0.0 on 21 degrees of freedom Pearson goodness of fit 22.0 0.0 0.0 0.0 on 21 degrees of freedom AIC 29.5216007232666 0.0 0.0 0.0 Akaike information criterion BIC 30.612642288208008 0.0 0.0 0.0 Bayesian information criterion Wald Test 2.772279977798462 0.0 0.0 0.09590946137905121 . Dispersion parameter 1.0 0.0 0.0 0.0 Taken to be 1 for BINOMIAL and POISSON.
SELECT * FROM damage_glm2 ORDER BY 1;
attribute predictor category estimate std_err z_score p_value significance family --------- ----------- -------- ------------------- ------------------ ------------------- ------------------- ------------ -------- -1 Loglik NULL -13.7608003616333 22.0 0.0 0.0 NULL LOGISTIC 0 (Intercept) NULL -0.7621399760246277 0.4577380120754242 -1.6650199890136719 0.09590969979763031 . LOGISTIC
SQL Call
SELECT * FROM LikelihoodRatioTest ( ON (SELECT * FROM damage_glm1 WHERE attribute = -1) AS FirstModel PARTITION BY 1 ON (SELECT * FROM damage_glm2 WHERE attribute = -1) AS SecondModel PARTITION BY 1 USING statistic('predictor') loglik('estimate') ObsNum('std_err') ParamNum('z_score') ) AS dt;
Output
distribution statistic p_value ------------------ ---------------- -------------------- Chi-squared 1 d.f. 7.25360107421875 0.007075897417962551
The reported chi square value (7.2536) corresponds to a p-value of 0.007, which is statistically significant at the 95% confidence level (p-value < 0.05). This shows that "temp" is significant in the model; that is, the alternative model, damage_glm1, cannot be rejected in favor of the null hypothesis model.
Download a zip file of all examples and a SQL script file that creates their input tables.