LikelihoodRatioTest Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example uses the GLM function to create two models and then compares them with the LikelihoodRatioTest function.

The first model, damage_glm1, is built using the independent variable "temp" and is the alternative hypothesis.

The second model, damage_glm2, is the null model. It includes no independent variables and is built using only the intercept.

The LikelihoodRatioTest query compares the two models to test whether "temp" is significant in the model.

Input

This is the input table for the two GLM calls:

glm_tempdamage
id temp damage
1 53 1
2 57 1
3 58 1
4 63 1
5 66 0
6 67 0
7 67 0
8 67 0
9 68 0
10 69 0
11 70 1
12 70 0
13 70 1
14 70 0
15 72 0
16 73 0
17 75 0
18 75 1
19 76 0
20 76 0
21 78 0
22 79 0

This GLM call creates a model, damage_glm1, using the predictor variable temp (in the second row of damage_glm1):

SELECT * FROM GLM (
  ON glm_tempdamage AS InputTable
  OUT TABLE OutputTable (damage_glm1)
  USING
  TargetColumns ('damage','temp')
  Family ('LOGISTIC')
  Link ('CANONICAL')
  StopThreshold ('0.01')
  MaxIterNum ('10')
) AS dt;
 predictor               estimate             std_error           z_score             p_value              significance                            
 ----------------------- -------------------- ------------------- ------------------- -------------------- --------------------------------------- 
 (Intercept)               14.805399894714355    7.44366979598999  1.9889899492263794  0.04670250043272972 *                                      
 temp                    -0.22856199741363525 0.10932400077581406 -2.0906898975372314 0.036556001752614975 *                                      
 ITERATIONS #                             4.0                 0.0                 0.0                  0.0 Number of Fisher Scoring iterations    
 ROWS #                                  22.0                 0.0                 0.0                  0.0 Number of rows                         
 Residual deviance          20.26799964904785                 0.0                 0.0                  0.0 on 20 degrees of freedom               
 Pearson goodness of fit   22.771900177001953                 0.0                 0.0                  0.0 on 20 degrees of freedom               
 AIC                        24.26799964904785                 0.0                 0.0                  0.0 Akaike information criterion           
 BIC                       26.450084686279297                 0.0                 0.0                  0.0 Bayesian information criterion         
 Wald Test                  6.039470195770264                 0.0                 0.0  0.04881415143609047 *                                      
 Dispersion parameter                     1.0                 0.0                 0.0                  0.0 Taken to be 1 for BINOMIAL and POISSON.
SELECT * FROM damage_glm1 ORDER BY 1;
 attribute predictor   category estimate             std_err             z_score             p_value              significance family   
 --------- ----------- -------- -------------------- ------------------- ------------------- -------------------- ------------ -------- 
        -1 Loglik      NULL      -10.133999824523926                22.0                 1.0                  0.0 NULL         LOGISTIC
         0 (Intercept) NULL       14.805399894714355    7.44366979598999  1.9889899492263794  0.04670250043272972 *            LOGISTIC
         1 temp        NULL     -0.22856199741363525 0.10932400077581406 -2.0906898975372314 0.036556001752614975 *            LOGISTIC

This GLM call creates the null model, produced with only the intercept:

SELECT * FROM GLM (
  ON glm_tempdamage AS InputTable
  OUT TABLE OutputTable(damage_glm2)

  USING
  InputColumns ('damage')
  Family ('LOGISTIC')
  LinkFunction ('CANONICAL')
  StopThreshold ('0.01')
  MaxIterNum ('10')
) AS dt;
 predictor               estimate            std_error          z_score             p_value             significance                            
 ----------------------- ------------------- ------------------ ------------------- ------------------- --------------------------------------- 
 (Intercept)             -0.7621399760246277 0.4577380120754242 -1.6650199890136719 0.09590969979763031 .                                      
 ITERATIONS #                            3.0                0.0                 0.0                 0.0 Number of Fisher Scoring iterations    
 ROWS #                                 22.0                0.0                 0.0                 0.0 Number of rows                         
 Residual deviance          27.5216007232666                0.0                 0.0                 0.0 on 21 degrees of freedom               
 Pearson goodness of fit                22.0                0.0                 0.0                 0.0 on 21 degrees of freedom               
 AIC                        29.5216007232666                0.0                 0.0                 0.0 Akaike information criterion           
 BIC                      30.612642288208008                0.0                 0.0                 0.0 Bayesian information criterion         
 Wald Test                 2.772279977798462                0.0                 0.0 0.09590946137905121 .                                      
 Dispersion parameter                    1.0                0.0                 0.0                 0.0 Taken to be 1 for BINOMIAL and POISSON.
SELECT * FROM damage_glm2 ORDER BY 1;
 attribute predictor   category estimate            std_err            z_score             p_value             significance family   
 --------- ----------- -------- ------------------- ------------------ ------------------- ------------------- ------------ -------- 
        -1 Loglik      NULL       -13.7608003616333               22.0                 0.0                 0.0 NULL         LOGISTIC
         0 (Intercept) NULL     -0.7621399760246277 0.4577380120754242 -1.6650199890136719 0.09590969979763031 .            LOGISTIC

SQL Call

SELECT * FROM LikelihoodRatioTest (
  ON (SELECT * FROM damage_glm1 WHERE attribute = -1) AS FirstModel 
    PARTITION BY 1
  ON (SELECT * FROM damage_glm2 WHERE attribute = -1) AS SecondModel 
    PARTITION BY 1
  USING
  statistic('predictor')
  loglik('estimate')
  ObsNum('std_err')
  ParamNum('z_score')
) AS dt;

Output

 distribution       statistic        p_value              
 ------------------ ---------------- -------------------- 
 Chi-squared 1 d.f. 7.25360107421875 0.007075897417962551

The reported chi square value (7.2536) corresponds to a p-value of 0.007, which is statistically significant at the 95% confidence level (p-value < 0.05). This shows that "temp" is significant in the model; that is, the alternative model, damage_glm1, cannot be rejected in favor of the null hypothesis model.

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.