1.0 - 8.00 - LikelihoodRatioTest Example - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.0
8.00
Release Date
May 2019
Content Type
Programming Reference
Publication ID
B700-4003-098K
Language
English (United States)

This example uses the GLM function to create two models and then compares them with the LikelihoodRatioTest function.

Input

This is the input table for the two GLM calls:

glm_tempdamage
id temp damage
1 53 1
2 57 1
3 58 1
4 63 1
5 66 0
6 67 0
7 67 0
8 67 0
9 68 0
10 69 0
11 70 1
12 70 0
13 70 1
14 70 0
15 72 0
16 73 0
17 75 0
18 75 1
19 76 0
20 76 0
21 78 0
22 79 0

This GLM call creates a model, damage_glm1, using the predictor variable temp (in the second row of damage_glm1):

SELECT * FROM GLM (
  ON glm_tempdamage AS InputTable
  OUT TABLE OutputTable (damage_glm1)
  USING
  InputColumns ('damage','temp')
  Family ('LOGISTIC')
  Link ('CANONICAL')
  StopThreshold ('0.01')
  MaxIterNum ('10')
) AS dt;
damage_glm1
predictor estimate std_error z_score p_value significance
(Intercept) 14.8054 7.44367 1.98899 0.0467025 *
temp -0.228562 0.109324 -2.09069 0.036556 *
ITERATIONS # 4 0 0 0 Number of Fisher Scoring iterations
ROWS # 22 0 0 0 Number of rows
Residual deviance 20.268 0 0 0 on 20 degrees of freedom
Pearson goodness of fit 22.7719 0 0 0 on 20 degrees of freedom
AIC 24.268 0 0 0 Akaike information criterion
BIC 26.4501 0 0 0 Bayesian information criterion
Wald Test 6.03947 0 0 0.0488142 *
Dispersion parameter 1 0 0 0 Taken to be 1 for BINOMIAL and POISSON.

This GLM call creates the null model, produced with only the intercept:

SELECT * FROM GLM (
  ON glm_tempdamage AS InputTable
  OUT TABLE OutputTable (damage_glm2)
  USING
  InputColumns ('damage')
  Family ('LOGISTIC')
  Link ('CANONICAL')
  StopThreshold ('0.01')
  MaxIterNum ('10')
) AS dt;
damage_glm2
predictor estimate std_error z_score p_value significance
(Intercept) -0.76214 0.457738 -1.66502 0.0959097 .
ITERATIONS # 3 0 0 0 Number of Fisher Scoring iterations
ROWS # 22 0 0 0 Number of rows
Residual deviance 27.5216 0 0 0 on 21 degrees of freedom
Pearson goodness of fit 22 0 0 0 on 21 degrees of freedom
AIC 29.5216 0 0 0 Akaike information criterion
BIC 30.6126 0 0 0 Bayesian information criterion
Wald Test 2.77228 0 0 0.0959095 .
Dispersion parameter 1 0 0 0 Taken to be 1 for BINOMIAL and POISSON.

SQL Call

SELECT * FROM LikelihoodRatioTest (
  ON (SELECT * FROM damage_glm1 WHERE attribute = -1) AS model1
    PARTITION BY 1
  ON (SELECT * FROM damage_glm2 WHERE attribute = -1) AS model2
    PARTITION BY 1
  USING
  Statistic ('predictor')
  Loglik ('estimate')
  ObsNum ('std_err')
  ParamNum ('z_score')
) AS dt;

Output

distribution statistic p_value
Chi-squared 1 d.f. 7.2536 0.0070759

The final output compares the two GLM distributions and displays a chi squared statistic and a p-value. The chi square value suggests, based on all predictors, that the data was more likely created by model 1 than by the null model. The result is statistically significant at the 95% confidence level (p-value < 0.05).