LikelihoodRatioTest Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example uses the GLM function to create two models and then compares them with the LikelihoodRatioTest function.

Input

This is the input table for the two GLM calls:

glm_tempdamage
id temp damage
1 53 1
2 57 1
3 58 1
4 63 1
5 66 0
6 67 0
7 67 0
8 67 0
9 68 0
10 69 0
11 70 1
12 70 0
13 70 1
14 70 0
15 72 0
16 73 0
17 75 0
18 75 1
19 76 0
20 76 0
21 78 0
22 79 0

This GLM call creates a model, damage_glm1, using the predictor variable temp (in the second row of damage_glm1):

SELECT * FROM GLM (
  ON glm_tempdamage AS InputTable
  OUT TABLE OutputTable (damage_glm1)
  USING
  InputColumns ('damage','temp')
  Family ('LOGISTIC')
  Link ('CANONICAL')
  StopThreshold ('0.01')
  MaxIterNum ('10')
) AS dt;
damage_glm1
predictor estimate std_error z_score p_value significance
(Intercept) 14.8054 7.44367 1.98899 0.0467025 *
temp -0.228562 0.109324 -2.09069 0.036556 *
ITERATIONS # 4 0 0 0 Number of Fisher Scoring iterations
ROWS # 22 0 0 0 Number of rows
Residual deviance 20.268 0 0 0 on 20 degrees of freedom
Pearson goodness of fit 22.7719 0 0 0 on 20 degrees of freedom
AIC 24.268 0 0 0 Akaike information criterion
BIC 26.4501 0 0 0 Bayesian information criterion
Wald Test 6.03947 0 0 0.0488142 *
Dispersion parameter 1 0 0 0 Taken to be 1 for BINOMIAL and POISSON.

This GLM call creates the null model, produced with only the intercept:

SELECT * FROM GLM (
  ON glm_tempdamage AS InputTable
  OUT TABLE OutputTable (damage_glm2)
  USING
  InputColumns ('damage')
  Family ('LOGISTIC')
  Link ('CANONICAL')
  StopThreshold ('0.01')
  MaxIterNum ('10')
) AS dt;
damage_glm2
predictor estimate std_error z_score p_value significance
(Intercept) -0.76214 0.457738 -1.66502 0.0959097 .
ITERATIONS # 3 0 0 0 Number of Fisher Scoring iterations
ROWS # 22 0 0 0 Number of rows
Residual deviance 27.5216 0 0 0 on 21 degrees of freedom
Pearson goodness of fit 22 0 0 0 on 21 degrees of freedom
AIC 29.5216 0 0 0 Akaike information criterion
BIC 30.6126 0 0 0 Bayesian information criterion
Wald Test 2.77228 0 0 0.0959095 .
Dispersion parameter 1 0 0 0 Taken to be 1 for BINOMIAL and POISSON.

SQL Call

SELECT * FROM LikelihoodRatioTest (
  ON (SELECT * FROM damage_glm1 WHERE attribute = -1) AS model1
    PARTITION BY 1
  ON (SELECT * FROM damage_glm2 WHERE attribute = -1) AS model2
    PARTITION BY 1
  USING
  Statistic ('predictor')
  Loglik ('estimate')
  ObsNum ('std_err')
  ParamNum ('z_score')
) AS dt;

Output

distribution statistic p_value
Chi-squared 1 d.f. 7.2536 0.0070759

The final output compares the two GLM distributions and displays a chi squared statistic and a p-value. The chi square value suggests, based on all predictors, that the data was more likely created by model 1 than by the null model. The result is statistically significant at the 95% confidence level (p-value < 0.05).