GLMPredict Function Example | Teradata Vantage - 17.05 - GLMPredict Example: Logistic Distribution Prediction - Teradata Database

Teradata Vantage™ - Advanced SQL Engine Analytic Functions

prodname
Advanced SQL Engine
Teradata Database
vrm_release
17.00
17.05
created_date
June 2020
category
Programming Reference
featnum
B035-1206-170K

This example predicts the admission status of students.

Input

  • Input table: admissions_test, which has admissions information for 20 students
  • Model: glm_admissions_model, output by "GLM Example: Logistic Regression Analysis with Intercept" in Teradata Vantage™ Machine Learning Engine Analytic Function Reference, B700-4003
Input Table Column Descriptions
Column Description
id Student identifier (unique)
masters Whether student has a masters degree—yes or no (categorical)
gpa Grade point average on a 4.0 scale (numerical)
stats Statistical skills—Novice, Beginner, or Advanced (categorical)
programming Programming skills—Novice, Beginner, or Advanced (categorical)
admitted Whether student was admitted—1 (yes) or 0 (no)
admissions_test
id masters gpa stats programming admitted
50 yes 3.95 Beginner Beginner 0
51 yes 3.76 Beginner Beginner 0
52 no 3.7 Novice Beginner 1
53 yes 3.5 Beginner Novice 1
54 yes 3.5 Beginner Advanced 1
55 no 3.6 Beginner Advanced 1
56 no 3.82 Advanced Advanced 1
57 no 3.71 Advanced Advanced 1
58 no 3.13 Advanced Advanced 1
59 no 3.65 Novice Novice 1
60 no 4 Advanced Novice 1
61 yes 4 Advanced Advanced 1
62 no 3.7 Advanced Advanced 1
63 no 3.83 Advanced Advanced 1
64 yes 3.81 Advanced Advanced 1
65 yes 3.9 Advanced Advanced 1
66 no 3.87 Novice Beginner 1
67 yes 3.46 Novice Beginner 0
68 no 1.87 Advanced Novice 1
69 no 3.96 Advanced Advanced 1

SQL Call

CREATE MULTISET TABLE glmpredict_admissions AS (
  SELECT * FROM GLMPredict (
    ON admissions_test PARTITION BY ANY
    ON glm_admissions_model AS Model DIMENSION
    USING
    Accumulate ('id','masters','gpa','stats','programming','admitted')
    Family ('LOGISTIC')
    LinkFunction ('LOGIT')
  ) AS dt
) WITH DATA;

Output

This query returns the following table:

SELECT * FROM glmpredict_admissions ORDER BY 1;

Fitted values can vary in precision, because they depend on the model table output by ML Engine GLM function and fetched to Advanced SQL Engine.

glmpredict_admissions
id masters gpa stats programming admitted fitted value
50 yes 3.95000000000000E 000 Beginner Beginner 0 3.50763408888030E-001
51 yes 3.76000000000000E 000 Beginner Beginner 0 3.55708978581653E-001
52 no 3.70000000000000E 000 Novice Beginner 1 7.58306140231079E-001
53 yes 3.50000000000000E 000 Beginner Novice 1 5.56012779663342E-001
54 yes 3.50000000000000E 000 Beginner Advanced 1 7.69474352959112E-001
55 no 3.60000000000000E 000 Beginner Advanced 1 9.68031141480050E-001
56 no 3.82000000000000E 000 Advanced Advanced 1 9.45772725968165E-001
57 no 3.71000000000000E 000 Advanced Advanced 1 9.46411914806798E-001
58 no 3.13000000000000E 000 Advanced Advanced 1 9.49666186386367E-001
59 no 3.65000000000000E 000 Novice Novice 1 8.74189685344822E-001
60 no 4.00000000000000E 000 Advanced Novice 1 8.65058992199339E-001
61 yes 4.00000000000000E 000 Advanced Advanced 1 6.50618727191735E-001
62 no 3.70000000000000E 000 Advanced Advanced 1 9.46469669158547E-001
63 no 3.83000000000000E 000 Advanced Advanced 1 9.45714262806374E-001
64 yes 3.81000000000000E 000 Advanced Advanced 1 6.55523357798207E-001
65 yes 3.90000000000000E 000 Advanced Advanced 1 6.53204164468356E-001
66 no 3.87000000000000E 000 Novice Beginner 1 7.54738501228429E-001
67 yes 3.46000000000000E 000 Novice Beginner 0 2.60034297764359E-001
68 no 1.87000000000000E 000 Advanced Novice 1 8.90965518431337E-001
69 no 3.96000000000000E 000 Advanced Advanced 1 9.44948816395031E-001

Categorizing fitted_value Column

The fitted_value column gives the probability that a student belongs to one of the output classes. The following figure shows a typical logistic regression graph, mapping the input x-axis against a y probability value between [0,1].



A fitted_value probability greater than or equal to 0.5 implies class 1 (student admitted); a probability less than 0.5 implies class 0 (student rejected).

The following code adds a fitted_category column to glmpredict_admissions and populates it:

ALTER table glmpredict_admissions
  ADD fitted_category int;
  UPDATE glmpredict_admissions SET fitted_category = 1
    WHERE fitted_value > 0.4999;
  UPDATE glmpredict_admissions SET fitted_category = 0
    WHERE fitted_value < 0.4999;

This query returns the following table:

SELECT * FROM glmpredict_admissions ORDER BY 1;
id masters gpa stats programming admitted fitted_value fitted_category
50 yes 3.95000000000000E 000 Beginner Beginner 0 3.50763408888030E-001 0
51 yes 3.76000000000000E 000 Beginner Beginner 0 3.55708978581653E-001 0
52 no 3.70000000000000E 000 Novice Beginner 1 7.58306140231079E-001 1
53 yes 3.50000000000000E 000 Beginner Novice 1 5.56012779663342E-001 1
54 yes 3.50000000000000E 000 Beginner Advanced 1 7.69474352959112E-001 1
55 no 3.60000000000000E 000 Beginner Advanced 1 9.68031141480050E-001 1
56 no 3.82000000000000E 000 Advanced Advanced 1 9.45772725968165E-001 1
57 no 3.71000000000000E 000 Advanced Advanced 1 9.46411914806798E-001 1
58 no 3.13000000000000E 000 Advanced Advanced 1 9.49666186386367E-001 1
59 no 3.65000000000000E 000 Novice Novice 1 8.74189685344822E-001 1
60 no 4.00000000000000E 000 Advanced Novice 1 8.65058992199339E-001 1
61 yes 4.00000000000000E 000 Advanced Advanced 1 6.50618727191735E-001 1
62 no 3.70000000000000E 000 Advanced Advanced 1 9.46469669158547E-001 1
63 no 3.83000000000000E 000 Advanced Advanced 1 9.45714262806374E-001 1
64 yes 3.81000000000000E 000 Advanced Advanced 1 6.55523357798207E-001 1
65 yes 3.90000000000000E 000 Advanced Advanced 1 6.53204164468356E-001 1
66 no 3.87000000000000E 000 Novice Beginner 1 7.54738501228429E-001 1
67 yes 3.46000000000000E 000 Novice Beginner 0 2.60034297764359E-001 0
68 no 1.87000000000000E 000 Advanced Novice 1 8.90965518431337E-001 1
69 no 3.96000000000000E 000 Advanced Advanced 1 9.44948816395031E-001 1

Prediction Accuracy

This query returns the prediction accuracy:

SELECT (SELECT COUNT(id) FROM glmpredict_admissions
  WHERE admitted = fitted_category)/(SELECT count(id)
    FROM glmpredict_admissions) AS prediction_accuracy;
prediction_accuracy
1.00000000000000000000