1.0 - 8.00 - GLMPredict Example 1: Logistic Distribution Prediction - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.0
8.00
Release Date
May 2019
Content Type
Programming Reference
Publication ID
B700-4003-098K
Language
English (United States)

Input

admissions_test
id masters gpa stats programming admitted
50 yes 3.95 Beginner Beginner 0
51 yes 3.76 Beginner Beginner 0
52 no 3.7 Novice Beginner 1
53 yes 3.5 Beginner Novice 1
54 yes 3.5 Beginner Advanced 1
55 no 3.6 Beginner Advanced 1
56 no 3.82 Advanced Advanced 1
57 no 3.71 Advanced Advanced 1
58 no 3.13 Advanced Advanced 1
59 no 3.65 Novice Novice 1
60 no 4 Advanced Novice 1
61 yes 4 Advanced Advanced 1
62 no 3.7 Advanced Advanced 1
63 no 3.83 Advanced Advanced 1
64 yes 3.81 Advanced Advanced 1
65 yes 3.9 Advanced Advanced 1
66 no 3.87 Novice Beginner 1
67 yes 3.46 Novice Beginner 0
68 no 1.87 Advanced Novice 1
69 no 3.96 Advanced Advanced 1

SQL Call

SELECT * FROM GLMPredict@coprocessor (
  ON admissions_test PARTITION BY ANY
  ON glm_admissions_model AS model DIMENSION
  USING
  Accumulate ('id', 'masters', 'gpa', 'stats', 'programming', 'admitted')
  Family ('LOGISTIC')
  LinkFunction ('LOGIT')
) AS dt;

Output

This query returns the following table:

SELECT * FROM glmpredict_admissions ORDER BY 1;
glmpredict_admissions
id masters gpa stats programming admitted fitted_value
50 yes 3.95 Beginner Beginner 0 0.35076568493861
51 yes 3.76 Beginner Beginner 0 0.355711270634992
52 no 3.7 Novice Beginner 1 0.758307972010008
53 yes 3.5 Beginner Novice 1 0.556015248287644
54 yes 3.5 Beginner Advanced 1 0.769476126790063
55 no 3.6 Beginner Advanced 1 0.968031451283364
56 no 3.82 Advanced Advanced 1 0.945773239222729
57 no 3.71 Advanced Advanced 1 0.946412421748131
58 no 3.13 Advanced Advanced 1 0.949666663764171
59 no 3.65 Novice Novice 1 0.874190783966478
60 no 4 Advanced Novice 1 0.8650601595144
61 yes 4 Advanced Advanced 1 0.650621000328301
62 no 3.7 Advanced Advanced 1 0.946470175529378
63 no 3.83 Advanced Advanced 1 0.945714776638306
64 yes 3.81 Advanced Advanced 1 0.655525617391707
65 yes 3.9 Advanced Advanced 1 0.653206427288345
66 no 3.87 Novice Beginner 1 0.754740354720256
67 yes 3.46 Novice Beginner 0 0.260036221097295
68 no 1.87 Advanced Novice 1 0.890966489834395
69 no 3.96 Advanced Advanced 1 0.94494933637213

Categorizing fitted_value Column

The fitted_value column gives the probability that a student belongs to one of the output classes.

The fitted_value probability of > = 0.5 implies class 1 (student admitted) and a probability of < 0.5 implies class 0 (student rejected).

The following code adds a fitted_category column to glmpredict_admissions and populates it:

ALTER table glmpredict_admissions
  ADD fitted_category int;
  UPDATE glmpredict_admissions SET fitted_category = 1
    WHERE fitted_value > 0.4999;
  UPDATE glmpredict_admissions SET fitted_category = 0
    WHERE fitted_value < 0.4999;

This query returns the following table:

SELECT * FROM glmpredict_admissions ORDER BY 1;
glmpredict_admissions
id masters gpa stats programming admitted fitted_value fitted_category
50 yes 3.95 Beginner Beginner 0 0.35076568493861 0
51 yes 3.76 Beginner Beginner 0 0.355711270634992 0
52 no 3.7 Novice Beginner 1 0.758307972010008 1
53 yes 3.5 Beginner Novice 1 0.556015248287644 1
54 yes 3.5 Beginner Advanced 1 0.769476126790063 1
55 no 3.6 Beginner Advanced 1 0.968031451283364 1
56 no 3.82 Advanced Advanced 1 0.945773239222729 1
57 no 3.71 Advanced Advanced 1 0.946412421748131 1
58 no 3.13 Advanced Advanced 1 0.949666663764171 1
59 no 3.65 Novice Novice 1 0.874190783966478 1
60 no 4 Advanced Novice 1 0.8650601595144 1
61 yes 4 Advanced Advanced 1 0.650621000328301 1
62 no 3.7 Advanced Advanced 1 0.946470175529378 1
63 no 3.83 Advanced Advanced 1 0.945714776638306 1
64 yes 3.81 Advanced Advanced 1 0.655525617391707 1
65 yes 3.9 Advanced Advanced 1 0.653206427288345 1
66 no 3.87 Novice Beginner 1 0.754740354720256 1
67 yes 3.46 Novice Beginner 0 0.260036221097295 0
68 no 1.87 Advanced Novice 1 0.890966489834395 1
69 no 3.96 Advanced Advanced 1 0.94494933637213 1

Prediction Accuracy

This query returns the following table:

SELECT (SELECT COUNT(id) FROM glmpredict_admissions
  WHERE admitted = fitted_category)/(SELECT count(id)
    FROM glmpredict_admissions) AS prediction_accuracy;
prediction_accuracy
1.00000000000000000000