This example predicts the admission status of students.
Input
- Input table: admissions_test, which has admissions information for 20 students
- Model: glm_admissions_model, output by "GLM Example: Logistic Regression Analysis with Intercept" in Teradata Vantage™ Machine Learning Engine Analytic Function Reference, B700-4003
Column | Description |
---|---|
id | Student identifier (unique) |
masters | Whether student has a masters degree—yes or no (categorical) |
gpa | Grade point average on a 4.0 scale (numerical) |
stats | Statistical skills—Novice, Beginner, or Advanced (categorical) |
programming | Programming skills—Novice, Beginner, or Advanced (categorical) |
admitted | Whether student was admitted—1 (yes) or 0 (no) |
id | masters | gpa | stats | programming | admitted |
---|---|---|---|---|---|
50 | yes | 3.95 | Beginner | Beginner | 0 |
51 | yes | 3.76 | Beginner | Beginner | 0 |
52 | no | 3.7 | Novice | Beginner | 1 |
53 | yes | 3.5 | Beginner | Novice | 1 |
54 | yes | 3.5 | Beginner | Advanced | 1 |
55 | no | 3.6 | Beginner | Advanced | 1 |
56 | no | 3.82 | Advanced | Advanced | 1 |
57 | no | 3.71 | Advanced | Advanced | 1 |
58 | no | 3.13 | Advanced | Advanced | 1 |
59 | no | 3.65 | Novice | Novice | 1 |
60 | no | 4 | Advanced | Novice | 1 |
61 | yes | 4 | Advanced | Advanced | 1 |
62 | no | 3.7 | Advanced | Advanced | 1 |
63 | no | 3.83 | Advanced | Advanced | 1 |
64 | yes | 3.81 | Advanced | Advanced | 1 |
65 | yes | 3.9 | Advanced | Advanced | 1 |
66 | no | 3.87 | Novice | Beginner | 1 |
67 | yes | 3.46 | Novice | Beginner | 0 |
68 | no | 1.87 | Advanced | Novice | 1 |
69 | no | 3.96 | Advanced | Advanced | 1 |
SQL Call
CREATE MULTISET TABLE glmpredict_admissions AS ( SELECT * FROM GLMPredict ( ON admissions_test PARTITION BY ANY ON glm_admissions_model AS Model DIMENSION USING Accumulate ('id','masters','gpa','stats','programming','admitted') Family ('LOGISTIC') LinkFunction ('LOGIT') ) AS dt ) WITH DATA;
Output
This query returns the following table:
SELECT * FROM glmpredict_admissions ORDER BY 1;
Fitted values can vary in precision, because they depend on the model table output by ML Engine GLM function and fetched to Advanced SQL Engine.
id | masters | gpa | stats | programming | admitted | fitted value |
---|---|---|---|---|---|---|
50 | yes | 3.95000000000000E 000 | Beginner | Beginner | 0 | 3.50763408888030E-001 |
51 | yes | 3.76000000000000E 000 | Beginner | Beginner | 0 | 3.55708978581653E-001 |
52 | no | 3.70000000000000E 000 | Novice | Beginner | 1 | 7.58306140231079E-001 |
53 | yes | 3.50000000000000E 000 | Beginner | Novice | 1 | 5.56012779663342E-001 |
54 | yes | 3.50000000000000E 000 | Beginner | Advanced | 1 | 7.69474352959112E-001 |
55 | no | 3.60000000000000E 000 | Beginner | Advanced | 1 | 9.68031141480050E-001 |
56 | no | 3.82000000000000E 000 | Advanced | Advanced | 1 | 9.45772725968165E-001 |
57 | no | 3.71000000000000E 000 | Advanced | Advanced | 1 | 9.46411914806798E-001 |
58 | no | 3.13000000000000E 000 | Advanced | Advanced | 1 | 9.49666186386367E-001 |
59 | no | 3.65000000000000E 000 | Novice | Novice | 1 | 8.74189685344822E-001 |
60 | no | 4.00000000000000E 000 | Advanced | Novice | 1 | 8.65058992199339E-001 |
61 | yes | 4.00000000000000E 000 | Advanced | Advanced | 1 | 6.50618727191735E-001 |
62 | no | 3.70000000000000E 000 | Advanced | Advanced | 1 | 9.46469669158547E-001 |
63 | no | 3.83000000000000E 000 | Advanced | Advanced | 1 | 9.45714262806374E-001 |
64 | yes | 3.81000000000000E 000 | Advanced | Advanced | 1 | 6.55523357798207E-001 |
65 | yes | 3.90000000000000E 000 | Advanced | Advanced | 1 | 6.53204164468356E-001 |
66 | no | 3.87000000000000E 000 | Novice | Beginner | 1 | 7.54738501228429E-001 |
67 | yes | 3.46000000000000E 000 | Novice | Beginner | 0 | 2.60034297764359E-001 |
68 | no | 1.87000000000000E 000 | Advanced | Novice | 1 | 8.90965518431337E-001 |
69 | no | 3.96000000000000E 000 | Advanced | Advanced | 1 | 9.44948816395031E-001 |
Categorizing fitted_value Column
The fitted_value column gives the probability that a student belongs to one of the output classes. The following figure shows a typical logistic regression graph, mapping the input x-axis against a y probability value between [0,1].
A fitted_value probability greater than or equal to 0.5 implies class 1 (student admitted); a probability less than 0.5 implies class 0 (student rejected).
The following code adds a fitted_category column to glmpredict_admissions and populates it:
ALTER table glmpredict_admissions ADD fitted_category int; UPDATE glmpredict_admissions SET fitted_category = 1 WHERE fitted_value > 0.4999; UPDATE glmpredict_admissions SET fitted_category = 0 WHERE fitted_value < 0.4999;
This query returns the following table:
SELECT * FROM glmpredict_admissions ORDER BY 1;
id | masters | gpa | stats | programming | admitted | fitted_value | fitted_category |
---|---|---|---|---|---|---|---|
50 | yes | 3.95000000000000E 000 | Beginner | Beginner | 0 | 3.50763408888030E-001 | 0 |
51 | yes | 3.76000000000000E 000 | Beginner | Beginner | 0 | 3.55708978581653E-001 | 0 |
52 | no | 3.70000000000000E 000 | Novice | Beginner | 1 | 7.58306140231079E-001 | 1 |
53 | yes | 3.50000000000000E 000 | Beginner | Novice | 1 | 5.56012779663342E-001 | 1 |
54 | yes | 3.50000000000000E 000 | Beginner | Advanced | 1 | 7.69474352959112E-001 | 1 |
55 | no | 3.60000000000000E 000 | Beginner | Advanced | 1 | 9.68031141480050E-001 | 1 |
56 | no | 3.82000000000000E 000 | Advanced | Advanced | 1 | 9.45772725968165E-001 | 1 |
57 | no | 3.71000000000000E 000 | Advanced | Advanced | 1 | 9.46411914806798E-001 | 1 |
58 | no | 3.13000000000000E 000 | Advanced | Advanced | 1 | 9.49666186386367E-001 | 1 |
59 | no | 3.65000000000000E 000 | Novice | Novice | 1 | 8.74189685344822E-001 | 1 |
60 | no | 4.00000000000000E 000 | Advanced | Novice | 1 | 8.65058992199339E-001 | 1 |
61 | yes | 4.00000000000000E 000 | Advanced | Advanced | 1 | 6.50618727191735E-001 | 1 |
62 | no | 3.70000000000000E 000 | Advanced | Advanced | 1 | 9.46469669158547E-001 | 1 |
63 | no | 3.83000000000000E 000 | Advanced | Advanced | 1 | 9.45714262806374E-001 | 1 |
64 | yes | 3.81000000000000E 000 | Advanced | Advanced | 1 | 6.55523357798207E-001 | 1 |
65 | yes | 3.90000000000000E 000 | Advanced | Advanced | 1 | 6.53204164468356E-001 | 1 |
66 | no | 3.87000000000000E 000 | Novice | Beginner | 1 | 7.54738501228429E-001 | 1 |
67 | yes | 3.46000000000000E 000 | Novice | Beginner | 0 | 2.60034297764359E-001 | 0 |
68 | no | 1.87000000000000E 000 | Advanced | Novice | 1 | 8.90965518431337E-001 | 1 |
69 | no | 3.96000000000000E 000 | Advanced | Advanced | 1 | 9.44948816395031E-001 | 1 |
Prediction Accuracy
This query returns the prediction accuracy:
SELECT (SELECT COUNT(id) FROM glmpredict_admissions WHERE admitted = fitted_category)/(SELECT count(id) FROM glmpredict_admissions) AS prediction_accuracy;
prediction_accuracy |
---|
1.00000000000000000000 |