In logistic regression, the dependent variable (Y) has only two possible values (for example, 0 and 1, 'yes' and 'no', or 'true' and 'false'). The fitted model estimates the probability of each outcome from the predictor values and predicts the more likely one.
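As a minimal sketch of that prediction step, written in plain Python rather than with the GLM function: under a logit link, the linear predictor (the log-odds) is mapped to a probability, and a cutoff turns the probability into a 0/1 outcome. The function names and the 0.5 cutoff are assumptions made for illustration.

```python
import math


def logistic(eta: float) -> float:
    """Inverse of the logit link: map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_label(eta: float, threshold: float = 0.5) -> int:
    """Return 1 when the predicted probability reaches the (assumed) cutoff."""
    return 1 if logistic(eta) >= threshold else 0


print(logistic(0.0))        # log-odds of 0 correspond to a probability of 0.5
print(predict_label(2.0))   # positive log-odds -> outcome 1
print(predict_label(-2.0))  # negative log-odds -> outcome 0
```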
Input
The InputTable, admissions_train, contains data about applicants to an academic program. For each applicant, attributes in the table include a master's degree indicator, a grade point average (on a 4.0 scale), a statistical skills indicator, a programming skills indicator, and an indicator of whether the applicant was admitted. The master's degree, statistical skills, and programming skills indicators are categorical variables. The master's degree indicator has two categories (yes or no), while the other two have three categories each (Novice, Beginner, and Advanced). For admitted status, 1 indicates that the applicant was admitted and 0 indicates otherwise.
id | masters | gpa | stats | programming | admitted |
---|---|---|---|---|---|
1 | yes | 3.95 | Beginner | Beginner | 0 |
2 | yes | 3.76 | Beginner | Beginner | 0 |
3 | no | 3.7 | Novice | Beginner | 1 |
4 | yes | 3.5 | Beginner | Novice | 1 |
5 | no | 3.44 | Novice | Novice | 0 |
6 | yes | 3.5 | Beginner | Advanced | 1 |
7 | yes | 2.33 | Novice | Novice | 1 |
8 | no | 3.6 | Beginner | Advanced | 1 |
9 | no | 3.82 | Advanced | Advanced | 1 |
10 | no | 3.71 | Advanced | Advanced | 1 |
11 | no | 3.13 | Advanced | Advanced | 1 |
12 | no | 3.65 | Novice | Novice | 1 |
13 | no | 4 | Advanced | Novice | 1 |
14 | yes | 3.45 | Advanced | Advanced | 0 |
15 | yes | 4 | Advanced | Advanced | 1 |
16 | no | 3.7 | Advanced | Advanced | 1 |
17 | no | 3.83 | Advanced | Advanced | 1 |
18 | yes | 3.81 | Advanced | Advanced | 1 |
19 | yes | 1.98 | Advanced | Advanced | 0 |
20 | yes | 3.9 | Advanced | Advanced | 1 |
21 | no | 3.87 | Novice | Beginner | 1 |
22 | yes | 3.46 | Novice | Beginner | 0 |
23 | yes | 3.59 | Advanced | Novice | 1 |
24 | no | 1.87 | Advanced | Novice | 1 |
25 | no | 3.96 | Advanced | Advanced | 1 |
26 | yes | 3.57 | Advanced | Advanced | 1 |
27 | yes | 3.96 | Advanced | Advanced | 0 |
28 | no | 3.93 | Advanced | Advanced | 1 |
29 | yes | 4 | Novice | Beginner | 0 |
30 | yes | 3.79 | Advanced | Novice | 0 |
31 | yes | 3.5 | Advanced | Beginner | 1 |
32 | yes | 3.46 | Advanced | Beginner | 0 |
33 | no | 3.55 | Novice | Novice | 1 |
34 | yes | 3.85 | Advanced | Beginner | 0 |
35 | no | 3.68 | Novice | Beginner | 1 |
36 | no | 3 | Advanced | Novice | 0 |
37 | no | 3.52 | Novice | Novice | 1 |
38 | yes | 2.65 | Advanced | Beginner | 1 |
39 | yes | 3.75 | Advanced | Beginner | 0 |
40 | yes | 3.95 | Novice | Beginner | 0 |
SQL Call
The response variable (admitted, in this example) must be the first column listed in the TargetColumns syntax element, followed by the predictors.
DROP TABLE glm_admissions_model;

SELECT * FROM GLM (
  ON admissions_train AS InputTable
  OUT TABLE OutputTable (glm_admissions_model)
  USING
  TargetColumns ('admitted', 'masters', 'gpa', 'stats', 'programming')
  CategoricalColumns ('masters', 'stats', 'programming')
  Family ('LOGISTIC')
  LinkFunction ('LOGIT')
  WeightColumn ('1')
  StopThreshold (0.01)
  MaxIterNum (25)
  Intercept ('true')
) AS dt;
Output
The output table shows the model statistics.
predictor | estimate | std_error | z_score | p_value | significance |
---|---|---|---|---|---|
(Intercept) | 1.0775099992752075 | 2.920759916305542 | 0.36891400814056396 | 0.7121919989585876 |  |
masters.no | 2.21655011177063 | 1.0199899673461914 | 2.173110008239746 | 0.029771899804472923 | * |
gpa | -0.11393500119447708 | 0.802573025226593 | -0.14196200668811798 | 0.8871099948883057 |  |
stats.novice | 0.04068480059504509 | 1.1156699657440186 | 0.036466699093580246 | 0.9709100127220154 |  |
stats.beginner | 0.5266180038452148 | 1.2229000329971313 | 0.43063101172447205 | 0.6667360067367554 |  |
programming.beginner | -1.769760012626648 | 1.069000005722046 | -1.6555299758911133 | 0.09781769663095474 | . |
programming.novice | -0.9803500175476074 | 1.1400400400161743 | -0.8599230051040649 | 0.389831006526947 |  |
ITERATIONS # | 4.0 | 0.0 | 0.0 | 0.0 | Number of Fisher Scoring iterations |
ROWS # | 40.0 | 0.0 | 0.0 | 0.0 | Number of rows |
Residual deviance | 38.90380096435547 | 0.0 | 0.0 | 0.0 | on 33 degrees of freedom |
Pearson goodness of fit | 37.79050064086914 | 0.0 | 0.0 | 0.0 | on 33 degrees of freedom |
AIC | 52.90380096435547 | 0.0 | 0.0 | 0.0 | Akaike information criterion |
BIC | 64.72595977783203 | 0.0 | 0.0 | 0.0 | Bayesian information criterion |
Wald Test | 9.896419525146484 | 0.0 | 0.0 | 0.19451963901519775 |  |
Dispersion parameter | 1.0 | 0.0 | 0.0 | 0.0 | Taken to be 1 for BINOMIAL and POISSON. |
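The z_score and p_value columns appear consistent with a standard Wald test: the estimate divided by its standard error, and a two-sided p-value from the standard normal distribution. The sketch below reproduces the masters.no row on that assumption. The helper names are hypothetical and the code is plain Python, not part of the GLM function.

```python
import math


def z_score(estimate: float, std_error: float) -> float:
    """Wald statistic: how many standard errors the estimate lies from zero."""
    return estimate / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values copied from the masters.no row of the output table.
z = z_score(2.21655011177063, 1.0199899673461914)
p = two_sided_p(z)
print(round(z, 4), round(p, 4))
```

Rows with a large standard error relative to the estimate, such as gpa, produce z-scores near zero and p-values near 1.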
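For categorical variables, the model selects a reference category, which has no coefficient of its own; the estimates for the other categories are measured relative to it. This example uses the Advanced category as the reference for the stats and programming variables and yes as the reference for masters.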
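To illustrate how reference categories enter a prediction, the sketch below computes the linear predictor and the admission probability for one applicant from the estimates in the output table (rounded to six decimals). A reference category contributes nothing to the sum. The function names are hypothetical, and this is a hand calculation in plain Python, not the scoring function that accompanies GLM.

```python
import math

# Estimates copied from the output table, rounded to 6 decimals.
# Reference categories (masters=yes, stats=advanced, programming=advanced)
# have no entry, so they add 0 to the linear predictor.
COEF = {
    "(Intercept)": 1.077510,
    "masters.no": 2.216550,
    "gpa": -0.113935,
    "stats.novice": 0.040685,
    "stats.beginner": 0.526618,
    "programming.beginner": -1.769760,
    "programming.novice": -0.980350,
}


def linear_predictor(masters: str, gpa: float, stats: str, programming: str) -> float:
    """Sum the intercept, the gpa term, and one term per non-reference category."""
    eta = COEF["(Intercept)"] + COEF["gpa"] * gpa
    for name, value in (("masters", masters), ("stats", stats), ("programming", programming)):
        # Caution: any label without a coefficient (including a typo) is treated as the reference.
        eta += COEF.get(f"{name}.{value.lower()}", 0.0)
    return eta


def admit_probability(masters: str, gpa: float, stats: str, programming: str) -> float:
    """Apply the inverse logit link to the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(masters, gpa, stats, programming)))


# Applicant with id 3 in the input table: no master's degree, GPA 3.7, Novice stats, Beginner programming.
eta = linear_predictor("no", 3.7, "Novice", "Beginner")
prob = admit_probability("no", 3.7, "Novice", "Beginner")
print(round(eta, 3), round(prob, 3))
```

For an applicant in the reference category of every categorical variable, only the intercept and the gpa term remain.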
The following query returns the model table, glm_admissions_model:
SELECT * FROM glm_admissions_model;
attribute | predictor | category | estimate | std_err | z_score | p_value | significance | family |
---|---|---|---|---|---|---|---|---|
-1 | Loglik | NULL | -19.451900482177734 | 40.0 | 6.0 | 0.0 | NULL | LOGISTIC |
0 | (Intercept) | NULL | 1.0775099992752075 | 2.920759916305542 | 0.36891400814056396 | 0.7121919989585876 |  | LOGISTIC |
1 | masters | yes | NULL | NULL | NULL | NULL | NULL | LOGISTIC |
2 | masters | no | 2.21655011177063 | 1.0199899673461914 | 2.173110008239746 | 0.029771899804472923 | * | LOGISTIC |
3 | gpa | NULL | -0.11393500119447708 | 0.802573025226593 | -0.14196200668811798 | 0.8871099948883057 |  | LOGISTIC |
4 | stats | advanced | NULL | NULL | NULL | NULL | NULL | LOGISTIC |
5 | stats | novice | 0.04068480059504509 | 1.1156699657440186 | 0.036466699093580246 | 0.9709100127220154 |  | LOGISTIC |
6 | stats | beginner | 0.5266180038452148 | 1.2229000329971313 | 0.43063101172447205 | 0.6667360067367554 |  | LOGISTIC |
7 | programming | advanced | NULL | NULL | NULL | NULL | NULL | LOGISTIC |
8 | programming | beginner | -1.769760012626648 | 1.069000005722046 | -1.6555299758911133 | 0.09781769663095474 | . | LOGISTIC |
9 | programming | novice | -0.9803500175476074 | 1.1400400400161743 | -0.8599230051040649 | 0.389831006526947 |  | LOGISTIC |
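The Loglik row ties the model table to the fit statistics in the output table. As a hedged sketch in plain Python: for an ungrouped 0/1 response the residual deviance equals -2 times the log-likelihood, and AIC and BIC add their usual penalties for the number of fitted parameters. The reading of 40.0 and 6.0 in the Loglik row as the row count and the number of non-intercept coefficients is an assumption; the source does not label them.

```python
import math

loglik = -19.451900482177734  # estimate in the Loglik row
n_rows = 40                   # assumed meaning of 40.0 in the Loglik row
n_params = 6 + 1              # assumed: 6.0 non-intercept coefficients, plus the intercept

residual_deviance = -2.0 * loglik                       # saturated log-likelihood is 0 for 0/1 data
aic = residual_deviance + 2 * n_params                  # Akaike information criterion
bic = residual_deviance + n_params * math.log(n_rows)   # Bayesian information criterion
residual_df = n_rows - n_params                         # residual degrees of freedom

print(round(residual_deviance, 4), round(aic, 4), round(bic, 4), residual_df)
```

Under these assumptions the results agree with the Residual deviance, AIC, and BIC rows and with the 33 degrees of freedom reported in the output table.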