この例では、学生の応募状況を予測します。
入力
- 入力テーブル: admissions_test、学生20名の応募情報を含んでいます。
- モデル: glm_admissions_model、Teradata Vantage™ Machine Learning Engine分析関数リファレンス、B700-4003の「GLM関数例: Interceptによるロジスティック回帰分析」によって出力される
列 | 説明 |
---|---|
id | 学生識別子(固有) |
masters | 学生が修士号を保有するかどうか—yesまたはno(カテゴリ別) |
gpa | 4.0スケールの成績評価点の平均(数値) |
stats | 統計のスキル—Novice、Beginner、またはAdvanced(カテゴリ別) |
programming | プログラミングのスキル—Novice、Beginner、またはAdvanced(カテゴリ別) |
admitted | 学生が合格かどうか-1(yes)または0(no) |
id | masters | gpa | stats | programming | admitted |
---|---|---|---|---|---|
50 | yes | 3.95 | Beginner | Beginner | 0 |
51 | yes | 3.76 | Beginner | Beginner | 0 |
52 | no | 3.7 | Novice | Beginner | 1 |
53 | yes | 3.5 | Beginner | Novice | 1 |
54 | yes | 3.5 | Beginner | Advanced | 1 |
55 | no | 3.6 | Beginner | Advanced | 1 |
56 | no | 3.82 | Advanced | Advanced | 1 |
57 | no | 3.71 | Advanced | Advanced | 1 |
58 | no | 3.13 | Advanced | Advanced | 1 |
59 | no | 3.65 | Novice | Novice | 1 |
60 | no | 4 | Advanced | Novice | 1 |
61 | yes | 4 | Advanced | Advanced | 1 |
62 | no | 3.7 | Advanced | Advanced | 1 |
63 | no | 3.83 | Advanced | Advanced | 1 |
64 | yes | 3.81 | Advanced | Advanced | 1 |
65 | yes | 3.9 | Advanced | Advanced | 1 |
66 | no | 3.87 | Novice | Beginner | 1 |
67 | yes | 3.46 | Novice | Beginner | 0 |
68 | no | 1.87 | Advanced | Novice | 1 |
69 | no | 3.96 | Advanced | Advanced | 1 |
SQL呼び出し
CREATE MULTISET TABLE glmpredict_admissions AS ( SELECT * FROM GLMPredict ( ON admissions_test PARTITION BY ANY ON glm_admissions_model AS Model DIMENSION USING Accumulate ('id','masters','gpa','stats','programming','admitted') Family ('LOGISTIC') LinkFunction ('LOGIT') ) AS dt ) WITH DATA;
出力
このクエリーは、以下のテーブルを返します。
SELECT * FROM glmpredict_admissions ORDER BY 1;
予測値は精度が異なります。その理由は、ML EngineGLM関数で出力されAdvanced SQL Engineにフェッチされるモデル テーブルに依存するからです。
id | masters | gpa | stats | programming | admitted | fitted value |
---|---|---|---|---|---|---|
50 | yes | 3.95000000000000E 000 | Beginner | Beginner | 0 | 3.50763408888030E-001 |
51 | yes | 3.76000000000000E 000 | Beginner | Beginner | 0 | 3.55708978581653E-001 |
52 | no | 3.70000000000000E 000 | Novice | Beginner | 1 | 7.58306140231079E-001 |
53 | yes | 3.50000000000000E 000 | Beginner | Novice | 1 | 5.56012779663342E-001 |
54 | yes | 3.50000000000000E 000 | Beginner | Advanced | 1 | 7.69474352959112E-001 |
55 | no | 3.60000000000000E 000 | Beginner | Advanced | 1 | 9.68031141480050E-001 |
56 | no | 3.82000000000000E 000 | Advanced | Advanced | 1 | 9.45772725968165E-001 |
57 | no | 3.71000000000000E 000 | Advanced | Advanced | 1 | 9.46411914806798E-001 |
58 | no | 3.13000000000000E 000 | Advanced | Advanced | 1 | 9.49666186386367E-001 |
59 | no | 3.65000000000000E 000 | Novice | Novice | 1 | 8.74189685344822E-001 |
60 | no | 4.00000000000000E 000 | Advanced | Novice | 1 | 8.65058992199339E-001 |
61 | yes | 4.00000000000000E 000 | Advanced | Advanced | 1 | 6.50618727191735E-001 |
62 | no | 3.70000000000000E 000 | Advanced | Advanced | 1 | 9.46469669158547E-001 |
63 | no | 3.83000000000000E 000 | Advanced | Advanced | 1 | 9.45714262806374E-001 |
64 | yes | 3.81000000000000E 000 | Advanced | Advanced | 1 | 6.55523357798207E-001 |
65 | yes | 3.90000000000000E 000 | Advanced | Advanced | 1 | 6.53204164468356E-001 |
66 | no | 3.87000000000000E 000 | Novice | Beginner | 1 | 7.54738501228429E-001 |
67 | yes | 3.46000000000000E 000 | Novice | Beginner | 0 | 2.60034297764359E-001 |
68 | no | 1.87000000000000E 000 | Advanced | Novice | 1 | 8.90965518431337E-001 |
69 | no | 3.96000000000000E 000 | Advanced | Advanced | 1 | 9.44948816395031E-001 |
fitted_value列の分類
fitted_value列は、学生が出力クラスの1つに属する確率を示します。次の図はロジスティック回帰グラフを示しており、入力x-軸を[0, 1]間のy確率値に対してマッピングします。
0.5以上のfitted_valueの確率はクラス1(合格した学生)を意味します。0.5未満の確率はクラス0(不合格の学生)を意味します。
次のコードは、glmpredict_admissionsにfitted_category列を追加しそれを挿入します。
ALTER table glmpredict_admissions ADD fitted_category int; UPDATE glmpredict_admissions SET fitted_category = 1 WHERE fitted_value > 0.4999; UPDATE glmpredict_admissions SET fitted_category = 0 WHERE fitted_value < 0.4999;
このクエリーは、以下のテーブルを返します。
SELECT * FROM glmpredict_admissions ORDER BY 1;
id | masters | gpa | stats | programming | admitted | fitted_value | fitted_category |
---|---|---|---|---|---|---|---|
50 | yes | 3.95000000000000E 000 | Beginner | Beginner | 0 | 3.50763408888030E-001 | 0 |
51 | yes | 3.76000000000000E 000 | Beginner | Beginner | 0 | 3.55708978581653E-001 | 0 |
52 | no | 3.70000000000000E 000 | Novice | Beginner | 1 | 7.58306140231079E-001 | 1 |
53 | yes | 3.50000000000000E 000 | Beginner | Novice | 1 | 5.56012779663342E-001 | 1 |
54 | yes | 3.50000000000000E 000 | Beginner | Advanced | 1 | 7.69474352959112E-001 | 1 |
55 | no | 3.60000000000000E 000 | Beginner | Advanced | 1 | 9.68031141480050E-001 | 1 |
56 | no | 3.82000000000000E 000 | Advanced | Advanced | 1 | 9.45772725968165E-001 | 1 |
57 | no | 3.71000000000000E 000 | Advanced | Advanced | 1 | 9.46411914806798E-001 | 1 |
58 | no | 3.13000000000000E 000 | Advanced | Advanced | 1 | 9.49666186386367E-001 | 1 |
59 | no | 3.65000000000000E 000 | Novice | Novice | 1 | 8.74189685344822E-001 | 1 |
60 | no | 4.00000000000000E 000 | Advanced | Novice | 1 | 8.65058992199339E-001 | 1 |
61 | yes | 4.00000000000000E 000 | Advanced | Advanced | 1 | 6.50618727191735E-001 | 1 |
62 | no | 3.70000000000000E 000 | Advanced | Advanced | 1 | 9.46469669158547E-001 | 1 |
63 | no | 3.83000000000000E 000 | Advanced | Advanced | 1 | 9.45714262806374E-001 | 1 |
64 | yes | 3.81000000000000E 000 | Advanced | Advanced | 1 | 6.55523357798207E-001 | 1 |
65 | yes | 3.90000000000000E 000 | Advanced | Advanced | 1 | 6.53204164468356E-001 | 1 |
66 | no | 3.87000000000000E 000 | Novice | Beginner | 1 | 7.54738501228429E-001 | 1 |
67 | yes | 3.46000000000000E 000 | Novice | Beginner | 0 | 2.60034297764359E-001 | 0 |
68 | no | 1.87000000000000E 000 | Advanced | Novice | 1 | 8.90965518431337E-001 | 1 |
69 | no | 3.96000000000000E 000 | Advanced | Advanced | 1 | 9.44948816395031E-001 | 1 |
予測精度
このクエリーは予測精度を返します。
SELECT (SELECT COUNT(id) FROM glmpredict_admissions WHERE admitted = fitted_category)/(SELECT count(id) FROM glmpredict_admissions) AS prediction_accuracy;
prediction_accuracy |
---|
1.00000000000000000000 |