This example does the following:
- Takes raw data and creates training and test sets.
- Inputs the training set to the NaiveBayes function to create a model.
- Inputs the model and test set to the NaiveBayesPredict_MLE function to predict the output for the test set.
- Determines prediction accuracy based on the original and predicted results.
Input
The input table of raw data, svm_iris, contains values for four attributes (sepal_length, sepal_width, petal_length and petal_width), grouped into three categories (setosa, versicolor and virginica).
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3 | 1.4 | 0.1 | setosa |
14 | 4.3 | 3 | 1.1 | 0.1 | setosa |
... | ... | ... | ... | ... | ... |
This code divides the 150 data rows into a training data set (80%) and a testing data set (20%):
DROP TABLE svm_iris_train; DROP TABLE svm_iris_test; CREATE MULTISET TABLE svm_iris_train AS ( SELECT * FROM svm_iris WHERE id MOD 5 <> 0 ) WITH DATA; CREATE MULTISET TABLE svm_iris_test AS ( SELECT * FROM svm_iris WHERE id MOD 5 = 0 ) WITH DATA;
Alternatively, you can do the preceding task with the Sampling or RandomSample function.
This query returns the following table:
SELECT * FROM svm_iris_train ORDER BY id;
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3 | 1.4 | 0.1 | setosa |
14 | 4.3 | 3 | 1.1 | 0.1 | setosa |
16 | 5.7 | 4.4 | 1.5 | 0.4 | setosa |
... | ... | ... | ... | ... | ... |
This query returns the following table:
SELECT * FROM svm_iris_test ORDER BY id;
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
5 | 5 | 3.6 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
15 | 5.8 | 4 | 1.2 | 0.2 | setosa |
20 | 5.1 | 3.8 | 1.5 | 0.3 | setosa |
25 | 4.8 | 3.4 | 1.9 | 0.2 | setosa |
30 | 4.7 | 3.2 | 1.6 | 0.2 | setosa |
35 | 4.9 | 3.1 | 1.5 | 0.2 | setosa |
40 | 5.1 | 3.4 | 1.5 | 0.2 | setosa |
45 | 5.1 | 3.8 | 1.9 | 0.4 | setosa |
50 | 5 | 3.3 | 1.4 | 0.2 | setosa |
55 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
60 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
65 | 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
70 | 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
75 | 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
80 | 5.7 | 2.6 | 3.5 | 1 | versicolor |
85 | 5.4 | 3 | 4.5 | 1.5 | versicolor |
90 | 5.5 | 2.5 | 4 | 1.3 | versicolor |
95 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
100 | 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
105 | 6.5 | 3 | 5.8 | 2.2 | virginica |
110 | 7.2 | 3.6 | 6.1 | 2.5 | virginica |
115 | 5.8 | 2.8 | 5.1 | 2.4 | virginica |
120 | 6 | 2.2 | 5 | 1.5 | virginica |
125 | 6.7 | 3.3 | 5.7 | 2.1 | virginica |
130 | 7.2 | 3 | 5.8 | 1.6 | virginica |
135 | 6.1 | 2.6 | 5.6 | 1.4 | virginica |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
150 | 5.9 | 3 | 5.1 | 1.8 | virginica |
SQL Call
SELECT * FROM NaiveBayes ( ON svm_iris_train AS InputTable OUT TABLE OutputTable (nb_iris_model) USING ResponseColumn ('species') NumericInputs ('[1:4]') ) AS dt;
Output
message ----------------------------------------------------------------------- Successfully built naive bayes model in table specified by OutputTable.
This query returns the following table:
SELECT * FROM nb_iris_model ORDER BY class_nb, cnt ASC;
class_nb variable_nb type_nb category cnt sum_nb sum_sq total_cnt ---------- ------------ ------- -------- --- ------------------ ------------------ --------- setosa petal_length NUMERIC NULL 40 57.69999999999999 84.21000000000001 40 setosa sepal_length NUMERIC NULL 40 199.90000000000003 1004.2700000000001 40 setosa petal_width NUMERIC NULL 40 10.100000000000001 3.030000000000001 40 setosa sepal_width NUMERIC NULL 40 136.7 473.29 40 versicolor sepal_width NUMERIC NULL 40 111.1 313.13 40 versicolor petal_width NUMERIC NULL 40 53.3 72.71000000000001 40 versicolor sepal_length NUMERIC NULL 40 239.6 1446.14 40 versicolor petal_length NUMERIC NULL 40 172.40000000000003 752.22 40 virginica petal_width NUMERIC NULL 40 81.19999999999999 166.99999999999997 40 virginica sepal_width NUMERIC NULL 40 118.80000000000001 356.5400000000001 40 virginica petal_length NUMERIC NULL 40 222.3 1249.15 40 virginica sepal_length NUMERIC NULL 40 264.40000000000003 1764.92 40
Download a zip file of all examples and a SQL script file that creates their input tables.