This example does the following:
- Takes raw data and creates training and test sets.
- Inputs the training set to the Naive Bayes Classifier function to create a model.
- Inputs the model and test set to the NaiveBayesPredict function to predict the output for the test set.
- Determines prediction accuracy based on the original and predicted results.
Input
The input table of raw data, nb_input_iris, contains values for four attributes (sepal_length, sepal_width, petal_length and petal_width), grouped into three categories (setosa, versicolor and virginica).
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3 | 1.4 | 0.1 | setosa |
14 | 4.3 | 3 | 1.1 | 0.1 | setosa |
... | ... | ... | ... | ... | ... |
This code divides the 150 data rows into a training data set (80%) and a testing data set (20%):
DROP TABLE nb_iris_input_train; DROP TABLE nb_iris_input_test; CREATE MULTISET TABLE nb_iris_input_train AS ( SELECT * FROM nb_input_iris WHERE id MOD 5 <> 0 ) WITH DATA; CREATE MULTISET TABLE nb_iris_input_test AS ( SELECT * FROM nb_input_iris WHERE id MOD 5 = 0 ) WITH DATA;
Alternatively, you can do the preceding task with the Sampling or RandomSample function.
This query returns the following table:
SELECT * FROM nb_iris_input_train ORDER BY id;
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3 | 1.4 | 0.1 | setosa |
14 | 4.3 | 3 | 1.1 | 0.1 | setosa |
16 | 5.7 | 4.4 | 1.5 | 0.4 | setosa |
... | ... | ... | ... | ... | ... |
This query returns the following table:
SELECT * FROM nb_iris_input_test ORDER BY id;
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
5 | 5 | 3.6 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
15 | 5.8 | 4 | 1.2 | 0.2 | setosa |
20 | 5.1 | 3.8 | 1.5 | 0.3 | setosa |
25 | 4.8 | 3.4 | 1.9 | 0.2 | setosa |
30 | 4.7 | 3.2 | 1.6 | 0.2 | setosa |
35 | 4.9 | 3.1 | 1.5 | 0.2 | setosa |
40 | 5.1 | 3.4 | 1.5 | 0.2 | setosa |
45 | 5.1 | 3.8 | 1.9 | 0.4 | setosa |
50 | 5 | 3.3 | 1.4 | 0.2 | setosa |
55 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
60 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
65 | 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
70 | 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
75 | 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
80 | 5.7 | 2.6 | 3.5 | 1 | versicolor |
85 | 5.4 | 3 | 4.5 | 1.5 | versicolor |
90 | 5.5 | 2.5 | 4 | 1.3 | versicolor |
95 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
100 | 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
105 | 6.5 | 3 | 5.8 | 2.2 | virginica |
110 | 7.2 | 3.6 | 6.1 | 2.5 | virginica |
115 | 5.8 | 2.8 | 5.1 | 2.4 | virginica |
120 | 6 | 2.2 | 5 | 1.5 | virginica |
125 | 6.7 | 3.3 | 5.7 | 2.1 | virginica |
130 | 7.2 | 3 | 5.8 | 1.6 | virginica |
135 | 6.1 | 2.6 | 5.6 | 1.4 | virginica |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
150 | 5.9 | 3 | 5.1 | 1.8 | virginica |
SQL Call
DROP TABLE nb_iris_model; CREATE MULTISET TABLE nb_iris_model AS ( SELECT * FROM NaiveBayesReduce ( ON NaiveBayesMap ( ON nb_iris_input_train USING ResponseColumn ('species') NumericInputs ('[1:4]') ) PARTITION BY class_nb ) AS dt ) WITH DATA;
Output
This query returns the following table:
SELECT * FROM nb_iris_model ORDER BY 1;
class_nb | variable_nb | type_nb | category | cnt | sum_nb | sum_sq | total_cnt |
---|---|---|---|---|---|---|---|
setosa | sepal_width | NUMERIC | 40 | 136.700000524521 | 473.290003499985 | 40 | |
setosa | petal_width | NUMERIC | 40 | 10.1000002026558 | 3.03000012755394 | 40 | |
setosa | sepal_length | NUMERIC | 40 | 199.900000095367 | 1004.27000005722 | 40 | |
setosa | petal_length | NUMERIC | 40 | 57.6999998092651 | 84.2099996709824 | 40 | |
versicolor | sepal_width | NUMERIC | 40 | 111.10000038147 | 313.130002088547 | 40 | |
versicolor | petal_width | NUMERIC | 40 | 53.299999833107 | 40 | ||
versicolor | sepal_length | NUMERIC | 40 | 239.599999427795 | 1446.13999296188 | 40 | |
versicolor | petal_length | NUMERIC | 40 | 172.399999141693 | 752.219992570878 | 40 | |
virginica | sepal_width | NUMERIC | 40 | 118.799999952316 | 356.539999780655 | 40 | |
virginica | petal_width | NUMERIC | 40 | 81.1999989748001 | 166.999995970726 | 40 | |
virginica | sepal_length | NUMERIC | 40 | 264.400000572205 | 1764.92000530243 | 40 | |
virginica | petal_length | NUMERIC | 40 | 222.299999713898 | 1249.1499958992 | 40 |