This example uses the iris data set (nb_input_iris). Each observation has values for four attributes (sepal_length, sepal_width, petal_length, and petal_width) and belongs to one of three species (setosa, versicolor, and virginica). A training set and a test set are created from the raw input data. The NaiveBayesMap and NaiveBayesReduce functions use the training set to generate the model. The NaiveBayesPredict function then uses that model to predict the species of each observation in the test set. Finally, SQL code computes the prediction accuracy by comparing the original and predicted species.
id | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3 | 1.4 | 0.1 | setosa |
14 | 4.3 | 3 | 1.1 | 0.1 | setosa |
... | ... | ... | ... | ... | ... |
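
The workflow described above (split the data, build a model from the training set, predict the test set, measure accuracy) can be sketched outside the database in plain Python. This is a minimal illustration only, assuming a Gaussian naive Bayes model for the four numeric attributes; it is not the implementation of NaiveBayesMap, NaiveBayesReduce, or NaiveBayesPredict, and it may differ from what those functions compute internally. The helper names `split_rows`, `train`, `predict`, and `accuracy`, the 20% test fraction, and the variance floor are all hypothetical choices made for this sketch. Rows are assumed to be dictionaries keyed by the column names of nb_input_iris.

```python
import math
import random
from collections import defaultdict

# Column names as they appear in nb_input_iris.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and return (training_rows, test_rows).

    The 20% default is an assumption for this sketch, not a value
    taken from the example.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows, var_floor=1e-3):
    """Build a model: {species: (prior, [(mean, variance), ...])}.

    One (mean, variance) pair is stored per attribute. var_floor keeps a
    constant column from producing a zero variance (hypothetical choice).
    """
    by_species = defaultdict(list)
    for row in rows:
        by_species[row["species"]].append([row[f] for f in FEATURES])

    model = {}
    for species, vectors in by_species.items():
        stats = []
        for column in zip(*vectors):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, max(var, var_floor)))
        model[species] = (len(vectors) / len(rows), stats)
    return model


def predict(model, row):
    """Return the species with the highest log-posterior for one row."""
    def log_posterior(species):
        prior, stats = model[species]
        score = math.log(prior)
        for feature, (mean, var) in zip(FEATURES, stats):
            # Log of the Gaussian density for this attribute value.
            score += (-0.5 * math.log(2 * math.pi * var)
                      - (row[feature] - mean) ** 2 / (2 * var))
        return score

    return max(model, key=log_posterior)


def accuracy(model, rows):
    """Fraction of rows whose predicted species matches the original."""
    correct = sum(predict(model, row) == row["species"] for row in rows)
    return correct / len(rows)
```

With `rows` holding the full data set, `training, test = split_rows(rows)` followed by `accuracy(train(training), test)` mirrors the last step of the example, where the original and predicted species are compared. Working with log-probabilities rather than raw products avoids numeric underflow when several small densities are multiplied.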