Naive Bayes Classifier Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢
This example does the following:
  1. Takes raw data and creates training and test sets.
  2. Inputs the training set to the Naive Bayes Classifier function to create a model.
  3. Inputs the model and test set to the NaiveBayesPredict function to predict the output for the test set.
  4. Determines prediction accuracy based on the original and predicted results.

Input

The input table of raw data, nb_input_iris, contains values for four attributes (sepal_length, sepal_width, petal_length and petal_width), grouped into three categories (setosa, versicolor and virginica).

Raw Input: nb_input_iris
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
... ... ... ... ... ...

This code divides the 150 data rows into a training data set (80%) and a testing data set (20%):

DROP TABLE nb_iris_input_train;
DROP TABLE nb_iris_input_test;

CREATE MULTISET TABLE nb_iris_input_train AS (
  SELECT * FROM nb_input_iris WHERE id MOD 5 <> 0
) WITH DATA;

CREATE MULTISET TABLE nb_iris_input_test AS (
  SELECT * FROM nb_input_iris WHERE id MOD 5 = 0
) WITH DATA;
Alternatively, you can do the preceding task with the Sampling or RandomSample function.

This query returns the following table:

SELECT * FROM nb_iris_input_train ORDER BY id;
nb_iris_input_train
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
16 5.7 4.4 1.5 0.4 setosa
... ... ... ... ... ...

This query returns the following table:

SELECT * FROM nb_iris_input_test ORDER BY id;
nb_iris_input_test
id sepal_length sepal_width petal_length petal_width species
5 5 3.6 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
15 5.8 4 1.2 0.2 setosa
20 5.1 3.8 1.5 0.3 setosa
25 4.8 3.4 1.9 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
45 5.1 3.8 1.9 0.4 setosa
50 5 3.3 1.4 0.2 setosa
55 6.5 2.8 4.6 1.5 versicolor
60 5.2 2.7 3.9 1.4 versicolor
65 5.6 2.9 3.6 1.3 versicolor
70 5.6 2.5 3.9 1.1 versicolor
75 6.4 2.9 4.3 1.3 versicolor
80 5.7 2.6 3.5 1 versicolor
85 5.4 3 4.5 1.5 versicolor
90 5.5 2.5 4 1.3 versicolor
95 5.6 2.7 4.2 1.3 versicolor
100 5.7 2.8 4.1 1.3 versicolor
105 6.5 3 5.8 2.2 virginica
110 7.2 3.6 6.1 2.5 virginica
115 5.8 2.8 5.1 2.4 virginica
120 6 2.2 5 1.5 virginica
125 6.7 3.3 5.7 2.1 virginica
130 7.2 3 5.8 1.6 virginica
135 6.1 2.6 5.6 1.4 virginica
140 6.9 3.1 5.4 2.1 virginica
145 6.7 3.3 5.7 2.5 virginica
150 5.9 3 5.1 1.8 virginica

SQL Call

DROP TABLE nb_iris_model;

CREATE MULTISET TABLE nb_iris_model AS (
  SELECT * FROM NaiveBayesReduce (
    ON NaiveBayesMap (
      ON nb_iris_input_train
      USING
      ResponseColumn ('species')
      NumericInputs ('[1:4]')
    ) PARTITION BY class_nb
  ) AS dt 
) WITH DATA;

Output

This query returns the following table:

SELECT * FROM nb_iris_model ORDER BY 1;
nb_iris_model
class_nb variable_nb type_nb category cnt sum_nb sum_sq total_cnt
setosa sepal_width NUMERIC   40 136.700000524521 473.290003499985 40
setosa petal_width NUMERIC   40 10.1000002026558 3.03000012755394 40
setosa sepal_length NUMERIC   40 199.900000095367 1004.27000005722 40
setosa petal_length NUMERIC   40 57.6999998092651 84.2099996709824 40
versicolor sepal_width NUMERIC   40 111.10000038147 313.130002088547 40
versicolor petal_width NUMERIC   40 53.299999833107   40
versicolor sepal_length NUMERIC   40 239.599999427795 1446.13999296188 40
versicolor petal_length NUMERIC   40 172.399999141693 752.219992570878 40
virginica sepal_width NUMERIC   40 118.799999952316 356.539999780655 40
virginica petal_width NUMERIC   40 81.1999989748001 166.999995970726 40
virginica sepal_length NUMERIC   40 264.400000572205 1764.92000530243 40
virginica petal_length NUMERIC   40 222.299999713898 1249.1499958992 40