1.1 - 8.10 - Naive Bayes Classifier Example - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)
This example does the following:
  1. Takes raw data and creates training and test sets.
  2. Inputs the training set to the Naive Bayes Classifier function to create a model.
  3. Inputs the model and test set to the NaiveBayesPredict_MLE function to predict the output for the test set.
  4. Determines prediction accuracy based on the original and predicted results.

Input

The input table of raw data, nb_input_iris, contains values for four attributes (sepal_length, sepal_width, petal_length and petal_width), grouped into three categories (setosa, versicolor and virginica).

Raw Input: nb_input_iris
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
... ... ... ... ... ...

This code divides the 150 data rows into a training data set (80%) and a testing data set (20%):

DROP TABLE nb_iris_input_train;
DROP TABLE nb_iris_input_test;

CREATE MULTISET TABLE nb_iris_input_train AS (
  SELECT * FROM nb_input_iris WHERE id MOD 5 <> 0
) WITH DATA;

CREATE MULTISET TABLE nb_iris_input_test AS (
  SELECT * FROM nb_input_iris WHERE id MOD 5 = 0
) WITH DATA;

Alternatively, you can do the preceding task with the Sampling or RandomSample function.

This query returns the following table:

SELECT * FROM nb_iris_input_train ORDER BY id;
nb_iris_input_train
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
16 5.7 4.4 1.5 0.4 setosa
... ... ... ... ... ...

This query returns the following table:

SELECT * FROM nb_iris_input_test ORDER BY id;
nb_iris_input_test
id sepal_length sepal_width petal_length petal_width species
5 5 3.6 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
15 5.8 4 1.2 0.2 setosa
20 5.1 3.8 1.5 0.3 setosa
25 4.8 3.4 1.9 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
45 5.1 3.8 1.9 0.4 setosa
50 5 3.3 1.4 0.2 setosa
55 6.5 2.8 4.6 1.5 versicolor
60 5.2 2.7 3.9 1.4 versicolor
65 5.6 2.9 3.6 1.3 versicolor
70 5.6 2.5 3.9 1.1 versicolor
75 6.4 2.9 4.3 1.3 versicolor
80 5.7 2.6 3.5 1 versicolor
85 5.4 3 4.5 1.5 versicolor
90 5.5 2.5 4 1.3 versicolor
95 5.6 2.7 4.2 1.3 versicolor
100 5.7 2.8 4.1 1.3 versicolor
105 6.5 3 5.8 2.2 virginica
110 7.2 3.6 6.1 2.5 virginica
115 5.8 2.8 5.1 2.4 virginica
120 6 2.2 5 1.5 virginica
125 6.7 3.3 5.7 2.1 virginica
130 7.2 3 5.8 1.6 virginica
135 6.1 2.6 5.6 1.4 virginica
140 6.9 3.1 5.4 2.1 virginica
145 6.7 3.3 5.7 2.5 virginica
150 5.9 3 5.1 1.8 virginica

SQL Call

DROP TABLE nb_iris_model;
CREATE MULTISET TABLE nb_iris_model AS (
  SELECT * FROM NaiveBayesReduce (
    ON NaiveBayesMap (
      ON nb_iris_input_train
      USING
      ResponseColumn ('species')
      NumericInputs ('[1:4]')
    ) PARTITION BY class_nb
  ) AS dt 
) WITH DATA;

Output

This query returns the following table:

SELECT * FROM nb_iris_model ORDER BY class_nb, cnt ASC;
 class_nb   variable_nb  type_nb category cnt sum_nb             sum_sq             total_cnt 
 ---------- ------------ ------- -------- --- ------------------ ------------------ --------- 
 setosa     petal_length NUMERIC NULL      40  57.69999999999999  84.21000000000001        40
 setosa     sepal_length NUMERIC NULL      40 199.90000000000003 1004.2700000000001        40
 setosa     petal_width  NUMERIC NULL      40 10.100000000000001  3.030000000000001        40
 setosa     sepal_width  NUMERIC NULL      40              136.7             473.29        40
 versicolor sepal_width  NUMERIC NULL      40              111.1             313.13        40
 versicolor petal_width  NUMERIC NULL      40               53.3  72.71000000000001        40
 versicolor sepal_length NUMERIC NULL      40              239.6            1446.14        40
 versicolor petal_length NUMERIC NULL      40 172.40000000000003             752.22        40
 virginica  petal_width  NUMERIC NULL      40  81.19999999999999 166.99999999999997        40
 virginica  sepal_width  NUMERIC NULL      40 118.80000000000001  356.5400000000001        40
 virginica  petal_length NUMERIC NULL      40              222.3            1249.15        40
 virginica  sepal_length NUMERIC NULL      40 264.40000000000003            1764.92        40

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.