NaiveBayes Example | Teradata Vantage - NaiveBayes Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢
This example does the following:
  1. Takes raw data and creates training and test sets.
  2. Inputs the training set to the NaiveBayes function to create a model.
  3. Inputs the model and test set to the NaiveBayesPredict_MLE function to predict the output for the test set.
  4. Determines prediction accuracy based on the original and predicted results.

Input

The input table of raw data, svm_iris, contains values for four attributes (sepal_length, sepal_width, petal_length and petal_width), grouped into three categories (setosa, versicolor and virginica).

Raw Input: svm_iris
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
... ... ... ... ... ...

This code divides the 150 data rows into a training data set (80%) and a testing data set (20%):

DROP TABLE svm_iris_train;
DROP TABLE svm_iris_test;

CREATE MULTISET TABLE svm_iris_train AS (
  SELECT * FROM svm_iris WHERE id MOD 5 <> 0
) WITH DATA;

CREATE MULTISET TABLE svm_iris_test AS (
  SELECT * FROM svm_iris WHERE id MOD 5 = 0
) WITH DATA;

Alternatively, you can do the preceding task with the Sampling or RandomSample function.

This query returns the following table:

SELECT * FROM svm_iris_train ORDER BY id;
svm_iris_train
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3 1.4 0.1 setosa
14 4.3 3 1.1 0.1 setosa
16 5.7 4.4 1.5 0.4 setosa
... ... ... ... ... ...

This query returns the following table:

SELECT * FROM svm_iris_test ORDER BY id;
svm_iris_test
id sepal_length sepal_width petal_length petal_width species
5 5 3.6 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
15 5.8 4 1.2 0.2 setosa
20 5.1 3.8 1.5 0.3 setosa
25 4.8 3.4 1.9 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
45 5.1 3.8 1.9 0.4 setosa
50 5 3.3 1.4 0.2 setosa
55 6.5 2.8 4.6 1.5 versicolor
60 5.2 2.7 3.9 1.4 versicolor
65 5.6 2.9 3.6 1.3 versicolor
70 5.6 2.5 3.9 1.1 versicolor
75 6.4 2.9 4.3 1.3 versicolor
80 5.7 2.6 3.5 1 versicolor
85 5.4 3 4.5 1.5 versicolor
90 5.5 2.5 4 1.3 versicolor
95 5.6 2.7 4.2 1.3 versicolor
100 5.7 2.8 4.1 1.3 versicolor
105 6.5 3 5.8 2.2 virginica
110 7.2 3.6 6.1 2.5 virginica
115 5.8 2.8 5.1 2.4 virginica
120 6 2.2 5 1.5 virginica
125 6.7 3.3 5.7 2.1 virginica
130 7.2 3 5.8 1.6 virginica
135 6.1 2.6 5.6 1.4 virginica
140 6.9 3.1 5.4 2.1 virginica
145 6.7 3.3 5.7 2.5 virginica
150 5.9 3 5.1 1.8 virginica

SQL Call

SELECT * FROM NaiveBayes (
  ON svm_iris_train AS InputTable
  OUT TABLE OutputTable (nb_iris_model)
  USING
  ResponseColumn ('species')
  NumericInputs ('[1:4]')
) AS dt;

Output

message                                                                 
 ----------------------------------------------------------------------- 
Successfully built naive bayes model in table specified by OutputTable.

This query returns the following table:

SELECT * FROM nb_iris_model ORDER BY class_nb, cnt ASC;
 class_nb   variable_nb  type_nb category cnt sum_nb             sum_sq             total_cnt 
 ---------- ------------ ------- -------- --- ------------------ ------------------ --------- 
 setosa     petal_length NUMERIC NULL      40  57.69999999999999  84.21000000000001        40
 setosa     sepal_length NUMERIC NULL      40 199.90000000000003 1004.2700000000001        40
 setosa     petal_width  NUMERIC NULL      40 10.100000000000001  3.030000000000001        40
 setosa     sepal_width  NUMERIC NULL      40              136.7             473.29        40
 versicolor sepal_width  NUMERIC NULL      40              111.1             313.13        40
 versicolor petal_width  NUMERIC NULL      40               53.3  72.71000000000001        40
 versicolor sepal_length NUMERIC NULL      40              239.6            1446.14        40
 versicolor petal_length NUMERIC NULL      40 172.40000000000003             752.22        40
 virginica  petal_width  NUMERIC NULL      40  81.19999999999999 166.99999999999997        40
 virginica  sepal_width  NUMERIC NULL      40 118.80000000000001  356.5400000000001        40
 virginica  petal_length NUMERIC NULL      40              222.3            1249.15        40
 virginica  sepal_length NUMERIC NULL      40 264.40000000000003            1764.92        40

Download a zip file of all examples and a SQL script file that creates their input tables.