GMM Examples Input - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10

The table gmm_iris_input contains the raw data: values for four attributes (sepal_length, sepal_width, petal_length, and petal_width), which are the data dimensions. The table does not include the species column, because the goal is clustering, not classification. Each example outputs three clusters.

The raw data is split into a training set and a test set.

The GMM function uses the training set to create the model. The GMMPredict function uses that model to predict clusters for the test data.

Raw Data Table gmm_iris_input

id   sepal_length  sepal_width  petal_length  petal_width
--   ------------  -----------  ------------  -----------
1    5.1           3.5          1.4           0.2
2    4.9           3            1.4           0.2
3    4.7           3.2          1.3           0.2
4    4.6           3.1          1.5           0.2
5    5             3.6          1.4           0.2
6    5.4           3.9          1.7           0.4
7    4.6           3.4          1.4           0.3
8    5             3.4          1.5           0.2
9    4.4           2.9          1.4           0.2
10   4.9           3.1          1.5           0.1
...  ...           ...          ...           ...
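
If you need to build gmm_iris_input yourself, the following is a minimal sketch of one possible table definition. The column types (INTEGER, FLOAT) and the primary index on id are assumptions, not taken from this reference; adjust them to match your data. The three INSERT statements load the first three rows shown in the table above.

```sql
-- Hypothetical definition: the column types and the primary index are assumptions.
CREATE MULTISET TABLE gmm_iris_input (
  id           INTEGER,
  sepal_length FLOAT,
  sepal_width  FLOAT,
  petal_length FLOAT,
  petal_width  FLOAT
) PRIMARY INDEX (id);

-- First three rows from the raw data table above.
INSERT INTO gmm_iris_input VALUES (1, 5.1, 3.5, 1.4, 0.2);
INSERT INTO gmm_iris_input VALUES (2, 4.9, 3,   1.4, 0.2);
INSERT INTO gmm_iris_input VALUES (3, 4.7, 3.2, 1.3, 0.2);
```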

Split Input into Training and Testing Data Sets

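The following code divides the 150 data rows into a training data set (80%) and a testing data set (20%). Rows whose id is divisible by 5 go to the testing data set; all other rows go to the training data set. The GMM examples use gmm_iris_train; the GMMPredict example uses gmm_iris_test.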

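-- Remove tables left over from a previous run.
-- DROP TABLE fails if the table does not exist; on a first run, ignore that error.
DROP TABLE gmm_iris_train;
DROP TABLE gmm_iris_test;

-- Training set: rows whose id is not a multiple of 5 (4 of every 5 rows).
CREATE MULTISET TABLE gmm_iris_train AS (
  SELECT * FROM gmm_iris_input WHERE id MOD 5 <> 0
) WITH DATA;

-- Testing set: rows whose id is a multiple of 5 (1 of every 5 rows).
CREATE MULTISET TABLE gmm_iris_test AS (
  SELECT * FROM gmm_iris_input WHERE id MOD 5 = 0
) WITH DATA;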

Alternatively, you can create the training and testing data sets with the Sampling or RandomSample function.
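
To confirm that the id-based split produced the intended proportions, you can count the rows in each table. The query below is a small sketch for that check; the column aliases table_name and row_count are chosen here for illustration. If the id values in gmm_iris_input run consecutively from 1 to 150, the training table should hold 120 rows and the testing table 30.

```sql
-- Count the rows in each split table.
-- The longer literal comes first so that its length sets the column width.
SELECT 'gmm_iris_train' AS table_name, COUNT(*) AS row_count
FROM gmm_iris_train
UNION ALL
SELECT 'gmm_iris_test' AS table_name, COUNT(*) AS row_count
FROM gmm_iris_test;
```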