1.1 - 8.10 - GMM Examples Input - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)

The table gmm_iris_input contains the raw data: values for four attributes (sepal_length, sepal_width, petal_length, and petal_width), which are the data dimensions. The table omits the species column because the goal is clustering, not classification. Each example outputs three clusters.

The raw data is split into a training set and a test set.

The GMM function uses the training set to create the model. The GMMPredict function then uses that model to predict clusters for the test data.

Raw Data Table gmm_iris_input

id sepal_length sepal_width petal_length petal_width
1 5.1 3.5 1.4 0.2
2 4.9 3 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
... ... ... ... ...

Split Input into Training and Testing Data Sets

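The following code divides the 150 data rows into a training data set (80%) and a testing data set (20%). Rows whose id is divisible by 5 go to the testing data set; all other rows go to the training data set. The GMM examples use gmm_iris_train; the GMMPredict example uses gmm_iris_test.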

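-- Remove tables left over from a previous run.
-- DROP TABLE returns an error if the table does not exist; that error can be ignored on a first run.
DROP TABLE gmm_iris_train;
DROP TABLE gmm_iris_test;

-- Training set: every row whose id is not a multiple of 5 (4 of every 5 rows).
CREATE MULTISET TABLE gmm_iris_train AS (
  SELECT * FROM gmm_iris_input WHERE id MOD 5 <> 0
) WITH DATA;

-- Testing set: every row whose id is a multiple of 5 (1 of every 5 rows).
CREATE MULTISET TABLE gmm_iris_test AS (
  SELECT * FROM gmm_iris_input WHERE id MOD 5 = 0
) WITH DATA;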
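
To confirm that the split behaved as intended, a quick check along the following lines may help. This is a minimal sketch, not part of the original example: it assumes both tables were created by the statements above, and the column aliases split_name, row_count, and overlapping_rows are illustrative names only. If the id values in gmm_iris_input run consecutively from 1 through 150, the first query should report 120 training rows and 30 testing rows, and the second should report 0.

```sql
-- Row count of each split (alias names are illustrative).
SELECT 'gmm_iris_train' AS split_name, COUNT(*) AS row_count
FROM gmm_iris_train
UNION ALL
SELECT 'gmm_iris_test', COUNT(*)
FROM gmm_iris_test;

-- No id should appear in both splits, so this count should be 0.
SELECT COUNT(*) AS overlapping_rows
FROM gmm_iris_train tr
INNER JOIN gmm_iris_test te
  ON tr.id = te.id;
```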

Alternatively, you can split the data with the Sampling or RandomSample function.
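
For comparison, a random split can also be sketched with the SAMPLE clause of Teradata SQL. This is a different technique from the Sampling and RandomSample functions named above, whose syntax is not shown here. The sketch assumes gmm_iris_input is readable from the SQL engine, and the alias sample_id is an illustrative name. SAMPLEID tags each returned row with the number of the sample it was drawn into: 1 for the 80% sample, 2 for the 20% sample. Unlike the MOD-based split, the assignment is random, so the rows in each set change from run to run.

```sql
-- Draw two non-overlapping random samples covering 80% and 20% of the rows.
-- sample_id = 1 marks candidate training rows; sample_id = 2 marks candidate testing rows.
SELECT id,
       sepal_length,
       sepal_width,
       petal_length,
       petal_width,
       SAMPLEID AS sample_id
FROM gmm_iris_input
SAMPLE 0.8, 0.2;
```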