GMM Examples Input - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.10, 1.1
Published: October 2019
Language: English (United States)
Last Update: 2019-12-31

The table gmm_iris_input contains raw data with values for four attributes (sepal_length, sepal_width, petal_length, and petal_width), which are the data dimensions. The table omits the species column because the goal is clustering, not classification. Each example outputs three clusters.

The raw data is split into a training set and a test set.

The GMM function uses the training set to create the model. The GMMPredict function uses that model to predict clusters for the test data.

Raw Data Table gmm_iris_input

id	sepal_length	sepal_width	petal_length	petal_width
1	5.1	3.5	1.4	0.2
2	4.9	3	1.4	0.2
3	4.7	3.2	1.3	0.2
4	4.6	3.1	1.5	0.2
5	5	3.6	1.4	0.2
6	5.4	3.9	1.7	0.4
7	4.6	3.4	1.4	0.3
8	5	3.4	1.5	0.2
9	4.4	2.9	1.4	0.2
10	4.9	3.1	1.5	0.1
...	...	...	...	...

Split Input into Training and Testing Data Sets

The following code divides the 150 data rows into a training data set (80%) and a testing data set (20%). The GMM examples use gmm_iris_train; the GMMPredict example uses gmm_iris_test.

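-- Drop tables left over from an earlier run.
-- Each statement fails if the table does not exist.
DROP TABLE gmm_iris_train;
DROP TABLE gmm_iris_test;

-- Training set: rows whose id is not a multiple of 5 (4 of every 5 rows).
CREATE MULTISET TABLE gmm_iris_train AS (
  SELECT * FROM gmm_iris_input WHERE id MOD 5 <> 0
) WITH DATA;

-- Testing set: rows whose id is a multiple of 5 (1 of every 5 rows).
CREATE MULTISET TABLE gmm_iris_test AS (
  SELECT * FROM gmm_iris_input WHERE id MOD 5 = 0
) WITH DATA;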
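
To confirm that the split produced the expected proportions, a row count on each new table is a quick check. The query below is a minimal sketch, not part of the original example. It assumes the id values in gmm_iris_input run consecutively from 1 through 150 (only the first ten rows are shown above), in which case the counts should be 120 and 30.

```sql
-- Count the rows in each table created by the split.
-- Assumption: ids in gmm_iris_input are 1 through 150 with no gaps,
-- so 30 ids are multiples of 5 and the other 120 are not.
SELECT 'gmm_iris_train' AS table_name, COUNT(*) AS row_count
FROM gmm_iris_train
UNION ALL
SELECT 'gmm_iris_test', COUNT(*)
FROM gmm_iris_test;
```

If the ids have gaps or do not start at 1, the 80/20 proportions hold only approximately.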

Alternatively, you can split the data with the Sampling or RandomSample function.
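
For comparison, a random 80/20 split can also be sketched with the SAMPLE clause of Teradata SQL. This is a different mechanism from the Sampling and RandomSample functions named above, whose syntax is not shown in this section; the column alias sample_id is a hypothetical name chosen for illustration.

```sql
-- Sketch only: uses the Teradata SQL SAMPLE clause,
-- not the Sampling or RandomSample function.
-- SAMPLEID is 1 for rows drawn into the first fraction (0.8)
-- and 2 for rows drawn into the second fraction (0.2).
SELECT id, sepal_length, sepal_width, petal_length, petal_width,
       SAMPLEID AS sample_id  -- hypothetical alias
FROM gmm_iris_input
SAMPLE 0.8, 0.2;
```

Unlike the MOD-based split, this assignment is random, so the rows in each set can differ from run to run.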
