Split Input into Training and Testing Data Sets - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:mapPath
kiu1466024880662.ditamap
dita:ditavalPath
AA-notempfilter_pdf_output.ditaval
dita:id
B700-1021
lifecycle
previous
Product Category
Software

This code divides the 150 data rows into a training data set (80%) and a testing data set (20%):

DROP TABLE IF EXISTS iris_train;
DROP TABLE IF EXISTS iris_test;

CREATE TABLE iris_train AS
  SELECT * FROM iris_input WHERE id%5!=0;

CREATE TABLE iris_test AS
  SELECT * FROM iris_input WHERE id%5=0;

SELECT * FROM iris_train ORDER BY id;
Alternatively, you can do the preceding task with the Sample or RandomSample function.
Single_Tree_Drive Example 1 Train Table iris_train
id sepal_length sepal_width petal_length petal_width species
1 5.1 3.5 1.4 0.2 1
2 4.9 3 1.4 0.2 1
3 4.7 3.2 1.3 0.2 1
4 4.6 3.1 1.5 0.2 1
6 5.4 3.9 1.7 0.4 1
7 4.6 3.4 1.4 0.3 1
8 5 3.4 1.5 0.2 1
9 4.4 2.9 1.4 0.2 1
11 5.4 3.7 1.5 0.2 1
12 4.8 3.4 1.6 0.2 1
13 4.8 3 1.4 0.1 1
14 4.3 3 1.1 0.1 1
16 5.7 4.4 1.5 0.4 1
... ... ... ... ... ...
SELECT * FROM iris_test ORDER BY id;
Single_Tree_Drive Example 1 Test Table iris_test
id sepal_length sepal_width petal_length petal_width species
5 5 3.6 1.4 0.2 1
10 4.9 3.1 1.5 0.1 1
15 5.8 4 1.2 0.2 1
20 5.1 3.8 1.5 0.3 1
25 4.8 3.4 1.9 0.2 1
30 4.7 3.2 1.6 0.2 1
35 4.9 3.1 1.5 0.2 1
40 5.1 3.4 1.5 0.2 1
45 5.1 3.8 1.9 0.4 1
50 5 3.3 1.4 0.2 1
55 6.5 2.8 4.6 1.5 2
60 5.2 2.7 3.9 1.4 2
65 5.6 2.9 3.6 1.3 2
70 5.6 2.5 3.9 1.1 2
75 6.4 2.9 4.3 1.3 2
80 5.7 2.6 3.5 1 2
85 5.4 3 4.5 1.5 2
90 5.5 2.5 4 1.3 2
95 5.6 2.7 4.2 1.3 2
100 5.7 2.8 4.1 1.3 2
105 6.5 3 5.8 2.2 3
110 7.2 3.6 6.1 2.5 3
115 5.8 2.8 5.1 2.4 3
120 6 2.2 5 1.5 3
125 6.7 3.3 5.7 2.1 3
130 7.2 3 5.8 1.6 3
135 6.1 2.6 5.6 1.4 3
140 6.9 3.1 5.4 2.1 3
145 6.7 3.3 5.7 2.5 3
150 5.9 3 5.1 1.8 3