TD_TrainTestSplit Output - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-10-04
Product Category: Teradata Vantage™
Output Table Schema

Column          Data Type             Description
TD_IsTrainRow   BYTEINT               Takes the value 1 for train rows and 0 for test rows.
input_column    Same as input table   The columns copied from the input table.

To create the train table and the test table, filter the output of the TD_TrainTestSplit function on TD_IsTrainRow:
CREATE MULTISET TABLE TrainTable AS (SELECT * FROM TTS_OUTPUT WHERE TD_IsTrainRow = 1) WITH DATA;

CREATE MULTISET TABLE TestTable AS (SELECT * FROM TTS_OUTPUT WHERE TD_IsTrainRow = 0) WITH DATA;

TrainTable and TestTable have the same columns and data types.
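
Before creating the two tables, it can help to check how many rows fell into each partition. The query below is a minimal sketch that assumes the function output was stored in a table named TTS_OUTPUT, as in the statements above; the row_count alias is introduced here for illustration only.

```sql
-- Count rows per partition: TD_IsTrainRow = 0 is test, 1 is train.
SELECT TD_IsTrainRow,
       COUNT(*) AS row_count
FROM TTS_OUTPUT
GROUP BY TD_IsTrainRow
ORDER BY TD_IsTrainRow;
```

Because both tables are created with SELECT *, each one also carries the TD_IsTrainRow column, which is constant within the table.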

The output rows are ordered on the IDColumn to make the split deterministic. If the Seed argument is passed, the rows are ordered based on the random number sequence, and the rows chosen for the train and test data sets are deterministic. If the Seed argument is not passed, the rows chosen for the train and test data sets are not deterministic across multiple function calls.
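
As a sketch of how a reproducible split might be requested, the call below passes a fixed Seed together with an IDColumn. The input table name (input_data), the ID column (row_id), the sizes, and the seed value are hypothetical placeholders. The ON ... AS InputTable USING form is assumed from the usual calling convention of Analytics Database functions, so check the function's syntax page before relying on it.

```sql
-- Hypothetical names: input_data (input table), row_id (unique row identifier).
-- TrainSize, TestSize, and Seed values are illustrative only.
CREATE MULTISET TABLE TTS_OUTPUT AS (
    SELECT * FROM TD_TrainTestSplit (
        ON input_data AS InputTable
        USING
        IDColumn ('row_id')
        TrainSize (0.75)
        TestSize (0.25)
        Seed (42)
    ) AS dt
) WITH DATA;
```

According to the description above, rerunning the same call with the same Seed on the same input should select the same train and test rows, while omitting Seed may produce a different split on each call.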

If a column is specified for the StratifyColumn argument, then unique values from the specified column are divided in the ratio of TrainSize and TestSize.
When TrainSize multiplied by the number of input rows is not a whole number, the function rounds the number of train rows down to the nearest integer.
The number of test rows is equal to the total number of input rows minus the number of train rows.
Because the division depends on how the data is distributed, a variance of 1 to 2% can be expected.
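
The rounding rule can be illustrated with a query that computes the expected partition sizes for the non-stratified case. This is a sketch only: input_data is a hypothetical input table and 0.75 is an illustrative TrainSize. Actual counts may differ by the variance noted above.

```sql
-- Expected train rows = floor(TrainSize * total rows);
-- expected test rows  = total rows - expected train rows.
SELECT COUNT(*)                               AS total_rows,
       FLOOR(COUNT(*) * 0.75)                 AS expected_train_rows,
       COUNT(*) - FLOOR(COUNT(*) * 0.75)      AS expected_test_rows
FROM input_data;
```

For example, with 1,001 input rows and a TrainSize of 0.8, the product is 800.8, so the rule gives 800 train rows and 1,001 - 800 = 201 test rows.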