Column | Data Type | Description |
---|---|---|
TD_IsTrainRow | BYTEINT | Indicates which split a row belongs to: 1 for train rows, 0 for test rows. |
input_column | Same as input table | The columns copied from the input table. |
To create the train table and the test table from the output of the TD_TrainTestSplit function (stored here in TTS_OUTPUT), filter on TD_IsTrainRow:
CREATE MULTISET TABLE TrainTable AS (SELECT * FROM TTS_OUTPUT WHERE TD_IsTrainRow = 1) WITH DATA;
CREATE MULTISET TABLE TestTable AS (SELECT * FROM TTS_OUTPUT WHERE TD_IsTrainRow = 0) WITH DATA;
TrainTable and TestTable have the same columns and data types.
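As a quick sanity check before materializing the two tables, the split sizes can be counted directly from the function output. This is a minimal sketch that assumes the output is stored in TTS_OUTPUT, as in the statements above; the alias row_count is illustrative.

```sql
-- Count the rows assigned to each split.
-- TD_IsTrainRow = 0 is the test set, TD_IsTrainRow = 1 is the train set.
SELECT TD_IsTrainRow,
       COUNT(*) AS row_count
FROM TTS_OUTPUT
GROUP BY TD_IsTrainRow
ORDER BY TD_IsTrainRow;
```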
The output rows are ordered on the IDColumn to achieve a deterministic split. If the Seed argument is specified, the rows are ordered based on the random number sequence, and the rows chosen for the train and test data sets are deterministic. If the Seed argument is not specified, the rows chosen for the train and test data sets are not deterministic across multiple function calls.
If a column is specified for the StratifyColumn argument, then the rows for each unique value in the specified column are divided in the ratio of TrainSize to TestSize.
When the value specified for TrainSize multiplied by the number of input rows is not a whole number, the function rounds the resulting number of train rows down to the nearest integer.
The number of test rows is equal to the total number of input rows minus the number of train rows.
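The rounding rule can be written out as a short worked example. The symbols N (total number of input rows), n_train, and n_test are introduced here only for illustration, and the figures of 1001 rows with a TrainSize of 0.75 are assumed values, not taken from any real data set.

```latex
n_{\text{train}} = \lfloor \text{TrainSize} \times N \rfloor,
\qquad
n_{\text{test}} = N - n_{\text{train}}

% Assumed example: N = 1001, TrainSize = 0.75
n_{\text{train}} = \lfloor 0.75 \times 1001 \rfloor = \lfloor 750.75 \rfloor = 750,
\qquad
n_{\text{test}} = 1001 - 750 = 251
```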
Because the division depends on how the data is distributed, a variance of 1 to 2% from the requested ratio can be expected.
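To see how close the actual split is to the requested ratio, the observed train fraction can be computed for each stratum. This is a sketch only: label_col is a hypothetical name standing in for whatever column was passed as StratifyColumn, and TTS_OUTPUT is assumed to hold the function output.

```sql
-- Observed train fraction per stratum.
-- label_col is a placeholder for the column used as StratifyColumn.
SELECT label_col,
       COUNT(*) AS total_rows,
       SUM(TD_IsTrainRow) AS train_rows,
       CAST(SUM(TD_IsTrainRow) AS FLOAT) / COUNT(*) AS train_fraction
FROM TTS_OUTPUT
GROUP BY label_col
ORDER BY label_col;
```

Comparing train_fraction with the TrainSize value shows how much each stratum deviates from the requested ratio.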