TD_TrainTestSplit Examples

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

The following input table is used with the examples:

titanicDataset Input Table
PassengerId  Survived  Pclass  Name                Age
1            0         3       Mr. Owen Harris     22
2            1         1       Mrs. John Bradley   38
3            1         3       Mrs. Laina          26
4            0         3       Mrs. Jacques Heath  35
5            0         3       Mr. William Henry   35
6            0         3       Mr. James           38

Example: StratifyColumn and Seed Do Not Exist

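When Seed is not specified, the function generates a random seed value, so the rows assigned to the train and test data sets can differ from one function call to the next; the output shown is one possible result. In the output, TD_IsTrainRow is 1 for rows in the train set and 0 for rows in the test set.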

SELECT * FROM TD_TrainTestSplit(
  ON titanicDataset AS InputTable
  USING
  IDColumn('PassengerId')
  TrainSize(0.75)
  TestSize(0.25)
) AS dt;
StratifyColumn and Seed Do Not Exist Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Mrs. Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35
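The Seed behavior described above can be illustrated with a small Python sketch: a pseudo-random generator created with a fixed seed produces the same shuffle, and therefore the same split, on every call, while an unseeded generator need not. The helper split_flags is hypothetical and is not Teradata's implementation; the floor rounding of the train-row count is an assumption, and Python's seed 42 does not reproduce the rows Teradata selects. With that rounding, the six-row table gets 4 train rows, the same count as the output above.

```python
import math
import random
from typing import Optional


def split_flags(ids: list[int], train_size: float,
                seed: Optional[int] = None) -> dict[int, int]:
    """Assign each ID a TD_IsTrainRow-style flag: 1 = train, 0 = test.

    Hypothetical helper for illustration only; not Teradata's algorithm.
    The train count uses floor(train_size * n), an assumed rounding rule.
    """
    rng = random.Random(seed)   # seed=None: seeded from system entropy
    shuffled = list(ids)
    rng.shuffle(shuffled)       # the random order decides train vs. test
    n_train = math.floor(train_size * len(ids))
    train_ids = set(shuffled[:n_train])
    return {i: 1 if i in train_ids else 0 for i in ids}


passenger_ids = [1, 2, 3, 4, 5, 6]

# A fixed seed gives the same split on every call.
same_a = split_flags(passenger_ids, 0.75, seed=42)
same_b = split_flags(passenger_ids, 0.75, seed=42)
assert same_a == same_b

# Without a seed, two calls may assign different rows to train and test.
print(split_flags(passenger_ids, 0.75))
print(split_flags(passenger_ids, 0.75))
```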

Example: StratifyColumn Does Not Exist and Seed Exists

SELECT * FROM TD_TrainTestSplit(
  ON titanicDataset AS InputTable
  USING
  IDColumn('PassengerId')
  TrainSize(0.75)
  TestSize(0.25)
  Seed(42)
) AS dt;
StratifyColumn Does Not Exist and Seed Exists Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Mrs. Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35

Example: StratifyColumn Exists and Seed Does Not Exist

SELECT * FROM TD_TrainTestSplit(
  ON titanicDataset AS InputTable
  USING
  IDColumn('PassengerId')
  TrainSize(0.75)
  TestSize(0.25)
  StratifyColumn('Survived')
) AS dt;
StratifyColumn Exists and Seed Does Not Exist Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Mrs. Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35
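Stratifying on Survived is intended to keep each Survived value represented in the train and test sets in roughly the requested proportions. A common way to do this is to shuffle and split each stratum separately, as in the Python sketch below. The helper stratified_split_flags is hypothetical and is not Teradata's implementation; the per-stratum floor rounding is an assumption. On the input table, it places 3 of the 4 Survived = 0 rows and 1 of the 2 Survived = 1 rows in the train set, the same per-class counts as the output above.

```python
import math
import random
from collections import defaultdict
from typing import Optional


def stratified_split_flags(rows: list[tuple[int, int]], train_size: float,
                           seed: Optional[int] = None) -> dict[int, int]:
    """Flag (id, stratum) rows as train (1) or test (0), stratum by stratum.

    Hypothetical helper for illustration only; not Teradata's algorithm.
    Each stratum gets floor(train_size * stratum_size) train rows (assumed).
    """
    rng = random.Random(seed)
    by_stratum: dict[int, list[int]] = defaultdict(list)
    for row_id, stratum in rows:
        by_stratum[stratum].append(row_id)

    flags: dict[int, int] = {}
    # Visit strata in a fixed order so a given seed always gives the same split.
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        n_train = math.floor(train_size * len(ids))
        for pos, row_id in enumerate(ids):
            flags[row_id] = 1 if pos < n_train else 0
    return flags


# (PassengerId, Survived) pairs from the titanicDataset input table.
titanic = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0)]
flags = stratified_split_flags(titanic, 0.75, seed=42)

# Count train rows for each Survived value.
train_per_class = {0: 0, 1: 0}
for row_id, survived in titanic:
    train_per_class[survived] += flags[row_id]
print(train_per_class)  # {0: 3, 1: 1}
```

Splitting each stratum independently guarantees that a class with at least two rows appears in both sets here, which a single unstratified shuffle does not.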

Example: StratifyColumn and Seed Exist

SELECT * FROM TD_TrainTestSplit(
  ON titanicDataset AS InputTable
  USING
  IDColumn('PassengerId')
  TrainSize(0.75)
  TestSize(0.25)
  Seed(42)
  StratifyColumn('Survived')
) AS dt;
StratifyColumn and Seed Exist Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Mrs. Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35