TD_TrainTestSplit Examples

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

The following input table is used with the examples:

titanicDataset Input Table
PassengerId  Survived  Pclass  Name                Age
1            0         3       Mr. Owen Harris     22
2            1         1       Mrs. John Bradley   38
3            1         3       Miss Laina          26
4            0         3       Mrs. Jacques Heath  35
5            0         3       Mr. William Henry   35
6            0         3       Mr. James           38

Example: StratifyColumn and Seed Do Not Exist

When Seed is not specified, a random seed is generated for each call, so the rows assigned to the train and test data sets can differ from one function call to the next. The output below is one possible result.

SELECT * FROM TD_TrainTestSplit(
ON titanicDataset AS InputTable
USING
IDColumn('PassengerId')
TrainSize(0.75)
TestSize(0.25)
) AS dt;

StratifyColumn and Seed Do Not Exist Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Miss Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35
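The requested fractions do not divide six rows evenly: 6 * 0.75 = 4.5 train rows and 6 * 0.25 = 1.5 test rows, yet the output contains four train rows and two test rows. The Python sketch below reproduces those counts under an assumption: the test count is rounded up and the train count is whatever remains, which is how scikit-learn's train_test_split sizes its splits. This rounding rule is not stated in this documentation, and `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_rows: int, test_size: float) -> tuple[int, int]:
    """Return (train_rows, test_rows) for n_rows and a test fraction.

    ASSUMPTION: the test count is rounded up and the train count is the
    remainder (scikit-learn's convention). This is not documented
    TD_TrainTestSplit behavior.
    """
    n_test = math.ceil(n_rows * test_size)  # 1.5 rounds up to 2
    n_train = n_rows - n_test               # remaining rows go to train
    return n_train, n_test


# titanicDataset has 6 rows; TestSize(0.25) asks for 1.5 test rows.
print(split_sizes(6, 0.25))  # (4, 2)
```

With this rule the four-train, two-test split in the output is the only possible outcome for six rows.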

Example: StratifyColumn Does Not Exist and Seed Exists

SELECT * FROM TD_TrainTestSplit(
ON titanicDataset AS InputTable
USING
IDColumn('PassengerId')
TrainSize(0.75)
TestSize(0.25)
Seed(42)
) AS dt;

StratifyColumn Does Not Exist and Seed Exists Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Miss Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35
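The only difference between the first two examples is whether the seed is fixed. The Python sketch below illustrates that idea with a hypothetical `split_ids` helper built on the standard `random` module: with a fixed seed, repeated calls produce the same assignment, while `seed=None` seeds the generator from system entropy, so the assignment can change between calls. It models the concept only; Python's generator will not choose the same rows that TD_TrainTestSplit chooses for Seed(42).

```python
import math
import random


def split_ids(ids, test_size, seed=None):
    """Return {id: is_train} after a seeded shuffle.

    HYPOTHETICAL sketch, not TD_TrainTestSplit's algorithm. seed=None
    makes random.Random seed itself from system entropy.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    test = set(shuffled[:n_test])
    # 1 = train row, 0 = test row, mirroring the TD_IsTrainRow column
    return {i: 0 if i in test else 1 for i in ids}


ids = [1, 2, 3, 4, 5, 6]
first = split_ids(ids, 0.25, seed=42)
second = split_ids(ids, 0.25, seed=42)
print(first == second)                          # True
print(sum(1 for v in first.values() if v == 1))  # 4
```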

Example: StratifyColumn Exists and Seed Does Not Exist

SELECT * FROM TD_TrainTestSplit(
ON titanicDataset AS InputTable
USING
IDColumn('PassengerId')
TrainSize(0.75)
TestSize(0.25)
StratifyColumn('Survived')
) AS dt;

StratifyColumn Exists and Seed Does Not Exist Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Miss Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35
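Stratifying on Survived is meant to keep each class represented in both partitions. The Python check below tallies the Survived values in each partition of the output table above, using only the TD_IsTrainRow, PassengerId, and Survived columns copied from it. The test rows hold one passenger from each class and the train rows hold three non-survivors and one survivor; with two survivors among six rows, the overall proportion cannot be reproduced exactly in partitions of four and two rows.

```python
from collections import Counter

# (TD_IsTrainRow, PassengerId, Survived) copied from the output table above
output_rows = [
    (0, 3, 1),
    (0, 6, 0),
    (1, 1, 0),
    (1, 2, 1),
    (1, 4, 0),
    (1, 5, 0),
]

# Count Survived values separately for train (1) and test (0) rows
train = Counter(s for is_train, _, s in output_rows if is_train == 1)
test = Counter(s for is_train, _, s in output_rows if is_train == 0)

print(dict(train))            # {0: 3, 1: 1}
print(dict(test))             # {1: 1, 0: 1}
print(set(train) == set(test))  # True: both classes appear in both partitions
```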

Example: StratifyColumn and Seed Exist

SELECT * FROM TD_TrainTestSplit(
ON titanicDataset AS InputTable
USING
IDColumn('PassengerId')
TrainSize(0.75)
TestSize(0.25)
Seed(42)
StratifyColumn('Survived')
) AS dt;

StratifyColumn and Seed Exist Output
TD_IsTrainRow  PassengerId  Survived  Pclass  Name                Age
0              3            1         3       Miss Laina          26
0              6            0         3       Mr. James           38
1              1            0         3       Mr. Owen Harris     22
1              2            1         1       Mrs. John Bradley   38
1              4            0         3       Mrs. Jacques Heath  35
1              5            0         3       Mr. William Henry   35
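Combining the two parameters can be sketched as a seeded shuffle performed separately within each class. In the hypothetical `stratified_split` helper below, each Survived class is shuffled with one seeded `random.Random` instance and contributes ceil(class size * test fraction) rows to the test set. That per-class rounding is an assumption made for the sketch, not TD_TrainTestSplit's documented algorithm. On the six-row table it yields one test row per class, matching the counts in the output above, though not necessarily the same passengers.

```python
import math
import random
from collections import defaultdict


def stratified_split(rows, test_size, seed=None):
    """Return {id: is_train}, splitting each class separately.

    rows: iterable of (id, label) pairs.
    HYPOTHETICAL sketch: each class contributes ceil(n_class * test_size)
    test rows, so on small data the overall test share can exceed test_size.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row_id, label in rows:
        by_class[label].append(row_id)

    result = {}
    for label in sorted(by_class):  # fixed class order keeps seeded runs repeatable
        ids = by_class[label]
        rng.shuffle(ids)
        n_test = math.ceil(len(ids) * test_size)
        for pos, row_id in enumerate(ids):
            # 1 = train row, 0 = test row, mirroring TD_IsTrainRow
            result[row_id] = 0 if pos < n_test else 1
    return result


# (PassengerId, Survived) from titanicDataset
rows = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0)]
split = stratified_split(rows, 0.25, seed=42)
print(sorted(split.values()))  # [0, 0, 1, 1, 1, 1]
```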