The following input table is used with the examples:
titanicDataset Input Table

| PassengerId | Survived | Pclass | Name | Age |
|---|---|---|---|---|
| 1 | 0 | 3 | Mr. Owen Harris | 22 |
| 2 | 1 | 1 | Mrs. John Bradley | 38 |
| 3 | 1 | 3 | Mrs. Laina | 26 |
| 4 | 0 | 3 | Mrs. Jacques Heath | 35 |
| 5 | 0 | 3 | Mr. William Henry | 35 |
| 6 | 0 | 3 | Mr. James | 38 |
Example: StratifyColumn and Seed Do Not Exist
When Seed is not specified, the function generates a random seed value, so the rows chosen for the train and test data sets can differ from one function call to the next.
SELECT * FROM TD_TrainTestSplit(
    ON titanicDataset AS InputTable
    USING
    IDColumn('PassengerId')
    TrainSize(0.75)
    TestSize(0.25)
) AS dt;
StratifyColumn and Seed Do Not Exist Output

| TD_IsTrainRow | PassengerId | Survived | Pclass | Name | Age |
|---|---|---|---|---|---|
| 0 | 3 | 1 | 3 | Miss Liana | 26 |
| 0 | 6 | 0 | 3 | Mr. James | 38 |
| 1 | 1 | 0 | 3 | Mr. Owen Harris | 22 |
| 1 | 2 | 1 | 1 | Mrs. John Bradley | 38 |
| 1 | 4 | 0 | 3 | Mr. Jacques Heath | 35 |
| 1 | 5 | 0 | 3 | Mr. William Henry | 35 |
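Because the split is not repeatable without Seed, running the function once for the training rows and again for the test rows could place the same passenger in both sets or in neither. One way to avoid this is to store the result of a single call and filter it afterward. The following is a minimal sketch, not a prescribed method: the table name titanic_split is hypothetical, and the reading of TD_IsTrainRow (1 for train rows, 0 for test rows) is inferred from the output above.

```sql
-- Hypothetical table name: titanic_split.
-- Store the result of a single call so that both subsets come from the same split.
CREATE TABLE titanic_split AS (
    SELECT * FROM TD_TrainTestSplit(
        ON titanicDataset AS InputTable
        USING
        IDColumn('PassengerId')
        TrainSize(0.75)
        TestSize(0.25)
    ) AS dt
) WITH DATA;

-- Assumption from the example output: TD_IsTrainRow = 1 marks train rows.
SELECT * FROM titanic_split WHERE TD_IsTrainRow = 1;

-- Assumption from the example output: TD_IsTrainRow = 0 marks test rows.
SELECT * FROM titanic_split WHERE TD_IsTrainRow = 0;
```

Both SELECT statements read the same stored rows, so the two subsets stay disjoint and together cover the whole input table.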
Example: StratifyColumn Does Not Exist and Seed Exists
SELECT * FROM TD_TrainTestSplit(
    ON titanicDataset AS InputTable
    USING
    IDColumn('PassengerId')
    TrainSize(0.75)
    TestSize(0.25)
    Seed(42)
) AS dt;
StratifyColumn Does Not Exist and Seed Exists Output

| TD_IsTrainRow | PassengerId | Survived | Pclass | Name | Age |
|---|---|---|---|---|---|
| 0 | 3 | 1 | 3 | Miss Liana | 26 |
| 0 | 6 | 0 | 3 | Mr. James | 38 |
| 1 | 1 | 0 | 3 | Mr. Owen Harris | 22 |
| 1 | 2 | 1 | 1 | Mrs. John Bradley | 38 |
| 1 | 4 | 0 | 3 | Mr. Jacques Heath | 35 |
| 1 | 5 | 0 | 3 | Mr. William Henry | 35 |
Example: StratifyColumn Exists and Seed Does Not Exist
SELECT * FROM TD_TrainTestSplit(
    ON titanicDataset AS InputTable
    USING
    IDColumn('PassengerId')
    TrainSize(0.75)
    TestSize(0.25)
    StratifyColumn('Survived')
) AS dt;
StratifyColumn Exists and Seed Does Not Exist Output

| TD_IsTrainRow | PassengerId | Survived | Pclass | Name | Age |
|---|---|---|---|---|---|
| 0 | 3 | 1 | 3 | Miss Liana | 26 |
| 0 | 6 | 0 | 3 | Mr. James | 38 |
| 1 | 1 | 0 | 3 | Mr. Owen Harris | 22 |
| 1 | 2 | 1 | 1 | Mrs. John Bradley | 38 |
| 1 | 4 | 0 | 3 | Mr. Jacques Heath | 35 |
| 1 | 5 | 0 | 3 | Mr. William Henry | 35 |
Example: StratifyColumn and Seed Exist
SELECT * FROM TD_TrainTestSplit(
    ON titanicDataset AS InputTable
    USING
    IDColumn('PassengerId')
    TrainSize(0.75)
    TestSize(0.25)
    Seed(42)
    StratifyColumn('Survived')
) AS dt;
StratifyColumn and Seed Exist Output

| TD_IsTrainRow | PassengerId | Survived | Pclass | Name | Age |
|---|---|---|---|---|---|
| 0 | 3 | 1 | 3 | Miss Liana | 26 |
| 0 | 6 | 0 | 3 | Mr. James | 38 |
| 1 | 1 | 0 | 3 | Mr. Owen Harris | 22 |
| 1 | 2 | 1 | 1 | Mrs. John Bradley | 38 |
| 1 | 4 | 0 | 3 | Mr. Jacques Heath | 35 |
| 1 | 5 | 0 | 3 | Mr. William Henry | 35 |
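To see how a stratified split distributes the Survived classes, the function output can be aggregated by split flag and class. The following query is an illustrative sketch that assumes the function call can be used as a derived table in an aggregate query. The alias row_count is hypothetical, and TD_IsTrainRow is assumed to be 1 for train rows and 0 for test rows, as the output above suggests.

```sql
-- Count rows per split flag and per Survived class.
-- row_count is a hypothetical alias chosen for this sketch.
SELECT TD_IsTrainRow,
       Survived,
       COUNT(*) AS row_count
FROM TD_TrainTestSplit(
    ON titanicDataset AS InputTable
    USING
    IDColumn('PassengerId')
    TrainSize(0.75)
    TestSize(0.25)
    Seed(42)
    StratifyColumn('Survived')
) AS dt
GROUP BY TD_IsTrainRow, Survived
ORDER BY TD_IsTrainRow, Survived;
```

Counting the rows of the output table shown above by hand gives one test row for each Survived value, and three train rows with Survived 0 and one with Survived 1.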