Optional Syntax Elements for TD_TrainTestSplit - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Last edition: 2025-04-01
Product Category: Teradata Vantage™
IDColumn
Column that contains a unique identifier for each row in the input table.
Required when the Seed argument is specified, so that the output of TD_TrainTestSplit is deterministic across multiple function calls.
TrainSize
Size of the train dataset, as a fraction of the input rows. Accepts a float value in the open interval (0, 1). Default is 0.75.
TestSize
Size of the test dataset, as a fraction of the input rows. Accepts a float value in the open interval (0, 1). Default is 0.25.
Seed
Seed value that controls the shuffling applied to the data before the split. Specify an integer to get reproducible output across multiple function calls. If Seed is not specified, different runs of the query generate different outputs. The seed value must be in the range 0 to INT_MAX.
StratifyColumn
Name of the column that contains the class labels used to stratify the split.
  • If both the TrainSize and TestSize arguments are specified, their sum must equal 1.
  • When StratifyColumn is specified, the train and test datasets produced by TrainSize and TestSize must each contain more rows than the number of classes.
  • If the input table does not have an identifier column, then TD_FillRowID can be used to generate one.
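
The argument rules above can be sketched in Python. This is only an illustration of the documented constraints, not how TD_TrainTestSplit is implemented. The function names resolve_split_args and split_rows are hypothetical, INT_MAX is assumed to be the 32-bit signed maximum, and treating a single supplied size as implying the complement for the other is an assumption that this page does not state.

```python
import random

INT_MAX = 2**31 - 1  # assumption: upper bound of a 32-bit signed INT


def resolve_split_args(id_column=None, train_size=None, test_size=None, seed=None):
    """Check the documented constraints and return (train_size, test_size)."""
    for name, value in (("TrainSize", train_size), ("TestSize", test_size)):
        if value is not None and not (0.0 < value < 1.0):
            raise ValueError(f"{name} must be in the open interval (0, 1)")
    if train_size is not None and test_size is not None:
        if abs(train_size + test_size - 1.0) > 1e-9:
            raise ValueError("TrainSize + TestSize must equal 1")
    if seed is not None:
        if id_column is None:
            raise ValueError("IDColumn is required when Seed is specified")
        if not (0 <= seed <= INT_MAX):
            raise ValueError("Seed must be in the range 0 to INT_MAX")
    # Documented defaults, used when neither size is given.
    if train_size is None and test_size is None:
        return 0.75, 0.25
    # Assumption (not stated in the source): one size implies the other.
    if train_size is None:
        return 1.0 - test_size, test_size
    if test_size is None:
        return train_size, 1.0 - train_size
    return train_size, test_size


def split_rows(rows, id_key, train_size, seed):
    """Reproducibly shuffle rows and return (train, test).

    Sorting by the unique ID first gives a fixed starting order, so the
    seeded shuffle yields the same split however the rows arrive.
    """
    ordered = sorted(rows, key=lambda r: r[id_key])
    random.Random(seed).shuffle(ordered)
    cut = round(len(ordered) * train_size)
    return ordered[:cut], ordered[cut:]
```

The sort on the ID column in split_rows is one way to see why a unique identifier is needed alongside a seed: a seed fixes the shuffle only if the rows start in a well-defined order.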