SeriesSplitter Arguments - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22
PartitionByColumns
Specify the partition columns of input_table. These columns determine the identity of a partition. For data type restrictions of these columns, see the Teradata Database documentation.
DuplicateRowsCount
[Optional] Specify the number of rows to duplicate across split boundaries. If you specify only value1, the function duplicates value1 rows from the previous partition and value1 rows from the next partition. If you specify both value1 and value2, the function duplicates value1 rows from the previous partition and value2 rows from the next partition. Each value must be a nonnegative integer no greater than 1000.
Default: One row from the previous partition and one row from the next partition
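The following minimal Python sketch shows one way to picture this duplication. It is not the function's implementation. It reads "previous" and "next" as the neighboring splits on either side of each split boundary, and that reading is an interpretation. It also assumes the rows are already ordered, uses a fixed split size (the hypothetical rows_per_split parameter), and draws duplicates only from the immediately adjacent split. At the edges of a partition, it simply omits the missing neighbor rows. The is_duplicate flag mirrors the 1/0 values described for DuplicateColumn. The function name split_with_duplicates is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def split_with_duplicates(
    rows: Sequence[int],
    rows_per_split: int,
    before: int = 1,
    after: Optional[int] = None,
) -> List[List[Tuple[int, int]]]:
    """Cut ordered rows into splits and copy neighbor rows across each boundary.

    Each split is a list of (row, is_duplicate) pairs; is_duplicate is 1 for a
    row copied from a neighboring split and 0 for the split's own rows.
    """
    if after is None:
        after = before  # only value1 given: use it on both sides
    if rows_per_split < 1:
        raise ValueError("rows_per_split must be positive")
    for value in (before, after):
        if not 0 <= value <= 1000:
            raise ValueError("each DuplicateRowsCount value must be in 0..1000")

    # Consecutive chunks of at most rows_per_split rows, in the given order.
    chunks = [list(rows[i:i + rows_per_split]) for i in range(0, len(rows), rows_per_split)]

    splits = []
    for k, chunk in enumerate(chunks):
        # Tail of the previous split (guard before > 0: a [-0:] slice is the whole list).
        prev_rows = chunks[k - 1][-before:] if k > 0 and before > 0 else []
        # Head of the next split; the last split simply has no next neighbor here.
        next_rows = chunks[k + 1][:after] if k + 1 < len(chunks) else []
        splits.append(
            [(r, 1) for r in prev_rows] + [(r, 0) for r in chunk] + [(r, 1) for r in next_rows]
        )
    return splits
```

Take rows 1 through 10 with rows_per_split=4 and the default of one row on each side. In that case, the middle split holds rows 4 through 9, and rows 4 and 9 are flagged as duplicates.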
OrderByColumns
[Optional] Specify the order columns of input_table. These columns establish the order of the rows and splits. Without this argument, the function can split the rows in any order.
SplitCount
[Optional] Specify the desired number of splits in a partition of the output table. If input_table has multiple partitions, you cannot specify SplitCount; specify RowsPerSplit instead.
The value of split_count must be a positive BIGINT, and its upper bound is the number of rows in the partition. Base the value of split_count on the desired amount of parallelism. For example, for a cluster with 10 vworkers, make split_count a multiple of 10.
If the number of rows in input_table (n) is not exactly divisible by split_count, the function estimates the number of splits in the partition with this formula:

ceiling ( n / ceiling ( n / split_count ))

Default: 4
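One way to read the formula is in two steps. First, the function sizes each split at ceiling(n / split_count) rows. It then cuts the partition into as many splits as that size requires. The Python sketch below evaluates the formula with integer arithmetic so that floating-point rounding cannot affect the ceilings. The helper names ceil_div and estimated_split_count are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers (no float rounding)."""
    return -(-a // b)


def estimated_split_count(n: int, split_count: int) -> int:
    """Number of splits given by ceiling(n / ceiling(n / split_count))."""
    if split_count < 1 or split_count > n:
        raise ValueError("split_count must be between 1 and the partition row count")
    rows_per_split = ceil_div(n, split_count)  # rows in each full split
    return ceil_div(n, rows_per_split)         # splits needed to hold all n rows
```

The result can be smaller than split_count. For n = 10 and split_count = 6, each split gets ceiling(10 / 6) = 2 rows, so the partition yields ceiling(10 / 2) = 5 splits rather than 6.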
RowsPerSplit
[Optional] If input_table has multiple partitions, specify RowsPerSplit instead of SplitCount.

Specify the desired maximum number of rows in each split in the output table. If the number of rows in input_table is not exactly divisible by rows_per_split, the last split contains fewer than rows_per_split rows, but no split contains more than rows_per_split rows.

The value of rows_per_split must be a positive BIGINT.

If input_table has multiple partitions and you do not specify RowsPerSplit, the function uses the value 1000.
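The Python sketch below computes the split sizes that these rules imply for a single partition. It assumes that rows are assigned to splits consecutively in order, so that only the last split can be short. The function name split_sizes is hypothetical, and its default of 1000 mirrors the value stated above.

```python
from typing import List


def split_sizes(n: int, rows_per_split: int = 1000) -> List[int]:
    """Sizes of the splits for a partition of n rows, in split order."""
    if rows_per_split < 1:
        raise ValueError("rows_per_split must be a positive integer")
    full, rest = divmod(n, rows_per_split)
    # Every split is full except possibly the last, which holds the remainder.
    return [rows_per_split] * full + ([rest] if rest else [])
```

For example, a partition of 2,500 rows with the default value yields splits of 1000, 1000, and 500 rows.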

Accumulate
[Optional] Specify the names of input_table columns (other than those specified by PartitionByColumns and OrderByColumns) to copy to the output table.
Default: Columns specified by PartitionByColumns and OrderByColumns
SplitIDColumn
[Optional] Specify the name of the output table column that contains the split identifiers. If another output table column already has the name split_id_column, the function returns an error. Therefore, if Accumulate, PartitionByColumns, or OrderByColumns specifies a column named 'split_id', you must use SplitIDColumn to specify a different split_id_column.
Default: 'split_id'
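The name-collision rule can be expressed as a simple pre-check. The Python sketch below does not reproduce the function's own validation. In it, output_columns stands for the column names that PartitionByColumns, OrderByColumns, and Accumulate place in the output table. The sketch compares names case-insensitively, and whether the function does the same is an assumption. The helper name check_split_id_column is hypothetical.

```python
from typing import Iterable


def check_split_id_column(output_columns: Iterable[str], split_id_column: str = "split_id") -> str:
    """Return split_id_column, or raise if another output column already uses it."""
    # Assumption: names collide regardless of letter case.
    taken = {name.lower() for name in output_columns}
    if split_id_column.lower() in taken:
        raise ValueError(
            f"output table already has a column named {split_id_column!r}; "
            "use SplitIDColumn to choose a different name"
        )
    return split_id_column
```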
ReturnStatsTable
[Optional] Specify whether the function returns the data in stats_table in response to the command SELECT * FROM SeriesSplitter. When this value is 'false', the function returns only the data in output_table.
Default: 'true'
ValuesBeforeFirst
[Optional] If DuplicateRowsCount is nonzero and OrderByColumns is specified, ValuesBeforeFirst specifies the values to store in the order columns that precede the first row of the first split in a partition as a result of duplicating rows across split boundaries.

If ValuesBeforeFirst specifies only one value and OrderByColumns specifies multiple order columns, the specified value is stored in every order column.

If ValuesBeforeFirst specifies multiple values, it must specify a value for each order column. The value and the order column must have the same data type. For the data type VARCHAR, the values are case-insensitive.

Default (depends on the data type of the order column):

Data Type                  Default
Numeric                    -1
CHARACTER(n) or VARCHAR    '-1'
Date- or time-based        1900-01-01 0:00:00
CHARACTER                  '0'
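The following Python sketch shows how the value rules for ValuesBeforeFirst resolve to one value per order column. With no values, each column gets its per-type default. A single value is repeated for every order column. Multiple values must match the order columns one for one. The type labels in DEFAULTS are hypothetical shorthand for the data types in the default table. The sketch does not check that each value matches its column's data type.

```python
from typing import Dict, List, Sequence

# Hypothetical type labels mapped to the documented per-type defaults.
DEFAULTS: Dict[str, object] = {
    "numeric": -1,
    "char_n_or_varchar": "-1",
    "datetime": "1900-01-01 0:00:00",
    "character": "0",
}


def resolve_values_before_first(
    order_col_types: Sequence[str], values: Sequence[object] = ()
) -> List[object]:
    """One value per order column, following the ValuesBeforeFirst rules."""
    if not values:
        return [DEFAULTS[t] for t in order_col_types]  # per-type defaults
    if len(values) == 1:
        return [values[0]] * len(order_col_types)      # one value for every column
    if len(values) != len(order_col_types):
        raise ValueError("specify one value, or one value per order column")
    return list(values)
```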
ValuesAfterLast
[Optional] If DuplicateRowsCount is nonzero and OrderByColumns is specified, ValuesAfterLast specifies the values to store in the order columns that follow the last row of the last split in a partition as a result of duplicating rows across split boundaries.
If ValuesAfterLast specifies only one value and OrderByColumns specifies multiple order columns, the specified value is stored in every order column.
If ValuesAfterLast specifies multiple values, it must specify a value for each order column. The value and the order column must have the same data type. For the data type VARCHAR, the values are case-insensitive.
Default: NULL
DuplicateColumn
[Optional] Specify the name of the output table column that indicates whether a row is duplicated from the neighboring split. If the row is duplicated, the column contains 1; otherwise it contains 0. If you omit this argument, the output table does not have this column.
PartialSplitID
[Optional] Specify whether split_id_column contains only the numeric split identifier.
If the value is 'true', split_id_column contains a numeric representation of the split identifier that is unique for each partition. To distribute the output table by split, use a combination of all partition columns and split_id_column.
If the value is 'false', split_id_column contains a string representation of the split identifier that is unique across all partitions. The function creates the string by concatenating the partition column values with the order of the split inside the partition (the numeric representation). Hyphens separate the partition column values from each other and from the order. For example, 'pcol1-pcol2-3' identifies split 3 of the partition whose partition column values are pcol1 and pcol2.
Default: 'false'
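The Python sketch below builds the two split identifier forms. Converting each partition value with str() is an assumption, since formatting of dates, decimals, or NULLs may differ in the actual function. The function name make_split_id is also hypothetical.

```python
from typing import Sequence, Union


def make_split_id(
    partition_values: Sequence[object], split_order: int, partial: bool = False
) -> Union[int, str]:
    """Numeric split ID when partial is True, else 'value1-value2-...-order'."""
    if partial:
        # Unique only within a partition; pair it with the partition columns.
        return split_order
    # Unique across partitions: partition values, then the split order, hyphen-joined.
    return "-".join([str(v) for v in partition_values] + [str(split_order)])
```

With the default PartialSplitID value, make_split_id(["pcol1", "pcol2"], 3) produces 'pcol1-pcol2-3', which matches the example above.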