1.1 - 8.10 - SeriesSplitter Syntax Elements - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)
OutputTable
[Optional] Specify the name for the function output table.
Default: partitioned_output in the current schema
StatsTable
[Optional] Specify the name for the statistics table that the function outputs.
Default: stats_output in the current schema
PartitionByColumns
Specify the partition columns of the InputTable. These columns determine the identity of a partition. For data type restrictions of these columns, see the Teradata Database documentation.
DuplicateRowsCount
[Optional] Specify the number of rows to duplicate across split boundaries. If you specify only value1, the function duplicates value1 rows from the previous partition and value1 rows from the next partition. If you specify both value1 and value2, the function duplicates value1 rows from the previous partition and value2 rows from the next partition. Each syntax element value must be nonnegative integer less than or equal to 1000.
Default: One row from the previous partition and one row from the next partition
OrderByColumns
[Optional] Specify the order columns of the InputTable. These columns establish the order of the rows and splits. Without this syntax element, the function can split the rows in any order.
SplitCount
[Optional] If the InputTable has multiple partitions, you cannot specify SplitCount. Instead, specify RowsPerSplit. Specify the desired number of splits in a partition of the OutputTable.
The value of split_count must be a positive BIGINT, and its upper bound is the number of rows in the partition. Base the value of split_count on the desired amount of parallelism. For example, for a cluster with 10 vworkers, make split_count a multiple of 10.
If the number of rows in the InputTable (n) is not exactly divisible by split_count, the function estimates the number of splits in the partition, using this formula:

ceiling ( n / ceiling ( n / split_count ))

Default: 4
RowsPerSplit
[Optional] If the InputTable has multiple partitions, specify RowsPerSplit instead of SplitCount.

Specify the desired maximum number of rows in each split in the OutputTable. If the number of rows in the InputTable is not exactly divisible by rows_per_split, the last split contains fewer than rows_per_split rows, but no row contains more than rows_per_split rows.

The value of rows_per_split must be a positive BIGINT.

If the InputTable has multiple partitions and you do not specify RowsPerSplit, the function uses the value 1000.

Accumulate
[Optional] Specify the names of the InputTable columns (other than those specified by PartitionByColumns and OrderByColumns) to copy to the OutputTable.
Default: Columns specified by PartitionByColumns and OrderByColumns
SplitIDColumn
[Optional] Specify the name for the OutputTable column to contain the split identifiers. If the OutputTable has another column named split_id_column, the function returns an error. Therefore, if the OutputTable has a column named 'split_id' (specified by Accumulate, PartitionByColumns, or Order_By_Columns), you must use SplitIDColumn to specify a different split_id_column.
Default: 'split_id'
ReturnStatsTable
[Optional] Specify whether the function returns the data in the StatsTable in response to the command SELECT * FROM SeriesSplitter. When this value is 'false', the function returns only the data in the OutputTable.
Default: 'true'
ValuesBeforeFirst
[Optional] If DuplicateRowsCount is nonzero and OrderByColumns is specified, ValuesBeforeFirst specifies the values to store in the order columns that precede the first row of the first split in a partition as a result of duplicating rows across split boundaries.

If ValuesBeforeFirst specifies only one value and OrderByColumns specifies multiple order columns, the specified value is stored in every order column.

If ValuesBeforeFirst specifies multiple values, it must specify a value for each order column. The value and the order column must have the same data type. For the data type VARCHAR, the values are case-insensitive.

Data Type Default
Numeric -1
CHARACTER(n) or VARCHAR '-1'
Date- or time-based 1900-01-01 0:00:00
CHARACTER '0'
ValuesAfterLast
[Optional] If DuplicateRowsCount is nonzero and OrderByColumns is specified, ValuesAfterLast specifies the values to store in the order columns that follow the last row of the last split in a partition as a result of duplicating rows across split boundaries.
If ValuesAfterLast specifies only one value and OrderByColumns specifies multiple order columns, the specified value is stored in every order column.
If ValuesAfterLast specifies multiple values, it must specify a value for each order column. The value and the order column must have the same data type. For the data type VARCHAR, the values are case-insensitive.
Default: NULL
DuplicateColumn
[Optional] Specify the name of the OutputTable column that indicates whether a row is duplicated from the neighboring split. If the row is duplicated, the column contains 1; otherwise it contains 0. If you omit this syntax element, the OutputTable does not have this column.
PartialSplitID
[Optional] Specify whether split_id_column contains only the numeric split identifier.
If the value is 'true', split_id_column contains a numeric representation of the split identifier that is unique for each partition. To distribute the OutputTable by split, use a combination of all partition columns and split_id_column.
If the value is 'false', split_id_column contains a string representation of the split that is unique across all partitions. The function creates the string representation by concatenating the partition columns with the order of the split inside the partition (the numeric representation). In the string representation, hyphens separate partition column names from each other and from the order. For example, 'pcol1-pcol2-3'.
Default: 'false'