SeriesSplitter Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product
Aster Analytics
Release Number
7.00.02
Published
September 2017
Language
English (United States)
Last Update
2018-04-17
dita:id
B700-1022
Product Category
Software
InputTable
Specifies the name of the input table to be split.
PartitionByColumns
Specifies the partitioning columns of input_table. These columns determine the identity of a partition. For data type restrictions of these columns, see the Aster Database documentation.
DuplicateRowsCount
[Optional] Specifies the number of rows to duplicate across split boundaries. The argument takes one or two values, value1 and value2. If you specify only value1, the function duplicates value1 rows from the previous split and value1 rows from the next split. If you specify both value1 and value2, the function duplicates value1 rows from the previous split and value2 rows from the next split. Each value must be a nonnegative integer less than or equal to 1000. Default: one row from the previous split and one row from the next split.
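The following Python sketch models boundary duplication on an in-memory list of splits from one partition. It is an illustration under stated assumptions, not the function's implementation. The helper name duplicate_across_boundaries is hypothetical. The sketch assumes that duplicated rows come from the adjacent splits of the same partition. It simply omits rows before the first split and after the last split; in the real function, ValuesBeforeFirst and ValuesAfterLast govern those positions.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def duplicate_across_boundaries(
    splits: Sequence[Sequence[T]],
    value1: int = 1,
    value2: Optional[int] = None,
) -> List[List[T]]:
    """Pad each split with rows copied from its neighboring splits (hypothetical helper).

    value1 rows come from the end of the previous split and value2 rows from
    the start of the next split; if value2 is omitted it defaults to value1,
    mirroring the one-value form of DuplicateRowsCount.
    """
    if value2 is None:
        value2 = value1
    for v in (value1, value2):
        if not isinstance(v, int) or not 0 <= v <= 1000:
            raise ValueError("each value must be an integer from 0 through 1000")
    padded: List[List[T]] = []
    for i, split in enumerate(splits):
        # Last value1 rows of the previous split; the value1 > 0 guard matters
        # because a [-0:] slice would return the whole split.
        before = list(splits[i - 1][-value1:]) if i > 0 and value1 > 0 else []
        # First value2 rows of the next split (nothing after the last split).
        after = list(splits[i + 1][:value2]) if i + 1 < len(splits) else []
        padded.append(before + list(split) + after)
    return padded
```

With the default of one row on each side, the splits [[1, 2, 3], [4, 5, 6], [7, 8]] become [[1, 2, 3, 4], [3, 4, 5, 6, 7], [6, 7, 8]].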
OrderByColumns
[Optional] Specifies the ordering columns of input_table. These columns establish the order of the rows and splits. Without this argument, the function can split the rows in any order.
SplitCount
[Optional] If input_table has multiple partitions, you cannot specify SplitCount. Instead, specify RowsPerSplit.

Specifies the desired number of splits in a partition of the output table.

The value of split_count must be a positive BIGINT, and its upper bound is the number of rows in the partition. Default: 4.

Base the value of split_count on the desired amount of parallelism. For example, for a cluster with 10 vworkers, make split_count a multiple of 10.

If the number of rows in input_table (n) is not exactly divisible by split_count, the function estimates the number of splits in the partition using this formula:

ceiling(n / ceiling(n / split_count))
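The formula can be checked with integer arithmetic. In this sketch the inner ceiling is read as the number of rows in each full split; that reading is an interpretation, not stated in the text. The function names are hypothetical. The range check follows the rule that split_count is positive and at most the number of rows in the partition.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b, without floating point."""
    return -(-a // b)


def estimated_split_count(n: int, split_count: int = 4) -> int:
    """Splits for a partition of n rows: ceiling(n / ceiling(n / split_count))."""
    if not 1 <= split_count <= n:
        raise ValueError("split_count must be positive and at most the row count n")
    rows_per_split = ceil_div(n, split_count)  # inner ceiling: rows in each full split
    return ceil_div(n, rows_per_split)
```

For example, with n = 9 and split_count = 4, each split holds ceiling(9 / 4) = 3 rows. The partition therefore ends up with 3 splits instead of 4. With n = 10 and split_count = 4, each split holds 3 rows and the partition gets 4 splits.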

RowsPerSplit
[Optional] If input_table has multiple partitions, specify RowsPerSplit instead of SplitCount.

Specifies the desired maximum number of rows in each split in the output table. If the number of rows in input_table is not exactly divisible by rows_per_split, the last split contains fewer than rows_per_split rows, but no split contains more than rows_per_split rows.

The value of rows_per_split must be a positive BIGINT.

If input_table has multiple partitions and you do not specify RowsPerSplit, the function uses the value 1000.
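The rules for SplitCount and RowsPerSplit can be sketched in Python as follows. Both function names are hypothetical. Returning None means that SplitCount (or its default) governs a single-partition input. The text does not say what happens when a single-partition input specifies both arguments; this sketch simply lets RowsPerSplit win in that case, which is an assumption.

```python
from typing import List, Optional

# Value the function uses when the input has multiple partitions and
# RowsPerSplit is not specified.
DEFAULT_ROWS_PER_SPLIT = 1000


def effective_rows_per_split(num_partitions: int,
                             split_count: Optional[int] = None,
                             rows_per_split: Optional[int] = None) -> Optional[int]:
    """Apply the multiple-partition rule for SplitCount and RowsPerSplit."""
    if num_partitions > 1:
        if split_count is not None:
            raise ValueError("with multiple partitions, specify RowsPerSplit, not SplitCount")
        return DEFAULT_ROWS_PER_SPLIT if rows_per_split is None else rows_per_split
    # Single partition: None means SplitCount (default 4) decides the splitting.
    return rows_per_split


def split_sizes(n_rows: int, rows_per_split: int) -> List[int]:
    """Sizes of consecutive splits; only the last split may be smaller."""
    if rows_per_split <= 0:
        raise ValueError("rows_per_split must be a positive integer")
    full, rest = divmod(n_rows, rows_per_split)
    return [rows_per_split] * full + ([rest] if rest else [])
```

For a partition of 10 rows and rows_per_split = 4, split_sizes returns [4, 4, 2]. No split exceeds 4 rows, and only the last split is short.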

Accumulate
[Optional] Specifies the names of input_table columns (other than those specified by PartitionByColumns and OrderByColumns) to copy to the output table. Default: Columns specified by PartitionByColumns and OrderByColumns.
OutputTable
[Optional] Specifies the name of the table that the function creates to store the data splits for all partitions. Default: 'partitioned_input_table'. For example, if input_table is 'time_series', output_table is 'partitioned_time_series'.
SplitIDColumn
[Optional] Specifies the name of the output table column that contains the split identifiers. Default: 'split_id'. If the output table has another column with the name specified by split_id_column, the function returns an error. Therefore, if the output table has a column named 'split_id' (specified by Accumulate, PartitionByColumns, or OrderByColumns), you must use SplitIDColumn to specify a different split_id_column.
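The interaction between Accumulate and SplitIDColumn can be illustrated with a small Python sketch. The function name output_columns is hypothetical. The column order it produces is illustrative only. Names are compared exactly as given; whether the real function treats column names case-insensitively is not covered here.

```python
from typing import List, Sequence


def output_columns(partition_by: Sequence[str],
                   order_by: Sequence[str] = (),
                   accumulate: Sequence[str] = (),
                   split_id_column: str = "split_id") -> List[str]:
    """Build an illustrative output column list and reject a clashing split ID name."""
    # PartitionByColumns and OrderByColumns are always copied to the output.
    columns = list(partition_by) + [c for c in order_by if c not in partition_by]
    # Accumulate adds further input columns (duplicates are skipped).
    columns += [c for c in accumulate if c not in columns]
    if split_id_column in columns:
        # Mirrors the documented error: another output column already has this name.
        raise ValueError(f"column {split_id_column!r} already exists; "
                         "use SplitIDColumn to choose another name")
    return columns + [split_id_column]
```

If Accumulate copies an input column named split_id, the default name clashes. Passing split_id_column="sid", for example, resolves the conflict.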
StatsTable
[Optional] Specifies the name of the table that the function creates to store the statistics for the splitting operation that it performs. Default: 'stats_input_table'. For example, if input_table is 'time_series', stats_table is 'stats_time_series'.
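The default names for OutputTable and StatsTable follow a simple prefix rule, sketched here in Python. The function name is hypothetical. The sketch assumes an unqualified table name; it does not model schema-qualified names.

```python
from typing import Optional, Tuple


def default_table_names(input_table: str,
                        output_table: Optional[str] = None,
                        stats_table: Optional[str] = None) -> Tuple[str, str]:
    """Return (output_table, stats_table), applying the documented default prefixes."""
    out = output_table if output_table is not None else "partitioned_" + input_table
    stats = stats_table if stats_table is not None else "stats_" + input_table
    return out, stats
```

For an input table named time_series, the defaults are partitioned_time_series and stats_time_series.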
ReturnStatsTable
[Optional] Specifies whether the function returns the data in stats_table in response to the command SELECT * FROM SeriesSplitter. Default: 'true'. When this value is 'false', the function returns only the data in output_table.
OverwriteOutput
[Optional] Specifies whether the function overwrites stats_table and output_table if they exist. Default: 'false'.
ValuesBeforeFirst
[Optional] If DuplicateRowsCount is nonzero and OrderByColumns is specified, ValuesBeforeFirst specifies the values to store in the ordering columns of the rows that, as a result of duplicating rows across split boundaries, precede the first row of the first split in a partition.

If ValuesBeforeFirst specifies only one value and OrderByColumns specifies multiple ordering columns, the specified value is stored in every ordering column.

If ValuesBeforeFirst specifies multiple values, it must specify a value for each ordering column. The value and the ordering column must have the same data type. For the data type VARCHAR, the values are case-insensitive.

The default values for different data types are:
  • Numeric: -1
  • CHAR(n) or VARCHAR: '-1'
  • Date- or time-based: 1900-01-01 0:00:00
  • CHARACTER: '0'
  • Bit: 0
  • Boolean: 'false'
  • IP4: 0.0.0.0
  • UUID: 0000-0000-0000-0000-0000-0000-0000-0000
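The rules for resolving ValuesBeforeFirst can be sketched in Python. The type-category labels (such as "numeric" and "char_or_varchar") and the function name are hypothetical, chosen only for this illustration. The sketch does not check that each value matches its column's data type. It also does not model VARCHAR case-insensitivity.

```python
from datetime import datetime
from typing import Any, Dict, List, Sequence

# Default ValuesBeforeFirst values from the list above, keyed by
# hypothetical type-category labels chosen for this sketch.
DEFAULT_BEFORE_FIRST: Dict[str, Any] = {
    "numeric": -1,
    "char_or_varchar": "-1",
    "datetime": datetime(1900, 1, 1, 0, 0, 0),
    "character": "0",
    "bit": 0,
    "boolean": False,
    "ip4": "0.0.0.0",
    "uuid": "0000-0000-0000-0000-0000-0000-0000-0000",
}


def values_before_first(order_types: Sequence[str],
                        values: Sequence[Any] = ()) -> List[Any]:
    """Return one fill value per ordering column."""
    if not values:
        # ValuesBeforeFirst not given: use the per-type defaults.
        return [DEFAULT_BEFORE_FIRST[t] for t in order_types]
    if len(values) == 1:
        # A single value is stored in every ordering column.
        return [values[0]] * len(order_types)
    if len(values) != len(order_types):
        raise ValueError("specify one value, or one value per ordering column")
    return list(values)
```

The same resolution applies to ValuesAfterLast, except that its default is NULL rather than a per-type value.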
ValuesAfterLast
[Optional] If DuplicateRowsCount is nonzero and OrderByColumns is specified, ValuesAfterLast specifies the values to store in the ordering columns of the rows that, as a result of duplicating rows across split boundaries, follow the last row of the last split in a partition.

If ValuesAfterLast specifies only one value and OrderByColumns specifies multiple ordering columns, the specified value is stored in every ordering column.

If ValuesAfterLast specifies multiple values, it must specify a value for each ordering column. The value and the ordering column must have the same data type. For the data type VARCHAR, the values are case-insensitive.

Default: NULL.

DuplicateColumn
[Optional] If you specify this argument, the output table has a column that indicates whether a row is duplicated from the neighboring split. If the row is duplicated, the column contains 1; otherwise it contains 0. The output column name is duplicate_column.
PartialSplitID
[Optional] Specifies whether split_id_column contains only the numeric split identifier. Default: 'false'.

If the value is 'true', split_id_column contains a numeric representation of the split identifier that is unique for each partition. To distribute the output table by split, use a combination of all partitioning columns and split_id_column.

If the value is 'false', split_id_column contains a string representation of the split that is unique across all partitions. The function generates the string representation by concatenating the values of the partitioning columns with the order of the split inside the partition (the numeric representation). In the string representation, hyphens separate the partitioning column values from each other and from the order. For example, 'pcol1-pcol2-3'.
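The two forms of the split identifier can be sketched in Python as follows. The function name make_split_id is hypothetical. The sketch assumes that the tokens before the order are the partitioning-column values of the split's partition. That assumption follows from the string form being unique across partitions. Whether the split order starts at 0 or 1 is not specified here.

```python
from typing import Sequence, Union


def make_split_id(partition_values: Sequence[object], order: int,
                  partial: bool = False) -> Union[int, str]:
    """Build a split identifier in the style PartialSplitID describes."""
    if partial:
        # 'true': numeric identifier, unique only within its partition.
        return order
    # 'false': hyphen-joined partition tokens followed by the split order.
    return "-".join([str(v) for v in partition_values] + [str(order)])
```

A string identifier could be ambiguous if a partitioning value itself contains a hyphen. The numeric form, combined with all partitioning columns, avoids that ambiguity.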