Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Product Category: Software

Argument	Category	Description
InputTable	Required	Specifies the name of the input table to be split.
PartitionByColumns	Required	Specifies the partitioning columns of input_table. These columns determine the identity of a partition. For data type restrictions of these columns, see the Aster Database documentation.
DuplicateRowsCount	Optional	Specifies the number of rows to duplicate across split boundaries. By default, the function duplicates one row from the previous partition and one row from the next partition. If you specify only value1, then the function duplicates value1 rows from the previous partition and value1 rows from the next partition. If you specify both value1 and value2, then the function duplicates value1 rows from the previous partition and value2 rows from the next partition. Each argument value must be a nonnegative integer less than or equal to 1000.
OrderByColumns	Optional	Specifies the ordering columns of input_table. These columns establish the order of the rows and splits. Without this argument, the function can split the rows in any order.
SplitCount	Optional	Specifies the desired number of splits in a partition of the output table. If input_table has multiple partitions, then you cannot specify SplitCount; specify RowsPerSplit instead. The value of split_count must be a positive BIGINT, and its upper bound is the number of rows in the partition. The default value is 4. Base the value of split_count on the desired amount of parallelism. For example, for a cluster with 10 vworkers, make split_count a multiple of 10. If the number of rows in input_table (n) is not exactly divisible by split_count, then the function estimates the number of splits in the partition, using this formula: ceiling(n / ceiling(n / split_count))
RowsPerSplit	Optional	Specifies the desired maximum number of rows in each split in the output table. If input_table has multiple partitions, then specify RowsPerSplit instead of SplitCount. If the number of rows in input_table is not exactly divisible by rows_per_split, then the last split contains fewer than rows_per_split rows, but no split contains more than rows_per_split rows. The value of rows_per_split must be a positive BIGINT. If input_table has multiple partitions and you do not specify RowsPerSplit, then the function uses the value 1000.
Accumulate	Optional	Specifies the names of input_table columns (other than those specified by PartitionByColumns and OrderByColumns) to copy to the output table. By default, only the columns specified by PartitionByColumns and OrderByColumns are copied to the output table.
OutputTable	Optional	Specifies the name of the table that the function creates to store the data splits for all partitions. The default value is 'partitioned_input_table'. For example, if input_table is 'time_series', then output_table is 'partitioned_time_series'.
SplitIDColumn	Optional	Specifies the name for the output table column that is to contain the split identifiers. The default value is 'split_id'. If the output table has another column named split_id_column, then the function returns an error. Therefore, if the output table has a column named 'split_id' (specified by Accumulate, PartitionByColumns, or OrderByColumns), then you must use SplitIDColumn to specify a different split_id_column.
StatsTable	Optional	Specifies the name of the table that the function creates to store the statistics for the splitting operation that it performs. The default value is 'stats_input_table'. For example, if input_table is 'time_series', then stats_table is 'stats_time_series'.
ReturnStatsTable	Optional	Specifies whether the function returns the data in stats_table in response to the command SELECT * FROM SeriesSplitter. The default value is 'true'. When this value is 'false', the function returns only the data in output_table.
OverwriteOutput	Optional	Specifies whether the function overwrites stats_table and output_table if they exist. The default value is 'false'.
ValuesBeforeFirst	Optional	If DuplicateRowsCount is nonzero and OrderByColumns is specified, then ValuesBeforeFirst specifies the values to be stored in the ordering columns that precede the first row of the first split in a partition as a result of duplicating rows across split boundaries. If ValuesBeforeFirst specifies only one value and OrderByColumns specifies multiple ordering columns, then the specified value is stored in every ordering column. If ValuesBeforeFirst specifies multiple values, then it must specify a value for each ordering column. The value and the ordering column must have the same data type. For the data type VARCHAR, the values are case-insensitive. The default values for different data types are: Numeric: -1; CHAR(n) or VARCHAR: '-1'; Date- or time-based: 1900-01-01 0:00:00; CHARACTER: '0'; Bit: 0; Boolean: 'false'; IP4: 0.0.0.0; UUID: 0000-0000-0000-0000-0000-0000-0000-0000.
ValuesAfterLast	Optional	If DuplicateRowsCount is nonzero and OrderByColumns is specified, then ValuesAfterLast specifies the values to be stored in the ordering columns that follow the last row of the last split in a partition as a result of duplicating rows across split boundaries. If ValuesAfterLast specifies only one value and OrderByColumns specifies multiple ordering columns, then the specified value is stored in every ordering column. If ValuesAfterLast specifies multiple values, then it must specify a value for each ordering column. The value and the ordering column must have the same data type. For the data type VARCHAR, the values are case-insensitive. The default value is NULL.
DuplicateColumn	Optional	If you specify this argument, the output table has a column that indicates whether a row is duplicated from the neighboring split. If the row is duplicated, the column contains 1; otherwise it contains 0. The output column name is duplicate_column.
PartialSplitID	Optional	Specifies whether split_id_column contains only the numeric split identifier. The default value is 'false'. If the value is 'true', then split_id_column contains a numeric representation of the split identifier that is unique for each partition. To distribute the output table by split, use a combination of all partitioning columns and split_id_column. If the value is 'false', then split_id_column contains a string representation of the split that is unique across all partitions. The function generates the string representation by concatenating the partitioning columns with the order of the split inside the partition (the numeric representation). In the string representation, hyphens separate partitioning column names from each other and from the order. For example, 'pcol1-pcol2-3'.
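
The split arithmetic described for SplitCount and RowsPerSplit can be sketched in Python. This is only an illustration of the formulas stated in the table, not the function's actual implementation. The helper names estimated_split_count and split_sizes are hypothetical, and the assumption that every split except the last is filled to rows_per_split is inferred from the RowsPerSplit description.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact for BIGINT-sized values."""
    return -(-a // b)


def estimated_split_count(n: int, split_count: int = 4) -> int:
    """Apply ceiling(n / ceiling(n / split_count)) from the SplitCount row.

    n is the number of rows in the partition; split_count defaults to 4,
    the documented default.
    """
    if n < 1:
        raise ValueError("n must be a positive row count")
    if not 1 <= split_count <= n:
        raise ValueError("split_count must be between 1 and n")
    rows_in_each_split = ceil_div(n, split_count)
    return ceil_div(n, rows_in_each_split)


def split_sizes(n: int, rows_per_split: int = 1000) -> list[int]:
    """Row counts per split, assuming all splits but the last are full.

    rows_per_split defaults to 1000, the value the table gives for
    multi-partition input when RowsPerSplit is omitted.
    """
    if n < 1 or rows_per_split < 1:
        raise ValueError("n and rows_per_split must be positive")
    full, remainder = divmod(n, rows_per_split)
    return [rows_per_split] * full + ([remainder] if remainder else [])


if __name__ == "__main__":
    print(estimated_split_count(9, 4))   # 9 rows, 4 requested splits
    print(split_sizes(2500))             # 2500 rows, default RowsPerSplit
```

With 9 rows and a requested split_count of 4, the formula gives ceiling(9 / ceiling(9 / 4)) = ceiling(9 / 3) = 3, so the estimated number of splits can be lower than the requested value when n is not divisible by split_count.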