Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:id
B700-1021
Product Category
Software
Argument Category Description
InputTable Required Specifies the name of the input table to be split.
PartitionByColumns Required Specifies the partitioning columns of input_table. These columns determine the identity of a partition. For data type restrictions of these columns, see the Aster Database documentation.
DuplicateRowsCount Optional Specifies the number of rows to duplicate across split boundaries. By default, the function duplicates one row from the previous split and one row from the next split. If you specify only value1, then the function duplicates value1 rows from the previous split and value1 rows from the next split. If you specify both value1 and value2, then the function duplicates value1 rows from the previous split and value2 rows from the next split. Each value must be a nonnegative integer less than or equal to 1000.
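The following Python sketch illustrates what duplicating rows across split boundaries means. It is not the Aster implementation. The function split_with_overlap and its parameters are hypothetical: rows stands for one partition already sorted by the ordering columns, rows_per_split sets the split size, and before and after play the roles of value1 and value2. Each output row is paired with 1 if it was copied from a neighboring split and 0 otherwise, in the spirit of the DuplicateColumn flag. At the edges of the partition the sketch simply copies nothing; it does not model the placeholder values that ValuesBeforeFirst and ValuesAfterLast describe.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_with_overlap(
    rows: Sequence[T], rows_per_split: int, before: int = 1, after: int = 1
) -> List[List[Tuple[T, int]]]:
    """Chunk ordered rows into splits and copy neighboring rows across boundaries.

    Each output entry is (row, flag): flag is 0 for a row that belongs to the
    split and 1 for a row copied from just before or just after the split.
    """
    if rows_per_split <= 0:
        raise ValueError("rows_per_split must be positive")
    if not (0 <= before <= 1000 and 0 <= after <= 1000):
        raise ValueError("duplicate counts must be between 0 and 1000")
    splits = []
    for start in range(0, len(rows), rows_per_split):
        end = min(start + rows_per_split, len(rows))
        prev_rows = rows[max(0, start - before):start]  # up to `before` preceding rows
        next_rows = rows[end:end + after]  # up to `after` following rows
        split = [(r, 1) for r in prev_rows]
        split += [(r, 0) for r in rows[start:end]]
        split += [(r, 1) for r in next_rows]
        splits.append(split)
    return splits


for s in split_with_overlap([1, 2, 3, 4, 5, 6], rows_per_split=2):
    print(s)
# [(1, 0), (2, 0), (3, 1)]
# [(2, 1), (3, 0), (4, 0), (5, 1)]
# [(4, 1), (5, 0), (6, 0)]
```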
OrderByColumns Optional Specifies the ordering columns of input_table. These columns establish the order of the rows and splits. Without this argument, the function can split the rows in any order.
SplitCount Optional
If input_table has multiple partitions, then you cannot specify SplitCount. Instead, specify RowsPerSplit.

Specifies the desired number of splits in a partition of the output table.

The value of split_count must be a positive BIGINT, and its upper bound is the number of rows in the partition. The default value is 4.

Base the value of split_count on the desired amount of parallelism. For example, for a cluster with 10 vworkers, make split_count a multiple of 10.

If the number of rows in input_table (n) is not exactly divisible by split_count, then the function estimates the number of splits in the partition, using this formula:

ceiling(n / ceiling(n / split_count))
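As a worked check of the formula, the following Python sketch evaluates it with integer arithmetic. The function names are hypothetical, and reading the inner ceiling as the number of rows per split is an interpretation of the formula rather than a documented detail. Note that the result can be smaller than split_count: with 9 rows and a split_count of 4, the inner ceiling is ceiling(9 / 4) = 3, so the formula yields ceiling(9 / 3) = 3 splits.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers, without floating point."""
    return -(-a // b)


def actual_split_count(n: int, split_count: int) -> int:
    """Evaluate ceiling(n / ceiling(n / split_count))."""
    if n <= 0 or split_count <= 0:
        raise ValueError("n and split_count must be positive")
    if split_count > n:
        raise ValueError("split_count cannot exceed the number of rows")
    rows_per_split = ceil_div(n, split_count)  # inner ceiling
    return ceil_div(n, rows_per_split)  # outer ceiling


print(actual_split_count(10, 4))  # 4
print(actual_split_count(9, 4))   # 3, fewer than the 4 requested
```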

RowsPerSplit Optional
If input_table has multiple partitions, then specify RowsPerSplit instead of SplitCount.

Specifies the desired maximum number of rows in each split in the output table. If the number of rows in a partition is not exactly divisible by rows_per_split, then the last split of that partition contains fewer than rows_per_split rows, but no split contains more than rows_per_split rows.

The value of rows_per_split must be a positive BIGINT.

If input_table has multiple partitions and you do not specify RowsPerSplit, then the function uses the value 1000.
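A minimal Python sketch of RowsPerSplit applied to each partition separately, assuming that rows arrive as (partition_key, row) pairs already in the desired order. The name split_partitions is hypothetical, and its default of 1000 mirrors the value the function uses for multiple partitions. Every split holds at most rows_per_split rows, and only the last split of a partition can be shorter.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def split_partitions(
    rows: Iterable[Tuple[Hashable, object]], rows_per_split: int = 1000
) -> Dict[Hashable, List[List[object]]]:
    """Group (partition_key, row) pairs by partition, then chunk each partition."""
    if rows_per_split <= 0:
        raise ValueError("rows_per_split must be a positive integer")
    partitions: Dict[Hashable, List[object]] = defaultdict(list)
    for key, row in rows:  # input order is kept within each partition
        partitions[key].append(row)
    return {
        key: [part[i:i + rows_per_split] for i in range(0, len(part), rows_per_split)]
        for key, part in partitions.items()
    }


print(split_partitions([("a", 1), ("a", 2), ("a", 3), ("b", 10), ("a", 4)], rows_per_split=3))
# {'a': [[1, 2, 3], [4]], 'b': [[10]]}
```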

Accumulate Optional Specifies the names of input_table columns (other than those specified by PartitionByColumns and OrderByColumns) to copy to the output table. By default, only the columns specified by PartitionByColumns and OrderByColumns are copied to the output table.
OutputTable Optional Specifies the name of the table that the function creates to store the data splits for all partitions. The default value is 'partitioned_input_table'. For example, if input_table is 'time_series', then output_table is 'partitioned_time_series'.
SplitIDColumn Optional Specifies the name of the output table column that contains the split identifiers. The default value is 'split_id'. If the output table has another column named split_id_column, then the function returns an error. Therefore, if the output table has a column named 'split_id' (specified by Accumulate, PartitionByColumns, or OrderByColumns), then you must use SplitIDColumn to specify a different split_id_column.
StatsTable Optional Specifies the name of the table that the function creates to store the statistics for the splitting operation that it performs. The default value is 'stats_input_table'. For example, if input_table is 'time_series', then stats_table is 'stats_time_series'.
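The defaults for OutputTable and StatsTable, together with the SplitIDColumn name check, can be sketched in Python as follows. The function resolve_output_names is hypothetical, and output_columns stands for the PartitionByColumns, OrderByColumns, and Accumulate columns copied to the output table. The sketch compares column names exactly; whether Aster compares them case-insensitively is an open assumption that the sketch does not model.

```python
from typing import Dict, Iterable, Optional


def resolve_output_names(
    input_table: str,
    output_columns: Iterable[str],
    output_table: Optional[str] = None,
    stats_table: Optional[str] = None,
    split_id_column: str = "split_id",
) -> Dict[str, str]:
    """Apply the default table names and reject a clashing split ID column."""
    if split_id_column in set(output_columns):  # exact, case-sensitive match
        raise ValueError(
            f"the output already has a column named {split_id_column!r}; "
            "use SplitIDColumn to choose another name"
        )
    return {
        "output_table": output_table or f"partitioned_{input_table}",
        "stats_table": stats_table or f"stats_{input_table}",
        "split_id_column": split_id_column,
    }


print(resolve_output_names("time_series", ["id", "ts"]))
# {'output_table': 'partitioned_time_series', 'stats_table': 'stats_time_series', 'split_id_column': 'split_id'}
```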
ReturnStatsTable Optional Specifies whether the function returns the data in stats_table in response to the command SELECT * FROM SeriesSplitter. The default value is 'true'. When this value is 'false', the function returns only the data in output_table.
OverwriteOutput Optional Specifies whether the function overwrites stats_table and output_table if they exist. The default value is 'false'.
ValuesBeforeFirst Optional If DuplicateRowsCount is nonzero and OrderByColumns is specified, then ValuesBeforeFirst specifies the values to store in the ordering columns of the rows that precede the first row of the first split in a partition. These rows exist because the function duplicates rows across split boundaries.

If ValuesBeforeFirst specifies only one value and OrderByColumns specifies multiple ordering columns, then the specified value is stored in every ordering column.

If ValuesBeforeFirst specifies multiple values, then it must specify one value for each ordering column, and each value must have the same data type as its ordering column. For the data type VARCHAR, the values are case-insensitive.

The default values for different data types are:
  • Numeric: -1
  • CHAR(n) or VARCHAR: '-1'
  • Date- or time-based: 1900-01-01 0:00:00
  • CHARACTER: '0'
  • Bit: 0
  • Boolean: 'false'
  • IP4: 0.0.0.0
  • UUID: 0000-0000-0000-0000-0000-0000-0000-0000
ValuesAfterLast Optional If DuplicateRowsCount is nonzero and OrderByColumns is specified, then ValuesAfterLast specifies the values to store in the ordering columns of the rows that follow the last row of the last split in a partition. These rows exist because the function duplicates rows across split boundaries.

If ValuesAfterLast specifies only one value and OrderByColumns specifies multiple ordering columns, then the specified value is stored in every ordering column.

If ValuesAfterLast specifies multiple values, then it must specify one value for each ordering column, and each value must have the same data type as its ordering column. For the data type VARCHAR, the values are case-insensitive.

The default value is NULL.
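ValuesBeforeFirst and ValuesAfterLast resolve their values by the same rule: no value means the default, one value is stored in every ordering column, and otherwise there must be exactly one value per ordering column. The following Python sketch shows that rule. The function name and the type-category labels such as "numeric" and "date_or_time" are hypothetical, the defaults are written as plain Python literals, and the requirement that each value match its column's data type is not checked. Passing no defaults models ValuesAfterLast, whose default is NULL (None in Python).

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical type-category labels mapped to the ValuesBeforeFirst defaults
# listed above, written as plain Python literals for illustration only.
BEFORE_FIRST_DEFAULTS: Dict[str, object] = {
    "numeric": -1,
    "char_or_varchar": "-1",
    "date_or_time": "1900-01-01 0:00:00",
    "character": "0",
    "bit": 0,
    "boolean": "false",
    "ip4": "0.0.0.0",
    "uuid": "0000-0000-0000-0000-0000-0000-0000-0000",
}


def resolve_boundary_values(
    order_by_types: Sequence[str],
    values: Sequence[object] = (),
    defaults: Optional[Dict[str, object]] = None,
) -> List[object]:
    """Return one boundary value per ordering column (data types are not checked)."""
    n = len(order_by_types)
    if not values:
        if defaults is None:  # ValuesAfterLast: default is NULL
            return [None] * n
        return [defaults[t] for t in order_by_types]
    if len(values) == 1:
        return [values[0]] * n  # one value is stored in every ordering column
    if len(values) != n:
        raise ValueError("specify one value, or exactly one value per ordering column")
    return list(values)


print(resolve_boundary_values(["numeric", "date_or_time"], (), BEFORE_FIRST_DEFAULTS))
# [-1, '1900-01-01 0:00:00']
print(resolve_boundary_values(["numeric", "numeric"], [0]))
# [0, 0]
```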

DuplicateColumn Optional If you specify this argument, the output table has a column that indicates whether a row was duplicated from a neighboring split. If the row is a duplicate, the column contains 1; otherwise, it contains 0. The column is named duplicate_column, the value that you specify for this argument.
PartialSplitID Optional Specifies whether split_id_column contains only the numeric split identifier. The default value is 'false'.

If the value is 'true', then split_id_column contains a numeric representation of the split identifier that is unique only within its partition. To distribute the output table by split, use a combination of all partitioning columns and split_id_column.

If the value is 'false', then split_id_column contains a string representation of the split identifier that is unique across all partitions. The function generates the string by concatenating the values of the partitioning columns with the order of the split inside the partition (the numeric representation). In the string, hyphens separate the partitioning column values from each other and from the order. For example, 'pcol1-pcol2-3'.
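A Python sketch of the two identifier forms follows. The function split_id is hypothetical; split_number stands for the order of the split inside its partition, and because the starting number (0 or 1) is not stated here, the sketch takes the number as given.

```python
from typing import Sequence, Union


def split_id(
    partition_values: Sequence[object], split_number: int, partial: bool = False
) -> Union[int, str]:
    """Return the split identifier in the form selected by PartialSplitID."""
    if partial:
        return split_number  # numeric form: unique only within its partition
    # String form: partitioning values and the split number, joined by hyphens.
    return "-".join(str(v) for v in [*partition_values, split_number])


print(split_id(["east", 2016], 3))                # east-2016-3
print(split_id(["east", 2016], 3, partial=True))  # 3
```

In this sketch, a partitioning value that itself contains a hyphen would make the string form ambiguous to split back into its parts.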