Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
dita:id: B700-1021
lifecycle: previous
Product Category: Software

Argument	Category	Description
TextColumn	Required	Specifies the name of the input column whose contents are to be tokenized.
ToLowerCase	Optional	Specifies whether to convert input text to lowercase. The default value is 'true'. The function ignores this argument if the Stemming argument has the value 'true'.
Stemming	Optional	Specifies whether to stem the tokens, that is, whether to apply the Porter2 stemming algorithm to each token to reduce it to its root form. Before stemming, the function converts the input text to lowercase and applies the RemoveStopWords argument. The default value is 'false'.
Delimiter	Optional	Specifies a regular expression that represents the word delimiter. The default value is '[\t\b\f\r]+'.
TotalWordsNum	Optional	Specifies whether to output a column that contains the total number of words in the input document. The default value is 'false'.
Punctuation	Optional	Specifies a regular expression that represents the punctuation characters to remove from the input text. The default value is '[.,!?]'. If the Stemming argument has the value 'true', the recommended value is '[\\\[.,?\!:;~()\\\]]+'.
Accumulate	Optional	Specifies the names of the input columns to copy to the output table. By default, the function copies all input columns to the output table. No accumulate_column can be the same as token_column or total_column.
TokenColumn	Optional	Specifies the name of the output column that contains the tokens. The default value is 'token'.
FrequencyColumn	Optional	Specifies the name of the output column that contains the frequency of each token. The default value is 'frequency'. The function ignores this argument if the OutputByWord argument has the value 'false'.
TotalColumn	Optional	Specifies the name of the output column that contains the total number of words in the input document. The default value is 'total_count'.
RemoveStopWords	Optional	Specifies whether to remove stop words from the input text before parsing. The default value is 'false'.
PositionColumn	Optional	Specifies the name of the output column that contains the position of a word within a document. The default value is 'position'.
ListPositions	Optional	Specifies whether to output the position of a word in list form. The default value is 'false', which causes the function to output a row for each occurrence of the word. The function ignores this argument if the OutputByWord argument has the value 'false'.
OutputByWord	Optional	Specifies whether to output each token of each input document in its own row in the output table. The default value is 'true'. If you specify 'false', then the function outputs each tokenized input document in one row of the output table.
StemmingExceptions	Optional	Specifies the location of the file that contains the stemming exceptions. A stemming exception is a word followed by its stemmed form, separated by white space. Each stemming exception is on its own line in the file. For example, a file with the lines 'bias bias', 'news news', 'goods goods', 'lying lie', 'ugly ugli', 'sky sky', and 'early earli' specifies that the words 'lying', 'ugly', and 'early' are to become 'lie', 'ugli', and 'earli', respectively, and that the other words are not to change.
StopWords	Optional	Specifies the location of the file that contains the stop words (words to ignore when parsing text). Each stop word is on its own line in the file. For example, a file might contain these stop words, one per line: a, an, the, and, this, with, but, will.
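
The file formats described for StemmingExceptions and StopWords, and the order of operations described for Stemming, can be sketched in Python. This is an illustration only, not the function's implementation. The helper names (load_exceptions, load_stop_words, tokenize) are hypothetical, the delimiter value is an example chosen for the sketch rather than the documented default, and removing punctuation by replacing it with an empty string is an assumption. The sketch does not implement Porter2 stemming; a token without a stemming exception passes through unchanged.

```python
import re


def load_exceptions(text):
    """Parse lines of the form '<word> <stemmed form>' into a dict."""
    exceptions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            word, stem = parts
            exceptions[word] = stem
    return exceptions


def load_stop_words(text):
    """Parse a file body with one stop word per line into a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def tokenize(document, delimiter=r"[ \t\f\r\n]+", punctuation=r"[.,!?]",
             stop_words=frozenset(), exceptions=None):
    """Lowercase, strip punctuation, split, drop stop words, apply exceptions.

    Assumption: punctuation is deleted (not replaced by a space) before
    the text is split on the delimiter.
    """
    cleaned = re.sub(punctuation, "", document.lower())
    tokens = [t for t in re.split(delimiter, cleaned) if t]
    tokens = [t for t in tokens if t not in stop_words]
    exceptions = exceptions or {}
    # Only the exception lookup is modeled here; no Porter2 stemming.
    return [exceptions.get(t, t) for t in tokens]


# File contents taken from the examples in the table above.
EXCEPTIONS_FILE = """bias bias
news news
goods goods
lying lie
ugly ugli
sky sky
early earli
"""
STOP_WORDS_FILE = "a\nan\nthe\nand\nthis\nwith\nbut\nwill\n"

exceptions = load_exceptions(EXCEPTIONS_FILE)
stop_words = load_stop_words(STOP_WORDS_FILE)
tokens = tokenize("The news is ugly, but the sky will clear early!",
                  stop_words=stop_words, exceptions=exceptions)
print(tokens)  # ['news', 'is', 'ugli', 'sky', 'clear', 'earli']
```

In this sketch, 'the', 'but', and 'will' are dropped as stop words, 'ugly' and 'early' are mapped to 'ugli' and 'earli' by the exceptions, and 'news' and 'sky' are left unchanged.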