Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Document ID: B700-1021
Product Category: Software

Argument (Category): Description
TextColumn (Required): The name of the column that contains the input text. This column must have a SQL string type.
Delimiter (Optional): A regular expression that specifies the character or string that separates words in the input text. The default value is the space character (' ').
Grams (Required): A list of integers or ranges of integers that specify the length, in words, of each n-gram (that is, the value of n). A range_of_values has the syntax integer1-integer2, where integer1 <= integer2. The values of n, integer1, and integer2 must be positive.
OverLapping (Optional): A Boolean value that specifies whether the function allows overlapping n-grams. When this value is 'true' (the default), each word in each sentence starts an n-gram, provided that enough words follow it in the same sentence to form a whole n-gram of the specified size. For information on sentences, see the description of the Reset argument.
ToLowerCase (Optional): A Boolean value that specifies whether the function converts all letters in the input text to lowercase. The default value is 'true'.
Punctuation (Optional): A regular expression that specifies the punctuation characters for the function to remove before evaluating the input text. The default characters to remove are: `~#^&*()-
Reset (Optional): A regular expression that specifies the character or string that ends a sentence. The default sentence-ending characters are: .,?!

At the end of a sentence, the function discards any partial n-grams and searches for the next n-gram at the beginning of the next sentence. An n-gram cannot span two sentences.

TotalGramCount (Optional): A Boolean value that specifies whether the function returns the total number of n-grams in the document (that is, in the row). The default value is 'false'. If you specify 'true', the name of the returned column is specified by the TotalCountColumn argument. The total number of n-grams is not necessarily the number of unique n-grams.
TotalCountColumn (Optional): The name of the column to return if the value of the TotalGramCount argument is 'true'. The default value is 'totalcnt'.
Accumulate (Optional): The names of the input columns to return for each n-gram. These columns cannot have the same names as those specified by the NGramColumn, NumGramsColumn, and TotalCountColumn arguments. By default, the function returns all input columns for each n-gram.
NGramColumn (Optional): The name of the column that is to contain the generated n-grams. The default value is 'ngram'.
NumGramsColumn (Optional): The name of the column that is to contain the length of each n-gram, in words. The default value is 'n'.
FrequencyColumn (Optional): The name of the column that is to contain the count of each unique n-gram (that is, the number of times that each unique n-gram appears in the document). The default value is 'frequency'.
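To show how Grams, OverLapping, ToLowerCase, Punctuation, Reset, and the output columns interact, here is a minimal Python sketch that models the behavior described in the table for a single row. It is not the Aster implementation. The helper names (parse_grams, ngrams_for_row) are hypothetical, and several details are assumptions not stated in the table: sentences are split on Reset before punctuation is removed, the words of an n-gram are joined with a single space, OverLapping 'false' is modeled as disjoint windows of n words, and the total counts every n-gram generated for the row across all requested lengths. Because the Punctuation default above is cut off, the sketch's default pattern covers only the characters that are shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple, Union

# Hypothetical defaults mirroring the argument table. The table's Punctuation
# default is cut off, so this pattern covers only the characters it shows.
DEFAULT_DELIMITER = r" "
DEFAULT_PUNCTUATION = r"[`~#^&*()-]"
DEFAULT_RESET = r"[.,?!]"


def parse_grams(grams: Iterable[Union[int, str]]) -> List[int]:
    """Expand Grams values such as 2 or '1-3' into a sorted list of n values."""
    ns = set()
    for g in grams:
        if isinstance(g, int):
            lo = hi = g
        else:
            lo_s, _, hi_s = g.partition("-")
            lo, hi = int(lo_s), int(hi_s or lo_s)
        if lo <= 0 or lo > hi:
            raise ValueError(f"invalid Grams value: {g!r}")
        ns.update(range(lo, hi + 1))
    return sorted(ns)


def ngrams_for_row(
    text: str,
    grams: Iterable[Union[int, str]],
    delimiter: str = DEFAULT_DELIMITER,
    overlapping: bool = True,
    to_lower: bool = True,
    punctuation: str = DEFAULT_PUNCTUATION,
    reset: str = DEFAULT_RESET,
) -> Tuple[List[Tuple[str, int, int]], int]:
    """Return ([(ngram, n, frequency), ...], total) for one row of input text."""
    ns = parse_grams(grams)
    if to_lower:
        text = text.lower()
    counts: Counter = Counter()
    # Split on Reset first so that no n-gram spans two sentences (assumed order).
    for sentence in re.split(reset, text):
        sentence = re.sub(punctuation, "", sentence)
        # Split on Delimiter; drop empty tokens left by repeated delimiters.
        words = [w for w in re.split(delimiter, sentence) if w]
        for n in ns:
            # Overlapping: every word with n-1 followers starts an n-gram.
            # Non-overlapping (assumed): disjoint windows of n words.
            step = 1 if overlapping else n
            # Windows that would run past the sentence end are never formed,
            # so partial n-grams are discarded.
            for i in range(0, len(words) - n + 1, step):
                counts[(" ".join(words[i:i + n]), n)] += 1
    # One row per unique n-gram: (ngram, n, frequency).
    rows = [(gram, n, freq) for (gram, n), freq in counts.items()]
    # Assumed: the total counts every n-gram generated for the row.
    total = sum(counts.values())
    return rows, total


if __name__ == "__main__":
    rows, total = ngrams_for_row("The cat sat. The cat ran!", [2])
    print(rows, total)
    # [('the cat', 2, 2), ('cat sat', 2, 1), ('cat ran', 2, 1)] 4
```

In this example no "sat the" bigram is produced, because the period ends the first sentence. The total (4) differs from the number of unique bigrams (3), which matches the table's note that the two are not necessarily equal.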