7.00.02 - nGram Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product
Aster Analytics
Release Number
7.00.02
Release Date
September 2017
Content Type
Programming Reference
User Guide
Publication ID
B700-1022-700K
Language
English (United States)
TextColumn
Specifies the name of the column that contains the input text. This column must have a SQL string data type.
Delimiter
[Optional] Specifies, with a regular expression, the character or string that separates words in the input text. Default: ' ' (space).
Grams
Specifies the length, in words, of each n-gram (that is, the value of n). The value can be a single integer n or a value_range. A value_range has the syntax integer1-integer2, where integer1 <= integer2. The values of n, integer1, and integer2 must be positive.
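The following Python sketch illustrates how a Grams value might be interpreted. It is not the Aster implementation. The helper name expand_grams is hypothetical, and the sketch assumes that a value_range stands for every integer from integer1 through integer2, inclusive.

```python
def expand_grams(spec: str) -> list[int]:
    """Expand a Grams-style value such as '2' or '1-3' into a list of n values.

    Assumption: a value_range covers every integer from integer1 to integer2.
    """
    if "-" in spec:
        low_text, high_text = spec.split("-", 1)
        low, high = int(low_text), int(high_text)
    else:
        low = high = int(spec)
    # Both bounds must be positive, and integer1 <= integer2.
    if low <= 0 or low > high:
        raise ValueError(f"invalid Grams value: {spec!r}")
    return list(range(low, high + 1))


print(expand_grams("2"))
print(expand_grams("1-3"))
```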
OverLapping
[Optional] Specifies whether the function allows overlapping n-grams. Default: 'true'; that is, each word in each sentence starts an n-gram if enough words follow it in the same sentence to form a whole n-gram of the specified size. For information on sentences, see the Reset argument description.
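The difference between overlapping and non-overlapping n-grams can be sketched in Python as follows. This is an illustration, not the function's actual code. The helper name ngrams is hypothetical, and the non-overlapping behavior shown (each n-gram starts where the previous one ended) is an assumption that this section does not spell out.

```python
def ngrams(words, n, overlapping=True):
    """Return the n-grams of one sentence, given as a list of words."""
    # Overlapping: every word starts an n-gram if enough words follow it.
    # Non-overlapping (assumed): advance by n words after each n-gram.
    step = 1 if overlapping else n
    return [" ".join(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]


words = "the quick brown fox jumps".split(" ")
print(ngrams(words, 2))
print(ngrams(words, 2, overlapping=False))
```

In the non-overlapping case, the trailing word "jumps" cannot form a whole 2-gram and is dropped.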
ToLowerCase
[Optional] Specifies whether the function converts all letters in the input text to lowercase. Default: 'true'.
Punctuation
[Optional] Specifies, with a regular expression, the punctuation characters for the function to remove before evaluating the input text. Default: '`~#^&*()-'
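As a rough Python illustration of the ToLowerCase and Punctuation defaults, the sketch below lowercases the text and removes the default punctuation characters. It is not the Aster implementation. The helper name preprocess is hypothetical, and treating the default value as a set of individual characters is an assumption.

```python
import re

# Default punctuation characters listed for the Punctuation argument.
DEFAULT_PUNCTUATION = "`~#^&*()-"


def preprocess(text, to_lower=True, punctuation=DEFAULT_PUNCTUATION):
    """Lowercase the text and strip punctuation characters from it."""
    if to_lower:
        text = text.lower()
    if punctuation:
        # Assumption: each character in the value is removed individually.
        pattern = "[" + re.escape(punctuation) + "]"
        text = re.sub(pattern, "", text)
    return text


print(preprocess("State-of-the-Art (NLP) #1"))
```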
Reset
[Optional] Specifies, with a regular expression, the character or string that ends a sentence. Default: '.,?!'

At the end of a sentence, the function discards any partial n-grams and searches for the next n-gram at the beginning of the next sentence. An n-gram cannot span sentences.
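The sentence-boundary rule can be sketched in Python as follows. This is an illustration only. The helper name sentence_ngrams is hypothetical, and the sketch assumes that each character in the default Reset value ends a sentence and that words are separated by the default Delimiter (a space).

```python
import re


def sentence_ngrams(text, n, reset=r"[.,?!]", delimiter=r" "):
    """Collect overlapping n-grams without crossing sentence boundaries."""
    grams = []
    for sentence in re.split(reset, text):
        # Drop empty strings produced by leading or repeated delimiters.
        words = [w for w in re.split(delimiter, sentence) if w]
        # Partial n-grams at the end of a sentence are never emitted.
        grams.extend(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams


print(sentence_ngrams("hello world. good morning everyone", 2))
```

The 2-gram "world good" does not appear in the result, because it would span two sentences.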

TotalGramCount
[Optional] Specifies whether the function returns the total number of n-grams in the document (that is, in the row). Default: 'false'. If you specify 'true', the TotalCountColumn argument determines the name of the output table column that contains these totals.
The total number of n-grams is not necessarily the number of unique n-grams.
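A short Python sketch shows how the total number of n-grams differs from the number of unique n-grams, and how it relates to the per-n-gram frequency. It is only an illustration of the counting, not the function's code, and the variable names are hypothetical.

```python
from collections import Counter

words = "to be or not to be".split(" ")

# Overlapping 2-grams of a single-sentence document.
bigrams = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

frequency = Counter(bigrams)            # count of each unique n-gram
total_count = sum(frequency.values())   # every occurrence is counted
unique_count = len(frequency)           # each distinct n-gram is counted once

print(frequency["to be"], total_count, unique_count)
```

Here "to be" occurs twice, so the document has five 2-grams in total but only four unique ones.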
TotalCountColumn
[Optional] Specifies the name of the output table column that appears if the value of the TotalGramCount argument is 'true'. Default: 'totalcnt'.
Accumulate
[Optional] Specifies the names of the input table columns to copy to the output table for each n-gram. These columns cannot have the same names as those specified by the arguments NGramColumn, NumGramsColum, and TotalCountColumn. Default: All input columns for each n-gram.
NGramColumn
[Optional] Specifies the name of the output table column that is to contain the generated n-grams. Default: 'ngram'.
NumGramsColum
[Optional] Specifies the name of the output table column that is to contain the length of each n-gram (in words). Default: 'n'.
FrequencyColumn
[Optional] Specifies the name of the output table column that is to contain the count of each unique n-gram (that is, the number of times that each unique n-gram appears in the document). Default: 'frequency'.