Argument | Category | Description |
---|---|---|
TextColumn | Required | The name of the column that contains the input text. The input column must have a string SQL data type. |
Delimiter | Optional | A regular expression that specifies the character or string that separates words in the input text. The default value is the space character (' '). |
Grams | Required | A list of integers or ranges of integers that specify the length, in words, of each n-gram (that is, the value of n). A range_of_values has the syntax integer1-integer2, where integer1 <= integer2. The values of n, integer1, and integer2 must be positive. |
OverLapping | Optional | A Boolean value that specifies whether the function allows overlapping n-grams. When this value is 'true' (the default), each word in each sentence starts an n-gram, if enough words follow it (in the same sentence) to form a whole n-gram of the specified size. For information on sentences, see the description of the Reset argument. |
ToLowerCase | Optional | A Boolean value that specifies whether the function converts all letters in the input text to lowercase. The default value is 'true'. |
Punctuation | Optional | A regular expression that specifies the punctuation characters for the function to remove before evaluating the input text. The default characters to remove are: `~#^&*()- |
Reset | Optional | A regular expression that specifies the character or string that ends a sentence. The default sentence-ending characters are the period, comma, question mark, and exclamation point (.,?!). At the end of a sentence, the function discards any partial n-grams and searches for the next n-gram at the beginning of the next sentence. An n-gram cannot span two sentences. |
TotalGramCount | Optional | A Boolean value that specifies whether the function returns the total number of n-grams in the document (that is, in the row). The default value is 'false'. If you specify 'true', then the name of the returned column is specified by the TotalCountColumn argument. The total number of n-grams is not necessarily the number of unique n-grams. |
TotalCountColumn | Optional | The name of the column to return if the value of the TotalGramCount argument is 'true'. The default value is 'totalcnt'. |
Accumulate | Optional | The names of the columns to return for each n-gram. These columns cannot have the same names as those specified by the arguments NGramColumn, NumGramsColum, and TotalCountColumn. By default, the function returns all input columns for each n-gram. |
NGramColumn | Optional | The name of the column that is to contain the generated n-grams. The default value is 'ngram'. |
NumGramsColum | Optional | The name of the column that is to contain the length of each n-gram (in words). The default value is 'n'. |
FrequencyColumn | Optional | The name of the column that is to contain the count of each unique n-gram (that is, the number of times that each unique n-gram appears in the document). The default value is 'frequency'. |
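
The interaction of the Grams, OverLapping, ToLowerCase, Punctuation, and Reset arguments can be easier to follow in code. The following Python sketch is only an illustration of the semantics described in the table, not the function's implementation. The names `parse_grams` and `ngrams` are hypothetical, the default Punctuation and Reset values are written here as regular-expression character classes, and two behaviors are assumptions that the table does not state: punctuation is removed before the text is split into sentences, and with OverLapping set to 'false' each n-gram starts n words after the previous one.

```python
import re
from collections import Counter


def parse_grams(spec):
    """Expand Grams entries such as 2 or '1-3' into a sorted list of n values."""
    sizes = set()
    for item in spec:
        lo, _, hi = str(item).partition("-")
        lo = int(lo)
        hi = int(hi) if hi else lo
        # The table requires positive values with integer1 <= integer2.
        if lo < 1 or lo > hi:
            raise ValueError(f"invalid Grams entry: {item!r}")
        sizes.update(range(lo, hi + 1))
    return sorted(sizes)


def ngrams(text, grams, delimiter=" ", overlapping=True, to_lower=True,
           punctuation=r"[`~#^&*()-]", reset=r"[.,?!]"):
    """Count the n-grams of one document (one input row)."""
    sizes = parse_grams(grams)
    if to_lower:
        text = text.lower()
    # Assumption: punctuation is stripped before sentences are split.
    text = re.sub(punctuation, "", text)
    counts = Counter()
    # Each Reset match ends a sentence; n-grams never span two sentences.
    for sentence in re.split(reset, text):
        words = [w for w in re.split(delimiter, sentence) if w]
        for n in sizes:
            # Assumption: non-overlapping n-grams advance n words at a time.
            step = 1 if overlapping else n
            for i in range(0, len(words) - n + 1, step):
                counts[" ".join(words[i:i + n])] += 1
    return counts


counts = ngrams("Hello world. The quick brown fox", ["2"])
print(sorted(counts))
```

In this sketch the period ends the first sentence, so no bigram joins "world" and "the". Summing `counts.values()` gives a total that includes repeated n-grams, while `len(counts)` gives the number of unique n-grams, which is the distinction the TotalGramCount description draws.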