- ON clause
- Accept the StopWordsTable clause.
- ConvertToLowerCase
- Specify whether to convert the text in the input table column to lowercase.
When StemTokens is set to 'true', TD_TextParser behaves as if ConvertToLowerCase had the value 'true', regardless of its actual value.
- StemTokens
- Specify whether to convert the tokens in the input table column to their root forms (stemming).
- Delimiter
- Specify single-character delimiter values to apply to the text in the specified column in the TextColumn element.
- DelimiterRegex
- Specify a PCRE regular expression that represents the token delimiter.
- RemoveStopWords
- Specify 'true' to remove stop words before parsing the text in the specified column in the TextColumn element.
- Punctuation
- Specify the punctuation characters to replace with a space in the text of the specified column in the TextColumn element.
- TokenColName
- Specify a name for the output column that contains the individual words from the text of the specified column in the TextColumn element.
- Accumulate
- Specify the input table column names to copy to the output table.
- DocIdColumn
- Specify the column name containing the unique identifier of input rows.
If ListPositions is 'true' and/or TokenFrequency is 'true', then DocIdColumn is required only if OutputByWord is 'true'.
- ListPositions
- Specify whether to output a comma-separated list of the positions at which each token occurs. The list is arranged in ascending order. The function ignores this argument if OutputByWord has the value 'false'.
Default value: false. With this value, the function outputs a row for each occurrence of the word.
- TokenFrequency
- Specify whether to output the total number of occurrences of each token.
TD_TextParser ignores this argument if OutputByWord has the value 'false'.
Default value: false.
- OutputByWord
- Specify whether to output all tokens in a single cell ('false') or each token in a separate row ('true').
Default value: true.
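
To make the interaction between these arguments easier to picture, the following Python sketch imitates their documented effects on a single document. It is not TD_TextParser's implementation. The function names (`tokenize`, `output_rows`), the output keys (`doc_id`, `token`, `positions`, `frequency`), the default delimiter and punctuation sets, the 1-based positions counted after stop-word removal, the order of the steps, and the space used to join tokens in a single cell are all assumptions made for illustration. Stemming (StemTokens) and DelimiterRegex are left out.

```python
import re


def tokenize(text, delimiters=" ", punctuation=".,!?", to_lower=True, stop_words=()):
    """Illustrative stand-in for the tokenizing arguments (hypothetical helper)."""
    if to_lower:  # ConvertToLowerCase
        text = text.lower()
    # Punctuation: replace each listed character with a space.
    text = text.translate({ord(ch): " " for ch in punctuation})
    # Delimiter: split on any of the single-character delimiters.
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [t for t in re.split(pattern, text) if t]
    # RemoveStopWords: drop tokens found in the stop-word collection.
    stop = {w.lower() for w in stop_words} if to_lower else set(stop_words)
    return [t for t in tokens if t not in stop]


def output_rows(doc_id, tokens, output_by_word=True,
                list_positions=False, token_frequency=False):
    """Illustrative stand-in for OutputByWord, ListPositions, TokenFrequency."""
    if not output_by_word:
        # All tokens in one cell; ListPositions and TokenFrequency are ignored.
        return [{"doc_id": doc_id, "token": " ".join(tokens)}]
    if not list_positions and not token_frequency:
        # One row for each occurrence of a token.
        return [{"doc_id": doc_id, "token": t} for t in tokens]
    # Group occurrences by token; positions are 1-based (assumption).
    positions = {}
    for pos, tok in enumerate(tokens, start=1):
        positions.setdefault(tok, []).append(pos)
    rows = []
    for tok, pos_list in positions.items():
        row = {"doc_id": doc_id, "token": tok}
        if list_positions:
            row["positions"] = ",".join(map(str, pos_list))  # ascending order
        if token_frequency:
            row["frequency"] = len(pos_list)
        rows.append(row)
    return rows


tokens = tokenize("The cat saw the other cat.", stop_words=("the",))
for row in output_rows(1, tokens, list_positions=True, token_frequency=True):
    print(row)
```

In this sketch, setting `output_by_word=False` returns a single row regardless of the other two flags, which mirrors the statement that ListPositions and TokenFrequency are ignored when OutputByWord is 'false'.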