Optional Syntax Elements for TD_TextParser - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
ft:lastEdition
2025-01-20
Product Category
Teradata Vantage™
ON clause
Accepts the StopWordsTable clause.
ConvertToLowerCase
Specify whether to convert the text in the input column to lowercase.

When StemTokens is set to 'true', TD_TextParser behaves as if ConvertToLowerCase had the value 'true', regardless of its actual value.

Default value: true
StemTokens
Specify whether to convert the tokens in the input text to their root forms (stemming).
Default value: false
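The interaction between ConvertToLowerCase and StemTokens can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not Teradata's implementation; the function name `effective_lowercase` and its parameters are hypothetical.

```python
def effective_lowercase(convert_to_lower_case=True, stem_tokens=False):
    """Return True if the text ends up lowercased under the documented rule.

    Defaults mirror the documented defaults: ConvertToLowerCase 'true',
    StemTokens 'false'. The function itself is a hypothetical illustration.
    """
    # StemTokens 'true' overrides whatever ConvertToLowerCase was set to.
    return convert_to_lower_case or stem_tokens


# ConvertToLowerCase 'false' is overridden when StemTokens is 'true'.
print(effective_lowercase(convert_to_lower_case=False, stem_tokens=True))
```

In other words, ConvertToLowerCase ('false') only has an effect when StemTokens is also 'false'.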
Delimiter
Specify the single-character delimiters used to split the text in the column specified in the TextColumn element.
Default values: ' \t\n\f\r'
DelimiterRegex
Specify a PCRE regular expression that represents the token delimiter.
No default value. When you use this element, you must provide a valid PCRE regular expression.
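To make the difference between Delimiter and DelimiterRegex concrete, the following Python sketch splits text both ways. It is an approximation under stated assumptions: Python's `re` module is not a PCRE engine (the two agree on the simple patterns used here), dropping empty tokens is an assumption, and the helper names `split_on_chars` and `split_on_regex` are hypothetical.

```python
import re

# Default single-character delimiters listed for the Delimiter element.
DEFAULT_DELIMITERS = " \t\n\f\r"


def split_on_chars(text, delimiters=DEFAULT_DELIMITERS):
    """Split text on any one of the given single-character delimiters."""
    # Escape each character so it is matched literally inside the class.
    pattern = "[" + "".join(re.escape(ch) for ch in delimiters) + "]+"
    # Dropping empty strings is an assumption made for this illustration.
    return [tok for tok in re.split(pattern, text) if tok]


def split_on_regex(text, delimiter_regex):
    """Split text wherever the delimiter regular expression matches."""
    return [tok for tok in re.split(delimiter_regex, text) if tok]


print(split_on_chars("big data\tis\nfun"))
print(split_on_regex("a1b22c333d", r"\d+"))
```

A character list suits fixed separators such as whitespace, while a regular expression is useful when the delimiter is a pattern, for example a run of digits.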
RemoveStopWords
Specify 'true' to remove stop words before parsing the text in the column specified in the TextColumn element.
Default value: false
Punctuation
Specify the punctuation characters to replace with a space in the text of the column specified in the TextColumn element.
Default values: '!#$%&()*+,-./:;?@\^_`{|}~'
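As a rough illustration of the Punctuation element, the Python sketch below replaces each listed character with a space. It mirrors the documented default character set but is not Teradata's implementation; the helper name `replace_punctuation` is hypothetical.

```python
# Default punctuation set listed for the Punctuation element.
# The doubled backslash is Python escaping for a single backslash character.
DEFAULT_PUNCTUATION = "!#$%&()*+,-./:;?@\\^_`{|}~"


def replace_punctuation(text, punctuation=DEFAULT_PUNCTUATION):
    """Replace every punctuation character in text with a single space."""
    table = str.maketrans({ch: " " for ch in punctuation})
    return text.translate(table)


cleaned = replace_punctuation("Hello, world! (v2.0)")
# Splitting on whitespace afterwards yields the individual words.
print(cleaned.split())
```

Because punctuation becomes a space rather than being deleted, a string such as "v2.0" separates into two tokens when whitespace is a delimiter.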
TokenColName
Specify a name for the output column that contains the individual words from the text of the specified column in the TextColumn element.
Default value: token
Accumulate
Specify the input table column names to copy to the output table.
DocIdColumn
Specify the column name containing the unique identifier of input rows.

DocIdColumn is required only when OutputByWord is 'true' and ListPositions, TokenFrequency, or both are 'true'.

ListPositions
Specify whether to output a list of comma-separated positions for each occurrence of a token. The list is arranged in ascending order. The function ignores this argument if OutputByWord has the value 'false'.

Default value: false. With this setting, the function outputs a row for each occurrence of the word.

TokenFrequency
Specify whether to output the total number of occurrences of each token.

TD_TextParser ignores this argument if OutputByWord has the value 'false'.

Default value: false.

OutputByWord
Specify whether to output all tokens in a single cell ('false') or each token in a separate row ('true').

Default value: true.
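How OutputByWord, ListPositions, and TokenFrequency combine can be sketched in Python for the tokens of a single document. This is an interpretation of the descriptions above, not the function's actual output format: the column names `positions` and `frequency`, the 1-based position numbering, and joining tokens with a space for the single-cell case are all assumptions, and `shape_output` is a hypothetical helper.

```python
from collections import Counter


def shape_output(tokens, output_by_word=True, list_positions=False,
                 token_frequency=False):
    """Return output rows for one document's tokens under the three options."""
    if not output_by_word:
        # All tokens in one cell; ListPositions and TokenFrequency are ignored.
        # Joining with a space is an assumption made for this illustration.
        return [{"token": " ".join(tokens)}]

    counts = Counter(tokens)
    if list_positions:
        # One row per distinct token with its positions in ascending order.
        grouped = {}
        for pos, tok in enumerate(tokens, start=1):  # 1-based is an assumption
            grouped.setdefault(tok, []).append(str(pos))
        rows = [{"token": tok, "positions": ",".join(pos_list)}
                for tok, pos_list in grouped.items()]
    else:
        # Default: one row per occurrence of each token.
        rows = [{"token": tok} for tok in tokens]

    if token_frequency:
        # Attach the total number of occurrences of the row's token.
        for row in rows:
            row["frequency"] = counts[row["token"]]
    return rows


tokens = ["to", "be", "or", "not", "to", "be"]
for row in shape_output(tokens, list_positions=True, token_frequency=True):
    print(row)
```

The early return for the single-cell case reflects the documented rule that ListPositions and TokenFrequency are ignored when OutputByWord is 'false'.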