TextParser Function | Teradata Vantage - TextParser (ML Engine)

TextParser Function | Teradata Vantage - TextParser (ML Engine) - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

9.02

9.01

2.0

1.3

Published

February 2022

Language

English (United States)

Last Update

2022-02-10

dita:mapPath

rnn1580259159235.ditamap

dita:ditavalPath

ybt1582220416951.ditaval

dita:id

B700-4003

lifecycle

Product Category

Teradata Vantage™

The TextParser function tokenizes an input stream of words, optionally stems them (reduces them to their root forms), and then outputs them. The function can either output all words in one row or output each word in its own row with (optionally) the number of times that the word appears.

The TextParser function uses Porter2 as the stemming algorithm.

The TextParser function reads a document into a memory buffer and creates a hash table. The dictionary for the document must not exceed available memory; however, a million-word dictionary with an average word length of ten bytes requires only 10 MB of memory.

TextParser uses files that are preinstalled on ML Engine. For details, see Preinstalled Files That Functions Use.