NGramSplitter_MLE Function | Teradata Vantage - NGramSplitter_MLE (ML Engine) - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

The NGramSplitter_MLE function tokenizes (splits) an input stream of text and outputs n multigrams (called n-grams) based on the specified delimiter and reset parameters. NGramSplitter_MLE provides more flexibility than standard tokenization when performing text analysis. Many two-word phrases carry important meaning (for example, "machine learning") that unigrams (single-word tokens) do not capture. This, combined with additional analytical techniques, can be useful for performing sentiment analysis, topic identification, and document classification.

NGramSplitter_MLE considers each input row to be one document, and returns a row for each unique n-gram in each document. NGramSplitter_MLE also returns, for each document, the counts of each n-gram and the total number of n-grams.