NGramSplitter_MLE Function | Teradata Vantage - NGramSplitter_MLE (ML Engine)

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

9.02

9.01

2.0

1.3

Published

February 2022

Language

English (United States)

Last Update

2022-02-10

dita:id

B700-4003

Product Category

Teradata Vantage™

The NGramSplitter_MLE function tokenizes (splits) an input stream of text and outputs multigrams of n tokens (called n-grams) based on the specified delimiter and reset parameters. NGramSplitter_MLE provides more flexibility than standard tokenization when performing text analysis. Many two-word phrases carry important meaning (for example, "machine learning") that unigrams (single-word tokens) do not capture. Capturing such phrases, combined with additional analytical techniques, can be useful for performing sentiment analysis, topic identification, and document classification.

NGramSplitter_MLE considers each input row to be one document, and returns a row for each unique n-gram in each document. For each document, NGramSplitter_MLE also returns the count of each n-gram and the total number of n-grams in that document.
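
The per-document behavior described above can be illustrated with a minimal Python sketch. This is not the ML Engine implementation or its SQL syntax; it only mimics the idea. The function names `split_ngrams` and `ngram_rows`, the whitespace delimiter, the punctuation reset pattern, and the lowercasing step are all assumptions made for illustration. The sketch assumes that a reset character ends the current run of tokens, so no n-gram spans it.

```python
import re
from collections import Counter


def split_ngrams(text, n=2, delimiter=r"\s+", reset=r"[.,?!]"):
    """Return the n-grams of `text` (hypothetical helper, not a Teradata API).

    Assumptions: tokens are separated by `delimiter`, text is lowercased,
    and an n-gram never spans a character matched by `reset`.
    """
    ngrams = []
    # Each reset match ends a segment; partial n-grams at the boundary are dropped.
    for segment in re.split(reset, text.lower()):
        tokens = [t for t in re.split(delimiter, segment) if t]
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


def ngram_rows(documents, n=2):
    """Return one (doc_id, ngram, count, total) tuple per unique n-gram per document."""
    rows = []
    for doc_id, text in documents.items():
        counts = Counter(split_ngrams(text, n))
        total = sum(counts.values())  # total number of n-grams in this document
        for ngram, count in counts.items():
            rows.append((doc_id, ngram, count, total))
    return rows


# Each dictionary entry plays the role of one input row (one document).
docs = {
    1: "Machine learning is fun. Machine learning is useful.",
    2: "Text analysis needs tokens",
}
for row in ngram_rows(docs):
    print(row)
```

In this sketch, document 1 yields four unique bigrams out of six in total, with "machine learning" counted twice, and no bigram crosses the period between the two sentences.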