NGramSplitter_MLE (ML Engine) - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

The NGramSplitter_MLE function tokenizes (splits) an input stream of text and outputs multi-word tokens called n-grams, based on the specified delimiter and reset parameters. NGramSplitter_MLE provides more flexibility than standard tokenization when performing text analysis. Many two-word phrases (for example, "machine learning") carry important meaning that unigrams (single-word tokens) do not capture. N-grams, combined with additional analytical techniques, can be useful for sentiment analysis, topic identification, and document classification.

NGramSplitter_MLE considers each input row to be one document, and returns one row for each unique n-gram in each document. Each returned row also includes the count of that n-gram in the document and the total number of n-grams in the document.
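
The behavior described above can be illustrated with a short Python sketch. This is not the NGramSplitter_MLE implementation or its SQL syntax; it is a conceptual model only. The names `split_ngrams` and `ngram_rows`, the default delimiter and reset patterns, the lowercasing step, and the output row layout are all assumptions made for the illustration.

```python
import re
from collections import Counter


def split_ngrams(text, n, delimiter=r"\s+", reset=r"[.,?!]"):
    """Return every n-gram in text, in order of appearance.

    Assumption: words are separated by the delimiter pattern, and an
    n-gram never spans a reset character (for example, a period).
    """
    ngrams = []
    # Reset characters end a run of words, so split on them first.
    for segment in re.split(reset, text):
        words = [w for w in re.split(delimiter, segment.strip()) if w]
        # Slide a window of n words across the segment.
        for i in range(len(words) - n + 1):
            ngrams.append(" ".join(words[i:i + n]))
    return ngrams


def ngram_rows(documents, n):
    """Build one (doc_id, ngram, count, total) row per unique n-gram
    per document, treating each input entry as one document."""
    rows = []
    for doc_id, text in documents.items():
        # Lowercasing is an assumption of this sketch.
        grams = split_ngrams(text.lower(), n)
        counts = Counter(grams)
        total = len(grams)
        for gram, count in counts.items():
            rows.append((doc_id, gram, count, total))
    return rows


# Hypothetical sample input: one document with two sentences.
docs = {1: "Machine learning is fun. Machine learning is useful."}
rows = ngram_rows(docs, 2)
for row in rows:
    print(row)
```

In this sketch, the bigram "machine learning" appears once in each sentence, so its row carries a count of 2, while no bigram such as "fun machine" is produced because the period acts as a reset character.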