NGramSplitter_MLE (ML Engine)

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.10

1.1

Published

October 2019

Language

English (United States)

Last Update

2019-12-31

dita:id

B700-4003

Product Category

Teradata Vantage™

The NGramSplitter_MLE function tokenizes (splits) an input stream of text and outputs n-word multigrams (called n-grams) based on the specified delimiter and reset parameters. NGramSplitter_MLE provides more flexibility than standard tokenization when performing text analysis. Many two-word phrases carry important meaning (for example, "machine learning") that unigrams (single-word tokens) do not capture. Capturing these phrases, combined with additional analytical techniques, can be useful for performing sentiment analysis, topic identification, and document classification.
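To make the role of the delimiter and reset parameters concrete, the following Python sketch splits text into n-grams outside of Vantage. It is an illustration only, not the NGramSplitter_MLE implementation: the helper name `ngrams`, its Python parameter names, and the regular expressions used as defaults are all assumptions chosen for this example.

```python
import re

def ngrams(text, n, delimiter=r"\s+", reset=r"[.,?!]"):
    """Return the n-grams of `text`; no n-gram spans a reset character.

    `delimiter` and `reset` are hypothetical regex parameters for this sketch.
    """
    grams = []
    # Split on reset characters first so an n-gram never crosses that boundary.
    for segment in re.split(reset, text):
        # Split the segment into word tokens, dropping empty strings.
        tokens = [t for t in re.split(delimiter, segment.strip()) if t]
        # Slide a window of n tokens across the segment.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

text = "machine learning is fun. deep learning too"
unigrams = ngrams(text, 1)
bigrams = ngrams(text, 2)
print(bigrams)
```

In this sketch the bigram list contains "machine learning" as a single token, which the unigram list cannot represent, and the period prevents a bigram from joining "fun" and "deep".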

NGramSplitter_MLE treats each input row as one document and returns one row for each unique n-gram in each document. For each document, NGramSplitter_MLE also returns the count of each n-gram and the total number of n-grams.
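The shape of this output can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the function's actual behavior or schema: the helper name `ngram_rows`, the tuple layout (document id, n-gram, n, frequency, total), and the plain whitespace tokenization are all hypothetical.

```python
from collections import Counter

def ngram_rows(documents, n):
    """Return one tuple per unique n-gram per document.

    Hypothetical row layout: (doc_id, ngram, n, frequency, total_in_document).
    """
    rows = []
    for doc_id, text in documents.items():
        tokens = text.split()  # assumption: whitespace-delimited words
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        counts = Counter(grams)  # occurrences of each unique n-gram
        total = len(grams)       # total n-grams in this document
        for gram, frequency in counts.items():
            rows.append((doc_id, gram, n, frequency, total))
    return rows

docs = {1: "to be or not to be", 2: "machine learning"}
rows = ngram_rows(docs, 2)
for row in rows:
    print(row)
```

Here document 1 yields five bigrams in total but only four rows, because "to be" occurs twice and is reported once with a frequency of 2.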