NGramSplitter Function | Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

NGramSplitter considers each input row to be one document, and returns a row for each unique n-gram in each document. NGramSplitter also returns, for each document, the counts of each n-gram and the total number of n-grams.

NGramSplitter is an algorithm used in natural language processing to divide text into smaller units known as n-grams. An n-gram is a sequence of n items, such as words or characters, taken from a given sample of text or speech. NGramSplitter takes text as input and returns the n-grams it contains for a specified value of n.
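The idea can be illustrated with a minimal Python sketch. This is not the Teradata implementation or its syntax: the function name `split_ngrams`, the lowercasing, and the whitespace tokenization are assumptions made for illustration. It treats one string as one document and produces one row per unique word n-gram, with that n-gram's count and the document's total number of n-grams.

```python
from collections import Counter


def split_ngrams(text, n):
    """Return (ngram, count, total) rows for one document.

    Hypothetical helper: tokenizes on whitespace after lowercasing,
    which is an assumption, not documented NGramSplitter behavior.
    """
    words = text.lower().split()
    # Slide a window of n words across the document.
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    total = len(grams)  # total n-grams in the document, duplicates included
    return [(gram, count, total) for gram, count in counts.items()]


rows = split_ngrams("the cat sat on the cat mat", 2)
for row in rows:
    print(row)
```

In this sketch, a document with fewer than n words yields no rows, because no full window of n words fits.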

One potential limitation of NGramSplitter is that it can produce a large number of distinct n-grams, especially when n is large. The result is a high-dimensional, sparse feature space that can degrade the performance of NLP models. To address this issue, various techniques have been developed to reduce the number of n-grams used in NLP tasks, such as filtering out infrequent n-grams or removing stop words before splitting.
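As a rough sketch of one such reduction technique, the Python example below discards n-grams that occur fewer than a minimum number of times across a small corpus. The source does not say which techniques it has in mind, so minimum-count pruning is an assumption here, as are the helper names `count_ngrams` and `prune` and the sample documents.

```python
from collections import Counter


def count_ngrams(docs, n):
    """Count word n-grams across all documents (whitespace tokenization assumed)."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


def prune(counts, min_count):
    """Keep only n-grams seen at least min_count times (hypothetical threshold)."""
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Made-up sample documents, for illustration only.
docs = ["the quick brown fox", "the quick red fox", "the lazy brown dog"]

unigrams = count_ngrams(docs, 1)
bigrams = count_ngrams(docs, 2)
kept = prune(bigrams, min_count=2)

print(len(unigrams), len(bigrams), len(kept))
```

The threshold trades coverage for a smaller feature space: a higher `min_count` removes more rare n-grams, some of which may still carry useful signal.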