NGramSplitter Function | Teradata Vantage - NGramSplitter

NGramSplitter Function | Teradata Vantage - NGramSplitter - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

NGramSplitter considers each input row to be one document, and returns a row for each unique n-gram in each document. NGramSplitter also returns, for each document, the counts of each n-gram and the total number of n-grams.

NGramSplitter is an algorithm used in natural language processing to divide text into smaller units known as n-grams. An n-gram is a sequence of n items, such as words, letters or characters, taken from a given sample of text or speech. The NGramSplitter algorithm takes a string of text as input and returns a list of n-grams based on a specified value of n.

One potential limitation of the NGramSplitter algorithm is that it can produce a large number of n-grams, especially when n is large. This can result in a high-dimensional feature space that can negatively impact the performance of NLP models. To address this issue, various techniques have been developed to reduce the number of n-grams used in NLP tasks.

Function Information