NGramSplitter Function (Teradata Vantage Analytics Database)

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

NGramSplitter treats each input row as one document and returns one row for each unique n-gram in each document. For each document, NGramSplitter also returns the count of each n-gram and the total number of n-grams.
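NGramSplitter runs inside the database and is invoked from SQL. The Python sketch below only mimics the output shape described above on a local dictionary of documents, to make the "one row per unique n-gram per document" layout concrete. The function name `ngram_rows`, the whitespace tokenization, and the output field names (`doc_id`, `ngram`, `n`, `frequency`, `total_count`) are hypothetical choices for illustration, not NGramSplitter's actual syntax or column names.

```python
from collections import Counter
from typing import Dict, List


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-token sequences of `tokens`, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_rows(docs: Dict[int, str], n: int) -> List[dict]:
    """Emit one row per unique n-gram per document (hypothetical layout).

    Each row carries the n-gram's count within its document and the
    total number of n-grams (repeats included) in that document.
    """
    rows = []
    for doc_id, text in docs.items():
        grams = word_ngrams(text.split(), n)  # assumption: split on whitespace
        counts = Counter(grams)               # occurrences of each unique n-gram
        total = len(grams)                    # all n-grams in this document
        for gram, freq in counts.items():
            rows.append({"doc_id": doc_id, "ngram": gram, "n": n,
                         "frequency": freq, "total_count": total})
    return rows


if __name__ == "__main__":
    docs = {1: "to be or not to be", 2: "the cat ran"}
    for row in ngram_rows(docs, 2):
        print(row)
```

For `"to be or not to be"` with n = 2, the sketch yields four rows for document 1: `to be` with frequency 2 and three other bigrams with frequency 1, each carrying a total count of 5.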

NGramSplitter performs n-gram splitting, a technique used in natural language processing (NLP) to divide text into smaller units called n-grams. An n-gram is a contiguous sequence of n items, such as words or characters, taken from a sample of text or speech. NGramSplitter takes text as input and returns the n-grams it contains for a specified value of n.
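As a minimal illustration of the definition above (not of NGramSplitter itself), the Python sketch below extracts word-level and character-level n-grams for a given n. The helper names `word_ngrams` and `char_ngrams`, and the choice to split words on whitespace, are assumptions made for this example.

```python
from typing import List


def word_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n characters (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(word_ngrams("natural language processing is fun", 2))
    # ['natural language', 'language processing', 'processing is', 'is fun']
    print(char_ngrams("text", 2))
    # ['te', 'ex', 'xt']
```

A sequence of k items yields k - n + 1 n-grams when k >= n, and none otherwise: the five words above give four bigrams, and the four characters of `"text"` give three.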

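One potential limitation of n-gram splitting is the volume of output it can produce. A document of k words yields k - n + 1 n-grams for each requested value of n (when k >= n), and the number of distinct n-grams across a corpus tends to grow as n increases, because longer sequences repeat less often. The result can be a high-dimensional, sparse feature space that degrades the performance of NLP models. Common ways to reduce the number of n-grams used in NLP tasks include discarding rare n-grams below a frequency threshold, removing stop words, and feature hashing.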
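The Python sketch below illustrates this sparsity effect on a toy corpus and applies one reduction technique, a minimum document-frequency threshold. The corpus, the helper names (`document_frequencies`, `prune`), and the threshold of two documents are illustrative assumptions; the pruning step is shown as generic post-processing, not as an NGramSplitter option.

```python
from collections import Counter
from typing import List


def word_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def document_frequencies(corpus: List[str], n: int) -> Counter:
    """Map each distinct n-gram to the number of documents that contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(word_ngrams(doc, n)))  # count each n-gram once per document
    return df


def prune(df: Counter, min_docs: int) -> List[str]:
    """Keep only n-grams found in at least `min_docs` documents."""
    return sorted(g for g, c in df.items() if c >= min_docs)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat ran after the dog",
    ]
    for n in (1, 2, 3):
        df = document_frequencies(corpus, n)
        kept = prune(df, min_docs=2)
        print(f"n={n}: {len(df)} distinct, {len(kept)} kept after pruning")
```

On this toy corpus the sketch reports 9, 11, and 11 distinct n-grams for n = 1, 2, and 3, but only 5, 4, and 1 of them appear in two or more documents. Higher-order n-grams are spread more thinly across documents, which is what makes the feature space sparse and why frequency-based pruning removes most of them.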