TD_Ngramsplitter Function | Teradata Vantage - TD_Ngramsplitter - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
Product Category
Teradata Vantage™

TD_Ngramsplitter considers each input row to be one document, and returns a row for each unique n-gram in each document. TD_Ngramsplitter also returns, for each document, the counts of each n-gram and the total number of n-grams.

TD_Ngramsplitter implements a technique used in natural language processing (NLP) to divide text into smaller units known as n-grams. An n-gram is a sequence of n consecutive items, such as words or characters, taken from a given sample of text or speech. TD_Ngramsplitter takes a string of text as input and returns the n-grams it contains for a specified value of n.
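The splitting and counting described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not the TD_Ngramsplitter implementation: the function name `ngram_rows`, the lowercase whitespace tokenization, and the shape of the returned tuples are all assumptions made for the example.

```python
from collections import Counter

def ngram_rows(doc_id, text, n):
    """Return one (doc_id, ngram, n, frequency, total) tuple per unique n-gram.

    Hypothetical helper: tokenization by lowercasing and splitting on
    whitespace is an assumption, not the function's documented behavior.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens across the document.
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)   # occurrences of each unique n-gram
    total = len(grams)        # total number of n-grams in the document
    return [(doc_id, gram, n, freq, total) for gram, freq in counts.items()]

rows = ngram_rows(1, "the cat sat on the cat mat", 2)
for row in rows:
    print(row)
```

Each input string plays the role of one document, and each returned tuple plays the role of one output row. A document with fewer than n tokens yields no rows in this sketch.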

One potential limitation of TD_Ngramsplitter is that it can produce a large number of distinct n-grams, especially when n is large. This can result in a high-dimensional feature space that can negatively impact the performance of NLP models. Common ways to reduce the number of n-grams used in NLP tasks include limiting n to small values, discarding n-grams that occur only rarely, and removing stop words before splitting.
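To make the growth in feature count concrete, the following Python sketch counts distinct word n-grams across a tiny corpus for several values of n, then applies a minimum-frequency cutoff. The cutoff is one commonly used reduction technique chosen here for illustration; the source does not name a specific method, and the helper names `count_ngrams` and `prune`, the sample documents, and the threshold of 2 are all assumptions.

```python
from collections import Counter

def count_ngrams(docs, n):
    """Count word n-grams across all documents (whitespace tokens assumed)."""
    counts = Counter()
    for text in docs:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def prune(counts, min_count):
    """Keep only n-grams seen at least min_count times (hypothetical cutoff)."""
    return {gram: c for gram, c in counts.items() if c >= min_count}

docs = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]

for n in (1, 2, 3):
    counts = count_ngrams(docs, n)
    kept = prune(counts, min_count=2)
    print(f"n={n}: {len(counts)} distinct n-grams, {len(kept)} kept after pruning")
```

On a realistic corpus the number of distinct n-grams typically rises steeply with n while most of them occur only once, which is why a frequency cutoff tends to remove a large share of the features.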