TD_Ngramsplitter Usage Notes

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

TD_Ngramsplitter is a function used in analytics to break text data down into smaller components called n-grams. An n-gram is a sequence of n consecutive words from a given text.

For example, the 2-grams (or bigrams) of the sentence "The quick brown fox jumps over the lazy dog" are "The quick", "quick brown", "brown fox", "fox jumps", "jumps over", "over the", "the lazy", and "lazy dog".
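
The sliding-window idea behind the bigram example can be sketched in a few lines of Python. This is only an illustration of what an n-gram is, not the function's implementation: the helper name ngrams is hypothetical, and the sketch assumes plain whitespace splitting with no case conversion or punctuation handling.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return the word n-grams of `text` (hypothetical helper, whitespace split only)."""
    words = text.split()
    # Slide a window of n words across the token list, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The quick brown fox jumps over the lazy dog"
bigrams = ngrams(sentence, 2)
print(bigrams)       # the eight bigrams listed in the example above
print(len(bigrams))  # 8
```

A text of k words yields k - n + 1 n-grams, so the nine-word sentence yields eight bigrams and seven trigrams. A text shorter than n words yields none.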

Use TD_Ngramsplitter in analytics for various purposes such as:
  • Text classification: By breaking down text into n-grams, you can create features that represent the context of the text, which can be used for text classification tasks such as sentiment analysis, spam detection, and topic modeling.
  • Language modeling: N-grams are used to build language models that estimate the likelihood of a given sequence of words. For example, a trigram language model estimates the likelihood of the next word given the two previous words.
  • Information retrieval: N-grams are also used in information retrieval systems such as search engines to match queries with relevant documents. By breaking down documents into n-grams, you can index them efficiently and quickly retrieve the ones relevant to a given query.
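
The three uses above can be illustrated with a small Python sketch that works on n-grams that have already been extracted. Everything in it is an assumption made for illustration: the three toy documents, the helper names (ngrams, next_word_probability, search), and the use of raw counts with no smoothing or weighting. It does not show how TD_Ngramsplitter works or what its output looks like.

```python
from collections import Counter, defaultdict


def ngrams(words, n):
    """Return the n-grams of a token list as tuples (hypothetical helper)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Toy corpus, invented for this sketch.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the cat ate the fish",
    "d3": "a dog sat on the mat",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

# 1. Text classification: bigram counts per document serve as a feature vector.
features = {doc_id: Counter(ngrams(words, 2)) for doc_id, words in tokens.items()}

# 2. Language modeling: count which word follows each pair of words.
following = defaultdict(Counter)
for words in tokens.values():
    for w1, w2, w3 in ngrams(words, 3):
        following[(w1, w2)][w3] += 1


def next_word_probability(w1, w2, w3):
    """Estimate P(w3 | w1 w2) from raw trigram counts (no smoothing)."""
    counts = following.get((w1, w2))
    if not counts:
        return 0.0
    return counts[w3] / sum(counts.values())


# 3. Information retrieval: inverted index from bigram to document ids.
index = defaultdict(set)
for doc_id, words in tokens.items():
    for gram in ngrams(words, 2):
        index[gram].add(doc_id)


def search(query):
    """Return ids of documents sharing at least one bigram with the query."""
    hits = set()
    for gram in ngrams(query.split(), 2):
        hits |= index.get(gram, set())
    return sorted(hits)


print(features["d1"][("the", "cat")])             # 1
print(next_word_probability("the", "cat", "sat"))  # 0.5
print(search("sat on"))                            # ['d1', 'd3']
```

In this corpus "the cat" is followed once by "sat" and once by "ate", which gives the estimate of 0.5. A production system would typically add smoothing to the language model and a ranking score to the search step.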