NGramSplitter Usage Notes | Teradata Vantage - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

NGramSplitter is an analytic function that breaks text data into smaller components called n-grams. An n-gram is a contiguous sequence of n words from a given text.

For example, the 2-grams (bigrams) of the sentence "The quick brown fox jumps over the lazy dog" are "The quick", "quick brown", "brown fox", "fox jumps", "jumps over", "over the", "the lazy", and "lazy dog".
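
The bigram example above can be reproduced with a few lines of Python. This is only an illustrative sketch, not the NGramSplitter implementation: the helper name `word_ngrams` is hypothetical, and it assumes plain whitespace tokenization with no punctuation handling or case folding.

```python
def word_ngrams(text, n):
    """Return the contiguous n-word sequences in text, in order of appearance."""
    words = text.split()  # assumption: tokens are separated by whitespace
    # Slide a window of n words across the text, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The quick brown fox jumps over the lazy dog"
bigrams = word_ngrams(sentence, 2)
print(bigrams)
```

A sentence of 9 words yields 9 - n + 1 n-grams, so the same call with n = 3 returns 7 trigrams, and a text shorter than n words returns an empty list.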

NGramSplitter is used in analytics for various purposes such as:
  • Text classification: Breaking text into n-grams produces features that capture local word context, which can be used for text classification tasks such as sentiment analysis, spam detection, and topic modeling.
  • Language modeling: N-grams are used to build language models that predict the likelihood of a given sequence of words. For example, a trigram language model can predict the likelihood of the next word given the two previous words.
  • Information retrieval: Information retrieval systems such as search engines use n-grams to match queries with relevant documents. Breaking documents into n-grams lets you index them efficiently and quickly retrieve those relevant to a given query.
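
The three uses listed above can be sketched on a toy corpus. Everything in this example is an assumption made for illustration: the documents are made up, the names (`word_ngrams`, `next_word_probs`, `search`) are hypothetical, tokenization is plain whitespace splitting, and the probabilities are simple unsmoothed relative frequencies. It does not show how NGramSplitter itself works.

```python
from collections import Counter, defaultdict


def word_ngrams(words, n):
    """Return the contiguous n-word tuples in a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Toy corpus, made up for illustration.
docs = {
    "d1": "the quick brown fox",
    "d2": "the quick red fox",
    "d3": "the lazy dog",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

# 1. Text classification: bigram counts per document, usable as features.
features = {doc_id: Counter(word_ngrams(w, 2)) for doc_id, w in tokens.items()}

# 2. Language modeling: count which word follows each pair of words.
next_counts = defaultdict(Counter)
for w in tokens.values():
    for a, b, c in word_ngrams(w, 3):
        next_counts[(a, b)][c] += 1


def next_word_probs(a, b):
    """Estimate P(next word | a, b) as a relative frequency (no smoothing)."""
    counts = next_counts[(a, b)]
    total = sum(counts.values())
    return {word: k / total for word, k in counts.items()}


# 3. Information retrieval: inverted index from bigram to document IDs.
index = defaultdict(set)
for doc_id, w in tokens.items():
    for gram in word_ngrams(w, 2):
        index[gram].add(doc_id)


def search(query):
    """Return the IDs of documents containing every bigram of the query."""
    grams = word_ngrams(query.split(), 2)
    hits = [index.get(g, set()) for g in grams]
    return set.intersection(*hits) if hits else set()


print(features["d1"])
print(next_word_probs("the", "quick"))
print(search("the quick"))
```

In this corpus "the quick" is followed once by "brown" and once by "red", so the trigram estimate gives each a probability of 0.5, and the bigram query "the quick" matches documents d1 and d2. A real system would add smoothing for unseen n-grams and a more careful tokenizer.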