NGramSplitter Usage Notes | Teradata Vantage

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06

Product Category: Teradata Vantage™

NGramSplitter is an Analytics Database function that breaks text data into smaller components called n-grams. An n-gram is a contiguous sequence of n words from a given text.

For example, the 2-grams (bigrams) of the sentence "The quick brown fox jumps over the lazy dog" are "The quick", "quick brown", "brown fox", "fox jumps", "jumps over", "over the", "the lazy", and "lazy dog". In general, a text of k words yields k - n + 1 n-grams, so these nine words yield eight bigrams.
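The bigram list above can be reproduced with a few lines of Python. This is a minimal sketch of word n-gram generation, not the NGramSplitter implementation: it assumes words are separated by whitespace and keeps their original case, and the helper name `word_ngrams` is hypothetical.

```python
def word_ngrams(text: str, n: int) -> list[str]:
    """Return the contiguous word n-grams of `text`, in order.

    Hypothetical helper for illustration. Assumes whitespace-separated
    words; punctuation handling and case folding are out of scope.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    # Slide a window of width n across the words: k words yield
    # k - n + 1 n-grams, or none when k < n (the range is then empty).
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The quick brown fox jumps over the lazy dog"
print(word_ngrams(sentence, 2))
# ['The quick', 'quick brown', 'brown fox', 'fox jumps', 'jumps over', 'over the', 'the lazy', 'lazy dog']
```

Passing `n=3` to the same helper yields the seven trigrams of the sentence, starting with "The quick brown".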

The n-grams that NGramSplitter produces serve several analytics purposes, such as:

Text classification: N-grams can serve as features that capture local word order and context that single-word counts miss (for example, "not good" versus "good"). These features support classification tasks such as sentiment analysis and spam detection, as well as related tasks such as topic modeling.
Language modeling: N-grams are used to build language models that estimate the probability of a given sequence of words. For example, a trigram language model estimates the probability of the next word given the two preceding words.
Information retrieval: Search engines and other information retrieval systems use n-grams to match queries with relevant documents. Indexing documents by their n-grams lets a system quickly look up the documents that contain a query's n-grams.
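For the text-classification use case, n-grams are commonly turned into a bag-of-n-grams feature vector, that is, a count of each n-gram in the text. The sketch below is plain Python, not NGramSplitter output; it assumes lowercase, whitespace-separated words, and the helper name `ngram_features` is hypothetical.

```python
from collections import Counter


def ngram_features(text: str, n_values: tuple[int, ...] = (1, 2)) -> Counter:
    """Count the word n-grams of each size in `n_values` (bag of n-grams).

    Hypothetical helper for illustration. Lowercases and splits on
    whitespace; a real pipeline might also strip punctuation.
    """
    words = text.lower().split()
    features: Counter = Counter()
    for n in n_values:
        # Each window of n consecutive words becomes one feature key.
        for i in range(len(words) - n + 1):
            features[" ".join(words[i:i + n])] += 1
    return features


features = ngram_features("The movie was not good")
print(features["not good"], features["good"])  # 1 1
```

Because the bigram "not good" is its own feature, a classifier can weight it separately from the unigram "good", which also appears in the text.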
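The trigram example in the language-modeling item can be made concrete with a maximum-likelihood estimate, P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2 followed by any word). The sketch below is an illustrative assumption, not part of NGramSplitter: it uses a toy corpus, applies no smoothing (unseen trigrams get probability 0), and the function name `train_trigram_model` is hypothetical.

```python
from collections import Counter
from typing import Callable


def train_trigram_model(tokens: list[str]) -> Callable[[str, str, str], float]:
    """Return a function estimating P(w3 | w1, w2) by maximum likelihood.

    Hypothetical helper for illustration; no smoothing is applied.
    """
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    trigram_counts = Counter(trigrams)
    # Count each two-word context only where a third word follows it,
    # so the denominators match the trigram counts exactly.
    context_counts = Counter((w1, w2) for w1, w2, _ in trigrams)

    def prob(w1: str, w2: str, w3: str) -> float:
        context = context_counts[(w1, w2)]
        if context == 0:
            return 0.0  # context never seen in the training tokens
        return trigram_counts[(w1, w2, w3)] / context

    return prob


tokens = "the cat sat on the mat the cat ate the fish".split()
prob = train_trigram_model(tokens)
print(prob("the", "cat", "sat"))  # 0.5
```

The context "the cat" occurs twice in the toy corpus, once followed by "sat" and once by "ate", so each continuation gets probability 0.5. Practical language models usually add smoothing or backoff so that unseen trigrams do not receive zero probability.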
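For the information-retrieval item, a common structure is an inverted index that maps each n-gram to the documents containing it. The sketch below assumes bigrams, lowercase whitespace tokenization, and an in-memory dictionary; `build_index` and `search` are hypothetical names, and a real search engine would also rank its results.

```python
from collections import defaultdict


def word_ngrams(text: str, n: int) -> list[str]:
    """Hypothetical helper: lowercase, whitespace-split word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(docs: dict[str, str], n: int = 2) -> dict[str, set[str]]:
    """Inverted index: map each word n-gram to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str, n: int = 2) -> set[str]:
    """Return IDs of documents that contain every n-gram of the query."""
    grams = word_ngrams(query, n)
    if not grams:
        return set()
    # .get avoids inserting empty entries into the defaultdict.
    postings = [index.get(g, set()) for g in grams]
    return set.intersection(*postings)


docs = {
    "d1": "the quick brown fox",
    "d2": "a quick brown dog",
    "d3": "the lazy brown fox",
}
index = build_index(docs)
print(sorted(search(index, "quick brown fox")))  # ['d1']
```

Requiring every query bigram to appear approximates phrase matching; it does not check that the matching bigrams are adjacent within the document.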