Summary - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Product Category: Software

The nGram function tokenizes (splits) an input stream of text and outputs multigrams of n words (called n-grams), based on the specified delimiter and reset parameters. nGram provides more flexibility than standard tokenization when performing text analysis. Many two-word phrases carry important meaning (for example, "machine learning") that unigrams (single-word tokens) do not capture. Combined with additional analytical techniques, n-grams can be useful for sentiment analysis, topic identification, and document classification.
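
To illustrate the idea of delimiter and reset parameters, the following is a minimal Python sketch of n-gram generation. It is not the Aster implementation. The function name `ngrams`, its parameter names, and the default patterns are assumptions chosen for this example. The sketch treats the delimiter as the pattern that separates words and the reset as the pattern that ends a run of words, so that no n-gram spans a reset character.

```python
import re

def ngrams(text, n=2, delimiter=r"\s+", reset=r"[.,?!]"):
    """Return the overlapping n-grams of `text` as a list of strings.

    `delimiter` and `reset` are regular expressions. Both names and
    defaults are hypothetical, chosen only for this illustration.
    """
    grams = []
    # A reset character ends the current run of words: no n-gram spans it.
    for segment in re.split(reset, text):
        # Split the run into words and drop empty strings.
        tokens = [t for t in re.split(delimiter, segment.strip()) if t]
        # Slide a window of n words across the run.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

print(ngrams("Machine learning is fun. Deep learning too!"))
# ['Machine learning', 'learning is', 'is fun', 'Deep learning', 'learning too']
```

Because the period acts as a reset in this sketch, "fun Deep" is not produced as a bigram, whereas "Machine learning" survives as a single token that unigram tokenization would split.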

nGram treats each input row as one document and returns one row for each unique n-gram in that document. For each document, nGram also returns the count of each n-gram and the total number of n-grams.
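
The shape of this output can be sketched in Python as follows. This is an illustration only, not the Aster function: the name `ngram_rows`, the tuple layout, whitespace tokenization, and lowercasing are all assumptions, and the sketch assumes the per-document total counts every n-gram occurrence rather than only the unique ones.

```python
from collections import Counter

def ngram_rows(documents, n=2):
    """Return one (doc_id, ngram, count, total) tuple per unique n-gram
    in each document. The tuple layout is hypothetical."""
    rows = []
    for doc_id, text in documents.items():
        # Assumption: lowercase the text and split on whitespace.
        tokens = text.lower().split()
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        counts = Counter(grams)   # count of each unique n-gram
        total = len(grams)        # all n-gram occurrences in the document
        for gram, count in counts.items():
            rows.append((doc_id, gram, count, total))
    return rows

docs = {1: "to be or not to be", 2: "machine learning"}
for row in ngram_rows(docs):
    print(row)
# (1, 'to be', 2, 5)
# (1, 'be or', 1, 5)
# (1, 'or not', 1, 5)
# (1, 'not to', 1, 5)
# (2, 'machine learning', 1, 1)
```

Each input entry plays the role of one document, and the bigram "to be" appears once in the output with a count of 2 rather than as two separate rows.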