Background - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Product Category: Software

Parsing English-language text involves:

  • Handling punctuation in sentences
  • Breaking a sentence into words (tokenizing it)
  • Removing stop words
  • Stemming words (reducing them to their root forms)
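The steps above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Text_Parser implementation: the stop-word list, the function names, and the `toy_stem` suffix stripper are all assumptions made for the example, and `toy_stem` is far cruder than a real stemmer.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the list Text_Parser uses.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "on"}


def split_sentences(text):
    """Split text at sentence-ending punctuation (., !, ?)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Break a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def toy_stem(word):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def parse(text):
    """Return one list of stemmed, non-stop-word tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in STOP_WORDS]
        result.append([toy_stem(t) for t in tokens])
    return result


print(parse("The cats are sleeping. Dogs barked!"))
```

Each stage is a separate function so that any one of them (for example, the stemmer) can be swapped for a more faithful implementation.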

The Text_Parser function reads a document into a memory buffer and creates a hash table for the document's dictionary. The dictionary must fit in available memory. This is rarely a constraint in practice: a dictionary of one million words with an average word length of ten bytes requires only about 10 MB of memory for the word text (1,000,000 × 10 bytes).
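As a rough illustration of the dictionary and the memory estimate, the following Python sketch builds a word-count hash table and repeats the arithmetic. The function names are hypothetical, and the byte count covers only the word text, not the per-entry overhead that any real hash table adds.

```python
from collections import Counter


def build_dictionary(tokens):
    """Count occurrences of each distinct word; Counter is a hash table."""
    return Counter(tokens)


def word_text_bytes(dictionary):
    """Bytes needed to store each distinct word once (UTF-8), excluding overhead."""
    return sum(len(word.encode("utf-8")) for word in dictionary)


tokens = "the quick brown fox jumps over the lazy dog the end".split()
dictionary = build_dictionary(tokens)
print(len(dictionary), "distinct words,", word_text_bytes(dictionary), "bytes of word text")

# The estimate from the text: 1,000,000 words at 10 bytes per word.
estimated_bytes = 1_000_000 * 10
estimated_mb = estimated_bytes / 1_000_000  # decimal megabytes
print(estimated_mb, "MB")
```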

The Text_Parser function uses the Porter2 stemming algorithm.
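To give a feel for how Porter2 works, here is a sketch of one of its early suffix rules (step 1a in the published description of the algorithm), written in Python. It is deliberately partial: the full algorithm has several more steps, special-case word lists, and treats some occurrences of "y" as a consonant, whereas this simplified version treats every "y" as a vowel. The function name is an assumption for the example, and nothing here is taken from the Text_Parser source.

```python
VOWELS = set("aeiouy")  # Simplification: every "y" is treated as a vowel.


def porter2_step_1a(word):
    """Apply only the plural-suffix rules of Porter2 step 1a to a lowercase word."""
    if len(word) <= 2:
        return word  # Very short words are left unchanged.
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith(("ied", "ies")):
        # Replace by "i" if preceded by more than one letter, otherwise by "ie".
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word  # corpus, glass: do nothing
    if word.endswith("s"):
        # Delete the "s" only if a vowel occurs somewhere before the
        # letter that immediately precedes it.
        if any(ch in VOWELS for ch in word[:-2]):
            return word[:-1]
    return word


for w in ["caresses", "ties", "cries", "gas", "this", "gaps", "kiwis"]:
    print(w, "->", porter2_step_1a(w))
```

Note that the result of a single step is not necessarily a dictionary word ("cries" becomes "cri"); stems only need to be consistent across related word forms.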

For general information about tokenization, see:

http://en.wikipedia.org/wiki/Lexical_analysis#Tokenizer

For general information about stemming, see:

http://en.wikipedia.org/wiki/Stemming