Background - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

dita:id

B700-1022

Product Category

Software

Parsing English-language text involves the following steps:

Punctuating sentences
Breaking a sentence into words (tokenizing it)
Removing stop words
Stemming words (reducing them to their root forms)
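
The steps listed above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Text_Parser implementation: the function names, the tiny stop-word list, and the toy suffix stripper are all hypothetical, and "punctuating sentences" is assumed here to mean splitting the text at sentence-ending punctuation.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}

def split_sentences(text):
    """Split text at sentence-ending punctuation (., !, ?)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    """Break a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Toy suffix stripper for illustration; this is NOT Porter2."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def parse(text):
    """Run all four steps and return one list of stems per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = remove_stop_words(tokenize(sentence))
        result.append([naive_stem(t) for t in tokens])
    return result
```

For example, `parse("The cats are jumping. Dogs barked!")` returns `[["cat", "jump"], ["dog", "bark"]]`. A real stemmer handles many cases that this suffix stripper gets wrong.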

The Text_Parser function reads a document into a memory buffer and creates a hash table for the document's dictionary of words. The dictionary must fit in available memory, but this is rarely a constraint: a million-word dictionary with an average word length of ten bytes requires only about 10 MB of memory (1,000,000 words × 10 bytes).
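
The memory estimate can be checked with a short sketch. The code below assumes that the hash table maps each distinct word to a count, which the guide does not state. The names `build_dictionary` and `raw_dictionary_bytes` are hypothetical, and the estimate counts only the bytes of the word text, not any per-entry overhead of a real hash table.

```python
import re
from collections import Counter

def build_dictionary(document):
    """Build an in-memory hash table (word -> count) for one document."""
    words = re.findall(r"[a-z']+", document.lower())
    return Counter(words)

def raw_dictionary_bytes(distinct_words, avg_word_bytes):
    """Estimate the bytes needed for the word text alone (no overhead)."""
    return distinct_words * avg_word_bytes

# One million words at ten bytes each is 10,000,000 bytes, about 10 MB.
estimate = raw_dictionary_bytes(1_000_000, 10)
```

Because the dictionary holds each distinct word once, its size grows with the vocabulary of the document rather than with the document's length.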

The Text_Parser function uses Porter2 as the stemming algorithm.
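
To give a sense of how a Porter2-style stemmer works, the sketch below implements only one rule group (Step 1a, plural suffixes) as the published Snowball English stemmer description defines it. It is not the complete algorithm and not the Text_Parser code. The function name is hypothetical, and the sketch ignores Porter2's special handling of `y` as a consonant and its exception word lists.

```python
VOWELS = set("aeiouy")

def porter2_step_1a(word):
    """Apply a simplified Porter2 Step 1a (plural suffixes) to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]                      # caresses -> caress
    if word.endswith(("ied", "ies")):
        # Replace by "i" if preceded by more than one letter, else by "ie".
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word                           # leave unchanged
    if word.endswith("s"):
        # Delete "s" only if an earlier part of the word contains a vowel
        # that is not the letter immediately before the "s".
        if any(ch in VOWELS for ch in word[:-2]):
            return word[:-1]
    return word
```

Under these rules, `gaps` becomes `gap` and `cries` becomes `cri`, while `gas` and `this` keep their final `s`. The full algorithm applies several further steps to reach the final stem.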

For general information about tokenization, see:

http://en.wikipedia.org/wiki/Lexical_analysis#Tokenizer

For general information about stemming, see:

http://en.wikipedia.org/wiki/Stemming