1.1 - 8.10 - TextMorph (ML Engine) - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)

Lemmatization is a basic text analysis technique that determines the lemmas (standard forms) of words, so that all forms of a word can be grouped together, which improves the accuracy of text analysis.

The TextMorph function implements a lemmatization algorithm based on the WordNet 3.0 dictionary, which is packaged with the function. If an input word is in the dictionary, the function outputs its morphs with their parts of speech; otherwise, the function outputs the input word itself and sets its part of speech to NULL.

When an input word has multiple morphs, the function outputs them in order of the precedence of their parts of speech: noun, verb, adj, and adv. That is, if an input word has a noun form, it is listed first. If the same word has a verb form, it is listed next, and so on.
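The lookup and ordering rules described above can be illustrated with a minimal Python sketch. It is not the TextMorph implementation: `TOY_DICTIONARY`, `POS_ORDER`, and `lookup_morphs` are hypothetical names, and the dictionary entries are small illustrative stand-ins rather than actual WordNet 3.0 contents. Python's `None` stands in for NULL.

```python
# Precedence of parts of speech as described in the text: noun, verb, adj, adv.
POS_ORDER = {"noun": 0, "verb": 1, "adj": 2, "adv": 3}

# Hypothetical toy dictionary (NOT the real WordNet 3.0 data).
# Each input word maps to (morph, part_of_speech) pairs, stored here
# in arbitrary order so that the sorting step is visible.
TOY_DICTIONARY = {
    "books": [("book", "verb"), ("book", "noun")],
    "ran": [("run", "verb")],
    "better": [("well", "adv"), ("good", "adj")],
}


def lookup_morphs(word):
    """Return (morph, part_of_speech) pairs for a word.

    Words found in the dictionary yield their morphs ordered by
    part-of-speech precedence. Words not found yield the input word
    itself with a part of speech of None (standing in for NULL).
    """
    entries = TOY_DICTIONARY.get(word)
    if entries is None:
        return [(word, None)]
    return sorted(entries, key=lambda entry: POS_ORDER[entry[1]])


if __name__ == "__main__":
    for w in ["books", "ran", "better", "xyzzy"]:
        print(w, lookup_morphs(w))
```

In this sketch, a word missing from the toy dictionary comes back unchanged with `None` as its part of speech, mirroring the fallback behavior described for words that are not in the dictionary.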

Examples of Words and Their Standard Forms

Input Word    Standard Forms
books         book
ran           run
better        good, well