TextMorph (ML Engine) - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

Lemmatization is a basic text analysis tool that determines the lemmas (standard forms) of words, so that all forms of a word can be grouped together, improving the accuracy of text analysis.

The TextMorph function implements a lemmatization algorithm based on the WordNet 3.0 dictionary, which is packaged with the function. If an input word is in the dictionary, the function outputs its morphs with their parts of speech; otherwise, the function outputs the input word itself and sets its part of speech to NULL.

When an input word has multiple morphs, the function outputs them in order of the precedence of their parts of speech: noun, verb, adj, and adv. That is, if an input word has a noun form, it is listed first. If the same word has a verb form, it is listed next, and so on.

Examples of Words and Their Standard Forms
Input Word Standard Forms
books book
ran run
better good, well