1.1 - 8.10 - TextChunker (ML Engine) - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Published: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)
Last Update: 2019-12-31

The TextChunker function divides text into phrases and assigns each phrase a tag that identifies its type.


How the Machine Learning Engine function TextChunker works

Text chunking (also called shallow parsing) divides text into phrases, or chunks, so that syntactically related words belong to the same phrase. Phrases do not overlap; that is, a word is a member of only one chunk.
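The non-overlap property can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each chunk is a (tag, start, end) span over a token list; the tokens, the spans, and the helper name chunks_overlap are invented for illustration and are not TextChunker output or part of its interface.

```python
from typing import List, Tuple

# Hypothetical representation: a chunk is (tag, start, end),
# meaning tokens[start:end] belong to that chunk.
Chunk = Tuple[str, int, int]


def chunks_overlap(chunks: List[Chunk]) -> bool:
    """Return True if any two chunks share at least one token."""
    ordered = sorted(chunks, key=lambda c: c[1])
    # After sorting by start index, any overlap shows up between neighbors.
    return any(prev[2] > cur[1] for prev, cur in zip(ordered, ordered[1:]))


# Illustrative tokens and chunks (not produced by TextChunker).
tokens = ["The", "cat", "sat", "on", "the", "mat"]
chunks = [("NP", 0, 2), ("VP", 2, 3), ("PP", 3, 4), ("NP", 4, 6)]

for tag, start, end in chunks:
    print(tag, " ".join(tokens[start:end]))

print("overlap:", chunks_overlap(chunks))
```

A valid chunking in this representation never makes chunks_overlap return True, because each token index falls inside only one span.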

For example, the sentence "He reckons the current account deficit will narrow to only # 1.8 billion in September ." can be divided as follows, with brackets delimiting phrases:

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]

The tag after each opening bracket identifies the chunk type: NP (noun phrase), VP (verb phrase), PP (prepositional phrase), and so on. For information about chunk types, see TextChunker Output.
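To make the bracket notation concrete, the following Python sketch splits the bracketed example into (tag, phrase) pairs. It only parses the illustrative notation used on this page; the pattern CHUNK_PATTERN and the function parse_chunks are hypothetical names, and the sketch makes no claim about the actual format of the TextChunker output table.

```python
import re
from typing import List, Tuple

# Matches "[TAG word word ...]" and captures the tag and the phrase text.
CHUNK_PATTERN = re.compile(r"\[(\S+)\s+([^\]]+)\]")


def parse_chunks(bracketed: str) -> List[Tuple[str, str]]:
    """Return (tag, phrase) pairs in the order they appear in the text."""
    return [(tag, phrase.strip()) for tag, phrase in CHUNK_PATTERN.findall(bracketed)]


# The bracketed example from this page.
text = (
    "[NP He] [VP reckons] [NP the current account deficit] "
    "[VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]"
)

for tag, phrase in parse_chunks(text):
    print(f"{tag}\t{phrase}")
```

Because the brackets never nest and never share words, a single left-to-right pattern match is enough to recover every chunk.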

For more information about text chunking, see:
  • Erik F. Tjong Kim Sang and Sabine Buchholz, Introduction to the CoNLL-2000 Shared Task: Chunking. In: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal, 2000.
  • Fei Sha and Fernando Pereira, Shallow Parsing with Conditional Random Fields. In: Proceedings of HLT-NAACL 2003, Edmonton, Canada, 2003.

TextChunker uses files that are preinstalled on ML Engine. For details, see Preinstalled Files That Functions Use.