TextChunker - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

The TextChunker function divides text into phrases and assigns each phrase a tag that identifies its type.

Text chunking (also called shallow parsing) divides text into phrases in such a way that syntactically related words become members of the same phrase. Phrases do not overlap; that is, a word is a member of only one chunk.
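The non-overlap property can be sketched in a few lines of Python. This is only an illustration: the `(tag, start, end)` tuple representation, the `chunks_overlap` helper, and the sample tokens are assumptions made for this sketch and are not part of TextChunker or its output format.

```python
from typing import List, Tuple

# Hypothetical representation: (tag, start, end), where start and end are
# token indexes and end is exclusive. Not TextChunker's output format.
Chunk = Tuple[str, int, int]


def chunks_overlap(chunks: List[Chunk]) -> bool:
    """Return True if any token index belongs to more than one chunk."""
    ordered = sorted(chunks, key=lambda c: c[1])
    # After sorting by start index, any overlap shows up between neighbors.
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return True
    return False


tokens = ["The", "cat", "sat", "on", "the", "mat"]

# A valid chunking: every token falls in exactly one chunk.
valid = [("NP", 0, 2), ("VP", 2, 3), ("PP", 3, 4), ("NP", 4, 6)]

# An invalid chunking: token 2 ("sat") would belong to two chunks.
invalid = [("NP", 0, 3), ("VP", 2, 3)]

print(chunks_overlap(valid))
print(chunks_overlap(invalid))
```

A chunking in the sense described above is one for which a check like this finds no overlap.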

For example, the sentence "He reckons the current account deficit will narrow to only # 1.8 billion in September ." can be divided as follows, with brackets delimiting phrases:

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]

The tag immediately after each opening bracket identifies the chunk type: NP for a noun phrase, VP for a verb phrase, PP for a prepositional phrase, and so on. For information about chunk types, see TextChunker Output.
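To make the bracketed notation concrete, the following Python sketch splits a bracketed string into (tag, words) pairs. It assumes the chunks are written exactly as in the example above and are never nested; `CHUNK_PATTERN` and `parse_chunks` are hypothetical names introduced here, and the sketch does not describe how TextChunker itself produces or stores its output.

```python
import re
from typing import List, Tuple

# Matches "[TAG word word ...]". Assumes chunks are never nested and that
# the tag is the first whitespace-delimited token after the bracket.
CHUNK_PATTERN = re.compile(r"\[([^\s\]]+)\s+([^\]]*)\]")


def parse_chunks(bracketed: str) -> List[Tuple[str, List[str]]]:
    """Split a bracketed string into (tag, list of words) pairs."""
    return [
        (tag, words.split())
        for tag, words in CHUNK_PATTERN.findall(bracketed)
    ]


text = (
    "[NP He] [VP reckons] [NP the current account deficit] "
    "[VP will narrow] [PP to] [NP only # 1.8 billion] "
    "[PP in] [NP September]"
)

chunks = parse_chunks(text)
for tag, words in chunks:
    print(tag, " ".join(words))
```

Each printed line pairs one tag with the words of its phrase, in the order the chunks appear in the sentence.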

For more information about text chunking, see:

Erik F. Tjong Kim Sang and Sabine Buchholz, Introduction to the CoNLL-2000 Shared Task: Chunking. In: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal, 2000.
Fei Sha and Fernando Pereira, Shallow Parsing with Conditional Random Fields. In: Proceedings of HLT-NAACL 2003, Edmonton, Canada, 2003.

TextChunker uses files that are preinstalled on the ML Engine. For details, see Preinstalled Files That Functions Use.