TD_TextParser Function | TextParser | Teradata Vantage - TD_TextParser - Analytics Database

Database Analytic Functions

Analytics Database
Release Number
June 2022
English (United States)
Last Update
Product Category
Teradata Vantage™

A text parser, also known as a text tokenizer, is a software component that breaks a text into its constituent parts, such as words, phrases, sentences, or other meaningful units. Text parsing is an important technique in natural language processing (NLP) and is used in a wide range of applications, from search engines and chatbots to email filters and data analysis tools.

In text analytics, a text parser is often used as the first step in processing text data to extract useful insights. By breaking the text into smaller units, a parser makes it easier to analyze the text and identify patterns, trends, and relationships among the data.

Text parsers can be simple or complex, depending on the type of text data being processed and the level of detail required for analysis. For example, a basic text parser might split a sentence into individual words, while a more advanced parser might recognize parts of speech, identify named entities, or recognize patterns in the text that suggest a particular sentiment or tone.
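As a rough illustration of the "basic text parser" case, the following Python sketch splits a sentence into individual words. It is not TD_TextParser itself; the function name `split_into_words` and the choice of what counts as a word character are assumptions made only for this example.

```python
import re

def split_into_words(sentence: str) -> list[str]:
    """Return each run of letters, digits, or apostrophes as one word."""
    # Anything outside the character class (spaces, commas, periods, ...)
    # acts as a separator and is dropped.
    return re.findall(r"[A-Za-z0-9']+", sentence)

print(split_into_words("Hello, world!"))
# ['Hello', 'world']
```

A more advanced parser would build on tokens like these, for example by tagging parts of speech or detecting named entities.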

By breaking text into its constituent parts and analyzing its structure, text parsers enable a variety of tasks, from information extraction and sentiment analysis to machine translation and chatbot dialog generation. Overall, a text parser is a powerful tool for extracting structured information from unstructured or semi-structured text data, making it easier for analysts and data scientists to work with large volumes of text and gain insights from them.

The TD_TextParser function performs the following operations:
  • Tokenizes the text in the specified column
  • Removes punctuation from the text and converts the text to lowercase
  • Removes stop words from the text and converts the remaining words to their root forms
  • Creates a row for each word in the output table
  • Performs stemming; that is, the function identifies the common root form of a word by removing or replacing word suffixes
The stems resulting from stemming may not be actual words. For example, the stem for 'communicate' is 'commun' and the stem for 'early' is 'earli' (trailing 'y' is replaced by 'i').
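The operations listed above can be sketched in a few lines of Python. This is a conceptual illustration only, not the function's implementation: the stop-word set, the suffix rules, and the names `toy_stem` and `parse` are hypothetical, and the toy stemmer is a simplified stand-in chosen to reproduce the two stems mentioned above.

```python
import string

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}

# Toy suffix rules, tried in order; only the first match is applied.
# This is NOT the stemmer that TD_TextParser uses.
SUFFIX_RULES = [("icate", ""), ("ing", ""), ("ed", ""), ("s", ""), ("y", "i")]

def toy_stem(word: str) -> str:
    """Strip or replace one suffix, keeping a stem of at least 3 characters."""
    for suffix, replacement in SUFFIX_RULES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            return stem + replacement
    return word

def parse(text: str) -> list[str]:
    """Tokenize, strip punctuation, lowercase, drop stop words, and stem."""
    strip_punct = str.maketrans("", "", string.punctuation)
    rows = []
    for raw in text.split():                       # tokenize on whitespace
        token = raw.translate(strip_punct).lower()  # punctuation and case
        if token and token not in STOP_WORDS:       # stop-word removal
            rows.append(toy_stem(token))            # one output row per word
    return rows

print(parse("The teams communicate early, and the meetings ended."))
# ['team', 'commun', 'earli', 'meeting', 'end']
```

As in the note above, some of the resulting stems ('commun', 'earli') are not actual words.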