1.0 - 8.00 - Text Analysis - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)
Function | Description
Latent Dirichlet Allocation (LDA) Functions | Build a topic model based on the supplied training data and parameters, estimate the topic distribution for each document based on the model, and display information from the model.
LevenshteinDistance | Computes the Levenshtein distance between two text values, that is, the minimum number of edits needed to transform one string into the other, where an edit is the insertion, deletion, or substitution of a single character.
Naive Bayes Text Classifier Functions | Use the Naive Bayes algorithm to classify data objects.
Named Entity Recognition (NER) Functions | Use named entity recognition (NER) to extract entities (such as person, location, and organization) from text; models are trained using either Conditional Random Fields (CRF) or Max Entropy.
NGrams | Tokenizes (splits) an input stream and emits n-grams (multigrams of n tokens), based on the specified delimiter and reset parameters. Useful for sentiment analysis, topic identification, and document classification.
POSTagger | Tags each word of the input text with its part of speech.
SentenceExtractor | Extracts the sentences in the input paragraphs.
Sentiment Extraction Functions | Deduce user opinion (positive, negative, or neutral) from text.
Text Classifier Functions | Choose the correct class label for the given text.
TextChunker | Divides text into phrases and assigns each phrase a tag identifying its type.
TextMorph | Provides lemmatization, a basic tool in text analysis: outputs the standard (dictionary) form of each input word.
TextParser | Tokenizes a stream of words, optionally stems them, and outputs the individual words and their counts.
TextTagger | Tags input tuples according to user-defined rules that use logical and text processing operators.
TextTokenizer | Extracts tokens (for example, words, punctuation marks, and numbers) from text.
TFIDF | Evaluates the importance of a word within a specific document: the score rises with the number of times the word appears in that document and falls with the number of documents in the set that contain the word.
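To make the edit count in the LevenshteinDistance entry concrete, here is a minimal plain-Python sketch of the standard dynamic-programming computation. The function name `levenshtein` is hypothetical; this illustrates the distance itself, not the Vantage function's SQL invocation syntax.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (hypothetical helper)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table means memory grows with the length of the second string rather than with the product of both lengths.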
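The Naive Bayes Text Classifier entry can be illustrated with a small multinomial Naive Bayes sketch, assuming documents are already tokenized and using add-one (Laplace) smoothing. The helpers `train` and `classify` and the toy training data are hypothetical; the Vantage functions may use different smoothing, scoring, or input formats.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (label, tokens). Returns (class_docs, word_counts, vocab)."""
    class_docs = Counter()              # number of training documents per label
    word_counts = defaultdict(Counter)  # per-label token counts
    vocab = set()
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log posterior (hypothetical helper)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # add-one smoothed log likelihood of the token given the label
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("sports", "the team won the match".split()),
    ("sports", "a great goal in the final match".split()),
    ("tech", "the new phone has a fast chip".split()),
    ("tech", "software update for the phone".split()),
])
print(classify(model, "the match was great".split()))  # sports
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.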
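A rough sketch of the n-gram generation described in the NGrams entry, assuming a whitespace regex as the delimiter and sentence punctuation as the reset characters, so that no n-gram spans a reset character. The `ngrams` function and its defaults are hypothetical illustrations of the delimiter and reset parameters, not the actual defaults of the Vantage function.

```python
import re


def ngrams(text, n, delimiter=r"\s+", reset=".!?"):
    """Return all n-grams of n consecutive tokens (hypothetical sketch).

    `delimiter` is a regex separating tokens; `reset` lists characters
    that end a segment, so no n-gram crosses them.
    """
    # Break the text at reset characters (if any) so n-grams stay within a segment.
    segments = re.split("[" + re.escape(reset) + "]", text) if reset else [text]
    result = []
    for segment in segments:
        tokens = [t for t in re.split(delimiter, segment) if t]
        # Slide a window of width n across the tokens of this segment.
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result


print(ngrams("the quick brown fox. it jumped", 2))
# ['the quick', 'quick brown', 'brown fox', 'it jumped']
```

Without the reset step, the output would also contain the bigram "fox it", which joins words from two different sentences.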
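The TextParser entry (tokenize, optionally stem, count) can be sketched in a few lines. The `parse` function and especially the `crude_stem` suffix stripper are hypothetical stand-ins: `crude_stem` is a toy, not a real stemming algorithm such as Porter, and not the stemmer Vantage uses.

```python
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical toy stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def parse(text, stem=False):
    """Lowercase, split into alphabetic words, optionally stem, and count them."""
    words = re.findall(r"[a-z]+", text.lower())
    if stem:
        words = [crude_stem(w) for w in words]
    return Counter(words)


counts = parse("Dogs barked; the dog barks.", stem=True)
print(counts["dog"], counts["bark"])  # 2 2
```

With `stem=False`, "dogs" and "dog" are counted as separate words, which is the main reason a stemming option is useful before counting.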
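The TFIDF entry can be sketched with one common formulation, assuming term frequency is the term's count divided by the document length and inverse document frequency is ln(N / df), where N is the number of documents and df is the number of documents containing the term. The `tf_idf` helper is hypothetical, and the Vantage function may normalize or smooth these quantities differently.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: dict doc_id -> list of tokens. Returns {doc_id: {term: score}}."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[doc_id] = {
            # tf = count / document length; idf = ln(N / df)
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return scores


scores = tf_idf({
    1: "the cat sat on the mat".split(),
    2: "the dog sat".split(),
    3: "the cat ran".split(),
})
print(scores[1]["the"])            # 0.0
print(round(scores[1]["mat"], 4))  # 0.1831
```

"the" appears in every document, so its IDF is ln(1) = 0 and it scores zero everywhere, while "mat", which appears only in document 1, scores higher than the more widespread "cat".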