Text Analysis - Teradata Vantage - Short descriptions of text analysis functions, with links to their documentation

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00; 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22
dita:id: B700-4003
lifecycle: previous
Product Category: Teradata Vantage™

Function	Description
Latent Dirichlet Allocation (LDA) Functions	Build a topic model based on the supplied training data and parameters, estimate the topic distribution for each document based on the model, and display information from the model.
LevenshteinDistance	Computes the Levenshtein distance between two text values, that is, the minimum number of edits needed to transform one string into the other, where each edit is an insertion, deletion, or substitution of a single character.
Naive Bayes Text Classifier Functions	Use the Naive Bayes algorithm to classify data objects.
Named Entity Recognition (NER) Functions	Use named entity recognition (NER) to extract features (such as person, location, and organization) when training data models, using either the Conditional Random Fields (CRF) model or the Maximum Entropy model.
NGrams	Tokenizes (splits) an input stream and emits n-grams (sequences of n consecutive tokens), based on the specified delimiter and reset parameters. Useful for sentiment analysis, topic identification, and document classification.
POSTagger	Tags the parts-of-speech of input text.
SentenceExtractor	Extracts the sentences in the input paragraphs.
Sentiment Extraction Functions	Deduce user opinion (positive, negative, or neutral) from text.
Text Classifier Functions	Choose the class label that best fits a given text.
TextChunker	Divides text into phrases and assigns each phrase a tag identifying its type.
TextMorph	Lemmatizes the input words, that is, outputs the standard form of each word. Lemmatization is a basic tool in text analysis.
TextParser	Tokenizes a stream of words, optionally stems them, and outputs the individual words and their counts.
TextTagger	Tags input tuples according to user-defined rules that use logical and text processing operators.
TextTokenizer	Extracts tokens (for example, words, punctuation marks, and numbers) from text.
TFIDF	Evaluates the importance of a word within a specific document: the score rises with how often the word appears in that document and is weighted down by how widely the word appears across the entire document set.
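
To make three of the descriptions above concrete (LevenshteinDistance, NGrams, and TFIDF), here is a minimal plain-Python sketch of the underlying techniques. It is not the Vantage implementation and does not show Vantage SQL syntax. The function names, the whitespace tokenization, and the specific TF-IDF formula (relative term frequency times the natural log of total documents over documents containing the term) are assumptions chosen for illustration; the Vantage functions may tokenize, normalize, and weight differently.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ngrams(text, n):
    """Split text on whitespace (an assumed delimiter) and return
    every run of n consecutive lowercase tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: score} dict per document, using
    tf = count / document length and idf = ln(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))   # 3
    print(ngrams("the quick brown fox", 2))   # ['the quick', 'quick brown', 'brown fox']
    print(tf_idf(["the cat sat", "the dog barked"])[0])
```

In the last call, "the" appears in both documents, so its score is 0 under this formula, while "cat" and "sat" receive positive scores. This shows how a word that is common across the document set is weighted down.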