1.1 - 8.10 - Text Analysis - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Teradata Vantage
Release: October 2019
Content Type: Programming Reference
Language: English (United States)
Function: Description
IdentityMatch (ML Engine): Tries to match enterprise customer records with user records provided by external data sources.
LevenshteinDistance (ML Engine): Computes the Levenshtein distance between two text values, that is, the number of edits needed to transform one string into the other, where edits include insertions, deletions, or substitutions of individual characters.
NGramSplitter_MLE (ML Engine): Tokenizes (splits) an input stream and emits n-grams (multigrams), based on specified delimiter and reset parameters. Useful for sentiment analysis, topic identification, and document classification.
POSTagger (ML Engine): Tags the parts of speech of input text.
SentenceExtractor (ML Engine): Extracts the sentences in the input paragraphs.
StringSimilarity_MLE (ML Engine): Calculates the similarity between two strings, using the Jaro, Jaro-Winkler, n-gram, or Levenshtein distance.
TextChunker (ML Engine): Divides text into phrases and assigns each phrase a tag identifying its type.
TextMorph (ML Engine): Performs lemmatization, a basic text analysis task: outputs the standard form of each input word.
TextParser (ML Engine): Tokenizes a stream of words, optionally stems them, and outputs the individual words and their counts.
TextTagger (ML Engine): Tags input tuples according to user-defined rules that use logical and text processing operators.
TextTokenizer (ML Engine): Extracts tokens (for example, words, punctuation marks, and numbers) from text.
TFIDF (ML Engine): Evaluates the importance of a word within a specific document: its frequency in that document, offset by the number of documents in the entire document set that contain the word.
Fellegi-Sunter Functions (ML Engine): FellegiSunter estimates the parameters of the Fellegi-Sunter model, using either supervised or unsupervised learning. FellegiSunterPredict predicts whether the two objects in a pair are duplicates.
Latent Dirichlet Allocation (LDA) Functions (ML Engine): Build a topic model based on the supplied training data and parameters, estimate the topic distribution for each document based on the model, and display information from the model.
Naive Bayes Text Classifier Functions (ML Engine): Use the Naive Bayes algorithm to classify data objects.
Named Entity Recognition (NER) Functions (ML Engine): Extract named entities (such as persons, locations, and organizations) from text, training models with either the Conditional Random Fields (CRF) or the Maximum Entropy model.
Sentiment Extraction Functions (ML Engine): Deduce user opinion (positive, negative, or neutral) from text.
Text Classifier Functions (ML Engine): Choose the correct class label for given text.
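The LevenshteinDistance entry above defines edit distance as the number of single-character insertions, deletions, or substitutions needed to turn one string into another. As an illustration of the metric only (not of the ML Engine function's SQL syntax, arguments, or output columns), here is a minimal Python sketch of the standard dynamic-programming computation; the function name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (illustrative sketch)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

Only two rows of the table are kept in memory, so space grows with the length of `b` rather than with the product of both lengths.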
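The NGramSplitter_MLE entry above describes splitting text into tokens on a delimiter and emitting n-grams, with a reset parameter that stops n-grams from spanning certain characters. The sketch below illustrates that idea in Python under stated assumptions: the function name `ngrams` and the default `delimiter` (whitespace) and `reset` (sentence punctuation) patterns are hypothetical and are not the ML Engine function's actual arguments or defaults.

```python
import re
from typing import List


def ngrams(text: str, n: int,
           delimiter: str = r"\s+",      # hypothetical: split tokens on whitespace
           reset: str = r"[.,?!]") -> List[str]:  # hypothetical reset characters
    """Return the n-grams of consecutive tokens in text; no n-gram
    spans a reset character (illustrative sketch)."""
    result = []
    # Split at reset characters first so n-grams restart in each segment.
    for segment in re.split(reset, text):
        # Tokenize the segment and drop empty strings left by the split.
        tokens = [t for t in re.split(delimiter, segment) if t]
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result


print(ngrams("the quick brown fox. it jumps", 2))
# → ['the quick', 'quick brown', 'brown fox', 'it jumps']
```

Because the period acts as a reset, no bigram "fox it" is produced.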
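The TFIDF entry above weighs a word's frequency in one document against how many documents in the set contain it. The Python sketch below uses one common variant, assumed here for illustration: term frequency is the raw count divided by document length, and inverse document frequency is ln(N / df). The ML Engine TFIDF function may use a different weighting or smoothing, and the name `tf_idf` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: score} dict per tokenized document, with
    tf = count / document length and idf = ln(N / df) (one common variant)."""
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [["cat", "sat", "mat"], ["cat", "ate"], ["dog", "sat"]]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

A term that appears in every document gets an idf of ln(1) = 0 and therefore a score of 0, while a term unique to one document (such as "mat") scores highest within that document.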