Statistical Analysis - Teradata Vantage - Short descriptions of statistical analysis functions, with links to their documentation

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
Function Description
Approximate Cardinality Computes the approximate global distinct count of the values in one or more columns, scanning the table only once. Counts all children for a specified parent.
Approximate Percentile Computes approximate percentiles for one or more columns, with specified accuracy.
ConfusionMatrix Shows how often a classification algorithm correctly classifies items.
Correlation Computes the global correlation between any pair of table columns.
Cox Functions Cox proportional hazards model functions.
CrossValidation Validates a model by assessing how the results of a statistical analysis generalize to an independent data set.
Distribution Matching Uses hypothesis testing to find the best matching distribution for data.
FMeasure Calculates the accuracy of a test, expressed as the F-measure (a weighted harmonic mean of precision and recall).
Generalized Linear Model Functions Perform linear regression analysis for distribution functions using a user-specified distribution family and link function.
Hidden Markov Model Functions Functions for the Hidden Markov Model, which describes the evolution of observable events that depend on factors that are not directly observable.
Histogram Calculates the frequency distribution of a data set using sophisticated binning techniques that can automatically calculate the bin width and number of bins. The function maps each input row to one bin and returns the frequency (row count) and proportion (percentage of rows) of each bin.
KNN Uses the k-nearest neighbors (kNN) algorithm to classify new objects based on their proximity to already-classified objects.
LAR Functions Select the most important variables one by one and fit the coefficients dynamically.
Linear Regression Functions Create and use a linear regression model.
LikelihoodRatioTest Performs the likelihood ratio test for two GLM models.
Moving Average Functions Compute average values in a series.
Percentiles Finds percentiles on a per group basis.
Principal Component Analysis (PCA) Functions Perform principal component analysis, a common unsupervised learning technique useful for both exploratory data analysis and dimensionality reduction, often used as the core procedure for factor analysis.
RandomSample Takes a data set and uses a specified sampling method to output one or more random samples, each with a specified size.
Receiver Operating Characteristic (ROC) Takes a set of prediction-actual pairs for a binary classifier and calculates the true positive rate (TPR), false positive rate (FPR), area under the curve (AUC), and Gini coefficient for a range of thresholds.
Sampling Draws rows randomly from input, using either of two sampling schemes.
Shapley Value Functions Computes the Shapley value, typically from nPath function output. The Shapley value is intended to reflect the importance of each player to the coalition in a cooperative game (a game between coalitions of players, rather than between individual players).
Support Vector Machine (SVM) Functions Use a popular classification algorithm to build a predictive model from a training set, give a prediction for each sample in the test set, and display the model information in readable form.
VectorDistance Measures the distance between sparse vectors (for example, TF-IDF vectors) in a pairwise manner.
UnivariateStatistics Calculates descriptive statistics for a set of target columns.
VWAP Computes the volume-weighted average price of a traded item (usually an equity share) over a specified time interval.
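
The ConfusionMatrix and FMeasure entries above can be illustrated with a short Python sketch. It is not the Vantage implementation or its syntax: the names `confusion_counts` and `f_measure`, the `positive` and `beta` parameters, and the choice to return 0.0 when a ratio is undefined are all assumptions made for illustration. It tallies (actual, predicted) pairs and derives precision, recall, and the weighted F-measure from those counts.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def f_measure(actual, predicted, positive, beta=1.0):
    """Return (precision, recall, F-beta) for the label treated as positive.

    Assumption for this sketch: an undefined ratio (zero denominator)
    is reported as 0.0 rather than raising an error.
    """
    counts = confusion_counts(actual, predicted)
    tp = counts[(positive, positive)]
    # Predicted positive, but the actual label was something else.
    fp = sum(n for (a, p), n in counts.items() if p == positive and a != positive)
    # Actually positive, but predicted as something else.
    fn = sum(n for (a, p), n in counts.items() if a == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta
```

With `beta=1.0` the result is the ordinary F1 score; larger values of `beta` weight recall more heavily than precision.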
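
As a rough illustration of the quantities named in the Receiver Operating Characteristic (ROC) entry above, the following Python sketch computes TPR and FPR at each threshold and then estimates AUC and the Gini coefficient. The function names, the convention that a score at or above the threshold counts as a positive prediction, and the trapezoidal estimate of AUC are assumptions of this sketch, not details taken from the Vantage function.

```python
def roc_points(pairs, thresholds):
    """Return (threshold, TPR, FPR) for each threshold.

    pairs: iterable of (score, actual) with actual equal to 1 or 0.
    Assumption: score >= threshold is treated as a positive prediction.
    """
    pairs = list(pairs)
    positives = sum(1 for _, actual in pairs if actual == 1)
    negatives = len(pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    points = []
    for t in thresholds:
        tp = sum(1 for score, actual in pairs if score >= t and actual == 1)
        fp = sum(1 for score, actual in pairs if score >= t and actual == 0)
        points.append((t, tp / positives, fp / negatives))
    return points


def auc_and_gini(points):
    """Estimate AUC with the trapezoidal rule; Gini is 2 * AUC - 1."""
    # Build the curve as (FPR, TPR) points and pin both end points.
    curve = sorted({(fpr, tpr) for _, tpr, fpr in points} | {(0.0, 0.0), (1.0, 1.0)})
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(curve, curve[1:])
    )
    return auc, 2 * auc - 1
```

The AUC estimate is only as fine-grained as the thresholds supplied; a coarse threshold list gives a coarser approximation of the curve.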
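
The volume-weighted average price in the VWAP entry above is the sum of price times volume divided by the total volume over the interval. A minimal Python sketch of that arithmetic follows; the function name, the `(timestamp, price, volume)` tuple layout, the half-open interval `[start, end)`, and returning `None` for an interval with no volume are assumptions of this sketch rather than the behavior of the Vantage function.

```python
def vwap(trades, start, end):
    """Volume-weighted average price of trades with start <= timestamp < end.

    trades: iterable of (timestamp, price, volume) tuples.
    Returns None when no volume traded in the interval (an assumption
    of this sketch).
    """
    in_window = [(price, volume) for ts, price, volume in trades if start <= ts < end]
    total_volume = sum(volume for _, volume in in_window)
    if total_volume == 0:
        return None
    return sum(price * volume for price, volume in in_window) / total_volume
```

For example, 100 shares at 10.0 and 300 shares at 20.0 in the same interval give (10.0 × 100 + 20.0 × 300) / 400 = 17.5.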