1.1 - 8.10 - Statistical Analysis - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Published: October 2019
Content Type: Programming Reference
Publication ID:
Language: English (United States)
Function: Description
Approximate Cardinality (ML Engine): Computes the approximate global distinct count of the values in one or more columns, scanning the table only once. Counts all children for a specified parent.
Approximate Percentile (ML Engine): Computes approximate percentiles for one or more columns, with specified accuracy.
ConfusionMatrix (ML Engine): Shows how often a classification algorithm correctly classifies items.
Correlation (ML Engine): Computes the global correlation between any pair of table columns.
CrossValidation (ML Engine): Validates a model by assessing how the results of a statistical analysis generalize to an independent data set.
Distribution Matching (ML Engine): Uses hypothesis testing to find the best matching distribution for data.
FMeasure (ML Engine): Calculates the accuracy of a test as the F-measure, which combines precision and recall.
Histogram (ML Engine): Calculates the frequency distribution of a data set using sophisticated binning techniques that can automatically calculate the bin width and number of bins. The function maps each input row to one bin and returns the frequency (row count) and proportion (percentage of rows) of each bin.
LikelihoodRatioTest (ML Engine): Performs the likelihood ratio test for two GLM models.
Percentiles (ML Engine): Finds percentiles on a per-group basis.
Receiver Operating Characteristic (ROC) (ML Engine): Takes a set of prediction-actual pairs for a binary classifier and calculates the true positive rate (TPR), false positive rate (FPR), area under the curve (AUC), and Gini coefficient for a range of thresholds.
VectorDistance (ML Engine): Measures the distance between sparse vectors (for example, TF-IDF vectors) in a pairwise manner.
UnivariateStatistics (ML Engine): Calculates descriptive statistics for a set of target columns.
Principal Component Analysis (PCA) Functions (ML Engine): Implement principal component analysis, a common unsupervised learning technique useful for both exploratory data analysis and dimensionality reduction.
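
To make the ConfusionMatrix and FMeasure entries above more concrete, the following is a minimal Python sketch of the underlying statistics. It is not ML Engine syntax. The function names `confusion_matrix` and `f_measure`, the `beta` parameter, and the sample labels are hypothetical and chosen only for illustration. The sketch assumes the standard F-measure definition, the weighted harmonic mean of precision and recall.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def f_measure(actual, predicted, positive, beta=1.0):
    """F-measure for one class, treating `positive` as the positive label.

    beta=1.0 gives the F1 score (hypothetical default for this sketch).
    """
    counts = confusion_matrix(actual, predicted)
    tp = counts[(positive, positive)]
    # Predicted positive, but the actual label was something else.
    fp = sum(n for (a, p), n in counts.items() if p == positive and a != positive)
    # Actually positive, but predicted as something else.
    fn = sum(n for (a, p), n in counts.items() if a == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative data only, not taken from the reference.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

matrix = confusion_matrix(actual, predicted)
score = f_measure(actual, predicted, positive="spam")
```

For this sample, two of the three actual "spam" rows are predicted correctly and one "ham" row is mislabeled as "spam", so precision and recall are both 2/3 and the F1 score is also 2/3.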