Statistical Analysis - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product

Aster Analytics

Release Number

6.21

Published

November 2016

Language

English (United States)

Last Update

2018-04-14

dita:id

B700-1021

Product Category

Software

Statistical Analysis Functions
Function	Description
Approximate Distinct Count	Computes the approximate global distinct count of the values in one or more columns, scanning the table only once. Counts all children for a specified parent.
Approximate Percentile	Computes approximate percentiles for one or more columns, with specified accuracy.
CMAVG	Computes the cumulative moving average, that is, the average of all values from the beginning of a series up to the current point.
ConfusionMatrix	Shows how often a classification algorithm correctly classifies items.
Correlation	Computes the global correlation between any pair of table columns.
CoxPH	Estimates coefficients of a Cox proportional hazards model by learning a set of explanatory variables. Generates coefficient and linear prediction tables.
CoxPredict	Takes the coefficient table generated by the CoxPH function and outputs the hazard ratios between predict features and either their corresponding reference features or their unit differences.
CoxSurvFit	Takes the coefficient and linear prediction tables generated by the CoxPH function and outputs a table of survival probabilities.
CrossValidation	Validates a model by assessing how the results of a statistical analysis will generalize to an independent data set.
Distribution Matching	Uses hypothesis testing to find the best matching distribution for data.
EMAVG	Computes the average over a number of points in a time series while applying an exponentially decaying damping (weighting) factor to older values so that more recent values are given a heavier weight in the calculation.
FMeasure	Calculates the accuracy of a test as an F-measure, a score that combines precision and recall.
GLM	Performs linear regression analysis for any of a number of distribution functions, using a user-specified distribution family and link function.
GLMPredict	Uses the model generated by the GLM function to make predictions for new data.
Hidden Markov Model Functions	Describe the evolution of observable events that depend on factors that are not directly observable. The Hidden Markov Model functions are HMMUnsupervisedLearner, HMMSupervisedLearner, HMMEvaluator, and HMMDecoder.
Histogram	Calculates the frequency distribution of a dataset using sophisticated binning techniques that can automatically calculate the bin width and number of bins. The function maps each input row to one bin and returns the frequency (row count) and proportion (percentage of rows) of each bin.
KNN	Uses the kNN algorithm to classify new objects based on their proximity to already-classified objects.
LARS Functions	Select the most important variables one by one and fit the coefficients dynamically. The LARS functions are LARS and LARSPredict.
Linear Regression	Outputs the coefficients of the linear regression model represented by the input matrices.
LRTEST	Performs the likelihood ratio test for two GLM models.
Percentile	Finds percentiles on a per-group basis.
Principal Component Analysis	Common unsupervised learning technique that is useful for both exploratory data analysis and dimensionality reduction, often used as the core procedure for factor analysis. Implemented by the functions PCA_Map and PCA_Reduce. If the version of PCA_Reduce is AA 6.21 or later, you can input the PCA output to the function PCAPlot.
RandomSample	Takes a data set and uses a specified sampling method to output one or more random samples, each with a specified size.
Sample	Draws rows randomly from input, using either of two sampling schemes.
Shapley Value Functions	Compute the Shapley value, typically from nPath function output. The Shapley value is intended to reflect the importance of each player to the coalition in a cooperative game (a game between coalitions of players, rather than between individual players). The Shapley value functions are GenerateCombination, SortCombination, and AddOnePlayer.
SMAVG	Computes the simple moving average for a number of points in a series.
Support Vector Machine (SVM) Functions	Use a popular classification algorithm to build a predictive model according to a training set, give a prediction for each sample in the test set, and display the readable information of the model. Support Vector Machines include both SparseSVM and DenseSVM functions. The SparseSVM functions are SparseSVMTrainer, SparseSVMPredictor, and SVMModelPrinter; the DenseSVM functions are DenseSVMTrainer, DenseSVMPredictor, and DenseSVMModelPrinter.
VectorDistance	Measures the distance between sparse vectors (for example, TF-IDF vectors) in a pairwise manner.
VWAP	Computes the volume-weighted average price of a traded item (usually an equity share) over a specified time interval.
WMAVG	Computes the weighted moving average of a number of points in a time series, applying an arithmetically decreasing weighting to older values.
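
The four moving-average functions in the table (CMAVG, SMAVG, WMAVG, and EMAVG) differ only in how they weight earlier points. The Python sketch below illustrates those weighting schemes on a plain list. It is not the Aster implementation and does not use Aster syntax. The function names, the window and alpha parameters, and the choice to seed the exponential average with the first value are all assumptions made for illustration; the actual functions may take different arguments and handle the start of a series differently.

```python
from typing import List, Sequence


def cumulative_moving_average(values: Sequence[float]) -> List[float]:
    """Mean of every value from the start of the series up to each point."""
    out: List[float] = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        out.append(total / count)
    return out


def simple_moving_average(values: Sequence[float], window: int) -> List[float]:
    """Unweighted mean of the last `window` points (no output until the window is full)."""
    return [
        sum(values[end - window + 1:end + 1]) / window
        for end in range(window - 1, len(values))
    ]


def weighted_moving_average(values: Sequence[float], window: int) -> List[float]:
    """Arithmetically decreasing weights: newest point gets `window`, oldest gets 1."""
    weights = range(1, window + 1)
    weight_sum = window * (window + 1) / 2
    return [
        sum(w * v for w, v in zip(weights, values[end - window + 1:end + 1])) / weight_sum
        for end in range(window - 1, len(values))
    ]


def exponential_moving_average(values: Sequence[float], alpha: float) -> List[float]:
    """Exponentially decaying weights: ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}.

    Assumption: the first value seeds the average.
    """
    out: List[float] = []
    for value in values:
        out.append(value if not out else alpha * value + (1 - alpha) * out[-1])
    return out


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample series
    print("CMAVG-style:", cumulative_moving_average(prices))
    print("SMAVG-style:", simple_moving_average(prices, window=2))
    print("WMAVG-style:", weighted_moving_average(prices, window=2))
    print("EMAVG-style:", exponential_moving_average(prices, alpha=0.5))
```

In this sketch, a larger alpha makes the exponential average react faster to recent values, while a larger window smooths the simple and weighted averages more heavily.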