Gaussian Mixture Model Functions

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17

Product Category: Software

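A Gaussian Mixture Model (GMM) is a probabilistic model for clustering numerical data: it describes the data as a weighted mixture of Gaussian distributions, one per cluster. Applications that use GMMs include market segmentation, network analysis, customer profiling, and recommender systems.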

A GMM uses soft assignment; that is, it computes the probability that each point is a member of each cluster. Each cluster in a GMM is specified by a weight, a mean point, and a covariance matrix; therefore, a GMM can locate clusters of different eccentricities, even in data that is not perfectly scaled.
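
Soft assignment can be illustrated with a minimal sketch. It assumes a two-dimensional model whose parameters are already known; the function names and the sample weights, means, and covariances are hypothetical and are not part of the Aster Analytics API. Each membership probability is the cluster's weighted Gaussian density divided by the sum over all clusters.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian with mean `mean` and 2x2 covariance `cov` at point `x`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def responsibilities(x, weights, means, covs):
    """Probability that point `x` belongs to each cluster (soft assignment)."""
    joint = [w * gaussian_pdf_2d(x, m, s) for w, m, s in zip(weights, means, covs)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-cluster model: one round cluster, one elongated and correlated.
weights = [0.6, 0.4]
means = [(0.0, 0.0), (4.0, 4.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((4.0, 1.8), (1.8, 1.0))]

probs = responsibilities((2.0, 3.0), weights, means, covs)
print([round(p, 3) for p in probs])
```

The probabilities for a point always sum to 1. The off-diagonal covariance entries of the second cluster are what allow it to be elongated along a diagonal rather than spherical.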

The basic GMM fitting algorithm requires a known, fixed number of clusters. An advanced variant, the Dirichlet Process GMM (DP-GMM), estimates the number of clusters from the data, using an algorithm based on variational Bayesian methods. The DP-GMM uses a "stick-breaking process" to define the prior probability of each cluster, enforcing a "rich-get-richer" clustering approach. The DP-GMM starts a new cluster only when a data point is very unlikely to belong to any existing cluster.
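
The stick-breaking construction can be sketched as follows. This is the textbook form, in which each cluster takes a fraction of the stick left over by the clusters before it; it is not a description of how GMMFit implements the DP-GMM. The function names and the concentration parameter `alpha` are assumptions made for illustration.

```python
import random

def stick_breaking_weights(fractions):
    """Turn break fractions v_1..v_K (each in [0, 1]) into cluster weights.

    Cluster k receives v_k times whatever stick length is left after
    clusters 1..k-1 have taken their share.
    """
    weights = []
    remaining = 1.0
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining

def sample_fractions(num_clusters, alpha, seed=0):
    """Draw v_k ~ Beta(1, alpha); `alpha` is a hypothetical concentration setting."""
    rng = random.Random(seed)
    return [rng.betavariate(1.0, alpha) for _ in range(num_clusters)]

weights, leftover = stick_breaking_weights([0.5, 0.5, 0.5])
print(weights, leftover)  # [0.5, 0.25, 0.125] 0.125
```

Because each later cluster can only claim part of what remains, later clusters tend to receive smaller prior weights, which is one way to picture the reluctance to open new clusters.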

You can use GMMs where k-means clustering is insufficient, for example, when clusters are not roughly spherical because input attributes are on different scales or are correlated with each other. You can use GMMs either directly or in conjunction with k-means clustering. For example, you can use k-means clustering to find an initial set of cluster centers, and then use those centers to initialize a GMM function that produces a more refined model.
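
The k-means-then-GMM workflow can be sketched in plain Python. This is a minimal one-dimensional illustration, assuming Lloyd's algorithm for k-means and standard expectation-maximization (EM) for the GMM; the function names and sample data are hypothetical, and the sketch does not call or reproduce the Aster Analytics functions.

```python
import math

def kmeans_1d(data, centers, iterations=20):
    """Lloyd's algorithm in one dimension; returns refined cluster centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, initial_means, iterations=50, min_var=1e-6):
    """EM for a 1-D GMM, starting from the given means."""
    means = list(initial_means)
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate the weight, mean, and variance of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, spread)  # floor avoids a zero variance
    return weights, means, variances

# Hypothetical data: a tight cluster near 0 and a much wider cluster near 10.
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 7.0, 8.5, 10.0, 11.5, 13.0]
centers = kmeans_1d(data, [min(data), max(data)])
gmm_weights, gmm_means, gmm_variances = fit_gmm_1d(data, centers)
print(gmm_means, gmm_variances)
```

K-means supplies only the centers; the GMM step adds a weight and a variance per cluster, so the two clusters can have very different spreads.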

The Aster Analytics GMM package has three functions:

GMMFit, to fit a GMM to training data
GMMPredict, to predict cluster assignments for test data
GMMProfile, to compute statistics about each cluster in a GMM

You can specify whether GMMFit uses the basic GMM algorithm or the DP-GMM algorithm.