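A Gaussian Mixture Model (GMM) is a probabilistic model for clustering numerical data. It represents the data as a weighted mixture of Gaussian (normal) distributions, one for each cluster. Applications that use GMMs include market segmentation, network analysis, customer profiling, and recommender systems.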
A GMM uses soft assignment; that is, it computes the probability that each point is a member of each cluster, rather than assigning each point to exactly one cluster. Each cluster in a GMM is specified by a weight, a mean point, and a covariance matrix; therefore, a GMM can locate clusters of different sizes, eccentricities, and orientations, even in data whose attributes are not on the same scale.
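The following Python sketch illustrates soft assignment for a two-dimensional GMM with two clusters. It is a minimal illustration, not the Aster Analytics implementation: the function names (`gaussian_pdf_2d`, `soft_assign`) and all parameter values are hypothetical, and the model parameters are assumed to be already known rather than fitted.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the
    # closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def soft_assign(x, weights, means, covs):
    """Return the membership probability of point x for each cluster."""
    joint = [w * gaussian_pdf_2d(x, m, s)
             for w, m, s in zip(weights, means, covs)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical model: one round cluster and one elongated, tilted cluster.
weights = [0.6, 0.4]
means = [(0.0, 0.0), (4.0, 4.0)]
covs = [((1.0, 0.0), (0.0, 1.0)),
        ((2.0, 0.8), (0.8, 1.0))]

probs = soft_assign((1.0, 1.0), weights, means, covs)
print(probs)  # two probabilities that sum to 1
```

The point (1.0, 1.0) lies much closer to the first cluster, so its first membership probability is the larger of the two, but the second is not exactly zero. That is the difference between soft assignment and a hard, single-cluster label.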
The basic GMM fitting algorithm requires a known, fixed number of clusters. An advanced variant, the Dirichlet Process GMM (DP-GMM), estimates the number of clusters from the data, using an algorithm based on variational Bayesian methods. The DP-GMM uses a "stick-breaking process" to define the prior probability of each cluster, which produces a "rich-get-richer" clustering behavior: clusters that already contain many points are more likely to attract additional points. As a result, the DP-GMM does not start a new cluster unless it is very unlikely that a particular data point belongs to a preexisting cluster.
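The stick-breaking construction can be sketched in a few lines of Python. This sketch shows only how the prior cluster weights are generated, not the variational fitting algorithm. It assumes the standard formulation in which each break fraction is drawn from a Beta(1, alpha) distribution; the function names, the concentration parameter `alpha`, and the truncation level `max_clusters` are illustrative assumptions, not GMMFit arguments.

```python
import random

def stick_breaking(fractions):
    """Turn break fractions v_1, v_2, ... into cluster weights.

    Weight k is v_k times the length of stick that remains after
    the first k-1 breaks.
    """
    weights = []
    remaining = 1.0
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def sample_prior_weights(alpha, max_clusters, rng):
    """Draw truncated stick-breaking weights with v_k ~ Beta(1, alpha)."""
    fractions = [rng.betavariate(1.0, alpha) for _ in range(max_clusters)]
    return stick_breaking(fractions)

# Deterministic illustration: always break off half of what is left.
halves = stick_breaking([0.5, 0.5, 0.5])
print(halves)  # [0.5, 0.25, 0.125]

# Random draw from the prior (seeded so that the run is repeatable).
rng = random.Random(0)
sampled = sample_prior_weights(alpha=1.0, max_clusters=10, rng=rng)
print(sampled)
```

Each later cluster can claim only a fraction of the stick that earlier clusters left behind, so prior weight tends to concentrate in the first few clusters. Smaller values of `alpha` make the early breaks larger and therefore favor fewer clusters.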
You can use GMMs in situations where k-means clustering is insufficient; for example, when clusters are not roughly spherical because the input attributes are on different scales or are correlated with each other. You can use GMMs either directly or in conjunction with k-means clustering. For example, you can use k-means clustering to find an initial set of cluster centers, and then use those centers to initialize a GMM function that produces a more refined model.
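The k-means-then-GMM workflow can be sketched as follows. To stay short, this sketch works on one-dimensional data and uses a plain expectation-maximization (EM) loop; it is an assumption-laden illustration, not the algorithm that the Aster Analytics functions run. The helper names (`kmeans_1d`, `fit_gmm_1d`), the sample data, and the starting centers are all hypothetical.

```python
import math

def kmeans_1d(data, centers, iterations=20):
    """Lloyd's algorithm in one dimension; returns the refined centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda k: abs(x - centers[k]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, init_means, iterations=50, min_var=1e-6):
    """EM for a 1-D GMM, starting from the supplied cluster means."""
    n, k = len(data), len(init_means)
    means = list(init_means)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k   # start every cluster at the data variance
    weights = [1.0 / k] * k         # start with equal cluster weights
    for _ in range(iterations):
        # E-step: soft-assign every point to every cluster.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weight, mean, and variance of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2
                         for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, spread)
    return weights, means, variances

# Hypothetical data: a tight cluster near 1 and a wider cluster near 5.
data = [0.7, 0.9, 1.0, 1.1, 1.3, 4.4, 4.8, 5.0, 5.2, 5.6]

centers = kmeans_1d(data, centers=[0.0, 2.0])
weights, means, variances = fit_gmm_1d(data, init_means=centers)
print(centers)
print(weights, means, variances)
```

K-means returns only the cluster centers. The GMM step starts from those centers and additionally estimates a weight and a variance for each cluster, which is how it captures the fact that the second cluster is more spread out than the first.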
The Aster Analytics GMM package has three functions:
- GMMFit, to fit a GMM to training data
- GMMPredict, to predict cluster assignments for test data
- GMMProfile, to compute statistics about each cluster in a GMM
You can specify whether GMMFit uses a basic GMM or DP-GMM algorithm.