1.1 - 8.10 - Gaussian Mixture Model Functions (ML Engine) - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Teradata Vantage
October 2019
Content Type: Programming Reference
English (United States)

A Gaussian Mixture Model (GMM) is a method of clustering numerical data. Applications that use GMM include market segmentation, network analysis, customer profiling, and recommender systems.

A GMM uses soft assignment; that is, it computes the probability that each point is a member of each cluster. Each cluster in a GMM is specified by a weight, a mean point, and a covariance, so a GMM can locate clusters of different eccentricities even in data that is not perfectly scaled.
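The soft assignment described above can be sketched in a few lines of Python. This is an illustration only, not the ML Engine implementation: the function names (gaussian_pdf_2d, responsibilities) and the two-cluster parameters are hypothetical, and the data is restricted to two dimensions so that the covariance inverse can be written in closed form.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^-1 (x - mean), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def responsibilities(x, weights, means, covs):
    """Probability that point x belongs to each cluster (soft assignment)."""
    joint = [w * gaussian_pdf_2d(x, m, s) for w, m, s in zip(weights, means, covs)]
    total = sum(joint)  # assumed nonzero; a point very far from every cluster could underflow
    return [j / total for j in joint]

# Hypothetical model: one round cluster and one elongated, correlated cluster.
weights = [0.6, 0.4]
means = [(0.0, 0.0), (4.0, 4.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((2.0, 1.5), (1.5, 2.0))]

probs = responsibilities((1.0, 1.0), weights, means, covs)
print(probs)  # one probability per cluster; the values sum to 1
```

The point (1.0, 1.0) receives a nonzero probability for both clusters, which is the difference from a hard assignment such as the one k-means produces.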

The basic GMM fitting algorithm requires a known, fixed number of clusters. An advanced variant, the Dirichlet Process GMM (DP-GMM), estimates the number of clusters in the data, using an algorithm based on variational Bayesian methods. The DP-GMM uses a "stick-breaking process" to define the prior probability of each cluster, enforcing a "rich-get-richer" clustering approach. The DP-GMM does not start a new cluster unless it is very unlikely that a particular data point is in a preexisting cluster.
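As an illustration of the stick-breaking idea, the sketch below draws prior cluster weights by repeatedly breaking off a fraction of whatever remains of a unit-length stick. It assumes the standard Dirichlet process construction, in which each fraction is drawn from a Beta(1, alpha) distribution; the source does not state the exact form or parameters that ML Engine uses, and the names stick_breaking_weights, alpha, and num_sticks are hypothetical.

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    Returns the weights and the length of stick left over after truncation.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=10, rng=rng)
print(weights, leftover)
```

Because each weight is a fraction of what earlier clusters left behind, later clusters tend to receive less prior mass, which is consistent with the reluctance to start new clusters described above. A smaller alpha concentrates the mass in the first few clusters.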

You can use GMMs in situations where k-means clustering is insufficient (for example, when clusters are not roughly spherical because input attributes are on different scales or are correlated with each other). You can use GMMs either directly or in conjunction with k-means clustering. For example, you can use k-means clustering to find an initial set of cluster centers, which you can use to initialize a GMM function that produces a more refined model.
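The combination of k-means and GMM can be sketched as follows. This is a minimal one-dimensional illustration in plain Python, not the ML Engine functions: kmeans_1d and fit_gmm_1d are hypothetical helpers, the data is made up, and the GMM is fitted with a textbook expectation-maximization loop that is assumed here for illustration.

```python
import math

def kmeans_1d(data, centers, iterations=20):
    """Plain Lloyd's algorithm on 1-D data; returns the refined centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            groups[nearest].append(x)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, init_means, iterations=50, min_var=1e-6):
    """EM for a 1-D GMM, started from the given means."""
    k = len(init_means)
    means = list(init_means)
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: soft assignment of every point to every cluster.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weight, mean, and variance from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, spread)  # floor avoids a zero variance
    return weights, means, variances

# Made-up data: a tight group near 1 and a wider group near 6.5.
data = [0.8, 1.0, 1.1, 1.3, 0.9, 5.0, 6.5, 8.0, 7.2, 5.8]
centers = kmeans_1d(data, centers=[0.0, 10.0])
weights, means, variances = fit_gmm_1d(data, init_means=centers)
print(centers, weights, means, variances)
```

The k-means step supplies only the centers. The GMM step then adds a weight and a variance for each cluster, so the tight group and the wide group are modeled with different spreads, which k-means alone does not capture.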

Function                  Description
GMM (ML Engine)           Fits GMM to training data.
GMMPredict (ML Engine)    Predicts cluster assignments for test data.
GMMProfile (ML Engine)    Computes statistics about each cluster in GMM.

You can specify whether the GMM function uses the basic GMM algorithm or the DP-GMM algorithm.