7.00.02 - GLM - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Aster Analytics
September 2017
English (United States)

The generalized linear model (GLM) is an extension of the linear regression model in which the linear equation is related to the dependent variable through a link function. GLM performs linear regression analysis for a user-specified distribution family and link function. GLM selects the link function based on the distribution family and the assumed nonlinear distribution of expected outcomes. The table in Background describes the supported combinations of distribution family and link function.

A GLM has three parts:

  1. A random component: the probability distribution of Y, taken from the exponential family
  2. A fixed linear component: the linear expression of the predictor values (X1, X2, ..., Xp), expressed as η or Xβ
  3. A link function that describes the relationship of the distribution function to the expected value of Y (described in the table in Background)
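
As a compact restatement, the three parts can be written in standard textbook notation. The symbols μ (the expected value of Y), g (the link function), and the intercept β_0 are notational assumptions that this section does not itself define. The binomial family with the logit link is used only as a familiar illustration:

```latex
\begin{align}
  % 1. Random component: Y follows an exponential-family distribution
  \mu &= E[Y] \\
  % 2. Fixed linear component: the linear predictor
  \eta &= X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \\
  % 3. Link function: connects the expected value to the linear predictor
  g(\mu) &= \eta \\
  % Illustration: binomial family with the logit link
  g(\mu) &= \log\frac{\mu}{1-\mu}
  \quad\Longleftrightarrow\quad
  \mu = \frac{1}{1 + e^{-\eta}}
\end{align}
```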

GLM also supports categorical variables. For example, in the following table, size and color are independent (predictive) variables and outcome is the dependent (response) variable. Size is a quantitative variable and color is a qualitative variable (with the values yellow, blue, and red). In regression analysis, a qualitative variable is called a categorical variable, and it is typically represented in the model by dummy (0/1 indicator) variables.

Categorical Variables

size  color   outcome
10    yellow  1
5     blue    0
6     red     1

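
To illustrate how a qualitative column such as color can enter a linear expression, the following Python sketch expands it into 0/1 indicator columns. It shows the common dummy-coding technique and is not a description of how the GLM function encodes categories internally. The function name dummy_encode and the choice of yellow as the reference level are assumptions made for this example.

```python
# Rows from the Categorical Variables table: (size, color, outcome).
rows = [
    (10, "yellow", 1),
    (5, "blue", 0),
    (6, "red", 1),
]


def dummy_encode(rows, reference):
    """Expand the color column into 0/1 indicator columns.

    The reference level gets no column of its own; it is represented
    by all indicators being 0. (Hypothetical helper for illustration.)
    """
    levels = sorted({color for _, color, _ in rows if color != reference})
    header = ["size"] + ["color_" + level for level in levels] + ["outcome"]
    encoded = []
    for size, color, outcome in rows:
        indicators = [1 if color == level else 0 for level in levels]
        encoded.append([size] + indicators + [outcome])
    return header, encoded


# Assumption: "yellow" is used as the reference level.
header, encoded = dummy_encode(rows, reference="yellow")
print(header)
for row in encoded:
    print(row)
```

With k distinct values, dummy coding produces k-1 indicator columns, which avoids making the indicators collinear with an intercept term.
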
The GLM function implementation uses the Fisher Scoring Algorithm, which scales better than the iteratively reweighted least squares (IWLS) algorithm that the glm() function in the R package stats uses. The results of the two algorithms usually match closely. However, when the input data is highly skewed or has a large variance, the Fisher Scoring Algorithm can diverge. In that case, use knowledge of the data set and trial and error to select the optimal distribution family and link function.
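
The Fisher scoring update can be sketched in a few lines of Python for the simplest case: a binomial family with a logit link, one predictor, and an intercept. This is a minimal single-machine illustration of the technique, not the Aster implementation. The function name fisher_scoring_logistic, the starting values, the tolerance, and the sample data are all assumptions made for this example.

```python
import math


def fisher_scoring_logistic(xs, ys, max_iter=25, tol=1e-8):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Fisher scoring.

    Each iteration solves I(beta) * step = score(beta) and adds the
    step to beta. Returns (b0, b1, iterations_used).
    """
    b0, b1 = 0.0, 0.0  # assumed starting values
    for iteration in range(1, max_iter + 1):
        g0 = g1 = 0.0           # score vector (gradient of log-likelihood)
        i00 = i01 = i11 = 0.0   # symmetric 2x2 Fisher information matrix
        for x, y in zip(xs, ys):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = mu * (1.0 - mu)  # variance function of the binomial family
            g0 += y - mu
            g1 += (y - mu) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        if abs(det) < 1e-12:
            raise ArithmeticError("Fisher information is singular")
        # Closed-form solution of the 2x2 linear system.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    raise ArithmeticError("Fisher scoring did not converge")


# Hypothetical, non-separable sample data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, iterations = fisher_scoring_logistic(xs, ys)
print(b0, b1, iterations)
```

The function raises an error when the information matrix becomes singular or the iteration limit is reached, which is how a failure to converge would surface in this sketch.
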