GLM - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

dita:id

B700-1022

Product Category

Software

The generalized linear model (GLM) extends the linear regression model by relating the linear equation to the dependent variable through a link function. The GLM function performs regression analysis using a user-specified distribution family and link function. The link function is chosen based on the distribution family and the assumed nonlinear distribution of expected outcomes. The table in Background describes the supported family and link function combinations.

A GLM has three parts:

A random component: the probability distribution of Y, from the exponential family
A fixed linear component: the linear expression of the predictor values (X1, X2, ..., Xp), expressed as η or Xβ
A link function that describes the relationship of the distribution function to the expected value of Y (described in the table in Background)
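The three parts can be illustrated with a minimal Python sketch. It is not the Aster GLM implementation; the names linear_predictor, LINKS, and expected_value and the coefficient values are hypothetical, and only three common link functions are shown.

```python
import math

def linear_predictor(x, beta):
    """Fixed linear component: eta = beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1]."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Each entry is (link g, inverse link g^-1). The link maps the expected
# value mu of Y to the linear predictor eta; the inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def expected_value(x, beta, link="logit"):
    """Expected value of Y for predictors x: mu = g^-1(eta)."""
    _, inverse_link = LINKS[link]
    return inverse_link(linear_predictor(x, beta))

# Hypothetical coefficients (intercept, slope), chosen only for illustration.
beta = [-3.0, 0.5]

# The random component is the assumed distribution of Y around mu,
# for example a Bernoulli distribution when the logit link is used.
mu = expected_value([6.0], beta, "logit")
```

Here eta is -3.0 + 0.5 * 6.0 = 0, so mu is the inverse logit of 0, which is 0.5.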

GLM also supports categorical variables. For example, in the following table, size and color are independent (predictive) variables and outcome is the dependent (response) variable. Size is a quantitative variable and color is a qualitative variable (with the values yellow, blue, and red). In regression analysis, a qualitative variable is called a categorical (or dummy) variable.

Categorical Variables
size	color	outcome
10	yellow	1
5	blue	0
6	red	1
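
One common way to bring a categorical variable into a linear equation is reference-level dummy coding, sketched below in Python using the rows of the table above. This guide does not state how the GLM function encodes categorical columns internally, so treat the scheme, the helper name dummy_code, and the choice of yellow as the reference level as assumptions.

```python
def dummy_code(rows, column, reference):
    """Replace a categorical column with 0/1 indicator columns.

    The reference level gets no indicator column; a row at the reference
    level has 0 in every indicator. (Assumed scheme, for illustration.)
    """
    levels = sorted({row[column] for row in rows} - {reference})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per non-reference level.
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded

rows = [
    {"size": 10, "color": "yellow", "outcome": 1},
    {"size": 5, "color": "blue", "outcome": 0},
    {"size": 6, "color": "red", "outcome": 1},
]

encoded = dummy_code(rows, "color", reference="yellow")
```

With three color values, two indicator columns (color_blue and color_red) are enough, because the reference level is implied when both are 0.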

The GLM function implementation uses the Fisher Scoring Algorithm, which scales better than the least-squares algorithm that the glm() function in the R package stats uses. The results of the two algorithms usually match closely. However, when the input data is highly skewed or has a large variance, the Fisher Scoring Algorithm can diverge, and you must use data set knowledge and trial and error to select the optimal family and link functions.
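
To make the Fisher scoring iteration concrete, the following Python sketch fits a logistic regression (binomial family, logit link) with an intercept and a single predictor. It is a textbook illustration under those assumptions, not the Aster implementation; the function names, the stopping tolerance, and the sample data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fisher_scoring_logistic(xs, ys, max_iter=25, tol=1e-8):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Fisher scoring.

    Each iteration solves I * delta = U, where U is the score vector and
    I is the Fisher information matrix, then adds delta to the estimate.
    Returns ((b0, b1), iterations_used, converged).
    """
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        u0 = u1 = 0.0          # score vector U
        i00 = i01 = i11 = 0.0  # symmetric 2x2 Fisher information I
        for x, y in zip(xs, ys):
            mu = sigmoid(b0 + b1 * x)
            w = mu * (1.0 - mu)  # variance of a Bernoulli outcome
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        if abs(det) < 1e-12:
            # Weights have collapsed toward 0: the iteration is diverging.
            raise ArithmeticError("Fisher information is numerically singular")
        # Solve the 2x2 system with the closed-form inverse of I.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return (b0, b1), iteration, True
    return (b0, b1), max_iter, False

# Hypothetical data in which the two outcome classes overlap.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]
(b0, b1), iterations, converged = fisher_scoring_logistic(xs, ys)
```

If the outcomes are perfectly separated by the predictor, no finite maximum likelihood estimate exists and the coefficients grow without bound. In this sketch that shows up as either the singularity error or a return value with converged set to False.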