The generalized linear model (GLM) extends the linear regression model by relating the linear equation to the dependent variable through a link function. GLM performs regression analysis using a user-specified distribution family and link function. The appropriate link function depends on the distribution family and on the assumed nonlinear distribution of expected outcomes. The table in Background describes the supported combinations of distribution family and link function.
A GLM has three parts:
- A random component: the probability distribution of Y, taken from the exponential family
- A fixed linear component: the linear expression of the predictor values (X1, X2, ..., Xp), expressed as η or Xβ
- A link function that relates the expected value of Y to the linear component (described in the table in Background)
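As a minimal sketch of how the linear component and the link function fit together, the following Python snippet computes η for one observation and maps it to an expected value of Y with the logit link. The coefficient values and the function names (`linear_predictor`, `logit`, `inverse_logit`) are hypothetical and chosen only for illustration. They are not part of the GLM function's interface, and the logit link is just one of the possible links.

```python
import math

def linear_predictor(x, beta):
    """Fixed linear component: eta = beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1]."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def logit(mu):
    """Logit link: maps an expected value in (0, 1) to the linear scale."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Inverse of the logit link: maps eta back to an expected value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: intercept and one slope (assumed, not fitted).
beta = [-1.0, 0.2]
x = [10.0]                       # one observation with a single predictor

eta = linear_predictor(x, beta)  # linear component
mu = inverse_logit(eta)          # expected value of Y under the logit link

print(f"eta = {eta:.3f}, mu = {mu:.3f}")
```

Applying the link to `mu` recovers `eta`, which is the sense in which the link function connects the expected value of Y to the linear component.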
GLM also supports categorical variables. For example, in the following table, size and color are independent (predictor) variables and outcome is the dependent (response) variable. Size is a quantitative variable and color is a qualitative variable (with the values yellow, blue, and red). In regression analysis, a qualitative variable is called a categorical (or dummy) variable.
size | color | outcome |
---|---|---|
10 | yellow | 1 |
5 | blue | 0 |
6 | red | 1 |
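
One common way to use a categorical variable such as color in a regression is to expand it into 0/1 indicator columns, leaving one level out as the reference. The sketch below applies that idea to the rows of the table above in plain Python. The helper `dummy_encode`, the `color_<level>` column names, and the choice of blue as the reference level are assumptions made for illustration. The encoding that GLM applies internally may differ.

```python
# Rows copied from the table above.
rows = [
    {"size": 10, "color": "yellow", "outcome": 1},
    {"size": 5, "color": "blue", "outcome": 0},
    {"size": 6, "color": "red", "outcome": 1},
]

def dummy_encode(rows, column, reference):
    """Replace a categorical column with 0/1 indicator columns.

    The reference level gets no column of its own; a row at the
    reference level has 0 in every indicator column.
    """
    levels = sorted({r[column] for r in rows if r[column] != reference})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for level in levels:
            out[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(out)
    return encoded

# Hypothetical choice: treat "blue" as the reference level.
encoded = dummy_encode(rows, "color", reference="blue")
for row in encoded:
    print(row)
```

With three color values, two indicator columns are enough: the blue row is identified by both indicators being 0.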