1.1 - 8.10 - GLM Syntax Elements - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)
OutputTable
Specify the name for the output table of coefficients. This table must not already exist.
TargetColumns
[Optional] Specify the name of the column that contains the dependent variable (Y) followed by the names of the columns that contain the predictor variables (Xi), in this format: 'Y,X1,X2,...,Xp'.
Default behavior: The first column of the InputTable is Y and the remaining InputTable columns are Xi, except for the column specified by the WeightColumn syntax element.
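The column-selection rule above can be sketched in Python. This is only an illustration of the documented behavior, not the function's implementation; the helper name resolve_target_columns and its parameters are hypothetical.

```python
def resolve_target_columns(input_columns, target_columns=None, weight_column=None):
    """Return (response, predictors) following the TargetColumns rules above.

    Hypothetical helper: it mirrors the documented defaults only.
    """
    if target_columns:
        # Explicit form 'Y,X1,X2,...,Xp': the first name is Y, the rest are Xi.
        names = [name.strip() for name in target_columns.split(",")]
    else:
        # Default: the first InputTable column is Y and the rest are Xi,
        # except for the column named by WeightColumn.
        names = [c for c in input_columns if c != weight_column]
    return names[0], names[1:]


response, predictors = resolve_target_columns(
    ["y", "x1", "x2", "w"], weight_column="w"
)
print(response, predictors)
```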
CategoricalColumns
[Optional] Specify columnname-value pairs, each of which contains the name of a categorical input column and the category values in that column that the function is to include in the model that it creates.
Each columnname_value_pair takes one of the following forms:
  • 'columnname:max_cardinality': Limits the categories in the column to the max_cardinality most common ones and groups all others together as 'others'. For example, 'column_a:3' specifies that for column_a, the function uses the 3 most common categories and sets the category of rows that do not belong to those 3 categories to 'others'.
  • 'columnname:(category [,...])': Limits the categories in the column to those that you specify and groups all others together as 'others'. For example, 'column_a : (red, yellow, blue)' specifies that for column_a, the function uses the categories red, yellow, and blue, and sets the category of rows that do not belong to those categories to 'others'.
  • 'columnname': All category values appear in the model.
If you specify the TargetColumns syntax element, the columns that you specify in the CategoricalColumns syntax element must also appear in the TargetColumns syntax element.
For information about columns that you must identify as numeric or categorical, see Identification of Numeric and Categorical Columns.
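The two grouping forms of CategoricalColumns can be illustrated with a short Python sketch. The helper names are hypothetical, and the way ties between equally common categories are broken is an assumption (here, first-seen order), because the source does not describe it.

```python
from collections import Counter


def group_by_max_cardinality(values, max_cardinality):
    """Keep the max_cardinality most common categories; map the rest to 'others'.

    Assumption: ties are broken by first appearance (Counter.most_common order).
    """
    keep = {cat for cat, _ in Counter(values).most_common(max_cardinality)}
    return [v if v in keep else "others" for v in values]


def group_by_category_list(values, categories):
    """Keep only the listed categories; map the rest to 'others'."""
    keep = set(categories)
    return [v if v in keep else "others" for v in values]


colors = ["red", "red", "red", "blue", "blue", "green", "pink"]
# Like 'column_a:2': keep the 2 most common categories.
print(group_by_max_cardinality(colors, 2))
# Like 'column_a:(red, yellow, blue)': keep only the named categories.
print(group_by_category_list(colors, ["red", "yellow", "blue"]))
```

In both calls the rows holding green and pink end up in the 'others' category, while red and blue are kept.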
Family
[Optional] Specify the exponential family distribution, which is one of the following:
  • 'BINOMIAL' (Default)
  • 'LOGISTIC' (equivalent to 'BINOMIAL')
  • 'POISSON'
  • 'GAUSSIAN'
  • 'GAMMA'
  • 'INVERSE_GAUSSIAN'
  • 'NEGATIVE_BINOMIAL'
For Binomial/Logistic and Gaussian applications with high collinearity, Teradata recommends using GLML1L2 (ML Engine) with regularization parameters instead of GLM. GLML1L2 is expected to provide better performance and accuracy.
LinkFunction
[Optional] Specify the link function.
Default: 'CANONICAL'. The canonical link functions (default link functions) and the link functions that are allowed for each exponential family are listed in the tables in Supported Family/Link Function Combinations.
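As a reminder of what a canonical link means, the following Python sketch lists the standard textbook canonical links for several of the families above. The link labels used here are descriptive names chosen for this sketch, not necessarily the values that LinkFunction accepts, and NEGATIVE_BINOMIAL is left out on purpose; the tables in Supported Family/Link Function Combinations remain the authoritative list.

```python
import math

# Textbook canonical links g(mu), keyed by family.
# The labels are illustrative, not confirmed LinkFunction values.
CANONICAL_LINKS = {
    "BINOMIAL": ("logit", lambda mu: math.log(mu / (1.0 - mu))),
    "POISSON": ("log", lambda mu: math.log(mu)),
    "GAUSSIAN": ("identity", lambda mu: mu),
    "GAMMA": ("inverse", lambda mu: 1.0 / mu),
    "INVERSE_GAUSSIAN": ("inverse_squared", lambda mu: 1.0 / (mu * mu)),
}


def canonical_link(family, mu):
    """Return (link label, g(mu)) for the family's textbook canonical link."""
    family = family.upper()
    if family == "LOGISTIC":
        # The source states that 'LOGISTIC' is equivalent to 'BINOMIAL'.
        family = "BINOMIAL"
    label, g = CANONICAL_LINKS[family]
    return label, g(mu)


print(canonical_link("LOGISTIC", 0.5))
print(canonical_link("GAMMA", 4.0))
```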
WeightColumn
[Optional] Specify the name of an InputTable column that contains the weights to assign to responses.
You can use non-NULL weights to indicate that different observations have different dispersions (with the weights being inversely proportional to the dispersions). Equivalently, when the weights are positive integers wi, each response yi is the mean of wi unit-weight observations. A binomial GLM uses prior weights to give the number of trials when the response is the proportion of successes. A Poisson GLM rarely uses weights.
If the weight is less than the response value, the function throws an exception. Therefore, if the response value is greater than 1, you must specify a weight that is greater than or equal to the response value.
Default behavior: All observations have equal weight.
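The weight rule can be checked before the function is called. The following Python sketch is a hypothetical pre-check, not part of the product: check_weights is an invented name, and ValueError stands in for whatever exception the function itself throws.

```python
def check_weights(responses, weights=None):
    """Validate weights against responses as the WeightColumn text describes.

    Hypothetical helper: raises ValueError where the real function would
    throw its own exception.
    """
    if weights is None:
        # Default behavior: all observations have equal (unit) weight.
        weights = [1.0] * len(responses)
    for i, (y, w) in enumerate(zip(responses, weights)):
        if w < y:
            raise ValueError(f"row {i}: weight {w} is less than response {y}")
    return weights


# Binomial-style data: responses are success counts, weights are trial counts.
print(check_weights([3, 0, 7], [10, 5, 7]))

# A response greater than 1 with no explicit weight fails the check.
try:
    check_weights([2])
except ValueError as err:
    print("rejected:", err)
```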
StopThreshold
[Optional] Specify the convergence threshold.
Default: 0.01
MaxIterNum
[Optional] Specify the maximum number of iterations that the algorithm runs before quitting if the convergence threshold has not been met. The parameter max_iterations must be a positive INTEGER value.
Default: 25
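How StopThreshold and MaxIterNum interact can be sketched with a small iterative fit. The Python example below fits a one-predictor logistic model by Newton-Raphson. It assumes that convergence is judged by the change in log-likelihood between iterations; the source does not say which quantity the function compares with StopThreshold, and fit_logistic is a hypothetical name rather than the product's algorithm.

```python
import math


def fit_logistic(xs, ys, stop_threshold=0.01, max_iter_num=25):
    """Newton-Raphson fit of y ~ b0 + b1*x for a binary response.

    Returns (b0, b1, iterations_run, converged).
    Assumption: the stopping rule compares the change in log-likelihood
    with stop_threshold.
    """
    b0, b1 = 0.0, 0.0
    prev_ll = None
    for iteration in range(1, max_iter_num + 1):
        # Fitted probabilities under the current coefficients.
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of X^T W X, with W = diag(p * (1 - p)).
        h00 = sum(p * (1 - p) for p in ps)
        h01 = sum(p * (1 - p) * x for x, p in zip(xs, ps))
        h11 = sum(p * (1 - p) * x * x for x, p in zip(xs, ps))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X^T W X) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        # Log-likelihood at the updated coefficients.
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                 for y, p in zip(ys, ps))
        if prev_ll is not None and abs(ll - prev_ll) < stop_threshold:
            return b0, b1, iteration, True
        prev_ll = ll
    # The iteration budget ran out before the threshold was met.
    return b0, b1, max_iter_num, False


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_logistic(xs, ys))
print(fit_logistic(xs, ys, max_iter_num=1))
```

With max_iter_num=1 the sketch always reports that it did not converge, because it has no earlier log-likelihood to compare against.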
Intercept
[Optional] Specify whether the function uses an intercept. For example, in β0 + β1*X1 + β2*X2 + ... + βp*Xp, the intercept is β0.
Default: 'true'
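The effect of the Intercept setting on the linear predictor can be shown with a few lines of Python. The function name and the coefficient layout (intercept first) are assumptions made for this sketch only.

```python
def linear_predictor(coefficients, x, intercept=True):
    """Compute b0 + b1*X1 + ... + bp*Xp, or the same sum without b0.

    Assumption for this sketch: when intercept is True, the intercept b0
    is the first entry of coefficients.
    """
    if intercept:
        beta0, betas = coefficients[0], coefficients[1:]
    else:
        beta0, betas = 0.0, coefficients
    if len(betas) != len(x):
        raise ValueError("expected one coefficient per predictor value")
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


# With an intercept: 1.0 + 2.0*1.0 + 3.0*1.0
print(linear_predictor([1.0, 2.0, 3.0], [1.0, 1.0]))
# Without an intercept: 2.0*1.0 + 3.0*1.0
print(linear_predictor([2.0, 3.0], [1.0, 1.0], intercept=False))
```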