GLM Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage

Release Number: 9.02, 9.01, 2.0, 1.3

Published: February 2022

Language: English (United States)

Last Update: 2022-02-10

dita:id

B700-4003

OutputTable

Specify the name for the output table of coefficients. This table must not exist.

StepTable

[Optional] Specify the name for the table that the function outputs when you specify Step ('true'). This table holds the models that the function computed at each step.

Step

[Optional] Specify whether to use Stepwise Backward Regression for variable selection.

With Step ('false'), the function computes a single model and outputs it in output_table.

With Step ('true'), the function computes multiple models, selects the model with the best Akaike information criterion (AIC) score, and outputs the selected model in output_table. If you omit StepTable, you cannot access the other models that the function computed. If you specify StepTable, the function outputs the other models in step_table. The step_table includes the AIC score for each model at each step.

GLMPredict_MLE needs only the best model (output_table), but you can use step_table to understand how the function selected the best model.

Default: 'false'
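
The selection rule that Step ('true') describes can be illustrated with a minimal Python sketch. It assumes the textbook definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; this page does not state the exact formula that the function uses. The candidate models, their names, and their log-likelihood values are hypothetical and are not function output.

```python
def aic(num_params, log_likelihood):
    """Textbook AIC: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


def select_best(models):
    """Return (aic_score, name) for the model with the lowest AIC.

    models: list of dicts with keys 'name', 'num_params', 'log_likelihood'.
    """
    scored = [(aic(m["num_params"], m["log_likelihood"]), m["name"])
              for m in models]
    return min(scored)


# Hypothetical candidates, as a backward stepwise search might produce them.
candidates = [
    {"name": "x1+x2+x3", "num_params": 4, "log_likelihood": -50.0},
    {"name": "x1+x2", "num_params": 3, "log_likelihood": -50.4},
    {"name": "x1", "num_params": 2, "log_likelihood": -55.0},
]

best_score, best_name = select_best(candidates)
```

In this sketch, dropping x3 barely changes the log-likelihood, so the smaller model wins on AIC; dropping x2 as well costs more fit than the saved parameter is worth.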

TargetColumns

[Optional] Specify the names of the InputTable columns that contain the variables to use as predictors (independent variables) in the model.

The function treats every target_column as numeric unless you also specify it in CategoricalColumns.

CategoricalColumns

[Optional] Specify columnname-value pairs. Each pair names a categorical input column and determines which category values in that column the function includes in the model.

columnname_value_pair	Description
'columnname:max_cardinality'	Limits the categories in the column to the max_cardinality most common ones and groups all other values together as 'others'. For example, 'column_a:3' specifies that for column_a, the function uses the 3 most common categories and sets the category of rows that do not belong to those 3 categories to 'others'.
'columnname:(category [,...])'	Limits the categories in the column to those that you specify and groups all other values together as 'others'. For example, 'column_a : (red, yellow, blue)' specifies that for column_a, the function uses the categories red, yellow, and blue, and sets the category of rows that do not belong to those categories to 'others'.
'columnname'	All category values in the column appear in the model.
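
The two grouping forms in the table can be sketched in Python as follows. This is an illustration of the described behavior, not the function's implementation: the helper names are hypothetical, and how the function breaks ties between equally common categories is not stated on this page (the sample data avoids ties).

```python
from collections import Counter


def cap_by_cardinality(values, max_cardinality):
    """Keep the max_cardinality most common categories; map the rest to 'others'."""
    counts = Counter(values)
    keep = {category for category, _ in counts.most_common(max_cardinality)}
    return [v if v in keep else "others" for v in values]


def cap_by_list(values, categories):
    """Keep only the listed categories; map the rest to 'others'."""
    keep = set(categories)
    return [v if v in keep else "others" for v in values]


# Hypothetical column_a values: red x4, blue x3, green x2, yellow x1, pink x1.
column_a = ["red"] * 4 + ["blue"] * 3 + ["green"] * 2 + ["yellow", "pink"]

# Analogous to 'column_a:3'
top3 = cap_by_cardinality(column_a, 3)

# Analogous to 'column_a:(red, yellow, blue)'
listed = cap_by_list(column_a, ["red", "yellow", "blue"])
```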

If you specify the TargetColumns syntax element, the columns that you specify in the CategoricalColumns syntax element must also appear in the TargetColumns syntax element.

For information about columns that you must identify as numeric or categorical, see Identification of Numeric and Categorical Columns.

Family

[Optional] Specify the exponential family distribution of the response, which is one of the following:

'BINOMIAL' (Default)
'LOGISTIC' (equivalent to 'BINOMIAL')
'POISSON'
'GAUSSIAN'
'GAMMA'
'INVERSE_GAUSSIAN'
'NEGATIVE_BINOMIAL'

For Binomial/Logistic and Gaussian applications with high collinearity, Teradata recommends using GLML1L2 (ML Engine) with regularization parameters instead of GLM. GLML1L2 is expected to provide better performance and accuracy.

LinkFunction

[Optional] Specify the link function.

Default: 'CANONICAL'. The canonical link functions (default link functions) and the link functions that are allowed for each exponential family are listed in the tables in Supported Family/Link Function Combinations.
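
For orientation, the following Python sketch lists the textbook canonical link for five of the families, each paired with its inverse. It is an assumption-labeled illustration: the tables in Supported Family/Link Function Combinations remain the authority on which link the function applies for each family, and the dictionary name and structure here are hypothetical. NEGATIVE_BINOMIAL is omitted because its canonical link depends on a dispersion parameter that this page does not describe.

```python
import math

# family -> (link g(mu), inverse link g^-1(eta)); textbook canonical forms.
CANONICAL_LINKS = {
    "BINOMIAL": (lambda mu: math.log(mu / (1.0 - mu)),        # logit
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "POISSON": (math.log, math.exp),                          # log
    "GAUSSIAN": (lambda mu: mu, lambda eta: eta),             # identity
    "GAMMA": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),    # inverse
    "INVERSE_GAUSSIAN": (lambda mu: 1.0 / mu ** 2,            # inverse squared
                         lambda eta: 1.0 / math.sqrt(eta)),
}
# 'LOGISTIC' is documented above as equivalent to 'BINOMIAL'.
CANONICAL_LINKS["LOGISTIC"] = CANONICAL_LINKS["BINOMIAL"]

link, inverse_link = CANONICAL_LINKS["BINOMIAL"]
eta = link(0.5)            # linear predictor for a mean of 0.5
mu = inverse_link(eta)     # maps the linear predictor back to the mean
```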

WeightColumn

[Optional] Specify the name of an InputTable column that contains the weights to assign to responses.

You can use non-NULL weights to indicate that different observations have different dispersions (with the weights being inversely proportional to the dispersions). Equivalently, when the weights are positive integers w_i, each response y_i is the mean of w_i unit-weight observations. A binomial GLM uses prior weights to give the number of trials when the response is the proportion of successes. A Poisson GLM rarely uses weights.

If the weight is less than the response value, the function throws an exception. Therefore, if the response value is greater than 1, you must specify a weight that is greater than or equal to the response value.

Default behavior: All observations have equal weight.
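
The weight constraint described above can be checked before calling the function. The following Python sketch is a hypothetical pre-check, not part of the product: it takes (response, weight) pairs and raises an error for the first row whose weight is less than its response, which is the condition under which the function is documented to throw an exception. The sample rows are invented.

```python
def check_weights(rows):
    """Raise ValueError if any row has weight < response.

    rows: iterable of (response, weight) pairs with non-NULL values.
    """
    for index, (response, weight) in enumerate(rows):
        if weight < response:
            raise ValueError(
                f"row {index}: weight {weight} is less than response {response}"
            )
    return True


# Hypothetical binomial data: successes as the response, trials as the weight.
valid_rows = [(3, 10), (0, 5), (7, 7)]
invalid_rows = [(3, 10), (8, 5)]   # 8 successes cannot come from 5 trials

valid_ok = check_weights(valid_rows)

try:
    check_weights(invalid_rows)
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```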

StopThreshold

[Optional] Specify the convergence threshold.

Default: 0.01

MaxIterNum

[Optional] Specify the maximum number of iterations that the algorithm runs before quitting if the convergence threshold has not been met. The parameter max_iterations must be a positive INTEGER value.

Default: 25
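
How StopThreshold and MaxIterNum interact can be seen in a minimal Python sketch of an iterative fit. It fits a logistic model with an intercept and one predictor by Newton-Raphson, and it assumes that convergence means the change in log-likelihood between iterations falls below the threshold; this page does not state which quantity the function compares against StopThreshold. The function name, return shape, and data are hypothetical.

```python
import math


def fit_logistic(xs, ys, stop_threshold=0.01, max_iter_num=25):
    """Newton-Raphson for logit(p) = b0 + b1*x.

    Returns (b0, b1, iterations_used, converged).
    """
    if max_iter_num < 1:
        raise ValueError("max_iter_num must be a positive integer")
    b0 = b1 = 0.0
    prev_ll = None
    for iteration in range(1, max_iter_num + 1):
        # Accumulate the gradient (g) and the 2x2 information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system h * delta = g directly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        # Log-likelihood at the updated coefficients.
        ll = sum(y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x))
                 for x, y in zip(xs, ys))
        if prev_ll is not None and abs(ll - prev_ll) < stop_threshold:
            return b0, b1, iteration, True
        prev_ll = ll
    # Iteration limit reached before the threshold was met.
    return b0, b1, max_iter_num, False


# Hypothetical, non-separable data.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]

b0, b1, iterations, converged = fit_logistic(xs, ys)
_, _, _, converged_one_step = fit_logistic(xs, ys, max_iter_num=1)
```

A smaller threshold generally costs more iterations, and a low iteration limit can stop the fit before the threshold is met, as the second call shows.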

Intercept

[Optional] Specify whether the function uses an intercept. For example, in β0 + β1*X1 + β2*X2 + ... + βp*Xp, the intercept is β0.

Default: 'true'
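
The effect of the intercept on the linear predictor can be written out in a few lines of Python. The helper name and the coefficient values are hypothetical; the sketch only evaluates the expression given above with and without β0.

```python
def linear_predictor(coefficients, x, intercept=True):
    """Evaluate b0 + b1*x1 + ... + bp*xp, or the same sum without b0.

    With intercept=True, coefficients[0] is b0 and the rest pair with x.
    With intercept=False, every coefficient pairs with an element of x.
    """
    if intercept:
        b0, betas = coefficients[0], coefficients[1:]
    else:
        b0, betas = 0.0, coefficients
    if len(betas) != len(x):
        raise ValueError("number of coefficients does not match number of predictors")
    return b0 + sum(b * xi for b, xi in zip(betas, x))


with_intercept = linear_predictor([1.0, 2.0, 3.0], [10, 100])
without_intercept = linear_predictor([2.0, 3.0], [10, 100], intercept=False)
```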