GLM Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product
Aster Analytics
Release Number
7.00.02
Published
September 2017
Language
English (United States)
Last Update
2018-04-17
dita:id
B700-1022
Product Category
Software
InputTable
Specifies the name of the table that contains the columns described in the table in Input.
OutputTable
Specifies the name of the output table in which the function stores the coefficients. This table must not already exist. GLM also writes its output to the screen.
InputColumns
[Optional] Specifies the name of the column that contains the dependent variable (Y), followed by the names of the columns that contain the predictor variables (Xi), in this format: 'Y,X1,X2,...,Xp'.

Default behavior: The first column of the input table is Y and the remaining input table columns are Xi, except for the column specified by the Weight argument.
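
The default behavior can be illustrated with a minimal Python sketch. It is not part of the GLM function: the helper name resolve_columns and its parameters are hypothetical, and the sketch assumes the Weight column is only removed from the predictors in the default case.

```python
def resolve_columns(table_columns, input_columns=None, weight=None):
    """Return (response, predictors) for a list of input table column names.

    Hypothetical helper that mirrors the documented InputColumns behavior.
    """
    if input_columns:
        # Explicit 'Y,X1,...,Xp' string: the first name is the response.
        names = [c.strip() for c in input_columns.split(",")]
        return names[0], names[1:]
    # Default: first table column is Y, the rest are Xi,
    # except for the column named by the Weight argument.
    response = table_columns[0]
    predictors = [c for c in table_columns[1:] if c != weight]
    return response, predictors
```

For example, resolve_columns(['y', 'a', 'b', 'w'], weight='w') treats y as the response and a and b as the predictors.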

CategoricalColumns
[Optional] Specifies columnname-value pairs, each of which contains the name of a categorical input column and the category values in that column that the function is to include in the model that it generates.

Each columnname-value pair has one of these forms:

  • 'columnname:max_cardinality'

    Limits the categories in the column to the max_cardinality most common ones and groups the others together as 'others'. For example, 'column_a:3' specifies that for column_a, the function uses the 3 most common categories and sets the category of the rows that do not belong to those 3 categories to 'others'.

  • 'columnname:(category [, ...])'

    Limits the categories in the column to those that you specify and groups the others together as 'others'. For example, 'column_a : (red, yellow, blue)' specifies that for column_a, the function uses the categories red, yellow, and blue, and sets the category of the rows that do not belong to those categories to 'others'.

  • 'columnname'

    All category values appear in the model.

If you specify the InputColumns argument, the columns that you specify in the CategoricalColumns argument must also appear in the InputColumns argument.
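
The three forms can be sketched in Python as follows. This is an illustration of the grouping rules only, not the function's implementation: parse_spec and apply_rule are hypothetical names, and the sketch assumes that ties among equally common categories may be broken arbitrarily.

```python
from collections import Counter

def parse_spec(spec):
    """Split one columnname-value pair into (column, rule).

    rule is None (keep all), an int (max_cardinality), or a list of categories.
    """
    name, sep, rest = spec.partition(":")
    name, rest = name.strip(), rest.strip()
    if not sep:
        return name, None  # 'columnname': every category appears in the model
    if rest.startswith("(") and rest.endswith(")"):
        return name, [v.strip() for v in rest[1:-1].split(",")]
    return name, int(rest)  # 'columnname:max_cardinality'

def apply_rule(values, rule):
    """Replace every value outside the kept categories with 'others'."""
    if rule is None:
        return list(values)
    if isinstance(rule, int):
        # Keep the `rule` most common categories (tie order is arbitrary).
        keep = {v for v, _ in Counter(values).most_common(rule)}
    else:
        keep = set(rule)
    return [v if v in keep else "others" for v in values]
```

Applying the rule for 'column_a:2' to the values red, red, red, blue, blue, green keeps red and blue and maps green to 'others'.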

Family
[Optional] Specifies the exponential family distribution. Supported values are:
  • 'BINOMIAL' (Default)
  • 'LOGISTIC' (equivalent to 'BINOMIAL')
  • 'POISSON'
  • 'GAUSSIAN'
  • 'GAMMA'
  • 'INVERSE_GAUSSIAN'
  • 'NEGATIVE_BINOMIAL'
Link
[Optional] Specifies the link function. Default: 'CANONICAL'. The canonical link functions (default link functions) and the link functions that are allowed for each exponential family are listed in the table in Background.
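
For orientation, the textbook canonical links for several of these families can be written as Python functions of the mean mu. This sketch reflects standard GLM theory (up to sign and constant factors), not the table in Background, which remains the authority on the link names and combinations that the function accepts. NEGATIVE_BINOMIAL is omitted because its canonical link depends on a dispersion parameter.

```python
import math

# Textbook canonical link g(mu) per family, up to sign and constant factors.
# The dictionary name and its use of family names as keys are illustrative only.
CANONICAL_LINKS = {
    "BINOMIAL": lambda mu: math.log(mu / (1.0 - mu)),  # logit
    "POISSON": math.log,                               # log
    "GAUSSIAN": lambda mu: mu,                         # identity
    "GAMMA": lambda mu: 1.0 / mu,                      # inverse
    "INVERSE_GAUSSIAN": lambda mu: 1.0 / mu ** 2,      # inverse squared
}
# 'LOGISTIC' is documented as equivalent to 'BINOMIAL'.
CANONICAL_LINKS["LOGISTIC"] = CANONICAL_LINKS["BINOMIAL"]
```
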
Weight
[Optional] Specifies the name of an input table column that contains the weights to assign to responses. Default behavior: All observations have equal weight.

You can use non-NULL weights to indicate that different observations have different dispersions (with the weights being inversely proportional to the dispersions). Equivalently, when the weights are positive integers wi, each response yi is the mean of wi unit-weight observations. A binomial GLM uses prior weights to give the number of trials when the response is the proportion of successes. A Poisson GLM rarely uses weights.

If the weight is less than the response value, the function throws an exception. Therefore, if the response value is greater than 1, you must specify a weight that is greater than or equal to the response value.
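
The constraint can be checked before calling the function. The following Python sketch is a hypothetical pre-check, not the function's own validation; the name check_weights and the choice of ValueError are assumptions.

```python
def check_weights(responses, weights=None):
    """Return the weights to use, or raise if any weight is below its response."""
    if weights is None:
        # Default behavior: all observations have equal weight.
        weights = [1.0] * len(responses)
    for row, (y, w) in enumerate(zip(responses, weights)):
        if w < y:
            raise ValueError(f"row {row}: weight {w} is less than response {y}")
    return weights
```

For a binomial response stored as a count of successes, passing the number of trials as the weight satisfies this check, because the number of successes cannot exceed the number of trials.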

Threshold
[Optional] Specifies the convergence threshold. Default: 0.01.
MaxIterNum
[Optional] Specifies the maximum number of iterations that the algorithm runs before quitting if the convergence threshold has not been met. The value must be a positive INTEGER. Default: 25.
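
How Threshold and MaxIterNum interact can be sketched with a deliberately small model: Newton's method for an intercept-only Poisson GLM with a log link. The convergence test used here (absolute size of the last update) is an assumption; this guide does not state which quantity the function compares against the threshold.

```python
import math

def fit_poisson_intercept(y, threshold=0.01, max_iter_num=25):
    """Fit beta0 in log(mu) = beta0 by Newton's method.

    Returns (beta0, iterations_used, converged).
    """
    mean_y = sum(y) / len(y)
    beta = 0.0
    for iteration in range(1, max_iter_num + 1):
        # Newton step for this model: score / information = mean_y / exp(beta) - 1.
        step = mean_y / math.exp(beta) - 1.0
        beta += step
        if abs(step) < threshold:  # assumed convergence criterion
            return beta, iteration, True
    # Iteration limit reached before the threshold was met.
    return beta, max_iter_num, False
```

With the defaults, the responses 2, 4, 6 converge to a value close to log(4); with max_iter_num=2 the loop stops early and reports that it did not converge.
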
Intercept
[Optional] Specifies whether the function uses an intercept. For example, in β0 + β1*X1 + β2*X2 + ... + βp*Xp, the intercept is β0. Default: 'true'.
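
The effect of the intercept on the linear predictor can be sketched in Python. The function name and the convention that the intercept is stored first in the coefficient list are assumptions made for illustration.

```python
def linear_predictor(x, coefficients, intercept=True):
    """Compute the linear predictor for one row of predictor values x."""
    if intercept:
        # coefficients = [b0, b1, ..., bp]
        beta0, betas = coefficients[0], coefficients[1:]
    else:
        # coefficients = [b1, ..., bp]; the model passes through the origin
        beta0, betas = 0.0, coefficients
    return beta0 + sum(b * xi for b, xi in zip(betas, x))
```
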
Step
[Optional] Specifies whether the function uses a step. Default: 'false'. If the function uses a step, it proceeds as follows: it continues with the GLM model that has the lowest Akaike information criterion (AIC) score, drops one predictor from the current predictor group, and repeats this process until no predictor remains.
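
One reading of the Step description is backward elimination guided by AIC, sketched below in Python. This is an interpretation, not the function's implementation: backward_steps is a hypothetical name, and aic_of stands in for fitting a GLM on the given predictors and returning its AIC score.

```python
def backward_steps(predictors, aic_of):
    """Return the list of (predictors, aic) models visited, full model first.

    aic_of is a hypothetical callable: list of predictor names -> AIC score.
    """
    current = list(predictors)
    path = [(tuple(current), aic_of(current))]
    while current:
        # Build every model that drops exactly one predictor,
        # then continue with the one that has the lowest AIC.
        candidates = [[p for p in current if p != drop] for drop in current]
        current = min(candidates, key=aic_of)
        path.append((tuple(current), aic_of(current)))
    return path
```

The returned path ends with the model that has no predictors; the entry with the lowest AIC along the path is the natural candidate for the final model.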