GLM2 Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

dita:id

B700-1022

Product Category

Software

InputTable

Specifies the name of the input table, which contains the columns described in the table in the Input section.

ModelTable

Specifies the name for the output table that contains the trained model. The trained model contains parameters, statistics, and the coefficients of the predictors for lambda. This table must not already exist.

RegularizationTable

[Optional] Specifies the name for the output table that contains the statistics and coefficients for each lambda. Recommended if you want GLM2Predict to output predictions for each lambda.

InputColumns

Specifies the names of the input_table columns that contain the variables to use as predictors (independent variables).

CategoricalColumns

[Optional] Specifies the names of the input_table columns that contain categorical variables, and which of their categories to use in the model. Default behavior: The function treats all variables as numerical.

Each categorical_column_and_categories has one of these formats:

'categorical_column:max_cardinality'
Uses the most common categories in categorical_column and groups the other categories into the category 'others'. For example, 'column_a:3' specifies that for column_a, the function uses the 3 most common categories and sets the category of the rows that do not belong to those 3 categories to 'others'.
'categorical_column:(category [,...])'
Uses the specified categories of categorical_column and groups the other categories into the category 'others'. For example, 'column_a : (red, yellow, blue)' specifies that for column_a, the function uses the categories red, yellow, and blue, and sets the category of the rows that do not belong to those categories to 'others'.
'categorical_column'
Uses all categories in categorical_column.

If you use this argument, you must also specify each categorical_column in the InputColumns argument.
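The three CategoricalColumns formats can be pictured with a small Python sketch. The helper `group_categories` is hypothetical and only mimics the documented grouping on an in-memory list. How GLM2 breaks frequency ties for 'categorical_column:max_cardinality' is not stated here, so the first-seen tie-breaking below is an assumption.

```python
from collections import Counter


def group_categories(values, spec=None):
    """Hypothetical illustration of the CategoricalColumns formats.

    spec is None         -> 'categorical_column': keep every category
    spec is an int k     -> 'categorical_column:k': keep the k most common
    spec is a collection -> 'categorical_column:(c1, c2, ...)': keep those
    Every category that is not kept becomes 'others'.
    """
    if spec is None:
        return list(values)
    if isinstance(spec, int):
        # Counter.most_common orders equal counts by first appearance;
        # GLM2's own tie-breaking rule is an unknown here.
        keep = {cat for cat, _ in Counter(values).most_common(spec)}
    else:
        keep = set(spec)
    return [v if v in keep else "others" for v in values]


colors = ["red", "red", "blue", "green", "blue", "red", "pink", "yellow"]
print(group_categories(colors, 2))
# ['red', 'red', 'blue', 'others', 'blue', 'red', 'others', 'others']
print(group_categories(colors, ["red", "yellow", "blue"]))
# ['red', 'red', 'blue', 'others', 'blue', 'red', 'others', 'yellow']
```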

WeightColumn

[Optional] Specifies the name of the input_table column that contains the weights to assign to responses. Default: 1.

You can use non-NULL weights to indicate that different observations have different dispersions (with the weights being inversely proportional to the dispersions). Equivalently, when the weights are positive integers $w_i$, each response $y_i$ is the mean of $w_i$ unit-weight observations. A binomial GLM uses prior weights to give the number of trials when the response is the proportion of successes. A Poisson GLM rarely uses weights.

If the weight is less than the response value, the function throws an exception. Therefore, if the response value is greater than 1 (the default weight), you must specify a weight that is greater than or equal to the response value.
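The weight rule above amounts to a per-row check. This is a minimal Python sketch under stated assumptions: the name `check_weights` is hypothetical, a missing weight column is modeled as every weight being 1 (the documented default), and the exception GLM2 throws is modeled as a `ValueError`.

```python
def check_weights(responses, weights=None):
    """Hypothetical check mirroring the WeightColumn rule.

    With no weight column, every weight defaults to 1.
    Raises ValueError when any weight is smaller than its response value.
    """
    if weights is None:
        weights = [1.0] * len(responses)
    for i, (y, w) in enumerate(zip(responses, weights)):
        if w < y:
            raise ValueError(f"row {i}: weight {w} is less than response {y}")
    return weights


check_weights([0, 1])          # passes: default weight 1 covers responses <= 1
check_weights([3, 2], [5, 2])  # passes: each weight >= its response
```

A response of 3 with the default weight of 1 would fail, which is why responses greater than 1 need an explicit weight column.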

ResponseColumn

Specifies the name of the input_table column that contains the responses.

Family

[Optional] Specifies the exponential-family distribution of the response. Default: 'GAUSSIAN'.

Lambda

[Optional. Disallowed if NumLambdas is specified.] Specifies the regularization parameter sequence. Each lambda must be a nonnegative DOUBLE PRECISION value. A value of zero disables regularization. Default behavior: The function computes the regularization parameter sequence using the NumLambdas and MinLambdaRatio argument values.

NumLambdas

[Required if Lambda is omitted, otherwise disallowed] Specifies the number of lambda values in the regularization parameter sequence. The num_lambdas must be a positive INTEGER. Default: 100. Maximum: 10,000. The function uses num_lambdas and min_lambda_ratio to compute the regularization parameter sequence.

MinLambdaRatio

[Required if Lambda is omitted, otherwise disallowed] Specifies the minimum lambda value in the regularization parameter sequence (MinLambda) as a fraction of the maximum lambda value in the regularization parameter sequence (MaxLambda). The min_lambda_ratio must be in [0, 1). Default: 0.05 if the number of rows (observations) in the input data set is less than the number of predictors (independent variables), otherwise 0.0001.

To calculate the value of MaxLambda, the function uses the input data set.

To calculate the value of MinLambda, the function uses this formula:

MinLambda = MaxLambda * min_lambda_ratio

The sequence decreases geometrically from MaxLambda to MinLambda. To calculate the ratio between consecutive lambda values, the function uses this formula:

min_lambda_ratio^(1/(num_lambdas-1))
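The following Python sketch builds the regularization parameter sequence from the NumLambdas and MinLambdaRatio rules. GLM2 computes MaxLambda from the input data set in a way not described here, so `max_lambda` is simply passed in. The function names are hypothetical, and returning `[max_lambda]` when `num_lambdas` is 1 (where the formula would divide by zero) is an assumption.

```python
def default_min_lambda_ratio(num_rows, num_predictors):
    # Documented default: 0.05 when there are fewer observations than
    # predictors, otherwise 0.0001.
    return 0.05 if num_rows < num_predictors else 0.0001


def lambda_sequence(max_lambda, num_lambdas=100, min_lambda_ratio=0.0001):
    """Geometric sequence from MaxLambda down to MaxLambda * min_lambda_ratio."""
    if not isinstance(num_lambdas, int) or not 1 <= num_lambdas <= 10_000:
        raise ValueError("num_lambdas must be a positive integer <= 10,000")
    if not 0 <= min_lambda_ratio < 1:
        raise ValueError("min_lambda_ratio must be in [0, 1)")
    if num_lambdas == 1:
        return [max_lambda]  # assumption: a single lambda is just MaxLambda
    # Ratio between consecutive lambdas: min_lambda_ratio^(1/(num_lambdas-1))
    step = min_lambda_ratio ** (1 / (num_lambdas - 1))
    # After num_lambdas - 1 steps the value reaches exactly MinLambda.
    return [max_lambda * step ** k for k in range(num_lambdas)]


print(lambda_sequence(1.0, 3, 0.01))  # roughly [1.0, 0.1, 0.01]
```

Because each lambda is the previous one times the same ratio, the values are evenly spaced on a log scale, which spends more of the sequence on small lambdas than a linear spacing would.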

Threshold

[Optional] Specifies the convergence threshold of coordinate descent. The threshold must be a nonnegative DOUBLE PRECISION value. Default: 1.0e-7.

Alpha

[Optional] Specifies the mixing parameter for penalty computation (see the following table). The alpha must be in [0, 1]. If alpha is in (0,1), it represents α in the elastic net regularization formula in Background. Default: 0.1.

alpha	Regularization	Penalty Term
0	Ridge	$\frac{1}{2}\|\beta\|_2^2$
(0,1)	Elastic net	$\frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1$
1	LASSO	$\|\beta\|_1$
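The penalty column of the table can be computed directly. This Python sketch is illustrative: the function name is hypothetical, it assumes the intercept is not penalized, and in the usual elastic net formulation the objective adds lambda times this value.

```python
def elastic_net_penalty(beta, alpha):
    """(1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1 for coefficients beta.

    alpha = 0 reduces to the ridge penalty (1/2) * ||beta||_2^2 and
    alpha = 1 reduces to the LASSO penalty ||beta||_1.
    The intercept is assumed to be excluded from beta.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return (1 - alpha) / 2 * l2_squared + alpha * l1


beta = [3.0, -4.0]  # ||beta||_1 = 7, ||beta||_2^2 = 25
print(elastic_net_penalty(beta, 0.0))  # 12.5 (ridge)
print(elastic_net_penalty(beta, 1.0))  # 7.0 (LASSO)
```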

MaxIterNum

[Optional] Specifies the maximum number of iterations over the data for all lambda values. The parameter max_iterations must be a positive INTEGER value. Default: 100,000.
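To show how Alpha, Threshold, and MaxIterNum interact in coordinate descent, here is a small Python sketch of cyclic coordinate descent for the Gaussian family at a single lambda. It uses the common glmnet-style soft-thresholding update. The objective scaling, the stopping rule (largest change in any coefficient below `threshold`), and all names are assumptions, not GLM2's documented internals. Unlike MaxIterNum, which caps iterations across all lambda values, `max_iter` here caps passes for one lambda only.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_gaussian_elastic_net(X, y, lam, alpha=0.1, threshold=1e-7,
                             max_iter=100_000, intercept=True):
    """Hypothetical sketch: cyclic coordinate descent at ONE lambda, minimizing
        (1/(2n)) * sum_i (y_i - b0 - x_i . beta)^2
        + lam * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1).
    X is a list of rows. Returns (b0, beta, passes_used)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n if intercept else 0.0
    r = [yi - b0 for yi in y]  # residuals y_i - b0 - x_i . beta (beta starts at 0)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for passes in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(p):
            denom = col_sq[j] + lam * (1 - alpha)
            if denom == 0.0:
                continue  # all-zero column and no ridge term: leave beta[j] unchanged
            old = beta[j]
            # (1/n) * sum_i x_ij * (residual with feature j's contribution added back)
            z = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * old
            new = soft_threshold(z, lam * alpha) / denom
            if new != old:
                for i in range(n):
                    r[i] -= X[i][j] * (new - old)  # keep residuals in sync
                max_change = max(max_change, abs(new - old))
                beta[j] = new
        if intercept:
            shift = sum(r) / n  # re-center so that b0 = mean(y - X beta)
            b0 += shift
            r = [ri - shift for ri in r]
            max_change = max(max_change, abs(shift))
        if max_change < threshold:  # assumed convergence test
            return b0, beta, passes
    return b0, beta, max_iter
```

For example, with `X = [[1.0], [2.0], [3.0], [4.0]]`, `y = [3.0, 5.0, 7.0, 9.0]` (exactly y = 1 + 2x), and `lam=0.0`, the sketch returns b0 close to 1 and beta close to [2]. A very large `lam` with `alpha=1.0` instead zeroes the coefficient on the first pass.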

Intercept

[Optional] Specifies whether the function uses an intercept. For example, in $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$, the intercept is $\beta_0$. Default: 'true'.
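The linear predictor in the example above can be written as a short Python sketch. The helper name is hypothetical, and treating a model without an intercept as one with $\beta_0$ fixed at 0 is an illustrative assumption about Intercept('false').

```python
def linear_predictor(x, beta, beta0=0.0):
    """beta0 + beta1*x1 + ... + betap*xp; beta0 stays 0 when no intercept is fitted."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    return beta0 + sum(b * xi for b, xi in zip(beta, x))


print(linear_predictor([1.0, 2.0], [3.0, 4.0], beta0=0.5))  # 11.5 (with intercept)
print(linear_predictor([1.0, 2.0], [3.0, 4.0]))             # 11.0 (no intercept)
```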