GLMPerSegment Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10
dita:id: B700-4003
TargetColumns
[Required with CategoricalColumns when AttributeTable is not specified, optional otherwise.] Specify the names of the input table columns that contain the variables to use as predictors (independent variables) in the models.
Every target_column is numerical unless you specify it with CategoricalColumns.
CategoricalColumns
[Optional] Specify the names of the input table columns to treat as categorical variables.
For information about columns that you must identify as numeric or categorical, see Identification of Numeric and Categorical Columns.
ResponseColumn
Specify the name of the input table column that contains the responses.
Family
[Optional] Specify the exponential family of the response distribution.
Default: 'GAUSSIAN'
FitMethod
[Optional] Specify the optimization method, BFGS (Broyden–Fletcher–Goldfarb–Shanno algorithm) or Fisher (Fisher scoring).
When FitMethod is Fisher and the data has moderate or high collinearity, the order of the rows can significantly influence the coefficient values of the model. This sensitivity to row order is intrinsic to the algorithm, not a consequence of the implementation, and it can make results nondeterministic. For deterministic results, use ORDER BY.
If you specify Fisher, you cannot specify Alpha or RegularizationLambda.
Default: BFGS
Alpha
[Disallowed with FitMethod ('Fisher'), optional otherwise.] Specify the mixing parameter for penalty computation (see the following table). The value alpha must be in the range [0, 1]. If alpha is in (0, 1), it represents α in the elastic net regularization formula in Generalized Linear Model (GLM) Functions (ML Engine).
alpha    Regularization Type    Parameter Description
0        Ridge                  Formula for ridge regularization, used by Machine Learning Engine function GLML1L2
(0,1)    Elastic net            Formula for elastic net regularization, used by Machine Learning Engine function GLML1L2
1        LASSO                  Formula for LASSO regularization, used by Machine Learning Engine function GLML1L2
Default: 0
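As a rough illustration of how alpha mixes the two penalties, the following sketch uses the common glmnet-style convention, penalty = lambda * (alpha * L1 + (1 - alpha) / 2 * L2). That convention and the helper name elastic_net_penalty are assumptions made for illustration only; the formula the function actually uses is the one given in Generalized Linear Model (GLM) Functions (ML Engine).

```python
def elastic_net_penalty(coefs, alpha, lam):
    """Penalty term under an assumed glmnet-style convention.

    This is an illustrative sketch, not the confirmed GLMPerSegment formula.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(b) for b in coefs)   # LASSO part: sum of absolute values
    l2 = sum(b * b for b in coefs)    # ridge part: sum of squares
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)
```

Under this convention, alpha = 0 leaves only the ridge term, alpha = 1 leaves only the LASSO term, and values in between blend the two, which matches the rows of the table above.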
RegularizationLambda
[Disallowed with FitMethod ('Fisher'), optional otherwise.] Specify the parameter that controls the magnitude of the regularization term. The value lambda must be in the range [0, 100]. The value 0 disables regularization.
The function also accepts the syntax element name Lambda, which was the name of RegularizationLambda in the GLML1L2 function before release MLE 8.10. See Argument Name Changes for Vantage 1.1 and Above.
Default: 0
StopThreshold
[Optional] Specify the convergence threshold. The threshold must be a nonnegative DOUBLE PRECISION value.
Default: 1.0e-7
MaxIterNum
[Optional] Specify the maximum number of iterations over the data. The parameter max_iterations must be a positive INTEGER value in the range [1, 100000].
Default: 10000
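The constraints on the optimizer-related syntax elements described so far can be summarized as a small pre-flight check. The sketch below is a hypothetical client-side helper, not part of Vantage; the function name, its parameter names, and the case-insensitive handling of FitMethod are assumptions.

```python
def check_optimizer_args(fit_method="BFGS", alpha=None, reg_lambda=None,
                         stop_threshold=1.0e-7, max_iter_num=10000):
    """Return a list of rule violations; an empty list means the values are allowed.

    alpha and reg_lambda are None when the caller does not specify them.
    """
    problems = []
    method = fit_method.upper()  # assumption: treat the value as case-insensitive
    if method not in ("BFGS", "FISHER"):
        problems.append("FitMethod must be BFGS or Fisher")
    if method == "FISHER" and (alpha is not None or reg_lambda is not None):
        problems.append("Alpha and RegularizationLambda are disallowed with Fisher")
    if alpha is not None and not 0.0 <= alpha <= 1.0:
        problems.append("Alpha must be in [0, 1]")
    if reg_lambda is not None and not 0.0 <= reg_lambda <= 100.0:
        problems.append("RegularizationLambda must be in [0, 100]")
    if stop_threshold < 0:
        problems.append("StopThreshold must be nonnegative")
    if not (isinstance(max_iter_num, int) and 1 <= max_iter_num <= 100000):
        problems.append("MaxIterNum must be an integer in [1, 100000]")
    return problems
```

For example, check_optimizer_args(fit_method="Fisher", alpha=0.5) reports one violation, because Alpha cannot be combined with Fisher scoring.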
FeatureScale
[Optional] Specify whether to scale numeric target columns to the range [-1, 1].
Default: 'false' (no scaling)
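The documentation does not state which scaling formula is used. One common way to map a numeric column onto [-1, 1] is min-max scaling, sketched below; the helper name and the handling of constant columns are assumptions.

```python
def scale_to_unit_range(values):
    """Min-max scale a numeric column to [-1, 1] (assumed formula, for illustration)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

With this formula, the smallest value in the column maps to -1, the largest to 1, and the midpoint to 0.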
CategoricalEncoding
[Optional with CategoricalColumns, disallowed otherwise.] Specify the encoding scheme to use for categorical variables.
Option      Description
'Onehot'    Expands each category of the corresponding column into a new column. For example, if column Programming has categories a, b, and c, the function replaces column Programming with columns Programming_a and Programming_b and drops Programming_c. Can create very high dimensionality: as the number of unique categories in a categorical column increases, the number of features increases.
'Target'    Uses the target encoding described in A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems (https://dl.acm.org/citation.cfm?id=507538). Does not create high dimensionality, but requires careful validation, because it is prone to overfitting when the distributions of categorical variables in the training data and test data differ significantly.

Default: 'Onehot'
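The 'Onehot' example with column Programming can be reproduced with a short sketch. The helper below is hypothetical; in particular, dropping the last category in sorted order is an assumption chosen to match the example, not a documented rule.

```python
def onehot_drop_last(column_name, values):
    """One-hot encode a categorical column, dropping one category as the reference level."""
    categories = sorted(set(values))
    kept = categories[:-1]  # assumption: the last category in sorted order is dropped
    # One output row per input value, with a 0/1 indicator per kept category.
    return [{f"{column_name}_{c}": int(v == c) for c in kept} for v in values]
```

A column with m distinct categories produces m - 1 indicator columns, which is why the number of features grows with the number of unique categories.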
MinSamplesForEncoding
[Optional with CategoricalEncoding ('Target'), disallowed otherwise.] Specify the minimum number of samples for target encoding, which is k in the following formula:
$\gamma(n) = \frac{1}{1 + e^{-(n - k)/f}}$
The target encoding algorithm uses the hyperparameter γ.
MinSamplesForEncoding is the same as the min_samples_leaf parameter in https://contrib.scikit-learn.org/categorical-encoding/targetencoder.html.
Default: 1
Smoothing
[Optional with CategoricalEncoding ('Target'), disallowed otherwise.] Specify the smoothing parameter for target encoding, which is f in the following formula:
$\gamma(n) = \frac{1}{1 + e^{-(n - k)/f}}$
The target encoding algorithm uses the hyperparameter γ.
Smoothing is the same as the smoothing parameter in https://contrib.scikit-learn.org/categorical-encoding/targetencoder.html.
Default: 1.0
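To show how k (MinSamplesForEncoding) and f (Smoothing) interact, the following sketch follows the blending used by the scikit-learn-contrib TargetEncoder referenced above: each category is encoded as a weighted mix of its own mean response and the overall mean, with the weight given by the formula and n taken as the number of rows in the category. Whether GLMPerSegment computes the encoding in exactly this way is an assumption, and the function names are hypothetical.

```python
import math

def gamma_weight(n, k=1, f=1.0):
    """Weight from the formula above: 1 / (1 + e^(-(n - k) / f)); f must be positive."""
    return 1.0 / (1.0 + math.exp(-(n - k) / f))

def target_encode(categories, responses, k=1, f=1.0):
    """Map each category to a blend of its mean response and the overall mean."""
    prior = sum(responses) / len(responses)  # overall mean response
    sums, counts = {}, {}
    for c, y in zip(categories, responses):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    encoding = {}
    for c, n in counts.items():
        w = gamma_weight(n, k, f)
        encoding[c] = w * (sums[c] / n) + (1.0 - w) * prior
    return encoding
```

A category with exactly k rows gets weight 0.5. Categories with many more rows than k are encoded almost entirely by their own mean, and a larger f makes the transition between the two regimes more gradual.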
ErrorHandler
[Optional] Specify whether the function continues with the next partition (model) when it encounters an error:
Value      Function Behavior
'true'     Function skips partition where error occurs and continues with next partition. Output table displays error for the partitions for which an error occurs.
'false'    Function stops and reports error.
Default: 'false'
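The two ErrorHandler behaviors correspond to a familiar pattern when fitting one model per partition. The sketch below is a conceptual illustration in plain Python, not the function's implementation; fit_per_partition and mean_fit are hypothetical names, and mean_fit stands in for a real model fit.

```python
def mean_fit(rows):
    """Hypothetical stand-in for a model fit: returns the mean, fails on empty input."""
    if not rows:
        raise ValueError("empty partition")
    return sum(rows) / len(rows)

def fit_per_partition(partitions, fit, error_handler=False):
    """Fit one model per partition, mirroring the 'true' and 'false' behaviors."""
    results = {}
    for key, rows in partitions.items():
        try:
            results[key] = {"model": fit(rows), "error": None}
        except Exception as exc:
            if not error_handler:
                raise  # 'false': stop and report the error
            # 'true': record the error for this partition and continue
            results[key] = {"model": None, "error": str(exc)}
    return results
```

With error_handler=True, a failing partition appears in the results with its error message while the other partitions still produce models; with error_handler=False, the first failure ends the run.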