TD_GLM

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03
The TD_GLM function fits a generalized linear model (GLM) for regression and classification analysis on data sets where the response follows an exponential family distribution. It supports the following models:
  • Regression (Gaussian family): The loss function is squared error.
  • Binary Classification (Binomial family): The loss function is the logistic loss, which corresponds to logistic regression. The response values must be 0 or 1.

GLMs are a flexible class of statistical models that extend the linear regression framework to accommodate a wide range of response variables, including binary, count, and continuous data. GLMs assume that the response variable follows a probability distribution from the exponential family, which includes commonly used distributions such as the normal, binomial, and Poisson distributions.

GLMs consist of the following key components:
  • Linear predictor: A linear combination of the predictor variables X and their coefficients β, as in linear regression: η = Xβ.
  • Link function: A function g that relates the linear predictor to the mean μ of the response variable, g(μ) = η. It allows for non-linear relationships between the predictors and the response.
  • Probability distribution: Describes the variability of the response variable and is chosen based on the nature of the data. The variance is Var(Y) = φV(μ), where φ is a scale parameter and V(μ) is the variance function.
By specifying the appropriate link and variance functions, GLMs can be used to model a wide range of response variables. For example, the logistic regression model for binary data has the following components:
  • Probability distribution: Bernoulli distribution
  • Linear predictor: η = Xβ
  • Link function: logit (g(μ) = logit(μ) = log(μ/(1-μ)))
  • Variance function: Var(Y) = μ(1-μ)
Similarly, the Poisson regression model for count data has the following components:
  • Probability distribution: Poisson distribution
  • Linear predictor: η = Xβ
  • Link function: log (g(μ) = log(μ))
  • Variance function: Var(Y) = μ
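The components listed above can be sketched in a few lines of Python. This is an illustration of the general GLM definitions only, not of how TD_GLM is implemented; the function names and the sample values of x and beta are hypothetical.

```python
import math

def linear_predictor(x, beta):
    """eta = x . beta for one observation (x[0] = 1.0 plays the role of the intercept)."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def logit(mu):
    """Link function for the Bernoulli/binomial case: g(mu) = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse of the logit link: maps eta back to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    """Inverse of the log link used for Poisson regression: mu = exp(eta)."""
    return math.exp(eta)

# Hypothetical observation and coefficients, chosen so that eta is easy to check.
x = [1.0, 2.0, -1.0]
beta = [0.5, 0.25, 1.0]

eta = linear_predictor(x, beta)            # 0.5*1 + 0.25*2 + 1.0*(-1)
mu_binomial = inv_logit(eta)               # mean of a 0/1 response
mu_poisson = inv_log(eta)                  # mean of a count response
var_binomial = mu_binomial * (1.0 - mu_binomial)  # V(mu) = mu(1 - mu)
var_poisson = mu_poisson                          # V(mu) = mu
```

The same linear predictor produces different means and variances depending on the link and variance functions, which is what distinguishes one GLM family from another.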

GLMs are fitted using maximum likelihood estimation, which involves finding the parameter values that maximize the likelihood of observing the data given the model. Model fit can be assessed using various goodness-of-fit measures, such as deviance or Pearson chi-squared statistics.
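As a small illustration of maximum likelihood estimation, the following Python sketch fits an intercept-only Bernoulli model by searching a grid of candidate means for the one with the highest log-likelihood. The data and function names are hypothetical, and the grid search stands in for the numerical optimizers that are used in practice.

```python
import math

def bernoulli_log_likelihood(y, mu):
    """Log-likelihood of 0/1 responses y when every observation has mean mu."""
    return sum(yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu) for yi in y)

def bernoulli_deviance(y, mu):
    """Deviance for 0/1 responses: the saturated model has log-likelihood 0,
    so the deviance reduces to -2 times the model log-likelihood."""
    return -2.0 * bernoulli_log_likelihood(y, mu)

# Hypothetical responses: three successes out of four.
y = [1, 0, 1, 1]

# Candidate means 0.01, 0.02, ..., 0.99; pick the one that maximizes the likelihood.
candidates = [i / 100 for i in range(1, 100)]
mu_hat = max(candidates, key=lambda mu: bernoulli_log_likelihood(y, mu))

deviance_at_mle = bernoulli_deviance(y, mu_hat)
deviance_at_half = bernoulli_deviance(y, 0.5)
```

For this model the maximizer coincides with the sample mean of y, and the deviance at the estimate is lower than at any other candidate, which is why lower deviance indicates better fit.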

TD_GLM uses the Minibatch Stochastic Gradient Descent (SGD) algorithm. The algorithm estimates the gradient of the loss on minibatches, whose size is defined by the BatchSize argument, and updates the model using the learning rate specified by the LearningRate argument.
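The following Python sketch shows the general shape of minibatch SGD for the Gaussian (squared error) case. It is a minimal illustration under stated assumptions, not TD_GLM's implementation: the function name, the shuffling strategy, and the parameter names batch_size, learning_rate, and max_iter are hypothetical stand-ins for the role that BatchSize, LearningRate, and MaxIterNum play.

```python
import random

def minibatch_sgd(X, y, batch_size, learning_rate, max_iter, seed=0):
    """Fit y ~ X . beta by minibatch SGD on the squared error loss."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    indices = list(range(n))
    for _ in range(max_iter):
        rng.shuffle(indices)                      # visit rows in a new order each pass
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            grad = [0.0] * p
            for i in batch:
                residual = sum(xj * bj for xj, bj in zip(X[i], beta)) - y[i]
                for j in range(p):
                    grad[j] += residual * X[i][j]
            # Average the gradient over the batch and take one step.
            for j in range(p):
                beta[j] -= learning_rate * grad[j] / len(batch)
    return beta

# Hypothetical noise-free data generated from y = 1 + 2 * x.
X = [[1.0, k / 10.0] for k in range(10)]          # first column is the intercept
y = [1.0 + 2.0 * row[1] for row in X]

beta = minibatch_sgd(X, y, batch_size=5, learning_rate=0.5, max_iter=1000)
```

A smaller batch size gives noisier gradient estimates but more updates per pass over the data; the learning rate scales every update.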

The function also supports the following approaches:
  • L1, L2, and Elastic Net Regularization for shrinking model parameters
  • Accelerated learning using Momentum and Nesterov approaches
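To make these options concrete, the following Python sketch shows one common textbook formulation of the Elastic Net penalty gradient and of the Momentum and Nesterov update rules. The exact formulas that TD_GLM uses are not given here, so the penalty form and the names lam, alpha, momentum, and nesterov are assumptions made for illustration.

```python
def elastic_net_gradient(beta, lam, alpha):
    """(Sub)gradient of lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2).
    alpha = 1 gives a pure L1 penalty, alpha = 0 a pure L2 penalty."""
    def sign(b):
        return (b > 0) - (b < 0)
    return [lam * (alpha * sign(b) + (1.0 - alpha) * b) for b in beta]

def momentum_step(beta, velocity, grad_fn, learning_rate, momentum, nesterov=False):
    """One update with classical momentum, or with Nesterov momentum when
    nesterov=True (the gradient is then evaluated at the look-ahead point)."""
    point = [b + momentum * v for b, v in zip(beta, velocity)] if nesterov else beta
    grad = grad_fn(point)
    velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grad)]
    beta = [b + v for b, v in zip(beta, velocity)]
    return beta, velocity

def penalized_gradient(beta):
    """Gradient of 0.5 * (b - 3)^2 plus an L2 penalty with lam = 1 (toy objective)."""
    loss_grad = [beta[0] - 3.0]
    penalty_grad = elastic_net_gradient(beta, lam=1.0, alpha=0.0)
    return [g + r for g, r in zip(loss_grad, penalty_grad)]

def minimize(nesterov):
    beta, velocity = [0.0], [0.0]
    for _ in range(500):
        beta, velocity = momentum_step(
            beta, velocity, penalized_gradient,
            learning_rate=0.1, momentum=0.9, nesterov=nesterov)
    return beta[0]

b_momentum = minimize(nesterov=False)
b_nesterov = minimize(nesterov=True)
```

Without the penalty, the toy objective is minimized at 3; with the L2 penalty, both variants settle at 1.5, which illustrates how regularization shrinks the fitted parameter toward zero.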

TD_GLM uses the IterNumNoChange and Tolerance arguments together to define the convergence criterion, and runs iterations until that criterion is met or until the number of iterations reaches the value of the MaxIterNum argument. MaxIterNum and IterNumNoChange are therefore both stopping criteria. To force the function to run through all MaxIterNum iterations, specify IterNumNoChange = 0.
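The interaction of these three arguments can be sketched as follows. The precise rule that TD_GLM applies is not spelled out here, so this Python sketch assumes a common early-stopping convention: training stops once the loss has failed to improve by more than the tolerance for a given number of consecutive iterations. The function name and the list of per-iteration losses are hypothetical.

```python
def iterations_run(loss_per_iteration, max_iter_num, iter_num_no_change, tolerance):
    """Return how many iterations run before a stopping criterion is met."""
    best_loss = float("inf")
    stale_iterations = 0
    completed = 0
    for loss in loss_per_iteration[:max_iter_num]:   # never exceed max_iter_num
        completed += 1
        if loss > best_loss - tolerance:
            stale_iterations += 1                    # no meaningful improvement
        else:
            stale_iterations = 0
        best_loss = min(best_loss, loss)
        # iter_num_no_change = 0 disables early stopping entirely.
        if iter_num_no_change > 0 and stale_iterations >= iter_num_no_change:
            break
    return completed

# Hypothetical losses that improve quickly and then flatten out.
losses = [1.0, 0.5, 0.4, 0.3999, 0.3998, 0.3997, 0.3996, 0.3995]

early = iterations_run(losses, max_iter_num=8, iter_num_no_change=3, tolerance=0.001)
forced = iterations_run(losses, max_iter_num=8, iter_num_no_change=0, tolerance=0.001)
capped = iterations_run(losses, max_iter_num=5, iter_num_no_change=3, tolerance=0.001)
```

In this example, early stopping halts after the third consecutive iteration without sufficient improvement, the forced run uses every iteration, and the capped run stops at the iteration limit first.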

The function output is a trained GLM model that is used as input to the TD_GLMPredict function. The output also contains the model statistics MSE, Loglikelihood, AIC, and BIC. You can use the TD_RegressionEvaluator, TD_ClassificationEvaluator, and TD_ROC functions to evaluate the model as a post-processing step. When you use PARTITION BY ANY, one model is generated. When you use PARTITION BY key, one model is generated for each partition, so the output contains more than one model if there is more than one partition.
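For reference, the following Python sketch computes MSE, AIC, and BIC from their standard textbook definitions. How TD_GLM computes these statistics internally is not described here, so the formulas, the function names, and the sample numbers are assumptions for illustration.

```python
import math

def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood

def bic(log_likelihood, num_params, num_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return num_params * math.log(num_obs) - 2 * log_likelihood

# Hypothetical fitted model: log-likelihood -10 with 3 parameters on 100 rows.
model_aic = aic(-10.0, num_params=3)
model_bic = bic(-10.0, num_params=3, num_obs=100)
model_mse = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Lower AIC and BIC values indicate a better trade-off between fit and model size; BIC penalizes additional parameters more heavily as the number of observations grows.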

The function accepts only numeric features. Before training, you must convert categorical features to numeric values, for example with one of the following function pairs:
  • TD_OneHotEncodingFit/TD_OneHotEncodingTransform
  • TD_OrdinalEncodingFit/TD_OrdinalEncodingTransform
  • TD_TargetEncodingFit/TD_TargetEncodingTransform
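To show what such a conversion does, the following Python sketch performs a simple one-hot encoding in two steps, a fit step that learns the categories and a transform step that applies them. It illustrates the general idea only and does not reproduce the behavior or output format of the Teradata functions; the function names and sample data are hypothetical.

```python
def one_hot_fit(values):
    """Learn the distinct categories, in a stable sorted order."""
    return sorted(set(values))

def one_hot_transform(values, categories):
    """Replace each value with a 0/1 indicator vector, one position per category."""
    return [[1 if value == category else 0 for category in categories]
            for value in values]

# Hypothetical categorical column.
colors = ["red", "green", "red", "blue"]

categories = one_hot_fit(colors)
encoded = one_hot_transform(colors, categories)
```

Separating the fit and transform steps lets the categories learned from the training data be reused unchanged when encoding new data for prediction.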