- Regression (Gaussian family): The loss function is squared error.
- Binary Classification (Binomial family): The loss function is the logistic loss, which implements logistic regression. The response values are 0 or 1.
GLMs are a flexible class of statistical models that extend the linear regression framework to accommodate a wide range of response variables, including binary, count, and continuous data. GLMs assume the response variable follows a distribution from the exponential family, which includes commonly used distributions such as the normal, binomial, and Poisson distributions. A GLM has three components:
- Linear predictor: A linear combination of the predictor variables and their coefficients, as in linear regression. With predictor matrix X and coefficients β, the linear predictor is η = Xβ.
- Link function: Relates the linear predictor to the mean of the response variable, allowing for non-linear relationships between the predictors and the response. The link function g satisfies g(μ) = η.
- Probability distribution: Describes the variability of the response variable and is chosen based on the nature of the data. The variance is calculated as Var(Y) = φV(μ), where φ is a scale (dispersion) parameter and V(μ) is the variance function.
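The three components can be sketched in a few lines of code. This is a minimal illustration only, assuming a log link (Poisson-style) as the example; the helper names are not part of TD_GLM.

```python
import math

def linear_predictor(x, beta):
    """Linear predictor: eta = X * beta (dot product for one observation)."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def log_link(mu):
    """Link function g: maps the mean mu to the linear predictor eta."""
    return math.log(mu)

def inverse_log_link(eta):
    """Inverse link g^-1: maps eta back to the mean mu."""
    return math.exp(eta)

def variance(mu, phi=1.0):
    """Var(Y) = phi * V(mu); here V(mu) = mu (Poisson-style) as an example."""
    return phi * mu

x = [1.0, 2.0, 0.5]          # one row of predictors
beta = [0.1, -0.2, 0.4]      # coefficients
eta = linear_predictor(x, beta)
mu = inverse_log_link(eta)   # mean implied by the model
```

The inverse link recovers the mean from the linear predictor, so `log_link(mu)` round-trips back to `eta`.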
For example, logistic regression uses:
- Probability distribution: Bernoulli distribution
- Linear predictor: η = Xβ
- Link function: logit (g(μ) = logit(μ) = log(μ/(1-μ)))
- Variance function: Var(Y) = μ(1-μ)
Poisson regression uses:
- Probability distribution: Poisson distribution
- Linear predictor: η = Xβ
- Link function: log (g(μ) = log(μ))
- Variance function: Var(Y) = μ
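The logit link and the two variance functions above can be written directly from their definitions. This is an illustrative sketch with assumed helper names, not TD_GLM code.

```python
import math

def logit(mu):
    """Logit link: g(mu) = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Inverse logit (sigmoid): mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_variance(mu):
    """Bernoulli variance function: V(mu) = mu * (1 - mu)."""
    return mu * (1.0 - mu)

def poisson_variance(mu):
    """Poisson variance function: V(mu) = mu."""
    return mu
```

The inverse logit maps any real-valued linear predictor back into (0, 1), which is why the logit link suits 0/1 responses.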
GLMs are fitted using maximum likelihood estimation, which involves finding the parameter values that maximize the likelihood of observing the data given the model. Model fit can be assessed using various goodness-of-fit measures, such as deviance or Pearson chi-squared statistics.
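As a concrete illustration of these goodness-of-fit measures, the sketch below computes the Bernoulli log-likelihood and the deviance for a vector of fitted probabilities. For 0/1 responses the saturated model has log-likelihood 0, so the deviance reduces to -2 times the log-likelihood. The function names and data are assumptions for illustration.

```python
import math

def bernoulli_loglik(y, mu):
    """Log-likelihood of 0/1 responses y under fitted probabilities mu."""
    return sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
               for yi, mi in zip(y, mu))

def deviance(y, mu):
    """Deviance for 0/1 data: -2 * log-likelihood (saturated loglik is 0)."""
    return -2.0 * bernoulli_loglik(y, mu)

y  = [1, 0, 1, 1]            # observed responses
mu = [0.8, 0.3, 0.6, 0.9]    # fitted probabilities
ll = bernoulli_loglik(y, mu)
```

A smaller deviance indicates a better fit; maximizing the likelihood is equivalent to minimizing the deviance.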
TD_GLM uses the minibatch stochastic gradient descent (SGD) algorithm. The algorithm estimates the gradient of the loss on minibatches whose size is set by the BatchSize argument, and updates the model with a learning rate set by the LearningRate argument. The function also supports:
- L1, L2, and Elastic Net Regularization for shrinking model parameters
- Accelerated learning using Momentum and Nesterov approaches
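The ideas above, minibatch gradient estimates, a learning rate, L2 regularization, and momentum, can be combined in a short logistic-regression sketch. This is an illustrative implementation under assumed defaults, not TD_GLM's internal code.

```python
import math
import random

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def sgd_logistic(X, y, batch_size=2, learning_rate=0.1,
                 l2=0.01, momentum=0.9, epochs=200, seed=0):
    """Minibatch SGD for logistic regression with L2 penalty and momentum."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    velocity = [0.0] * p
    idx = list(range(n))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grad = [0.0] * p
            for i in batch:
                # Gradient of the logistic loss: (sigmoid(x.beta) - y) * x
                err = sigmoid(sum(b * xj for b, xj in zip(beta, X[i]))) - y[i]
                for j in range(p):
                    grad[j] += err * X[i][j]
            for j in range(p):
                g = grad[j] / len(batch) + l2 * beta[j]   # add L2 penalty term
                velocity[j] = momentum * velocity[j] - learning_rate * g
                beta[j] += velocity[j]                     # momentum update
    return beta

# Tiny dataset with an intercept column; class 1 when the second
# predictor exceeds the first.
X = [[1, 0.0, 1.0], [1, 1.0, 0.0], [1, 0.2, 0.9], [1, 0.9, 0.1]]
y = [1, 0, 1, 0]
beta = sgd_logistic(X, y)
```

Momentum accumulates a velocity across updates, which smooths the noisy minibatch gradients and accelerates convergence along consistent directions.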
TD_GLM uses a combination of the IterNumNoChange and Tolerance arguments to define the convergence criterion, and runs multiple iterations (up to the value specified in the MaxIterNum argument) until the criterion is met. MaxIterNum and IterNumNoChange are the criteria used to stop learning. To force the function to run through all iterations, specify IterNumNoChange = 0.
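A stopping rule in the spirit of these arguments can be sketched as follows: stop once the loss fails to improve by more than the tolerance for a given number of consecutive iterations, or when the iteration cap is reached. The parameter names mirror the arguments above, but the exact logic here is an illustrative assumption, not TD_GLM's implementation.

```python
def run_until_converged(losses, max_iter_num=300, iter_num_no_change=50,
                        tolerance=0.001):
    """Return the number of iterations performed over a sequence of
    per-iteration loss values, applying the stopping rule described above."""
    prev = float("inf")
    stall = 0
    it = 0
    for it, loss in enumerate(losses[:max_iter_num], start=1):
        if prev - loss > tolerance:
            stall = 0            # meaningful improvement; reset the counter
        else:
            stall += 1
            if iter_num_no_change and stall >= iter_num_no_change:
                return it        # converged: no change for long enough
        prev = loss
    return it                    # ran to the iteration cap

# Loss improves quickly, then flattens out.
losses = [1.0 / (i + 1) for i in range(300)]
```

Note that setting `iter_num_no_change=0` disables the convergence check entirely, so the loop runs all iterations, matching the IterNumNoChange = 0 behavior described above.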
The function output is a trained GLM model that is used as input to the TD_GLMPredict function. The output includes the model statistics MSE, Loglikelihood, AIC, and BIC. You can use the TD_RegressionEvaluator, TD_ClassificationEvaluator, and TD_ROC functions to perform model evaluation as a post-processing step. When using PARTITION BY ANY, one model is generated. When using PARTITION BY key, one model is generated per partition.
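The model statistics named above follow standard formulas and can be computed from a model's log-likelihood, parameter count, and sample size. The helper names below are assumptions for illustration.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: AIC = 2k - 2*loglik."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: BIC = k*ln(n) - 2*loglik."""
    return k * math.log(n) - 2.0 * loglik

def mse(y, pred):
    """Mean squared error between observed and predicted values."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
```

Lower AIC or BIC indicates a better trade-off between fit and model complexity; BIC penalizes extra parameters more heavily as the sample size grows.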
Categorical predictors can be encoded to numeric values using function pairs such as:
- TD_OneHotEncodingFit/TD_OneHotEncodingTransform
- TD_OrdinalEncodingFit/TD_OrdinalEncodingTransform
- TD_TargetEncodingFit/TD_TargetEncodingTransform
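These function pairs follow a fit/transform pattern: the fit step learns the encoding from the data, and the transform step applies it. The sketch below illustrates the pattern with one-hot encoding; the helper names are assumptions, not the TD_OneHotEncodingFit/TD_OneHotEncodingTransform API.

```python
def one_hot_fit(values):
    """'Fit' step: learn the sorted set of distinct categories."""
    return sorted(set(values))

def one_hot_transform(values, categories):
    """'Transform' step: map each value to a 0/1 indicator vector."""
    return [[1 if v == c else 0 for c in categories] for v in values]

colors = ["red", "blue", "red", "green"]
cats = one_hot_fit(colors)
encoded = one_hot_transform(colors, cats)
```

Keeping fit and transform separate lets the same learned encoding be applied consistently to both training and scoring data.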