Description
The GLML1L2 function differs from the GLM function in these ways:
GLML1L2 supports the regularization models Ridge, LASSO, and Elastic Net.
GLML1L2 outputs a model tbl_teradata and, optionally, a factor tbl_teradata (GLM outputs only a model).
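The three regularization models are selected through the alpha and lambda arguments: the examples on this page use alpha = 0 for Ridge (L2) and alpha = 1 for LASSO (L1), with values in between giving Elastic Net. As a rough illustration of how the two arguments interact, the base R sketch below computes a penalty term using the common elastic net convention lambda * (alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2)). This scaling is an assumption; this page does not state the exact formula GLML1L2 uses. The helper elastic_net_penalty and the coefficient vector are hypothetical and not part of tdplyr.

```r
# Hypothetical helper, not part of tdplyr: computes an elastic net penalty
# under the ASSUMED convention
#   lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
elastic_net_penalty <- function(beta, alpha, lambda) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  l1 <- sum(abs(beta))   # L1 norm (LASSO part)
  l2 <- sum(beta^2)      # squared L2 norm (Ridge part)
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2)
}

# Made-up coefficients, for illustration only.
beta <- c(0.5, -1.0, 2.0)

ridge_penalty <- elastic_net_penalty(beta, alpha = 0,   lambda = 0.02)  # pure L2
lasso_penalty <- elastic_net_penalty(beta, alpha = 1,   lambda = 0.02)  # pure L1
mixed_penalty <- elastic_net_penalty(beta, alpha = 0.5, lambda = 0.02)  # Elastic Net

print(c(ridge = ridge_penalty, lasso = lasso_penalty, mixed = mixed_penalty))
```

With lambda = 0 (the default in the Usage section) the penalty is zero for any alpha, so no regularization is applied under this convention.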
Usage
td_glml1l2_mle(
  formula = NULL,
  data = NULL,
  alpha = 0.0,
  lambda = 0,
  max.iter.num = 10000,
  stop.threshold = 1.0E-7,
  family = "Gaussian",
  randomization = FALSE,
  data.sequence.column = NULL
)
Arguments
formula | Required Argument.
data | Required Argument.
alpha | Optional Argument. Default: 0.0
lambda | Optional Argument. Default: 0
max.iter.num | Optional Argument. Default: 10000
stop.threshold | Optional Argument. Default: 1.0E-7
family | Optional Argument. Default: "Gaussian"
randomization | Optional Argument. Default: FALSE
data.sequence.column | Optional Argument. Default: NULL
Value
The function returns an object of class "td_glml1l2_mle", which is a named list containing Teradata tbl objects. Members of the named list can be referenced directly with the "$" operator using the following names:
factor.data
output
Note:
The output tbl_teradata object factor.data is created only when the randomization argument is TRUE or the formula argument includes categorical columns.
factor.data can be used as the input (data) for future GLML1L2 function calls, thereby saving the function from repeating the categorical-to-numerical conversion or randomization.
Examples
# Get the current context/connection
con <- td_get_context()$connection

# Load example data.
loadExampleData("glml1l2_example", "admissions_train", "housing_train")

# Create remote tibble objects.
admissions_train <- tbl(con, "admissions_train")
housing_train <- tbl(con, "housing_train")

# Example 1 -
# Ridge Regression, family = 'Binomial'.
# Because the response variable is binary (the admitted column has
# two possible values), the call specifies family = 'Binomial'.
# alpha = 0 indicates L2 (ridge regression) regularization.
td_glml1l2_mle_out1 <- td_glml1l2_mle(formula = (admitted ~ stats + gpa + masters + programming),
                                      data = admissions_train,
                                      alpha = 0,
                                      lambda = 0.02,
                                      family = "Binomial",
                                      randomization = TRUE
                                      )

# Example 2 -
# factor.data (from Example 1) as input data.
# Because Randomization was TRUE in the function call that created
# the factor.data input, this call does not need it.
td_glml1l2_mle_out2 <- td_glml1l2_mle(formula = (admitted ~ masters_yes + stats_novice + programming_novice + stats_beginner + programming_beginner + gpa),
                                      data = td_glml1l2_mle_out1$factor.data,
                                      alpha = 0,
                                      lambda = 0.02,
                                      family = "Binomial"
                                      )

# Example 3 -
# LASSO, family = 'Gaussian'.
# Because the response variable has a Gaussian distribution, the call specifies
# family = 'Gaussian'.
# alpha = 1 indicates L1 (LASSO) regularization.
td_glml1l2_mle_out3 <- td_glml1l2_mle(formula = (price ~ lotsize + bedrooms + gashw + driveway + stories + recroom + garagepl + bathrms + homestyle + fullbase + airco + prefarea),
                                      data = housing_train,
                                      alpha = 1,
                                      lambda = 0.02,
                                      family = "Gaussian",
                                      randomization = TRUE
                                      )