Description
The GLML1L2 function differs from the GLM function in these ways:
GLML1L2 supports the regularization models Ridge, LASSO, and Elastic Net.
GLML1L2 outputs a model tbl_teradata and, optionally, a factor tbl_teradata (GLM outputs only a model).
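To make the roles of alpha and lambda concrete, the following is a minimal sketch of an elastic net penalty in the common glmnet-style parameterization. This parameterization is an assumption; this page does not state the exact objective that GLML1L2 minimizes. The function name elastic_net_penalty and the coefficient vector are hypothetical and are not part of the package.

```r
# Hypothetical helper, not part of the package: elastic net penalty in the
# common glmnet-style form (an assumption about the parameterization).
#   penalty = lambda * (alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2))
elastic_net_penalty <- function(beta, alpha, lambda) {
  l1 <- sum(abs(beta))  # L1 norm, the LASSO part
  l2 <- sum(beta^2)     # squared L2 norm, the Ridge part
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2)
}

beta <- c(0.5, -1.0, 2.0)  # illustrative coefficients

# alpha = 0 keeps only the L2 term (Ridge).
ridge_penalty <- elastic_net_penalty(beta, alpha = 0, lambda = 0.02)
# alpha = 1 keeps only the L1 term (LASSO).
lasso_penalty <- elastic_net_penalty(beta, alpha = 1, lambda = 0.02)
# 0 < alpha < 1 mixes both terms (Elastic Net).
mixed_penalty <- elastic_net_penalty(beta, alpha = 0.5, lambda = 0.02)
# lambda = 0 makes the penalty vanish for any alpha.
no_penalty <- elastic_net_penalty(beta, alpha = 0.5, lambda = 0)
```

Under this form, lambda scales the overall strength of the penalty and alpha only chooses the mix between the L1 and L2 terms.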
Usage
td_glml1l2_mle (
formula = NULL,
data = NULL,
alpha = 0.0,
lambda = 0,
max.iter.num = 10000,
stop.threshold = 1.0E-7,
family = "Gaussian",
randomization = FALSE,
data.sequence.column = NULL
)
Arguments
formula                 Required Argument.
data                    Required Argument.
alpha                   Optional Argument.
lambda                  Optional Argument.
max.iter.num            Optional Argument.
stop.threshold          Optional Argument.
family                  Optional Argument.
randomization           Optional Argument.
data.sequence.column    Optional Argument.
Value
The function returns an object of class "td_glml1l2_mle", which is a named
list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the following names:
factor.data
output
Note:
The output tbl_teradata object factor.data is created only when the randomization argument is TRUE or the formula argument includes categorical columns.
factor.data can be used as the input (data) for future GLML1L2 function calls, thereby saving the function from repeating the categorical-to-numerical conversion or randomization.
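The reuse pattern in the note can be sketched as a small helper that prefers factor.data when a previous result carries it. The helper name pick_glml1l2_input is hypothetical. The sketch assumes that an absent factor.data member reads as NULL, which holds for a plain R list; this page does not say how td_glml1l2_mle represents a member that was not created. Plain lists with placeholder strings stand in for real results so that the block runs without a database connection.

```r
# Hypothetical helper, not part of the package: choose the input for a
# follow-up call, preferring factor.data when the previous result has it.
# Assumption: a member that was not created reads as NULL.
pick_glml1l2_input <- function(previous_result, raw_data) {
  factor_data <- previous_result[["factor.data"]]  # exact-name lookup
  if (!is.null(factor_data)) factor_data else raw_data
}

# Stand-ins for real results; the strings are placeholders for tbl_teradata objects.
with_factor <- list(output = "model_tbl", factor.data = "factor_tbl")
without_factor <- list(output = "model_tbl")

input_a <- pick_glml1l2_input(with_factor, "raw_tbl")     # takes factor.data
input_b <- pick_glml1l2_input(without_factor, "raw_tbl")  # falls back to raw data
```

When factor.data is reused, the formula has to name the converted columns (for example masters_yes rather than masters), as Example 2 below does.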
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("glml1l2_example", "admissions_train", "housing_train")
# Create object(s) of class "tbl_teradata".
admissions_train <- tbl(con, "admissions_train")
housing_train <- tbl(con, "housing_train")
# Example 1 -
# Ridge Regression, family = 'Binomial'.
# Because the response variable is binary (the admitted column has
# two possible values), the call specifies family = 'Binomial'.
# alpha = 0 indicates L2 (ridge regression) regularization.
td_glml1l2_mle_out1 <- td_glml1l2_mle(
    formula = (admitted ~ stats + gpa + masters + programming),
    data = admissions_train,
    alpha = 0,
    lambda = 0.02,
    family = "Binomial",
    randomization = TRUE
)
# Example 2 -
# Use factor.data (from Example 1) as input data.
# Because randomization was TRUE in the function call that created
# the factor.data input, this call does not need it.
td_glml1l2_mle_out2 <- td_glml1l2_mle(
    formula = (admitted ~ masters_yes + stats_novice
                          + programming_novice + stats_beginner
                          + programming_beginner + gpa),
    data = td_glml1l2_mle_out1$factor.data,
    alpha = 0,
    lambda = 0.02,
    family = "Binomial"
)
# Example 3 -
# LASSO, family = 'Gaussian'.
# Because the response variable has a Gaussian distribution, the call specifies
# family = 'Gaussian'.
# alpha = 1 indicates L1 (LASSO) regularization.
td_glml1l2_mle_out3 <- td_glml1l2_mle(
    formula = (price ~ lotsize + bedrooms + gashw + driveway
                       + stories + recroom + garagepl + bathrms
                       + homestyle + fullbase + airco + prefarea),
    data = housing_train,
    alpha = 1,
    lambda = 0.02,
    family = "Gaussian",
    randomization = TRUE
)
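The examples above cover only the two endpoints, alpha = 0 and alpha = 1. The following sketch shows what an Elastic Net call might look like, assuming that a value of alpha strictly between 0 and 1 selects Elastic Net (the comments above state only the endpoint behavior). The value 0.5 and the names alpha_mix and td_glml1l2_mle_out4 are illustrative. The call is guarded so that it runs only in a session where td_glml1l2_mle and housing_train already exist.

```r
# Example 4 (sketch) -
# Elastic Net, family = 'Gaussian'.
# Assumption: 0 < alpha < 1 mixes the L1 and L2 penalties (Elastic Net).
alpha_mix <- 0.5

# Guard: run the call only where the function and the housing_train
# tbl_teradata from the earlier examples are available.
if (exists("td_glml1l2_mle") && exists("housing_train")) {
    td_glml1l2_mle_out4 <- td_glml1l2_mle(
        formula = (price ~ lotsize + bedrooms + gashw + driveway
                           + stories + recroom + garagepl + bathrms
                           + homestyle + fullbase + airco + prefarea),
        data = housing_train,
        alpha = alpha_mix,
        lambda = 0.02,
        family = "Gaussian",
        randomization = TRUE
    )
}
```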