Description
The XGBoost function takes a training data set and uses gradient
boosting to create a strong classifying model that can be input
to the function XGBoostPredict (td_xgboost_predict_mle).
The function supports input tables in both dense and sparse format.
Usage
td_xgboost_mle (
    formula = NULL,
    data = NULL,
    id.column = NULL,
    loss.function = "SOFTMAX",
    prediction.type = "CLASSIFICATION",
    reg.lambda = 1,
    shrinkage.factor = 0.1,
    iter.num = 10,
    min.node.size = 1,
    max.depth = 5,
    variance = 0,
    seed = NULL,
    attribute.name.column = NULL,
    num.boosted.trees = NULL,
    attribute.table = NULL,
    attribute.value.column = NULL,
    column.subsampling = 1.0,
    response.column = NULL,
    data.sequence.column = NULL,
    attribute.table.sequence.column = NULL
)
Arguments
formula | Required Argument when input data is in dense format.
data | Required Argument.
id.column | Optional Argument.
loss.function | Optional Argument.
prediction.type | Optional Argument.
reg.lambda | Optional Argument.
shrinkage.factor | Optional Argument.
iter.num | Optional Argument.
min.node.size | Optional Argument.
max.depth | Optional Argument.
variance | Optional Argument.
seed | Optional Argument.
attribute.name.column | Optional Argument.
num.boosted.trees | Optional Argument.
attribute.table | Optional Argument.
attribute.value.column | Required if the input data set is in sparse format.
column.subsampling | Optional Argument.
response.column | Required Argument when "formula" is not specified.
data.sequence.column | Optional Argument.
attribute.table.sequence.column | Optional Argument.
Value
Function returns an object of class "td_xgboost_mle" which is a named
list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the following names:
model.table
output
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("xgboost_example", "housing_train_binary", "iris_train", "sparse_iris_train",
                "sparse_iris_attribute")
# Example 1: Binary Classification
# Create object(s) of class "tbl_teradata".
housing_train_binary <- tbl(con, "housing_train_binary")
td_xgboost_out1 <- td_xgboost_mle(data=housing_train_binary,
                                  id.column='sn',
                                  formula = (homestyle ~ driveway + recroom + fullbase + gashw + airco + prefarea +
                                             price + lotsize + bedrooms + bathrms + stories + garagepl),
                                  num.boosted.trees=2,
                                  loss.function='binomial',
                                  prediction.type='classification',
                                  reg.lambda=1,
                                  shrinkage.factor=0.1,
                                  iter.num=10,
                                  min.node.size=1,
                                  max.depth=10
                                  )
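# Sketch (not part of the original examples): the members listed under "Value"
# can be read from the result with the "$" operator. This assumes the call in
# Example 1 completed successfully and td_xgboost_out1 exists in the session.
td_xgboost_out1$model.table
td_xgboost_out1$output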
# Example 2: Multiple-Class Classification
iris_train <- tbl(con,"iris_train")
td_xgboost_out2 <- td_xgboost_mle(data=iris_train,
                                  id.column='id',
                                  formula = (species ~ sepal_length + sepal_width +
                                             petal_length + petal_width),
                                  num.boosted.trees=2,
                                  loss.function='softmax',
                                  reg.lambda=1,
                                  shrinkage.factor=0.1,
                                  iter.num=10,
                                  min.node.size=1,
                                  max.depth=10)
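# Sketch (not part of the original examples): the formula can also be built from
# a character vector of predictor names with stats::reformulate(), which helps
# avoid repeating a term or listing the response among the predictors.
# The column names below are an assumption about the "iris_train" table.
predictors <- c("sepal_length", "sepal_width", "petal_length", "petal_width")
iris_formula <- reformulate(predictors, response = "species")
# iris_formula can then be supplied as the "formula" argument of td_xgboost_mle.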
# Example 3: Sparse Input Format. "response.column" argument is specified instead of formula.
sparse_iris_train <- tbl(con,"sparse_iris_train")
sparse_iris_attribute <- tbl(con,"sparse_iris_attribute")
td_xgboost_out3 <- td_xgboost_mle(data=sparse_iris_train,
                                  attribute.table=sparse_iris_attribute,
                                  id.column='id',
                                  attribute.name.column='attribute',
                                  attribute.value.column='value_col',
                                  response.column="species",
                                  loss.function='SOFTMAX',
                                  reg.lambda=1,
                                  num.boosted.trees=2,
                                  shrinkage.factor=0.1,
                                  column.subsampling=1.0,
                                  iter.num=10,
                                  min.node.size=1,
                                  max.depth=10,
                                  variance=0,
                                  seed=1
                                  )