GLMPredictPerSegment
Description
The td_glm_predict_per_segment_sqle() function uses the model generated by the td_glm_per_segment_sqle() function to predict target values (regression) and class labels (classification) on new input data.
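To make "per segment" concrete, here is a minimal local sketch in base R. It is only an analogy under stated assumptions: the toy data, the column names (segment, lotsize, price), and the use of stats::glm() are illustrative and are not what the Teradata functions execute in-database. It fits one Gaussian model per segment and scores each new row with the model of its own segment.

```r
# Local base-R analogy only; the Teradata functions run in-database.
set.seed(42)

# Toy training data (hypothetical): two segments with different relationships.
train <- data.frame(
  segment = rep(c("A", "B"), each = 20),
  lotsize = rep(seq(1, 20), times = 2)
)
train$price <- ifelse(train$segment == "A",
                      10 + 2 * train$lotsize,
                      50 - 1 * train$lotsize) + rnorm(40, sd = 0.1)

# Fit one Gaussian GLM per segment.
models <- lapply(split(train, train$segment), function(d) {
  glm(price ~ lotsize, data = d, family = gaussian())
})

# Score each new row with the model that belongs to its segment.
newdata <- data.frame(
  id = 1:4,
  segment = c("A", "A", "B", "B"),
  lotsize = c(5, 10, 5, 10)
)
newdata$prediction <- NA_real_
for (seg in names(models)) {
  rows <- newdata$segment == seg
  newdata$prediction[rows] <- predict(models[[seg]], newdata = newdata[rows, ])
}
print(newdata)
```

The same row values (lotsize 5 or 10) receive different predictions in segments A and B, because each segment has its own fitted coefficients.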
Notes:
All input features must be numeric. Convert categorical columns to numeric columns as a preprocessing step, for example with one of the following function pairs:
* td_one_hot_encoding_fit_sqle() and td_one_hot_encoding_transform_sqle()
* td_ordinal_encoding_fit_sqle() and td_ordinal_encoding_transform_sqle()
* td_target_encoding_fit_sqle() and td_target_encoding_transform_sqle()
The td_one_hot_encoding_fit_sqle() and td_one_hot_encoding_transform_sqle() functions support segment functions. However, the td_ordinal_encoding_fit_sqle() and td_ordinal_encoding_transform_sqle(), and the td_target_encoding_fit_sqle() and td_target_encoding_transform_sqle() functions do not support segment functions. You must run these functions one by one on each partition.
The preprocessing steps carried out for td_glm_per_segment_sqle() must also be applied to the test data set before prediction.
The function does not generate prediction accuracy metrics such as MSE, precision, recall, and ROC. Use the td_regression_evaluator_sqle(), td_classification_evaluator_sqle() and td_roc_sqle() functions as post-processing steps. These functions do not support segment functions; the workaround is to run them one by one on each partition.
Any observation with a missing value in an input column is ignored and appears in the output with a specific error code. Use an imputation function, such as td_simple_impute_fit_sqle() and td_simple_impute_transform_sqle(), to fill in missing values.
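The last two notes can be illustrated with a small local sketch in base R. It is an analogy only: the data frames, their column names (stories, price, prediction, lotsize), and the median fill are hypothetical stand-ins, and the in-database evaluator and imputation functions are not called here. It shows what "run one by one on each partition" amounts to for a metric such as MSE, and one simple way to fill a missing input value.

```r
# Hypothetical scored rows, standing in for a collected prediction result.
scored <- data.frame(
  stories    = c(1, 1, 1, 2, 2, 2),
  price      = c(100, 120, 140, 200, 220, 240),
  prediction = c(110, 120, 130, 190, 220, 260)
)

# Compute the metric separately for each partition (here: stories).
mse_by_partition <- sapply(split(scored, scored$stories), function(d) {
  mean((d$price - d$prediction)^2)
})
print(mse_by_partition)

# Hypothetical input rows with one missing feature value.
features <- data.frame(
  stories = c(1, 1, 2, 2),
  lotsize = c(4000, NA, 6000, 6500)
)

# Fill the missing value with the column median (one possible strategy).
features$lotsize[is.na(features$lotsize)] <-
  median(features$lotsize, na.rm = TRUE)
print(features)
```

Each partition gets its own metric value, so a segment whose model fits poorly is not hidden by a better-fitting segment.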
Usage
td_glm_predict_per_segment_sqle(
    newdata = NULL,
    object = NULL,
    id.column = NULL,
    accumulate = NULL,
    output.prob = FALSE,
    output.responses = NULL,
    partition.column = NULL,
    ...
)
Arguments
newdata |
Required Argument. |
object |
Required Argument. |
id.column |
Required Argument. |
accumulate |
Optional Argument. |
output.prob |
Optional Argument. |
output.responses |
Optional Argument.
Types: character OR vector of Strings (character) |
partition.column |
Optional Argument. |
... |
Specifies the generic keyword arguments that SQLE functions accept. The generic keyword arguments are persist and volatile. The function also allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:
Note: |
Value
Function returns an object of class "td_glm_predict_per_segment_sqle", which is a named list containing an object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s): result
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load the example data.
loadExampleData("decisionforestpredict_example", "housing_train", "housing_test")
# Create tbl_teradata object.
housing_train <- tbl(con, "housing_train")
housing_test <- tbl(con, "housing_test")
# Check the list of available analytic functions.
display_analytic_functions()
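# Filter the rows from train and test dataset with homestyle as Classic and Eclectic.
binomial_housing_train <- housing_train %>%
    filter(homestyle %in% c("Classic", "Eclectic"))
binomial_housing_test <- housing_test %>%
    filter(homestyle %in% c("Classic", "Eclectic"))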
# td_glm_per_segment_sqle() function requires features in numeric format for processing,
# so dropping the non-numeric columns.
drop_cols <- c("driveway", "recroom", "gashw", "airco", "prefarea",
               "fullbase")
binomial_housing_train <- binomial_housing_train %>% select(-all_of(drop_cols))
# The Gaussian examples predict price, so the non-numeric homestyle column is dropped too.
gaussian_housing_train <- binomial_housing_train %>% select(-homestyle)
binomial_housing_test <- binomial_housing_test %>% select(-all_of(drop_cols))
gaussian_housing_test <- binomial_housing_test %>% select(-homestyle)
# Transform the train dataset categorical values to encoded values.
train_fit_res <- td_ordinal_encoding_fit_sqle(
    target.column='homestyle',
    data=binomial_housing_train)
train_transform_res <- td_ordinal_encoding_transform_sqle(
    data=binomial_housing_train,
    object=train_fit_res$result,
    accumulate=c("sn", "price", "lotsize",
                 "bedrooms", "bathrms", "stories"))
test_fit <- td_ordinal_encoding_fit_sqle(
    target.column='homestyle',
    data=binomial_housing_test)
test_transform <- td_ordinal_encoding_transform_sqle(
    data=binomial_housing_test,
    object=test_fit$result,
    accumulate=c("sn", "price", "lotsize",
                 "bedrooms", "bathrms", "stories"))
# Example 1: Train the model using the 'Gaussian' family.
#            Predict the price using td_glm_predict_per_segment_sqle().
# Train the model using the 'Gaussian' family.
GLMPerSegment_out_1 <- td_glm_per_segment_sqle(
    data=gaussian_housing_train,
    data.partition.column="stories",
    input.columns=c('garagepl', 'lotsize',
                    'bedrooms', 'bathrms'),
    response.column="price",
    family="Gaussian",
    iter.max=1000,
    batch.size=9)
# Predict the price using td_glm_predict_per_segment_sqle().
GLMPredictPerSegment_out_1 <- td_glm_predict_per_segment_sqle(
    newdata=gaussian_housing_test,
    newdata.partition.column="stories",
    object=GLMPerSegment_out_1,
    object.partition.column="stories",
    id.column="sn")
# Print the result.
print(GLMPredictPerSegment_out_1$result)
# Example 2: Train the model using the 'Binomial' family.
#            Predict the homestyle using td_glm_predict_per_segment_sqle().
# Train the model using the 'Binomial' family.
GLMPerSegment_out_2 <- td_glm_per_segment_sqle(
    data=train_transform_res$result,
    data.partition.column="stories",
    input.columns=c('price', 'lotsize',
                    'bedrooms', 'bathrms'),
    response.column="homestyle",
    family="Binomial",
    iter.max=100)
# Predict the homestyle using td_glm_predict_per_segment_sqle().
GLMPredictPerSegment_out_2 <- td_glm_predict_per_segment_sqle(
    newdata=test_transform$result,
    newdata.partition.column="stories",
    object=GLMPerSegment_out_2,
    object.partition.column="stories",
    id.column="sn",
    output.prob=TRUE,
    output.responses=c("0", "1"))
# Print the result.
print(GLMPredictPerSegment_out_2$result)
# Alternatively, use the S3 predict function to run predict on the output of
# the td_glm_per_segment_sqle() function.
GLMPredictPerSegment_out_2 <- predict(
    GLMPerSegment_out_2,
    newdata=test_transform$result,
    newdata.partition.column="stories",
    object.partition.column="stories",
    id.column="sn",
    output.prob=TRUE,
    output.responses=c("0", "1"))
# Print the result.
print(GLMPredictPerSegment_out_2$result)
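
Example 2 requests class probabilities with output.prob=TRUE and output.responses=c("0", "1"). As a rough local illustration of what such probabilities mean for a binomial model with a logit link, the base-R sketch below computes them by hand. Everything in it is an assumption for illustration: the coefficients, the column names prob_0, prob_1 and prediction, and the 0.5 cutoff are hypothetical and are not confirmed by this page as the in-database behavior.

```r
# Hypothetical coefficients for one segment's binomial model.
intercept <- -1.0
beta_bedrooms <- 0.5

# Hypothetical new rows for that segment.
new_rows <- data.frame(sn = c(1, 2, 3), bedrooms = c(1, 2, 4))

# Linear predictor, then the inverse logit to get P(response = "1").
eta <- intercept + beta_bedrooms * new_rows$bedrooms
new_rows$prob_1 <- plogis(eta)
new_rows$prob_0 <- 1 - new_rows$prob_1

# Assumed cutoff of 0.5 to turn the probability into a class label.
new_rows$prediction <- as.integer(new_rows$prob_1 >= 0.5)
print(new_rows)
```

The two probability columns sum to 1 for every row, and the predicted label follows whichever response has the larger probability under the assumed cutoff.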