
Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Locale: en-US
Last Edition: 2024-05-03
DITA ID: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

GLMPredictPerSegment

Description

The td_glm_predict_per_segment_sqle() function uses the model generated by the td_glm_per_segment_sqle() function to predict target values (regression) and class labels (classification) on new input data.
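
To make the per-segment idea concrete, the following is a minimal local sketch in base R, not the Vantage implementation: it fits one Gaussian GLM per segment with stats::glm() and scores each new row with the model of its own segment. The built-in mtcars data set, the choice of "cyl" as the partition column, and the added "id" column are assumptions made purely for illustration.

```r
# Local base-R analogue of per-segment GLM scoring (illustration only).
# Assumption: mtcars stands in for the input table and "cyl" for the partition column.
test_idx <- seq(2, nrow(mtcars), by = 4)
train <- mtcars[-test_idx, ]
test <- mtcars[test_idx, ]
test$id <- seq_len(nrow(test))  # plays the role of id.column

# Fit one Gaussian GLM per segment (one model per distinct cyl value).
models <- lapply(split(train, train$cyl), function(seg) {
  glm(mpg ~ wt, family = gaussian(), data = seg)
})

# Score each new row with the model that belongs to its own segment.
scored <- do.call(rbind, lapply(split(test, test$cyl), function(seg) {
  model <- models[[as.character(seg$cyl[1])]]
  data.frame(id = seg$id,
             cyl = seg$cyl,
             prediction = as.numeric(predict(model, newdata = seg)))
}))
scored <- scored[order(scored$id), ]
print(scored)
```

Each row is scored only by the model trained on rows with the same partition value, which is why the partition column has to be present and consistent in both the training and the scoring data.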

Notes:

  • All input features must be numeric. Convert categorical columns to numeric columns as a preprocessing step, for example with the following functions:
    * td_one_hot_encoding_fit_sqle() and td_one_hot_encoding_transform_sqle().
    * td_ordinal_encoding_fit_sqle() and td_ordinal_encoding_transform_sqle().
    * td_target_encoding_fit_sqle() and td_target_encoding_transform_sqle().
    The td_one_hot_encoding_fit_sqle() and td_one_hot_encoding_transform_sqle() functions support segment functions. However, the td_ordinal_encoding_fit_sqle(), td_ordinal_encoding_transform_sqle(), td_target_encoding_fit_sqle() and td_target_encoding_transform_sqle() functions do not support segment functions, so you must run them one by one on each partition.

  • Apply the same preprocessing steps that were carried out for td_glm_per_segment_sqle() to the test data set before prediction.

  • The function does not generate prediction accuracy metrics such as MSE, precision, recall, and ROC. Use the td_regression_evaluator_sqle(), td_classification_evaluator_sqle() and td_roc_sqle() functions as post-processing steps. These functions do not support segment functions; the workaround is to run them one by one on each partition.

  • Any observation with a missing value in an input column is ignored and appears in the output with a specific error code. To fill in missing values, use imputation functions such as td_simple_impute_fit_sqle() and td_simple_impute_transform_sqle().
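
The note on accuracy metrics says that evaluation has to be run partition by partition. As a rough local illustration of what that means, the base-R sketch below computes the MSE separately for each segment of a small data frame. The data frame, its column names and its values are hypothetical stand-ins for predictions that have already been collected; the sketch does not call td_regression_evaluator_sqle().

```r
# Hypothetical, hand-written predictions for two segments (illustration only).
scored <- data.frame(
  stories    = c(1, 1, 2, 2, 2),
  price      = c(100, 120, 200, 210, 190),
  prediction = c(110, 115, 195, 220, 190)
)

# Compute the mean squared error one partition at a time.
mse_by_segment <- sapply(split(scored, scored$stories), function(seg) {
  mean((seg$price - seg$prediction)^2)
})
print(mse_by_segment)
```

The same split-then-evaluate pattern applies to any metric that has no per-segment support: filter the rows of one partition, evaluate, and repeat for the next partition.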

Usage

  td_glm_predict_per_segment_sqle (
      newdata = NULL,
      object = NULL,
      id.column = NULL,
      accumulate = NULL,
      output.prob = FALSE,
      output.responses = NULL,
      partition.column = NULL,
      ...
  )

Arguments

newdata

Required Argument.
Specifies the tbl_teradata containing the input data.
Types: tbl_teradata

object

Required Argument.
Specifies the tbl_teradata containing the model data generated by the td_glm_per_segment_sqle() function, or an instance of td_glm_per_segment_sqle.
Types: tbl_teradata or td_glm_per_segment_sqle

id.column

Required Argument.
Specifies the input data column name that uniquely identifies an observation in the input data.
Types: character

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to copy to the output.
Types: character OR vector of Strings (character)

output.prob

Optional Argument.
Specifies whether the function should output the probability for each response.
Default Value: FALSE
Types: logical

output.responses

Optional Argument.
Specifies the class labels for output probabilities. A label must be 0 or 1.
If not specified, then the function outputs the probability of the predicted response.
Note:

  • The "output.responses" argument works only when "output.prob" is TRUE.

Types: character OR vector of Strings (character)
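
As a worked illustration of how the two class probabilities relate, the base-R sketch below derives them for a binomial model under some assumptions: a logistic link, a 0.5 decision threshold, made-up coefficients, and the column names "prob_0", "prob_1" and "prob". None of these are taken from the function's documented output.

```r
# Made-up coefficients and inputs (illustration only, not from a fitted model).
intercept <- -1.0
coef_x <- 0.5
x <- c(0, 2, 4)

eta <- intercept + coef_x * x            # linear predictor
prob_1 <- plogis(eta)                    # P(label = 1), assuming a logistic link
prob_0 <- 1 - prob_1                     # P(label = 0)
prediction <- as.integer(prob_1 >= 0.5)  # assumed 0.5 decision threshold

out <- data.frame(x = x, prediction = prediction,
                  prob_0 = prob_0, prob_1 = prob_1)

# When no labels are requested, only the probability of the predicted
# response is reported; here that is the larger of the two probabilities.
out$prob <- ifelse(out$prediction == 1, out$prob_1, out$prob_0)
print(out)
```

Requesting both labels gives one probability per label for every row, and the two always sum to 1 for a binomial response.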

partition.column

Optional Argument.
Specifies the name of the input data ("newdata") column on which to partition the input.
The name should be consistent with the "data.partition.column" name used with td_glm_per_segment_sqle(). If that column name is Unicode with foreign language characters, then the "partition.column" argument must be specified.
Types: character

...

Specifies the generic keyword arguments that SQLE functions accept. The generic keyword arguments are:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table. When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
tdplyr supports these generic arguments only if the underlying SQLE Engine function supports them; otherwise, an exception is raised.

Value

The function returns an object of class "td_glm_predict_per_segment_sqle", which is a named list containing an object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s): result

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("decisionforestpredict_example", "housing_train", "housing_test")
    
    # Create tbl_teradata object.
    housing_train <- tbl(con, "housing_train")
    housing_test <- tbl(con, "housing_test")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
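    # Filter the rows from train and test dataset with homestyle as Classic and Eclectic.
    # The dplyr verbs used below require library(dplyr) to be attached.
    binomial_housing_train <- housing_train %>%
      filter(homestyle %in% c("Classic", "Eclectic"))
    binomial_housing_test <- housing_test %>%
      filter(homestyle %in% c("Classic", "Eclectic"))

    # td_glm_per_segment_sqle() function requires features in numeric format for processing,
    # so dropping the non-numeric columns.
    drop_cols <- c("driveway", "recroom", "gashw", "airco", "prefarea",
                   "fullbase")
    binomial_housing_train <- binomial_housing_train %>% select(-all_of(drop_cols))
    # The Gaussian example does not use homestyle, so that column is dropped as well.
    gaussian_housing_train <- binomial_housing_train %>% select(-homestyle)
    
    binomial_housing_test <- binomial_housing_test %>% select(-all_of(drop_cols))
    gaussian_housing_test <- binomial_housing_test %>% select(-homestyle)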
    
    # Transform the train dataset categorical values to encoded values.
    train_fit_res <- td_ordinal_encoding_fit_sqle(target.column='homestyle',
                                                  data=binomial_housing_train)
    
    train_transform_res <- td_ordinal_encoding_transform_sqle(
                            data=binomial_housing_train,
                            object=train_fit_res$result,
                            accumulate=c("sn", "price",
                                         "lotsize","bedrooms",
                                         "bathrms", "stories"))
    
    test_fit <- td_ordinal_encoding_fit_sqle(
                  target.column='homestyle',
                  data=binomial_housing_test)
    
    test_transform <- td_ordinal_encoding_transform_sqle(
                        data=binomial_housing_test,
                        object=test_fit$result,
                        accumulate=c("sn", "price", "lotsize",
                                     "bedrooms", "bathrms", "stories"))
    
    # Example 1: Train the model using the 'Gaussian' family.
    #            Predict the price using td_glm_predict_per_segment_sqle().
    
    # Train the model using the 'Gaussian' family.
    GLMPerSegment_out_1 <- td_glm_per_segment_sqle(data=gaussian_housing_train,
                                                   data.partition.column="stories",
                                                   input.columns=c('garagepl',
                                                                   'lotsize',
                                                                   'bedrooms',
                                                                   'bathrms'),
                                                   response.column="price",
                                                   family="Gaussian",
                                                   iter.max=1000,
                                                   batch.size=9)
    
    # Predict the price using td_glm_predict_per_segment_sqle().
    GLMPredictPerSegment_out_1 <- td_glm_predict_per_segment_sqle(
                                    newdata=gaussian_housing_test,
                                    newdata.partition.column="stories",
                                    object=GLMPerSegment_out_1,
                                    object.partition.column="stories",
                                    id.column="sn")
    
    # Print the result.
    print(GLMPredictPerSegment_out_1$result)
    
    # Example 2: Train the model using the 'Binomial' family.
    #            Predict the homestyle using td_glm_predict_per_segment_sqle().
    
    # Train the model using the 'Binomial' family.
    GLMPerSegment_out_2 <- td_glm_per_segment_sqle(
                            data=train_transform_res$result,
                            data.partition.column="stories",
                            input.columns=c('price', 'lotsize',
                                            'bedrooms', 'bathrms'),
                            response.column="homestyle",
                            family="Binomial",
                            iter.max=100)
    
    # Predict the homestyle using td_glm_predict_per_segment_sqle().
    GLMPredictPerSegment_out_2 <- td_glm_predict_per_segment_sqle(
                                    newdata=test_transform$result,
                                    newdata.partition.column="stories",
                                    object=GLMPerSegment_out_2,
                                    object.partition.column="stories",
                                    id.column="sn",
                                    output.prob=TRUE,
                                    output.responses=c("0", "1")
                                    )
    # Print the result.
    print(GLMPredictPerSegment_out_2$result)
                                    
    # Alternatively use S3 predict function to run predict on the output of
    # td_glm_per_segment_sqle() function.
    GLMPredictPerSegment_out_2 <- predict(
                                    GLMPerSegment_out_2,
                                    newdata=test_transform$result,
                                    newdata.partition.column="stories",
                                    object.partition.column="stories",
                                    id.column="sn",
                                    output.prob=TRUE,
                                    output.responses=c("0", "1")
                                    )
    
    # Print the result.
    print(GLMPredictPerSegment_out_2$result)