Teradata Package for R Function Reference | 17.00 - td_lin_reg_predict_valib

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Published
July 2021
Language
English (United States)
Last Update
2023-08-08
dita:id
B700-4007
NMT
no
Product Category
Teradata Vantage
Linear Regression Predict

Description

Linear Regression Scoring applies a Linear Regression model to input data that contains the same independent variable columns as the model. The result is output score data that contains, at a minimum, one or more key columns and an estimate of the model's dependent variable.

Some of the key features of linear scoring are outlined below:

  1. If one or more group by columns are present in both the input data to be scored and the model input data, each row in the input data to be scored is scored using the model in the model input data whose group by column values match that row.

  2. If an error such as "Constant columns detected" occurs for a particular combination of group by column values, the predicted value of the dependent column is null for any row containing that combination of group by column values. The error message is also placed in the column name in the model report.
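
The two behaviors above can be illustrated with a local, in-memory analog in base R. This is only a sketch of the idea, not how the Vantage Analytic Library implements scoring. The data, the column names (region, age, income), and the use of lm() are all made up for illustration. Rows whose group has no usable model receive NA here, which loosely mirrors the null prediction described in item 2.

```r
# Local analog of grouped scoring; NOT the Vantage implementation.
# All data and column names are hypothetical.
train <- data.frame(
  region = rep(c("east", "west"), each = 4),
  age    = c(25, 35, 45, 55, 30, 40, 50, 60),
  income = c(30, 40, 50, 60, 20, 40, 60, 80),
  stringsAsFactors = FALSE
)

# Fit one linear model per value of the group by column.
models <- lapply(split(train, train$region),
                 function(d) lm(income ~ age, data = d))

# Rows to score; "north" has no matching model.
to_score <- data.frame(
  id     = 1:3,
  region = c("east", "west", "north"),
  age    = c(40, 40, 40),
  stringsAsFactors = FALSE
)

# Score each row with the model that matches its group value.
to_score$income_pred <- vapply(seq_len(nrow(to_score)), function(i) {
  m <- models[[to_score$region[i]]]
  if (is.null(m)) return(NA_real_)  # no usable model for this group
  unname(predict(m, newdata = to_score[i, , drop = FALSE]))
}, numeric(1))

print(to_score)
```

Each row is scored only by the model fitted for its own group, so two rows with the same age can receive different predictions.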

Usage

td_lin_reg_predict_valib(model, data, ...)

Arguments

model

Required Argument.
Specifies the input containing the linear model to use in scoring. This must be the "model" tbl_teradata generated by td_lin_reg_valib() or a tbl_teradata created on a table generated by the 'linear' function from the Vantage Analytic Library.
Types: tbl_teradata

data

Required Argument.
Specifies the input data to score.
Types: tbl_teradata

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

The function returns an object of class "td_lin_reg_predict_valib", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Other Arguments

index.columns

Optional Argument.
Specifies the name(s) of the column(s) representing the primary index of the score output. By default, the primary index columns of the score output are the primary index columns of the input. In addition, the index columns need to form a unique key for the score output. Otherwise, there is more than one score for a given observation.
Types: character OR vector of Strings (character)

response.column

Optional Argument.
Specifies the name of the predicted value column. If not specified, the name of the dependent column in the input is used.
Note:

  • If the response column is not unique in the score output, '_tm_' is automatically placed in front of the name.

Types: character

accumulate

Optional Argument.
Specifies the name(s) of the column(s) from the input to retain in the output.
Types: character OR vector of Strings (character)

residual.column

Optional Argument.
Specifies the name of a column that contains the residual value (the difference between the predicted and actual value of the dependent variable).
Default Value: 'Residual'
Types: character
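
Because index.columns must form a unique key, it can help to check a candidate key for duplicates before scoring. The following is a minimal sketch using dplyr on a small local data frame. The customers data and the cust_id key are hypothetical. dplyr verbs like these are generally translated to SQL for remote tables, but whether that holds for a particular tbl_teradata is an assumption that is not verified here.

```r
library(dplyr)

# Hypothetical local stand-in for the data to be scored.
customers <- data.frame(
  cust_id = c(1, 2, 2, 3),
  age     = c(34, 51, 51, 28)
)

# Count rows per candidate key value and keep only repeated values.
duplicate_keys <- customers %>%
  group_by(cust_id) %>%
  summarise(n_rows = n()) %>%
  filter(n_rows > 1)

# An empty result means the candidate key is unique.
print(duplicate_keys)
```

If the result is not empty, the chosen columns would yield more than one score per key value, so a different or wider set of index columns is needed.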

Examples


# Notes:
#   1. To execute Vantage Analytic Library functions, set option
#      'val.install.location' to the database name where Vantage analytic
#      library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic
#      Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)

# Example 1: Shows how linear regression model scoring is performed.
# First generate the model using td_lin_reg_valib() function.
lin_reg_obj <- td_lin_reg_valib(data=df,
                                columns=c("age", "years_with_bank",
                                          "nbr_children"),
                                response.column="income")
# Print the model.
print(lin_reg_obj$model)


# Score the data using the linear regression model generated above.
obj <- td_lin_reg_predict_valib(data=df,
                                model=lin_reg_obj$model,
                                response.column="inc")
# Print the results.
print(obj$result)

# Score using S3 predict function and the model generated above.
obj <- predict(object=lin_reg_obj,
               data=df,
               residual.column="residual_col")
# Print the results.
print(obj$result)
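
The examples above do not exercise index.columns or accumulate. A hedged sketch that combines the optional arguments follows. It reuses df and lin_reg_obj from Example 1 and needs a live Vantage connection. It assumes the "customer" table has a cust_id column that uniquely identifies each row, which this page does not confirm. The output column names "inc" and "inc_residual" are arbitrary choices.

```r
# Sketch only: requires `df` and `lin_reg_obj` from Example 1.
# `cust_id` is an ASSUMED unique key column in the "customer" table.
obj <- td_lin_reg_predict_valib(data=df,
                                model=lin_reg_obj$model,
                                index.columns="cust_id",
                                accumulate=c("age", "years_with_bank"),
                                response.column="inc",
                                residual.column="inc_residual")
# Print the results.
print(obj$result)
```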