
Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
ft:locale: en-US
ft:lastEdition: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

RegressionEvaluator

Description

The td_regression_evaluator_sqle() function computes metrics to evaluate and compare multiple models and summarizes how close predictions are to their expected values.

Notes:

  • This function requires the UTF8 client character set for UNICODE data.

  • This function does not support Pass Through Characters (PTCs).

  • For information about PTCs, see Teradata Vantage™ - Analytics Database International Character Set Support.

  • This function does not support KanjiSJIS or Graphic data types.

Usage

  td_regression_evaluator_sqle (
      data = NULL,
      observation.column = NULL,
      prediction.column = NULL,
      metrics = NULL,
      independent.features.num = NULL,
      freedom.degrees = NULL
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

observation.column

Required Argument.
Specifies the column name in "data" containing observation labels.
Types: character

prediction.column

Required Argument.
Specifies the column name in "data" containing predicted labels.
Types: character

metrics

Optional Argument.
Specifies the list of evaluation metrics. The function returns
the following metrics if the list is not provided:
MAE:
Mean absolute error (MAE) is the arithmetic average of the absolute errors between observed values and predicted values.
MSE:
Mean squared error (MSE) is the average of the squares of the errors between observed values and predicted values.
MSLE:
Mean squared log error (MSLE) is the average of the squared differences between the log-transformed observed values and the log-transformed predicted values.
MAPE:
Mean Absolute Percentage Error (MAPE) is the mean or average of the absolute percentage errors of forecasts.
MPE:
Mean percentage error (MPE) is the computed average of percentage errors by which predicted values differ from observed values.
RMSE:
Root mean squared error (RMSE) is the square root of the average of the squares of the errors between observed values and predicted values.
RMSLE:
Root mean squared log error (RMSLE) is the square root of the MSLE, that is, the square root of the average of the squared differences between the log-transformed observed values and predicted values.
R2:
R Squared (R2) is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
AR2:
Adjusted R-squared (AR2) is a modified version of R-squared that has been adjusted for the independent variable(s) in the model.
EV:
Explained variation (EV) measures the proportion to which a mathematical model accounts for the variation (dispersion) of a given data set.
ME:
Max-Error (ME) is the worst-case error between observed values and predicted values.
MPD:
Mean Poisson Deviance (MPD) is equivalent to Tweedie Deviances when the power parameter value is 1.
MGD:
Mean Gamma Deviance (MGD) is equivalent to Tweedie Deviances when the power parameter value is 2.
FSTAT:
F-statistics (FSTAT) conducts an F-test. An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis.

  • F_score: F_score value from the F-test.

  • F_Critcialvalue: F critical value from the F-test. (alpha, df1, df2, UPPER_TAILED), alpha = 95

  • p_value: Probability value associated with the F_score value (F_score, df1, df2, UPPER_TAILED)

  • F_conclusion: F-test result, either 'reject null hypothesis' or 'fail to reject null hypothesis'. If F_score > F_Critcialvalue, the result is 'reject null hypothesis'; otherwise, it is 'fail to reject null hypothesis'.

Types: character OR vector of Strings (character)
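
The metric descriptions above can be sketched locally in base R. This is only an illustration using common textbook definitions; it is not the SQL Engine implementation, and the function name regression_metrics is hypothetical. Three choices are assumptions because this page does not state them: logarithms are taken as log(1 + x), percentage errors are computed relative to the observed values and scaled by 100, and the deviances use the standard Tweedie forms for powers 1 and 2.

```r
# Hypothetical helper: textbook versions of the metrics listed above.
# Not the SQL Engine code; formulas marked "assumption" are not confirmed here.
regression_metrics <- function(observed, predicted) {
  err  <- observed - predicted
  mse  <- mean(err^2)
  # Assumption: log transform is log(1 + x).
  msle <- mean((log1p(observed) - log1p(predicted))^2)
  c(
    MAE   = mean(abs(err)),
    MSE   = mse,
    MSLE  = msle,
    # Assumption: percentage errors are relative to observed values, times 100.
    MAPE  = mean(abs(err / observed)) * 100,
    MPE   = mean(err / observed) * 100,
    RMSE  = sqrt(mse),
    RMSLE = sqrt(msle),
    R2    = 1 - sum(err^2) / sum((observed - mean(observed))^2),
    EV    = 1 - var(err) / var(observed),
    ME    = max(abs(err)),
    # Assumption: mean Tweedie deviance with power 1 (Poisson) and 2 (Gamma).
    MPD   = 2 * mean(observed * log(observed / predicted) - observed + predicted),
    MGD   = 2 * mean(log(predicted / observed) + observed / predicted - 1)
  )
}

# Small made-up vectors; the errors are 1, 0, -2, -1.
observed  <- c(3, 5, 2, 7)
predicted <- c(2, 5, 4, 8)
print(round(regression_metrics(observed, predicted), 4))
```

For these vectors, MAE is (1 + 0 + 2 + 1) / 4 = 1, MSE is (1 + 0 + 4 + 1) / 4 = 1.5, and ME is 2. Values returned by td_regression_evaluator_sqle() may differ wherever the assumptions above do not match the database implementation.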

independent.features.num

Optional Argument.
Specifies the number of independent variables in the model.
Required with the Adjusted R-squared (AR2) metric; otherwise ignored.
Types: integer
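
To illustrate why the number of independent variables matters for AR2, here is a minimal base-R sketch of the usual textbook adjustment, 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of rows and p is the number of independent variables. Whether the SQL Engine uses exactly this form is an assumption, and the function name adjusted_r2 is hypothetical.

```r
# Hypothetical helper: textbook adjusted R-squared.
# r2: ordinary R-squared; n: number of observations;
# p: number of independent variables (the role of independent.features.num).
adjusted_r2 <- function(r2, n, p) {
  stopifnot(n > p + 1)  # the denominator n - p - 1 must be positive
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# The same R2 is penalized more as the number of predictors grows.
print(adjusted_r2(r2 = 0.8, n = 50, p = 2))
print(adjusted_r2(r2 = 0.8, n = 50, p = 10))
```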

freedom.degrees

Optional Argument.
Specifies the numerator degrees of freedom (df1) and denominator
degrees of freedom (df2). Required with the FSTAT metric; otherwise ignored.
Types: integer OR vector of integers
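
The way df1 and df2 feed into the FSTAT outputs can be sketched with base R's F-distribution functions qf() and pf(). This is an illustration only, not the SQL Engine code: the function name f_test_summary is hypothetical, and reading "alpha = 95" as a 95% confidence level (an upper-tail probability of 0.05) is an assumption.

```r
# Hypothetical helper mirroring the FSTAT outputs described for "metrics".
f_test_summary <- function(f_score, df1, df2, confidence = 0.95) {
  # Upper-tailed critical value: the F quantile at the confidence level.
  f_critical <- qf(confidence, df1, df2)
  # Upper-tail probability of seeing a value at least as large as f_score.
  p_value <- pf(f_score, df1, df2, lower.tail = FALSE)
  conclusion <- if (f_score > f_critical) {
    "reject null hypothesis"
  } else {
    "fail to reject null hypothesis"
  }
  list(f_score = f_score, f_critical = f_critical,
       p_value = p_value, conclusion = conclusion)
}

# Made-up F score of 25 with df1 = 1 and df2 = 2.
res <- f_test_summary(f_score = 25, df1 = 1, df2 = 2)
print(res)
```

Comparing the score with the critical value and comparing the p-value with 0.05 lead to the same decision, because both come from the same upper tail of the F-distribution.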

**generic_arguments:
Specifies the generic keyword arguments SQLE functions accept. Below
are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not. When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input_data_arg_name>_partition_column" accepts character or list of character (Strings)

  • "<input_data_arg_name>_hash_column" accepts character or list of character (Strings)

  • "<input_data_arg_name>_order_column" accepts character or list of character (Strings)

  • "local_order_<input_data_arg_name>" accepts logical

Note:
These generic arguments are supported by tdplyr only if the underlying SQL Engine function supports them; otherwise, an exception is raised.

Value

The function returns an object of class "td_regression_evaluator_sqle", which is a named list containing an object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s): result

Examples

  
    
    # Load the tdplyr package.
    library(tdplyr)
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Set the option 'val.install.location'.
    options(val.install.location = "val")
    
    # Create required tbl_teradata.
    # Load the example data.
    loadExampleData("tdplyr_example", "titanic")
    
    # Create tbl_teradata object.
    titanic <- tbl(con, "titanic")
    
    # First generate linear regression model using LinReg() function from 'valib'.
    lin_reg_obj <- td_lin_reg_valib(data=titanic,
                                    columns=c("age", "survived", "pclass"),
                                    response.column="fare")
    
    # Score the data using the linear regression model generated above.
    obj <- td_lin_reg_predict_valib(data=titanic,
                                    model=lin_reg_obj$model,
                                    accumulate = "fare",
                                    response.column="fare_prediction")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1 : Compute 'RMSE', 'R2' and 'FSTAT' metrics to evaluate
    #             the model.
    RegressionEvaluator_out <- td_regression_evaluator_sqle(
                                data = obj$result,
                                observation.column = "fare",
                                prediction.column = "fare_prediction",
                                freedom.degrees = c(1, 2),
                                independent.features.num = 2,
                                metrics = c('RMSE','R2','FSTAT'))
    
    # Print the result.
    print(RegressionEvaluator_out$result)