In linear regression, the relationship between the dependent variable Y and independent variables X is represented by a straight line.
The equation for a simple linear regression model is:
Y = β0 + β1X + ϵ
where Y is the dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope, and ϵ is the error term. The slope represents the change in Y for a one-unit change in X, and the intercept represents the value of Y when X is zero.
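To make the notation concrete, here is a minimal Python sketch of the prediction rule. The coefficient values (β0 = 2.0, β1 = 0.5) are illustrative assumptions for this example, not estimates from any data.

```python
import numpy as np

beta0, beta1 = 2.0, 0.5            # hypothetical intercept and slope
X = np.array([0.0, 1.0, 2.0, 3.0])

# Predicted Y for each X: a one-unit increase in X raises the prediction
# by beta1, and the prediction equals beta0 when X is zero.
Y_hat = beta0 + beta1 * X
print(Y_hat)                        # [2.  2.5 3.  3.5]
```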
The goal of linear regression is to estimate the values of β0 and β1 that minimize the sum of squared errors (SSE) between the predicted values and actual values. SSE is calculated as:
SSE = ∑ (Y − Ŷ)²
where Y is the actual value of the dependent variable, and Ŷ is the predicted value.
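As a quick illustration, the sketch below computes SSE for a small set of hypothetical actual and predicted values (the numbers are assumptions, not taken from any dataset discussed here).

```python
import numpy as np

# Hypothetical observed values and model predictions.
Y     = np.array([3.1, 4.0, 4.8, 6.2])
Y_hat = np.array([3.0, 4.1, 5.0, 6.0])

# Sum of squared errors: square each residual, then sum.
sse = np.sum((Y - Y_hat) ** 2)
print(sse)  # ≈ 0.10  (0.01 + 0.01 + 0.04 + 0.04)
```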
Two common techniques for estimating β0 and β1 are ordinary least squares (OLS) and gradient descent. OLS finds the values of β0 and β1 that minimize SSE by taking the partial derivatives of SSE with respect to β0 and β1 and setting them to zero. Gradient descent is an optimization algorithm that iteratively adjusts β0 and β1 in the direction that reduces SSE.
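The sketch below contrasts the two approaches on a small hypothetical dataset: the closed-form OLS estimates for the one-predictor case, and a plain gradient-descent loop on the same SSE objective. The data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical data roughly following Y = 2 + 0.5 X plus noise.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.6, 3.1, 3.4, 4.1, 4.4])

# --- Ordinary least squares (closed form for simple regression) ---
# Setting the partial derivatives of SSE with respect to beta0 and beta1
# to zero yields these familiar formulas.
x_bar, y_bar = X.mean(), Y.mean()
beta1_ols = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
beta0_ols = y_bar - beta1_ols * x_bar

# --- Gradient descent on the same SSE objective ---
beta0, beta1 = 0.0, 0.0     # starting guesses
lr = 0.01                    # learning rate (assumed for this example)
for _ in range(5000):
    residuals = (beta0 + beta1 * X) - Y
    beta0 -= lr * 2 * residuals.sum()          # dSSE/dbeta0
    beta1 -= lr * 2 * (residuals * X).sum()    # dSSE/dbeta1

print(beta0_ols, beta1_ols)  # closed-form estimates
print(beta0, beta1)          # iterative estimates, close to the OLS values
```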
Use these metrics to evaluate the performance of a linear regression model:
| Metric | Description |
|---|---|
| R-squared | Measures the proportion of variation in the dependent variable explained by the independent variables. |
| Mean squared error (MSE) | Measures the average of the squared differences between the predicted and actual values. |
| Mean absolute error (MAE) | Measures the average of the absolute differences between the predicted and actual values. |
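All three metrics can be computed directly from the actual and predicted values, as in this minimal sketch (the arrays below are hypothetical).

```python
import numpy as np

# Hypothetical actual and predicted values.
Y     = np.array([3.0, 4.5, 5.0, 6.5, 7.0])
Y_hat = np.array([3.2, 4.3, 5.1, 6.2, 7.4])

residuals = Y - Y_hat

# Mean squared error: average of the squared differences.
mse = np.mean(residuals ** 2)

# Mean absolute error: average of the absolute differences.
mae = np.mean(np.abs(residuals))

# R-squared: 1 minus the ratio of residual variation to total variation.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(mse, mae, r_squared)
```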