In linear regression, the relationship between the dependent variable Y and independent variables X is represented by a straight line.
The equation for a simple linear regression model is:
Y = β0 + β1X + ϵ
where Y is the dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope, and ϵ is the error term. The slope represents the change in Y for a one-unit change in X, and the intercept represents the value of Y when X is zero.
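To make the notation concrete, here is a minimal Python sketch of the prediction rule. The coefficient values (β0 = 2.0, β1 = 0.5) are illustrative assumptions for this example, not estimates from any data.

```python
import numpy as np

beta0, beta1 = 2.0, 0.5            # hypothetical intercept and slope
X = np.array([0.0, 1.0, 2.0, 3.0])

# Predicted Y for each X: a one-unit increase in X raises the prediction
# by beta1, and the prediction equals beta0 when X is zero.
Y_hat = beta0 + beta1 * X
print(Y_hat)                        # [2.  2.5 3.  3.5]
```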
The goal of linear regression is to estimate the values of β0 and β1 that minimize the sum of squared errors (SSE) between the predicted values and actual values. SSE is calculated as:
SSE = ∑ (Y − Ŷ)²
where Y is the actual value of the dependent variable, and Ŷ is the predicted value.
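As a quick illustration, the sketch below computes SSE for a small set of hypothetical actual and predicted values (the numbers are assumptions, not taken from any dataset discussed here).

```python
import numpy as np

# Hypothetical observed values and model predictions.
Y     = np.array([3.1, 4.0, 4.8, 6.2])
Y_hat = np.array([3.0, 4.1, 5.0, 6.0])

# Sum of squared errors: square each residual, then sum.
sse = np.sum((Y - Y_hat) ** 2)
print(sse)  # ≈ 0.10  (0.01 + 0.01 + 0.04 + 0.04)
```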
Two common techniques for estimating β0 and β1 are ordinary least squares (OLS) and gradient descent. OLS finds the values of β0 and β1 that minimize SSE by taking the partial derivatives of SSE with respect to β0 and β1 and setting them to zero. Gradient descent is an optimization algorithm that iteratively adjusts β0 and β1 in the direction that reduces SSE.
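The sketch below contrasts the two approaches on a small hypothetical dataset: the closed-form OLS estimates for the one-predictor case, and a plain gradient-descent loop on the same SSE objective. The data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical data roughly following Y = 2 + 0.5 X plus noise.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.6, 3.1, 3.4, 4.1, 4.4])

# --- Ordinary least squares (closed form for simple regression) ---
# Setting the partial derivatives of SSE with respect to beta0 and beta1
# to zero yields these familiar formulas.
x_bar, y_bar = X.mean(), Y.mean()
beta1_ols = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
beta0_ols = y_bar - beta1_ols * x_bar

# --- Gradient descent on the same SSE objective ---
beta0, beta1 = 0.0, 0.0     # starting guesses
lr = 0.01                    # learning rate (assumed for this example)
for _ in range(5000):
    residuals = (beta0 + beta1 * X) - Y
    beta0 -= lr * 2 * residuals.sum()          # dSSE/dbeta0
    beta1 -= lr * 2 * (residuals * X).sum()    # dSSE/dbeta1

print(beta0_ols, beta1_ols)  # closed-form estimates
print(beta0, beta1)          # iterative estimates, close to the OLS values
```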
Use these metrics to evaluate the performance of a linear regression model:
| Metric | Description |
|---|---|
| R-squared | Measures the proportion of variation in the dependent variable explained by the independent variables. |
| Mean squared error (MSE) | Measures the average of the squared differences between the predicted and actual values. |
| Mean absolute error (MAE) | Measures the average of the absolute differences between the predicted and actual values. |
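All three metrics can be computed directly from the actual and predicted values, as in this minimal sketch (the arrays below are hypothetical).

```python
import numpy as np

# Hypothetical actual and predicted values.
Y     = np.array([3.0, 4.5, 5.0, 6.5, 7.0])
Y_hat = np.array([3.2, 4.3, 5.1, 6.2, 7.4])

residuals = Y - Y_hat

# Mean squared error: average of the squared differences.
mse = np.mean(residuals ** 2)

# Mean absolute error: average of the absolute differences.
mae = np.mean(np.abs(residuals))

# R-squared: 1 minus the ratio of residual variation to total variation.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(mse, mae, r_squared)
```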