Linear regression is one of the oldest and most fundamental types of analysis in statistics. The British scientist Sir Francis Galton originally developed it in the latter part of the 19th century. The term “regression” derives from the nature of his original study, in which he found that the children of both tall and short parents tend to “revert” or “regress” toward average height. [Neter] It has also been associated with the work of Gauss and Legendre, who used linear models to analyze astronomical data. Today, linear regression is regarded as a special case of generalized linear models, a family that also includes logit models (logistic regression), log-linear models, and multinomial response models. [McCullagh]
Why build a linear regression model? It is, after all, one of the simplest types of models that can be built. Why not start with a more sophisticated model such as a decision tree? One reason is that if a simpler model will suffice, it is preferable to an unnecessarily complex one. Another reason is to learn about the relationships among a set of observed variables. Is there in fact a linear relationship between each observed variable and the variable to be predicted? Which variables help predict the target (dependent) variable? If a linear relationship does not exist, is there another type of relationship that does? By transforming a variable, for example by taking its exponential or logarithm or by squaring it, and then building a linear regression model on the transformed data, such relationships may become apparent. In some cases, it may even be possible to model an essentially nonlinear relationship with linear regression by transforming the data first. Indeed, one of the more sophisticated forms of regression, piecewise linear regression, was designed specifically to model nonlinear phenomena as a series of linear segments. Finally, although linear regression is a relatively simple type of model, a rich set of statistics is available for exploring the nature of any linear regression model that is built.
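As a minimal sketch of the transformation idea, the following Python fits a one-predictor least-squares line to synthetic, noise-free data generated from y = 3 + 2x². The data, the helper names `fit_simple_ols` and `r_squared`, and the choice of a squaring transform are illustrative assumptions, not a method taken from the text. Regressing y directly on x finds no linear trend at all, while regressing y on the transformed predictor x² recovers the relationship; the coefficient of determination R² serves as one example of the statistics available for judging a fit.

```python
# Minimal sketch: simple (one-predictor) ordinary least squares in pure Python.
# The data below are synthetic and noise-free, chosen only for illustration.

def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(x, y, b0, b1):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical quadratic relationship: y = 3 + 2 * x^2, for x = -5 .. 5.
x = list(range(-5, 6))
y = [3 + 2 * xi ** 2 for xi in x]

# 1) Regress y directly on x: the symmetric curve has no linear trend.
b0, b1 = fit_simple_ols(x, y)
print(f"y ~ x:   b0={b0:.2f}  b1={b1:.2f}  R^2={r_squared(x, y, b0, b1):.3f}")

# 2) Transform the predictor (square it), then fit the same linear model.
x_sq = [xi ** 2 for xi in x]
c0, c1 = fit_simple_ols(x_sq, y)
print(f"y ~ x^2: b0={c0:.2f}  b1={c1:.2f}  R^2={r_squared(x_sq, y, c0, c1):.3f}")
```

With this noise-free data, the first fit reports a slope of 0 and an R² of 0, while the second recovers an intercept of 3, a slope of 2, and an R² of 1. Real data would contain noise, so the transformed fit would improve R² without reaching 1; the point is only that the model remains linear in its coefficients even though it is quadratic in x.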