Purpose - Teradata Warehouse Miner

Linear regression is one of the most fundamental predictive modeling algorithms. In linear regression, a dependent numeric variable is expressed as the sum of one or more independent numeric variables, each multiplied by a numeric coefficient, usually with a constant term added. The coefficients of the independent variables, together with the constant term, make up a linear regression model. Applying these coefficients to the variables (columns) of each observation (row) in a data set (table) is known as scoring, described later in this chapter.
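As a minimal illustration of the model form and of scoring, the following Python sketch applies a set of coefficients and a constant term to each row of a small in-memory table. The column names, coefficient values, and the score_row helper are hypothetical and are not part of Teradata Warehouse Miner; in the product, scoring is performed in the database.

```python
def score_row(row, coefficients, constant):
    """Predicted value = constant + sum(coefficient * independent variable)."""
    return constant + sum(coefficients[name] * row[name] for name in coefficients)

# Hypothetical model: one coefficient per independent variable, plus a constant term.
coefficients = {"age": 0.5, "income": 2.0}
constant = 10.0

# Hypothetical observations (rows), each with one value per variable (column).
rows = [
    {"age": 30.0, "income": 4.0},
    {"age": 20.0, "income": 1.0},
]

# Scoring: apply the same coefficients to every row.
predictions = [score_row(row, coefficients, constant) for row in rows]
print(predictions)
```

Each prediction is simply the constant term plus the weighted sum of that row's independent variables, which is all a linear regression model needs to store.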

The Linear Regression chapter in Teradata Warehouse Miner User Guide, Volume 3—Analytic Functions, B035-2302, contains a description of the linear regression algorithm included in Teradata Warehouse Miner. The linear regression algorithm is also available as a stand-alone external stored procedure that can be executed directly in the Teradata database, independently of Teradata Warehouse Miner. It is the stand-alone version and its parameters that are described in this document. Some of the key features of this stand-alone version of linear regression are outlined below.

The Teradata-supplied table operator CALCMATRIX is used to build a table that represents an extended cross-products matrix, which is the input to the algorithm.
One or more group by columns may optionally be specified so that an input matrix is built for each combination of group by column values, and subsequently a separate linear model is built for each matrix.
To achieve this, the names of the group by columns are passed to CALCMATRIX as parameters so that it includes them as columns in the matrix table it creates.
The algorithm is partially scalable because the size of each input matrix depends only on the number of independent variables (columns) and not on the size of the input table. The calculations performed on the client workstation, however, are not scalable when group by columns are used, because the models are built serially, one for each matrix in the matrix table.
Teradata Release 14.10 or later is required by this algorithm due to the use of the CALCMATRIX table operator, first available in that release.
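The idea behind the features above can be sketched in plain Python. This is an illustration under stated assumptions, not the CALCMATRIX implementation or its output layout: it assumes the extended cross-products matrix holds the sums of products of a constant 1, the independent variables, and the dependent variable, and it solves the resulting normal equations by Gaussian elimination. All function names and data are hypothetical.

```python
from collections import defaultdict

def accumulate_cross_products(rows, columns, group_col=None):
    """One pass over the rows; returns one cross-products matrix per group value.
    Each matrix is (k+1) x (k+1) for k columns, regardless of the row count."""
    k = len(columns) + 1  # +1 for the constant column of ones (assumed layout)
    matrices = defaultdict(lambda: [[0.0] * k for _ in range(k)])
    for row in rows:
        key = row[group_col] if group_col is not None else None
        v = [1.0] + [float(row[c]) for c in columns]
        m = matrices[key]
        for i in range(k):
            for j in range(k):
                m[i][j] += v[i] * v[j]
    return dict(matrices)

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_from_matrix(m, n_independent):
    """Matrix order is [1, x1..xk, y]; returns [constant, b1, ..., bk]."""
    p = n_independent + 1
    xtx = [row[:p] for row in m[:p]]   # cross-products of the predictors
    xty = [row[p] for row in m[:p]]    # cross-products with the dependent variable
    return solve(xtx, xty)

# Hypothetical input table with one group by column.
table = [
    {"region": "A", "x": 0, "y": 1}, {"region": "A", "x": 1, "y": 3},
    {"region": "A", "x": 2, "y": 5}, {"region": "B", "x": 0, "y": 3},
    {"region": "B", "x": 1, "y": 2}, {"region": "B", "x": 2, "y": 1},
]
matrices = accumulate_cross_products(table, ["x", "y"], group_col="region")

# One model per group, built one after another from its own matrix.
models = {group: fit_from_matrix(m, 1) for group, m in matrices.items()}
print(models)
```

In this sketch the expensive pass over the data happens once in accumulate_cross_products, while the model fitting loop touches only the small matrices, one per group.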

To execute the stand-alone version of the linear regression algorithm, or to score a model built by this algorithm, the td_analyze stored procedure must be installed on the Teradata system with appropriate permissions granted. Refer to In-Database Analytic Function Setup for instructions on how to install td_analyze.