5.4.5 - Linear Regression Scoring - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Teradata Warehouse Miner
February 2018
User Guide


Linear Regression Scoring applies a Linear Regression model to an input table that contains the same independent variable columns as the model. The result is an output score table that contains, at a minimum, one or more key columns and an estimate of the dependent variable in the model. The user may also choose to perform model evaluation, either separately or in combination with scoring. When model evaluation is requested, the input table must contain a column representing the dependent variable in the model, and a report is produced as a result data set containing the standard error of estimate as well as the minimum, maximum, and average absolute error. When both scoring and evaluation are requested, the output table automatically includes the residual value, calculated as the difference between the original value and the predicted value of the dependent variable. The residual value may also be requested when only scoring is performed.
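The quantities described above can be illustrated outside the database. The following Python sketch is not the td_analyze implementation; it is a minimal illustration that assumes the residual is the actual value minus the predicted value and that the standard error of estimate uses the common n - p - 1 degrees-of-freedom form (n observations, p independent variables). The function name, the row layout, and the column names are hypothetical.

```python
import math


def score_and_evaluate(rows, intercept, coefficients, dependent):
    """Score rows with a linear model and summarize the prediction errors."""
    scored = []
    for row in rows:
        # Predicted value: intercept plus the weighted independent variables.
        predicted = intercept + sum(
            beta * row[column] for column, beta in coefficients.items()
        )
        # Assumed sign convention: residual = actual - predicted.
        residual = row[dependent] - predicted
        scored.append({**row, "predicted": predicted, "residual": residual})

    abs_errors = [abs(r["residual"]) for r in scored]
    sse = sum(r["residual"] ** 2 for r in scored)
    n, p = len(scored), len(coefficients)
    report = {
        # Assumed textbook form with n - p - 1 degrees of freedom.
        "standard_error_of_estimate": math.sqrt(sse / (n - p - 1)),
        "min_absolute_error": min(abs_errors),
        "max_absolute_error": max(abs_errors),
        "avg_absolute_error": sum(abs_errors) / n,
    }
    return scored, report


# Hypothetical data: the model is y = 1 + 2x.
rows = [
    {"id": 1, "x": 1.0, "y": 3.5},
    {"id": 2, "x": 2.0, "y": 4.5},
    {"id": 3, "x": 3.0, "y": 7.0},
    {"id": 4, "x": 4.0, "y": 10.0},
]
scored, report = score_and_evaluate(rows, 1.0, {"x": 2.0}, "y")
```

Here `scored` plays the role of the output score table (key, predicted value, residual) and `report` plays the role of the evaluation report.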

The Linear Scoring chapter in Teradata Warehouse Miner User Guide, Volume 3—Analytic Functions, B035-2302, contains a description of the linear regression scoring included in Teradata Warehouse Miner. Linear regression scoring is also available as a stand-alone external stored procedure that can be executed directly in the Teradata database, independently of Teradata Warehouse Miner. It is the stand-alone version and its parameters that are described in this document. Some of the key features of this stand-alone version of linear scoring are outlined below.
  • If one or more group by columns are present in both the input table to be scored and the model input table, each row in the input table to be scored is scored using the model in the model input table that has the same group by column values.
  • If an error such as “Constant columns detected” occurs for a particular combination of group by column values, the predicted value of the dependent column will be null for any row containing that combination of group by column values. The error message will also be placed in the column name in the model report.
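The two behaviors listed above can be sketched in Python. This is an illustration only, not the stored procedure's logic: the dictionary layout of the models, the "error" key, and the column names are hypothetical, and treating a group with no model at all the same way as a failed model is an assumption of this sketch.

```python
def score_by_group(rows, models, group_columns):
    """Score each row with the model that matches its group by values."""
    scored = []
    for row in rows:
        key = tuple(row[column] for column in group_columns)
        model = models.get(key)
        if model is None or model.get("error"):
            # No usable model for this combination: predicted value is null.
            predicted = None
        else:
            predicted = model["intercept"] + sum(
                beta * row[column]
                for column, beta in model["coefficients"].items()
            )
        scored.append({**row, "predicted": predicted})
    return scored


# Hypothetical models keyed by group by values; the second one failed to build.
models = {
    ("F",): {"intercept": 1.0, "coefficients": {"x": 2.0}},
    ("M",): {"error": "Constant columns detected"},
}
rows = [
    {"id": 1, "gender": "F", "x": 3.0},
    {"id": 2, "gender": "M", "x": 3.0},
]
scored = score_by_group(rows, models, ["gender"])
```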

To execute the stand-alone version of the linear regression algorithm, or to score a model built by this algorithm, the td_analyze stored procedure must be installed on the Teradata system with appropriate permissions granted. Refer to In-Database Analytic Function Setup for instructions on how to install td_analyze.


call twm.td_analyze('linearscore','database=db;tablename=tbl;outputdatabase=out_db;outputtablename=out_tbl;modeldatabase=model_db;modeltablename=model_tbl;index=i1,i2,i3;retain=r1,r2,r3;scoringmethod={score|evaluate|scoreandevaluate};residual=res;predicted=pre');

Required Parameters

database: The database containing the input table.
modeldatabase: The database containing the model input table.
modeltablename: The input table containing the linear model to be used in scoring. This table must have been created using the linear function, named with the outputtablename parameter.
tablename: The input table to be scored.

Optional Parameters

index: By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more different columns for the primary index of the score output table. Whether the default is used or different columns are specified, the index columns are included in both the Primary Index clause and the select list. In addition, the index columns should form a unique key for the score output table. Otherwise, there could be more than one score for a given observation.
outputdatabase: The database that will contain the output score table.
If outputdatabase and outputtablename are not both specified, a volatile output table with a randomly generated name is created in the logon user database.
outputtablename: The name of the score output table containing key columns and predicted values of the dependent variable in the linear model. The output table may also contain retained columns passed through unchanged from the input table, as well as a residual value containing the difference between the actual and predicted values of the dependent variable column. The output table may also contain group by columns if these are present in the model table.

Note that if outputdatabase and outputtablename are both specified and the output table already exists, the user must first drop it. If outputdatabase and outputtablename are not both specified because only model evaluation is being performed, a volatile output table with a randomly generated name is created in the logon user database, and the output result set is returned to the user instead.


overwrite: When overwrite is set to true (default), the output tables are dropped before creating new ones.

predicted: If the scoring method is score or scoreandevaluate, the name of the predicted value column can be entered here. If not entered here, the name of the dependent column in the input table is used.
residual: If the scoring method is scoreandevaluate, the name of a column that contains the residual value (the difference between the predicted and actual values of the dependent variable) can be given here. By default, this column is named “Residual”.
retain: One or more columns from the input table can optionally be specified here to be passed along to the score output table.
scoringmethod: Three scoring methods (score, evaluate, and scoreandevaluate) are available, as outlined below. By default, the model is scored but not evaluated.
  • Score
  • Evaluate
  • Score and Evaluate
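The requirement that the index columns form a unique key can be checked before scoring. The Python sketch below illustrates the idea on in-memory rows; the helper name and the column names are hypothetical, and in practice the same check would more likely be written as a grouped query against the input table.

```python
from collections import Counter


def duplicate_keys(rows, index_columns):
    """Return the index-column value combinations that occur more than once."""
    counts = Counter(
        tuple(row[column] for column in index_columns) for row in rows
    )
    return [key for key, count in counts.items() if count > 1]


# Hypothetical input rows: cust_id alone is not unique, (cust_id, acct) is.
rows = [
    {"cust_id": 1, "acct": "A"},
    {"cust_id": 1, "acct": "B"},
    {"cust_id": 2, "acct": "A"},
]
dups_single = duplicate_keys(rows, ["cust_id"])
dups_pair = duplicate_keys(rows, ["cust_id", "acct"])
```

With index set to cust_id alone, customer 1 would receive two scores; using both columns gives each observation exactly one score.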


Examples in this section demonstrate the use of Linear Scoring with various available options. To execute the provided examples, the td_analyze stored procedure must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.
When copying these examples for execution, do not introduce extra spaces between parameters.

In this example, linear scoring is performed without model evaluation.

call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;outputdatabase=twm_results;outputtablename=twm_linear_score2;predicted=inc');

In this example, model evaluation is performed without scoring.

call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;scoringmethod=evaluate');

In this example, both scoring and model evaluation are performed.

call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;outputdatabase=twm_results;outputtablename=twm_linear_score2_se;scoringmethod=scoreandevaluate;predicted=inc;residual=res');
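Because extra spaces between parameters must be avoided, it can help to assemble the call text programmatically rather than by hand. The following Python sketch shows one way to do so; the helper name build_linearscore_call is hypothetical, and the sketch only formats the statement text. It does not connect to a database, and it does not escape quotation marks inside values.

```python
# Parameters listed under Required Parameters in this section.
REQUIRED = ("database", "tablename", "modeldatabase", "modeltablename")


def build_linearscore_call(params, procedure="twm.td_analyze"):
    """Format a linearscore call string with no spaces between parameters."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    parts = []
    for name, value in params.items():
        # Lists (for example index or retain columns) become comma-separated.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item).strip() for item in value)
        parts.append(f"{name.strip()}={str(value).strip()}")

    param_string = ";".join(parts)
    return f"call {procedure}('linearscore','{param_string}');"


# Reproduces the first example in this section (scoring without evaluation).
statement = build_linearscore_call({
    "database": "twm_source",
    "tablename": "twm_customer",
    "modeldatabase": "twm_results",
    "modeltablename": "twm_linear2",
    "outputdatabase": "twm_results",
    "outputtablename": "twm_linear_score2",
    "predicted": "inc",
})
```

The resulting statement text can then be submitted through whatever SQL client is in use.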