Linear Regression Scoring - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.6
Published
November 2018
Language
English (United States)
Last Update
2018-12-07
dita:mapPath
akh1538171534882.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

Linear Regression Scoring is the application of a Linear Regression model to an input table that contains the same independent variable columns contained in the model. The result is an output score table that minimally contains one or more key columns and an estimate of the dependent variable in the model. The user may also choose to perform model evaluation, either separately or in combination with scoring. When requested, a report is produced as a result data set containing the standard error of estimate as well as the minimum, maximum, and average absolute error. When model evaluation is requested, the input table must contain a column representing the dependent variable in the model. When both scoring and evaluation are requested, the output table automatically includes the residual value, calculated as the difference between the original value and the predicted value of the dependent variable. The residual value can also be requested when only scoring is performed.

For more information on linear scoring and a description of the linear regression scoring, see Teradata Warehouse Miner User Guide: Volume 3, Analytic Functions (B035-2302). Linear regression scoring is also available as a stand-alone external stored procedure that can be run directly in the Teradata database, independently of TWM. It is the stand-alone version and its parameters are described in this document. Some of the key features of this stand-alone version of linear scoring are outlined below.
  • If one or more group by columns are present in the input table to be scored and the model input table, each row in the input table to be scored is scored using the appropriate model in the model input table.
  • If an error such as “Constant columns detected” occurs for a particular combination of group by column values, the predicted value of the dependent column is null for any row containing that combination of group by column values. The error message is also placed in the column name in the model report.

To run the stand-alone version of the linear regression algorithm or to score a model built by this algorithm, the td_analyze stored procedure must be installed on the Teradata system with appropriate permissions granted. For instructions on how to install td_analyze, refer to In-Database Analytic Function Setup.

Syntax

call twm.td_analyze('linearscore','database=db;tablename=tbl;outputdatabase=out_db;outputtablename=out_tbl;modeldatabase=model_db;modeltablename=model_tbl;index=i1,i2,i3;retain=r1,r2,r3;scoringmethod={score|evaluate|scoreandevaluate};residual=res;predcited=pre; ');

Required Parameters

database
The database containing the input table.
modeldatabase
The database containing the model input table.
modeltablename
The input table containing the linear model to use in scoring. This table must be created using the linear function, named with the outputtablename parameter.
tablename
The input table to score.

Optional Parameters

index
By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more different columns for the primary index of the score output table. Regardless of whether the user uses the default setting or specifies different columns, the index columns are included both in the Primary Index clause and the select list. In addition, the index columns needs to form a unique key for the score output table. Otherwise, there are more than one score for a given observation.
outputdatabase
The database that contains the output score table.
If outputdatabase and outputtablename are not both specified, a volatile output table with randomly generated name is created in the logon user database.
outputtablename
The name of the score output table containing key columns and predicted values of the dependent variable in the linear model. The output table may also contain retained columns passed through from the input to the output table unchanged, as well as a residual value containing the difference between the actual and predicted values of the dependent variable column. The output table may also contain group by columns if these are present in the model table.

If the output table exists, it must first be dropped by the user if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified because only model evaluation is being performed, a volatile output table with a randomly generated name is created in the logon user database, and the output result set is returned to the user instead.

overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

predicted
If the score method is score or score and evaluate, the name of the predicted value column is entered here. If not entered here, the name of the dependent column in the input table is used.
residual
If the score method is score and evaluate, the name of a column that contains the residual value (the difference between the predicted and actual value of the dependent variable) is given here. By default, this column is named “Residual”.
retain
One or more columns from the input table can optionally be specified here to be passed along to the score output table.
scoringmethod
Three scoring methods are available as outlined below. By default, the model is scored but not evaluated.
  • Score
  • Evaluate
  • Score and Evaluate

Examples

Examples in this section demonstrate the use of Linear Scoring with options. To run the provided examples, the td_analyze function must be installed in a database called twm and the TWM tutorial data must be installed in the twm_source database.
If these examples are run, do not introduce extra spaces between parameters when copying.

In this example, linear scoring is performed without model evaluation.

call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;outputdatabase=twm_results;outputtablename=twm_linear_score2;predicted=inc');

In this example, model evaluation is performed without scoring.

call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;scoringmethod=evaluate');

In this example, both scoring and model evaluation are performed.

call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;outputdatabase=twm_results;outputtablename=twm_linear_score2_se;scoringmethod=scoreandevaluate;predicted=inc;residual=res');