Purpose
Linear Regression Scoring is the application of a Linear Regression model to an input table that contains the same independent variable columns contained in the model. The result is an output score table that minimally contains one or more key columns and an estimate of the dependent variable in the model. The user may also choose to perform model evaluation, either separately or in combination with scoring. When requested, a report is produced as a result data set containing the standard error of estimate as well as the minimum, maximum, and average absolute error. When model evaluation is requested, the input table must contain a column representing the dependent variable in the model. When both scoring and evaluation are requested, the output table automatically includes the residual value, calculated as the difference between the original value and the predicted value of the dependent variable. The residual value can also be requested when only scoring is performed.
- If one or more group by columns are present in the input table to be scored and the model input table, each row in the input table to be scored is scored using the appropriate model in the model input table.
- If an error such as “Constant columns detected” occurs for a particular combination of group by column values, the predicted value of the dependent column is null for any row containing that combination of group by column values. The error message is also placed in the column name in the model report.
To run the stand-alone version of the linear regression algorithm or to score a model built by this algorithm, the td_analyze stored procedure must be installed on the Teradata system with appropriate permissions granted. For instructions on how to install td_analyze, refer to In-Database Analytic Function Setup.
Syntax
call twm.td_analyze('linearscore','database=db;tablename=tbl;outputdatabase=out_db;outputtablename=out_tbl;modeldatabase=model_db;modeltablename=model_tbl;index=i1,i2,i3;retain=r1,r2,r3;scoringmethod={score|evaluate|scoreandevaluate};residual=res;predcited=pre; ');Required Parameters
- database
- The database containing the input table.
- modeldatabase
- The database containing the model input table.
- modeltablename
- The input table containing the linear model to use in scoring. This table must be created using the linear function, named with the outputtablename parameter.
- tablename
- The input table to score.
Optional Parameters
- index
- By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more different columns for the primary index of the score output table. Regardless of whether the user uses the default setting or specifies different columns, the index columns are included both in the Primary Index clause and the select list. In addition, the index columns needs to form a unique key for the score output table. Otherwise, there are more than one score for a given observation.
- outputdatabase
- The database that contains the output score table.
- outputtablename
- The name of the score output table containing key columns and predicted values of the dependent variable in the linear model. The output table may also contain retained columns passed through from the input to the output table unchanged, as well as a residual value containing the difference between the actual and predicted values of the dependent variable column. The output table may also contain group by columns if these are present in the model table.
If the output table exists, it must first be dropped by the user if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified because only model evaluation is being performed, a volatile output table with a randomly generated name is created in the logon user database, and the output result set is returned to the user instead.
- overwrite
-
When overwrite is set to true (default), the output tables are dropped before creating new ones.
- predicted
- If the score method is score or score and evaluate, the name of the predicted value column is entered here. If not entered here, the name of the dependent column in the input table is used.
- residual
- If the score method is score and evaluate, the name of a column that contains the residual value (the difference between the predicted and actual value of the dependent variable) is given here. By default, this column is named “Residual”.
- retain
- One or more columns from the input table can optionally be specified here to be passed along to the score output table.
- scoringmethod
- Three scoring methods are available as outlined below. By default, the model is scored but not evaluated.
- Score
- Evaluate
- Score and Evaluate
Examples
In this example, linear scoring is performed without model evaluation.
call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;outputdatabase=twm_results;outputtablename=twm_linear_score2;predicted=inc');
In this example, model evaluation is performed without scoring.
call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;scoringmethod=evaluate');
In this example, both scoring and model evaluation are performed.
call twm.td_analyze('linearscore','database=twm_source;tablename=twm_customer;modeldatabase=twm_results;modeltablename=twm_linear2;outputdatabase=twm_results;outputtablename=twm_linear_score2_se;scoringmethod=scoreandevaluate;predicted=inc;residual=res');