Purpose - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.4
Published
August 2017
Language
English (United States)
Last Update
2018-05-04
dita:id
B035-2306
Product Category
Teradata® Warehouse Miner

Linear Regression Scoring applies a Linear Regression model to an input table that contains the same independent variable columns as the model. The result is an output score table that contains, at a minimum, one or more key columns and an estimate of the dependent variable in the model.

The user may also choose to perform model evaluation, either separately or in combination with scoring. When evaluation is requested, the input table must contain a column representing the dependent variable in the model, and a report is produced as a result data set containing the standard error of estimate as well as the minimum, maximum, and average absolute error. When both scoring and evaluation are requested, the output table automatically includes the residual value, calculated as the difference between the original value and the predicted value of the dependent variable. The residual value may also be requested when only scoring is performed.
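The arithmetic behind scoring and evaluation can be sketched in a few lines of Python. This is only an illustration of the quantities named above, not the td_analyze implementation: the function names, the dictionary layout of the model, and the column names are hypothetical, and the standard error of estimate is assumed here to be sqrt(SSE / (n - p - 1)), where p is the number of independent variables.

```python
import math

def score_row(model, row):
    """Predicted value = intercept + sum(coefficient * independent variable)."""
    return model["intercept"] + sum(
        coef * row[col] for col, coef in model["coefficients"].items()
    )

def score_and_evaluate(model, rows, key, dependent):
    """Score every row, then summarize the errors (hypothetical helper)."""
    scored = []
    for row in rows:
        predicted = score_row(model, row)
        residual = row[dependent] - predicted  # original value minus predicted value
        scored.append({key: row[key], "predicted": predicted, "residual": residual})

    n = len(scored)
    p = len(model["coefficients"])
    abs_errors = [abs(s["residual"]) for s in scored]
    sse = sum(s["residual"] ** 2 for s in scored)
    report = {
        # Assumed definition; the product may compute this differently.
        "standard_error_of_estimate": math.sqrt(sse / (n - p - 1)),
        "min_abs_error": min(abs_errors),
        "max_abs_error": max(abs_errors),
        "avg_abs_error": sum(abs_errors) / n,
    }
    return scored, report

# Hypothetical model y = 1.0 + 2.0 * x and a small input table.
model = {"intercept": 1.0, "coefficients": {"x": 2.0}}
rows = [
    {"id": 1, "x": 1.0, "y": 3.5},
    {"id": 2, "x": 2.0, "y": 4.0},
    {"id": 3, "x": 3.0, "y": 7.0},
    {"id": 4, "x": 4.0, "y": 9.5},
]
scored, report = score_and_evaluate(model, rows, key="id", dependent="y")
```

In this sketch the score table carries the key column, the prediction, and the residual, while the evaluation report is a separate single-row summary, mirroring the split between the output table and the result data set described above.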

The Linear Scoring chapter in Teradata Warehouse Miner User Guide, Volume 3—Analytic Functions, B035-2302, describes the linear regression scoring included in Teradata Warehouse Miner. Linear regression scoring is also available as a stand-alone external stored procedure that can be executed directly in the Teradata database, independently of Teradata Warehouse Miner. This document describes the stand-alone version and its parameters. Some of its key features are outlined below.
  • If one or more group by columns are present in both the input table to be scored and the model input table, each row in the input table is scored using the model in the model input table that has the same group by column values.
  • If an error such as “Constant columns detected” occurs for a particular combination of group by column values, the predicted value of the dependent column will be null for any row containing that combination of group by column values. The error message will also be placed in the column name in the model report.
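The two behaviors listed above can be pictured with a small Python sketch. It is an assumption-laden illustration rather than the stored procedure's logic: the function name `score_by_group`, the dictionary that maps a group value to a model, and the `"error"` entry used to mark a failed group are all hypothetical.

```python
def score_by_group(models, rows, group_col, key):
    """Score each row with the model that belongs to its group (hypothetical helper).

    `models` maps a group by value either to a fitted model or to a
    dictionary holding the error message recorded for that group.
    """
    output = []
    for row in rows:
        model = models[row[group_col]]
        if "error" in model:
            # The model for this group failed, so the prediction is null.
            predicted = None
        else:
            predicted = model["intercept"] + sum(
                coef * row[col] for col, coef in model["coefficients"].items()
            )
        output.append(
            {key: row[key], group_col: row[group_col], "predicted": predicted}
        )
    return output

# Hypothetical per-group models: one usable, one that failed when it was built.
models = {
    "north": {"intercept": 1.0, "coefficients": {"x": 2.0}},
    "south": {"error": "Constant columns detected"},
}
rows = [
    {"id": 1, "region": "north", "x": 3.0},
    {"id": 2, "region": "south", "x": 5.0},
]
result = score_by_group(models, rows, group_col="region", key="id")
```

Rows in the failed group still appear in the output with a null prediction, so the score table keeps one row per input row.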

To execute the stand-alone version of the linear regression algorithm, or to score a model built by this algorithm, the td_analyze stored procedure must be installed on the Teradata system with the appropriate permissions granted. Refer to In-Database Analytic Function Setup for instructions on how to install td_analyze.