Data Quality Reports | Linear Regression | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage

Constant Variables

Before building a model, the linear function checks whether any variable is constant (that is, has a standard deviation of zero). If it finds one, the function stops, notifies you, and outputs a Constant Variables Table report. You can then remove the constant variables from the model and rerun the function.
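As a rough illustration of the check described above, the following Python sketch flags columns whose standard deviation is zero. It is not the library's implementation; the function name find_constant_columns and the sample column names are hypothetical.

```python
import statistics

def find_constant_columns(columns):
    """Return the names of columns whose population standard deviation is zero.

    `columns` maps a column name to a list of numeric values
    (a hypothetical in-memory layout, used only for this sketch).
    """
    return [name for name, values in columns.items()
            if statistics.pstdev(values) == 0]

# Hypothetical sample data: `region_code` never varies.
sample = {
    "age": [25, 41, 33, 58],
    "region_code": [7, 7, 7, 7],
    "income": [31000.0, 52000.0, 47000.0, 80500.0],
}

constant = find_constant_columns(sample)
print(constant)  # ['region_code']
```

A column flagged this way would be a candidate for removal before rerunning the analysis.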

The Constant Variables Table report can include variables whose values are not actually constant. This happens when one or more columns contain values that are extremely large and very close together, so that precision loss makes the standard deviation appear to be zero. To correct the problem, rescale the values in those columns before matrix building or analysis, using the Rescale or Z-Score transformation function.
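The effect can be reproduced in plain Python. The sketch below assumes, for illustration only, that the variance is computed from accumulated sums and sums of squares; with values near one billion that differ by thousandths, that formula loses the spread to rounding, while a z-score computed on centered values keeps it. The helper names are hypothetical and are not the library's Rescale or Z-Score functions.

```python
import math

def variance_from_sums(values):
    """Sample variance from n, sum(x), and sum(x*x); prone to cancellation."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / (n - 1)

def variance_two_pass(values):
    """Sample variance computed on centered values; numerically stable."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def z_score(values):
    """Center and scale the values (hypothetical stand-in for a z-score transform)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(variance_two_pass(values))
    return [(v - mean) / std for v in values]

# Extremely large values that are very close together.
raw = [1_000_000_000.0 + 0.001 * i for i in range(10)]

naive = variance_from_sums(raw)             # spread is lost to rounding
stable = variance_two_pass(raw)             # close to the true variance
rescaled = variance_from_sums(z_score(raw)) # close to 1 after rescaling

print(naive, stable, rescaled)
```

After rescaling, even the sums-based formula recovers a sensible variance, which is why transforming such columns first avoids a false constant-variable report.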

Variable Statistics

If statstable=true, the XML output string from the linear function includes a report of the mean and standard deviation of each model variable. These statistics are derived from the ESSCP matrix that the function builds or receives as input.
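To make the idea concrete, here is a minimal sketch of how means and standard deviations can be recovered from a sums-of-squares-and-cross-products matrix. It assumes an extended layout in which a leading column of ones is prepended to the data, so that the first row holds the row count and the column sums and the diagonal holds the sums of squares. The actual ESSCP layout used by the library is not described here, and the sample (n - 1) denominator is likewise an assumption.

```python
import math

def build_extended_sscp(rows):
    """Return Z'Z, where Z is the data with a leading column of ones (assumed layout)."""
    extended = [[1.0] + list(row) for row in rows]
    width = len(extended[0])
    return [[sum(r[i] * r[j] for r in extended) for j in range(width)]
            for i in range(width)]

def stats_from_sscp(matrix):
    """Return a (mean, standard deviation) pair for each data column."""
    n = matrix[0][0]                  # count of rows
    stats = []
    for j in range(1, len(matrix)):
        col_sum = matrix[0][j]        # sum of column j
        col_sum_sq = matrix[j][j]     # sum of squares of column j
        mean = col_sum / n
        variance = (col_sum_sq - n * mean * mean) / (n - 1)
        # Clamp tiny negative values caused by rounding before taking the root.
        stats.append((mean, math.sqrt(max(variance, 0.0))))
    return stats

rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
matrix = build_extended_sscp(rows)
stats = stats_from_sscp(matrix)
for mean, std in stats:
    print(round(mean, 4), round(std, 4))
```

Because only the matrix is needed, the statistics can be produced without a second pass over the data.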

Near Dependency

If neardependencyreport=true, the XML output string from the linear function includes a report that lists columns that may be collinear, along with their variance proportions, means, and standard deviations. The Near Dependency report simplifies the search for collinear variables, or near dependencies, in the data. You can specify the thresholds for the condition index and variance proportion.

In the report, near dependencies are listed in descending order of condition index, and the variables contributing to each near dependency are listed in descending order of variance proportion.
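For readers unfamiliar with the two quantities, the conventional definitions from regression collinearity diagnostics (Belsley, Kuh, and Welsch) are sketched below. This guide does not state that the function computes them in exactly this way, so treat the notation as an assumption: X is the design matrix with columns scaled to unit length, \mu_k are its singular values, and v_{jk} are the entries of its right singular vectors.

```latex
\eta_k = \frac{\mu_{\max}}{\mu_k}, \qquad
\phi_{jk} = \frac{v_{jk}^{2}}{\mu_k^{2}}, \qquad
\pi_{jk} = \frac{\phi_{jk}}{\sum_{k'} \phi_{jk'}}
```

Here \eta_k is the condition index of factor k, and \pi_{jk} is the proportion of the variance of coefficient j associated with factor k. A large condition index combined with two or more large variance proportions on the same factor is the usual sign of a near dependency.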

Here is an example of a Near Dependency report:

Variable Name  Factor  Condition Index  Variance Proportion  Mean         Standard Deviation
CONSTANT       7       15001.8594       1                    *            *
cust_id        7       15001.8594       1                    1362987.891  293.5012
age            6       52.6169          .9963                33.744       22.3731
combo2         6       52.6169          .9935                25.733       23.4274
children       6       52.6169          .713                 .534         1.0029
income         5       35.3599          .9951                16978.026    21586.8442
combo1         5       35.3599          .995                 33654.602    43110.862
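
The ordering and thresholding described for this report can be sketched in Python using the rows of the example. The function name near_dependency_rows, the threshold values, and the choice of an inclusive (>=) comparison are assumptions made for illustration; they are not taken from the library.

```python
# Rows copied from the example report, deliberately out of order:
# (variable, factor, condition_index, variance_proportion)
rows = [
    ("income", 5, 35.3599, 0.9951),
    ("children", 6, 52.6169, 0.713),
    ("CONSTANT", 7, 15001.8594, 1.0),
    ("combo1", 5, 35.3599, 0.995),
    ("age", 6, 52.6169, 0.9963),
    ("cust_id", 7, 15001.8594, 1.0),
    ("combo2", 6, 52.6169, 0.9935),
]

def near_dependency_rows(rows, ci_threshold, vp_threshold):
    """Keep rows that meet both thresholds, then order them as in the report.

    Rows are sorted by descending condition index, then by descending
    variance proportion within each near dependency.
    """
    kept = [r for r in rows if r[2] >= ci_threshold and r[3] >= vp_threshold]
    return sorted(kept, key=lambda r: (-r[2], -r[3]))

# Illustrative thresholds only.
report = near_dependency_rows(rows, ci_threshold=30, vp_threshold=0.5)
strict = near_dependency_rows(rows, ci_threshold=30, vp_threshold=0.99)

for name, factor, ci, vp in report:
    print(name, factor, ci, vp)
```

With the looser thresholds the rows come back in the same order as the example report; raising the variance proportion threshold to 0.99 drops children, whose proportion is .713.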