Correlation Output - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Output Message Schema

Column | Data Type | Description
message | VARCHAR | Reports that final result is stored in table specified in OutputTable syntax element.

Correlation Output Table Schema, VIF ('false')

Column | Data Type | Description
partition_column | VARCHAR or INTEGER | [Column appears once for each specified partition_column.] Defines a partition of VIF scores.
corr_item1 | VARCHAR | Name of target_column.
corr_item2 | VARCHAR | Name of target_column.
value_col | DOUBLE PRECISION | Correlation between corr_item1 and corr_item2.
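
The shape of these rows can be illustrated with a small Python sketch. It is not the function's implementation: it assumes the Pearson coefficient, assumes one row per unordered pair of columns, and uses hypothetical names (`pearson`, `correlation_rows`, `data`) and made-up values. Which pairs the function actually emits is not stated here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_rows(columns):
    """Build (corr_item1, corr_item2, value_col) tuples.

    Assumption: one row per unordered pair of target columns.
    """
    return [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]


# Hypothetical input: target column name -> values in one partition.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}

rows = correlation_rows(data)
for corr_item1, corr_item2, value_col in rows:
    print(corr_item1, corr_item2, value_col)
```

With a partition column, the same computation would be repeated per partition and the partition value prepended to each row.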

Correlation Output Table Schema, VIF ('true')

Column | Data Type | Description
partition_column | Same as in InputTable | [Column appears once for each specified partition_column.] Defines a partition of VIF scores.
attribute | VARCHAR | Name of target_column.
vif_score | DOUBLE PRECISION | Variance inflation factor (VIF) score of attribute at the iteration number given in the iteration column.
  The function generates a VIF score of NaN if either of the following conditions is true:
  • The correlation matrix is singular.
  • The variance of any attribute is 0, which happens if there is no data or every row has the same value for that attribute.
iteration | INTEGER | Number of the VIF iteration. Meaning depends on OutputSummary:
  OutputSummary | Iteration Number
  'true' | Number of the VIF iteration at which attribute was identified as multicollinear, based on vif_score exceeding vif_threshold. If vif_score never exceeded vif_threshold, number of the final VIF iteration.
  'false' | Sequentially increasing number of the VIF iteration.
multicollinear | VARCHAR | Whether attribute has vif_score greater than vif_threshold ('yes' or 'no'). Meaning depends on OutputSummary:
  OutputSummary | Meaning
  'true' | Whether attribute is multicollinear.
  'false' | Whether vif_score exceeded vif_threshold in the iteration.

Multiple attributes can have the value 'yes'. At each iteration, the attribute with the highest vif_score above vif_threshold is removed from the data set for the next iteration. An attribute whose vif_score is higher than vif_threshold in one iteration may have a lower vif_score in another iteration, after the removal of another strongly multicollinear attribute.

The value is never 'yes' (multicollinear) for exception_attribute, even if its vif_score exceeds vif_threshold.
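
The iteration behavior described above can be sketched in Python. This is an illustration under stated assumptions, not the function's implementation: it assumes each VIF score is the matching diagonal element of the inverse correlation matrix (a standard definition), that rows mirror the OutputSummary ('false') layout, and that exception_attribute is kept in the data set rather than removed. The helper names (`corr_matrix`, `invert`, `vif_scores`, `vif_iterations`) and the sample values are hypothetical.

```python
from math import nan, sqrt


def corr_matrix(cols):
    """Pearson correlation matrix; None if any column has zero variance."""
    n = len(cols[0])
    centered = []
    for col in cols:
        mean = sum(col) / n
        c = [v - mean for v in col]
        if sum(v * v for v in c) == 0:
            return None
        centered.append(c)
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(m, eps=1e-12):
    """Gauss-Jordan inverse with partial pivoting; None if singular."""
    k = len(m)
    aug = [row[:] + [float(i == j) for j in range(k)]
           for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < eps:
            return None
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif_scores(data):
    """VIF per attribute; NaN for all if variance is 0 or matrix is singular."""
    names = list(data)
    corr = corr_matrix([data[n] for n in names])
    inv = invert(corr) if corr is not None else None
    if inv is None:
        return {n: nan for n in names}
    return {n: inv[i][i] for i, n in enumerate(names)}


def vif_iterations(data, vif_threshold, exception_attribute=None):
    """Return (attribute, vif_score, iteration, multicollinear) rows."""
    remaining = dict(data)
    rows = []
    iteration = 1
    while remaining:
        scores = vif_scores(remaining)
        # NaN never compares greater than the threshold, so it is never flagged.
        flagged = {n for n, s in scores.items()
                   if s > vif_threshold and n != exception_attribute}
        for n, s in scores.items():
            rows.append((n, s, iteration, "yes" if n in flagged else "no"))
        if not flagged:
            break
        # Drop the flagged attribute with the highest score, then repeat.
        del remaining[max(flagged, key=scores.get)]
        iteration += 1
    return rows


# Hypothetical data: "a" and "b" are nearly collinear, "c" is unrelated.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.1, 1.9, 3.1, 3.9, 5.1, 5.9],
    "c": [1.0, -1.0, 0.0, 0.0, -1.0, 1.0],
}
for row in vif_iterations(data, vif_threshold=10.0):
    print(row)
```

In this sketch both "a" and "b" are marked 'yes' in the first iteration, one of them is removed, and the survivor's score drops below the threshold in the second iteration, which is the effect the description above refers to. Passing `exception_attribute="a"` keeps "a" marked 'no' even though its first-iteration score is above the threshold.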