VectorDistance Example: Sparse-Format Input and Output, Default Thresholds

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10

dita:id

B700-4003

Input

See VectorDistance Examples Sparse Input.

SQL Call

SELECT * FROM VectorDistance (
  ON target_mobile_data AS TargetTable PARTITION BY UserID
  ON ref_mobile_data AS ReferenceTable DIMENSION
  USING
  TargetIdColumns ('UserID')
  TargetAttributeNameColumn ('Feature')
  TargetAttributeValueColumn ('value1')
  DistanceMeasure ('Cosine','Euclidean','Manhattan')
) AS dt ORDER BY 1;
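
The call above reads each vector in sparse form, one row per attribute name and value. As a rough illustration of what the three requested measures compute on such input, here is a minimal Python sketch. It is not the function's implementation. It assumes that an attribute missing from a vector counts as 0 and that cosine distance is 1 minus cosine similarity; the helper `sparse_distances`, the feature names, and the values are all hypothetical.

```python
import math

def sparse_distances(target, reference):
    """Distances between two sparse vectors given as {feature_name: value} dicts.

    Assumption: a feature absent from a vector has value 0.
    Assumption: cosine distance = 1 - cosine similarity.
    """
    features = set(target) | set(reference)
    diffs = [target.get(f, 0.0) - reference.get(f, 0.0) for f in features]

    euclidean = math.sqrt(sum(d * d for d in diffs))
    manhattan = sum(abs(d) for d in diffs)

    # Only features present in both vectors contribute to the dot product.
    dot = sum(target[f] * reference[f] for f in set(target) & set(reference))
    norm_t = math.sqrt(sum(v * v for v in target.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    cosine = 1.0 - dot / (norm_t * norm_r)

    return {"cosine": cosine, "euclidean": euclidean, "manhattan": manhattan}

# Hypothetical vectors, not taken from the example input tables.
target = {"feature_a": 3.0, "feature_b": 4.0}
reference = {"feature_a": 3.0}
result = sparse_distances(target, reference)
print(result)
```

Storing only nonzero attributes keeps the input compact, and the union of feature names lets the two vectors have different attribute sets.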

Output

 target_userid ref_userid type      distance             
 ------------- ---------- --------- -------------------- 
             1          5 euclidean   1.1246501958870991
             1          5 manhattan            1.7299667
             1          5 cosine        0.45486517827424
             2          5 manhattan                 0.73
             2          5 euclidean   0.5243090691567331
             2          5 cosine     0.02608922985452755
             3          5 cosine    0.024150544155866593
             3          5 manhattan                 0.67
             3          5 euclidean   0.4526588119102511
             4          5 manhattan                 1.42
             4          5 euclidean    1.047091209016674
             4          5 cosine     0.43822243299800046

The following table (which is not output by the VectorDistance function) shows the distances of the target vectors from the reference vector (UserID 5) and their similarity ranks. The shorter the distance, the higher the similarity rank. In this example, the similarity rank is the same under all three measures: a target that is closer to the reference in one measure is also closer in the other two. This agreement is a property of this data, not a guarantee for arbitrary vectors. UserID 3 is most similar to UserID 5.

Target Distances from Reference and Similarity Ranks
 target_userid Cosine Distance Euclidean Distance Manhattan Distance Similarity Rank
 ------------- --------------- ------------------ ------------------ ---------------
             1     0.454865179        1.124650195          1.7299667               4
             2      0.02608923        0.524309065         0.72999999               2
             3     0.024150545        0.452658811         0.66999999               1
             4     0.438222434        1.047091208         1.41999999               3
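
The similarity ranks can be checked directly from the function output. The following Python sketch copies the distances from the Output section, orders the targets by distance under each measure, and prints the resulting order. The dictionary layout and the helper `rank_targets` are illustrative choices, not part of VectorDistance.

```python
# Distances copied from the Output section:
# (target_userid, measure) -> distance to reference UserID 5
distances = {
    (1, "euclidean"): 1.1246501958870991,
    (1, "manhattan"): 1.7299667,
    (1, "cosine"): 0.45486517827424,
    (2, "manhattan"): 0.73,
    (2, "euclidean"): 0.5243090691567331,
    (2, "cosine"): 0.02608922985452755,
    (3, "cosine"): 0.024150544155866593,
    (3, "manhattan"): 0.67,
    (3, "euclidean"): 0.4526588119102511,
    (4, "manhattan"): 1.42,
    (4, "euclidean"): 1.047091209016674,
    (4, "cosine"): 0.43822243299800046,
}

def rank_targets(distances, measure):
    """Return target IDs ordered from most to least similar for one measure."""
    rows = [(d, uid) for (uid, m), d in distances.items() if m == measure]
    return [uid for _, uid in sorted(rows)]

rankings = {m: rank_targets(distances, m)
            for m in ("cosine", "euclidean", "manhattan")}

for measure, order in rankings.items():
    print(measure, order)  # each measure gives [3, 2, 4, 1]
```

All three measures order the targets as 3, 2, 4, 1, which matches the Similarity Rank column.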

Download a zip file of all examples and a SQL script file that creates their input tables.