FellegiSunterPredict Example: Unsupervised Learning Model

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)

Input

  • Input table: fspredict_input, created from the output of the StringSimilarity_MLE example "Compare src_text2 to tar_text" with this SQL code:
    DROP TABLE fspredict_input;
    
    CREATE MULTISET TABLE fspredict_input AS (
      SELECT * FROM StringSimilarity_MLE (
        ON strsimilarity_input PARTITION BY ANY
        USING
        ComparisonColumnPairs (
                      'jaro (src_text2 , tar_text ) AS jaro1_sim',
                      'LD (src_text2 , tar_text, 2) AS ld1_sim',
                      'n_gram (src_text2 , tar_text, 2) AS ngram1_sim',
                      'jaro_winkler (src_text2 , tar_text, 2) AS jw1_sim'
        )
        CaseSensitive ('true')
        Accumulate ('id','src_text2','tar_text')
      ) AS dt1
    ) WITH DATA AS dt2 PARTITION BY id;
    
    SELECT * FROM fspredict_input ORDER BY 1;
  • Model: fg_unsupervised_model, output by the FellegiSunter example "Unsupervised Learning"

SQL Call

SELECT * FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS Model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
              'ld1_sim','ngram1_sim', 'jw1_sim')
) AS dt ORDER BY id;
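
To reuse the predictions without rerunning the function, the call can be wrapped in the same CREATE TABLE ... AS pattern used to build the input table. The following is a minimal sketch, not part of the original example: the table name fspredict_output is hypothetical, and the plain WITH DATA clause assumes default table options are acceptable.

```sql
-- fspredict_output is a hypothetical table name chosen for illustration.
-- The FellegiSunterPredict call is the one shown in "SQL Call", unchanged.
CREATE MULTISET TABLE fspredict_output AS (
  SELECT * FROM FellegiSunterPredict (
    ON fspredict_input PARTITION BY ANY
    ON fg_unsupervised_model AS Model DIMENSION
    USING
    Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
                'ld1_sim','ngram1_sim', 'jw1_sim')
  ) AS dt
) WITH DATA;

-- Inspect the stored predictions.
SELECT * FROM fspredict_output ORDER BY id;
```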

Output

The final column, match_result, contains the model prediction: M for match, P for possible match, U for no match. The weight column contains the weight of the object pair; higher weights indicate stronger evidence of a match.

 id src_text2      tar_text       jaro1_sim          ld1_sim            ngram1_sim         jw1_sim            weight              match_result 
 -- -------------- -------------- ------------------ ------------------ ------------------ ------------------ ------------------- ------------ 
  5 allen          allies         0.8222222222222223 0.6666666666666666                0.4 0.8755555555555556 -14.113029607677364 U           
  7 center         centre         0.9444444444444445 0.6666666666666666                0.6 0.9666666666666667  23.276190779622826 P           
  3 acquire        acquiesce      0.8412698412698413 0.6666666666666666                0.5 0.9047619047619048 -14.113029607677364 U           
 12 bear           bear                          1.0                1.0                1.0                1.0  45.529567201568554 M           
  8 cheap          chief          0.7333333333333334                0.4               0.25 0.7866666666666667  -56.22777957673824 U           
  4 cccgggaaccaacc ccagggaaacccac 0.8754578754578755 0.7142857142857143 0.6923076923076923 0.9003663003663004  23.276190779622826 P           
  2 fone           phone          0.7833333333333333                0.6                0.5 0.7833333333333333  -56.22777957673824 U           
  6 angle          angels         0.8777777777777779 0.6666666666666666                0.4 0.9144444444444445 -14.113029607677364 U           
 11 dell           lead                          0.5               0.25                0.0                0.5  -56.22777957673824 U           
  9 circle         circuit         0.746031746031746 0.5714285714285714                0.5 0.8476190476190476  -35.78160366254483 U           
 10 debut          debris         0.7000000000000001                0.5                0.4               0.79  -56.22777957673824 U
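
The predictions above can be summarized or filtered by treating the function call as a derived table. The queries below are a sketch under that assumption. They reuse the call from "SQL Call" and only add standard SQL around it; the column alias pair_count is introduced here for illustration.

```sql
-- Count object pairs per predicted class.
-- pair_count is an illustrative alias, not a column produced by the function.
SELECT match_result, COUNT(*) AS pair_count
FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS Model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
              'ld1_sim','ngram1_sim', 'jw1_sim')
) AS dt
GROUP BY match_result
ORDER BY match_result;

-- List only the pairs that are not classified as U, strongest evidence first.
SELECT id, src_text2, tar_text, weight, match_result
FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS Model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
              'ld1_sim','ngram1_sim', 'jw1_sim')
) AS dt
WHERE match_result <> 'U'
ORDER BY weight DESC;
```

For the output table shown above, the first query would be expected to report 1 pair as M, 2 as P, and 8 as U. The second would return ids 12, 7, and 4.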
