FellegiSunterPredict Example 1: Unsupervised Learning Model

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

Input

  • Input table: fspredict_input, created from the output of StringSimilarity Example 2: Compare src_text2 to tar_text with this SQL code:
    -- Drop any earlier copy of the table; this statement fails if the table does not exist.
    DROP TABLE fspredict_input;
    
    -- Compute four similarity measures for each (src_text2, tar_text) pair.
    CREATE MULTISET TABLE fspredict_input AS (
      SELECT * FROM StringSimilarity (
        ON strsimilarity_input PARTITION BY ANY
        USING
        ComparisonColumnPairs (
          'jaro (src_text2, tar_text) AS jaro1_sim',
          'LD (src_text2, tar_text, 2) AS ld1_sim',
          'n_gram (src_text2, tar_text, 2) AS ngram1_sim',
          'jaro_winkler (src_text2, tar_text, 2) AS jw1_sim'
        )
        CaseSensitive ('true')
        Accumulate ('id', 'src_text2', 'tar_text')
      ) AS dt1
    ) WITH DATA;
    
    SELECT * FROM fspredict_input ORDER BY 1;
  • Model table: fg_unsupervised_model, output by FellegiSunter Example 1: Unsupervised Learning

SQL Call

SELECT * FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
              'ld1_sim','ngram1_sim', 'jw1_sim')
) AS dt ORDER BY id;
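
If the predictions are needed for later queries, the same call can be wrapped in a CREATE TABLE statement, following the pattern used above to build fspredict_input. This is a minimal sketch; the table name fspredict_output is hypothetical and does not appear in the original example.

```sql
-- fspredict_output is a hypothetical table name chosen for illustration.
CREATE MULTISET TABLE fspredict_output AS (
  SELECT * FROM FellegiSunterPredict (
    ON fspredict_input PARTITION BY ANY
    ON fg_unsupervised_model AS model DIMENSION
    USING
    Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
                'ld1_sim', 'ngram1_sim', 'jw1_sim')
  ) AS dt
) WITH DATA;
```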

Output

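The final column, match_result, contains the model prediction: M for match, U for no match. The weight column contains the weight that the model assigns to the object pair. In this output, every pair labeled M has a positive weight and every pair labeled U has a negative weight.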

id  src_text2       tar_text        jaro1_sim          ld1_sim            ngram1_sim         jw1_sim            weight             match_result
--  --------------  --------------  -----------------  -----------------  -----------------  -----------------  -----------------  ------------
1   astter          aster           0.944444444444445  0.833333333333333  0.8                0.961111111111111  44.9951243578567   M
2   fone            phone           0.783333333333333  0.6                0.5                0.783333333333333  -55.9137657950372  U
3   acquire         acquiesce       0.841269841269841  0.666666666666667  0.5                0.904761904761905  -14.2140648912983  U
4   CCCGGGAACCAACC  CCAGGGAAACCCAC  0.875457875457875  0.714285714285714  0.692307692307692  0.9003663003663    22.741745029409    M
5   allen           allies          0.822222222222222  0.666666666666667  0.4                0.875555555555556  -14.2140648912983  U
6   angle           angels          0.877777777777778  0.666666666666667  0.4                0.914444444444445  -14.2140648912983  U
7   center          centre          0.944444444444445  0.666666666666667  0.6                0.966666666666667  22.741745029409    M
8   cheap           chief           0.733333333333333  0.4                0.25               0.786666666666667  -55.9137657950372  U
9   circle          circuit         0.746031746031746  0.571428571428571  0.5                0.847619047619048  -35.6602399749748  U
10  debut           debris          0.7                                                                         -55.9137657950372  U
11  dell            lead            0.5                                                                         -55.9137657950372  U
12  bear            bear            1                                                                           44.9951243578567   M
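
To see how the predictions split between the two classes, the function call can be used directly as the source of an aggregate query. The sketch below is illustrative only; the output column aliases pair_count, min_weight, and max_weight are assumptions introduced here, not part of the function's output.

```sql
-- Count the pairs in each predicted class and show the weight range per class.
-- pair_count, min_weight, and max_weight are illustrative aliases.
SELECT match_result,
       COUNT(*)    AS pair_count,
       MIN(weight) AS min_weight,
       MAX(weight) AS max_weight
FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
              'ld1_sim', 'ngram1_sim', 'jw1_sim')
) AS dt
GROUP BY match_result
ORDER BY match_result;
```

For the twelve rows shown above, this would report 4 pairs for M and 8 pairs for U.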