FellegiSunter Example 1: Unsupervised Learning - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

The unsupervised model is created by specifying the user defined threshold values for the different metrics in the ComparisonFields argument. The initialization parameters InitialP, InitialM, InitialU, Lambda, and Mu are set to their default values. The column match_tag is not used for unsupervised learning.

Input

  • InputTable: fstrainer_input, created by adding a match_tag column to the output of StringSimilarity Example 1: Compare src_text1 to tar_text:
    DROP TABLE fstrainer_input;
    
    CREATE MULTISET TABLE fstrainer_input AS (
      SELECT * FROM StringSimilarity (
        ON strsimilarity_input PARTITION BY ANY
        USING
        ComparisonColumnPairs (
                    'jaro (src_text1 , tar_text ) AS jaro1_sim',
                    'LD (src_text1 , tar_text, 2) AS ld1_sim',
                    'n_gram (src_text1 , tar_text, 2) AS ngram1_sim',
                    'jaro_winkler (src_text1 , tar_text, 2) AS jw1_sim'
        )
        USING
        CaseSensitive ('true')
        Accumulate ('id','src_text1','tar_text')
      ) AS dt1
    ) WITH DATA AS dt2 PARTITION BY id;
    
    ALTER TABLE fstrainer_input
    ADD column match_tag varchar;
    
    update fstrainer_input set match_tag=  'M' where id = 1;
    update fstrainer_input set match_tag=  'M' where id = 2;
    update fstrainer_input set match_tag=  'M' where id = 3;
    update fstrainer_input set match_tag=  'U' where id = 4;
    update fstrainer_input set match_tag=  'U' where id = 5;
    update fstrainer_input set match_tag=  'M' where id = 6;
    update fstrainer_input set match_tag=  'U' where id = 7;
    update fstrainer_input set match_tag=  'M' where id = 8;
    update fstrainer_input set match_tag=  'M' where id = 9;
    update fstrainer_input set match_tag=  'U' where id = 10;
    update fstrainer_input set match_tag=  'U' where id = 11;
    update fstrainer_input set match_tag=  'U' where id = 12;
    
    SELECT * FROM fstrainer_input ORDER BY 1;

    The match_tag column is used for the supervised FellegiSunter function.

SQL Call

DROP TABLE fg_unsupervised_model;

CREATE MULTISET TABLE "fg_unsupervised_model" AS (
  SELECT * FROM FellegiSunter (
    ON fstrainer_input AS InputTable
    USING
    ComparisonFields ('jaro1_sim: 0.8', 'ld1_sim:0.8', 'ngram1_sim:0.5',
                      'jw1_sim:0.8')
    InitialP (0.1)
    InitialM (0.9)
    InitialU (0.1)
    Lambda (0.9)
    Mu (0.9)
  ) AS dt
) WITH DATA;

Output

This query returns the following table:

SELECT * FROM fg_unsupervised_model ORDER BY 1;
fg_unsupervised_model
_key _value
comparison_field_cnt 4
comparison_field_name_0 jaro1_sim
comparison_field_name_1 ld1_sim
comparison_field_name_2 ngram1_sim
comparison_field_name_3 jw1_sim
comparison_field_threshold_0 0.8
comparison_field_threshold_1 0.8
comparison_field_threshold_2 0.5
comparison_field_threshold_3 0.8
is_supervised false
lambda 0.9
lower_bound -13.7991041364018
m_0 0.9999999
m_1 0.333315282254649
m_2 0.999945434344193
m_3 0.9999999
mu 0.9
p 0.250013539042196
time_used 271.020000 seconds
u_0 0.777773766137301
u_1 1.0E-7
u_2 1.37483178043644E-7
u_3 0.88888688306865
upper_bound 22.2091654088378