Input
- Input table: fspredict_input, created from the output of StringSimilarity Example 2: Compare src_text2 to tar_text with this SQL code:
DROP TABLE fspredict_input;

CREATE MULTISET TABLE fspredict_input AS (
  SELECT * FROM StringSimilarity (
    ON strsimilarity_input PARTITION BY ANY
    USING
    ComparisonColumnPairs (
      'jaro (src_text2 , tar_text ) AS jaro1_sim',
      'LD (src_text2 , tar_text, 2) AS ld1_sim',
      'n_gram (src_text2 , tar_text, 2) AS ngram1_sim',
      'jaro_winkler (src_text2 , tar_text, 2) AS jw1_sim'
    )
    CaseSensitive ('true')
    Accumulate ('id','src_text2','tar_text')
  ) AS dt1
) WITH DATA AS dt2 PARTITION BY id;

SELECT * FROM fspredict_input ORDER BY 1;
- Model table: fg_unsupervised_model, output by FellegiSunter Example 1: Unsupervised Learning
SQL Call
SELECT * FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim', 'ld1_sim', 'ngram1_sim', 'jw1_sim')
) AS dt ORDER BY id;
Output
The final column, match_result, contains the model prediction: M for match, U for no match. The weight column contains the weight of the object pair.
id | src_text2 | tar_text | jaro1_sim | ld1_sim | ngram1_sim | jw1_sim | weight | match_result |
---|---|---|---|---|---|---|---|---|
1 | astter | aster | 0.944444444444445 | 0.833333333333333 | 0.8 | 0.961111111111111 | 44.9951243578567 | M |
2 | fone | phone | 0.783333333333333 | 0.6 | 0.5 | 0.783333333333333 | -55.9137657950372 | U |
3 | acquire | acquiesce | 0.841269841269841 | 0.666666666666667 | 0.5 | 0.904761904761905 | -14.2140648912983 | U |
4 | CCCGGGAACCAACC | CCAGGGAAACCCAC | 0.875457875457875 | 0.714285714285714 | 0.692307692307692 | 0.9003663003663 | 22.741745029409 | M |
5 | allen | allies | 0.822222222222222 | 0.666666666666667 | 0.4 | 0.875555555555556 | -14.2140648912983 | U |
6 | angle | angels | 0.877777777777778 | 0.666666666666667 | 0.4 | 0.914444444444445 | -14.2140648912983 | U |
7 | center | centre | 0.944444444444445 | 0.666666666666667 | 0.6 | 0.966666666666667 | 22.741745029409 | M |
8 | cheap | chief | 0.733333333333333 | 0.4 | 0.25 | 0.786666666666667 | -55.9137657950372 | U |
9 | circle | circuit | 0.746031746031746 | 0.571428571428571 | 0.5 | 0.847619047619048 | -35.6602399749748 | U |
10 | debut | debris | 0.7 | | | | -55.9137657950372 | U |
11 | dell | lead | 0.5 | | | | -55.9137657950372 | U |
12 | bear | bear | 1 | | | | 44.9951243578567 | M |
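
A common follow-up is to keep only the pairs predicted as matches. The query below is a minimal sketch of that, not part of the original example. It assumes the weight and match_result output columns can be referenced in an outer WHERE and ORDER BY like ordinary columns, and that the Accumulate list may be shortened to the identifying columns.

```sql
-- Sketch (assumption): filter the prediction output to matched pairs only.
-- Table and model names are the ones used in the example above.
SELECT id, src_text2, tar_text, weight
FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text')  -- similarity columns omitted from the output
) AS dt
WHERE match_result = 'M'   -- 'M' = match, 'U' = no match
ORDER BY weight DESC;      -- largest weights first
```

With the output shown above, this would keep the pairs with id 1, 4, 7, and 12.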