Input
- Input table: fspredict_input, created from the output of "StringSimilarity_MLE Example: Compare src_text2 to tar_text" with this SQL code:
    DROP TABLE fspredict_input;

    CREATE MULTISET TABLE fspredict_input AS (
      SELECT * FROM StringSimilarity_MLE (
        ON strsimilarity_input PARTITION BY ANY
        USING
        ComparisonColumnPairs (
          'jaro (src_text2 , tar_text ) AS jaro1_sim',
          'LD (src_text2 , tar_text, 2) AS ld1_sim',
          'n_gram (src_text2 , tar_text, 2) AS ngram1_sim',
          'jaro_winkler (src_text2 , tar_text, 2) AS jw1_sim'
        )
        CaseSensitive ('true')
        Accumulate ('id','src_text2','tar_text')
      ) AS dt1
    ) WITH DATA AS dt2 PARTITION BY id;

    SELECT * FROM fspredict_input ORDER BY 1;
- Model: fg_unsupervised_model, output by "FellegiSunter Example: Unsupervised Learning"
SQL Call
SELECT * FROM FellegiSunterPredict (
  ON fspredict_input PARTITION BY ANY
  ON fg_unsupervised_model AS Model DIMENSION
  USING
  Accumulate ('id', 'src_text2', 'tar_text', 'jaro1_sim',
              'ld1_sim', 'ngram1_sim', 'jw1_sim')
) AS dt ORDER BY id;
Output
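The final column, match_result, contains the model prediction: M for match, U for no match. Rows can also be labeled P; in the Fellegi-Sunter decision rule this third outcome is a possible match, for pairs whose weight falls between the match and no-match thresholds. The weight column contains the weight of the object pair.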
id src_text2      tar_text       jaro1_sim          ld1_sim            ngram1_sim         jw1_sim            weight              match_result
-- -------------- -------------- ------------------ ------------------ ------------------ ------------------ ------------------- ------------
5  allen          allies         0.8222222222222223 0.6666666666666666 0.4                0.8755555555555556 -14.113029607677364 U
7  center         centre         0.9444444444444445 0.6666666666666666 0.6                0.9666666666666667 23.276190779622826  P
3  acquire        acquiesce      0.8412698412698413 0.6666666666666666 0.5                0.9047619047619048 -14.113029607677364 U
12 bear           bear           1.0                1.0                1.0                1.0                45.529567201568554  M
8  cheap          chief          0.7333333333333334 0.4                0.25               0.7866666666666667 -56.22777957673824  U
4  cccgggaaccaacc ccagggaaacccac 0.8754578754578755 0.7142857142857143 0.6923076923076923 0.9003663003663004 23.276190779622826  P
2  fone           phone          0.7833333333333333 0.6                0.5                0.7833333333333333 -56.22777957673824  U
6  angle          angels         0.8777777777777779 0.6666666666666666 0.4                0.9144444444444445 -14.113029607677364 U
11 dell           lead           0.5                0.25               0.0                0.5                -56.22777957673824  U
9  circle         circuit        0.746031746031746  0.5714285714285714 0.5                0.8476190476190476 -35.78160366254483  U
10 debut          debris         0.7000000000000001 0.5                0.4                0.79               -56.22777957673824  U
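
The way weight relates to match_result in the output above can be sketched as a three-way threshold rule, which is the standard Fellegi-Sunter decision: a pair is a match (M) at or above an upper bound, no match (U) at or below a lower bound, and a possible match (P) in between. The following Python sketch is illustrative only and is not the function's implementation. The bounds `LOWER_BOUND` and `UPPER_BOUND` and the helper `classify` are hypothetical; the real thresholds belong to the model and are not shown in this example, so the values here were chosen only to be consistent with the output rows.

```python
# Hypothetical thresholds (assumptions, not taken from the model): chosen only
# so that the rule reproduces the labels in the output table above.
LOWER_BOUND = 0.0
UPPER_BOUND = 30.0


def classify(weight, lower=LOWER_BOUND, upper=UPPER_BOUND):
    """Three-way decision on an object pair's weight."""
    if weight >= upper:
        return "M"  # match
    if weight <= lower:
        return "U"  # no match
    return "P"      # possible match: weight lies between the two bounds


# Weights copied from the output table, keyed by id.
weights = {
    12: 45.529567201568554,
    7: 23.276190779622826,
    5: -14.113029607677364,
    11: -56.22777957673824,
}

results = {pair_id: classify(w) for pair_id, w in weights.items()}
print(results)
```

With these assumed bounds, the sketch labels id 12 as M, id 7 as P, and ids 5 and 11 as U, which agrees with the match_result column for those rows.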