Input
The input table, strsimilarity_input, has two source columns (src_text1 and src_text2) to which the function compares the target column (tar_text). The function calculates the similarity scores by the methods specified by the ComparisonColumnPairs syntax element (jaro, jaro-winkler, NGramSplitter_MLE, Levenshtein Distance). For clarity, separate examples show the comparison of each source column with the target column. With some modifications, you can use the output of this function as input to the FellegiSunter functions.
id | src_text1 | src_text2 | tar_text ----+----------------+----------------+---------------- 2 | hone | fone | phone 5 | alice | allen | allies 7 | senter | center | centre 10 | debt | debut | debris 3 | acqiese | acquire | acquiesce 6 | angela | angle | angels 8 | chef | cheap | chief 11 | deal | dell | lead 4 | AAAACCCCCGGGGA | CCCGGGAACCAACC | CCAGGGAAACCCAC 9 | circus | circle | circuit 12 | bare | bear | bear (11 rows)
SQL Call
SELECT * FROM StringSimilarity_MLE ( ON strsimilarity_input PARTITION BY ANY USING ComparisonColumnPairs ('jaro (src_text1 , tar_text ) AS jaro1_sim', 'LD (src_text1 , tar_text) AS ld1_sim', 'n_gram (src_text1 , tar_text, 2) AS ngram1_sim', 'jaro_winkler (src_text1 , tar_text) AS jw1_sim' ) CaseSensitive ('true') Accumulate ('id', 'src_text1', 'tar_text') ) AS dt ORDER BY id;
Output
id src_text1 tar_text jaro1_sim ld1_sim ngram1_sim jw1_sim -- -------------- -------------- ------------------ ------------------- ------------------- ------------------ 2 hone phone 0.9333333333333332 0.8 0.75 0.9333333333333332 3 acqiese acquiesce 0.9259259259259259 0.7777777777777778 0.5 0.9481481481481482 4 aaaacccccgggga ccagggaaacccac 0.8241758241758242 0.21428571428571427 0.38461538461538464 0.8241758241758242 5 alice allies 0.8222222222222223 0.5 0.4 0.8577777777777779 6 angela angels 0.888888888888889 0.8333333333333334 0.8 0.9333333333333333 7 senter centre 0.8222222222222223 0.5 0.4 0.8222222222222223 8 chef chief 0.9333333333333332 0.8 0.5 0.9466666666666665 9 circus circuit 0.8492063492063492 0.7142857142857143 0.6666666666666666 0.9095238095238095 10 debt debris 0.75 0.5 0.4 0.825 11 deal lead 0.6666666666666666 0.5 0.3333333333333333 0.6666666666666666 12 bare bear 0.8333333333333334 0.5 0.3333333333333333 0.8500000000000001
Download a zip file of all examples and a SQL script file that creates their input tables.