The input table, strsimilarity_input, has two source columns (src_text1 and src_text2), each of which the function compares with the target column (tar_text). The function calculates similarity scores using the methods specified by the ComparisonColumnPairs argument (jaro, jaro-winkler, ngram, and Levenshtein distance). For clarity, separate examples show the comparison of each source column with the target column. With some modifications, you can use the output of this function as input to the FellegiSunter functions.
id | src_text1 | src_text2 | tar_text |
---|---|---|---|
1 | astre | astter | aster |
2 | hone | fone | phone |
3 | acqiese | acquire | acquiesce |
4 | AAAACCCCCGGGGA | CCCGGGAACCAACC | CCAGGGAAACCCAC |
5 | alice | allen | allies |
6 | angela | angle | angels |
7 | senter | center | centre |
8 | chef | cheap | chief |
9 | circus | circle | circuit |
10 | debt | debut | debris |
11 | deal | dell | lead |
12 | bare | bear | bear |
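To illustrate what the comparison methods measure, here is a minimal, self-contained Python sketch of the four kinds of scores applied to a few rows of strsimilarity_input. It is not the function's implementation, and its outputs are not guaranteed to match the function's. Several choices are assumptions: the Levenshtein similarity normalization (1 minus distance divided by the longer length), the Jaro-Winkler prefix scale of 0.1 with a 4-character prefix cap and no boost threshold, and the n-gram score (Jaccard similarity of character bigram sets). All function names are hypothetical.

```python
"""Sketch of character-level string similarity measures.

Assumptions (not taken from the function's documentation): the Levenshtein
normalization, the Jaro-Winkler parameters, and the n-gram definition below.
"""


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matching characters within a sliding window,
    penalized by half the number of out-of-order matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score by the shared prefix (max 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Assumed definition: Jaccard similarity of character n-gram sets."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # A few rows copied from strsimilarity_input: (id, src_text1, src_text2, tar_text)
    rows = [(1, "astre", "astter", "aster"),
            (2, "hone", "fone", "phone"),
            (12, "bare", "bear", "bear")]
    for row_id, src1, src2, tar in rows:
        for src in (src1, src2):  # compare each source column with the target
            print(row_id, src, tar,
                  levenshtein(src, tar),
                  round(jaro(src, tar), 4),
                  round(jaro_winkler(src, tar), 4),
                  round(ngram_similarity(src, tar), 4))
```

Each source column is scored against tar_text separately, mirroring the separate per-column examples. Note that the measures can disagree: "bare" and "bear" are close under Jaro but share only one bigram.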