The FellegiSunterTrainer function has one required input table, which contains the object pairs, their field-pair similarity values, and (for supervised learning) a tag column. The following table shows the schema of the input table.
Column Name | Data Type | Description |
---|---|---|
field-pair_i_sim | DOUBLE PRECISION | The field-pair similarity value for field-pair i. In input_table, the columns appear in this order: A_field_1, ..., A_field_n, B_field_1, ..., B_field_n, field-pair_1_sim, ..., field-pair_n_sim. |
tag_column | VARCHAR | This column is required only for supervised learning. Row i of this column contains 'M' if field i of object A matches field i of object B; otherwise, it contains 'U'. |
To create the input table for the FellegiSunterTrainer function, you can use the StringSimilarity function.
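The column order described in the schema can be illustrated with a small Python sketch. It is not part of the product and does not call StringSimilarity: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the field names (`name`, `city`), the column names, and the helper functions are hypothetical, chosen only to show the layout.

```python
from difflib import SequenceMatcher

# Hypothetical field names; a real table would use its own.
FIELDS = ["name", "city"]


def build_columns(fields, supervised=True):
    """Column names in the documented order: A fields, B fields, similarities, tag."""
    columns = [f"a_{f}" for f in fields]
    columns += [f"b_{f}" for f in fields]
    columns += [f"{f}_sim" for f in fields]
    if supervised:
        # The tag column is needed only for supervised learning.
        columns.append("tag")
    return columns


def build_row(obj_a, obj_b, fields, tag=None):
    """One object pair laid out in the same order as build_columns()."""
    # Stand-in similarity (assumption): SequenceMatcher ratio in [0.0, 1.0].
    sims = [SequenceMatcher(None, obj_a[f], obj_b[f]).ratio() for f in fields]
    row = [obj_a[f] for f in fields] + [obj_b[f] for f in fields] + sims
    if tag is not None:
        row.append(tag)  # 'M' or 'U'
    return row


columns = build_columns(FIELDS)
row = build_row(
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "Boston"},
    FIELDS,
    tag="M",
)
```

For unsupervised learning, the same helpers can be called with `supervised=False` and no `tag`, which leaves out the tag column.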