Description
The StringSimilarity function calculates the similarity between two strings
using the Jaro, Jaro-Winkler, n-gram, or Levenshtein distance measure.
The similarity is a value in the range [0, 1].
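The description names the measures but not their formulas. For orientation, the following is a minimal local sketch in base R of the textbook Jaro and Jaro-Winkler similarities, plus a Levenshtein distance normalized as 1 - d / max(length). These are standard definitions assumed for illustration, not taken from the Teradata implementation, which may normalize differently or use the optional constant in another way. The function names jaro_similarity, jaro_winkler_similarity, and levenshtein_similarity are hypothetical and are not part of tdplyr.

```r
# Hypothetical local sketches of standard similarity measures (not tdplyr functions).

# Textbook Jaro similarity.
jaro_similarity <- function(s1, s2) {
  n1 <- nchar(s1); n2 <- nchar(s2)
  if (n1 == 0 && n2 == 0) return(1)
  if (n1 == 0 || n2 == 0) return(0)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # Characters match only if they are equal and no farther apart than this window
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  a_matched <- logical(n1)
  b_matched <- logical(n2)
  m <- 0
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!b_matched[j] && a[i] == b[j]) {
        a_matched[i] <- TRUE
        b_matched[j] <- TRUE
        m <- m + 1
        break
      }
    }
  }
  if (m == 0) return(0)
  # Half the number of matched characters that appear in a different order
  transpositions <- sum(a[a_matched] != b[b_matched]) / 2
  (m / n1 + m / n2 + (m - transpositions) / m) / 3
}

# Basic Jaro-Winkler: boosts the Jaro score by the common prefix (capped at 4).
# p = 0.1 is the conventional scaling factor; it is an assumption here.
jaro_winkler_similarity <- function(s1, s2, p = 0.1) {
  sj <- jaro_similarity(s1, s2)
  l <- 0
  for (k in seq_len(min(nchar(s1), nchar(s2), 4))) {
    if (substr(s1, k, k) != substr(s2, k, k)) break
    l <- l + 1
  }
  sj + l * p * (1 - sj)
}

# Levenshtein distance (utils::adist) normalized to [0, 1]; the normalization
# by the longer string's length is an assumption.
levenshtein_similarity <- function(s1, s2, case_sensitive = TRUE) {
  longest <- max(nchar(s1), nchar(s2))
  if (longest == 0) return(1)
  d <- utils::adist(s1, s2, ignore.case = !case_sensitive)[1, 1]
  1 - d / longest
}
```

For example, MARTHA and MARHTA share six matching characters with one transposition, which gives a Jaro similarity of 17/18 (about 0.944) and, with the common prefix MAR, a Jaro-Winkler similarity of about 0.961.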
Usage
td_string_similarity_mle(data = NULL, comparison.columns = NULL, case.sensitive = NULL, accumulate = NULL, data.sequence.column = NULL)
Arguments
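data
Required Argument. Specifies the input tbl object that contains the columns to compare.
comparison.columns
Required Argument. Specifies the pairs of columns to compare, as a character vector whose entries use the syntax:
comparison_type (column1, column2 [, constant]) [AS output_column]
You can specify a different comparison_type for each pair of columns. If the AS output_column clause is omitted, a default output column is used.
case.sensitive
Optional Argument. Specifies whether each comparison is case-sensitive. Either a single logical value that applies to all comparisons, or a logical vector with one element per entry in comparison.columns.
accumulate
Optional Argument. Specifies the names of the input columns to copy to the output table.
data.sequence.column
Optional Argument.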
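Because comparison.columns takes plain strings, its entries can be assembled programmatically. The helper comparison_spec() below is hypothetical (it is not part of tdplyr): it only pastes together the comparison_type (column1, column2 [, constant]) AS output_column syntax and does not validate comparison types or constants. The equally hypothetical check_case_sensitive() verifies that case.sensitive is either a single value or has one element per comparison.

```r
# Hypothetical helper (not part of tdplyr): formats one comparison.columns entry
# as "comparison_type (column1, column2 [, constant]) [AS output_column]".
comparison_spec <- function(type, col1, col2, constant = NULL, output = NULL) {
  args <- c(col1, col2, if (!is.null(constant)) as.character(constant))
  spec <- sprintf("%s (%s)", type, paste(args, collapse = ", "))
  if (!is.null(output)) spec <- paste(spec, "AS", output)
  spec
}

# Hypothetical check: case.sensitive is a single value or one value per comparison.
check_case_sensitive <- function(case_sensitive, comparisons) {
  length(case_sensitive) == 1 || length(case_sensitive) == length(comparisons)
}

comparisons <- c(
  comparison_spec("jaro", "src_text2", "tar_text", output = "jaro2_case_sim"),
  comparison_spec("jaro", "src_text2", "tar_text", output = "jaro2_nocase_sim")
)
case_flags <- c(TRUE, FALSE)
stopifnot(check_case_sensitive(case_flags, comparisons))
print(comparisons)
```

The resulting vectors could then be passed as comparison.columns = comparisons and case.sensitive = case_flags.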
Value
The function returns an object of class "td_string_similarity_mle", which is a
named list containing a Teradata tbl object.
The list member can be referenced directly with the "$" operator
using the name result.
Examples
# Get the current context/connection.
con <- td_get_context()$connection

# Load example data.
loadExampleData("stringsimilarity_example", "strsimilarity_input")

# Create remote tibble objects.
strsimilarity_input <- tbl(con, "strsimilarity_input")

# Using the "jaro" comparison type with a default output column.
td_string_similarity_out1 <- td_string_similarity_mle(
  data = strsimilarity_input,
  comparison.columns = "jaro (src_text1, tar_text)",
  accumulate = c("id", "src_text1", "tar_text")
)

# Using multiple comparison types with custom output columns.
td_string_similarity_out2 <- td_string_similarity_mle(
  data = strsimilarity_input,
  comparison.columns = c("jaro (src_text1, tar_text) AS jaro1_sim",
                         "LD (src_text1, tar_text, 2) AS ld1_sim",
                         "n_gram (src_text1, tar_text, 2) AS ngram1_sim",
                         "jaro_winkler (src_text1, tar_text, 2) AS jw1_sim"),
  case.sensitive = TRUE,
  accumulate = c("id", "src_text1", "tar_text")
)

# Using a vector for case.sensitive comparisons.
# Note: The length of the case.sensitive vector must match the length of the
# comparison.columns vector.
td_string_similarity_out3 <- td_string_similarity_mle(
  data = strsimilarity_input,
  comparison.columns = c("jaro (src_text2, tar_text) AS jaro2_case_sim",
                         "jaro (src_text2, tar_text) AS jaro2_nocase_sim"),
  case.sensitive = c(TRUE, FALSE),
  accumulate = c("id", "src_text2", "tar_text")
)