StringSimilarity
Description
The StringSimilarity function calculates the similarity between two
strings using the Jaro, Jaro-Winkler, N-Gram, or Levenshtein distance.
The similarity is a value in the range [0, 1]; higher values indicate
more similar strings.
Note: This function is only available when tdplyr is connected to Vantage 1.1
or later.
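To make the [0, 1] range concrete, the following base-R sketch computes textbook versions of three of these measures locally, without a Vantage connection. It assumes the standard Jaro definition, the standard Jaro-Winkler adjustment (common prefix of at most 4 characters, prefix weight p = 0.1 by default), and a Levenshtein similarity normalized as 1 - distance / max(length). These definitions are assumptions; the Vantage implementation may differ in details such as normalization. The helper names jaro_sim, jaro_winkler_sim, and ld_sim are hypothetical and are not part of tdplyr. N-Gram similarity is omitted.

```r
# Hypothetical helpers (not part of tdplyr): textbook string similarities.

# Jaro similarity (standard definition; assumed, may differ from Vantage).
jaro_sim <- function(s1, s2) {
  la <- nchar(s1)
  lb <- nchar(s2)
  if (la == 0 && lb == 0) return(1)
  if (la == 0 || lb == 0) return(0)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # Characters match only if they are equal and within this distance.
  window <- max(0, floor(max(la, lb) / 2) - 1)
  a_match <- logical(la)
  b_match <- logical(lb)
  for (i in seq_len(la)) {
    lo <- max(1, i - window)
    hi <- min(lb, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!b_match[j] && a[i] == b[j]) {
        a_match[i] <- TRUE
        b_match[j] <- TRUE
        break
      }
    }
  }
  m <- sum(a_match)
  if (m == 0) return(0)
  # Transpositions: half the matched characters that appear in a different order.
  t <- sum(a[a_match] != b[b_match]) / 2
  (m / la + m / lb + (m - t) / m) / 3
}

# Jaro-Winkler: boosts the Jaro score by the common prefix (at most 4 chars).
# p is the prefix weight; 0.1 is the conventional default (assumption).
jaro_winkler_sim <- function(s1, s2, p = 0.1) {
  j <- jaro_sim(s1, s2)
  n <- min(4, nchar(s1), nchar(s2))
  l <- 0
  while (l < n && substr(s1, l + 1, l + 1) == substr(s2, l + 1, l + 1)) {
    l <- l + 1
  }
  j + l * p * (1 - j)
}

# Levenshtein similarity: 1 - edit distance / length of the longer string.
ld_sim <- function(s1, s2) {
  longest <- max(nchar(s1), nchar(s2))
  if (longest == 0) return(1)
  d <- as.numeric(utils::adist(s1, s2))
  1 - d / longest
}

jaro_sim("MARTHA", "MARHTA")          # about 0.944
jaro_winkler_sim("MARTHA", "MARHTA")  # about 0.961
ld_sim("kitten", "sitting")           # about 0.571
```

Identical strings score 1 under all three helpers, and strings with no matching characters score 0 under jaro_sim.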
Usage
td_string_similarity_sqle(
  data = NULL,
  comparison.columns = NULL,
  case.sensitive = NULL,
  accumulate = NULL,
  data.order.column = NULL
)
Arguments
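data
Required Argument.
Specifies the input tbl_teradata containing the string columns to compare.
data.order.column
Optional Argument.
Specifies the Order By columns for "data". Provide a vector of column
names to order by multiple columns.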
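comparison.columns
Required Argument.
Specifies the column pairs to compare, each given as a string of the form
"comparison_type (column1, column2 [, constant]) [AS output_column]",
where comparison_type is jaro, jaro_winkler, n_gram, or LD (Levenshtein
distance) and constant is an optional numeric parameter used by some
comparison types (for example, N for n_gram).
You can specify a different comparison type for every pair of
columns. If the AS clause is omitted, the output column is named "sim_i",
where i is the sequence number of the column pair.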
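case.sensitive
Optional Argument.
Specifies whether the string comparison is case-sensitive.
accumulate
Optional Argument.
Specifies the names of the input columns to copy to the output tbl_teradata.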
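Because each comparison.columns entry is a formatted string, building entries programmatically can avoid typos in long comparison lists. The sketch below is a hypothetical helper (comparison_spec and default_output_names are not part of tdplyr) that assembles strings in the "comparison_type (column1, column2 [, constant]) [AS output_column]" form described above. It also assumes that the default "sim_i" sequence numbers start at 1.

```r
# Hypothetical helper (not part of tdplyr): builds one comparison.columns
# entry of the form "type (column1, column2[, constant])[ AS output_column]".
comparison_spec <- function(type, column1, column2,
                            constant = NULL, output = NULL) {
  # The optional constant is appended as a third argument when supplied.
  args <- c(column1, column2, if (!is.null(constant)) format(constant))
  spec <- sprintf("%s (%s)", type, paste(args, collapse = ", "))
  # Without an AS clause the function names the column "sim_i".
  if (!is.null(output)) spec <- paste(spec, "AS", output)
  spec
}

# Default output column names for n pairs (assumes numbering starts at 1).
default_output_names <- function(n_pairs) paste0("sim_", seq_len(n_pairs))

specs <- c(
  comparison_spec("jaro", "src_text1", "tar_text", output = "jaro1_sim"),
  comparison_spec("LD", "src_text1", "tar_text", constant = 2,
                  output = "ld1_sim"),
  comparison_spec("n_gram", "src_text1", "tar_text", constant = 2,
                  output = "ngram1_sim"),
  comparison_spec("jaro_winkler", "src_text1", "tar_text", constant = 0.2,
                  output = "jw1_sim")
)
print(specs)
```

The resulting character vector has the same form as the comparison.columns value in the second example and could be passed to td_string_similarity_sqle() directly.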
Value
The function returns an object of class "td_string_similarity_sqle", which
is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load example data.
loadExampleData("stringsimilarity_example", "strsimilarity_input")
# Create object(s) of class "tbl_teradata".
strsimilarity_input <- tbl(con, "strsimilarity_input")
# Example 1 - Using the "jaro" comparison type with the default output
# column name.
td_string_similarity_sqle_out <- td_string_similarity_sqle(
  data = strsimilarity_input,
  case.sensitive = TRUE,
  comparison.columns = c("jaro (src_text2, tar_text)"),
  accumulate = c("id", "src_text1", "tar_text")
)
# Print the result.
print(td_string_similarity_sqle_out$result)
# Example 2 - Using multiple comparison types with custom output columns.
td_string_similarity_sqle_out2 <- td_string_similarity_sqle(
  data = strsimilarity_input,
  comparison.columns = c("jaro (src_text1, tar_text) AS jaro1_sim",
                         "LD (src_text1, tar_text, 2) AS ld1_sim",
                         "n_gram (src_text1, tar_text, 2) AS ngram1_sim",
                         "jaro_winkler (src_text1, tar_text, 0.2) AS jw1_sim"),
  case.sensitive = TRUE,
  accumulate = c("id", "src_text1", "tar_text")
)
# Print the result.
print(td_string_similarity_sqle_out2$result)