- ON clause
- Specify the table name, view name or query as an InputTable.
- ComparisonColumnPairs
- Specify the names of the input table columns that contain strings to compare (column1 and column2), how to compare them (comparison_type), and (optionally) a constant and the name of the output column for their similarity (output_column). The similarity is a value in the range [0, 1].
- column1 and column2
- If column1 or column2 includes any special characters, enclose the column name with double quotation marks. For example, if column1 and column2 are c(col1) and c(col2), respectively, specify them as "c(col1)" and "c(col2)".
- If column1 or column2 includes double quotation marks, replace each double quotation mark with a pair of double quotation marks. For example, if column1 and column2 are c1"c and c2"c, respectively, specify them as "c1""c" and "c2""c".These rules do not apply to output_column. For example, this is valid syntax: ComparisonColumnPairs ('jaro ("c1""c", "c2""c") AS out"col').
- If column1 or column2 supports more than 200 characters, you can cast it to VARCHAR(200) (see )Casting for Columns with More Than 200 Characters. For information about the CAST operation, see Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145.
- comparison_type
- Use one of these values:
comparison_type Description 'jaro' Jaro distance. 'jaro_winkler' Jaro-Winkler distance. - 1 for an exact match
- 0 otherwise
Specify the value of factor p with constant. 0 ≤ p ≤ 0.25.
Default: p = 0.1
'n_gram' N-gram similarity. Specify the value of N with constant.
Default: N = 2
'LD' Levenshtein distance. Specify the number of edits needed to transform one string into the other. Edits are insertions, deletions, or substitutions of individual characters.
'LDWS' Levenshtein distance without substitution. Specify the number of edits needed to transform one string into the other using only insertions or deletions of individual characters.
'OSA' Optimal string alignment distance. Specify the number of edits needed to transform one string into the other. Edits are insertions, deletions, substitutions, or transpositions of characters. A substring can be edited only once.
'DL' Damerau-Levenshtein distance. Similar to 'OSA' except that a substring can be editeId any number of times.
'hamming' Hamming distance. - Strings of equal length: Specify the number of positions where corresponding characters differ (that is, minimum number of substitutions needed to transform one string into the other).
- For strings of unequal length: -1.
'LCS' Longest common substring: Length of longest substring common to both strings. 'jaccard' Jaccard indexed-based comparison. 'cosine' Cosine similarity. 'soundexcode' Only for English strings: - -1 if either string has a non-English character.
- 1 if their soundex codes are the same and 0 otherwise.