Required Syntax Elements for StringSimilarity - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Product
Analytics Database
Release Number
17.20
Published
June 2022
ft:locale
en-US
ft:lastEdition
2025-11-06
dita:mapPath
gjn1627595495337.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
jmh1512506877710
Product Category
Teradata Vantage™
ON clause
Specify the table name, view name or query as an InputTable.
ComparisonColumnPairs
Specify the names of the input table columns that contain strings to compare (column1 and column2), how to compare them (comparison_type), and (optionally) a constant and the name of the output column for their similarity (output_column). The similarity is a value in the range [0, 1].
column1 and column2
  • If column1 or column2 includes any special characters, enclose the column name with double quotation marks. For example, if column1 and column2 are c(col1) and c(col2), respectively, specify them as "c(col1)" and "c(col2)".
  • If column1 or column2 includes double quotation marks, replace each double quotation mark with a pair of double quotation marks. For example, if column1 and column2 are c1"c and c2"c, respectively, specify them as "c1""c" and "c2""c".
    These rules do not apply to output_column. For example, this is valid syntax: ComparisonColumnPairs ('jaro ("c1""c", "c2""c") AS out"col').
  • If column1 or column2 supports more than 200 characters, you can cast it to VARCHAR(200) (see )Casting for Columns with More Than 200 Characters. For information about the CAST operation, see Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145.
comparison_type
Use one of these values:
comparison_type Description
'jaro' Jaro distance.
'jaro_winkler' Jaro-Winkler distance.
  • 1 for an exact match
  • 0 otherwise

Specify the value of factor p with constant. 0 ≤ p ≤ 0.25.

Default: p = 0.1

'n_gram' N-gram similarity.

Specify the value of N with constant.

Default: N = 2

'LD' Levenshtein distance.

Specify the number of edits needed to transform one string into the other. Edits are insertions, deletions, or substitutions of individual characters.

'LDWS' Levenshtein distance without substitution.

Specify the number of edits needed to transform one string into the other using only insertions or deletions of individual characters.

'OSA' Optimal string alignment distance.

Specify the number of edits needed to transform one string into the other. Edits are insertions, deletions, substitutions, or transpositions of characters. A substring can be edited only once.

'DL' Damerau-Levenshtein distance.

Similar to 'OSA' except that a substring can be editeId any number of times.

'hamming' Hamming distance.
  • Strings of equal length: Specify the number of positions where corresponding characters differ (that is, minimum number of substitutions needed to transform one string into the other).
  • For strings of unequal length: -1.
'LCS' Longest common substring: Length of longest substring common to both strings.
'jaccard' Jaccard indexed-based comparison.
'cosine' Cosine similarity.
'soundexcode' Only for English strings:
  • -1 if either string has a non-English character.
  • 1 if their soundex codes are the same and 0 otherwise.
The function ignores constant for every comparison_type except 'jaro_winkler' and 'n_gram'.
You can specify a different comparison_type for every pair of columns.
Default: output_column is 'sim_i', where i is the sequence number of the column pair.