StringSimilarity Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

ComparisonColumnPairs

Specify pairs of input table columns that contain strings to compare (column1 and column2), how to compare them (comparison_type), and (optionally) a constant and the name of the output column for their similarity (output_column). The similarity is a value in the range [0, 1].

For comparison_type, use one of these values:

comparison_type	Description
'jaro'	Jaro distance.
'jaro_winkler'	Jaro-Winkler distance: 1 for an exact match, 0 for no similarity.
'n-gram'	N-gram similarity. If you specify this comparison type, you can specify the value of N with constant. Default: N = 2
'LD'	Levenshtein distance: Number of edits needed to transform one string into the other. Edits are insertions, deletions, or substitutions of individual characters.

You can specify a different comparison_type for every pair of columns.
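To make the 'LD' and 'n-gram' descriptions concrete, here is a minimal Python sketch of the two underlying ideas: counting single-character edits, and splitting a string into overlapping N-character substrings. The function names `levenshtein` and `ngrams` are hypothetical and chosen for illustration. This is not the StringSimilarity implementation, and it does not show how the function scales a result into the range [0, 1].

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions, and substitutions that turn a into b."""
    # previous[j] is the edit distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def ngrams(s: str, n: int = 2) -> list:
    """Return the overlapping n-character substrings of s (n = 2 by default)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


print(levenshtein("kitten", "sitting"))  # 3
print(ngrams("night"))                   # ['ni', 'ig', 'gh', 'ht']
```

Turning "kitten" into "sitting" takes two substitutions and one insertion, which gives a count of 3.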

Default: output_column is 'sim_i', where i is the sequence number of the column pair.

CaseSensitive

[Optional] Specify whether string comparison is case-sensitive. You can specify either one value for all pairs or one value for each pair. If you specify one value for each pair, the ith value applies to the ith pair.

Default: 'false'
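The one-value-or-one-per-pair rule can be sketched in Python as follows. The helper names `expand_case_flags` and `normalize` are hypothetical. Lowercasing both strings for a case-insensitive comparison is an assumption about how such a comparison could work, not a description of the function's internals.

```python
def expand_case_flags(flags, pair_count):
    """Return one case-sensitivity flag per column pair."""
    flags = list(flags)
    if len(flags) == 1:
        # A single value applies to every pair.
        return flags * pair_count
    if len(flags) != pair_count:
        raise ValueError(
            "expected 1 or %d values, got %d" % (pair_count, len(flags))
        )
    # Otherwise the ith value applies to the ith pair.
    return flags


def normalize(text, case_sensitive):
    """Assumption: a case-insensitive comparison lowercases both strings first."""
    return text if case_sensitive else text.lower()


pairs = [("Smith", "SMITH"), ("Jones", "JONES")]
flags = expand_case_flags([False], len(pairs))  # one value for all pairs
matches = [normalize(a, f) == normalize(b, f) for (a, b), f in zip(pairs, flags)]
print(matches)  # [True, True]
```

Passing `[True, False]` instead would treat the first pair as case-sensitive and the second as case-insensitive.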

Accumulate

[Optional] Specify the names of input table columns to copy to the output table.