StringSimilarity Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17
dita:mapPath: uce1497542673292.ditamap
dita:ditavalPath: AA-notempfilter_pdf_output.ditaval
dita:id: B700-1022
lifecycle: previous
Product Category: Software

ComparisonColumnPairs
Specifies the pairs of input table columns that contain the strings to compare (column1 and column2), how to compare them (comparison_type), and, optionally, a constant whose meaning depends on comparison_type and the name of the output column for their similarity (output_column). The similarity is a value in the range [0, 1].
For comparison_type, use one of these values:
  • 'jaro'

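    Jaro distance: a similarity measure based on the number of matching characters and the number of transpositions between the two strings.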

  • 'jaro_winkler'

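    Jaro-Winkler distance: a variant of Jaro distance that gives a higher score to strings that share a common prefix. A value of 1 means an exact match and a value of 0 means no similarity.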

  • 'n-gram'

    N-gram similarity. If you specify this comparison type, you can specify the value of N with constant.

  • 'LD'

    Levenshtein distance: the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other.
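
The edit count behind 'LD' can be sketched in a few lines of Python. This is a minimal illustration of the textbook Levenshtein algorithm, not the function's actual implementation. The helper names levenshtein and normalized_similarity are hypothetical, and the normalization to the range [0, 1] (one minus the distance divided by the longer string length) is an assumption about how an edit count could be mapped to a similarity, not a documented formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (textbook definition)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """ASSUMPTION: map the edit count to [0, 1] as 1 - distance / max length.
    The guide does not state which normalization the function uses."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(round(normalized_similarity("kitten", "sitting"), 4))
```

Turning "kitten" into "sitting" takes three edits (two substitutions and one insertion), so under the assumed normalization the similarity is 1 - 3/7.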

You can specify a different comparison_type for every pair of columns.
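
For the two Jaro-based comparison types, the following Python sketch shows how the scores are computed under the standard definitions. It assumes the function follows those definitions. The function names, the prefix scaling factor p = 0.1, and the four-character prefix cap are the conventional textbook choices and are assumptions here, not values taken from this guide.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix.
    ASSUMPTION: p = 0.1 and a prefix cap of 4 are the conventional defaults."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


print(round(jaro("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Both scores are graded: identical strings score 1, strings with no matching characters score 0, and near matches such as "MARTHA" and "MARHTA" fall in between, with Jaro-Winkler scoring at least as high as Jaro because of the shared prefix "MAR".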

Default: output_column is 'sim_i', where i is the sequence number of the column pair.
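
The default naming rule can be illustrated with a short Python sketch. The helper resolve_output_columns is hypothetical, and the sketch assumes that i is the 1-based position of the pair in ComparisonColumnPairs, which this guide does not state explicitly.

```python
from typing import List, Optional


def resolve_output_columns(names: List[Optional[str]]) -> List[str]:
    """Return one output column name per pair.

    names[k] is the output_column given for pair k, or None if omitted.
    ASSUMPTION: the default is 'sim_i' with i counted from 1 over all pairs.
    """
    return [name if name else f"sim_{i}"
            for i, name in enumerate(names, start=1)]


# The first pair names its output column; the other two fall back to defaults.
print(resolve_output_columns(["jaro_sim", None, None]))
```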

CaseSensitive
[Optional] Specifies whether string comparison is case-sensitive. Default: 'false'.

You can specify either one value for all pairs or one value for each pair. If you specify one value for each pair, the ith value applies to the ith pair.
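
The one-value-or-one-per-pair rule can be sketched in Python as follows. The helpers expand_case_sensitive and prepare are hypothetical, and treating a case-insensitive comparison as a comparison of lower-cased strings is an assumption about the behavior, not a documented detail.

```python
from typing import List


def expand_case_sensitive(values: List[bool], num_pairs: int) -> List[bool]:
    """Return one CaseSensitive flag per column pair.

    A single value applies to every pair; otherwise the ith value
    applies to the ith pair, so the counts must agree.
    """
    if len(values) == 1:
        return values * num_pairs
    if len(values) == num_pairs:
        return list(values)
    raise ValueError(
        f"expected 1 or {num_pairs} CaseSensitive values, got {len(values)}"
    )


def prepare(text: str, case_sensitive: bool) -> str:
    """ASSUMPTION: case-insensitive comparison lower-cases both strings."""
    return text if case_sensitive else text.lower()


flags = expand_case_sensitive([False], num_pairs=3)
print(flags)
print(prepare("Astre", flags[0]) == prepare("ASTRE", flags[0]))
```

With the default of 'false' applied to all three pairs, "Astre" and "ASTRE" are compared as the same string before any similarity measure is computed.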

Accumulate
[Optional] Specifies the names of input table columns to be copied to the output table.