LevenshteinDistance Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
Product Category
Teradata Vantage™

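This example uses the LevenshteinDistance function to measure how far each source string is from its target string. Row 4 of the input compares genome sequences; the other rows compare English words and common misspellings.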

Input

These abbreviations apply to row 4 of the input table:

Abbreviation  Meaning
A             Adenine
C             Cytosine
G             Guanine
T             Thymine

levendist_input
id  src_text1       src_text2       tar_text
1   astre           astter          aster
2   hone            fone            phone
3   acqiese         acquire         acquiesce
4   AAAACCCCCGGGGA  CCCGGGAACCAACC  CCAGGGAAACCCAC
5   alice           allen           allies
6   angela          angle           angels
7   senter          center          centre
8   chef            cheap           chief
9   circus          circle          circuit
10  debt            debut           debris
11  deal            dell            lead
12  bare            bear            bear

SQL Call

SELECT * FROM LevenshteinDistance (
  ON levendist_input
  USING
  SourceColumns ('src_text1', 'src_text2')
  TargetColumn ('tar_text')
  LevenshteinThreshold (10)
  Accumulate ('id')
) AS dt ORDER BY id;
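
The call compares every column listed in SourceColumns with the TargetColumn value of the same row and emits one output row per source column. The following Python sketch illustrates that computation for rows 1 and 4 of levendist_input. It is not Teradata's implementation: it assumes the standard Levenshtein definition (unit-cost insertion, deletion, and substitution) and assumes that a pair whose distance exceeds the threshold is reported as -1. The helper names `levenshtein` and `thresholded_distance` are hypothetical.

```python
def levenshtein(source: str, target: str) -> int:
    """Standard dynamic-programming Levenshtein distance (unit-cost edits)."""
    # previous[j] is the distance between the source prefix processed so far
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def thresholded_distance(source: str, target: str, threshold: int = 10) -> int:
    """Return the distance, or -1 if it exceeds threshold (assumed behavior)."""
    distance = levenshtein(source, target)
    return distance if distance <= threshold else -1


# Rows 1 and 4 of levendist_input: (id, src_text1, src_text2, tar_text).
rows = [
    (1, "astre", "astter", "aster"),
    (4, "AAAACCCCCGGGGA", "CCCGGGAACCAACC", "CCAGGGAAACCCAC"),
]

# One result per source column, in the (id, target, source, distance) layout.
results = [
    (row_id, target, source, thresholded_distance(source, target))
    for row_id, *sources, target in rows
    for source in sources
]

for result in results:
    print(*result)
```

Keeping the threshold check separate from the distance computation makes it easy to inspect the raw distance of a pair that would otherwise be reported only as -1.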

Output

id  target          source          distance
1   aster           astre           2
1   aster           astter          1
2   phone           hone            1
2   phone           fone            2
3   acquiesce       acqiese         2
3   acquiesce       acquire         3
4   CCAGGGAAACCCAC  AAAACCCCCGGGGA  -1
4   CCAGGGAAACCCAC  CCCGGGAACCAACC  4
5   allies          alice           3
5   allies          allen           2
6   angels          angela          1
6   angels          angle           2
7   centre          senter          3
7   centre          center          2
8   chief           chef            1
8   chief           cheap           3
9   circuit         circus          2
9   circuit         circle          3
10  debris          debt            3
10  debris          debut           3
11  lead            deal            2
11  lead            dell            3
12  bear            bare            2
12  bear            bear            0

A distance of -1 (the first output row for id 4) indicates that the Levenshtein distance for that source-target pair is greater than the LevenshteinThreshold value of 10.