LevenshteinDistance Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.10, 1.1
Published: October 2019
Language: English (United States)
Last Update: 2019-12-31
Product Category: Teradata Vantage™

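This example uses the LevenshteinDistance function to compute the edit distance between each source string and its target string. Most rows compare English words; the row with id 4 compares genome sequences.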

Input

These abbreviations apply to the row with id 4 of the input table, levendist_input:

 Abbreviation | Meaning
--------------+----------
 A            | Adenine
 C            | Cytosine
 G            | Guanine
 T            | Thymine

 id |   src_text1    |   src_text2    |    tar_text    
----+----------------+----------------+----------------
  2 | hone           | fone           | phone
  3 | acqiese        | acquire        | acquiesce
  4 | aaaacccccgggga | cccgggaaccaacc | ccagggaaacccac
  5 | alice          | allen          | allies
  6 | angela         | angle          | angels
  7 | senter         | center         | centre
  8 | chef           | cheap          | chief
  9 | circus         | circle         | circuit
 10 | debt           | debut          | debris
 11 | deal           | dell           | lead
 12 | bare           | bear           | bear
(11 rows)

SQL Call

SELECT * FROM LevenshteinDistance (
  ON levendist_input
  USING
  SourceColumns ('src_text1', 'src_text2')
  TargetColumn ('tar_text')
  LevenshteinThreshold (10)
  Accumulate ('id')
) AS dt ORDER BY id, source;

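Output

A distance of -1 indicates that the Levenshtein distance for that source-target pair is greater than the LevenshteinThreshold value (10).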

 id target         source         distance 
 -- -------------- -------------- -------- 
  2 phone          fone                  2
  2 phone          hone                  1
  3 acquiesce      acqiese               2
  3 acquiesce      acquire               3
  4 ccagggaaacccac aaaacccccgggga       -1
  4 ccagggaaacccac cccgggaaccaacc        4
  5 allies         alice                 3
  5 allies         allen                 2
  6 angels         angela                1
  6 angels         angle                 2
  7 centre         center                2
  7 centre         senter                3
  8 chief          cheap                 3
  8 chief          chef                  1
  9 circuit        circle                3
  9 circuit        circus                2
 10 debris         debt                  3
 10 debris         debut                 3
 11 lead           deal                  2
 11 lead           dell                  3
 12 bear           bare                  2
 12 bear           bear                  0
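
As a cross-check on the distance column, the following is a minimal Python sketch of the standard dynamic-programming Levenshtein distance with a threshold cutoff. It is not the Vantage implementation. The helper name `levenshtein` is hypothetical, and the rule that a distance above the threshold is reported as -1 is an assumption chosen to mirror the LevenshteinThreshold (10) argument and the -1 shown for id 4.

```python
def levenshtein(source, target, threshold=None):
    """Edit distance where insert, delete, and substitute each cost 1.

    Hypothetical helper, not a Teradata API. If `threshold` is given and
    the distance exceeds it, return -1 (assumed to mirror the behavior
    of the LevenshteinThreshold argument).
    """
    # prev[j] is the distance between the source prefix processed so far
    # and the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from i source characters to an empty target
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    distance = prev[-1]
    if threshold is not None and distance > threshold:
        return -1
    return distance


# Source-target pairs taken from levendist_input
pairs = [
    ("hone", "phone"),
    ("fone", "phone"),
    ("bear", "bear"),
    ("aaaacccccgggga", "ccagggaaacccac"),
]
for src, tar in pairs:
    print(src, tar, levenshtein(src, tar, threshold=10))
```

With `threshold=10`, the first three pairs should reproduce the distances 1, 2, and 0 from the output table, and the genome pair from id 4 should come back as -1.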
