IdentityMatch Example | Teradata Vantage - IdentityMatch Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

The objective of the example is to find the credit scores of job applicants by matching the information in two tables.

The example looks for exact matches (NominalMatchColumns) to the email address and approximate matches (FuzzyMatchColumns) for lastname, firstname, zip code, city, and department, with different match metrics and match weights.

Input

  • SourceTable: applicant_reference, which has hypothetical information from job applicants
  • ReferenceTable: applicant_external, an external source table, which has missing and incomplete information, but includes credit scores
SourceTable: applicant_reference
id firstname lastname email city zipcode department gender
1 John Dewey john.dewey@corp-mark.com Sugar Land 77459 Marketing Male
2 Sarah Anders sarah.anders@corp-sales.com Pearland 77584 Sales Female
3 Elizabeth Hall elizabeth.hall@corp-eng.com Galveston 77550 Engineering Female
4 James Nickson james.nick@corp-it.com Pasadena 77501 IT Male
5 Kim Lee kim.lee@corp-sys.com Clear Lake City 77058 Systems Female
6 Jessica Right jessica.right@corp-mark.com Sugar Land 77459 Marketing Female
ReferenceTable: applicant_external
id firstname lastname email city zipcode department creditscore
1 John Dewey john.dewey@corp-mark.com Sugar Land 7774 market 700
2   Hall   Galveston 77550 eng 790
3 Sarah Anders sarah.anders@corp-sales.com pear 77584 sales 650
4 Jessica right   Sugar Land 77459 Marketing 690
5 James Nickson   Pasadena 7750 IT 620
6 Kim       77058 system 570

SQL Call

SELECT * FROM IdentityMatch (
  ON applicant_reference AS SourceTable PARTITION BY ANY
  ON applicant_external AS ReferenceTable DIMENSION
  USING
  IDColumn ('a.id: b.id')
  NominalMatchColumns ('a.email: b.email')
  FuzzyMatchColumns ('a.lastname: b.lastname, JARO-WINKLER, 3',
                     'a.firstname: b.firstname, JARO-WINKLER, 2',
                     'a.zipcode: b.zipcode, JD, 2',
                     'a.city: b.city, LD, 2',
                     'a.department: b.department, COSINE, 1')
  Accumulate ('a.firstname','a.lastname' ,'b.lastname', 'a.email',
              'b.email','a.zipcode', 'b.zipcode', 'a.department',
              'b.department','b.creditscore')
  ThresholdScore (0.5)
) AS dt;

Output

The output table has the matching information from both input tables and the matching score. If multiple row entries exist for the same id, the higher scoring entry gives the best match. For example, the first row (with score 1) is a better match than the second (with score 0.6036). The creditscore column contains applicant credit scores.

 a.id a.firstname a.lastname a.email                     a.zipcode a.department b.id b.lastname b.email                     b.zipcode b.department b.creditscore score  
 ---- ----------- ---------- --------------------------- --------- ------------ ---- ---------- --------------------------- --------- ------------ ------------- ------ 
    6 jessica     right      jessica.right@corp-mark.com     77459 marketing       4 right                                      77459 marketing            690.0      1
    4 james       nickson    james.nick@corp-it.com          77501 it              5 nickson                                     7750 it                   620.0    0.8
    3 elizabeth   hall       elizabeth.hall@corp-eng.com     77550 engineering     2 hall                                       77550 eng                  790.0    0.7
    1 john        dewey      john.dewey@corp-mark.com        77459 marketing       4 right                                      77459 marketing            690.0 0.6036
    1 john        dewey      john.dewey@corp-mark.com        77459 marketing       1 dewey      john.dewey@corp-mark.com         7774 market               700.0      1
    2 sarah       anders     sarah.anders@corp-sales.com     77584 sales           3 anders     sarah.anders@corp-sales.com     77584 sales                650.0      1

Download a zip file of all examples and a SQL script file that creates their input tables.